Building an Entity-Component-System Framework in Elixir

By Aldric "Trevoke" Giacomoni

What’s an ECS?

ECS is a pattern that has value when there is a need to develop complex (possibly emergent) behavior based on simple properties. It is based upon the two basic ideas of:

These ideas transform into the elements of an ECS as such:

An example might help. Please do keep in mind while reading the following example that there are no rules indicating what a good ECS organization is, so you might find my example great, or you might find it abominable, and both are perfectly valid viewpoints. Determining a good granularity for Components is an important decision that impacts the rest of the project. Similarly, determining how Systems work and how much responsibility they will have is crucial.

Let’s say we have an Entity. This Entity has a SocialCharacteristics Component which holds that its name is Alice. Alice also has a Status Component which indicates that her health is 3. Alice also has a Fighting Component which indicates that her target is Bob. Bob is a dragon with a health of 804 and he is also fighting Alice. We have a FightingSystem which runs every second and resolves the next round of fighting between entities that are fighting each other by finding all entities with the Fighting Component and operating on them.

Our FightingSystem determines that Alice hits Bob for 0 points of health and Bob hits Alice for 34 points of health. The FightingSystem then realizes it needs to queue up the DeathSystem for Alice after it finishes, because Alice is now at -31 health, and we have decided that 0 health is the boundary between life and death. The DeathSystem will perform several checks, including whether there is a life-saving spell or other such components acting on Alice which might change the outcome.
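To make the example concrete, here is a minimal sketch of that round of fighting in plain Elixir. It does not use Ecstatic, and every name in it (FightSketch, the component keys, the damage function) is an assumption made up for illustration: entities are maps of components, and the "system" is a function that only touches entities carrying a fighting component.

```elixir
defmodule FightSketch do
  # Hypothetical names throughout; this is not Ecstatic's API.
  # An entity is an id plus a map of component name => component data.
  def new_entity(id, components), do: %{id: id, components: components}

  # The "system": it only operates on entities that have a :fighting component.
  def fighting_round(entities, damage_fun) do
    fighters = for {id, e} <- entities, Map.has_key?(e.components, :fighting), do: id

    Enum.reduce(fighters, entities, fn attacker_id, world ->
      target_id = world[attacker_id].components.fighting.target
      damage = damage_fun.(attacker_id, target_id)
      update_in(world, [target_id, :components, :status, :health], &(&1 - damage))
    end)
  end

  # Entities at or below 0 health would be handed to a death system next.
  def needs_death_check(entities) do
    for {id, e} <- entities, e.components.status.health <= 0, do: id
  end
end

world = %{
  alice:
    FightSketch.new_entity(:alice, %{
      social: %{name: "Alice"},
      status: %{health: 3},
      fighting: %{target: :bob}
    }),
  bob:
    FightSketch.new_entity(:bob, %{
      social: %{name: "Bob"},
      status: %{health: 804},
      fighting: %{target: :alice}
    })
}

# Fixed damage values matching the story: Alice hits for 0, Bob hits for 34.
damage = fn
  :alice, :bob -> 0
  :bob, :alice -> 34
end

after_round = FightSketch.fighting_round(world, damage)
IO.inspect(FightSketch.needs_death_check(after_round))
# => [:alice]
```

Even in a sketch this small, the granularity questions show up: whether health belongs in a status component, and whether the fighting system or a separate one decides what death means.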

One of the things that we might not realize upon reading this is just how many design decisions were packed into this paragraph. Let’s go through a couple of the design decisions that were made during the creation of Ecstatic’s 0.1 release and how they were further constrained by the implementation in Elixir.

What’s Elixir? The TLDR

Elixir is a language built on top of the Erlang virtual machine. Erlang’s prime differentiator is that it was built to run distributed code. As such, it uses its own “processes”, which are significantly more lightweight than UN*X processes. It also has immutable data, and uses what may be the purest form of message-passing that has yet been implemented (for inter-process communication). Creating an Erlang process can be thought of as saying “Hey, virtual machine, when you are allocating the resources for code execution, allocate some separate resources for this code”. When the code is done running, the process dies.
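As a small illustration of processes and message passing, the sketch below spawns a process that waits for a single message, replies to the sender, and then dies because its function has returned. The message shapes ({:greet, ...} and {:greeting, ...}) are made up for this example.

```elixir
parent = self()

# Spawn a lightweight process that handles exactly one message.
pid =
  spawn(fn ->
    receive do
      {:greet, from, name} -> send(from, {:greeting, "Hello, " <> name})
    end

    # The function returns here, so the process ends.
  end)

send(pid, {:greet, parent, "Alice"})

# Wait for the reply, giving up after one second.
reply =
  receive do
    {:greeting, text} -> text
  after
    1_000 -> :timeout
  end

IO.puts(reply)
```

Nothing is shared between the two processes: the only way data crosses the boundary is as a copied message.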

One of Elixir’s additional advantages is that it has real Lisp-style macros. This means it is possible to define code to generate additional code, and to create your own Domain-Specific Languages (DSL) with relative ease.

The first design decisions

Since one of the key elements of Elixir code is code running in separate processes, I always knew this would be a major differentiator in the way I created the framework.

I also decided very early on that I would try and stick as closely as possible to the pure definition for each element of ECS that I described above.

And finally, since I had no idea what I was doing (something that probably hasn’t changed much), a lot of my inspiration for how systems would work came from reading the code for the Artemis-ODB project, an ECS in Java.

This left me with a number of unsolved questions:

The building blocks

One thing I realized very quickly was that I didn’t even know what the API footprint of the framework would be. I knew I wanted to make the footprint as small as possible, because not only will boilerplate become an obstacle to adoption, but… I will be using macros! Any boilerplate, I should be able to write for the user.

So, here is what turned out to be the elements we would start with:

The Watchers are something I was very proud of because they allowed the systems to be, as much as possible, black boxes. There was no “configuring” the system, there was simply “running” the system.

In such a way, I was building a very neat self-contained framework, forgetting about one crucial element: events triggered by actions taken outside the system. We’ll get to that.

Connecting the pieces

Conversations in the MUD Coders Slack led me to decide that I would try a unified log; I had already more or less decided that I wanted to return a set of changes from the systems, as this would allow me to trigger actions based on changes in value (say, if your health decreases by 2/3 of its max value in one change, you might want to start panicking).
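The value of returning changes rather than final values can be sketched as follows. This is an illustration under assumed names (ChangeSketch, the :old and :new keys, the health and max_health fields), not the shape Ecstatic actually uses: because a change carries both the previous and the new state, a reaction such as panicking can depend on the size of the drop.

```elixir
defmodule ChangeSketch do
  # Hypothetical shape: a change carries both the old and the new component state.
  def health_change(old_state, new_state), do: %{old: old_state, new: new_state}

  # True when a single change removed at least 2/3 of max health.
  # Integer arithmetic avoids floating-point comparison.
  def panic?(%{old: old, new: new}) do
    (old.health - new.health) * 3 >= old.max_health * 2
  end
end

big_hit = ChangeSketch.health_change(%{health: 30, max_health: 30}, %{health: 8, max_health: 30})
scratch = ChangeSketch.health_change(%{health: 30, max_health: 30}, %{health: 20, max_health: 30})

IO.inspect({ChangeSketch.panic?(big_hit), ChangeSketch.panic?(scratch)})
# => {true, false}
```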

This is a fairly significant turning point in the implementation, and I can’t even justify it by saying I weighed the implementation trade-offs; I just thought I would learn a lot more doing it this way than some other way, so that’s the way I went.

Now that I knew I had a unified log, I needed to both feed it and consume it. This is where GenStage comes in. GenStage is an Elixir library that provides an abstraction of a producer-consumer system with backpressure (GenStage: “Generic Stage”, for multi-stage processing), and one of the event dispatchers that comes built-in is a broadcast dispatcher: it will send each event to every consumer, and it will do so only when all consumers have requested an event (that is, when they are all ready to do work). This, along with setting each consumer to only ask for one event at a time, guarantees that the world will not fall out of sync: each event gets processed by the entire world before moving on to the next event.
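The lockstep guarantee can be illustrated without GenStage. The sketch below deliberately uses plain processes rather than the library, so that it runs with nothing but the standard library; the module name and message shapes are assumptions. It shows the same idea: every consumer gets each event, and the next event is not sent until all consumers have acknowledged the current one.

```elixir
defmodule LockstepSketch do
  # Plain-process illustration of the lockstep idea; this is NOT GenStage.
  # Each consumer handles one event at a time, then acknowledges it.
  def start_consumer(name, report_to) do
    spawn(fn -> consumer_loop(name, report_to) end)
  end

  defp consumer_loop(name, report_to) do
    receive do
      {:event, from, event} ->
        send(report_to, {:handled, name, event})
        send(from, {:ack, self()})
        consumer_loop(name, report_to)
    end
  end

  # Send each event to every consumer, and wait for all acks before the next one.
  def broadcast(events, consumers) do
    Enum.each(events, fn event ->
      Enum.each(consumers, &send(&1, {:event, self(), event}))

      Enum.each(consumers, fn pid ->
        receive do
          {:ack, ^pid} -> :ok
        end
      end)
    end)
  end
end

me = self()
consumers = [LockstepSketch.start_consumer(:a, me), LockstepSketch.start_consumer(:b, me)]
LockstepSketch.broadcast([:e1, :e2], consumers)

# Collect the four "handled" reports in arrival order.
log =
  for _ <- 1..4 do
    receive do
      {:handled, name, event} -> {name, event}
    end
  end

IO.inspect(Enum.map(log, &elem(&1, 1)))
# => [:e1, :e1, :e2, :e2]
```

Both consumers finish :e1 before either sees :e2, which is the property the broadcast dispatcher plus a demand of one event at a time is meant to give.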

It’s worth noting that I have done nothing that even remotely resembles performance testing or benchmarking here, so YMMV.

At this point, I knew that the systems needed to actually emit events, and therefore that the actual changes would happen somewhere else.

Peppering in Erlang processes

It would likely be more standard to have each System be its own process (or be run in its own thread, etc.), and this would have the advantage of keeping the growth of Erlang processes at O(1), but this would then lead me down a different set of decisions: I would eventually (and maybe sooner rather than later) have to start to figure out how to split the workload, and maybe process subsets of entities in each system. I am not yet interested in these decisions, and I would rather see how far I can push the decision to let the Erlang scheduler decide how to distribute the workload across available CPU cycles through liberal application of processes.

This all led me to an interesting choice: each entity in the ECS will have its own process, the only responsibility of which will be to take the incoming events that affect its matched entity and apply the given changes. A benefit of this, I think, is that I’m actually using a process to do work instead of just holding state.
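A process per entity whose only job is to apply changes might look roughly like the GenServer below. It is a sketch under assumptions: the module name, the map-based component storage, and the :attached, :updated and :removed keys are invented for illustration and are simpler than whatever Ecstatic does internally.

```elixir
defmodule EntitySketch do
  use GenServer
  # Hypothetical sketch: one process per entity, whose only job is applying changes.

  def start_link(components), do: GenServer.start_link(__MODULE__, components)
  def apply_changes(pid, changes), do: GenServer.cast(pid, {:apply, changes})
  def components(pid), do: GenServer.call(pid, :components)

  @impl true
  def init(components), do: {:ok, components}

  # The single place where the entity's components actually change.
  @impl true
  def handle_cast({:apply, changes}, components) do
    components =
      components
      |> Map.drop(Map.get(changes, :removed, []))
      |> Map.merge(Map.get(changes, :attached, %{}))
      |> Map.merge(Map.get(changes, :updated, %{}))

    {:noreply, components}
  end

  @impl true
  def handle_call(:components, _from, components), do: {:reply, components, components}
end

{:ok, dwarf} = EntitySketch.start_link(%{age: %{age: 80}, mortal: %{mortal: true}})
EntitySketch.apply_changes(dwarf, %{removed: [:age], attached: %{dead: %{}}})
IO.inspect(EntitySketch.components(dwarf))
```

Because the cast and the call come from the same caller, the call is handled after the changes have been applied.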

Summarizing design choices

The main benefit of all these choices is that logic around entities _actually_ changing is completely encapsulated within a single function, and all paths that lead to that function are squeezed through the unified log.

At this point, what we have is the ability to trigger systems based on components being added, removed, or changed.

What we don’t have is the ability to trigger systems based on external actions… Including, say, the player wanting to move, or, even simpler, time ticks!

My first hack, shame be unto me

A tick is just something that happens regularly. In most programming languages, the way to do this is something like “sleep for a while, send a message, sleep some more”, probably in a separate thread. Maybe even a completely different UNIX process (cronjobs, queues, etc.). Within Erlang/Elixir, there are a few ways to do it, and the most canonical one is an Erlang stdlib module called :timer that has a function called send_interval. It sends a given message to a given process every X milliseconds.

This sounds perfect, right? There’s only one downside: every time you call this function, it creates a new process responsible for sending the given message. Thinking ahead, since any one entity may have multiple components, and any given component may trigger a tick, this might mean an O(mn) growth for the number of processes, where m is the number of components and n is the number of entities, instead of the current approach which is O(n). In a world where many entities exist, this might seriously limit how much the game can grow on a single machine… And I’m considering a world where the number of entities can grow dynamically (because of reproduction), so that’s a concern.

The good news is that there’s a function called Process.send_after which will send a message to a given process after some time interval. I can simply make sure that whatever handles the received message also calls Process.send_after again. It looks like this:

Process.send_after(destination, message, time_until_message_is_sent, options)

This gives me two relatively simple additional choices: I could have one “god tick process”, which is responsible for all ticks, though that choice might also offset some of the messages more than I want (if many messages should trigger around the same millisecond, will that turn out to be a problem?), or I could have the entity process manage its own ticks, which severely limits the number of simultaneous ticks that a given process has to handle. Either way, I’m still at O(n).

I went with the second choice. And I realized I had some additional difficulties: how would I get the initial message to start a tick to that process?

So, I cheated. The current implementation of Watchers takes a lambda as a predicate function, so for ticks, I decided that when a component gets attached, its predicate function would run the Process.send_after code, because the predicate function gets run by the entity’s event consumer.

The message I send has enough information to determine which system to run and how long the delay before the next message is.

When I detach a component, I also use the predicate function to send the consumer process a message to stop the given tick; when it receives the next tick message, it will know not to run the system and to not queue another message.
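The self-rescheduling tick and the stop-on-next-message behavior described above can be sketched with a GenServer. This is an illustration, not the code from Ecstatic: the module name, the message shapes, and the MapSet of active systems are assumptions. The tick message carries the system to run and the delay until the next tick, and a stopped tick is simply not re-armed when its next message arrives.

```elixir
defmodule TickSketch do
  use GenServer
  # Hypothetical sketch of an entity process managing its own ticks.

  def start_link(report_to), do: GenServer.start_link(__MODULE__, report_to)
  def start_tick(pid, system, interval), do: GenServer.cast(pid, {:start_tick, system, interval})
  def stop_tick(pid, system), do: GenServer.cast(pid, {:stop_tick, system})

  @impl true
  def init(report_to), do: {:ok, %{report_to: report_to, active: MapSet.new()}}

  @impl true
  def handle_cast({:start_tick, system, interval}, state) do
    # Arm the first tick; the message says which system to run and how long to wait.
    Process.send_after(self(), {:tick, system, interval}, interval)
    {:noreply, %{state | active: MapSet.put(state.active, system)}}
  end

  def handle_cast({:stop_tick, system}, state) do
    # Only record the stop; the pending tick message is ignored when it arrives.
    {:noreply, %{state | active: MapSet.delete(state.active, system)}}
  end

  @impl true
  def handle_info({:tick, system, interval}, state) do
    if MapSet.member?(state.active, system) do
      # Stand-in for "run the system": report to the observer, then re-arm.
      send(state.report_to, {:ran, system})
      Process.send_after(self(), {:tick, system, interval}, interval)
    end

    {:noreply, state}
  end
end

{:ok, pid} = TickSketch.start_link(self())
TickSketch.start_tick(pid, :aging, 20)

ticks =
  for _ <- 1..3 do
    receive do
      {:ran, system} -> system
    after
      1_000 -> :timeout
    end
  end

TickSketch.stop_tick(pid, :aging)
IO.inspect(ticks)
```

One limitation of this sketch: restarting a tick while an old tick message is still pending would leave two tick chains running, so a fuller version would need to tag or cancel the old timer.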

Say it with code!

The following is code taken from my project, Dwarlixir, which was started under the laughably pretentious premise of being a mix between a MUD and Dwarf Fortress (it’s unclear how I thought this might be playable).

defmodule Dwarlixir.Components.Age do
    use Ecstatic.Component
    @default_value %{age: 1, life_expectancy: 80}
end

defmodule Dwarlixir.Components.Mortal do
    use Ecstatic.Component
    @default_value %{mortal: true}
end

This is fairly simple: we’re just looking at two components as described above, with a default value. Since we are doing use Ecstatic.Component above, we can call Dwarlixir.Components.Age.new — the function is provided for us.
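For readers curious how a use line can end up providing a new function, here is a minimal sketch of the general macro technique. It is not Ecstatic’s implementation: SketchComponent, SketchAge, and the shape of the returned map are all assumptions for illustration. The `__before_compile__` hook runs after the module body, so the @default_value attribute is already set when the function is generated.

```elixir
defmodule SketchComponent do
  # Hypothetical stand-in for a `use`-able component module.
  defmacro __using__(_opts) do
    quote do
      @before_compile SketchComponent
    end
  end

  # Runs after the using module's body, so @default_value is already set.
  defmacro __before_compile__(_env) do
    quote do
      def new, do: %{type: __MODULE__, state: @default_value}
    end
  end
end

defmodule SketchAge do
  use SketchComponent
  @default_value %{age: 1, life_expectancy: 80}
end

IO.inspect(SketchAge.new())
```

The module that calls use writes no function at all; the macro writes it on the user’s behalf, which is how boilerplate can be kept out of client code.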

defmodule Dwarlixir.Mobs.Dwarf do
    use Ecstatic.Entity
    alias Dwarlixir.Components, as: C
    @default_components [C.Age, C.Mortal]
end

And here we have a dwarf. When you initialize the dwarf, some components are marked as being set by default. This is almost “a factory”: it’s a convenience to create new entities quickly from given presets.

Now let’s connect the dots from the other side, starting with the watchers:

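defmodule Dwarlixir.Watchers do
    use Ecstatic.Watcher
    alias Dwarlixir.Components, as: C
    alias Dwarlixir.Systems, as: S

    # When Age is attached, run the Aging system every six seconds (6_000 ms).
    watch_component C.Age, run: S.Aging, every: 6_000

    # When Age is updated and the predicate holds, run DyingOfOldAge.
    watch_component C.Age,
        run: S.DyingOfOldAge,
        when: fn(_e, c) -> c.state.age > c.state.life_expectancy end
end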

This should be fairly readable by now. When the Age component is attached, run the Aging system every six seconds, and when the Age component has been updated and the age is greater than the life expectancy, trigger the DyingOfOldAge system.

defmodule Dwarlixir.Systems.Aging do
    use Ecstatic.System
    alias Dwarlixir.Components, as: C
    def aspect, do: %Ecstatic.Aspect{with: [C.Age]}
    def dispatch(entity) do
        age = Entity.find_component(entity, C.Age)
        %Ecstatic.Changes{updated: [%{age | state: %{age.state | age: age.state.age + 1}}]}
    end
end

The Aging system will only run on entities that have the Age component, and on the last line you can see the one abstraction leak I currently still have: the actual values of the component are stored under a key called “state”, and the client code should have no knowledge of this, but right now does.
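One possible way to hide that key, sketched here purely as an assumption and not as something Ecstatic provides, is a small helper that takes a function over the component’s values. The StateSketch module and the map standing in for a component are made up for this example.

```elixir
defmodule StateSketch do
  # Hypothetical helper: callers pass a function over the values
  # and never name the :state key themselves.
  def update_state(component, fun), do: %{component | state: fun.(component.state)}
end

# A plain map standing in for an Age component.
age = %{type: :age, state: %{age: 41, life_expectancy: 80}}
older = StateSketch.update_state(age, fn values -> %{values | age: values.age + 1} end)

IO.inspect(older.state.age)
# => 42
```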

defmodule Dwarlixir.Systems.DyingOfOldAge do
    use Ecstatic.System
    alias Dwarlixir.Components, as: C
    def aspect, do: %Ecstatic.Aspect{with: [C.Age, C.Mortal]}
    # The entity itself is not inspected here, hence the underscore prefix.
    def dispatch(_entity) do
        %Ecstatic.Changes{attached: [C.Dead], removed: [C.Age]}
    end
end

The DyingOfOldAge system will only run if the entity has both the Age and Mortal components (immortal entities certainly won’t die of old age).

And what happens when an entity “dies” (receives the Dead component)? Oh, I haven’t figured that out yet, so… There’s no code for that. Hopefully, though, you get a sense of how the ECS system creates disconnected pieces that combine to create a powerful effect.

Conclusion

There are a number of warts in this 0.1 version that I will work on and tweak; the API footprint will change and get much cleaner. Some logic is sort of duplicated, some logic feels bolted on.

Nonetheless, this _works_ and I am proud of this accomplishment. If you decide to try out ecstatic, I’d be delighted to hear about your experiences with it, as well as any and all feedback you have!

Acknowledgements

Just about everyone in the MUD Coders’ Slack space provided inspiration, encouragement, guidance, advice, and support while I got started on this path. Without them, this post — as well as ecstatic — would never have seen the light of day. The folks I know I need to thank, in no particular order, are:

Outside the MUD Coders, I need to thank:

For everyone else who helped, and whom I didn’t list: thank you! I’m sorry I forgot you in this list.