6
votes

I'm considering solving a problem using Elixir, mainly because of the ability to spawn large numbers of processes cheaply.

In my scenario, I'd want to create several "original" processes, which load specific, immutable data into memory, then make copies of those processes as needed. The copies would all use the same base data, but do different, read-only tasks with it. For example, imagine that one "original" has the text of "War and Peace" in memory, and each copy of that original does a different kind of analysis on the text.

My questions:

  • Is it possible to copy an existing process, memory contents and all, in Elixir / the Erlang VM?
  • If so, does each copy consume as much memory as the original, or can they share memory, as Unix processes do with the "copy on write" strategy? (And in this case, there would be no subsequent writes.)
3
Processes don't share data, or rather they share only a few specific types of data. Large binaries are shared. - rvirding
I think I may have asked the wrong question. Looking back, I really wanted to ask "can I load or build some read-only data, spawn multiple processes, and have them all use it?" The answer is yes. For example, the data could be stored in an Elixir module attribute or function body during compilation using a macro, and spawned processes could call the function or access the attribute at runtime. - Nathan Long

3 Answers

6
votes

There is no built-in way to copy processes. The easiest way to do it is to start the "original" process and the "copies", then send all the relevant data in messages to the copies. Processes don't share data, so there is no more efficient way of doing it. Putting the data in ETS tables only partially helps with sharing, because data in an ETS table is copied to the process when it is read. However, with ETS you don't need to hold all the data in the process heap.
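As a rough sketch of the ETS route (the table name, the key, and the two tasks are made up for illustration), one process can own the table and the workers can read from it. Each `:ets.lookup/2` copies the stored term into the reading process's heap, so a worker only pays for what it actually looks up:

```elixir
# Hypothetical example: the owner process creates the table and loads the data.
# :protected means other processes may read it but only the owner may write.
table = :ets.new(:texts, [:set, :protected, read_concurrency: true])
:ets.insert(table, {:opening, ["Well", "Prince", "so", "Genoa", "and", "Lucca"]})

parent = self()

# Each worker does a different read-only analysis on the same stored data.
for task <- [:count, :longest] do
  spawn(fn ->
    # The lookup copies the stored tuple into this worker's own heap.
    [{:opening, words}] = :ets.lookup(table, :opening)

    result =
      case task do
        :count -> length(words)
        :longest -> Enum.max_by(words, &String.length/1)
      end

    send(parent, {task, result})
  end)
end

# Selective receive, so the order in which the workers finish doesn't matter.
count =
  receive do
    {:count, n} -> n
  end

longest =
  receive do
    {:longest, word} -> word
  end
```

The table disappears when its owner process exits, so the "original" has to stay alive for as long as the workers need the data.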

5
votes

An Erlang process has no process-specific data apart from what's stored in variables (and the process dictionary), so to make a copy of the memory of a process, just spawn a new process passing all relevant variables as arguments to the function.

In general, memory is not shared between processes; everything is copied. The exceptions are ETS tables (though data is copied from ETS tables when processes read it), and binaries larger than 64 bytes. If you store "War and Peace" in a binary, and send it to each worker process (or pass it along when you spawn those worker processes), then the processes would share the memory, only copying it if they wanted to modify the binary. See the chapter on binaries in the Erlang efficiency guide for more details.
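A minimal sketch of that idea, assuming the text is held as a single large binary (the sample string and the two analyses are placeholders, not anything from the question):

```elixir
# Stand-in for the real text: about 1 MB, far above the 64-byte threshold,
# so it is stored off-heap as a reference-counted binary.
text = String.duplicate("Well, Prince... ", 65_536)

parent = self()

# Hypothetical analyses; each one only reads the text.
analyses = %{
  bytes: &byte_size/1,
  words: fn t -> t |> String.split() |> length() end
}

for {name, analysis} <- analyses do
  spawn(fn ->
    # `text` is captured by the closure. For a large binary, only a small
    # reference is copied to the new process, not the data itself.
    send(parent, {name, analysis.(text)})
  end)
end

# Collect one result per analysis, in whatever order they arrive.
results =
  for _ <- analyses, into: %{} do
    receive do
      {name, value} -> {name, value}
    end
  end
```

Note that the sharing applies to the binary itself. Anything a worker builds from it, such as the list produced by `String.split/1`, lives on that worker's own heap.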

1
votes

You are thinking of Erlang/Elixir processes as similar to Unix processes. They aren't at all. I really wish they had a different name, because they are neither threads nor processes in the standard Unix sense. It took me some time to wrap my head around the differences.

You have to throw out all your preconceived ideas about processes; they are all wrong. Eprocesses have the following characteristics:

  • They are cheap and fast. Use lots; there are always more.

  • They share no resources[1]. (Even writing to stdout is a message to another Eprocess.)

  • IPC (that is, message passing) is very fast, with relatively low overhead compared to standard Unix IPC.

What I would try in your case is to create a server that manages the data and have each analysis worker message the server for the data chunks that it needs. It's perfectly acceptable to have an Eprocess be more or less a manager of shared memory.
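One way this could look, as a sketch only: the module name `TextServer`, its `chunk/3` function, and the byte-offset interface are all invented here for illustration.

```elixir
defmodule TextServer do
  use GenServer

  # Hypothetical client API: start the server with the full text as its state.
  def start_link(text), do: GenServer.start_link(__MODULE__, text)

  # Ask the server for `len` bytes starting at byte `offset`.
  def chunk(server, offset, len), do: GenServer.call(server, {:chunk, offset, len})

  @impl true
  def init(text), do: {:ok, text}

  @impl true
  def handle_call({:chunk, offset, len}, _from, text) do
    # binary_part/3 works on bytes and raises if the range is out of bounds.
    {:reply, binary_part(text, offset, len), text}
  end
end

{:ok, server} =
  TextServer.start_link(
    "Well, Prince, so Genoa and Lucca are now just family estates of the Buonapartes."
  )

# A worker process asks the server only for the part it needs.
worker = Task.async(fn -> TextServer.chunk(server, 6, 6) end)
chunk = Task.await(worker)
```

Because the server handles one call at a time, it can become a bottleneck if many workers request chunks constantly. Offsets here are in bytes, so a chunk boundary can fall in the middle of a multi-byte UTF-8 character.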

To me the most useful way to think of Eprocesses is as objects with their own thread of execution.

[1] Well, there are ETS tables, but it's best to think of Eprocesses as not sharing resources until you absolutely have to.