I have heard this concept used frequently, but I don't have a really good grasp of what it is.
10 Answers
I beg to differ, Wikipedia is pretty clear on this.
In computer science, marshalling (similar to serialization) is the process of transforming the memory representation of an object to a data format suitable for storage or transmission. It is typically used when data must be moved between different parts of a computer program or from one program to another.
Marshalling is the process of transforming the memory representation of an object to a data format that could be stored or transmitted. It's also called serialization (although it could be different in certain contexts). The memory representation of the object could be stored as binary or XML or any format suitable for storage and/or transmission in a way that allows you to unmarshal it and get the original object back.
For an example of usage, if you have an online game with client and server components and you want to send the player object, containing player stats and world coordinates, from the client to the server (or the other way around), you can simply marshal it at the client, send it over the network, and unmarshal it at the other end. To the server it then appears as if the object had been created on the server itself. Here's a Ruby example:
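# Player is not defined in the original snippet; this Struct is a stand-in
Player = Struct.new(:name, :x, :y)
srcplayer = Player.new("alice", 10, 20)
# marshal (store it as a binary string)
str = Marshal.dump(srcplayer)
# unmarshal (get it back)
destplayer = Marshal.load(str)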
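To make the "send it over the network" step a bit more concrete, here is a minimal sketch in which an in-process pipe stands in for the network connection. The GamePlayer struct and the send_through_pipe helper are made-up names for illustration, not part of any library.

```ruby
# Hypothetical player type; a Struct keeps the sketch short.
GamePlayer = Struct.new(:name, :health, :x, :y)

# Marshal obj on the "client" side and unmarshal it on the "server" side.
# An in-process pipe stands in for the real network connection.
def send_through_pipe(obj)
  reader, writer = IO.pipe
  Marshal.dump(obj, writer)   # client: write the marshalled bytes
  writer.close                # flush and signal end of transmission
  Marshal.load(reader)        # server: rebuild the object from the bytes
ensure
  # Closing an already closed IO is a no-op, so this is safe on every path.
  [reader, writer].each { |io| io&.close }
end

src  = GamePlayer.new("alice", 100, 3.5, -2.0)
dest = send_through_pipe(src)
puts dest == src       # same field values
puts dest.equal?(src)  # but a distinct object
```

The object that comes out of the pipe is an equal copy, not the same object. With a real socket the shape would be the same, though Marshal.load should only ever be fed data from a source you trust.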
I refined my Google search to "data marshalling", and the first hit was a site called Webopedia, which is pretty good. The gist is that you transform data back and forth to a form suitable for things like transmission over a network. The problem it solves is that you can't really transmit data over a network in the form a program uses it in memory. You have to solve a number of issues, including the endianness of the data, how you store complex data types like strings, etc.
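The endianness and string issues mentioned above can be sketched with Ruby's Array#pack and String#unpack1. The helper names and the length-prefixed string layout are assumptions chosen for illustration; real protocols define their own wire formats.

```ruby
# Encode a 32-bit unsigned integer in network (big-endian) byte order.
def encode_u32(n)
  [n].pack("N")
end

# Encode a string as a 4-byte big-endian length followed by its raw bytes.
# (Hypothetical layout; one common way to put a string on the wire.)
def encode_string(s)
  bytes = s.b                      # binary copy of the string's bytes
  [bytes.bytesize].pack("N") + bytes
end

# Reverse of encode_string: read the length, then that many bytes.
def decode_string(data)
  len = data.unpack1("N")
  data.byteslice(4, len).force_encoding(Encoding::UTF_8)
end

p encode_u32(1)        # big-endian: most significant byte first
p [1].pack("V")        # little-endian: least significant byte first
p decode_string(encode_string("h\u00e9llo"))
```

Both sides only have to agree on the byte order and the length prefix; neither needs to know how the other stores integers or strings in memory.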
Marshalling doesn't only solve network transmission problems. It also covers moving data from one architecture to another, between different languages (especially ones that run on virtual machines), and other "translation" problems.
Marshalling is the process of transferring data across application boundaries or between different data formats. It is very common: writing data to disk or to a database is technically marshalling. However, the term tends to be used for data conversion for "foreign" APIs or for interprocess communication.
For example, in .NET, communicating between managed and unmanaged code (such as accessing certain Win32 APIs) will likely require marshalling to convert back and forth between managed C# objects and C/C++-style objects (structs, handles, output buffers, etc.). The documentation for the static Marshal class might be helpful.
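This is not .NET code, but the same idea of converting a high-level value into a C-style struct layout can be sketched in Ruby with pack. The score_entry struct, its 16-byte layout (little-endian, with the double aligned to 8 bytes, as on typical x86-64 ABIs), and the helper names are all assumptions for illustration.

```ruby
# Hypothetical C declaration this sketch targets:
#
#   struct score_entry { int32_t id; double score; };   /* 16 bytes */
#
# Assumed layout: little-endian, double aligned to 8 bytes.
SCORE_ENTRY_FORMAT = "l<x4E"  # int32 LE, 4 padding bytes, float64 LE

# Marshal Ruby values into the byte layout a C function would expect.
def to_c_struct(id, score)
  [id, score].pack(SCORE_ENTRY_FORMAT)
end

# Unmarshal a buffer filled in by native code back into Ruby values.
def from_c_struct(bytes)
  bytes.unpack(SCORE_ENTRY_FORMAT)
end

buf = to_c_struct(7, 1.5)
puts buf.bytesize
p from_c_struct(buf)
```

An interop layer such as P/Invoke does this kind of layout work automatically, based on the declared types.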
Basically it's a term for generically transforming an object (or similar) into another representation that can, for example, be sent over the wire or stored to disk (typically a string or a binary stream). The opposite, unmarshalling, describes reading the marshalled representation and re-creating the object or whatever in-memory structure existed earlier.
Another current everyday example is JSON.
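As a rough sketch of the JSON case, using Ruby's standard json library: the PlayerStats struct and the two helper names are hypothetical, chosen only to show the round trip.

```ruby
require "json"

# Hypothetical in-memory structure to marshal.
PlayerStats = Struct.new(:name, :level, :x, :y)

# Marshal: turn the object into a JSON string.
def marshal_json(stats)
  JSON.generate(stats.to_h)
end

# Unmarshal: parse the JSON string and rebuild the object.
def unmarshal_json(text)
  fields = JSON.parse(text, symbolize_names: true)
  PlayerStats.new(*fields.values_at(*PlayerStats.members))
end

stats = PlayerStats.new("alice", 7, 12.5, -3.0)
text  = marshal_json(stats)
puts text
puts unmarshal_json(text) == stats
```

Unlike Ruby's Marshal format, the JSON text carries no class information, so the receiving side has to know which type to rebuild.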
Marshalling is the conversion of call parameters that has to happen when a call crosses an ABI boundary. The boundary may be between a COM client and a COM server, where the COM library marshals the types of the client's ABI to the ABI of the COM binary. In COM, marshalling can also refer to the parameter conversion required when crossing an apartment boundary within the same process: the parameters are converted to the format of a message that is sent to the owning thread's message queue, where the COM window procedure handles and unmarshals it. When the call crosses a process boundary, there is the additional step of a COM proxy marshalling to RPC/LPC, i.e. an LPC message to an LPC port. Alternatively, the boundary may be between high-level code executing in a virtual environment and the native code that implements or sets up that environment. There, a conversion takes place between the ABI of the high-level code (which is itself implemented in an ABI of the native language) and a typical ABI of the native language for those types.
One example of the second case is Mono .NET. You can call managed code (high-level language code that is managed and run by the virtual machine library and represented by internal objects and structures) from unmanaged, native code (C++, which isn't managed by the virtual machine library but is linked against it). You can also make native calls from C# into unmanaged C++ code, based on internal bindings that the C++ code registers when it sets up the virtual machine through the virtual machine library API. For instance, System.String in C# is internally represented by a MonoString. MonoString is a C++ object that uses the C++ ABI, but not in the standard way in which native code expects a string parameter to be represented: the VM library has logically implemented its own ABI using a particular arrangement of the C++ ABI, so the string is boxed in a C++ object of type MonoString* instead of being a const wchar_t*. Passing a System.String to a native call from C# using P/Invoke (which performs automatic marshalling) causes a const wchar_t* to be passed to the native function. With internal calls, however, it is passed as a MonoString*, which the C++ function has to marshal itself, and it must then marshal whatever it returns back to a type of the VM's logical ABI. Only blittable types don't need to be marshalled when using internal calls; for instance, int, which is a System.Int32, is passed as a gint32, which is just an int.
Another example is the SpiderMonkey JS engine, which marshals between the C++ native type HTMLElement and an internal runtime representation, JSObject, which represents the HTMLElement type in JavaScript.