4
votes

Since syscall emulation is much easier to setup, I'm wondering what are the advantages of using the full system emulation when running an userland program.

Or in other words, what interesting aspects are modeled in the full system but not syscall emulation mode, and when are they significant?

It is mentioned in the docs at: http://gem5.org/Splash_benchmarks that full system is

Realistic: you're getting the actual Linux thread scheduler to schedule your threads

Is this the only advantage, or are there any other advantage for users that are optimizing their applications or investigating micro-architecture?

I also suspect that the MMU simulation is another important feature that is only modeled properly in full system mode, and could affect program performance.

2

2 Answers

3
votes

Full system mode should be preferred (when it is possible to use it). There are benefits to using it, primarily fidelity in the simulation which is not possible with system call emulation mode. (The kernel interactions with an application can be important depending on the study that a researcher is trying to conduct.) Also, the user does not need to worry about implementing (or debugging) the system call implementation.

With that said, system call emulation mode can be useful under the right conditions. It is faster to run application code because there is no kernel running in the background. There is also no system noise if you want to mitigate it entirely. Arguably, it is easier to bootstrap a new device model as well. You can work on the model without driver support and make magic happen though fake interfaces. (It saves you having to model the bare-metal interface perfectly or having to write your own device driver.)

Your comments about dynamic linking and multi-threading support are related. If dynamic linking is fixed, you should be able to use your system's pthreads library and can forget about linking with m5threads entirely. The pthread library support has existed in the simulator for a while now (the system calls necessary for it to work properly).

However, there's a caveat to the threading implementation. You need to preallocate enough thread contexts at the start of simulation (by invoking with the -n option on the se.py script).

To elaborate, there is no operating system running in the background to schedule threads on the processors. (I use the terms threads and processors very loosely here.) To obviate the scheduling problem, you have to preallocate enough processors so that the threads can be created on calls to clone/execve. There is a constraint that you can never have more threads than processors (unlike a real system where the operating system can schedule them as it pleases).

The configuration scripts probably do not behave how a researcher would want them to behave for a multi-threaded workload. The researcher would need to verify that the caches were configured correctly and that they are sharing certain cache levels like a real machine would do. If the application calls clone/execve many times, it may not be possible to cause the generated configuration to behave realistically.

Your last statement about modeling accelerators is incorrect. The AMD GFX8 model does use system call emulation mode. (Also, we developed a NIC model which was never publicly released.) It involves creating a fake driver and manipulating it through the same ioctl interfaces that a real driver would use. Linux treats everything like a file so the driver is opened through the open system call interface and you can capture it there. There are other things which you might need to do (like map mmio ranges in the configuration), but the driver interface is the main piece. The application interacts with the driver and the driver interacts with the accelerator model.

3
votes

Advantages of SE:

Disadvantages of SE:

So my recommendation is:

  • try SE first. If it works, great. If it doesn't, try to fix it quickly, since most problems are trivial. Having the SE setup will save you a lot of time over full system, and it is often representative enough.

  • otherwise, use FS mode. It is just simpler to setup, more representative, and the performance hit is acceptable for most.

    You could also use SE first, and then go to FS to further validate only your most important SE results, since FS is slower and you can therefore validate less different setups.