As far as I understand Java compiles to Java bytecode, which can then be interpreted by any machine running Java for its specific CPU. Java uses JIT to interpret the bytecode, and I know it's gotten really fast at doing so, but why doesn't/didn't the language designers just statically compile down to machine instructions once it detects the particular machine it's running on? Is the bytecode interpreted every single pass through the code?
6 Answers
The original design was in the premise of "compile once run anywhere". So every implementer of the virtual machine can run the bytecodes generated by a compiler.
In the book Masterminds for Programming, James Gosling explained:
James: Exactly. These days we’re beating the really good C and C++ compilers pretty much always. When you go to the dynamic compiler, you get two advantages when the compiler’s running right at the last moment. One is you know exactly what chipset you’re running on. So many times when people are compiling a piece of C code, they have to compile it to run on kind of the generic x86 architecture. Almost none of the binaries you get are particularly well tuned for any of them. You download the latest copy of Mozilla,and it’ll run on pretty much any Intel architecture CPU. There’s pretty much one Linux binary. It’s pretty generic, and it’s compiled with GCC, which is not a very good C compiler.
When HotSpot runs, it knows exactly what chipset you’re running on. It knows exactly how the cache works. It knows exactly how the memory hierarchy works. It knows exactly how all the pipeline interlocks work in the CPU. It knows what instruction set extensions this chip has got. It optimizes for precisely what machine you’re on. Then the other half of it is that it actually sees the application as it’s running. It’s able to have statistics that know which things are important. It’s able to inline things that a C compiler could never do. The kind of stuff that gets inlined in the Java world is pretty amazing. Then you tack onto that the way the storage management works with the modern garbage collectors. With a modern garbage collector, storage allocation is extremely fast.
Static compiling can blow up on you because all the other libraries you use also need to be write-once run everywhere (i.e. byte-code), including all of their dependencies. This can lead to a chain of compilations following dependencies that can blow up on you. Compiling only the code as (while running) the runtime discovers it actually needs that section of code compiled is the general idea I think. There may be many code paths you don't actually follow, especially when libraries come into question.
One important difference of dynamic compiling is that it optimises the code base don how it is run. There is an option -XX:CompileThreshold=
which is 10000 by default. You can decrease this so it optimises the code sooner, but if you run a complex application or benchmark, you can find that reducing this number can result in slower code. If you run a simple benchmark, you may not find it makes any difference.
One example where dynamic compiling has an advantage over static compiling is inlining "virtual" methods, esp those which can be replaced. For example, the JVM can inline up to two heavily used "virtual" methods, which may be in a separate jar compiled after the caller was compiled. The called jar(s) can even be removed from the running system e.g. OSGi and have another jar added or replace it. The replacement JAR's methods can then be inlined. This can only be achieved with dynamic compiling.