6
votes

I am going to learn Ruby. I know it is an interpreted language. I know that compiled languages are eventually translated to machine code, but what does the Ruby interpreter do? I read that the interpreter is written in C, but is each line of Ruby converted to C, which is then compiled to machine code? I have also heard of JIT, but if that adds too much complexity to the answer, you don't need to cover it. What I am looking for is what happens to my Ruby code.

1 Answer

8
votes

The interpreter converts the Ruby code into a simpler, "intermediate" representation (since Ruby 1.9, the standard C implementation compiles it to bytecode). It also builds, in your computer's memory, a virtual machine that simulates a physical machine executing that representation.
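If you are on CRuby (the standard C implementation) 1.9 or later, you can look at this intermediate representation yourself. The sketch below assumes the `RubyVM::InstructionSequence` class, which is CRuby-specific and not available on other implementations such as JRuby. The helper name `bytecode_listing` is made up for this example, and the exact instruction names in the listing differ between Ruby versions.

```ruby
# Returns a human-readable bytecode listing for a Ruby snippet,
# or nil when this Ruby implementation does not expose its bytecode.
# (bytecode_listing is a hypothetical helper name, not part of Ruby.)
def bytecode_listing(source)
  return nil unless defined?(RubyVM::InstructionSequence)

  RubyVM::InstructionSequence.compile(source).disasm
end

listing = bytecode_listing("a = 1 + 2")
puts listing || "This Ruby implementation does not expose its bytecode."
```

Each line of the listing is one instruction for the virtual machine, not C code and not machine code.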

This machine mirrors a physical one, at least as far as is reasonable and useful. It typically has memory for instructions, a program counter, a stack for storing intermediate values and return addresses, and so on. Some more sophisticated machines also have registers. There is a fixed and relatively primitive instruction set (primitive compared to languages like Ruby, not compared to actual CPU instruction sets). Like a CPU, the virtual machine loops endlessly:

  • Read the current instruction (identified by the program counter).
  • Decode it (although this is usually much simpler than in real CPUs, at least the CISC ones).
  • Execute it (probably manipulating the stack and/or registers in the process).
  • Update the program counter.
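The loop above can be sketched as a toy stack machine in a few lines of Ruby. This is only an illustration: the instruction set (`:push`, `:add`, `:mul`, `:halt`) and the method name `run_vm` are invented for this example and have nothing to do with Ruby's real bytecode.

```ruby
# A toy stack-based virtual machine with a hypothetical instruction set.
def run_vm(program)
  stack = []
  pc = 0 # program counter: index of the current instruction

  loop do
    opcode, operand = program[pc] # read the current instruction

    case opcode # decode it ...
    when :push # ... and execute it
      stack.push(operand)
    when :add
      b = stack.pop
      a = stack.pop
      stack.push(a + b)
    when :mul
      b = stack.pop
      a = stack.pop
      stack.push(a * b)
    when :halt
      return stack.pop
    else
      raise ArgumentError, "unknown opcode: #{opcode.inspect}"
    end

    pc += 1 # update the program counter
  end
end

# (2 + 3) * 4
program = [[:push, 2], [:push, 3], [:add], [:push, 4], [:mul], [:halt]]
puts run_vm(program) # prints 20
```

Note how much work surrounds each tiny operation: an array lookup, a `case` dispatch and a counter update for every single instruction.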

With an interpreter, all of this happens through a layer of indirection. Your actual physical CPU has no idea what it's doing. The VM is itself software, so each of the steps above is delegated to the CPU and takes several physical CPU cycles (possibly dozens or hundreds for rather high-level bytecode instructions). And this overhead is paid every time an instruction is read.

Enter JIT compilation. The simplest form just replaces each bytecode instruction with a (somewhat optimized) copy of the code that would be executed when the interpreter encountered it. This already gives a speed win, e.g. the program counter manipulation can be left out. But there are even smarter variants.
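As a rough analogy for this simplest form, the sketch below "compiles" a program for a toy stack machine by pasting together, per instruction, the code an interpreter would have run for it. Everything here is an assumption made for illustration: the instruction set, the `TEMPLATES` table and the name `compile_program` are invented, and the output is Ruby source handed to `eval` rather than machine code, which is what a real JIT would emit.

```ruby
# Hypothetical templates: the code an interpreter would execute per opcode.
TEMPLATES = {
  # Integer() ensures only an integer literal is ever pasted into the source.
  push: ->(operand) { "stack.push(#{Integer(operand)})" },
  add:  ->(_) { "b = stack.pop; a = stack.pop; stack.push(a + b)" },
  mul:  ->(_) { "b = stack.pop; a = stack.pop; stack.push(a * b)" },
  halt: ->(_) { "return stack.pop" }
}.freeze

# Turns the whole program into one straight-line lambda, once.
def compile_program(program)
  body = program.map do |opcode, operand|
    TEMPLATES.fetch(opcode).call(operand) # KeyError on an unknown opcode
  end
  eval("lambda { stack = []\n#{body.join("\n")}\n}")
end

# (2 + 3) * 4
compiled = compile_program([[:push, 2], [:push, 3], [:add], [:push, 4], [:mul], [:halt]])
puts compiled.call # prints 20
```

The generated code contains no dispatch loop, no opcode lookup and no program counter: those costs are paid once at compile time instead of on every instruction.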

Tracing JITs, for example, start off as regular interpreters and additionally observe the program they execute. When a tracing JIT notices that the program spends a lot of time in a particular section of code (almost always a loop or a function called from loops), it starts to record what the program does there: it generates a trace. When it reaches the point where it started recording (after one iteration of the loop), it calls it a day and compiles the trace to machine code. But since it saw how the program actually behaves at runtime, it can generate code that fits this behaviour exactly. Take, for example, a loop adding integers. The machine code won't contain any of the type checks and function calls the interpreter actually performs. At least, it won't contain most of them. To ensure correctness, it will add checks that the conditions under which the trace was recorded (e.g. that the variables involved are integers) still hold. When such a check fails, it bails out and resumes interpreting until another trace is recorded. But until that happens, it could have performed a hundred iterations at a speed that rivals handwritten C code.
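The guard-and-bail-out idea can be mimicked in plain Ruby. The following is a loose sketch, not how any real tracing JIT is implemented: the class `TracingAdder`, the `HOT_THRESHOLD` value and the "compiled" lambda are all invented for illustration, and the fast path is still ordinary Ruby rather than machine code.

```ruby
class TracingAdder
  HOT_THRESHOLD = 3 # hypothetical: calls needed before the path counts as hot

  attr_reader :fast_hits, :bailouts

  def initialize
    @calls = 0
    @trace = nil
    @fast_hits = 0
    @bailouts = 0
  end

  def add(a, b)
    if @trace
      # Guard: do the conditions seen while recording still hold?
      if a.instance_of?(@trace[:type]) && b.instance_of?(@trace[:type])
        @fast_hits += 1
        return @trace[:code].call(a, b)
      end
      # Guard failed: discard the trace and resume interpreting.
      @bailouts += 1
      @trace = nil
      @calls = 0
    end

    result = interpret_add(a, b)
    @calls += 1
    record_trace(a.class) if @calls >= HOT_THRESHOLD && a.class == b.class
    result
  end

  private

  # Slow path: inspects the operand types on every call.
  def interpret_add(a, b)
    if a.is_a?(Numeric) && b.is_a?(Numeric)
      a + b
    elsif a.is_a?(String) && b.is_a?(String)
      a + b
    else
      raise TypeError, "cannot add #{a.class} and #{b.class}"
    end
  end

  # "Compile" a trace specialized for the operand type that was observed.
  def record_trace(type)
    @trace = { type: type, code: ->(x, y) { x + y } }
  end
end

adder = TracingAdder.new
10.times { |i| adder.add(i, 1) } # hot after 3 calls, then 7 fast hits
adder.add("a", "b")              # guard fails, back to the slow path
puts "fast hits: #{adder.fast_hits}, bailouts: #{adder.bailouts}"
# prints "fast hits: 7, bailouts: 1"
```

The single guard at the top of the fast path stands in for the checks a tracing JIT leaves in its machine code; everything behind the guard can assume the recorded types.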