9
votes

Does the Just-In-Time (JIT) compiler really map each Common Intermediate Language (CIL) instruction in a program to the underlying processor's opcodes?

And if so, can we call CIL an assembly language and the JIT an assembler?

Note: Wikipedia doesn't list CIL as an assembly language in its list of assembly languages

4
Interesting question. I tried to reply, but it is not so easy. I think you can't consider it an assembly language since there is no real CPU running it directly. Felice Pollano
@FelicePollano Then CIL may be a partial assembly language... :) Anirudha
Assembly language mnemonics correspond 1:1 with CPU-specific machine code instructions. An assembler just maps the (sorta) human-readable assembly code to those instructions. This is definitely not the case with CIL. It's not partial; it just isn't. Assembly language has a very clear definition. Jamie Treworgy
@jamietre You are right, but then why do people call it an object-oriented assembly language? Anirudha
I'm not even sure if "object-oriented" is that important for CIL (even though the CLI's architecture clearly favours the OO paradigm). Its stack-based evaluation model is much more prominent, as is the emphasis on providing metadata besides bytecode. Your typical assembly language wouldn't care about metadata at all. stakx - no longer contributing

4 Answers

9
votes

This question is all about definitions, so let's define the terms properly. First, assembly language:

Assembly language is a low-level programming language for computers, microprocessors, microcontrollers, and other programmable devices in which each statement corresponds to a single machine language instruction. An assembly language is specific to a certain computer architecture, in contrast to most high-level programming languages, which generally are portable to multiple systems.

Now, CIL:

Common Intermediate Language is the lowest-level human-readable programming language defined by the Common Language Infrastructure (CLI) specification and is used by the .NET Framework and Mono. Languages which target a CLI-compatible runtime environment compile to CIL, which is assembled into an object code that has a bytecode-style format.

Okay, this part is technically not correct: for example, the C# compiler compiles directly to bytecode without going through CIL (the human-readable language). Theoretically, though, we can imagine that's what's happening.

With these two definitions, CIL is an assembly language, because each statement in it is compiled down to a single bytecode instruction. The fact that there is no physical computer that can execute that bytecode directly doesn't matter.
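To make the "each statement is compiled down to a single bytecode instruction" point concrete, here is a minimal sketch of a toy assembler for a handful of operand-less CIL instructions. The opcode bytes are the one-byte encodings from the ECMA-335 instruction set; the `assemble` and `disassemble` helpers are hypothetical names invented for this illustration, and real ilasm also handles metadata, operands, and multi-byte opcodes.

```python
# One-byte encodings (ECMA-335) for a few operand-less CIL instructions.
OPCODES = {
    "nop": 0x00,
    "ldarg.0": 0x02,
    "ldarg.1": 0x03,
    "ldc.i4.1": 0x17,
    "ldc.i4.2": 0x18,
    "add": 0x58,
    "mul": 0x5A,
    "ret": 0x2A,
}


def assemble(source):
    """Toy assembler (hypothetical): one mnemonic per line -> one byte each."""
    out = bytearray()
    for line in source.splitlines():
        mnemonic = line.split("//")[0].strip()  # drop comments and whitespace
        if not mnemonic:
            continue
        out.append(OPCODES[mnemonic])
    return bytes(out)


def disassemble(code):
    """Inverse mapping: every byte goes back to exactly one mnemonic."""
    names = {byte: name for name, byte in OPCODES.items()}
    return [names[b] for b in code]


PROGRAM = """
ldarg.0
ldarg.1
add      // pops two values, pushes their sum
ret
"""

if __name__ == "__main__":
    code = assemble(PROGRAM)
    print(code.hex(" "))
    print(disassemble(code))
```

Because the mapping is a plain dictionary lookup in both directions, the text form and the byte form carry the same information, which is the property the assembly-language definition asks for.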

The definition says that each assembly language is “specific to a certain computer architecture”. In this case, the architecture is the CLR virtual machine.


About the JIT: the JIT compiler can't be considered an assembler, because it doesn't do the 1:1 translation from human-readable form to bytecode. ilasm does that.

The JIT compiler is an optimizing compiler that compiles from bytecode to native machine code (for whatever ISA / CPU it's running on), while making optimizations.
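As a rough illustration of why that is compiling rather than assembling, here is a toy translator from CIL-like stack code to a made-up three-address register instruction set. Everything in it is an assumption for the sake of the sketch: the tuple encoding of the input, the `jit` function name, and the pseudo-native output syntax are invented, and no real JIT is this simple. It only models the evaluation stack at compile time and folds additions of two constants.

```python
def jit(cil):
    """Toy translator (hypothetical) from stack code to a made-up register ISA."""
    stack = []     # compile-time model of the evaluation stack
    native = []    # emitted pseudo-native instructions
    next_reg = 0   # counter for fresh virtual registers

    for op, *args in cil:
        if op == "ldc":
            # Constants are only remembered, not emitted.
            stack.append(("const", args[0]))
        elif op == "ldarg":
            # Assume arguments already live in registers arg0, arg1, ...
            stack.append(("reg", f"arg{args[0]}"))
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            if left[0] == "const" and right[0] == "const":
                # Constant folding: the add disappears from the output.
                stack.append(("const", left[1] + right[1]))
            else:
                dest = f"r{next_reg}"
                next_reg += 1
                native.append(f"add {dest}, {left[1]}, {right[1]}")
                stack.append(("reg", dest))
        elif op == "ret":
            native.append(f"ret {stack.pop()[1]}")
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return native


if __name__ == "__main__":
    # Computes arg0 + (2 + 3) in stack form.
    method = [("ldarg", 0), ("ldc", 2), ("ldc", 3), ("add",), ("add",), ("ret",)]
    for line in jit(method):
        print(line)
```

Six input instructions become two output instructions here, so there is no instruction-by-instruction correspondence to speak of.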

4
votes

Assembly is made up of mnemonics for the machine code instructions of a particular processor: a direct representation of the 1s and 0s that make the core execute code, but written as text to make it easy on a human. That is very unlike CIL:

  • you can't buy a processor that executes CIL
  • CIL doesn't target a specific processor, the jitter does
  • CIL assumes a stack-based execution model, processors are primarily register based
  • CIL code is optimized by the jitter, so the machine code need not mirror its original form
  • there is no one-to-one translation of a CIL instruction to a processor instruction

That last bullet is a key one. A design decision that makes CIL strongly different from bytecode such as Java's is that CIL instructions are type-less. There is only one ADD instruction, but processors have many versions of it: specific ones that take byte, short, int, long, float and double operands. They are required because different parts of the processor core are used to execute the add. The jitter picks the right one, based on the operand types it infers from previous CIL instructions.

Just like the + operator in the C# language, it can also work with different operand types. That really makes the L in CIL significant: it is a Language. A simple one, but it is only simple to help make it easy to write a jitter for it.
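The type inference described above can be sketched in a few lines. This is only an illustration under stated assumptions: `ldc.i4`, `ldc.i8`, `ldc.r4` and `ldc.r8` are real CIL constant-load instructions, but the `select_adds` function, the `add_<type>` output names, and the type labels are invented here, and the labels simplify the CLI's actual evaluation-stack types.

```python
# Type pushed by each constant-load instruction (labels are a simplification).
PUSHED_TYPE = {
    "ldc.i4": "int32",
    "ldc.i8": "int64",
    "ldc.r4": "float32",
    "ldc.r8": "float64",
}


def select_adds(cil):
    """Return a made-up typed instruction name for every type-less `add`."""
    types = []     # operand types on the evaluation stack, tracked at compile time
    selected = []  # one entry per `add` encountered
    for op in cil:
        if op in PUSHED_TYPE:
            types.append(PUSHED_TYPE[op])
        elif op == "add":
            right, left = types.pop(), types.pop()
            if left != right:
                # Simplification: this sketch only accepts matching operand types.
                raise TypeError(f"cannot add {left} and {right}")
            selected.append(f"add_{left}")
            types.append(left)  # the result has the operands' type
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return selected


if __name__ == "__main__":
    print(select_adds(["ldc.i4", "ldc.i4", "add"]))
    print(select_adds(["ldc.r8", "ldc.r8", "add", "ldc.r8", "add"]))
```

The same `add` mnemonic yields a different typed instruction depending only on what was pushed before it, which is the information a one-to-one assembler would not have.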

2
votes

The line is actually pretty blurry... the arguments I've seen against calling CIL an "assembly language" can apply almost as well to x86/x86-64 in practice.

Intel and AMD haven't made processors that execute assembly instructions exactly as emitted in decades (if ever), so even so-called "native" code is not much different from running on a virtual machine whose bytecode is specified in x86/x86-64.

x86/x86-64 are the lowest-level thing typical developers have access to, so if we had to put our foot down and call something in our ecosystem an "assembly language", that would win. And since CIL bytecode ultimately requires x86/x86-64 instructions to be able to run on a processor in that family, there's a pretty strong case to be made that CIL indeed doesn't "feel" like it should count.

So in a sense, maybe neither can be considered to be "assembly language". When referring to x86/x86-64 processors, we almost never refer to processors that execute x86/x86-64 without translating it into something else (i.e., whatever the microcode does).

To add in yet another wrinkle, the way in which an x86/x86-64 processor executes a given sequence of instructions can change simply by updating the microcode. A quick search shows that Linux can even make it easy to do this yourself in software!

So I guess these are the questions that decide whether the two belong in separate categories:

  1. Does it matter that all current machines that run CIL bytecode are implemented in software?
  2. Does it matter that the same hardware can interpret the same x86/x86-64 instructions in a different way after being instructed to do so in software?
  3. Does it matter that we don't currently have a way of bypassing the microcode and issuing commands directly to the physical units of x86/x86-64 processors?

So regarding the "is CIL an assembly language" question, the best answers I can give are "it depends" (for scientists) and "pretty much" (for engineers).

1
votes

CIL is more a bytecode than an assembly language. In particular, it is not a textual, human-readable form, unlike assembly languages. (CIL probably also defines the format of bytecode files.)

The MSIL JIT is an implementation of a virtual machine for that bytecode. How implementations (from Microsoft or from Mono) translate CIL into machine code is an implementation detail which should not really matter to you (and given that Microsoft's VM is probably proprietary, they won't tell you how it is done). I think that Mono, a free software implementation of CIL, uses LLVM, so it probably doesn't translate one bytecode at a time but rather entire methods or functions.