3
votes

So I'm writing a little interpreter in C at the moment for a language I have created (which is pretty similar to Python). I have written the lexer and parser and currently my program outputs an AST, an now I am attempting to turn this AST into bytecode. Currently my algorithm traverses the AST (depth-first) and can generate bytecode for simple arithmetic, and now I am trying to implement if statements.

I cannot copy all my code in here because it's a pretty large amount of code, but currently the program takes an AST which might look something like

ADD
|-- 1
|-- MUL
    |-- 2
    |-- 3

and turns this into

LOAD 1 //the real code doesn't put the value here, but a number representing the position of this value in an array
LOAD 2
LOAD 3
MUL
ADD

This is easy for simple expressions but I really don't know how to generate the bytecode for an if statement. I know that I will have to jump to the else clause if the comparison is false, and also jump from the end of every if/else if block, but how do I deal with this if the jump is more than 256 bytes of bytecode?

1
Why is your jump distance limited to 256?Jongware
@usr2564301 because it is bytecode, an each byte is 8bits, allowing 256 bytes to jump over.dangee1705
If i use multiple bytes that will waste space for jumps less than 256 bytesdangee1705
Not necessarily. Some bytecode are 16 bits.Basile Starynkevitch
@DanielGee: yes, you can have variable-length opcodes or operands. For instance a relative distance of 0 is clearly useless, so you can have the range -128..127 excluding 0, and treat 0 as meaning "read two more bytes to get a 16 bit number". If the first byte of that 16 bits is zero, there was no point in encoding it this way, so you could treat that as a flag meaning "read 3 or 4 bytes to get a 24 or 32 bit number" (depending on whether you think 24 bit numbers are good), and so on. Or, you can just have FARJUMP, etc.torek

1 Answers

2
votes

You should read SICP, the Dragon Book, then Lisp In Small Pieces.

I hope you can redesign your bytecode. Then you could have some FARJUMP bytecode, which is followed by four bytes a, b, c, d (considered as uint8_t unsigned integers of 8 bits each), for which you'll jump to (a<<24) + (b<<16) + (c<<8) +d offset.

You probably want to be able to jump backward and forward. Either have a BACKFARJUMP to jump backward, or use some signed offset...

With such an opcode, you'll be able to jump to over four billion bytecodes (232 exactly). That could be more easy.

If a four billion bytecode offset is not enough, you could generalize.

Don't forget that your computer is unlikely to have more than a terabyte of RAM (and such a computer costs more than a car).