4
votes

I am currently developing a compiler for a very limited object oriented language. I want to treat all values as objects and operators on those values will be implemented as methods. The compiler transforms the program into assembler for a stack-based virtual machine.

During compilation I transform integer literals into objects of a special "Integer" class. Arithmetic operators are implemented as methods of that class (using inline assembler). So that 4 + 5 basically equals to 4.add(5).

The problem I am facing right now is the special case for boolean values. If there is an if statement:

if(10 > 5 || 12 < 10)

this would currently be transformed into: 10.greaterThan(5).or(12.lessThan(10))

Now obviously those integer literals can also be calls to a function with side-effects. Implementing those binary operators as method calls yields a problem in this case as short-circuit evaluation gets impossible.

So my questions are:

  1. How do other languages achieve short-circuit evaluation but still treating every value as an object?

  2. According to Wikipedia "ALGOL 68 used "proceduring" to achieve user defined short-circuit operators & procedures." - How does this work?

3

3 Answers

4
votes

The usual technique, I believe, involves either call by name or call by need. The idea is that the parameter of or is not the result of the comparison, instead it is the comparison itself converted in to a thunk, which is converted into the result value whenever (in call by name) or the first time (in call by need) it is needed.

When converting an expression into a thunk, you are basically creating an anonymous function, and you can treat the compilation problem as such. It involves compiling the expression into code that will evaluate the expression. It also involves creating an object (the thunk itself) which refers to (or contains copies of) those local variables that the expression uses, along with a pointer to the compiled code of the expression. The object needs to be interface compatible with your boolean class so that the code using it does not have to care whether it has a genuine boolean or a thunk posing as such. Whenever someone needs the boolean value, the thunk would execute the compiled expression code and provide the resulting value.

If call by name is sufficient to you, that's all there is to it. For call by need, there's added complexity from caching the result of the evaluation.

1
votes

IIRC ALGOL uses call-by-name parameters, and that is why that solution work.

The || operator can be implemented as (pseudo code):

if (cond1) goto label;
if (cond2) goto label;

label:
  <body>

nomatch:
  ...

For && the inverse of above can be done.

1
votes

re: According to Wikipedia "ALGOL68 used "proceduring" to achieve user defined short-circuit operators & procedures." - How does this work?

Algol68-r0 (original/unrevised definition) had two concepts related to short circuit evaluation.

Imagine the coder wants to define an matrix multiplication operator that did "short-circuit-evaluation", so once the left argument is a "zero" matrix, then there would be further evaluation of the right hand side... Such a user defined definition could be:

MODE MAT = FLEX[0,0]REAL;
OP ISZERO = (MAT a)BOOL: ¢ insert actual code here ¢ ~;

PRIO TIMESF = 7;
OP TIMESF = (MAT a, PROC MAT in b)MAT: 
  IF ISZERO a THEN a ELSE MAT b = in b; ¢ insert actual code here ¢ ~ FI;

MAT a = 0, b = 16, c = 25;
print(a TIMESF b TIMESF c) ¢ would print 0 without calculating 16*25 ¢

Conversely the coder want the the left-hand and the right-hand arguments to be evaluated in parallel. Such a user defined definitions could be:

PRIO TIMESPAR = 7;
OP TIMESPAR = (MAT a, MAT b)MAT: ¢ insert actual code here ¢ ~;

The comma tells the compiler that it is free to evaluate the left-hand and the right-hand in any order, or even in parallel. (This leaves the compiler the option to optimise evaluation)

Or the coder may want to force the evaluation to be sequential:

PRIO TIMESSEQ = 7;
OP TIMESSEQ = (MAT a; MAT b)MAT: ¢ insert actual code here ¢ ~;

In this case the ";" is called a "gomma", shofthand for "go on comma".

Algol68-r1 (revised in 1974, available at sourceforge for windows and linux) removed all these abilities leaving the coder to manually/specifically apply "proceduring"... eg

The first set of matrix definitions is the same:

MODE MAT = FLEX[0,0]REAL;
PRIO TIMESF = 7;
OP TIMESF = (MAT a, PROC MAT in b)MAT: 
  IF ISZERO a THEN a ELSE MAT b = in b; ¢ insert actual code here ¢ ~ FI;

MAT a = 0, b = 16, c = 25; ¢ 3 1x1 matrices are "widening" from REAL numbers ¢ 

But the usage is different, note the use of two lambdas (MAT:b TIMES MAT:c) and (MAT:c):

print(a TIMESF MAT:b TIMESF MAT:c) ¢ would print 0 without calculating 16*25 ¢

Transparent deproceduring and widening are retained in Algol68-r1.