2
votes

For all intents and purposes, I have a bunch of functions and function calls with this sort of AST structure. It's an array of functions.

const ast = [
  {
    type: 'function',
    name: 'doX',
    inputs: [
      {
        name: 'x',
        type: 'String'
      }
    ],
    calls: [
      {
        type: 'call',
        name: 'do123',
        args: [
          {
            type: 'literal',
            value: 123
          },
          {
            type: 'reference',
            value: 'x'
          }
        ]
      },
      {
        type: 'call',
        name: 'test',
        args: [
          {
            type: 'borrow',
            value: 'x'
          }
        ],
        block: [
          {
            type: 'call',
            name: 'set',
            args: [
              {
                type: 'reference',
                value: 'a'
              },
              {
                type: 'move',
                value: {
                  type: 'call',
                  name: 'multiply',
                  args: [
                    {
                      type: 'borrow',
                      value: 'x'
                    },
                    {
                      type: 'literal',
                      value: 2
                    }
                  ]
                }
              }
            ]
          },
          {
            type: 'call',
            name: 'doFoo',
            args: [
              {
                name: 'a',
                value: {
                  type: 'literal',
                  value: 'foo'
                }
              },
              {
                name: 'b',
                value: {
                  type: 'reference',
                  value: 'x'
                }
              }
            ]
          }
        ]
      }
    ]
  }
]

That would be the result of this:

function doX(x) {
  do123(123, x)
  test(x) {
    set(a, multiply(x, 2))
    doFoo('foo', a)
  }
}

Forget now the fact that I am also trying to handle lexical scopes (i.e. nested functions), because that will probably just make this question unnecessarily complicated.

Also note that everything is a function, so you have set(foo, 'bar') to instantiate a variable. Although, it executes in an imperative fashion, it's not a functional language AST. This is just to simplify the question, so there's not all kinds of complicated types getting in the way. There are just functions and calls for this example.

Notice too that we have borrow in there (one of a few types of "references"). You can have either a borrow (share ownership) or a move (transfer ownership), and the borrow can be marked mutable or not (defaults to not). The goal is to reproduce what Rust does, to have this mini/demo "compiler" do exactly what Rust does with the borrow-checker.

Also, we have created a new local scope in that function, and defined the variable a.

The goal is to figure out this:

The lifetime of each variable (when it can be freed from memory, like in Rust). And to insert a free(name) call into the AST.

With the Rust borrow-checker, it checks for the owner and lifetime of the variable so it can tell if it is used properly and when it falls out of scope.

So first we must gather the variable declarations (and just hoist them, so we know how many local variables there are, so we can create the activation record at the right size). To do this I think we just need to traverse down the AST one time.

Second, I start to get lost of what exactly to do to accomplish (1). First, create a map. Then go through the AST from the beginning. For each variable, check if it is in the map (if it has been marked/traversed/encountered yet). If it is not in the map, add it to the map, don't really know why we need to do this yet. Create a new map for each scope. At the end of each scope, free the variable. This is sort of where I'm at.

process(ast)

function process(ast) {
  ast.forEach(fxn => {
    let stack = []
    let locals = { count: 0, vars: {} }
    fxn.inputs.forEach(input => {
      locals.vars[input.name] = locals.count++
    })
    fxn.calls.forEach(call => {
      handleCall(call, locals)
    })

    function handleCall(call, locals) {
      if (call.name == 'set') {
        let name = call.args[0].value
        locals.vars[name] = locals.count++
      }
      if (call.block) {
        call.block.forEach(nestedCall => {
          handleCall(nestedCall, locals)
        })
      }
    }
  })
}

Now the question is, how do you do add the borrow-checking so that you know where to insert the free(name)?

process(ast)

function process(ast) {
  ast.forEach(fxn => {
    let stack = []
    let locals = { count: 0, vars: {} }
    fxn.inputs.forEach(input => {
      locals.vars[input.name] = {
        id: locals.count++,
        status: '?'
      }
    })
    fxn.calls.forEach(call => {
      handleCall(call, locals)
    })

    function handleCall(call, locals) {
      if (call.name == 'set') {
        let name = call.args[0].value
        let local = locals.vars[name] = {
          id: locals.count++
        }
        let value = call.args[1]
        if (value.type == 'move') {
          local.status = 'owner'
        } else if (value.type == 'borrow') {
          local.status = 'borrow'
        } else {
          // literal
        }
        if (value.value.type == 'call') {
          handleCall(value.value, locals)
        }
      } else {
        
      }
      if (call.block) {
        let newLocals = {}
        call.block.forEach(nestedCall => {
          handleCall(nestedCall, newLocals)
        })
      }
    }
  })
}

I start to get lost in the weeds, don't see the forest for the trees. I have read a lot about the borrow-checkr in Rust so far but don't know how it's implemented. I have looked through the source code of Polonius, read through most of the Oxide paper, and read through the docs on lifetimes, borrowing, mutability, and ownership, as well as some of the compiler group meeting notes and blog posts. But none seem to explain in a simple way an algorithm for doing the borrow checking in practice.

Looking for some help on this specific example to get me started on an algorithm for a borrow-checker in JavaScript. Wondering if one could outline what the algorithm should do in order to figure out if the variables are properly borrowed and when they can be freed, using this or a slightly more complicated came-up-with example.

Before I can really write the algorithm though, I need to have a better sense of what the algorithm should be doing, how it should work. Which is what this question is about. If you know how to write a demo of it, that would be great! But just having a deeper explanation of steps (and not glossing over key steps) will be helpful too.

1
I feel your pain. I'm not trying to reimplement the borrow checker, but just like you I'm trying to understand how it works including all edge cases. Don't give up man. I will never give up either. My hope is that the situation will be better after the polonius working group has finnished it work. I'm thinking aboout starting to contribute to rust just in order to understand how it works and make the documentation better.Supremum
That would be great, if you figure anything out of the deeper details definitely share on the web and let me know, I'm looking forward to implementing some of these ideas in a compiler. Cheers.Lance Pollard
I know that isn't answer but it's a fix for last code: let newLocals = {count: 0, vars: {}} also may be it will be helpful rust-lang.github.io/poloniusDaniil Loban
what the algorithm should do in order to figure out if the variables are properly borrowed and when they can be freed This is almost the opposite of how the Rust borrow checker works. In Rust borrowck is pass/fail and only runs when the compiler has already figured out where and when to drop (free) things. No information from lifetime analysis is fed back into code generation or any other compiler pass.trentcl

1 Answers

2
votes

There are a few issues I can see:

  1. I don't see any reference syntax in your code, since I don't see any "&"s. The only thing you have are moves. If you start to break away from that syntax, it begins to ruin things.
  2. You're also using both "reference" and "borrow" in your javascript, which is confusing, because they mean the same thing in Rust.
  3. You don't have a type for doX's parameters, which means you can't handle that variable properly, because it could be a move, which could cause scope problems for the calling function.
  4. How did b become a reference to x?

Rust / Understanding Ownership:
https://doc.rust-lang.org/book/ch04-00-understanding-ownership.html

Here's a synopsis of the above link:

For all of this, when I say "variable," I mean "variable that uses the heap." Also, feel free to substitute/interpret "reference" as "borrow."

A variable gets dropped at the end of its scope, if it is still valid. Dropping means freeing the memory. Scope is from when the variable is introduced to the last time it's used. If there are any references to this variable that are still in scope, it's an error.

By default, variables are moved instead of copied.

When a variable is copied, a new, unique variable is created and the data is copied. This creates an entirely new, independent variable.

When a variable is moved to another variable, the initial variable is marked invalid and can no longer be used. (A big tradeoff of this style of memory management.) The new variable points to the same heap data that the old variable did. If the initial variable is used after being marked invalid, it's an error.

A variable can be moved by assigning it to another variable in one of three ways:

  1. directly. e.g. x = y
  2. by setting the value of a function parameter. e.g. f(x)
  3. by returning it from a function, e.g. x = f()

If you pass a variable to another function and want to continue using it after that call, then the function has to "give it back" by returning it (another major deviation from expectations). This is just (2) followed by (3), e.g. x = f(x).

It's possible to create two types of references to a variable: immutable (default) and mutable. A reference just points to the variable instead of the data.

You can have an unlimited number of immutable references to a variable. You can only have one mutable reference, and only if you have no other types of references (including immutable) in scope.

When references are out of scope, they do not call drop. It's an error for references to continue to exist when the variable to which they point has been dropped.


If I were to implement this, I would do the following in order:

  • get scope working. This is the time from when a variable or reference is first introduced to the time it is last used. Note that scopes in the same block may or may not overlap.
  • get copy working
  • get move working, including the drop aspect. detect scope extending beyond validity. Do this in stages, for move types (1), (2), (3) as shown above.
  • get immutable reference working, error if attempt to change variable, error if scope beyond validity.
  • get mutable reference working, error if any other reference in scope, error if scope is beyond validity.