0
votes

I'm working on a lexer in Rust.

Desired API:

enum Token<'input> {
    Name(&'input str)
}

let mut lexicon = LexiconBuilder::<Token>::new()
    .token("[a-zA-Z]+", |s| Token::Name(s))
    // among others
    .build();

let mut lexer = Lexer::new(lexicon, "input");

The idea is that the user provides a regular expression along with a closure that runs when the regex matches the input text. However, I'm having trouble proving to the borrow checker that the slice passed into token()'s closure lives long enough. From my point of view it seems safe, since no tokens are returned until you provide an input string.

Rust Playground link

I've spent quite a while trying to thread the input lifetime through all of the types; however, I can't seem to prove that the lexicon's lifetime (and therefore the rule handler's) will match or outlive the input's.

1
This API seems implausible to me, because it is strange that a lexer should be parameterized by the type of token it outputs -- I feel that would logically be determined by the lexer itself. lexicon() in particular stands out because there's almost never any reason to have a function with an output lifetime but no input lifetimes. I imagine there is some reason why you're not just doing this, but I figured I'd just mention it anyway. - trentcl
Good feedback. I ended up having the lexer's next() method return the rule ID, the current slice, and the position. The library is meant to be a substrate for the user's lexer, hence being parameterized by the token type. - Matt Green

1 Answer

5
votes
type Handler<T> = fn(&str) -> T;

This is not a complete type: the &str needs a lifetime, but none is specified. Lifetime elision means that this expands to

type Handler<T> = for<'a> fn(&'a str) -> T;
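As a small illustration of that expansion, the sketch below declares both spellings and assigns one to the other, which only compiles because they are the same type. The alias names Elided and Explicit and the helpers byte_len and as_explicit are made up for this example and are not part of the question's code.

```rust
// Hypothetical aliases: the elided form and its explicit higher-ranked expansion.
type Elided<T> = fn(&str) -> T;
type Explicit<T> = for<'a> fn(&'a str) -> T;

// A handler whose return type does not borrow from its argument.
fn byte_len(s: &str) -> usize {
    s.len()
}

// Compiles only because Elided<T> and Explicit<T> name the same type.
fn as_explicit(f: Elided<usize>) -> Explicit<usize> {
    f
}

fn main() {
    let handler = as_explicit(byte_len);
    assert_eq!(handler("abc"), 3);
}
```

Note that T is chosen outside the for<'a>, so T cannot mention 'a. A return type such as usize is fine, but a return type that borrows from the argument cannot be expressed this way.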

So a Handler must accept a &str of any lifetime and knows nothing about the lifetime of the slice it is actually given. For some 'input, to construct a Rule<Token<'input>> you need a Handler<Token<'input>>, which means you need a for<'a> fn(&'a str) -> Token<'input>. The Token wants a &'input str, but all the handler has is a &'a str. You need to make 'input a parameter of Handler, so that it can restrict the arguments it will accept:

type Handler<'input, T> = fn(&'input str) -> T;

And this must propagate through all your other types. Playground link.
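Here is a minimal sketch of what that propagation might look like. Since the question's code is incomplete, Rule, Lexicon, first_token, lex_first, and alphabetic_prefix are hypothetical names, and a plain function stands in for the compiled regex; only Token and Handler come from the question and the answer above.

```rust
#[derive(Debug, PartialEq)]
enum Token<'input> {
    Name(&'input str),
}

// The handler only accepts slices that live for 'input.
type Handler<'input, T> = fn(&'input str) -> T;

// Hypothetical rule type: 'input is threaded through from Handler.
struct Rule<'input, T> {
    // Stand-in for a regex: byte length of the match at the start of the text.
    matcher: fn(&str) -> usize,
    handler: Handler<'input, T>,
}

// Hypothetical lexicon type: again parameterized by 'input.
struct Lexicon<'input, T> {
    rules: Vec<Rule<'input, T>>,
}

impl<'input, T> Lexicon<'input, T> {
    // The input must live for 'input, which is what the handlers require.
    fn first_token(&self, input: &'input str) -> Option<T> {
        for rule in &self.rules {
            let len = (rule.matcher)(input);
            if len > 0 {
                return Some((rule.handler)(&input[..len]));
            }
        }
        None
    }
}

// Counts leading ASCII letters; each is one byte, so the count is a valid slice end.
fn alphabetic_prefix(text: &str) -> usize {
    text.chars().take_while(|c| c.is_ascii_alphabetic()).count()
}

// Builds a one-rule lexicon and lexes the first token of the input.
fn lex_first<'input>(input: &'input str) -> Option<Token<'input>> {
    let lexicon = Lexicon {
        // The variant constructor coerces to fn(&'input str) -> Token<'input>.
        rules: vec![Rule { matcher: alphabetic_prefix, handler: Token::Name }],
    };
    lexicon.first_token(input)
}

fn main() {
    let input = String::from("hello world");
    assert_eq!(lex_first(&input), Some(Token::Name("hello")));
    assert_eq!(lex_first("123"), None);
}
```

One consequence of this design is that a lexicon is tied to inputs that live for 'input, so in this sketch it is built once the input is available rather than ahead of time.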

The code in your question is incomplete, and the code in the playground doesn't match it. If you've already tried this, then you're going to have to tell us what went wrong more clearly.