14
votes

I am trying to build a static analysis tool for a demo project. We are free to choose the language to analyze. I started off by writing a Java code analyzer using ANTLR. I now want to do the same for Scala code. However, I could not find an ANTLR grammar for Scala. Does one exist? Is there any other machine-readable form of the Scala grammar?


4 Answers

14
votes

I don't believe there is such a thing.

The thing is that for any language, but especially for a library language like Scala, lexical analysis and syntactic analysis are the least interesting and most trivial parts of static analysis. In order to do anything even remotely interesting, you need to perform a significant amount of semantic analysis: desugaring, type inference, type checking, kind checking, macro expansion, overload resolution, implicit resolution, name binding. In short: you need to re-implement more or less the entire Scala compiler, modulo the actual code generation part. Remember that both Scala's macro system and Scala's type system are Turing-complete (in fact, Scala's macro system is Scala!): there could be significant compile-time and type-level computation going on that is impossible to analyze without actually performing macro expansion, type inference and type checking.

That is a massive task, and there are in fact only two projects that have successfully done it: one is the Scala compiler itself, the other is the IntelliJ IDEA Scala plugin.

And let's not even talk about compiler plugins, which are able to change the syntax and semantics of Scala in almost arbitrary ways.
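To make the point about implicit resolution concrete, here is a small sketch. All the names in it (`SemanticsDemo`, `Meters`, `doubleToMeters`, `total`) are made up for illustration. The call `total(1.5, 2.5)` looks like it passes two `Double` literals, and a parse tree alone says nothing more; only after type checking and implicit resolution do you learn that a conversion was inserted and that `+` is a user-defined method.

```scala
import scala.language.implicitConversions

object SemanticsDemo {
  // Hypothetical domain type, used only for this illustration.
  final case class Meters(value: Double) {
    // A user-defined method that happens to be named "+".
    def +(other: Meters): Meters = Meters(value + other.value)
  }

  // Implicit conversion: lets a bare Double be used where Meters is expected.
  implicit def doubleToMeters(d: Double): Meters = Meters(d)

  def total(a: Meters, b: Meters): Meters = a + b

  def main(args: Array[String]): Unit = {
    // In the source, the arguments are plain Double literals.
    // The compiler rewrites this to total(doubleToMeters(1.5), doubleToMeters(2.5)).
    val result = total(1.5, 2.5)
    println(result) // prints Meters(4.0)
  }
}
```

A grammar-based tool sees the same tokens whether or not `doubleToMeters` is in scope, so it cannot tell which method is actually being called.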

But behold, there is hope: the Scala compiler itself provides an API called the Presentation Compiler, which is specifically designed for use by IDEs, code highlighters, and all kinds of static analysis tools. It gives you access to all the information the compiler has during compilation, just before the optimization and code generation phases. It is used by ScalaDoc, the Scala REPL, the Scala Eclipse Plugin, the NetBeans Scala Plugin, SimplyScala.Com, the ENSIME Plugin for Emacs, some static analysis tools, and many others.

8
votes
2
votes

Is Appendix A of the Scala Language Reference useful for you? It is in EBNF format.

1
votes

Scalastyle uses Scalariform to do its parsing. With this, you get an AST made of case classes. However, you only get the information that is in the file itself, so, for instance, you don't get inferred types.

If you don't need all of the extra information, then look at Scalariform. The Scalastyle code is fairly easy to understand; start with Checker.scala.
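As a rough idea of what a purely syntactic check over an AST of case classes looks like, here is a minimal sketch. The node types below (`Ident`, `NullLit`, `Call`, `ValDef`) are a hypothetical miniature AST invented for this example; they are not Scalariform's actual classes, which are different and much richer. The check simply walks the tree with pattern matching and reports where `null` literals occur.

```scala
object SyntaxOnlyCheck {
  // Hypothetical miniature AST, NOT Scalariform's real node types.
  sealed trait Node
  final case class Ident(name: String) extends Node
  final case class NullLit(line: Int) extends Node
  final case class Call(target: Node, args: List[Node]) extends Node
  final case class ValDef(name: String, rhs: Node) extends Node

  // Collect the line number of every `null` literal by walking the tree.
  def nullLines(node: Node): List[Int] = node match {
    case NullLit(line)      => List(line)
    case Ident(_)           => Nil
    case Call(target, args) => nullLines(target) ++ args.flatMap(nullLines)
    case ValDef(_, rhs)     => nullLines(rhs)
  }

  def main(args: Array[String]): Unit = {
    // Stands in for the source line: val x = f(null)
    val tree = ValDef("x", Call(Ident("f"), List(NullLit(1))))
    println(nullLines(tree)) // prints List(1)
  }
}
```

A check like this needs nothing beyond the file's own syntax. Anything that depends on the type of `x` or on which `f` is being called is out of reach without the compiler.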