6
votes

As an exercise to learn Haskell (and torture myself), I am considering writing a configurable Haskell code beautifier.

It will support a configuration file written in JSON or YAML (or something better?) that specifies choices like sorting imports, sorting/grouping data and class statements, number of lines between sections, etc.

I am looking for a parser for Haskell 98 that generates an abstract syntax tree (AST) and retains comments. Support for GHC's language extensions would be a bonus.

In the absence of such a thing, I guess I can write a recursive-descent parser or one using Parsec or a parser generator. Maybe rolling my own will increase the learning (and torture :-)).

Is there a complete Haskell-to-AST parser available under one of the open source licenses? If I make any progress on this project, I'll put it up on GitHub.

2
One of my pet peeves is ugly code. Most of my work is in Java and it amazes me how people can check in code with no brace alignment, inconsistent spacing around operators, etc. Don't they read their own code? Doesn't it bother them? Maybe it's my OCD. Anyway, I agree that out-of-box Haskell is an order of magnitude nicer looking. But I still want to write a beautifier :-). - Ralph
Yes. I really was joking, because I like the sort of things you're suggesting. Why not use Haskell itself for the language of the config file? I'm sure I read about someone gradually evolving their domain-specific language for config until they realised they just wanted Haskell, but this is the closest I could find. Or you could use the ConfigFile package. - AndrewC

2 Answers

16
votes

There is a parser available in the haskell-src-exts package. Not only does the parser parse most of the GHC extensions; it also recognizes common extensions such as syntactic XML literals and so on. You should use the parseModuleWithComments function if you also want to gain access to comment information.
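As a rough illustration of that call, here is a minimal sketch. It assumes haskell-src-exts 1.18 or later, where Language.Haskell.Exts exports the annotated AST and parseModuleWithComments takes a ParseMode; the helper name commentTexts and the sample source are made up for the example.

```haskell
import Language.Haskell.Exts

-- Parse a module from a string and return the text of every comment found.
-- commentTexts is a hypothetical helper, not part of haskell-src-exts.
commentTexts :: String -> Either String [String]
commentTexts src =
  case parseModuleWithComments defaultParseMode src of
    -- The AST and the comments come back as two separate values.
    ParseOk (_ast, comments) -> Right [text | Comment _ _ text <- comments]
    ParseFailed loc msg      -> Left (show loc ++ ": " ++ msg)

main :: IO ()
main = do
  let src = unlines
        [ "module Demo where"
        , "-- a line comment"
        , "answer :: Int"
        , "answer = 42 {- a block comment -}"
        ]
  case commentTexts src of
    Right texts -> mapM_ putStrLn texts
    Left err    -> putStrLn ("parse error: " ++ err)
```

Each Comment value also carries a flag for block versus line comments and a SrcSpan, which is what a beautifier would use to put the comment back in the right place.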

Note, however, that the comments aren't stored in the syntax tree itself; they are returned as a separate list of comments with location information. If you really need them in the tree, it should be fairly straightforward to merge the tree with the list using a linear merge (both sequences can be considered "sorted" by source position). The comments can even be stored alongside their associated AST nodes, because "annotated" ASTs can carry arbitrary metadata in each node (by default, only SrcSpanInfo). Presumably this isn't done in haskell-src-exts itself because the AST parser was written before the comment parser.
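The linear merge can be sketched without the library at all. The version below uses only base and hypothetical stand-in types (Pos, Item, mergeByPos); with haskell-src-exts, the positions would instead come from each node's SrcSpanInfo and each comment's SrcSpan.

```haskell
-- A source position as (line, column). Stand-in for the library's spans.
type Pos = (Int, Int)

-- One entry in the merged stream: either an AST node or a comment.
data Item
  = Node Pos String
  | Note Pos String
  deriving (Eq, Show)

-- Merge two lists that are each already sorted by position.
-- Every element is inspected once, so the cost is linear in the total length.
mergeByPos :: [(Pos, String)] -> [(Pos, String)] -> [Item]
mergeByPos ns [] = map (uncurry Node) ns
mergeByPos [] cs = map (uncurry Note) cs
mergeByPos ns@((np, n) : ns') cs@((cp, c) : cs')
  | cp <= np  = Note cp c : mergeByPos ns cs'
  | otherwise = Node np n : mergeByPos ns' cs

main :: IO ()
main = mapM_ print (mergeByPos nodes comments)
  where
    nodes    = [((1, 1), "module header"), ((3, 1), "type signature"), ((4, 1), "binding")]
    comments = [((2, 1), " explains the binding"), ((4, 13), " trailing note")]
```

This only interleaves top-level items. Attaching a comment to the nearest enclosing node would need the same comparison applied while walking the tree.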

2
votes

I've written a super simple tool which autoformats Haskell code. It does so by using the parsing and pretty printing functions from haskell-src-exts. You can find it at https://github.com/djv/small/blob/master/tidy.hs. It could be a start for something more flexible and powerful.
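For readers who want to see the general shape of the parse-then-pretty-print approach, here is a minimal sketch. It is not the contents of the linked tidy.hs. It assumes haskell-src-exts 1.18 or later; the reformat helper and the chosen indentation and line-length values are arbitrary examples.

```haskell
import Language.Haskell.Exts

-- Reformat source text with the library's pretty printer.
-- reformat is a hypothetical helper; the settings are example values that
-- a configurable beautifier would read from its config file instead.
reformat :: String -> Either String String
reformat src =
  case parseModule src of
    ParseOk ast         -> Right (prettyPrintStyleMode pageStyle layoutMode ast)
    ParseFailed loc msg -> Left (show loc ++ ": " ++ msg)
  where
    pageStyle  = style { lineLength = 100 }
    layoutMode = defaultMode { doIndent = 4, whereIndent = 4 }

-- Read a module on stdin and write the reformatted module to stdout.
main :: IO ()
main = interact (either ("parse error: " ++) id . reformat)
```

Note that the pretty printer only sees the AST, so comments are dropped in this sketch. Keeping them means parsing with parseModuleWithComments and reinserting the comment list afterwards.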