I am writing a Delphi code parser using Parsec. My current AST data structures look like this:

module Text.DelphiParser.Ast where

data TypeName = TypeName String [String] deriving (Show)
type UnitName = String
data ArgumentKind = Const | Var | Out | Normal deriving (Show)
data Argument = Argument ArgumentKind String TypeName deriving (Show)
data MethodFlag = Overload | Override | Reintroduce | Static | StdCall deriving (Show)
data ClassMember = 
      ConstField String TypeName
    | VarField String TypeName
    | Property String TypeName String (Maybe String)
    | ConstructorMethod String [Argument] [MethodFlag]
    | DestructorMethod String [Argument] [MethodFlag]
    | ProcMethod String [Argument] [MethodFlag]
    | FunMethod String [Argument] TypeName [MethodFlag]
    | ClassProcMethod String [Argument] [MethodFlag]
    | ClassFunMethod String [Argument] TypeName [MethodFlag]
     deriving (Show)
data Visibility = Private | Protected | Public | Published deriving (Show)
data ClassSection = ClassSection Visibility [ClassMember] deriving (Show)
data Class = Class String [ClassSection] deriving (Show)
data Type = ClassType Class deriving (Show)
data Interface = Interface [UnitName] [Type] deriving (Show)
data Implementation = Implementation [UnitName]  deriving (Show)
data Unit = Unit String Interface Implementation deriving (Show)

I want to preserve comments in my AST data structures and I'm currently trying to figure out how to do this.

My parser is split into a lexer and a parser (both written with Parsec), and I have already implemented lexing of comment tokens.

Consider this example unit:

unit SomeUnit;

interface

uses
  OtherUnit1, OtherUnit2;

type
  // This is my class that does blabla
  TMyClass = class
  var
    FMyAttribute: Integer;
  public
    procedure SomeProcedure;
    { The constructor takes an argument ... }
    constructor Create(const Arg1: Integer);
  end;

implementation

end.

The token stream looks like this:

[..., Type, LineComment " This is my class that does blabla", Identifier "TMyClass", Equals, Class, ...]

The parser translates this into:

Class "TMyClass" ...

The Class data type has no way to attach comments. Since comments (especially block comments) can appear almost anywhere in the token stream, would I have to add an optional comment field to every data type in the AST?

How can I deal with comments in my AST?

See also stackoverflow.com/questions/9392546/… and check how the ASTs there are defined. For example, hackage.haskell.org/package/haskell-src-exts-1.16.0.1/docs/… is polymorphic in the annotation.

Answer

A reasonable approach for dealing with annotated data on an AST is to thread an extra type parameter through all of its types; that parameter can hold whatever metadata you like. Apart from letting you selectively include or ignore comments, this also lets you attach other kinds of information to your tree.

First, you would rewrite all your AST types with an extra parameter:

data TypeName a = TypeName a String [String]
{- ... -}
data ClassSection a = ClassSection a Visibility [ClassMember a]
{- ... -}

It would be useful to add deriving Functor to all of them as well, making it easy to transform the annotations on a given AST.
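To make this concrete, here is a minimal sketch of a few of the question's types with the annotation parameter and derived Functor instances. It is a cut-down subset (for instance, ProcMethod drops its arguments and flags), and `classAnn` and `stripAnnotations` are hypothetical helper names, not part of any library.

```haskell
{-# LANGUAGE DeriveFunctor #-}

-- Cut-down, annotated slice of the AST: every type takes an annotation
-- parameter `a`, and each constructor stores one `a` as its first field.
data TypeName a = TypeName a String [String]
  deriving (Show, Functor)

data Visibility = Private | Protected | Public | Published
  deriving (Show, Eq)

-- Simplified members (arguments and flags omitted for brevity).
data ClassMember a
  = VarField a String (TypeName a)
  | ProcMethod a String
  deriving (Show, Functor)

data ClassSection a = ClassSection a Visibility [ClassMember a]
  deriving (Show, Functor)

data Class a = Class a String [ClassSection a]
  deriving (Show, Functor)

-- Read the annotation stored on a class node.
classAnn :: Class a -> a
classAnn (Class a _ _) = a

-- Discard every annotation. The derived Functor instances walk the whole
-- tree, so nested sections, members and type names are all rewritten.
stripAnnotations :: Functor f => f a -> f ()
stripAnnotations = fmap (const ())
```

When a later pass does not care about comments, `stripAnnotations` turns a `Class Comment` into a `Class ()`; any other `fmap` can transform the annotations in one go.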

Now an AST with the comments remaining would have the type Class Comment or something to that effect. You could also reuse this for additional information like scope analysis, where you would include the current scope with the relevant part of the AST.
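Since the lexer already emits comment tokens, one way to produce such an annotated node is to collect the run of comment tokens just before a declaration and store them as its annotation. The sketch below does this with plain list functions rather than Parsec, and the `Token`, `Comment` and `classHeader` names are hypothetical; in a Parsec token parser the same step would likely be `many` applied to a comment-token parser, run before the declaration parser.

```haskell
import Data.Maybe (isJust, mapMaybe)

-- Hypothetical token type; constructor names are illustrative and only
-- loosely follow the token stream shown in the question.
data Token
  = TType
  | TClass
  | Equals
  | Identifier String
  | LineComment String
  | BlockComment String
  deriving (Show, Eq)

-- Hypothetical annotation: the comments that immediately precede a node.
newtype Comment = Comment [String]
  deriving (Show, Eq)

-- Minimal annotated class node (sections omitted to keep the sketch short).
data Class a = Class a String
  deriving (Show, Eq)

-- The text of a comment token, or Nothing for any other token.
commentText :: Token -> Maybe String
commentText (LineComment s)  = Just s
commentText (BlockComment s) = Just s
commentText _                = Nothing

-- Split off the run of comment tokens at the front of the stream.
leadingComments :: [Token] -> (Comment, [Token])
leadingComments ts = (Comment (mapMaybe commentText cs), rest)
  where
    (cs, rest) = span (isJust . commentText) ts

-- Parse "Name = class" and annotate the node with its leading comments.
classHeader :: [Token] -> Maybe (Class Comment, [Token])
classHeader ts = case leadingComments ts of
  (c, Identifier name : Equals : TClass : rest) -> Just (Class c name, rest)
  _                                             -> Nothing
```

This attaches each comment to the node that follows it, which suits doc comments like the one above `TMyClass`; trailing comments (after the last member of a section, say) would need a separate rule.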

If you wanted multiple annotations at once, the simplest solution would be to use a record, although that's a bit awkward because (at least for now¹) we can't easily write code that is polymorphic over record fields. (That is, we can't easily write the type "any record with a comments :: Comment field".)
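One common workaround, which is not part of the approach above and uses hypothetical names throughout, is a small hand-written type class that each annotation type implements, so functions can ask for "any annotation that has comments":

```haskell
-- Hypothetical annotation record carrying two kinds of metadata at once.
data Ann = Ann
  { annComments :: [String]  -- comments attached to the node
  , annScope    :: [String]  -- names in scope at the node (e.g. from scope analysis)
  } deriving (Show)

-- Hand-written stand-in for "any annotation with a comments field".
class HasComments a where
  getComments :: a -> [String]

instance HasComments Ann where
  getComments = annComments

-- A comments-only annotation qualifies as well.
newtype CommentsOnly = CommentsOnly [String]

instance HasComments CommentsOnly where
  getComments (CommentsOnly cs) = cs

data TypeName a = TypeName a String [String]
  deriving (Show)

-- Works for a TypeName carrying any annotation that has comments.
typeNameDoc :: HasComments a => TypeName a -> [String]
typeNameDoc (TypeName a _ _) = getComments a
```

The cost is one instance per annotation type, written by hand, which is exactly the boilerplate that overloaded record fields aim to remove.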

One additional neat thing you can do is use PatternSynonyms (available from GHC 7.8) to have a suite of patterns that work just like your current unannotated AST, letting you reuse your existing case statements. (To do this, you'll also have to rename the constructors for the annotated types so they don't overlap.)

{-# LANGUAGE PatternSynonyms #-}
pattern TypeName a as <- TypeName' _ a as
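A fuller, self-contained sketch of the same trick for `TypeName`: the annotated constructor is renamed to `TypeName'`, and a unidirectional synonym named `TypeName` lets old-style matches keep working. The pattern signature and the `COMPLETE` pragma assume GHC 8.0 and 8.2 or later respectively, and `renderTypeName` is a hypothetical function standing in for existing case-based code.

```haskell
{-# LANGUAGE PatternSynonyms #-}

import Data.List (intercalate)

-- Annotated type with a renamed constructor, freeing the old name
-- for the pattern synonym below.
data TypeName a = TypeName' a String [String]
  deriving (Show)

-- Unidirectional synonym: matches like the old unannotated constructor
-- and ignores the annotation. It cannot be used to build values.
pattern TypeName :: String -> [String] -> TypeName a
pattern TypeName n args <- TypeName' _ n args

-- Tell the exhaustiveness checker that this pattern covers TypeName.
{-# COMPLETE TypeName #-}

-- Code written against the unannotated AST compiles unchanged.
renderTypeName :: TypeName a -> String
renderTypeName (TypeName n [])   = n
renderTypeName (TypeName n args) = n ++ "<" ++ intercalate ", " args ++ ">"
```

Because the synonym is unidirectional, new nodes still have to be built with `TypeName'` and an explicit annotation; only matching is backwards compatible.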

Footnotes

¹ Hopefully part 2 of the revived overloaded record fields proposal will help in this regard once it is actually added to the language.