4
votes

I am new to compilers and am using an open-source ANTLR grammar to parse C source files that pull in many external header files (include files, library headers, and so on).

How do I define a grammar for these header files? Is there some way to parse the include files as ordinary source files?

Is it possible to combine all the source and include files into one package and parse it using ANTLR or another C parser (e.g. JavaCC)?

Waiting for your kind suggestions.

1
The standard approach would be to preprocess the #include statements and then run your C grammar against the resulting file. - Jimmy
Do you want to build your own parser, or are you interested in getting at the information in the C headers? - Ira Baxter
How do I preprocess the #include files? My environment is Java on Windows, and I don't want to use the gcc preprocessor. - user628127
@Ira: I want to build a C parser with the ANTLR/JavaCC C grammar, but I found it insufficient for parsing include files. - user628127
What "it" are you discussing? And why isn't "it" good enough? Presumably the header files contain valid C syntax. Why can't ANTLR parse the header file if given it as input? [You may have a problem that ANTLR already has a stream open to the main file, and you have to open another (sub)parser on the include file. That's an organizational issue in your parser, not a grammar problem: you need a stack of input streams and the machinery to switch streams when you encounter a new #include or an EOF.] - Ira Baxter
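The stack of input streams described in the last comment could look roughly like the sketch below. It is only an illustration under stated assumptions: the class name `IncludeStack` and the in-memory `files` map (file name to contents) are hypothetical stand-ins for real file lookup, only the quoted `#include "name"` form is recognized, and each file is spliced in at most once to avoid infinite recursion, which is simpler than real include-guard handling. Macros and conditional compilation are not handled at all.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IncludeStack {
    // Matches lines such as:  #include "foo.h"   (quoted form only; an assumption of this sketch)
    private static final Pattern INCLUDE =
            Pattern.compile("^\\s*#\\s*include\\s*\"([^\"]+)\"\\s*$");

    /** Returns the text of mainName with every quoted #include replaced by that file's text. */
    public static String expand(String mainName, Map<String, String> files) {
        Deque<BufferedReader> stack = new ArrayDeque<>(); // top of stack = stream being read
        Set<String> seen = new HashSet<>();               // files already spliced in
        StringBuilder out = new StringBuilder();

        stack.push(open(mainName, files));
        seen.add(mainName);
        try {
            while (!stack.isEmpty()) {
                String line = stack.peek().readLine();
                if (line == null) {            // EOF: drop this stream, resume the includer
                    stack.pop().close();
                    continue;
                }
                Matcher m = INCLUDE.matcher(line);
                if (m.matches()) {             // new #include: switch to the included file
                    String name = m.group(1);
                    if (seen.add(name)) {
                        stack.push(open(name, files));
                    }
                } else {
                    out.append(line).append('\n');
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // Hypothetical lookup: a real tool would search include directories on disk.
    private static BufferedReader open(String name, Map<String, String> files) {
        String text = files.get(name);
        if (text == null) {
            throw new IllegalArgumentException("unknown include: " + name);
        }
        return new BufferedReader(new StringReader(text));
    }
}
```

The string returned by `expand` is what would then be handed to the ANTLR- or JavaCC-generated parser as a single input.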

1 Answer

1
votes

It won't be easy to implement a full chain of a preprocessor and a parser for C yourself. But you can reuse an existing preprocessor (e.g., gcc -E) and an existing parser (clang -Xclang -ast-print-xml or gcc-xml are both good choices), and then parse their XML output, which is much simpler than parsing C directly.
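Since the question is asked from a Java setting, here is a minimal sketch of driving an external preprocessor from Java with `ProcessBuilder`. The class name `ExternalPreprocessor` and its method names are hypothetical; the sketch assumes a gcc-compatible compiler is on the PATH and uses only the `-E` (preprocess only) and `-I` (include directory) options. `InputStream.readAllBytes` needs Java 9 or later.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ExternalPreprocessor {

    /** Builds a command line such as: gcc -E -Iinclude main.c */
    public static List<String> buildCommand(String compiler, String sourceFile,
                                            List<String> includeDirs) {
        List<String> cmd = new ArrayList<>();
        cmd.add(compiler);
        cmd.add("-E");                 // stop after preprocessing
        for (String dir : includeDirs) {
            cmd.add("-I" + dir);       // extra header search directories
        }
        cmd.add(sourceFile);
        return cmd;
    }

    /** Runs the preprocessor and returns the preprocessed source as text. */
    public static String preprocess(String compiler, String sourceFile,
                                    List<String> includeDirs)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(compiler, sourceFile, includeDirs))
                .redirectError(ProcessBuilder.Redirect.INHERIT) // show compiler diagnostics
                .start();
        // Read stdout fully before waiting, so the child cannot block on a full pipe.
        byte[] output = process.getInputStream().readAllBytes();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("preprocessor exited with status " + exitCode);
        }
        return new String(output, StandardCharsets.UTF_8);
    }
}
```

A call such as `ExternalPreprocessor.preprocess("gcc", "main.c", List.of("include"))` would then yield one self-contained translation unit that a C grammar can be run against.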