5
votes

Is there a publicly available grammar or parser for ARM's Unified Assembler Language, as described in section A4.2 of the ARM Architecture Reference Manual?

This document uses the ARM Unified Assembler Language (UAL). This assembly language syntax provides a canonical form for all ARM and Thumb instructions.

UAL describes the syntax for the mnemonic and the operands of each instruction.

Put simply, I'm interested in code for parsing the mnemonic and the operands of each instruction. For example, how could you define a grammar for these lines?

ADC{S}{<c>}{<q>} {<Rd>,} <Rn>, <Rm>, <type> <Rs>
IT{<x>{<y>{<z>}}}{<q>} <firstcond>
LDC{L}<c> <coproc>, <CRd>, [<Rn>, #+/-<imm>]{!}
@dwelch I tried to improve the question. - auselen
Sorry, I misunderstood the question. Perhaps the GNU assembler or GNU C has something you can use. - old_timer
I think Chapter A4.2 of ARM DDI 0406B (the ARMv7-A ARM), entitled "Unified Assembler Language", is the specification. It has sub-sections on conditionals and labels. The mnemonics (ASCII letters) are already equivalent between Thumb-2 and ARM; it is up to the assembler to pick a physical encoding. I am not sure I understand the question either. - artless noise
@dwelch No need to apologize, you helped me clear my mind. - auselen

1 Answer

4
votes

If you need to create a simple parser based on an example-based grammar, nothing beats ANTLR:

http://www.antlr.org/

ANTLR translates a grammar specification into lexer and parser code. It's much more intuitive to use than Lex and Yacc. The grammar below covers part of what you specified above, and it's fairly easy to extend to do what you want:

grammar armasm;

/* Rules */
program: (statement | NEWLINE) +;

statement: (ADC (reg ',')? reg ',' reg ',' reg
    | IT firstcond
    | LDC coproc ',' cpreg (',' reg ','  imm )? ('!')? ) NEWLINE;

reg: 'r' INT;
coproc: 'p' INT;
cpreg: 'cr' INT;
imm: '#' ('+' | '-')? INT;
firstcond: '?';

/* Tokens */
ADC: 'ADC' ('S')? ; 
IT:   'IT';
LDC:  'LDC' ('L')?;

INT: [0-9]+;
NEWLINE: '\r'? '\n';
WS: [ \t]+ -> skip;

From the ANTLR site (OSX instructions):

$ cd /usr/local/lib
$ wget http://antlr4.org/download/antlr-4.0-complete.jar
$ export CLASSPATH=".:/usr/local/lib/antlr-4.0-complete.jar:$CLASSPATH"
$ alias antlr4='java -jar /usr/local/lib/antlr-4.0-complete.jar'
$ alias grun='java org.antlr.v4.runtime.misc.TestRig'

Then run the following on the grammar file, type the sample input, and end it with EOF (Ctrl-D):

antlr4 armasm.g4
javac *.java
grun armasm program -tree

    ADCS r1, r2, r3
    IT ?
    LDC p3, cr2, r1, #3 
    <EOF>

This yields the parse tree broken down into tokens, rules, and data:

(program (statement ADCS (reg r 1) , (reg r 2) , (reg r 3) \n) (statement IT (firstcond ?) \n) (statement LDC (coproc p 3) , (cpreg cr 2) , (reg r 1) , (imm # 3) \n))

The grammar doesn't yet include the instruction condition codes, nor the details of the IT instruction (I'm pressed for time). ANTLR generates a lexer and parser, and the grun alias then wraps them in a test rig so I can run text snippets through the generated code. The generated API is straightforward to use in your own applications.
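
As a rough illustration of the missing condition-code and IT handling, here is a minimal sketch in plain Java (no ANTLR involved) that splits a UAL mnemonic into base, flag, condition and width qualifier. The class name UalMnemonic, the two-entry BASES table and the method names are assumptions made up for this example; only the syntax templates from the question (ADC{S}{<c>}{<q>}, LDC{L}<c>, IT{<x>{<y>{<z>}}}{<q>}) come from the original text.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of ANTLR or of the grammar above. */
final class UalMnemonic {
    // The standard ARM condition-code suffixes (HS and LO are synonyms for CS and CC).
    private static final String COND =
        "EQ|NE|CS|HS|CC|LO|MI|PL|VS|VC|HI|LS|GE|LT|GT|LE|AL";

    // Assumed table for this sketch: base mnemonic and the one flag letter it accepts.
    private static final String[][] BASES = { {"ADC", "S"}, {"LDC", "L"} };

    // IT{<x>{<y>{<z>}}}{<q>}: up to three T/E letters, then an optional .N/.W qualifier.
    private static final Pattern IT = Pattern.compile("IT([TE]{0,3})(\\.[NW])?");

    /** Returns {base, flag, condition, qualifier}; absent parts are empty strings. */
    static String[] split(String mnemonic) {
        String m = mnemonic.toUpperCase(Locale.ROOT);
        for (String[] b : BASES) {
            Pattern p = Pattern.compile(
                "(" + b[0] + ")(" + b[1] + ")?(" + COND + ")?(\\.[NW])?");
            Matcher mt = p.matcher(m);
            if (mt.matches()) {
                return new String[] {
                    mt.group(1), orEmpty(mt.group(2)),
                    orEmpty(mt.group(3)), orEmpty(mt.group(4)) };
            }
        }
        throw new IllegalArgumentException("unrecognised mnemonic: " + mnemonic);
    }

    /** Returns the then/else letters of an IT mnemonic, e.g. "TE" for ITTE. */
    static String itSuffix(String mnemonic) {
        Matcher mt = IT.matcher(mnemonic.toUpperCase(Locale.ROOT));
        if (!mt.matches()) {
            throw new IllegalArgumentException("not an IT mnemonic: " + mnemonic);
        }
        return mt.group(1);
    }

    private static String orEmpty(String s) {
        return s == null ? "" : s;
    }

    public static void main(String[] args) {
        System.out.println(String.join("|", split("ADCSEQ.W")));
        System.out.println(String.join("|", split("LDCLE")));
        System.out.println(itSuffix("ITTE"));
    }
}
```

Note that suffixes can be ambiguous on their own: LDCLE only works here because the regex backtracks from "L flag plus E" to "no flag plus LE condition". The same decomposition could probably be moved into the ANTLR lexer with fragment rules, but I haven't tried that.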

For completeness, I looked online for an existing grammar and didn't find one. Your best bet there might be to take apart gas (the GNU assembler) and extract its parser spec, but it won't be UAL syntax and it will be GPL-licensed, if that matters to you. If you only need to handle a subset of the instructions, then a small ANTLR grammar like the one above is a good way to go.
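
If the subset is small enough, a hand-written operand classifier is another option. The sketch below mirrors the reg, coproc, cpreg and imm rules from the grammar in plain Java; the class name UalOperands, its method names and the "kind:value" output format are invented for this example and are not part of ANTLR or of any ARM tool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical hand-rolled counterpart to the operand rules in the grammar. */
final class UalOperands {
    // Group 1/2: register-like prefix and number; group 3: signed immediate after '#'.
    private static final Pattern OPERAND =
        Pattern.compile("(cr|r|p)(\\d+)|#([+-]?\\d+)");

    /** Classifies one operand, returning a "kind:value" string. */
    static String classify(String text) {
        Matcher m = OPERAND.matcher(text.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("unknown operand: " + text);
        }
        if (m.group(3) != null) {
            return "imm:" + Integer.parseInt(m.group(3));
        }
        switch (m.group(1)) {
            case "r":  return "reg:" + m.group(2);
            case "p":  return "coproc:" + m.group(2);
            default:   return "cpreg:" + m.group(2);   // only "cr" remains
        }
    }

    /** Splits a comma-separated operand field and classifies each part. */
    static List<String> parseOperands(String operandField) {
        List<String> out = new ArrayList<>();
        for (String part : operandField.split(",")) {
            out.add(classify(part));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parseOperands("p3, cr2, r1, #3"));
    }
}
```

Like the grammar, this ignores the square brackets and the writeback "!" of the full LDC syntax, so it is only a starting point.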