
I am trying to build a scanner for AWK source code using (F)Lex. I have been able to identify AWK keywords, comments, string literals, and digits; however, I am stuck on how to write regular expressions that match variable names, since these are quite dynamic.

Could someone please help me develop a regular expression for matching AWK variables? The AWK language is defined at http://pubs.opengroup.org/onlinepubs/009695399/utilities/awk.html.

Variables must start with a letter but can otherwise be alphanumeric, without regard to case. The only special character that can be used is an underscore ("_"). I apologize, I am not very experienced with regex in general, let alone regular expressions for Flex.

Thank you for your help.

Thank you, this is definitely a start for me. How would I add a condition to the regular expression to match "if not keyword", where the keywords are defined as: KEYWORD BEGIN|delete|END|function|in|printf|break|do|exit|getline|next|return|continue|else|for|if|print|while - Alex Hendren
Be sure to search for the compat awk script from the original 'The AWK Programming Language'. It lists all the keywords/functions in the (old) awk specification. It was intended to help those migrating to (new) awk find name collisions in old code, but it is a good general solution for looking at awk code and has a list of many of the functions/keywords. You can easily extend it with the new keywords (just as a cross-check). Good luck. - shellter

1 Answer

[a-zA-Z_][a-zA-Z_0-9]*

Alphabetic or underscore to start, followed by zero or more alphanumerics or underscore.
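
On the "name but not keyword" follow-up: in a flex specification the usual approach is to list the keyword rules before the name rule, because flex prefers the longest match and, on a tie, the rule that appears first. The same logic can be written out in plain C, which may help when checking the pattern by hand. This is only a rough sketch: the function and enum names are made up for illustration, and the keyword list is the one from the comment on the question.

```c
#include <stddef.h>
#include <string.h>

/* Keyword list copied from the comment on the question; extend as needed. */
static const char *const awk_keywords[] = {
    "BEGIN", "delete", "END", "function", "in", "printf", "break", "do",
    "exit", "getline", "next", "return", "continue", "else", "for", "if",
    "print", "while"
};

/* Hypothetical token kinds, for illustration only. */
enum word_kind { WORD_NONE, WORD_KEYWORD, WORD_NAME };

/* First character class: [a-zA-Z_] (assumes an ASCII-compatible charset). */
int is_name_start(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

/* Following character class: [a-zA-Z_0-9] */
int is_name_char(char c)
{
    return is_name_start(c) || (c >= '0' && c <= '9');
}

/* Length of the longest prefix of s matching [a-zA-Z_][a-zA-Z_0-9]*,
   or 0 if s does not start with a name. */
size_t match_name(const char *s)
{
    size_t n;

    if (!is_name_start(s[0]))
        return 0;
    n = 1;
    while (is_name_char(s[n]))
        n++;
    return n;
}

/* Classify the word at the start of s: keyword, ordinary name, or neither.
   The whole word is matched first, so "whilex" is a name, not "while". */
enum word_kind classify_word(const char *s)
{
    size_t len = match_name(s);
    size_t i;

    if (len == 0)
        return WORD_NONE;
    for (i = 0; i < sizeof awk_keywords / sizeof awk_keywords[0]; i++) {
        if (strlen(awk_keywords[i]) == len &&
            memcmp(awk_keywords[i], s, len) == 0)
            return WORD_KEYWORD;
    }
    return WORD_NAME;
}
```

Matching the full word before consulting the keyword table is what keeps names such as index or format from being split into a keyword plus leftovers.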

Special cases will be fields, which are prefixed by $:

$0
$1

and also

$NF
$i

You'll have to decide how you're going to deal with those.
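
One option, assuming you follow the POSIX grammar where $ is an operator applied to an expression, is to have the scanner return $ as a token of its own and leave $0, $NF and $i to the parser. Below is a minimal C sketch of that choice. The names (field_tok, next_token and so on) are hypothetical, and numbers are simplified to plain digit runs.

```c
#include <stddef.h>

/* Hypothetical token kinds, for illustration only. */
enum field_tok { FT_OTHER, FT_DOLLAR, FT_NUMBER, FT_NAME };

struct token {
    enum field_tok kind;
    size_t len;          /* number of characters consumed */
};

static int ft_is_digit(char c)
{
    return c >= '0' && c <= '9';
}

/* [a-zA-Z_], assuming an ASCII-compatible charset */
static int ft_is_name_start(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

/* Return the token at the start of s. '$' is always a one-character token,
   so "$NF" scans as FT_DOLLAR followed by FT_NAME on the next call. */
struct token next_token(const char *s)
{
    struct token t = { FT_OTHER, 0 };

    if (s[0] == '\0')
        return t;                      /* end of input: nothing consumed */

    if (s[0] == '$') {
        t.kind = FT_DOLLAR;
        t.len = 1;
    } else if (ft_is_digit(s[0])) {
        t.kind = FT_NUMBER;            /* simplification: integers only */
        while (ft_is_digit(s[t.len]))
            t.len++;
    } else if (ft_is_name_start(s[0])) {
        t.kind = FT_NAME;
        t.len = 1;
        while (ft_is_name_start(s[t.len]) || ft_is_digit(s[t.len]))
            t.len++;
    } else {
        t.len = 1;                     /* any other single character */
    }
    return t;
}
```

With this split the name pattern stays unchanged, and forms like $(i+1) need no special scanner rule.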