I'm writing a tool to parse Ada source files with the grammar provided in Annex P of the Ada 2005 Reference Manual.
In the following piece of code, I know that ["03C0"] stands for "Greek small letter pi", but is it a legal identifier?
```
01 package Ada.Numerics is
02    Pi : constant := 3.14159_26535_89793_23846_26433_83279_50288_41971_69399_37511;
03    ["03C0"] : constant := Pi;
04    e : constant := 2.71828_18284_59045_23536_02874_71352_66249_77572_47093_69996;
05 end Ada.Numerics;
```
When I use the grammar to parse line 03, I get as far as "basic_declaration". Which rule comes next? And the one after that, and so on, until ["03C0"] is successfully parsed? In short: which rule actually parses ["03C0"]?
The Ada Reference Manual is at: http://www.adaic.org/resources/add_content/standards/05rm/RM-Final.pdf
The rule is in Annex P, under 3.1 (page 702 of the PDF; the page number printed in the bottom-right corner is 676):
```
3.1
basic_declaration ::=
     type_declaration            | subtype_declaration
   | object_declaration          | number_declaration
   | subprogram_declaration      | abstract_subprogram_declaration
   | null_procedure_declaration  | package_declaration
   | renaming_declaration        | exception_declaration
   | generic_declaration         | generic_instantiation
```
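To make the question concrete, here is a minimal Python sketch of one plausible derivation for line 03. It assumes the Ada 2005 productions number_declaration (RM 3.3.2), defining_identifier_list (RM 3.3.1), defining_identifier (RM 3.1) and identifier (RM 2.3). The function names and the pre-tokenized input are hypothetical, and it assumes ["03C0"] has already been decoded to U+03C0, since the grammar is written in terms of characters rather than GNAT's bracket notation. Note that object_declaration and number_declaration both start with "defining_identifier_list :", so a real parser can only pick number_declaration after seeing ":=" (rather than a subtype) following "constant".

```python
import unicodedata

# Ada 2005 (RM 2.3): identifier ::= identifier_start {identifier_start | identifier_extend}
START = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}   # letters and number_letter
EXTEND = {"Mn", "Mc", "Nd", "Pc", "Cf"}        # marks, digits, connectors, other_format


def is_identifier(tok: str) -> bool:
    """Check a token against the identifier production.
    Underscore-placement rules are ignored in this sketch."""
    if not tok or unicodedata.category(tok[0]) not in START:
        return False
    return all(unicodedata.category(c) in START | EXTEND for c in tok[1:])


def parse_number_declaration(tokens: list[str]) -> list[str]:
    """Parse 'defining_identifier_list : constant := static_expression ;'
    and return the rules visited on the way down to each identifier.
    A real parser would need lookahead to choose number_declaration."""
    trace = ["basic_declaration", "number_declaration", "defining_identifier_list"]
    pos = 0
    # defining_identifier_list ::= defining_identifier {, defining_identifier}
    while True:
        if not is_identifier(tokens[pos]):
            raise SyntaxError(f"expected identifier, got {tokens[pos]!r}")
        trace += ["defining_identifier", "identifier"]
        pos += 1
        if tokens[pos] != ",":
            break
        pos += 1
    if tokens[pos:pos + 3] != [":", "constant", ":="]:
        raise SyntaxError("expected ': constant :='")
    pos += 3
    # static_expression is not expanded here; accept a single token.
    pos += 1
    if tokens[pos] != ";":
        raise SyntaxError("expected ';'")
    return trace


if __name__ == "__main__":
    # Line 03 with ["03C0"] already decoded to U+03C0.
    print(parse_number_declaration(["\u03c0", ":", "constant", ":=", "Pi", ";"]))
```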
I've done further investigation based on oenone's answer.
- If I write ["03C0"] in the code, the file's character set does not need to be UTF-8, which makes sense. To compile, I need "gnatmake -gnatWb Hello.adb".
- If I write π in the code, I must change the character set to UTF-8; otherwise GPS does not recognize the character and displays a message. After changing it to UTF-8, I need "gnatmake -gnatW8 Hello.adb" to compile.
- When I changed ["03C0"] to ["abcd"] and compiled again, compilation failed with "invalid wide character in identifier".
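The first two bullets use two encodings of the same identifier. As a hypothetical helper (not a GNAT tool), this Python sketch rewrites text containing π into the bracket form used with -gnatWb. It assumes four uppercase hex digits for characters up to U+FFFF, as in ["03C0"], and a six-digit form beyond that; the latter is an assumption worth checking against the GNAT documentation.

```python
def to_brackets(text: str) -> str:
    """Replace every non-ASCII character with a GNAT-style ["hhhh"] sequence."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                    # plain ASCII is left unchanged
        elif cp <= 0xFFFF:
            out.append(f'["{cp:04X}"]')       # e.g. U+03C0 -> ["03C0"]
        else:
            out.append(f'["{cp:06X}"]')       # assumption: 6-digit form above U+FFFF
    return "".join(out)


if __name__ == "__main__":
    print(to_brackets("\u03c0 : constant := Pi;"))
```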
My guess: if ["03C0"] were checked by the grammar alone, ["abcd"] would pass the grammar check too. Judging from the failure and its message, GNAT seems to work like this: a pre-processing step runs before the source file is handed to the grammar parser. It decodes each bracketed Unicode value and checks whether it is in the set of valid wide characters. If it is, the source continues on to the grammar parser; otherwise, compilation fails.
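A minimal Python sketch of the pre-pass guessed above, assuming GNAT-style bracket sequences of 2, 4, 6 or 8 hex digits and the Ada 2005 identifier character categories from RM 2.3. `decode_brackets` and `WideCharError` are hypothetical names; this is not taken from GNAT. For simplicity it applies the identifier check to every bracket sequence, whereas a real scanner would also have to decode them inside string literals and comments, where other characters are allowed.

```python
import re
import unicodedata

# Bracket sequence such as ["03C0"] (assumption: 2, 4, 6 or 8 hex digits).
BRACKET = re.compile(r'\["((?:[0-9A-Fa-f]{2}){1,4})"\]')

# Categories Ada 2005 (RM 2.3) allows in an identifier:
# identifier_start, then identifier_extend.
IDENT_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl",
                    "Mn", "Mc", "Nd", "Pc", "Cf"}


class WideCharError(ValueError):
    """Raised for a decoded character that cannot appear in an identifier."""


def decode_brackets(source: str) -> str:
    """Decode every bracket sequence before tokenizing, rejecting
    characters outside the identifier categories."""
    def replace(m: re.Match) -> str:
        cp = int(m.group(1), 16)
        if cp > 0x10FFFF:
            raise WideCharError(f"code point {cp:X} is out of range")
        ch = chr(cp)
        if unicodedata.category(ch) not in IDENT_CATEGORIES:
            raise WideCharError(f"invalid wide character U+{cp:04X} in identifier")
        return ch
    return BRACKET.sub(replace, source)


if __name__ == "__main__":
    print(decode_brackets('["03C0"] : constant := Pi;'))
```

Which code points count as letters depends on the Unicode version behind the tables (`unicodedata.unidata_version` in Python), so a checker built on newer tables may accept code points that an older compiler rejected.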