
I'm writing a tool to parse Ada source files using the grammar provided in Annex P of the Ada 2005 Reference Manual.

  1. In the following piece of code, I know ["03C0"] stands for "Greek letter pi", but is it a legal variable name?

    01 package Ada.Numerics is
    02    Pi : constant := 3.14159_26535_89793_23846_26433_83279_50288_41971_69399_37511;
    03    ["03C0"] : constant := Pi;
    04    e : constant := 2.71828_18284_59045_23536_02874_71352_66249_77572_47093_69996;
    05 end Ada.Numerics;
    
  2. When using the grammar to parse line 03, I get as far as "basic_declaration". Which rule comes next, and which rules follow after that, until ["03C0"] is successfully parsed? In short: which rule parses ["03C0"]?

The Ada Reference Manual is at: http://www.adaic.org/resources/add_content/standards/05rm/RM-Final.pdf

The rule is on page 702 of the PDF (page 676 as printed in the bottom right corner of the page), Annex P / 3.1:

    3.1
    basic_declaration ::=
        type_declaration | subtype_declaration
        | object_declaration | number_declaration
        | subprogram_declaration | abstract_subprogram_declaration
        | null_procedure_declaration | package_declaration
        | renaming_declaration | exception_declaration
        | generic_declaration | generic_instantiation

I've done further investigation based on oenone's answer.

  1. If I use ["03C0"] in the code, the source file's character set does not need to be UTF-8, which makes sense. To compile, I use "gnatmake -gnatWb Hello.adb".
  2. If I use π in the code, I must change the character set to UTF-8; otherwise GPS does not recognize the character and shows a message. After changing it to UTF-8, I need to compile with "gnatmake -gnatW8 Hello.adb".
  3. If I change ["03C0"] to ["abcd"] and compile again, it fails with "invalid wide character in identifier".
    My guess: if ["03C0"] were parsed by the grammar alone, ["abcd"] would also pass the grammar check. From the failure and its message, I conclude that GNAT works like this: a pre-processing step runs before the source reaches the grammar parser. It evaluates the Unicode value and checks whether it is in the set of valid wide characters. If it is, the character is passed on to the grammar parser; otherwise, compilation fails.
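
The pre-processing step guessed at above can be sketched in Python. This is only an illustration of the idea, not GNAT's actual implementation: the names `decode_brackets` and `can_start_identifier` are hypothetical, the pattern assumes exactly four hex digits between the quotes, and the category check uses whatever Unicode version ships with Python, which is newer than the one Ada 2005 is based on, so results for individual code points may differ from GNAT's.

```python
import re
import unicodedata

# Assumption: a brackets sequence is [" + four hex digits + "].
# GNAT's real rules may accept other lengths or be stricter about case.
BRACKET = re.compile(r'\["([0-9A-Fa-f]{4})"\]')

# Unicode general categories corresponding to identifier_start in
# Ada 2005 RM 2.3: letter_uppercase, letter_lowercase, letter_titlecase,
# letter_modifier, letter_other, number_letter.
IDENTIFIER_START = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}


def decode_brackets(source: str) -> str:
    """Replace every brackets sequence with the character it denotes."""
    return BRACKET.sub(lambda m: chr(int(m.group(1), 16)), source)


def can_start_identifier(ch: str) -> bool:
    """Check whether a single character may begin an identifier."""
    return unicodedata.category(ch) in IDENTIFIER_START


if __name__ == "__main__":
    line = '["03C0"] : constant := Pi;'
    decoded = decode_brackets(line)
    print(decoded)
    print(can_start_identifier(decoded[0]))
```

With this split, the grammar parser only ever sees decoded characters, so a rejected code point is reported before any Annex P rule is applied.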

1 Answer


1: See A.5, The Numerics Package: the RM uses the actual Unicode character. Your quote seems to come from GNAT's version of the package. For this, see the GNAT User's Guide on how to tell GNAT which encoding it should use.

2: No rule from the ARM parses the brackets notation. It is an encoding question, which is handled by the implementation (GNAT). ["03C0"] (with -gnatWb, which is the default) is treated like π (with -gnatW8), or even like Pi: a valid identifier for a variable (or in this case a constant).
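
Once the implementation has turned the source into π, the ordinary Annex P rules take over. A minimal sketch, assuming the input is already decoded: `RULE_CHAIN` lists the rules from Annex P that lead from basic_declaration down to the name in line 03, and the hypothetical helper `match_identifier` applies `identifier ::= identifier_start {identifier_start | identifier_extend}` from RM 2.3 using Python's Unicode tables. It ignores the extra RM restrictions on underscores (none doubled, none trailing) and reserved words.

```python
import unicodedata

# Rules applied, in order, to reach the defining name of a number declaration.
RULE_CHAIN = [
    "basic_declaration",
    "number_declaration",
    "defining_identifier_list",
    "defining_identifier",
    "identifier",
    "identifier_start",
    "letter_lowercase",  # U+03C0 has general category Ll
]

# General categories behind identifier_start and identifier_extend (RM 2.3).
START = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
EXTEND = {"Mn", "Mc", "Nd", "Pc", "Cf"}


def match_identifier(text: str) -> str:
    """Return the longest prefix of text that matches identifier, or ''."""
    if not text or unicodedata.category(text[0]) not in START:
        return ""
    end = 1
    while end < len(text) and unicodedata.category(text[end]) in START | EXTEND:
        end += 1
    return text[:end]


if __name__ == "__main__":
    print(" -> ".join(RULE_CHAIN))
    print(match_identifier("\u03c0 : constant := Pi;"))
```

The same function accepts Pi and e from the other lines, which is the sense in which π is handled like any other identifier.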