I am using antlr4 parsing a text file and I am new to it. Here is the part of the file:
abcdef
//emptyline
abcdef
In file stream string it will be looked like this:
abcdef\r\n\r\nabcdef\r\n
In terms of ANTLR4, it offers the "skip" method to skip something like white-space, TAB, and new line symbol by regular expression while parsing. i.e.
WS : [\t\s\r\n]+ -> skip ; // skip spaces, tabs, newlines
My problem is that I want to skip the empty line only. I don't want to skip every single "\r\n". Therefore it means when there are two or more "\r\n" appear together, I only want to skip the second one or following ones. How should I write the regular expression? Thank you.
grammar INIGrammar_1;
init: (section|NEWLINE)+ ;
section: '[' phase_name ':' v ']' (contents)+
| '[' phase_name ']' (contents)+ ;
//
//
phase_name : STRING
|MTT
|MPI_GET
|MPI_INSTALL
|MPI_DETAILS
|TEST_GET
|TEST_BUILD
|TEST_RUN
|REPORTER
;
v : STRING ;
contents: kvpairs
| include_section_pairs
| if_statement
| NEWLINE
| EOT
;
keylhs : STRING
;
valuerhs : STRING
|multiline_valuerhs
|kvpairs
|url
;
kvpairs: keylhs '=' valuerhs NEWLINE
;
include_section_pairs: INCLUDE_SECTION '=' STRING
;
if_statement: IF if_statement_condition THEN NEWLINE (ELSEIF if_statement_condition THEN NEWLINE)*? STRING NEWLINE IFEND NEWLINE
;
if_statement_condition:STRING '=' STRING ';'//here, semicolon has problem, either I use ';' or SEMICOLON
;
multiline_valuerhs:STRING (',' (' ')*? ( '\\' (' ')*? NEWLINE)? STRING)+
;
url:(' ')*?'http'':''//''www.';//ignore this, not finished.
IF: 'if';
ELSEIF:'elif';
IFEND:'fi';
THEN: 'then';
SEMICOLON: ';';
STRING : [a-z|A-Z|0-9|''| |.|\-|_|(|)|#|&|""|/|@|<|>|$]+ ;
//Keywords
MTT: 'MTT';
MPI_GET: 'MPI get';
MPI_INSTALL:'MPI install';
MPI_DETAILS:'MPI Details';
TEST_GET:'Test get';
TEST_BUILD: 'Test build';
TEST_RUN: 'Test run';
REPORTER: 'Reporter';
INCLUDE_SECTION: 'include_section';
//INCLUDE_SECTION_VALUE:STRING;
EOT:'EOT';
NEWLINE: ('\r' ? '\n')+ ;
WS : [\t]+ -> skip ; // skip spaces, tabs, newlines
COMMENT: '#' .*? '\r'?'\n' -> skip;
EMPTYLINE: '\r\n' -> skip;
Part of the INI file
#======================================================================
# MPI run details
#======================================================================
[MPI Details: Open MPI]
# MPI tests
#exec = mpirun @hosts@ -np &test_np() @mca@ --prefix &test_prefix() &test_executable() &test_argv()
exec = mpirun @hosts@ -np &test_np() --prefix &test_prefix() &test_executable() &test_argv()
hosts = &if(&have_hostfile(), "--hostfile " . &hostfile(), \
&if(&have_hostlist(), "--host " . &hostlist(), ""))
One more small thing is, it seems like ";" cannot be indicated as itself in result. The ANTLR4 just keep saying it expects something else and treat the semicolon as unknown symbol.
NL : [\r\n]+ ;
- that's easier. – Lucas Trzesniewski