I'm working on a PEG grammar that takes code in a Music Programming Language and creates a parse tree of musical events (notes, chords, volume/tempo changes, etc.). A feature of my MPL is that it supports voices, i.e. different sequences of events happening at the same time. I'm having a hard time getting my Instaparse grammar to parse this correctly... what I want is a voices
tag that consists of one or more voice
s, each of which consists of a voice definition (e.g. V1:
) and then any number of events. The voices
tag should end either with V0:
(which means the end of the split voices, and we're back to just one voice, or "voice zero"), or the end of the file.
Here is an excerpt of my grammar in progress (I'm leaving out the definitions of note
, chord
, etc. for the sake of clarity):
part = <ows> event+
<event> = chord | note | rest | octave-change |
attribute-change | voices |
marker | at-marker
voices = voice+
voice = !voices voice-number voice-events?
(<voice-zero> | #"\z")
voice-number = <"V"> #"[1-9]\d*" <":"> <ows>
<voice-zero> = <"V0:"> <ows>
voice-events = !voices event+
...
ows = #"\s*"
Given the following code:
V1: o2 b1/>b o2 g+/>g+ o2 g/>g
V0: e8 f+ g+ a b2
Running the parser gives the following output:
[:part
[:voices
[:voice [:voice-number "1"]
[:voice-events
[:octave-change "2"] [:chord [:note [:pitch "b"]
[:duration "1"]] [:octave-change ">"] [:note [:pitch "b"]]]
[:octave-change "2"] [:chord [:note [:pitch "g+"]]
[:octave-change ">"] [:note [:pitch "g+"]]]
[:octave-change "2"] [:chord [:note [:pitch "g"]]
[:octave-change ">"] [:note [:pitch "g"]]]]]]
[:note [:pitch "e"] [:duration "8"]]
[:note [:pitch "f+"]]
[:note [:pitch "g+"]]
[:note [:pitch "a"]]
[:note [:pitch "b"] [:duration "2"]]]
Which is exactly what I want. The V0:
signals the end of the voices
tag, and the last 5 notes are on their own inside the part
tag.
However, when I change the V0
to V2
, I get this:
[:part
[:voices
[:voice [:voice-number "1"]
[:voice-events
[:octave-change "2"] [:chord [:note [:pitch "b"] [:duration "1"]]
[:octave-change ">"] [:note [:pitch "b"]]] [:octave-change "2"]
[:chord [:note [:pitch "g+"]] [:octave-change ">"]
[:note [:pitch "g+"]]] [:octave-change "2"]
[:chord [:note [:pitch "g"]] [:octave-change ">"]
[:note [:pitch "g"]]]
[:voices
[:voice [:voice-number "2"]
[:voice-events
[:note [:pitch "e"] [:duration "8"]] [:note [:pitch "f+"]]
[:note [:pitch "g+"]] [:note [:pitch "a"]]
[:note [:pitch "b"] [:duration "2"]]]]]]]]]
For some reason, either the voice
1 tag or its voice-events
tag is not terminating like it's supposed to, and the second voice
is swallowed up as part of the first voice
's voice-events
. I also don't want there to be a second voices
tag; voice
2 should be within the primary voices
tag.
What I want is this:
[:part
[:voices
[:voice [:voice-number "1"]
[:voice-events
[:octave-change "2"] [:chord [:note [:pitch "b"] [:duration "1"]]
[:octave-change ">"] [:note [:pitch "b"]]] [:octave-change "2"]
[:chord [:note [:pitch "g+"]] [:octave-change ">"]
[:note [:pitch "g+"]]] [:octave-change "2"]
[:chord [:note [:pitch "g"]] [:octave-change ">"]
[:note [:pitch "g"]]]]]
[:voice [:voice-number "2"]
[:voice-events
[:note [:pitch "e"] [:duration "8"]] [:note [:pitch "f+"]]
[:note [:pitch "g+"]] [:note [:pitch "a"]]
[:note [:pitch "b"] [:duration "2"]]]]]]
I can't figure out what I'm doing wrong, but I think it has something to do with how I'm defining the voice
tag and/or the voice-events
tag. It may have something to do with how I'm using negative lookahead, which I don't think I fully understand yet. Can anyone figure out how I can fix my grammar?
Thanks! :)
Solved!
Thanks, @DanielNeal! I've re-worked my grammar to this, which works exactly the way I want it to:
part = <ows> (voices | event)+
<event> = chord | note | rest | octave-change |
attribute-change | marker | at-marker
voices = voice+ (<voice-zero> | <#"\z">)
voice = voice-number event*
voice-number = <"V"> #"[1-9]\d*" <":"> <ows>
<voice-zero> = <"V0:"> <ows>
...
ows = #"\s*"
The big change was in how I defined part
and event
; before, I had these terms defined such that voices
is an event, so any subsequent voice
s were being consumed and lumped into the previous voice
's event
s. By pulling voices
out of the definition of an event
and redefining part
to be a variable number of voices
groupings or event
s, I eliminated the ambiguity and got the grammar to behave the way I want it to.
After that, the events
within a voice
were grouping properly, but I was still having a problem with each voice being within its own separate voices
tag, when I need them to all be within the same voices
grouping. I fixed this by specifying that a voices
tag ends either with a "V0:"
or the end of the file (\z
), in other words being more specific about how much code I want the voices
tag to consume.
The moral of the story is, if you're writing a PEG grammar and you're having problems, you probably need to make your definitions less ambiguous! I also ended up not using negative lookahead at all, which I think helped a lot to simplify / de-ambiguify my grammar.
"V1: o2 b1/>b o2 g+/>g+ o2 g/>g\nV2: e8 f+ g+ a b2"
– Dave YarwoodV1-9
- the grammar Production for Voice allows for empty voice-events which is exactly what you observe. This problem does not occur withV0
is the Last V-Clause of your input. – collapsar"V1: c d e\nV2:"
, which I want to be parsed as[:part [:voices [:voice [:voice-number "1"] [:voice-events [:note [:pitch "c"]] [:note [:pitch "d"]] [:note [:pitch "e"]]]] [:voice [:voice-number "2"]]]]
. At any rate, I tried makingvoice-events
mandatory by removing the?
, but it didn't make any difference. I think my problem has something to do with not terminating thevoice
and/orvoice-events
tags properly... it seems like they are acting greedily. By the way, my output just changed mysteriously... I'm going to update above. – Dave Yarwood