
I have two text files for a CFG grammar: one contains the rules (e.g. S->NP VP) and the other contains only the lexical symbols (e.g. "these": Det). Does anyone know how I can give these two files to NLTK as my grammar? The second file is also known as the lexicon, because it only lists the categories of actual words. In short, I need to provide a lexicon for a specific grammar; otherwise, I have to write the lexicon as many additional rules in my rules file. Because the lexicon is very large, it is not possible to convert the second file to rules and merge it with the first file. So I am completely stuck here. Any help or ideas would be appreciated.


1 Answer


Take a look at the tutorial; it's a little outdated, but the idea is there: http://www.nltk.org/book/ch08.html

Then take a look at this question and answer: CFG using POS tags in NLTK

Lastly, here's an example:

from nltk import CFG, ChartParser

grammar_string = """
S -> NP VP
NP -> DT NN | NNP
VP -> VB NP | VBS
VBS -> 'sleeps'
VB -> 'loves' | 'sleeps_with'
NNP -> 'John' | 'Mary'
"""

# NLTK 3 replaced parse_cfg() with CFG.fromstring().
grammar = CFG.fromstring(grammar_string)
sentence = 'John loves Mary'.split()
parser = ChartParser(grammar)
# In NLTK 3, parse() returns an iterator over trees.
for tree in parser.parse(sentence):
    print(tree)

[out]:

(S (NP (NNP John)) (VP (VB loves) (NP (NNP Mary))))
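
For the two-file setup in the question, one option is to generate the lexical rules programmatically instead of writing them by hand. The following is only a sketch under some assumptions: the lexicon has one entry per line in the form "word": Category (as in the question's example), the rules file is already in NLTK's format with spaces around the arrow (S -> NP VP), and the names lexicon_to_rules and load_grammar are made up for illustration.

```python
from collections import defaultdict


def lexicon_to_rules(lines):
    """Turn lines like '"these": Det' into rules like "Det -> 'these'".

    Assumes one '"word": Category' entry per line (hypothetical format
    based on the example in the question).
    """
    words_by_category = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last colon so a word containing ':' still works.
        word, sep, category = line.rpartition(":")
        if not sep:
            raise ValueError(f"Not a lexicon entry: {line!r}")
        word = word.strip().strip('"')
        category = category.strip()
        # Quote the terminal with whichever quote character the word lacks.
        quote = '"' if "'" in word else "'"
        words_by_category[category].append(f"{quote}{word}{quote}")
    # One rule per category, alternatives joined with '|'.
    return "\n".join(
        f"{category} -> {' | '.join(words)}"
        for category, words in words_by_category.items()
    )


def load_grammar(rules_path, lexicon_path):
    """Read both files and build a single NLTK grammar from them."""
    from nltk import CFG  # imported here so the helper above needs no NLTK

    with open(rules_path, encoding="utf-8") as f:
        rules = f.read()
    with open(lexicon_path, encoding="utf-8") as f:
        lexical_rules = lexicon_to_rules(f)
    return CFG.fromstring(rules + "\n" + lexical_rules)
```

With placeholder file names, load_grammar("rules.txt", "lexicon.txt") would return a grammar that can be passed to ChartParser as in the example above. The two files stay separate on disk; they are only merged in memory.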