
I am trying to check if a given sentence is grammatical using NLTK.

Ex:

OK : The whale licks the sadness

NOT OK : The best I ever had

I know that I could do POS tagging, then use a CFG parser and check that way, but I have yet to find a CFG that uses POS tags instead of actual words as terminals.

Is there a CFG that anyone can recommend? I think that making my own is silly, because I am not a linguist and will probably leave out important structures.

Also, my application is such that the system would ideally reject many sentences and only approve sentences it is extremely sure of.

Thanks :D

Did you see this related StackOverflow discussion? stackoverflow.com/questions/10252448/… - stepthom

1 Answer


The terminal nodes of a CFG can be anything, including POS tags. As long as your phrasal rules take POS tags rather than words as their input, you can declare the grammar over POS tags without any problem.

import nltk

# Define the CFG over POS tags (NLTK 3 API; NLTK 2 used nltk.parse_cfg).
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> 'DT' 'NN'
VP -> 'VB'
VP -> 'VB' 'NN'
""")

# Turn the POS-tagged sentence into a list of tag tokens.
sentence = "DT NN VB NN".split()

# Load the grammar into the ChartParser.
cp = nltk.ChartParser(grammar)

# Print every parse tree the grammar licenses for the tag sequence
# (NLTK 2 used cp.nbest_parse here).
for tree in cp.parse(sentence):
    print(tree)
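
Since you want the system to approve only sentences it is sure of, one option is to wrap the parser in a yes/no check that accepts a tag sequence only when at least one complete parse exists. The following is a rough sketch, not a real grammar of English: the rules in `POS_GRAMMAR` and the helper name `is_grammatical` are made up for illustration, and the tag lists are written by hand in Penn Treebank style (in practice they would come from a tagger such as `nltk.pos_tag`, which needs its model data downloaded).

```python
import nltk

# Toy grammar over Penn Treebank POS tags. This is an assumption for
# illustration only; it covers just a few simple sentence shapes.
POS_GRAMMAR = nltk.CFG.fromstring("""
S -> NP VP
NP -> 'DT' 'NN' | 'DT' 'JJ' 'NN' | 'PRP'
VP -> V | V NP
V -> 'VBZ' | 'VBD' | 'VBP'
""")

PARSER = nltk.ChartParser(POS_GRAMMAR)


def is_grammatical(pos_tags):
    """Return True only if the tag sequence has at least one full parse."""
    try:
        # parse() yields one tree per complete analysis of the sequence,
        # so an empty iterator means the grammar rejects it.
        return next(iter(PARSER.parse(pos_tags)), None) is not None
    except ValueError:
        # ChartParser raises ValueError when a token is not a terminal
        # of the grammar; treat unknown tags as a rejection.
        return False


# Hand-written tags for the two sentences in the question.
print(is_grammatical(["DT", "NN", "VBZ", "DT", "NN"]))    # The whale licks the sadness
print(is_grammatical(["DT", "JJS", "PRP", "RB", "VBD"]))  # The best I ever had
```

Because anything the grammar does not cover is rejected, a small grammar like this errs on the side of saying no, which matches the requirement in the question. The cost is that many perfectly good sentences will be rejected until you add rules for them.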