
I have been trying to get all the nouns, verbs, etc. separately from the Brown corpus, so I tried the code

brown.all_synsets('n')

but apparently this only works with WordNet. I am using Python 3.4, by the way.


EDITED

@alvas's answer worked, but when I use the result with random I get an error. Have a look.

nn = {word for word, pos in brown.tagged_words() if pos.startswith('NN')}
print(nn)

the output is

{'such', 'rather', 'Quite', 'Such', 'quite'}

but when I use

random.choice(nn)

I get

Traceback (most recent call last):
  File "/home/aziz/Desktop/2222.py", line 5, in <module>
    print(random.choice(NN))
  File "/usr/lib/python3.4/random.py", line 256, in choice
    return seq[i]
TypeError: 'set' object does not support indexing
Welcome to StackOverflow. Please don't post an answer in response to other answers; edit your question instead, see stackoverflow.com/help/how-to-answer - alvas
It's how words are tagged in Brown, there's no choice but to accept the tags since they are mostly treated as Gold/Silver standards (i.e. ground truth). - alvas
Are you sure you get 'rather' in your output? I didn't =( - alvas
Yes, I just made some alterations. The actual code was {word for word, pos in brown.tagged_words() if pos.startswith('NN')} but I changed it to [word for word, pos in brown.tagged_words() if pos.startswith('NN')] and it worked for me - Abdulaziz Al Jumaia

1 Answer


TL;DR

>>> from nltk.corpus import brown
>>> {word for word, pos in brown.tagged_words() if pos.startswith('NN')}

In Long

Calling .tagged_words() returns a list-like sequence of ('word', 'POS') tuples that you can iterate over:

>>> from nltk.corpus import brown
>>> brown.tagged_words()
[(u'The', u'AT'), (u'Fulton', u'NP-TL'), ...]

Please read this chapter to learn how the NLTK corpus API works: http://www.nltk.org/book/ch02.html

Then use a set comprehension over it to keep the unique words whose tag marks a noun. In the Brown tagset these tags start with NN, e.g. NN, NNS, NN$. Proper nouns are tagged NP (as in 'Fulton' above), so this prefix does not match them.

>>> {word for word, pos in brown.tagged_words() if pos.startswith('NN')}

Note that the output might not be what you expect, because a word that is POS-tagged as a syntactic noun is not necessarily a semantic argument/entity.
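Since the question asks for nouns, verbs, etc. separately, the same prefix test can be applied to several tag prefixes in one pass. This is only a rough sketch: group_by_tag_prefix is a hypothetical helper, not part of NLTK, and the sample pairs are hand-written stand-ins so that it runs without the corpus. In practice you would pass brown.tagged_words() instead.

```python
from collections import defaultdict

def group_by_tag_prefix(tagged_words, prefixes):
    """Collect the unique words whose tag starts with each given prefix.

    Hypothetical helper for illustration; not an NLTK function.
    """
    groups = defaultdict(set)
    for word, tag in tagged_words:
        for prefix in prefixes:
            if tag.startswith(prefix):
                groups[prefix].add(word)
    return dict(groups)

# Hand-written sample in the same (word, tag) shape as brown.tagged_words().
sample = [
    ('The', 'AT'),
    ('Fulton', 'NP-TL'),
    ('jury', 'NN'),
    ('said', 'VBD'),
    ('investigation', 'NN'),
]

# 'NN' for common nouns, 'VB' for verbs (assumed Brown-style prefixes).
groups = group_by_tag_prefix(sample, ['NN', 'VB'])
print(groups)
```

Sets are used so that each word appears once per group, which matches the set comprehension above.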


Also, I don't think that the words you've extracted are correct. Double checking the list:

>>> nouns = {word for word, pos in brown.tagged_words() if pos.startswith('NN')} 
>>> 'rather' in nouns
False
>>> 'such' in nouns
False
>>> 'Quite' in nouns
False
>>> 'quite' in nouns
False
>>> 'Such' in nouns
False

The output of the set comprehension: http://pastebin.com/bJaPdpUk


Why does random.choice(nn) fail when nn is a set?

The input to random.choice() is a sequence (see https://docs.python.org/2/library/random.html#random.choice).

random.choice(seq)

Return a random element from the non-empty sequence seq. If seq is empty, raises IndexError.

Python's built-in sequence types include list, tuple, range and str.

Since a set isn't a sequence and cannot be indexed, you get the TypeError shown in your traceback.
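One way around this is to convert the set to a sequence before calling random.choice(). A minimal sketch, assuming a small hand-written set in place of the one built from brown.tagged_words():

```python
import random

# Hypothetical stand-in for the set built from brown.tagged_words().
nn = {'jury', 'investigation', 'election'}

failure = None
try:
    random.choice(nn)  # a set cannot be indexed, so this raises
except TypeError as err:
    failure = err

# Convert to a sequence first. sorted() returns a list and also fixes
# the order, which keeps results reproducible under a fixed seed.
nn_list = sorted(nn)
picked = random.choice(nn_list)
print(picked)
```

list(nn) works just as well if the order does not matter. This is also why switching from {...} to [...] in the comprehension made the error go away, although the list then keeps duplicate words.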