I'm trying to create a default tagger in Python using NLTK, but I keep getting an error. The corpus is composed of Estonian words, and the goal is to tag each word with its part of speech.

My code:

from nltk.corpus.reader import TaggedCorpusReader
mypath = "/Users/mmo/Downloads/"

EC = TaggedCorpusReader(mypath,"estonianSmall_copy.txt",
 encoding="latin-1")
sents = EC.tagged_sents()


from nltk import DefaultTagger
from nltk.probability import FreqDist

tags = [[(word, tag) for word, tag in sent]
        for sent in EC.tagged_sents()]
tagF = FreqDist(tags)

The error:

tagF = FreqDist(tags)
Traceback (most recent call last):

   File "<ipython-input-26-c1ca76857fce>", line 1, in <module>
    tagF = FreqDist(tags)

  File "/Users/mmo/anaconda/lib/python2.7/site-packages/nltk/probability.py", line 106, in __init__
    Counter.__init__(self, samples)

  File "/Users/mmo/anaconda/lib/python2.7/collections.py", line 477, in __init__
    self.update(*args, **kwds)

  File "/Users/mmo/anaconda/lib/python2.7/collections.py", line 567, in update
    self[elem] = self_get(elem, 0) + 1

TypeError: unhashable type: 'list'

1 Answer


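Your problem is with the FreqDist call; you haven't gotten as far as creating the default tagger yet. FreqDist counts hashable samples, but each element of your tags list is a whole sentence (a list of (word, tag) tuples), and lists are unhashable, hence the TypeError. Since you're just trying to count tags, feed the tags themselves to the FreqDist like this: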

tagF = FreqDist(tag for word, tag in EC.tagged_words())

(Note that tagged_words() returns a flat sequence of (word, tag) tuples, not a list of lists.) You can then continue with the NLTK tutorial to build your default tagger.
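
To see why the original call fails and why flattening helps, here is a minimal sketch that uses only the standard library. It relies on the fact, visible in the traceback, that FreqDist builds on collections.Counter. The `tagged_sents` data below is a made-up stand-in for what `EC.tagged_sents()` returns; the words and tag names are assumptions for illustration, not taken from the actual corpus.

```python
from collections import Counter

# Hypothetical stand-in for EC.tagged_sents(): a list of sentences,
# each sentence being a list of (word, tag) tuples. The data is invented.
tagged_sents = [
    [("koer", "S"), ("jookseb", "V")],
    [("kass", "S"), ("magab", "V"), ("kodus", "S")],
]

# Counting the sentences themselves fails: every sample is a list,
# and lists cannot be used as dictionary keys.
try:
    Counter(tagged_sents)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Flatten to one tag per word and count those instead.
# Tags are strings, so they are hashable.
tag_counts = Counter(tag for sent in tagged_sents for word, tag in sent)

# The most frequent tag is the natural candidate for a default tagger.
most_common_tag = tag_counts.most_common(1)[0][0]

print(error_message)
print(tag_counts)
print(most_common_tag)
```

With NLTK itself, the same generator can be passed to FreqDist, and the most frequent tag (for example via `tagF.max()`) is what you would then hand to `DefaultTagger`.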