Extension to the use case here - NLTK words lemmatizing

I have NLTK installed on my computer (with all modules and corpora from the book). My use case is to explore and contrast some lemmatization and stemming approaches for my dataset (I tried Porter stemming, which worked).

I was trying to use lemmatization with WordNet as described by @Chthonic Project in "NLTK words lemmatizing". However, the source code it points to (see http://nltk.org/_modules/nltk/app/wordnet_app.html) needs the compat module from nltk, and that import fails:

from nltk import compat
ImportError: cannot import name compat

I googled the compat import error (it looked like it stands for "compatibility"), and here is what I tried on my Ubuntu box:

sudo find . -name compat*, which returns the files listed below. I also tried sudo find -name "trac" -type d, which returns nothing.

I see that I should have found some modules under "trac/tests/functional/fixes", in a folder similar to /usr/lib/python2.4/site-packages/Trac-0.11.1-py2.4.egg/trac/tests/functional/

Source: http://biodegradablegeek.com/2008/08/workaround-for-importerror-cannot-import-name-compat-issue-in-trac-011x/#sthash.NhAThk6e.dpuf

Questions:

1. What am I missing? And is this an issue with trac/tests?

2. Is there a way to use WordNet for lemmatization? (from nltk.corpus import wordnet as wn works just fine.) Once the import error is solved, how do I use this module: http://nltk.org/_modules/nltk/app/wordnet_app.html ? (I was trying to build the source locally from this page, that is, the file browserver.py, when I hit the import error with compat.)

Tip: If you are providing a solution, please also mention how to solve this in my Windows environment (I use both Windows and Ubuntu interchangeably, depending on context).

Files returned by find . -name compat*:

ekta@ekta-VirtualBox:/usr/lib/python2.7$ sudo find . -name compat*
./dist-packages/numpy/numarray/compat.pyc
./dist-packages/numpy/numarray/compat.py
./dist-packages/numpy/distutils/compat.pyc
./dist-packages/numpy/distutils/compat.py
./dist-packages/numpy/compat
./dist-packages/numpy/oldnumeric/compat.pyc
./dist-packages/numpy/oldnumeric/compat.py
./dist-packages/twisted/python/compat.pyc
./dist-packages/twisted/python/compat.py
./dist-packages/gtk-2.0/gtk/compat.pyc
./dist-packages/gtk-2.0/gtk/compat.py

I am on Python 2.7.

1 Answer

Lemmatizing using WordNet (Morphy, actually) in NLTK is simple:

from nltk.corpus import wordnet as wn

wn.morphy('runs') # "run"
wn.morphy('leaves') # "leaf"
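
As a small sketch of how this behaves with part-of-speech hints: morphy accepts an optional pos argument and returns None when it finds no lemma, so a fallback to the original word is often handy. The helper name lemma_or_word is hypothetical (it is not part of NLTK), and the example assumes NLTK and its WordNet corpus are installed.

```python
def lemma_or_word(word, pos=None):
    """Return the WordNet (Morphy) lemma of `word`, or `word` itself if none is found.

    `lemma_or_word` is a hypothetical helper for illustration, not an NLTK function.
    """
    # Imported inside the function so that a missing NLTK install or corpus
    # only surfaces when the helper is actually called.
    from nltk.corpus import wordnet as wn

    # pos may be wn.NOUN, wn.VERB, wn.ADJ, wn.ADV, or None to try every part of speech.
    lemma = wn.morphy(word, pos)
    return lemma if lemma is not None else word
```

For instance, lemma_or_word('leaves', 'v') restricts the lookup to verbs, and a word WordNet does not know comes back unchanged.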

wordnet_app is a WordNet browser, not the NLTK WordNet API: you don't need it! Chthonic Project was talking about derivationally related forms, which is a different thing from lemmatizing.

By the way, the issue you had with wordnet_app and compat is that you copied a recent version of the file, which was incompatible with your NLTK distribution (compat is a recent NLTK module, inspired by six, that helps the transition to Python 3). If you need wordnet_app, don't copy the source; simply use the version in your NLTK distribution!
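
To see what your installed NLTK actually ships, a quick check along these lines may help. It is only a sketch: module_location is a hypothetical helper name, and which of the three modules turn out to be importable depends on the NLTK version you have.

```python
import importlib


def module_location(name):
    """Return the file a module is loaded from, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # Built-in modules and namespace packages have no file path.
    return getattr(module, "__file__", None) or "<no file>"


if __name__ == "__main__":
    # Module names taken from the question; availability depends on the install.
    for name in ("nltk", "nltk.compat", "nltk.app.wordnet_app"):
        location = module_location(name)
        print("%-22s %s" % (name, location or "not importable in this install"))
```

Because this goes through Python's own import system rather than find, the same script should work unchanged on both Windows and Ubuntu.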