24
votes

I am new to NLTK Python and i am looking for some sample application which can do word sense disambiguation. I have got a lot of algorithms in search results but not a sample application. I just want to pass a sentence and want to know the sense of each word by referring to wordnet library. Thanks

I have found a similar module in PERL. http://marimba.d.umn.edu/allwords/allwords.html Is there such module present in NLTK Python?

6
here's a python implementation: github.com/alvations/pywsd - alvas

6 Answers

15
votes

Recently, part of the pywsd code has been ported into the bleeding edge version of NLTK' in the wsd.py module, try:

>>> from nltk.wsd import lesk
>>> sent = 'I went to the bank to deposit my money'
>>> ambiguous = 'bank'
>>> lesk(sent, ambiguous)
Synset('bank.v.04')
>>> lesk(sent, ambiguous).definition()
u'act as the banker in a game or in gambling'

For better WSD performance, use the pywsd library instead of the NLTK module. In general, simple_lesk() from pywsd does better than lesk from NLTK. I'll try to update the NLTK module as much as possible when I'm free.


In responds to Chris Spencer's comment, please note the limitations of Lesk algorithms. I'm simply giving an accurate implementation of the algorithms. It's not a silver bullet, http://en.wikipedia.org/wiki/Lesk_algorithm

Also please note that, although:

lesk("My cat likes to eat mice.", "cat", "n")

don't give you the right answer, you can use pywsd implementation of max_similarity():

>>> from pywsd.similarity import max_similiarity
>>> max_similarity('my cat likes to eat mice', 'cat', 'wup', pos='n').definition 
'feline mammal usually having thick soft fur and no ability to roar: domestic cats; wildcats'
>>> max_similarity('my cat likes to eat mice', 'cat', 'lin', pos='n').definition 
'feline mammal usually having thick soft fur and no ability to roar: domestic cats; wildcats'

@Chris, if you want a python setup.py , just do a polite request, i'll write it...

7
votes

Yes, in fact, there is a book that the NLTK team wrote which has multiple chapters on classification and they explicitly cover how to use WordNet. You can also buy a physical version of the book from Safari.

FYI: NLTK is written by natural language programming academics for use in their introductory programming courses.

3
votes

As a practical answer to the OP's request, here's a python implementation of several WSD methods that returns senses in form of NLTK's synset(s), https://github.com/alvations/pywsd

It includes

  • Lesk algorithms (includes original Lesk, adapted Lesk and simple Lesk)
  • Baseline algorithms (random sense, first sense, Most Frequent Sense)

It can be used as such:

#!/usr/bin/env python -*- coding: utf-8 -*-

bank_sents = ['I went to the bank to deposit my money',
'The river bank was full of dead fishes']

plant_sents = ['The workers at the industrial plant were overworked',
'The plant was no longer bearing flowers']

print "======== TESTING simple_lesk ===========\n"
from lesk import simple_lesk
print "#TESTING simple_lesk() ..."
print "Context:", bank_sents[0]
answer = simple_lesk(bank_sents[0],'bank')
print "Sense:", answer
print "Definition:",answer.definition
print

print "#TESTING simple_lesk() with POS ..."
print "Context:", bank_sents[1]
answer = simple_lesk(bank_sents[1],'bank','n')
print "Sense:", answer
print "Definition:",answer.definition
print

print "#TESTING simple_lesk() with POS and stems ..."
print "Context:", plant_sents[0]
answer = simple_lesk(plant_sents[0],'plant','n', True)
print "Sense:", answer
print "Definition:",answer.definition
print

print "======== TESTING baseline ===========\n"
from baseline import random_sense, first_sense
from baseline import max_lemma_count as most_frequent_sense

print "#TESTING random_sense() ..."
print "Context:", bank_sents[0]
answer = random_sense('bank')
print "Sense:", answer
print "Definition:",answer.definition
print

print "#TESTING first_sense() ..."
print "Context:", bank_sents[0]
answer = first_sense('bank')
print "Sense:", answer
print "Definition:",answer.definition
print

print "#TESTING most_frequent_sense() ..."
print "Context:", bank_sents[0]
answer = most_frequent_sense('bank')
print "Sense:", answer
print "Definition:",answer.definition
print

[out]:

======== TESTING simple_lesk ===========

#TESTING simple_lesk() ...
Context: I went to the bank to deposit my money
Sense: Synset('depository_financial_institution.n.01')
Definition: a financial institution that accepts deposits and channels the money into lending activities

#TESTING simple_lesk() with POS ...
Context: The river bank was full of dead fishes
Sense: Synset('bank.n.01')
Definition: sloping land (especially the slope beside a body of water)

#TESTING simple_lesk() with POS and stems ...
Context: The workers at the industrial plant were overworked
Sense: Synset('plant.n.01')
Definition: buildings for carrying on industrial labor

======== TESTING baseline ===========
#TESTING random_sense() ...
Context: I went to the bank to deposit my money
Sense: Synset('deposit.v.02')
Definition: put into a bank account

#TESTING first_sense() ...
Context: I went to the bank to deposit my money
Sense: Synset('bank.n.01')
Definition: sloping land (especially the slope beside a body of water)

#TESTING most_frequent_sense() ...
Context: I went to the bank to deposit my money
Sense: Synset('bank.n.01')
Definition: sloping land (especially the slope beside a body of water)
0
votes

NLTK has apis to access Wordnet. Wordnet places words as synsets. This would give you some information on the word, its hypernyms, hyponyms, root word etc.

"Python Text Processing with NLTK 2.0 Cookbook" is a good book to get you started on various features of NLTK. It is easy to read, understand and implement.

Also, you can look at other papers(outside the realm of NLTK) which talks about using wikipedia for word sense disambiguation.

-1
votes

Yes it is possible with the wordnet module in NLTK. The similarity mesures which used in the tool which mentioned in your post exists in NLTK wordnet module too.