12
votes

I'm starting to program with NLTK in Python for Italian natural language processing. I've seen some simple examples of the WordNet library, which has a nice set of synsets that let you navigate from a word (for example, "dog") to its synonyms, antonyms, hyponyms, hypernyms, and so on.

My question is: if I start with an Italian word (for example, "cane", which means "dog"), is there a way to navigate between synonyms, antonyms, hyponyms, and so on for the Italian word, as you can for the English one? Or is there an equivalent of WordNet for the Italian language?

Thanks in advance


2 Answers

18
votes

You are in luck. NLTK provides an interface to the Open Multilingual Wordnet, which does indeed include Italian among the languages it describes. Just add an argument specifying the desired language to the usual wordnet functions, e.g.:

>>> from nltk.corpus import wordnet as wn
>>> cane_lemmas = wn.lemmas("cane", lang="ita")
>>> print(cane_lemmas)
[Lemma('dog.n.01.cane'), Lemma('cramp.n.02.cane'), Lemma('hammer.n.01.cane'),
 Lemma('bad_person.n.01.cane'), Lemma('incompetent.n.01.cane')]

The synsets have English names, because they are integrated with the English wordnet. But you can navigate the web of meanings and extract the Italian lemmas for any synset you want:

>>> hypernyms = cane_lemmas[0].synset().hypernyms()
>>> print(hypernyms)
[Synset('canine.n.02'), Synset('domestic_animal.n.01')]
>>> print(hypernyms[1].lemmas(lang="ita"))
[Lemma('domestic_animal.n.01.animale_addomesticato'), 
 Lemma('domestic_animal.n.01.animale_domestico')]

Or since you mentioned "cattiva_persona" in the comments:

>>> wn.lemmas("bad_person")[0].synset().lemmas(lang="ita")
[Lemma('bad_person.n.01.cane'), Lemma('bad_person.n.01.cattivo')]

I went from the English lemma to the language-independent synset to the Italian lemmas.
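The same navigation can be wrapped in a small helper that starts from the Italian word and returns Italian lemma names for each sense. This is only a sketch: `italian_relations` and `demo` are names made up for illustration, and `reader` is assumed to behave like `nltk.corpus.wordnet` (that is, to offer `synsets(word, lang=...)` and synsets with `name()`, `lemma_names(lang)`, `hypernyms()` and `hyponyms()`). Antonyms are left out, since in WordNet they are attached to lemmas rather than synsets.

```python
def italian_relations(word, reader, lang="ita"):
    """Collect lemma names in `lang` related to `word`, one entry per sense.

    `reader` is assumed to behave like nltk.corpus.wordnet.
    """
    def names(synsets):
        # Flatten the lemma names of several synsets, dropping duplicates
        # while keeping the order in which they were found.
        found = []
        for s in synsets:
            for name in s.lemma_names(lang):
                if name not in found:
                    found.append(name)
        return found

    results = []
    for synset in reader.synsets(word, lang=lang):
        results.append({
            "synset": synset.name(),  # English-based synset identifier
            "synonyms": [n for n in synset.lemma_names(lang) if n != word],
            "hypernyms": names(synset.hypernyms()),
            "hyponyms": names(synset.hyponyms()),
        })
    return results


def demo():
    # Needs NLTK installed plus the WordNet and Open Multilingual Wordnet data.
    from nltk.corpus import wordnet as wn
    for entry in italian_relations("cane", wn):
        print(entry)
```

Calling `demo()` prints one dictionary per sense of "cane". Synsets with no Italian lemmas simply contribute nothing to the hypernym and hyponym lists.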

6
votes

Since I found myself wondering how to actually use the wordnet resources after reading this question and its answer, I'm going to leave here some useful information:

Here is a link to the nltk guide.

The two necessary commands to download wordnet data and thus proceed with the usage explained in the other answer are:

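import nltk

# One-time downloads into the local nltk_data directory
nltk.download('wordnet')  # the English WordNet
nltk.download('omw')      # the Open Multilingual Wordnet; recent NLTK releases may ask for 'omw-1.4' instead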
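If you would rather not trigger the downloader on every run, one option is to check for the data first and fetch only what is missing. The sketch below is a hypothetical helper, `ensure_corpora`, not part of NLTK. It assumes the packages live under `corpora/` in the NLTK data directory and relies on `nltk.data.find` raising `LookupError` for missing resources. The module is passed in as an argument so the helper can be exercised without a network connection.

```python
def ensure_corpora(nltk_module, names=("wordnet", "omw")):
    """Download the named corpora only if they are not already installed.

    Returns the list of names that had to be downloaded.
    """
    missing = []
    for name in names:
        try:
            # Raises LookupError when the resource cannot be located.
            nltk_module.data.find(f"corpora/{name}")
        except LookupError:
            missing.append(name)
    for name in missing:
        nltk_module.download(name)
    return missing
```

Typical use would be `import nltk` followed by `ensure_corpora(nltk)`. Adjust `names` if your NLTK version expects a different package name for the multilingual data.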