
I am trying to import stopwords from nltk.corpus. I cannot use nltk.download('stopwords') because of proxy issues, so I tried to install the stopwords manually. I downloaded the nltk_data from github.com and configured an appropriate path with nltk.data.path. But when I try to run this code:

    import nltk
    from nltk.corpus import stopwords
    print(stopwords.words('english'))

I get this error:

    Resource 'corpora/stopwords' not found.  Please use the NLTK
    Downloader to obtain the resource:  >>> nltk.download()
    Searched in:
      - 'C:\\Program Files\\Anaconda3\\Lib\nltk_data'

All my NLTK data is present at the above path, and the corpora folder contains stopwords. As I said above, I cannot use nltk.download(). Is there anything I am missing here?

Update 1

I reset all the Spyder settings and ran this code again:

    import nltk
    from nltk.corpus import stopwords
    print(stopwords.words('english'))

I get this error:

    LookupError:
    **********************************************************************
      Resource 'corpora/stopwords' not found.  Please use the NLTK
      Downloader to obtain the resource:  >>> nltk.download()
      Searched in:
        - 'C:\\Users\\586594/nltk_data'
        - 'C:\\nltk_data'
        - 'D:\\nltk_data'
        - 'E:\\nltk_data'
        - 'C:\\Program Files\\Anaconda3\\nltk_data'
        - 'C:\\Program Files\\Anaconda3\\lib\\nltk_data'
        - 'C:\\Users\\586594\\AppData\\Roaming\\nltk_data'
    **********************************************************************

All my NLTK data is present at "C:\Program Files\Anaconda3\nltk_data", and the corpora directory contains stopwords.
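NLTK looks up the relative resource name `corpora/stopwords` under each directory listed after "Searched in:", so a quick sanity check is to test each of those directories for that sub-path yourself. The sketch below is a diagnostic only and uses just the standard library; the helper name `find_stopwords_dirs` is hypothetical, and it assumes the usual `nltk_data` layout, where the English list is the file `corpora/stopwords/english` (a zipped `corpora/stopwords.zip` is also checked, since NLTK can read the corpus from the zip as well). Printing each search path with `repr()` makes stray escape characters visible.

```python
import os
from pathlib import Path
from typing import Iterable, List


def find_stopwords_dirs(search_paths: Iterable[str]) -> List[str]:
    """Return the entries of search_paths that look like usable nltk_data
    roots for the stopwords corpus (hypothetical helper, diagnosis only)."""
    hits = []
    for root in search_paths:
        base = Path(root)
        # Assumed layout: <root>/corpora/stopwords/english, or the zipped corpus.
        unpacked = base / "corpora" / "stopwords" / "english"
        zipped = base / "corpora" / "stopwords.zip"
        if unpacked.is_file() or zipped.is_file():
            hits.append(root)
    return hits


if __name__ == "__main__":
    # nltk.data.path is the list NLTK searches; fall back to a default
    # location if nltk is not installed in this interpreter.
    try:
        import nltk
        paths = list(nltk.data.path)
    except ImportError:
        paths = [os.path.expanduser("~/nltk_data")]
    for p in paths:
        print(repr(p))  # repr() exposes stray escapes such as a newline
    print("usable:", find_stopwords_dirs(paths))
```

If a directory you expect to see is missing from the "usable" list, either its layout differs from the assumed one or the path string itself is not what you think it is.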


2 Answers

5 votes

I fixed it by importing nltk and downloading the stopwords corpus:

    import nltk
    nltk.download('stopwords')
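When a proxy is what blocks the downloader, as in the question, NLTK's data documentation describes `nltk.set_proxy()` for that case, and `nltk.download()` accepts a `download_dir` argument. The sketch below combines the two. It is untested against any particular proxy: the function name `download_stopwords`, the proxy URL, the credentials, and the target directory are all placeholders you would replace, and whether it helps depends on how your proxy is configured.

```python
def download_stopwords(proxy_url=None, credentials=None, download_dir=None):
    """Fetch the stopwords corpus, optionally through an HTTP proxy.

    Hypothetical helper. Placeholder arguments, for example:
    proxy_url="http://proxy.example.com:3128",
    credentials=("USERNAME", "PASSWORD"),
    download_dir="C:/nltk_data"  (forward slashes avoid escape problems).
    Returns whatever nltk.download() returns (True on success).
    """
    import nltk  # imported lazily so this file loads even without nltk

    if proxy_url is not None:
        # Route the downloader's HTTP requests through the proxy.
        nltk.set_proxy(proxy_url, credentials)
    ok = nltk.download("stopwords", download_dir=download_dir)
    if download_dir is not None and download_dir not in nltk.data.path:
        # Make a non-default location visible to lookups in this session.
        nltk.data.path.append(download_dir)
    return ok
```

Only `set_proxy` is called when a proxy URL is given, because calling it with `None` makes NLTK try to detect a system proxy instead.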

2 votes

You set the nltk_data path with a Python command, didn't you? Look carefully at the path in the error message:

-'C:\\Program Files\\Anaconda3\\Lib\nltk_data'

The backslashes between path components are doubled, except for the last one: you have a literal newline (\n) character in your path. To avoid surprises like this, always use raw strings when you write Windows paths, for example:

    import nltk
    nltk.data.path.append(r"C:\Program Files\Anaconda3\Lib\nltk_data")
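To see the bug in isolation, compare the same path written as an ordinary string literal and as a raw string. This is plain standard-library Python; no NLTK is needed. The `pathlib` line shows one further option (forward slashes), which is an addition here rather than part of the answer above.

```python
from pathlib import PureWindowsPath

# In a normal literal "\\" is an escaped backslash, but the "\n" before
# "nltk_data" is a newline character, not a backslash followed by "n".
buggy = "C:\\Program Files\\Anaconda3\\Lib\nltk_data"
# In a raw string every backslash is kept literally.
fixed = r"C:\Program Files\Anaconda3\Lib\nltk_data"

print(repr(buggy))
print(repr(fixed))
print("\n" in buggy, "\n" in fixed)

# Writing the path with forward slashes sidesteps escape sequences entirely;
# PureWindowsPath renders it with backslashes.
alt = str(PureWindowsPath("C:/Program Files/Anaconda3/Lib/nltk_data"))
print(alt == fixed)
```

`repr(buggy)` prints the same `...\\Lib\nltk_data` form that appears in the error message, which is the tell-tale sign of an unintended escape sequence.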