
I'm trying to load english.pickle for sentence tokenization on Windows 7 with Python 3.4.

The file exists at the path tokenizers/punkt/PY3/english.pickle.

Here is the code:

import nltk.data
tokenizer = nltk.data.load('tokenizers/punkt/PY3/english.pickle')

Here is the error:

OSError: No such file or directory: 'C:\\Python\\nltk_data\\tokenizers\\punkt\\PY3\\PY3\\english.pickle'

How can I fix this?

Answer:

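The problem is that PY3 is doubled in the resulting path (PY3\\PY3 in the error message). When it is called from Python 3, nltk.data.load() adds /PY3 to the resource path itself, so a path that already contains PY3 ends up with it twice.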
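To illustrate how the path gets doubled, here is a minimal sketch that assumes a simplified rule: on Python 3, a PY3 directory is inserted just before the file name of a .pickle resource. The function add_py3_dir is hypothetical and only models the behaviour described above. It is not NLTK's actual code, and the exact position where NLTK inserts PY3 is an assumption.

```python
import posixpath
import sys


# Hypothetical, simplified model of NLTK's Python 3 path rewriting.
# NOT NLTK's real implementation: it just inserts a "PY3" directory
# in front of the file name of any .pickle resource.
def add_py3_dir(resource: str, running_py3: bool = sys.version_info[0] >= 3) -> str:
    """Return the resource path with 'PY3' inserted before the file name."""
    if not running_py3 or not resource.endswith(".pickle"):
        return resource  # Python 2, or not a pickle: path is left unchanged
    directory, filename = posixpath.split(resource)
    return posixpath.join(directory, "PY3", filename)


# Version-neutral path: resolves to the Python 3 copy of the pickle.
print(add_py3_dir("tokenizers/punkt/english.pickle"))
# -> tokenizers/punkt/PY3/english.pickle

# Path that already contains PY3: the directory ends up doubled.
print(add_py3_dir("tokenizers/punkt/PY3/english.pickle"))
# -> tokenizers/punkt/PY3/PY3/english.pickle
```

The second call reproduces the PY3\\PY3 from the error message (with forward slashes instead of Windows backslashes), while the first shows why a version-neutral path is enough.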

So it should work if you simply remove /PY3 from the resource string when loading the tokenizer:

import nltk
tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')

NLTK does this so that the same resource path works in programs that can run under both Python 2 and Python 3.