import os
import shutil
import codecs
directory = '~/Desktop/ra/clean_tokenized/1987'
for filename in os.listdir(directory):
    full_name = directory + '/' + filename
    with open(full_name, 'r') as article:
        for line in article:
            print(line)
Here's the traceback:
Traceback (most recent call last):
  File "~/Desktop/corpus_filter/01_corpus.py", line 11, in <module>
    for line in article:
  File "~/.conda/envs/MangerRA/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 3131: invalid start byte
The file contains Japanese characters, and I'm just trying to make a CSV file with all the words that have come up in the files. But I can't get past this error.
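
For reference, here is a minimal sketch of one way to narrow this kind of error down. It rests on assumptions that the post does not confirm: that the directory may hold hidden non-text entries (for example `.DS_Store` on macOS) alongside the articles, and that some articles may be stored in a legacy Japanese encoding rather than UTF-8. The helper names `iter_article_paths` and `read_text_with_fallback`, and the candidate encoding list, are hypothetical and only for illustration.

```python
import os

# Hypothetical candidate list; the real encoding of the corpus is not known.
CANDIDATE_ENCODINGS = ("utf-8", "shift_jis", "euc_jp")


def iter_article_paths(directory):
    """Yield paths of regular, non-hidden files in directory, sorted by name."""
    directory = os.path.expanduser(directory)  # expand a leading '~'
    for filename in sorted(os.listdir(directory)):
        full_name = os.path.join(directory, filename)
        # Skip hidden entries such as .DS_Store, and skip subdirectories.
        if filename.startswith(".") or not os.path.isfile(full_name):
            continue
        yield full_name


def read_text_with_fallback(path, encodings=CANDIDATE_ENCODINGS):
    """Return (text, encoding) using the first candidate that decodes cleanly."""
    with open(path, "rb") as article:
        data = article.read()
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError(f"none of {encodings} could decode {path}")
```

Looping over `iter_article_paths(directory)` and calling `read_text_with_fallback` on each path would show which file fails and which encoding, if any, decodes it. A clean decode does not prove the guess is right, since bytes in one legacy encoding can sometimes decode as another and produce garbled text, so it is worth inspecting a sample of the output before building the CSV.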