0
votes

I'm trying to load a CSV file but keep getting the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa1 in position 0: invalid start byte

Here's my code:

import numpy as np
import pandas as pd
dataset = pd.read_csv('refined5.csv', error_bad_lines=False, skiprows=[0])

The file can be found here: jmp.sh/xKopnNi

I realize that this is a Unicode decoding error. I want Python to either load or skip the offending line so that the rest of the file will load.

3
I faced a similar issue while reading from a pickle file. I realized that I was using Python 2.7. I switched to Python 3.6 and it worked. - user238607
Thanks, but I'm using 3.7 - Niv
You need to know the encoding of the file. Be skeptical of all of these answers. Where did you get the file? - shadowtalker

3 Answers

0
votes

Check this:

I've put the failing record you mentioned into a CSV file (f1.txt):

mayankp@mayank:~/$ cat f1.txt 
¡??ˆæ? ??ˆæª Ÿ??ˆ??,1

In [89]: df = pd.read_csv('f1.txt', header=None)

In [90]: df
Out[90]: 
                     0  1
0  ¡??ˆæ? ??ˆæª Ÿ??ˆ??  1

I am able to easily read it through pandas.

0
votes

Try opening the file in Notepad and saving it with UTF-8 encoding. It worked for me when I had a similar error.
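
If you would rather script the conversion than use Notepad, here is a minimal sketch of the same idea. The helper name reencode_to_utf8 is made up for illustration, and the default source encoding of latin1 is only an assumption; replace it with the file's real encoding if you know it. Note that latin1 maps every possible byte to a character, so the conversion never fails, which also means it cannot tell you whether the guess was right.

```python
from pathlib import Path


def reencode_to_utf8(src, dst, source_encoding="latin1"):
    """Decode `src` with `source_encoding` and write the text to `dst` as UTF-8.

    `source_encoding` is a guess unless you know how the file was produced.
    Bytes are read and written directly so line endings stay untouched.
    """
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))
    return dst
```

After converting, the new file can be read with the default UTF-8 codec, but check a few rows by eye to confirm the characters look the way you expect.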

0
votes

Use encoding = 'latin1' when reading the file.

Downloads$ python3
Python 3.7.0 (default, Jul 23 2018, 20:22:55)
[Clang 9.1.0 (clang-902.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import pandas as pd
>>> dataset = pd.read_csv('refined5.csv', encoding = 'latin1')
>>> dataset
           human fall flat  1277.33
           0  ¡??æ? ??æª ????        1
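
If you are not sure that latin1 is right, a rough way to narrow it down is to try a few candidate encodings strictly and see which one decodes the whole file. This is only a sketch: the function name first_working_encoding and the candidate list are my own choices, not anything from pandas. Keep latin1 last, because it accepts every byte and therefore always "works", even when it produces garbage like the output above.

```python
def first_working_encoding(path, candidates=("utf-8", "cp1252", "latin1")):
    """Return the first encoding in `candidates` that decodes the file strictly.

    A successful decode only means the bytes are valid in that encoding,
    not that the resulting characters are the intended ones.
    """
    with open(path, "rb") as handle:
        raw = handle.read()
    for name in candidates:
        try:
            raw.decode(name)  # strict mode: raises on any invalid byte
        except UnicodeDecodeError:
            continue
        return name
    return None
```

You can then pass the result along, e.g. pd.read_csv('refined5.csv', encoding=first_working_encoding('refined5.csv')), and inspect the loaded text to decide whether it is readable.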