How to utilize Pandas read_csv?

Question

I'm trying to load a CSV file but keep getting the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa1 in position 0: invalid start byte

Here's my code:

import pandas as pd
dataset = pd.read_csv('refined5.csv', error_bad_lines=False, skiprows=[0])

The file can be found here: jmp.sh/xKopnNi

I realize that this is a Unicode decoding error. I want Python to either load or skip the offending line so that the rest of the file will load.
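
One way to get the "load or skip" behavior is to decode the file line by line before handing it to pandas. The following is only a sketch: the helper name `read_csv_skipping_bad_lines` and the sample bytes are made up for illustration, and it assumes that no record spans several lines (no quoted newlines inside fields).

```python
import io

import pandas as pd


def read_csv_skipping_bad_lines(raw: bytes, **kwargs) -> pd.DataFrame:
    """Hypothetical helper: drop lines that are not valid UTF-8, then parse."""
    good_lines = []
    for line in raw.splitlines(keepends=True):
        try:
            good_lines.append(line.decode("utf-8"))
        except UnicodeDecodeError:
            # Skip the undecodable line instead of aborting the whole load.
            continue
    return pd.read_csv(io.StringIO("".join(good_lines)), **kwargs)


# Made-up sample: the second line starts with 0xa1, which is not valid UTF-8.
sample = b"name,value\n\xa1bad,1\ngood,2\n"
df = read_csv_skipping_bad_lines(sample)
print(df)
```

For a real file, the bytes could come from `open('refined5.csv', 'rb').read()`. Skipped lines are lost silently, so this only makes sense when dropping a few records is acceptable.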

I faced a similar issue while reading from a pickle file. I realized that I was using Python 2.7. I switched to Python 3.6 and it worked. - user238607
You need to know the encoding of the file. Be skeptical of all of these answers. Where did you get the file? - shadowtalker

Mayank Porwal · Accepted Answer · 2018-11-18T12:33:54

Check this:

I copied the record that triggers the error into a file (f1.txt):

mayankp@mayank:~/$ cat f1.txt 
¡??ˆæ? ??ˆæª Ÿ??ˆ??,1

In [89]: df = pd.read_csv('f1.txt', header=None)

In [90]: df
Out[90]: 
                     0  1
0  ¡??ˆæ? ??ˆæª Ÿ??ˆ??  1

I am able to easily read it through pandas.
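
Whether this works depends on how f1.txt was saved. A copy-pasted record is usually written as UTF-8, which the default codec can read, while the error in the question points to a raw 0xa1 byte in the original file. The sketch below reproduces that situation with made-up bytes and passes an explicit `encoding`. Latin-1 is only an assumption here (0xa1 is '¡' in that encoding, which matches the first character shown above); the actual encoding of refined5.csv is not known.

```python
import io

import pandas as pd

# Made-up bytes standing in for the problematic record: 0xa1 is an invalid
# UTF-8 start byte, but it is the character '¡' in Latin-1.
raw = b"\xa1abc,1\n"

try:
    pd.read_csv(io.BytesIO(raw), header=None)  # default encoding is UTF-8
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Assumption: the file is Latin-1. Every byte is valid in Latin-1, so this
# never raises, but the text is only correct if the guess is right.
df = pd.read_csv(io.BytesIO(raw), header=None, encoding="latin-1")
print(utf8_ok)
print(df)
```

With a file on disk, the same idea would be `pd.read_csv('refined5.csv', encoding='latin-1')`. If the decoded text looks like gibberish, the guess is probably wrong and the source of the file is the best place to find the real encoding.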
