UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte while accessing csv file

Question

I am trying to read a CSV file from an AWS S3 bucket and get this error: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte. My code is below. I am using Python 3.7.

        from io import StringIO
        import boto3
        import pandas as pd
        import gzip

        s3 = boto3.client('s3', aws_access_key_id='######',
                          aws_secret_access_key='#######')

        response = s3.get_object(Bucket='#####', Key='raw.csv')
        # print(response)
        # The next line is the one that raises the UnicodeDecodeError
        s3_data = StringIO(response.get('Body').read().decode('utf-8'))

        data = pd.read_csv(s3_data)
        print(data.head())

Kindly help me resolve this issue.

Are you working on Windows or on Linux? Maybe it's an encoding problem with your .py file. — papanito
@papanito I am working on Linux. If it is a Linux-specific problem, let me know how I can resolve it. — suman
Sorry, maybe I misunderstood something. Do you get the error on the line s3_data = StringIO(response.get('Body').read().decode('utf-8')? — papanito

suman suman · Accepted Answer · 2020-01-24T09:57:42

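Using gzip worked for me. The byte 0x8b at position 1 is the second byte of the gzip magic number (0x1f 0x8b), so the object is gzip-compressed and has to be decompressed before it can be decoded as text.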

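import gzip
import boto3
import pandas as pd

# aws_access_key_id and aws_secret_access_key are assumed to be defined earlier
client = boto3.client('s3', aws_access_key_id=aws_access_key_id,
                      aws_secret_access_key=aws_secret_access_key)

# '####' and '###' stand for the redacted bucket name and key
csv_obj = client.get_object(Bucket='####', Key='###')

# The body is a gzip stream: decompress it and read it as text
body = csv_obj['Body']
with gzip.open(body, 'rt') as gf:
    csv_file = pd.read_csv(gf)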
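
If you are not sure whether an object is compressed, one option is to check the first two bytes before decoding. The following is only a minimal sketch using the standard library: the helper name read_text and the in-memory payload are made up for illustration, and payload stands in for what response['Body'].read() would return from S3.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def read_text(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode bytes as text, decompressing first if they look like gzip."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return raw.decode(encoding)


# Hypothetical stand-in for response['Body'].read():
# a small gzip-compressed CSV built in memory.
payload = gzip.compress(b"id,name\n1,alice\n2,bob\n")

# Decoding the compressed bytes directly fails the same way as in the question,
# because 0x8b (the second byte) is not a valid UTF-8 start byte.
try:
    payload.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc)

# Checking the magic number first gives back the plain CSV text.
text = read_text(payload)
print(text)
```

The returned text can then be wrapped in io.StringIO and passed to pd.read_csv, as in the question. Uncompressed input passes through read_text unchanged, so the same code path handles both kinds of objects.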

