
I have tried a lot, but I am unable to copy data from a JSON file in an S3 bucket (to which I have read-only access) into a Redshift table using Python (psycopg2). Below is the Python code I am using to run the copy. With the same code I was able to create the tables into which I am trying to copy the data.

import configparser
import psycopg2
from sql_queries import create_table_queries, drop_table_queries


def drop_tables(cur, conn):
    for query in drop_table_queries:
        cur.execute(query)
        conn.commit()


def create_tables(cur, conn):
    for query in create_table_queries:
        cur.execute(query)
        conn.commit()


def main():
    try:
        config = configparser.ConfigParser()
        config.read('dwh.cfg')

        # conn = psycopg2.connect("host={} dbname={} user={} password={} port={}".format(*config['CLUSTER'].values()))
        conn = psycopg2.connect(
            host=config.get('CLUSTER', 'HOST'),
            database=config.get('CLUSTER', 'DB_NAME'),
            user=config.get('CLUSTER', 'DB_USER'),
            password=config.get('CLUSTER', 'DB_PASSWORD'),
            port=config.get('CLUSTER', 'DB_PORT')

        )

        cur = conn.cursor()

        #drop_tables(cur, conn)
        #create_tables(cur, conn)
        qry = """copy DWH_STAGE_SONGS_TBL
             from 's3://udacity-dend/song-data/A/A/A/TRAAACN128F9355673.json'
             iam_role 'arn:aws:iam::xxxxxxx:role/MyRedShiftRole'
             format as json 'auto';"""
        print(qry)
        cur.execute(qry)
        # execute a statement
        # print('PostgreSQL database version:')
        # cur.execute('SELECT version()')
        #
        # # display the PostgreSQL database server version
        # db_version = cur.fetchone()
        # print(db_version)
        print("Executed successfully")

        cur.close()
        conn.close()

        # close the communication with the PostgreSQL

    except Exception as error:
        print("Error while processing")
        print(error)


if __name__ == "__main__":
    main()

I don't see any error in the PyCharm console, but I see an Aborted status in the Redshift query console. I can't see any reason why it was aborted (or I don't know where to look for one).


Another thing I have noticed is that when I run the COPY statement in the Redshift query editor, it runs fine and the data is loaded into the table. I tried deleting and recreating the cluster, but no luck. I can't figure out what I am doing wrong. Thank you.


1 Answer


Quick read: it looks like you haven't committed the transaction, so the COPY is rolled back when the connection closes. You need to either configure the connection for autocommit or add an explicit `commit()` after the COPY.
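A minimal sketch of the explicit-commit fix, written as a helper that takes the psycopg2 connection from your existing code (`run_copy` is a hypothetical name, not part of your script):

```python
# Option 1: enable autocommit right after connecting, so every
# statement is committed immediately:
#
#     conn = psycopg2.connect(...)
#     conn.autocommit = True
#
# Option 2: keep the default transactional mode and commit explicitly,
# as in this helper.

def run_copy(conn, copy_sql):
    """Execute a COPY statement and commit it, so the load is not
    rolled back when the connection closes."""
    cur = conn.cursor()
    cur.execute(copy_sql)
    conn.commit()  # without this, closing the connection aborts the COPY
    cur.close()
```

In your `main()`, that means adding `conn.commit()` between `cur.execute(qry)` and `cur.close()`. The query editor doesn't show this problem because it runs each statement in autocommit mode.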