16 votes

This is my Python code. I want to connect to my Amazon Redshift database from Python, but it is showing an error for the host.

Can anyone tell me the correct syntax? Am I passing all the parameters correctly?

con=psycopg2.connect("dbname = pg_table_def, host=redshifttest-icp.cooqucvshoum.us-west-2.redshift.amazonaws.com, port= 5439, user=me, password= secret")

This is the error:

OperationalError: could not translate host name "redshift://redshifttest-xyz.cooqucvshoum.us-west-2.redshift.amazonaws.com," to address: Unknown host

A better solution is already present in stackoverflow.com/questions/44243169/… – Golokesh Patra
Please include the module name you are using to establish a connection. – vt_todd

4 Answers

42 votes

It appears that you wish to run Amazon Redshift queries from Python code.

The parameters you would want to use are:

  • dbname: This is the name of the database you entered in the Database name field when the cluster was created.
  • user: This is the value you entered in the Master user name field when the cluster was created.
  • password: This is the value you entered in the Master user password field when the cluster was created.
  • host: This is the Endpoint provided in the Redshift management console (without the port at the end): redshifttest-xyz.cooqucvshoum.us-west-2.redshift.amazonaws.com
  • port: 5439

For example:

con=psycopg2.connect("dbname=sales host=redshifttest-xyz.cooqucvshoum.us-west-2.redshift.amazonaws.com port=5439 user=master password=secret")
1 vote

Old question but I just arrived here from Google.

The accepted answer doesn't work with SQLAlchemy, although it's powered by psycopg2:

sqlalchemy.exc.ArgumentError: Could not parse rfc1738 URL from string 'dbname=... host=... port=... user=... password=...'

What worked:

create_engine(f"postgresql://{REDSHIFT_USER}:{REDSHIFT_PASSWORD}@{REDSHIFT_HOST}:{REDSHIFT_PORT}/{REDSHIFT_DATABASE}")

Which works with psycopg2 directly too:

psycopg2.connect(f"postgresql://{REDSHIFT_USER}:{REDSHIFT_PASSWORD}@{REDSHIFT_HOST}:{REDSHIFT_PORT}/{REDSHIFT_DATABASE}")

Using the postgresql dialect works because Amazon Redshift is based on PostgreSQL.
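Putting it together, a rough self-contained sketch (the REDSHIFT_* values below are placeholders, and SELECT 1 is just a connectivity check):

import psycopg2
from sqlalchemy import create_engine, text

# Placeholder credentials -- replace with your own cluster's details
REDSHIFT_USER = "master"
REDSHIFT_PASSWORD = "secret"
REDSHIFT_HOST = "redshifttest-xyz.cooqucvshoum.us-west-2.redshift.amazonaws.com"
REDSHIFT_PORT = 5439
REDSHIFT_DATABASE = "sales"

url = f"postgresql://{REDSHIFT_USER}:{REDSHIFT_PASSWORD}@{REDSHIFT_HOST}:{REDSHIFT_PORT}/{REDSHIFT_DATABASE}"

# Via SQLAlchemy
engine = create_engine(url)
with engine.connect() as conn:
    print(conn.execute(text("SELECT 1")).scalar())

# The same URL also works with psycopg2 directly
con = psycopg2.connect(url)
con.close()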

Hope it can help other people!

-2 votes

Well, for Redshift the idea is to COPY from S3, which is faster than any other approach, but here is an example of how to do it:

First you must install some dependencies.

For Linux users: sudo apt-get install libpq-dev

For Mac users: brew install libpq

Then install the Python dependencies with pip: pip3 install psycopg2-binary sqlalchemy sqlalchemy-redshift

import sqlalchemy as sa
from sqlalchemy.orm import sessionmaker


#>>>>>>>> MAKE CHANGES HERE <<<<<<<<<<<<<
DATABASE = "dwtest"
USER = "youruser"
PASSWORD = "yourpassword"
HOST = "dwtest.awsexample.com"
PORT = "5439"
SCHEMA = "public"

S3_FULL_PATH = 's3://yourbucket/category_pipe.txt'
ARN_CREDENTIALS = 'arn:aws:iam::YOURARN:YOURROLE'
REGION = 'us-east-1'

############ CONNECTING AND CREATING SESSIONS ############
connection_string = "redshift+psycopg2://%s:%s@%s:%s/%s" % (USER,PASSWORD,HOST,str(PORT),DATABASE)
engine = sa.create_engine(connection_string)
session = sessionmaker()
session.configure(bind=engine)
s = session()
SetPath = sa.text("SET search_path TO %s" % SCHEMA)  # wrap raw SQL in text() so it also runs on newer SQLAlchemy versions
s.execute(SetPath)
###########################################################



############ RUNNING COPY ############
copy_command = '''
copy category from '%s'
credentials 'aws_iam_role=%s'
delimiter '|' region '%s';
''' % (S3_FULL_PATH, ARN_CREDENTIALS, REGION)
s.execute(sa.text(copy_command))
s.commit()
######################################



############ GETTING DATA ############
query = "SELECT * FROM category;"
rr = s.execute(sa.text(query))
all_results = rr.fetchall()

def pretty(all_results):
    for row in all_results :
        print("row start >>>>>>>>>>>>>>>>>>>>")
        for r in row :
            print(" ---- %s" % r)
        print("row end >>>>>>>>>>>>>>>>>>>>>>")

pretty(all_results)
s.close()
######################################
-3 votes

The easiest way to query AWS Redshift from Python is through this Jupyter extension: Jupyter Redshift.

Not only can you query and save your results, but you can also write them back to the database from within the notebook environment.