redshift data.frame not getting written

Question

I am able to connect to AWS redshift cluster using RPostgreSQL package; read tables, even create tables using SQL - dbGetQuery. However, if I try to write a data.frame as below (wher con is a dbConnect object into redshift, and newdf if a simple data.frame

dbWriteTable(con,"newtb",newdf)

I get following error:

Error in postgresqlgetResult(new.con) : RS-DBI driver: (could not Retrieve the result : ERROR: LOAD source is not supported. (Hint: only S3 or DynamoDB based load is allowed)

I have tried multiple simple data.frames, while reading data.frame works writing does not. Hint tells me to use s3 for loading, which means I can process data in R but cannot write a data.frame back to redshift, unless i use explicit inserts. Any ideas what I am doing wrong? or if this is an inherent way redshift prohibits or forces you to use s3

This is due to the way RPostgreSQL::dbWriteTable works - instead, you'll need to create the SQL (CREATE TABLE, INSERT INTO, ...) yourself and send it to the database. — hadley
Thanks Hadley. Creating a SQL and inserting is working. Assuming dbWriteTable not working in this situation due to redshift's non-standard (different) implementation of postgresql — s dscientist
Yes, postgresql supports COPY INTO STDIN where redshift does not (it only supports loading from S3 or dynamodb) — hadley
@hadley could you elaborate a little on your comment? Are you saying you need to construct the insert statement as a SQL query? — Zach
@Zach yes, right. The way that RPostgreSQL uses to load data into postgres doesn't work with redshift — hadley

Alberto C Alberto C · Accepted Answer · 2014-09-11T12:34:22

I solved using RODBC package.

You need to set up an ODBC connection let's call it i.e. redshift_con. From R than you need to create a connection (i called it ch) using:

ch <- odbcConnect("Annalect DB", uid = "username", pwd = "**")

loaded my dataframe in R and runned:

sqlSave(ch, mydataframename, addPK = TRUE, verbose = TRUE)

And R creates a new table in redshift called "mydataframename". Is not really fast, I'll keep updated if I find and optimized method.

If you need more information see http://cran.r-project.org/web/packages/RODBC/index.html

################ SAMPLE CODE FOR TESTING ##############
install.packages("RODBC")
library('RODBC')
ch <- odbcConnect("redshift_con", uid = "admin", pwd = "********")
sqlColumns(ch, "public.r_test")
USArrest<-data(USArrests)
sqlSave(redshift_con, USArrests, rownames = "State", addPK = TRUE, verbose = TRUE)
I'll post a comment on the process speed.

Hope it can help.

UPDATE

Good only for VERY small amount of data

redshift data.frame not getting written

1 Answers