1
votes

I have an odbc connection to Athena and am able to read and retrieve data already. For example, I created a new empty table in hive which uses the same meta store as s3 for Athena:

CREATE EXTERNAL TABLE IF NOT EXISTS adhoc.mtcars
(
mpg integer,
cyl integer,
disp integer,
hp integer,
drat integer,
wt integer,
qsec integer,
vs integer,
am integer,
gear integer,
carb integer)
partitioned by (year string, month string, day string)
stored as orc
location 's3://ourco-emr/tables/adhoc.db/mtcars';

I can read this new empty table using DBI::dbReadTable:

con <- dbConnect(odbc(), "Athena")
dbReadTable(con, DBI::Id(schema = "adhoc", table = "mtcars"))

Returns:

 [1] mpg   cyl   disp  hp    drat  wt    qsec  vs    am    gear  carb  year  month day  
<0 rows> (or 0-length row.names)

So, the empty table is clearly there and visible.

Note the above in the hive create table:

location 's3://ourco-emr/tables/adhoc.db/mtcars'

The data for this table should be stored in s3 at that location.

I tried to write mtcars to this location using dbWriteTable:

dbWriteTable(conn = con,
             name = "tables/adhoc.db/mtcars",
             value = mtcars,
             overwrite = FALSE,
             append = TRUE,
             file.type = "orc",
             partition = c(year = "2020", month = "02", day = "01"),
             s3.location =  "s3://ourco-emr/tables/adhoc.db/mtcars/mtcars")

This seems to run for a few seconds before returning this error message:

Error: nanodbc/nanodbc.cpp:1617: 00000: [Simba][Athena] (1040) An error has been thrown from the AWS Athena client. Athena Error No: 130, HTTP Response Code: 400, Exception Name: InvalidRequestException, Error Message: line 1:14: no viable alternative at input 'CREATE TABLE "tables/adhoc.db/mtcars"' [Execution ID: ] 'CREATE TABLE "tables/adhoc.db/mtcars" ( "row_names" VARCHAR(255), "mpg" DOUBLE PRECISION, "cyl" DOUBLE PRECISION, "disp" DOUBLE PRECISION, "hp" DOUBLE PRECISION, "drat" DOUBLE PRECISION, "wt" DOUBLE PRECISION, "qsec" DOUBLE PRECISION, "vs" DOUBLE PRECISION, "am" DOUBLE PRECISION, "gear" DOUBLE PRECISION, "carb" DOUBLE PRECISION )

Looks like dbi is trying to create a new table where I just want to append to the existing, albeit empty on that I created earlier.

How can I send a data frame to s3 using DBI?

1

1 Answers

2
votes

I can't comment for odbc, but there are two packages RAthena and noctua that have DBI methods for uploading data to AWS Athena.

RAthena utilises the Python SDK boto3 to create a connection to AWS. noctua utilises the R SDK paws to create a connection to AWS.

library(DBI)

# connect to AWS Athena using RAthena
con = dbConnect(RAthena::athena())

# OR connect to AWS Athena using noctua
con = dbConnect(noctua::athena())

# Uploading to existing AWS Athena table
dbWriteTable(conn = con,
             name = "adhoc.mtcars",
             value = mtcars,
             append = TRUE,
             file.type = "parquet",
             partition = c(year = "2020", month = "02", day = "01"),
             s3.location =  "s3://ourco-emr/tables/")

dbGetQuery(con, "select * from adhoc.iris")

Currently these packages only support file.types ["tsv", "csv", "parquet"] when uploading to AWS Athena. To extend the capabilities of the current packages please raise a feature request at: https://github.com/DyfanJones/RAthena/issues and https://github.com/DyfanJones/noctua/issues.

NOTE: Don't load both packages in the same environments as the connection classes will clash.