0
votes

I'm using Apache Drill (v1.10.0) Windows embedded to connect to S3, but am having issues querying successfully unless I use the Drill Explorer client

The ODBC connection works (connection string below)

CastAnyToVarchar=true;
Catalog=s3citibike;
Schema=default;
HandshakeTimeout=5;
QueryTimeout=180;
TimestampTZDisplayTimezone=local;
NumberOfPrefetchBuffers=5;
StringColumnLength=1024;
ConvertToCast=false

If I use Drill Explorer (direct to Drillbit), I can see the files in s3citibike.default, and view the data (see attached image) but for some reason I cannot see my files when using ODBC with another client such as Excel.

I can query using sqline, for example the below query returns the dataset sucessfully

SELECT * FROM `s3citibike`.`default`.`./201307-citibike-tripdata.csv` LIMIT 100;

I'm kind of guessing I'm just not specifying the folder path correctly, but I've been looking around for a while, tried Catalog = DRILL, schema = s3citibike.default, no avail.

I'd try the drill-jdbc-all-1.10.0.jar JDBC driver for my client, but understand it doesn't work with embedded Windows

I was just asked by one of my sales guys if I could get this working for a customer meeting in a couple of hours where an inability to query S3 via Apache Drill ODBC or JDBC is a dealbreaker.

Can anyone see where I'm going wrong?

enter image description here

Thanks and regards, Jack

1

1 Answers

1
votes

Got some feedback from the Apache Drill user group

"With tools like Excel you will either have to figure out how to enter custom SQL, or if you want the data to be more visible to these tools you will have to create Drill Views and then reference these views from the tool via ODBC/JDBC. Properly define the column name and data types in the Views to make it easier for the end user/tool to process the data (this way you push the work to Drill)."

I created a view in a .tmp schema referencing the schema containing my csv files. I was able to see and query this view successfully in my client