
I'm a newbie with Apache Drill.

The scenario is this:

I have an S3 bucket where I placed my CSV file, called test.csv. I've installed Apache Drill following the instructions on the official website.

I followed this tutorial: https://drill.apache.org/blog/2014/12/09/running-sql-queries-on-amazon-s3/ to create an S3 plugin.

I start Drill and switch to the correct "workspace" (with: use my-s3;), but when I try to select records from the test.csv file, an error occurs:

Table 's3./test.csv' not found.

Can anyone help me? Thanks!

The syntax in your FROM clause is probably the problem, because it is easy to get wrong. Please run these commands in Drill: USE `my-s3`; SHOW FILES; What my-s3 schemas are listed? If your file is listed, use SELECT * FROM `test.csv`; - catpaws
When I run the SHOW FILES; command, the result is null... I think there's a problem with the connection to the S3 bucket... but I don't receive any errors. - nicos
Here's a link to the blog that shows the comments, which for some reason I don't see when going to the URL you provided. These comments might help: drill.apache.org/blog/2014/12/09/…. Also, if you're willing to experiment (I haven't tried it myself), I have a storage plugin config and a core-site.xml that are reported to work, which I'll provide in the answer section. - catpaws
Hey catpaws, I followed your link and found this answer: "Eventually I learned that it only works with s3n:// scheme..." I put "s3n" in my plugin configuration, and now I'm able to run SHOW FILES and queries! Thanks!!! - nicos
yw @user2080395!! tks for the feedback. - catpaws

1 Answer


Use the name of your workspace (if you use one) and back ticks in the USE command as follows:

USE `my-s3`.`<workspace-name>`; 
SHOW files; //should list test.csv file
SELECT * FROM `test.csv`;

Query the CSV in the local file system using the dfs storage plugin configuration to rule out things like a header causing a problem. This page might help if you haven't seen it.
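
As a rough sketch of that local check: the queries below assume a copy of the file saved at the hypothetical path /tmp/test.csv and the default dfs storage plugin being enabled. When header extraction is not configured, Drill returns each row of a text file as a single columns array, so the second query picks fields by index; the aliases are only illustrative.

```sql
-- Assumption: a local copy of the file exists at /tmp/test.csv (hypothetical path).
-- Read a few rows straight from the local file system via the dfs plugin.
SELECT * FROM dfs.`/tmp/test.csv` LIMIT 5;

-- Without header extraction, each row comes back as one `columns` array;
-- pick individual fields by index (aliases are illustrative only).
SELECT columns[0] AS first_field, columns[1] AS second_field
FROM dfs.`/tmp/test.csv`
LIMIT 5;
```

If these queries work locally, the CSV itself is probably fine and the problem is more likely in the S3 connection or plugin configuration.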

Storage plugin mentioned in comment above:

{
  "type": "file",
  "enabled": true,
  "connection": "s3n://<accesskey>:<secret>@catpaws",
  "workspaces": {},
  "formats": {
    "psv": {
      "type": "text",
      "extensions": [
        "tbl"
      ],
      "delimiter": "|"
    },
    "csv": {
      "type": "text",
      "extensions": [
        "csv"
      ],
      "delimiter": ","
    },
    "tsv": {
      "type": "text",
      "extensions": [
        "tsv"
      ],
      "delimiter": "\t"
    },
    "parquet": {
      "type": "parquet"
    },
    "json": {
      "type": "json"
    }
  }
}

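This is probably not relevant, but here is an excerpt from the Amazon S3 help, which contains a lot more info. Note that these fs.s3.* property names belong to the s3:// scheme; with the s3n:// scheme used above, Hadoop reads the keys from fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey instead: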

<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>ID</value>
</property>

<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>SECRET</value>
</property>