
I have the Hive service running on a Hadoop cluster. I'm trying to create a Hive table over Eucalyptus (Riak CS) S3 data. I have configured the AccessKeyID and SecretAccessKey in core-site.xml and hive-site.xml. When I execute the CREATE TABLE command and specify the S3 location using the s3n scheme, I get the error below:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:org.apache.http.conn.ConnectTimeoutException: Connect to my-bucket.s3.amazonaws.com:443 timed out)

If I try using the s3a scheme, I get the error below:

FAILED: AmazonClientException Unable to load AWS credentials from any provider in the chain

I could change the endpoint URL for the distcp command using jets3t, but the same approach didn't work for Hive. Any suggestions for pointing Hive to the Eucalyptus S3 endpoint are welcome.
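For context, the credential properties mentioned above would look roughly like this in core-site.xml (a sketch only; the key values are placeholders, and the s3a properties shown are the ones the s3a connector reads, which differ from the s3n ones):

```xml
<!-- s3n credentials (placeholder values) -->
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_SECRET_KEY</value>
</property>
<!-- s3a uses differently named properties -->
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_SECRET_KEY</value>
</property>
```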

Is it possible to connect to Riak CS with simpler command-line tools, e.g. s3cmd or s3curl? - shino
Yes, I'm able to connect using s3cmd. - Veronica
Some more questions: Did you use HTTPS in s3cmd too? Can you try connecting to Riak CS over plain HTTP instead of HTTPS? Do you use a proxy to connect to Riak CS? Can you confirm that the client actually tries to connect to your Riak CS server? Are there any lines in the Riak CS log that indicate errors? - shino
I have configured S3 access for my account using the s3cfg file, which has the endpoint URL. I have not configured the HTTP or HTTPS protocol for the connectivity. The Hive client is not trying to connect to Riak CS: by default the client points to "s3.amazonaws.com" and I'm unable to change it to the required endpoint. - Veronica
Do you want to connect to AWS S3 or to (your own?) Riak CS? If your Hive client does not try to connect to Riak CS, this is not a Riak CS-related issue. - shino

2 Answers


I'm not familiar with Hive, but as far as I know it uses MapReduce as its backend processing system, and MapReduce uses jets3t as its S3 connector - changing the jets3t configuration worked for me in both MapReduce and Spark. Hope this helps: http://qiita.com/kuenishi/items/71b3cda9bbd1a0bc4f9e

Would configurations like the following work for you?

s3service.https-only=false
s3service.s3-endpoint=yourdomain.com
s3service.s3-endpoint-http-port=8080
s3service.s3-endpoint-https-port=8080
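These settings would go in a jets3t.properties file, which jets3t reads from the Java classpath. A minimal sketch (the endpoint host and ports are placeholders for your Riak CS installation, and the final location on the classpath, e.g. the Hadoop/Hive conf directory, depends on your install):

```shell
# Write the jets3t overrides to a jets3t.properties file in the current
# directory; copy it onto the Hadoop/Hive classpath afterwards.
cat > jets3t.properties <<'EOF'
s3service.https-only=false
s3service.s3-endpoint=yourdomain.com
s3service.s3-endpoint-http-port=8080
s3service.s3-endpoint-https-port=8080
EOF
```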


I have upgraded to HDP 2.3 (Hadoop 2.7) and am now able to configure the s3a scheme for Hive-to-S3 access.
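For reference, Hadoop 2.7's s3a connector lets the endpoint be set directly in core-site.xml rather than through jets3t. A sketch, assuming a non-AWS endpoint (the host, port, and credential values below are placeholders for your Riak CS installation):

```xml
<!-- Point s3a at a custom S3-compatible endpoint (placeholder values) -->
<property>
  <name>fs.s3a.endpoint</name>
  <value>s3.yourdomain.com:8080</value>
</property>
<property>
  <name>fs.s3a.connection.ssl.enabled</name>
  <value>false</value>
</property>
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_SECRET_KEY</value>
</property>
```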