
I am connecting S3 Buckets to Apache Hive so that I can query the Parquet files in S3 directly through PrestoDB.

For this, I configured the hive-site.xml file and added my AWS Access Key and Secret Key to it, as mentioned in this blog post. The S3 console URL for the folder where my Parquet files reside looks like:

https://s3.console.aws.amazon.com/s3/buckets/sb.mycompany.com/someFolder/anotherFolder/?region=us-east-2&tab=overview

While creating the external table, I gave the S3 location as:

LOCATION "s3://sb.mycompany.com/someFolder/anotherFolder"

Apache Hive isn't able to find the Parquet files at this location, since queries return no data. The folder contains multiple Parquet files. My questions:

  • Can Hive read data from all the Parquet files at once, given that they all share the schema I defined for my external table (sketched below)?
  • Is the S3 location I specified in the correct format, or should I include the region of the S3 bucket as well (and if so, how)?
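
For reference, my external table definition looks roughly like this (the database, table, and column names here are placeholders, not my actual schema):

CREATE EXTERNAL TABLE mydb.my_table (
  -- columns must match the schema shared by the Parquet files
  id BIGINT,
  name STRING
)
STORED AS PARQUET
LOCATION "s3://sb.mycompany.com/someFolder/anotherFolder";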
Did you find the solution to this? – Shubzumt

1 Answer


If your data is partitioned, you need to repair the table. In some cases, even if you don't have any partitions, you must repair the table. Use the following commands in Hive:

set hive.msck.path.validation=ignore;

msck repair table schema.table;
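
For example, if the external table from the question were called mydb.my_table (a placeholder name), you would run the commands above against that table and then re-run your query to confirm that rows come back:

msck repair table mydb.my_table;

-- verify that Hive now returns data from the S3 location
select count(*) from mydb.my_table;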