I am connecting S3 Buckets to Apache Hive so that I can query the Parquet
files in S3 directly through PrestoDB.
For this, I configured the hive-site.xml file and added my AWS Access Key and Secret Key to it, as described in this blog post. The S3 console URL for the path where the Parquet files reside looks like:
https://s3.console.aws.amazon.com/s3/buckets/sb.mycompany.com/someFolder/anotherFolder/?region=us-east-2&tab=overview
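For reference, the credential entries I added to hive-site.xml follow the usual Hadoop S3A pattern — this is a rough sketch with the keys redacted, and the exact property names depend on which S3 filesystem connector is in use (s3a shown here):

```xml
<!-- Sketch of the credential properties; actual names vary by connector -->
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_SECRET_KEY</value>
</property>
```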
While creating the external table, I specified the S3 location as:
LOCATION "s3://sb.mycompany.com/someFolder/anotherFolder"
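For context, the general shape of my DDL statement is as follows — the table and column names below are placeholders, not my actual schema:

```sql
-- Hypothetical sketch of the external table definition
CREATE EXTERNAL TABLE my_table (
  id   BIGINT,
  name STRING
)
STORED AS PARQUET
LOCATION 's3://sb.mycompany.com/someFolder/anotherFolder/';
```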
Hive can't find the Parquet files at that location: queries return no data. The folder contains multiple Parquet files. My questions:
- Can Hive read data from all the Parquet files at once, given that they all share the schema I defined for the external table?
- Is the S3 location I specified in the correct format, or do I also need to include the bucket's region (and if so, how)?