0
votes

We have launched two EMR clusters in AWS, with Hadoop and Hive 0.11.0 installed on one and Hive 0.13.1 on the other.

Everything seems to work fine, but loading data into a table fails with the error below, and it happens on both Hive servers.

ERROR MESSAGE:

An error occurred when executing the SQL command: load data inpath 's3://buckername/export/employee_1/' into table employee_2 Query returned non-zero code: 10028, cause: FAILED: SemanticException [Error 10028]: Line 1:17 Path is not legal ''s3://buckername/export/employee_1/'': Move from: s3://buckername/export/employee_1 to: hdfs://XXX.XX.XXX.XX:X000/mnt/hive_0110/warehouse/employee_2 is not valid. Please check that values for params "default.fs.name" and "hive.metastore.warehouse.dir" do not conflict. [SQL State=42000, DB Errorcode=10028]

I searched for the cause and meaning of this message and found this link, but when I ran the command suggested there, it also failed with the error below.

Command:

--service metatool -updateLocation hdfs://XXX.XX.XXX.XX:X000 hdfs://XXX.XX.XXX.XX:X000

Initializing HiveMetaTool.. HiveMetaTool:Parsing failed. Reason: Unrecognized option: -hiveconf

Any help with this would be really appreciated.


2 Answers

1
votes

LOAD does not support S3 here: LOAD DATA INPATH moves files, and as the error message says, a move from s3:// into the hdfs:// warehouse directory is not valid. It is best practice to leave the data in S3 and use it as a Hive external table instead of copying it to HDFS.

Some references: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hive-additional-features.html and "When you create an external table in Hive with an S3 location is the data transfered?"
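A minimal sketch of the external-table approach, reusing the bucket path from the question. The question does not show the schema of employee_2, so the table name employee_1_ext, the columns, and the comma delimiter below are assumptions for illustration only and would need to match the real files.

```sql
-- Hypothetical schema: id/name/salary and the ',' delimiter are assumptions,
-- since the question does not show the layout of the exported files.
CREATE EXTERNAL TABLE employee_1_ext (
  id     INT,
  name   STRING,
  salary DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://buckername/export/employee_1/';

-- Optional: copy the rows into the existing HDFS-backed table.
-- This assumes employee_2 has the same columns in the same order.
INSERT INTO TABLE employee_2
SELECT * FROM employee_1_ext;
```

Because the table is external, dropping employee_1_ext removes only the metastore entry and leaves the files in S3 untouched.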

0
votes

If you have installed Hive on your Hadoop cluster, Hive stores its data in HDFS by default (hive.metastore.warehouse.dir=/user/hive/warehouse).

As a workaround, you can copy the files from the S3 file system to HDFS and then load them into Hive from HDFS.
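The copy-then-load workaround might look roughly like this when run from the Hive CLI or Beeline, both of which accept `dfs` commands. The staging path /tmp/employee_1_staging is a hypothetical name, and whether the cluster can read the s3:// path this way depends on its S3 configuration; for large exports a distributed copy tool is likely a better fit than `dfs -cp`.

```sql
-- Copy the exported files from S3 into a staging directory on HDFS.
-- /tmp/employee_1_staging is a hypothetical path chosen for this sketch.
dfs -cp s3://buckername/export/employee_1/ /tmp/employee_1_staging;

-- Source and destination are now both on HDFS, so the move performed
-- by LOAD DATA INPATH is valid.
LOAD DATA INPATH '/tmp/employee_1_staging' INTO TABLE employee_2;
```

Note that LOAD DATA INPATH moves the staged files into the warehouse directory, so the staging directory is empty afterwards while the originals remain in S3.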

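It may also be necessary to modify the parameter "hive.exim.uri.scheme.whitelist=hdfs,pfile" to load data from the S3 file system, although that parameter controls the URI schemes accepted by EXPORT and IMPORT, so it may not affect LOAD DATA.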