hadoop version: Hadoop 2.6.0-cdh5.12.2 hive version: Hive 1.1.0-cdh5.12.2
Considre two tables: products - stores product Id and other details about the product activity - stores user_id, product_id which tells which user purchased which product and other transaction details.
before creating these tables I added SerDe JAR using below command: add jar /home/ManojKumarM_R/json-serde-1.3-jar-with-dependencies.jar;
CREATE EXTERNAL TABLE IF NOT EXISTS products (id string,name string,reseller
string,category string,price Double,discount Double,profit_percent Double)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' location
"/user/ManojKumarM_R/ProductsMergeEnrichOut";
sample data in /user/ManojKumarM_R/ProductsMergeEnrichOut
{"Id":"P101", "Name":"Round Tee", "Reseller":"Nike", "Category":"Top Wear", "Price":2195.03, "Discount":21.09, "Profit_percent":23.47}
{"Id":"P102", "Name":"Half Shift", "Reseller":"Nike", "Category":"Top Wear", "Price":1563.84, "Discount":23.83, "Profit_percent":17.12}
CREATE EXTERNAL TABLE IF NOT EXISTS activity (product_id string,user_id
string,cancellation boolean ,return boolean,cancellation_reason
string,return_reason string, order_date timestamp, shipment_date timestamp,
delivery_date timestamp , cancellation_date timestamp, return_date
timestamp) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' location
"/user/ManojKumarM_R/ActivityMergeEnrichOut/";
sample data in /user/ManojKumarM_R/ActivityMergeEnrichOut/
{"Product_id":"P117", "User_id":"U148", "Cancellation":"TRUE", "Return":"NA", "Cancellation_reason":"Duplicate Product", "Return_reason":"NA", "Order_date":"2016-02-12", "Shipment_date":"NA", "Delivery_date":"NA", "Cancellation_date":"2018-05-20", "Return_date":"NA"}
{"Product_id":null, "User_id":"U189", "Cancellation":"FALSE", "Return":"FALSE", "Cancellation_reason":"NA", "Return_reason":"NA", "Order_date":"2017-04-22", "Shipment_date":"2017-05-05", "Delivery_date":"2017-09-09", "Cancellation_date":"NA", "Return_date":"NA"}
table creation was successful,
select * from products;
&
select * from activity;
queries work absolutely fine thus denoting that SerDe JAR is picked during select query.
However, when I run below join query: I want to join these two tables on a common column which is Product Id
SELECT a.user_id, p.category FROM activity a JOIN products p
ON(a.product_id = p.Id);
it fails with below message
Execution log at: /tmp/ManojKumarM_R/ManojKumarM_R_20181010124747_690490ae-e59f-4e9d-9159-5c6a6e28b951.log 2018-10-10 12:47:43 Starting to launch local task to process map join; maximum memory = 2058354688 Execution failed with exit status: 2 Obtaining error information
Task failed! Task ID: Stage-5
Log in /tmp/ManojKumarM_R/ManojKumarM_R_20181010124747_690490ae-e59f-4e9d-9159-5c6a6e28b951.log
2018-10-10 12:47:43,984 ERROR [main]: mr.MapredLocalTask (MapredLocalTask.java:executeInProcess(398)) - Hive Runtime Error: Map local work failed org.apache.hadoop.hive.ql.metadata.HiveException: Failed with exception java.lang.ClassNotFoundException: org.openx.data.jsonserde.JsonSerDejava.lang.RuntimeException: java.lang.ClassNotFoundException: org.openx.data.jsonserde.JsonSerDe at org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializerClass(TableDesc.java:73)
which denotes that Hive is not able to find JsonSerDe JAR even though I've added JAR during that hive session and selct queries were working fine. If anyone has resolved similar issue please let me know, am not sure if Hive looks in different directories for JARs during JOIN operation.