1 vote

I have created a table:

add jar /../xlibs/hive-json-serde-0.2.jar;

CREATE EXTERNAL TABLE SerdeTest (Unique_ID STRING, MemberID STRING, Data ARRAY<STRUCT<SerialNo:INT, VariableName:STRING, VariableValue:STRING>>)

PARTITIONED BY (Pyear INT, Pmonth INT)

ROW FORMAT SERDE "org.apache.hadoop.hive.contrib.serde2.JsonSerde";

ALTER TABLE SerdeTest ADD PARTITION (Pyear = 2014, Pmonth = 03) LOCATION '../Test2';

The data in the file:

{"Unique_ID":"ABC6800650654751","MemberID":"KHH966375835","Data":[{"SerialNo":1,"VariableName":"Var1","VariableValue":"A_49"},{"SerialNo":2,"VariableName":"Var2","VariableValue":"B_89"},{""SerialNo":3,"VariableName":"Var3","VariableValue":"A_99"}]}

The SELECT query that I am using:

select Data[0].SerialNo from SerdeTest where Unique_ID = 'ABC6800650654751';

However, when I run this query I get the following error:

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
[Error getting row data with exception java.lang.ClassCastException: org.json.JSONArray cannot be cast to [Ljava.lang.Object;
    at org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getList(StandardListObjectInspector.java:98)
    at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:330)
    at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:386)
    at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:237)
    at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:223)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:539)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
    at org.apache.hadoop.mapred.Child.main(Child.java:264)
]

Can anyone please suggest what I am doing wrong?


2 Answers

0 votes

A few suggestions:

- Make sure that all the Hive packages and hive-json-serde-0.2.jar have execute permission for the hadoop user.
- Hive creates a derby.log file and a metastore_db directory in the Hive directory, so the user invoking the Hive query must be allowed to create files and directories there.
- The LOCATION for the data should end with a /, e.g. LOCATION '../Test2/'; a sketch of re-adding the partition this way follows below.
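
For example, a minimal sketch of the last point, assuming the same table and partition as in the question (the DROP/ADD pair is just one way to re-register the partition with the corrected location):

 -- Drop the partition that was added with LOCATION '../Test2' and re-add it
 -- with a trailing slash, as suggested above.
 ALTER TABLE SerdeTest DROP IF EXISTS PARTITION (Pyear = 2014, Pmonth = 03);
 ALTER TABLE SerdeTest ADD PARTITION (Pyear = 2014, Pmonth = 03)
 LOCATION '../Test2/';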

-1 votes

In short, the working JAR is json-serde-1.3-jar-with-dependencies.jar, which can be found here. It works with STRUCT columns and can even ignore some malformed JSON. When creating the table, include the following:

 ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
 WITH SERDEPROPERTIES ("ignore.malformed.json" = "true")
 LOCATION ...
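
Put together with the schema from the question, the full DDL could look roughly like the sketch below. This is untested; the STRUCT field types are assumptions inferred from the sample record, and the JAR path is a placeholder you would point at your own copy of json-serde-1.3-jar-with-dependencies.jar:

 -- Path to the JAR is a placeholder; adjust it to your environment.
 ADD JAR /path/to/json-serde-1.3-jar-with-dependencies.jar;

 CREATE EXTERNAL TABLE SerdeTest (
   Unique_ID STRING,
   MemberID STRING,
   -- Field types inferred from the sample JSON in the question.
   Data ARRAY<STRUCT<SerialNo:INT, VariableName:STRING, VariableValue:STRING>>
 )
 PARTITIONED BY (Pyear INT, Pmonth INT)
 ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
 WITH SERDEPROPERTIES ("ignore.malformed.json" = "true");

Partition locations can then be added with ALTER TABLE ... ADD PARTITION as in the question, and the original query (select Data[0].SerialNo from SerdeTest ...) should be able to address the struct fields directly.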

If needed, it can be recompiled from here or here. I tried the first repository and it compiled fine for me after adding the necessary libraries. The repository has also been updated recently.

Check here for more details.