I am trying to load my hive table using Hcatalog in pig, for that i have written below code but i am getting error. I am opening my pig shell using pig -useHCatalog
Code:
A = LOAD 'patient_info' USING org.apache.hive.hcatalog.pig.HCatLoader();
Error:
ERROR hive.ql.metadata.Table - Unable to get field from serde: com.ibm.spss.hive.serde2.xml.XmlSerDe java.lang.RuntimeException: MetaException(message:java.lang.ClassNotFoundException Class com.ibm.spss.hive.serde2.xml.XmlSerDe not found) at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:275) at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:255) at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:602) at org.apache.hive.hcatalog.common.HCatUtil.getTableSchemaWithPtnCols(HCatUtil.java:184) at org.apache.hive.hcatalog.pig.HCatLoader.getSchema(HCatLoader.java:216) at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:175) at org.apache.pig.newplan.logical.relational.LOLoad.(LOLoad.java:89) at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:866) at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3568) at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1625) at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1102) at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:560) at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421) at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:188) at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1688) at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1635) at org.apache.pig.PigServer.registerQuery(PigServer.java:587) at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1093) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:501) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173) at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69) at org.apache.pig.Main.run(Main.java:547) at org.apache.pig.Main.main(Main.java:158) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) Caused by: MetaException(message:java.lang.ClassNotFoundException Class com.ibm.spss.hive.serde2.xml.XmlSerDe not found) at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:400)
Update:
My command for storing data in hive is given below.
add jar /home/cloudera/hivexmlserde-1.0.5.3.jar;
CREATE EXTERNAL TABLE patient_info (
statusCode string,
title string,
startTime string,
endTime string,
frequencyValue string,
frequencyUnits string
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.statusCode"="medicationsInfo/entryInfo/statusCode/text()",
"column.xpath.title"="medications/code/code/text()",
"column.xpath.startTime"="medications/xxx/startTime/text()",
"column.xpath.endTime"="medications/xxx/endTime/text()",
"column.xpath.frequencyValue"="medications/xxx/frequencyValue/text()",
"column.xpath.frequencyUnits"="medications/xxx/frequencyUnits/text()",
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
TBLPROPERTIES (
"xmlinput.start"="<medicationsInfo",
"xmlinput.end"="</medicationsInfo>");
load data inpath '/user/cloudera/xml' into table patient_info ;
Sample:
<Document>
<ProductCode>
<code>10160-0</code>
<entryInfo>
<statusCode>completed</statusCode>
<startTime>20110729</startTime>
<endTime>20110822</endTime>
<strengthValue>24</strengthValue>
<strengthUnits>h</strengthUnits>
</entryInfo>
<entryInfo>
<statusCode>completed</statusCode>
<startTime>20120130</startTime>
<endTime>20120326</endTime>
<strengthValue>12</strengthValue>
<strengthUnits>h</strengthUnits>
</entryInfo>
<entryInfo>
<statusCode>completed</statusCode>
<startTime>20100412</startTime>
<endTime>20110822</endTime>
<strengthValue>8</strengthValue>
<strengthUnits>d</strengthUnits>
</entryInfo>
</ProductCode>
<ProductCode>
<code>10160-0</code>
<entryInfo>
<statusCode>completed</statusCode>
<startTime>20110729</startTime>
<endTime>20110822</endTime>
<strengthValue>24</strengthValue>
<strengthUnits>h</strengthUnits>
</entryInfo>
<entryInfo>
<statusCode>completed</statusCode>
<startTime>20120130</startTime>
<endTime>20120326</endTime>
<strengthValue>12</strengthValue>
<strengthUnits>h</strengthUnits>
</entryInfo>
<entryInfo>
<statusCode>completed</statusCode>
<startTime>20100412</startTime>
<endTime>20110822</endTime>
<strengthValue>8</strengthValue>
<strengthUnits>d</strengthUnits>
</entryInfo>
</ProductCode>
<Medicationsinfo>
<code>10160-0</code>
<entryInfo>
<statusCode>completed</statusCode>
<startTime>20110729</startTime>
<endTime>20110822</endTime>
<strengthValue>24</strengthValue>
<strengthUnits>h</strengthUnits>
</entryInfo>
<entryInfo>
<statusCode>completed</statusCode>
<startTime>20120130</startTime>
<endTime>20120326</endTime>
<strengthValue>12</strengthValue>
<strengthUnits>h</strengthUnits>
</entryInfo>
<entryInfo>
<statusCode>completed</statusCode>
<startTime>20100412</startTime>
<endTime>20110822</endTime>
<strengthValue>8</strengthValue>
<strengthUnits>d</strengthUnits>
</entryInfo>
</Medicationsinfo>
<Medicationsinfo>
<code>10160-0</code>
<entryInfo>
<statusCode>completed</statusCode>
<startTime>20110729</startTime>
<endTime>20110822</endTime>
<strengthValue>24</strengthValue>
<strengthUnits>h</strengthUnits>
</entryInfo>
<entryInfo>
<statusCode>completed</statusCode>
<startTime>20120130</startTime>
<endTime>20120326</endTime>
<strengthValue>12</strengthValue>
<strengthUnits>h</strengthUnits>
</entryInfo>
<entryInfo>
<statusCode>completed</statusCode>
<startTime>20100412</startTime>
<endTime>20110822</endTime>
<strengthValue>8</strengthValue>
<strengthUnits>d</strengthUnits>
</entryInfo>
</Medicationsinfo>
</Document>