I am using Cloudera Hadoop 2.6 and Pig 0.15.

I am trying to extract data from an XML file. Part of the file is shown below.

    <product productID="MICROLITEMX1600LAMP">
      <basicInfo>
        <category lang="NL" id="OT1006">Output Accessoires</category>
      </basicInfo>
    </product>

I can dump node values with the XPath() function, but not attribute values. The code below returns empty tuples instead of the productID.

    DEFINE XPath org.apache.pig.piggybank.evaluation.xml.XPath();   
    allProducts = LOAD '/pathtofile/sample.xml' USING org.apache.pig.piggybank.storage.XMLLoader('product') AS (data:chararray);
    productsOneByOne = FOREACH allProducts GENERATE XPath(data, 'product/@productID') AS productid:chararray;
    dump productsOneByOne;

Please help me resolve this issue.

Thanks for your response. I tried it, but it did not work for me; it returns an error:

    productsOneByOne = FOREACH allProducts GENERATE XPathAll(x, 'product/@productID', true, false).$0 as (productid:chararray);
    ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025: <line 4, column 57> Invalid field projection. Projected field [x] does not exist in schema: data:chararray.

- Aravind Kumar Anugula
@inquisitive_mind Thank you for referring me to the link. It helped me a lot. I have posted an answer myself; please check it. - Aravind Kumar Anugula

1 Answer

This adds to the answers to "How to extract xml attributes using Xpath in Pig?".

There is a bug in XPath.java: it ignores the 4th parameter.

Adding the following code to XPath.java and recompiling resolved the issue. Source: http://svn.apache.org/repos/asf/pig/branches/branch-0.15/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/xml/XPath.java

    if (input.size() > 3) {
        // Tuple.get() returns Object, so cast before assigning to the flag
        ignoreNamespace = (Boolean) input.get(3);
    }

The above code should be added before:

    if (ignoreNamespace) {
        xpathString = createNameSpaceIgnoreXpathString(xpathString);
    }
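
To illustrate why the flag matters, here is a minimal standalone sketch using only the JDK's javax.xml.xpath, not Pig itself. The class XPathAttributeDemo and the helper ignoreNamespaces() are hypothetical; the helper only assumes that a namespace-ignoring rewrite turns every path step into an element test on local-name(), which may differ in detail from what createNameSpaceIgnoreXpathString actually does. Under that assumption, the attribute step @productID is rewritten into an element test that can never match, which would explain the empty tuples.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathAttributeDemo {
    // Sample record from the question, as a single string.
    static final String XML =
        "<product productID=\"MICROLITEMX1600LAMP\">"
      + "<basicInfo><category lang=\"NL\" id=\"OT1006\">Output Accessoires</category></basicInfo>"
      + "</product>";

    // Hypothetical stand-in for a namespace-ignoring rewrite (an assumption,
    // not the Piggybank code): each path step becomes an element test.
    public static String ignoreNamespaces(String xpathString) {
        StringBuilder rewritten = new StringBuilder();
        for (String step : xpathString.split("/")) {
            rewritten.append("//*[local-name()='").append(step).append("']");
        }
        return rewritten.toString();
    }

    // Evaluates an XPath expression against the sample XML and returns its string value.
    public static String evaluate(String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, new InputSource(new StringReader(XML)));
    }

    public static void main(String[] args) throws Exception {
        String plain = "product/@productID";
        // The untouched expression selects the attribute value.
        System.out.println(evaluate(plain)); // prints MICROLITEMX1600LAMP
        // The rewritten expression looks for an element named "@productID": no match.
        System.out.println(evaluate(ignoreNamespaces(plain)).isEmpty()); // prints true
    }
}
```

If that reading is right, then with the patched UDF the Pig call would pass false as the fourth argument, for example XPath(data, 'product/@productID', true, false), so that the expression reaches the XPath engine unchanged.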