I want to extract only the states from the XML file below.
<Table>
<State>Florida</State>
<id>123</id>
</Table>
<Table>
<State>Texas</State>
<id>456</id>
</Table>
Expected output:
(Florida)
(Texas)
But with the Pig statements below I get this output instead:
()
()
A = LOAD 'hdfs:/user.xml' USING org.apache.pig.piggybank.storage.XMLLoader('Table') AS (x:chararray);
B = FOREACH A GENERATE FLATTEN (REGEX_EXTRACT_ALL(x,
'<Table>\\n\\s*<State>(.*)</State>\\n\\s*\\n\\s*</Table>'))
as (state:chararray);
Please help me understand where I have gone wrong, or how I can eliminate a particular tag line from the match.
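For reference, here is a minimal sketch for checking the pattern outside Pig. It rests on several assumptions: that XMLLoader hands over each Table element as one string with its newlines intact, that Pig turns '\\n' and '\\s' into the regex escapes \n and \s, and that REGEX_EXTRACT_ALL only succeeds when the pattern matches the whole string (emulated here with re.fullmatch). Python's re module stands in for the Java regex engine Pig uses, and the two should behave the same for a pattern this simple. The names record, original, adjusted and extract_state are made up for illustration.

```python
import re

# One record as XMLLoader('Table') is assumed to return it (newlines kept).
record = "<Table>\n<State>Florida</State>\n<id>123</id>\n</Table>"

# The pattern from the Pig statement, with '\\n' and '\\s' read as \n and \s.
original = r"<Table>\n\s*<State>(.*)</State>\n\s*\n\s*</Table>"

# Hypothetical alternative that accounts for the <id> line
# sitting between </State> and </Table>.
adjusted = r"<Table>\s*<State>(.*)</State>\s*<id>.*</id>\s*</Table>"


def extract_state(pattern, text):
    """Return group 1 if the pattern matches the whole text, else None."""
    m = re.fullmatch(pattern, text)
    return m.group(1) if m else None


print(extract_state(original, record))  # None
print(extract_state(adjusted, record))  # Florida
```

Under those assumptions the original pattern never matches, because after </State> it expects only whitespace before </Table> and has nothing that can consume the <id>123</id> line. That would be consistent with the empty tuples in the output, though I have not confirmed that the same pattern behaves identically inside Pig.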