
I have developed a Hive query that uses lateral views and get_json_object to unpack some JSON. The query works well enough in a JDBC client (DbVisualizer), but when run as Spark SQL from a Java application, on the same data, it returns nothing.

If I create an array to mimic the expected unpacked JSON, the query works:

array('Plan A','Plan B','Plan C','Plan D')

but if I do the unpacking with this statement, it doesn't:

split(regexp_replace(get_json_object('{"product_offer":[{"productName":"Plan A"},{"productName":"Plan B"},{"productName":"Plan C"},{"productName":"Plan D"}]}', '$.product_offer.productName'), '\\[|\\]|"', ''), ',' )

When run individually in DbVisualizer, the two expressions produce the same output and compare as equal.

This is a cut-down version of the query, which works against a test table with one row, escaped as a Java string:

 "SELECT pageid , exp.* , exp2.* , exp3.* from stephen \n" +
   "lateral view outer posexplode( split(regexp_replace(get_json_object('{\"product_offer\":[{\"productName\":\"P lan A\"},{\"productName\":\"Plan B\"},{\"productName\":\"Plan C\"},{\"productName\":\"Plan D\"}]}', '$.product_offer.productName'), '\\\\[|\\\\]|\"', ''), ',' ) )  exp as seqn, product_name\n" +
 "lateral view outer posexplode( array('aaaa','bbb','ccc','ddd'))  exp2 as seqo, product_offer\n" +
 "lateral view outer posexplode( array('Delete','Add','Add','Delete'))  exp3 as seqa, product_action\n" +
 "where (seqn=seqo)\n" +
 "  and (seqn=seqa)\n" 

I am using Java 8 and Hive version 2.2.0 rda840b0f8fa99cab9f004810cd22abc207493cae.

No errors are logged.


1 Answer


I rewrote the question once I established that get_json_object was misbehaving: the correct path to use to get all the product names is

'$.product_offer[*].productName'
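
To make the fix concrete, here is a sketch of the corrected unpacking expression, using the same sample JSON as in the question. The only change from the original is the `[*]` in the JSON path; the regexp_replace and split steps are unchanged:

```sql
-- [*] selects productName from every element of the product_offer array.
-- With the original path '$.product_offer.productName', Spark SQL's
-- get_json_object returned nothing, even though Hive over JDBC accepted it.
SELECT split(
         regexp_replace(
           get_json_object(
             '{"product_offer":[{"productName":"Plan A"},{"productName":"Plan B"},{"productName":"Plan C"},{"productName":"Plan D"}]}',
             '$.product_offer[*].productName'),
           '\\[|\\]|"', ''),   -- strip brackets and quotes from the JSON array string
         ',');                 -- split into array('Plan A','Plan B','Plan C','Plan D')
```

This matches the behaviour described above: Hive's JDBC path tolerated the missing `[*]`, while Spark SQL's implementation of get_json_object did not, which explains why the same query returned rows in DbVisualizer but nothing from the Java application.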