0
votes

I have developed a hive query that uses lateral views and get_json_object to unpack some json. The query works well enough using a jdbc client (dbvisualizer) against a hive database but when run as spark sql from a java application, on the same data, it returns nothing. I have tracked down the problem to differences in what the function 'get_json_object' returns.

The issue can be illustrated by this type of query

select concat_ws( "|", get_json_object('{"product_offer":[
{"productName":"Plan A"},
{"productName":"Plan B"}]}', 
'$.product_offer.productName') )

When run in dbvisualizer against a Hive database I get an array of the 2 product names in the json array: ["Plan A","Plan B"]. When the same query is run as spark sql from a java application, null is returned.

I have noticed another difference: the path '$.product_offer[0].productName' returns 'Plan A' in db visualizer and nothing in spark.

1

1 Answers

1
votes

The path to extract the array of product names is

select concat_ws( "|", get_json_object('{"product_offer":[{"productName":"Plan A"},{"productName":"Plan B"}]}', '$.product_offer[*].productName'

which works both in spark dbvisualizer.