1
votes

We are trying to create an Unnest view in Athena which is equivalent to Hive lateral view for JSON data which has array fields in it then if the unnest is null then parent ky is getting dropped.

Below are the sample JSONs we tried to create a view on.

{"root":{"colA":"1","colB":["a","b","c"]}}
{"root":{"colA":"2"}}

The output for above data in Hive view is as below:

+----------------------+----------------------+--+ 

| test_lateral_v.cola  | test_lateral_v.colb  |    
+----------------------+----------------------+--+ 

| 1                    | a                    |    
| 1                    | b                     
| 1                    | c                    |    
| 2                    | NULL                 |    
+----------------------+----------------------+--+ 

But when we are trying to create the view in Athena with CROSS JOIN UNNEST below is the output:

cola colb

1   a

1   b

1   c

If the JSON data does not have the values for the field which we have created the UNNEST on, that row is getting eliminated from the output, whereas hive gives that row as well with NULL value for the corresponding missing value.

/DDLs used in hive/

create    external    table    if    not    exists    test_lateral(
root    struct<
 colA:    string,
 colB:    array<
  string
  >
 >
 )
ROW    FORMAT    SERDE    'org.apache.hive.hcatalog.data.JsonSerDe'
Stored    as    textfile 
location    "<hdfs_location>";

create view test_lateral_v 
(colA,colB)
as select
root.colA,
alias
from test_lateral
lateral view outer explode (root.colB) t as alias;

/DDLs used for athena/

create external table if not exists test_lateral(
root struct<
 colA: string,
 colB: array<
  string
  >
 >
 )
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
Stored as textfile 

location "<s3_location>";

create view test_lateral_v 
as select
root.colA,
alias as colB
from test_lateral
cross join unnest (root.colB) as t (alias);
2

2 Answers

2
votes

SELECT * FROM (test_lateral CROSS JOIN UNNEST(coalesce("root"."colb",array[null])) t (alias))

works

0
votes

Obviously, CROSS JOIN UNNEST produces no rows when unnested array is null or empty, but you can use LEFT JOIN UNNEST:

SELECT * test_lateral
LEFT JOIN UNNEST("root"."colb") t(alias) ON true;

This is available since Presto 319. Before that, you can use coalesce to replace null array with a dummy value. (This assumes you don't have empty arrays in your data).

SELECT *
FROM test_lateral
CROSS JOIN UNNEST(coalesce("root"."colb", ARRAY[NULL])) t (alias))