We are trying to create an Unnest view in Athena which is equivalent to Hive lateral view for JSON data which has array fields in it then if the unnest is null then parent ky is getting dropped.
Below are the sample JSONs we tried to create a view on.
{"root":{"colA":"1","colB":["a","b","c"]}}
{"root":{"colA":"2"}}
The output for above data in Hive view is as below:
+----------------------+----------------------+--+
| test_lateral_v.cola | test_lateral_v.colb |
+----------------------+----------------------+--+
| 1 | a |
| 1 | b
| 1 | c |
| 2 | NULL |
+----------------------+----------------------+--+
But when we are trying to create the view in Athena with CROSS JOIN UNNEST below is the output:
cola colb
1 a
1 b
1 c
If the JSON data does not have the values for the field which we have created the UNNEST on, that row is getting eliminated from the output, whereas hive gives that row as well with NULL value for the corresponding missing value.
/DDLs used in hive/
create external table if not exists test_lateral(
root struct<
colA: string,
colB: array<
string
>
>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
Stored as textfile
location "<hdfs_location>";
create view test_lateral_v
(colA,colB)
as select
root.colA,
alias
from test_lateral
lateral view outer explode (root.colB) t as alias;
/DDLs used for athena/
create external table if not exists test_lateral(
root struct<
colA: string,
colB: array<
string
>
>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
Stored as textfile
location "<s3_location>";
create view test_lateral_v
as select
root.colA,
alias as colB
from test_lateral
cross join unnest (root.colB) as t (alias);