2
votes

I have a table called exampleTable of two columns of arrays of strings.

Array<string> col1  Array<string> col2
["a", "b" , "c" ]       ["x","y","z"]
["aa", "bb" , "cc" ]       ["xx","yy","zz"]

My goal is produce a table like this

col1                   col2
"a"                     "x"
"b"                     "y"
"c"                     "z"
"aa"                     "xx"
"bb"                     "yy"
"cc"                     "zz"

I thought about using LATERL VIEW like :

SELECT myCol1, myCol2  FROM exampleTable
         LATERAL VIEW explode(col1) myTable1 AS myCol1
         LATERAL VIEW explode(col2) myTable2 AS myCol2;

but this produces this instead

col1                   col2
"a"                     "x"
"a"                     "y"
"a"                     "z"
"a"                     "xx"
"a"                     "yy"
"a"                     "zz"
"b"                     "x"
"b"                     "y"
"b"                     "z"
"b"                     "xx"
"b"                     "yy"
"b"                     "zz"
"c"                     "x"
"c"                     "y"
"c"                     "z"
"c"                     "xx"
"c"                     "yy"
"c"                     "zz"
"aa"                     "x"
"aa"                     "y"
"aa"                     "z"
"aa"                     "xx"
"aa"                     "yy"
"aa"                     "zz"
"bb"                     "x"
"bb"                     "y"
"bb"                     "z"
"bb"                     "xx"
"bb"                     "yy"
"bb"                     "zz"
"cc"                     "x"
"cc"                     "y"
"cc"                     "z"
"cc"                     "xx"
"cc"                     "yy"
"cc"                     "zz"

How can I tackle this? Thanks in advance.

1
were you able to solve this? - Bee
Yes, I did. I ended up writing my own UDF. - user1309258
the only possibility seems to be to write a custom UDF or a simple custom mapper script (using Hive's transform functionality) that will do that. It will essentially take 3 arrays and return an array of arrays where each subarray is comprised of elements at corresponding indexes. For example, this UDF will take 3 elements in this example: arg1: [title1, title2, title3] arg2: [artist1, artist2, artist3] arg3: [album1, album2, album3] and return [[title1,artist1,album1],[title2,artist2,album2],[title3,artist3,album3]] We can explode this array and pick out individual indexes to get the answer - user1309258

1 Answers

1
votes

You can write a UDTF using explode UDTF or explode2 UDTF as a base.

Refer here for using a explode2 base.