I have tab-separated output from a Mahout recommender that I'd like to query in Hive. The recommendations look like this:
54508 [19:4.9,22:3.5]
54584 [17:5.2]
54648 [13:6.1,3:5.9]
54692 [17:8.1]
55424 [1:3.8]
55448 [16:2.7,3:1.2]
55452 [17:6.8]
57084 [42:6.8,3:5.4]
57212 [17:3.5]
There are two columns: the first column contains a userID, and the second contains a list of recommended products and their expected ratings.
I created a Hive table:
CREATE TABLE `recommendations_raw`(
user int,
recommendations string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'/etl/recommender/output';
And I'm able to transform the data into a long tabular form in a Hive query:
select
user,
product,
rating
from recommendations_raw
lateral view explode(str_to_map(substr(recommendations, 2, length(recommendations) - 2), ",", ":")) product_rating as product, rating
user product rating
54508 19 4.9
54508 22 3.5
54584 17 5.2
[etc...]
However, I would have preferred to create the map inside the create table statement instead of using str_to_map
inside the query since it seems wrong to create a table with string
datatype when it's really a map
.
Is this possible/practical? If so, how?