1
votes

I have the below requirement that i want to achieve using Hive.

  1. Table A with Struct 1 column.
  2. Create Table B to split Table A's Struct columns and load as 2 different columns.

Below is the DDL and the steps that i have tried.

Table A :

cluster string from deserializer
schema string from deserializer
table string from deserializer
primarykey struct from deserializer
data struct from deserializer
deleted boolean from deserializer
truncate boolean from deserializer
version bigint from deserializer

Table B : user_id string from deserializer
shippingattr struct billingattr struct

i have tried the following.

insert into Table B
(
user_id,
shippingattr.username,
shippingattr.shipping_fname,
shippingattr.shipping_lname,
shippingattr.shipping_address,
shippingattr.shipping_address2,
shippingattr.shipping_city,
shippingattr.shipping_state,
shippingattr.shipping_zip,
shippingattr.phone,
billingattr.billing_is_shipping,
billingattr.billing_fname,
billingattr.billing_lname,
billingattr.billing_address,
billingattr.billing_address2,
billingattr.billing_city,
billingattr.billing_state,
billingattr.billing_zip
)

select 
primarykey.user_id,
data.username,
data.shipping_fname,
data.shipping_lname,
data.shipping_address,
data.shipping_address2,
data.shipping_city,
data.shipping_state,
data.shipping_zip,
data.phone,
data.billing_is_shipping,
data.billing_fname,
data.billing_lname,
data.billing_address,
data.billing_address2,
data.billing_city,
data.billing_state,
data.billing_zip
from Table A;

but this errors out., could you please let me know the right way to insert this data.

Error Message : NoViableAltException(284@[]) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:742) at org.apache.hadoop.hive.ql.parse.HiveParser.selectClause(HiveParser.java:40147) at org.apache.hadoop.hive.ql.parse.HiveParser.singleSelectStatement(HiveParser.java:38048) at org.apache.hadoop.hive.ql.parse.HiveParser.selectStatement(HiveParser.java:37754) at org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:37654) at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:36898) at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:36774) at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1338) at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1036) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:404) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:975) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1040) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:275) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:227) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:430) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:803) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:697) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:636) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) FAILED: ParseException line 1:50 cannot recognize input near '(' 'user_id' ',' in select clause

3
show us the error.strugee
@strugee, updated with the error message. Not sure if this is the right way to insert into a struct data type.Srivignesh KN

3 Answers

0
votes

For start, Hive does not except a list of columns for insert.
All columns should be inserted, by their order.

insert into Table B
select ...
0
votes

Above Scenario canbe handled as below. Sample example to insert data in struct:-

create table xyz (a STRUCT<x : int , y :String , z:int , zz : int >, b string );

 insert into table xyz select named_STRUCT(x,12,y,'text',z,2,zz,3) , 'text' from dummy limit 1;

 create table xyz1 (a1 STRUCT<x : int , y :String > , a2 struct<z:int , zz : int> , b String);

 insert into table xyz1  select named_struct('x',a.x,'y',a.y) ,named_struct('z',a.z,'zz',a.zz) ,b from xyz;

insert into table xyz1(a1,a2,b)  select named_struct('x',a.x,'y',a.y) ,named_struct('z',a.z,'zz',a.zz) , b from xyz;
0
votes

Thank you for the suggestions, I have implemented it this way.

insert into table Table B select user_id, named_struct('username',Table2.data.username,'shipping_fname',Table2.data.shipping_fname,......) as shippingattr, named_struct('billing_is_shipping',Table2.data.billing_is_shipping,.....,'billing_zip',Table2.data.billing_zip) as billingattr from Table A;

Thanks & Regards, Srivignesh KN