2
votes

I want to insert data into hive table from another hive table using Spark SQL. I am getting below error.

My code:

SparkConf conf = new SparkConf().setAppName("GetHiveTableData");
JavaSparkContext sc = new JavaSparkContext(conf);
JavaHiveContext hiveCtx = new JavaHiveContext(sc);

JavaSchemaRDD rdd = hiveCtx.sql("INSERT INTO TABLE table1 ( aggregation_date, subnetwork,last_5_max) select to_date(now()), subnetwork, max(last_5_minute_cpu_utilization_kpi)  from table2 where to_date(dataset_date)= to_date(now()) GROUP BY subnetwork");

Error msg:

        Caused by: org.apache.hadoop.hive.ql.parse.ParseException: line 1:83 cannot recognize input near '(' 'aggregation_date' ',' in select clause
        at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:202)
        at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
        at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:161)
2
Are ( aggregation_date, subnetwork,last_5_max) your partitions? - dheee

2 Answers

0
votes

You are not using valid hive syntax. You need to do

insert into table table1 select ..

Hive uses :

  • CTAS (create table as select)

  • Load data (local) inpath

for creating data.

0
votes

Named insert in not supported in hive. Use this. it assumes that table table1 have one three columns in order aggregation_date, subnetwork,last_5_max

INSERT INTO TABLE table1 select to_date(now()), subnetwork, max(last_5_minute_cpu_utilization_kpi)  from table2 where to_date(dataset_date)= to_date(now()) GROUP BY subnetwork;