5 votes

I would like to insert Pig output into Hive tables (the tables in Hive are already created with the exact schema). I just need to insert the output values into those tables. I don't want to use the usual method, where I first store the output into a file, then read that file from Hive, and then insert into the tables. I want to eliminate that extra hop.

Is it possible? If so, please tell me how this can be done.

Thanks


4 Answers

6 votes

OK. Create an external Hive table with a matching schema layout over some HDFS directory. Let's say:

create external table emp_records(id int,
                                  name String,
                                  city String)
                                  row format delimited
                                  fields terminated by '|'
                                  location '/user/cloudera/outputfiles/usecase1';

Just create the table as above; there is no need to load any file into that directory.
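Once the Pig script below has written its part files into that directory, the external table picks them up with no separate load step. A quick sanity check from the Hive shell, for example:

select * from emp_records limit 10;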

Now write a Pig script that reads data from some input directory, and store its output as below:

A =  LOAD 'inputfile.txt' USING PigStorage(',') AS(id:int,name:chararray,city:chararray);
B = FILTER A BY id >= 678933;
C = FOREACH B GENERATE id,name,city;
STORE C INTO '/user/cloudera/outputfiles/usecase1' USING PigStorage('|');

Ensure that the destination location, the delimiter, and the schema of the final FOREACH statement in your Pig script match the Hive DDL schema.

3 votes

Two approaches to store Pig output into a Hive table are explained below, using an 'Employee' table as the example. (The prerequisite is that the Hive table has already been created.)

A =  LOAD 'EMPLOYEE.txt' USING PigStorage(',') AS(EMP_NUM:int,EMP_NAME:chararray,EMP_PHONE:int);

Approach 1: Using Hcatalog

-- dump Pig result into Hive using HCatalog
store A into 'Empdb.employee' using org.apache.hive.hcatalog.pig.HCatStorer();
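Note that HCatStorer is only available when Pig is launched with HCatalog on the classpath, so the script has to be started with the -useHCatalog flag (the script name here is just a placeholder):

pig -useHCatalog your_script.pig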

(or)

Approach 2: Using HDFS physical location

-- dump Pig result into the external Hive warehouse location
STORE A INTO 'hdfs://<<nmhost>>:<<port>>/user/hive/warehouse/Empdb/employee/' USING PigStorage(',');
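For the warehouse path above to line up with the data, the Hive table would have been created along these lines (a sketch only; per the prerequisite, the table already exists, and the delimiter must match the PigStorage(',') used above):

create table Empdb.employee(emp_num int,
                            emp_name string,
                            emp_phone int)
row format delimited
fields terminated by ',';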

0 votes

You can store it using HCatalog:

STORE D INTO 'tablename' USING org.apache.hive.hcatalog.pig.HCatStorer();

See this link: https://acadgild.com/blog/loading-and-storing-hive-data-into-pig

0 votes

The best way is to use HCatalog and write the data into the Hive table.

STORE final_data INTO 'Hive_table_name' using org.apache.hive.hcatalog.pig.HCatStorer();

But before storing the data, make sure the columns in the 'final_data' relation are matched and mapped exactly to the schema of the table.

And run your Pig script like this: pig -useHCatalog script.pig