
I have the following Pig script:

Source_Data = LOAD '/user/cloudera/Source_Data/' using PigStorage('\t','-tagFile');

Data_Schema = FOREACH Source_Data GENERATE (
(chararray)$1 AS Date,
(chararray)$2 AS ID,
(chararray)$3 AS Interval,
(chararray)$4 AS Code,
(chararray)$5 AS S_In_Activity,
(chararray)$6 AS S_Out_Activity,
(chararray)$7 AS C_In_Activity,
(chararray)$8 AS C_Out_Activity,
(chararray)$9 AS Traffic_Activity);

STORE Data_Schema INTO '/user/cloudera/Source_Data/New_Data/' USING PigStorage('\t');

Here is a row of my source data:

11300 1387926000000 76 1.8190562337403677 0.9613115354827483 330.0372865843317554633 0.1161754442265068633 11.04195619825027733

But I get an error when I execute the code. If I remove the last part that defines the schema, the script runs successfully. Note that the first column was inserted by the Pig statement.


1 Answer


You basically answered the question yourself in your last sentence: you cannot declare a schema when using the STORE operator. According to the official documentation, the syntax is:

STORE alias INTO 'directory' [USING function];

In your case it would simply be:

Data = LOAD '/user/cloudera/Source' using PigStorage('\t','-tagFile'); 

Data_prestage = FOREACH Data GENERATE
(chararray)$1 AS Filename,
(chararray)$2 AS CCode,
(chararray)$3 AS SCode,
(chararray)$4 AS In_Act,
(chararray)$5 AS Out_Act,
(chararray)$6 AS In_Act1;

STORE Data_prestage INTO '/user/cloudera/Source/Data2/' USING PigStorage('\t');
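As an alternative to casting every field in a FOREACH, a schema can also be declared once, in the AS clause of LOAD, and checked with DESCRIBE before storing. This is a minimal sketch assuming a tab-separated input; the column names (`file_name`, `col1`, `col2`, `col3`) and the output path are hypothetical placeholders, so extend the AS list to one field per column in your file. Because `-tagFile` prepends the source file name to each tuple, the schema reserves the first field for it.

```pig
-- Hypothetical column names: replace col1..col3 with one field per real column.
-- '-tagFile' puts the source file name in front of each tuple ($0),
-- so the schema starts with a field for it.
Source_Data = LOAD '/user/cloudera/Source_Data/'
    USING PigStorage('\t', '-tagFile')
    AS (file_name:chararray, col1:chararray, col2:chararray, col3:chararray);

-- Print the declared schema to verify it before storing.
DESCRIBE Source_Data;

-- Drop the file name column; the schema travels with the relation,
-- so STORE itself needs no schema.
Data_Only = FOREACH Source_Data GENERATE col1, col2, col3;

-- Hypothetical output path (must not already exist).
STORE Data_Only INTO '/user/cloudera/Source_Data_out/' USING PigStorage('\t');
```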

Also, if you do not plan to do any manipulation of the data, you might consider using STREAM instead.