I am new to Pig and am trying to analyse two months of the Uber dataset (UberDataSet) to find out on which day the most trips were booked.
Sample of the input format:
B02617,2/27/2015,1551,14677
B02598,2/27/2015,1114,10755
B02512,2/27/2015,272,2056
B02764,2/27/2015,4253,38780
Pig script 1:
A = Load 'UberDataSet.txt' using PigStorage(',') as
(base:chararray, tripdate:datetime, cars:int, tripkms:int);
DESCRIBE A;
DUMP A;
DESCRIBE shows that tripdate is of type datetime, but the output contains only empty fields (,,) where the dates should be.
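A likely explanation, offered as a hedged note: when a field is declared as datetime in the LOAD schema, Pig converts the raw text with its default ISO 8601 parser. Text such as 2/27/2015 does not match that format, so the conversion yields null, which DUMP prints as an empty field. A minimal sketch that keeps the column as chararray and parses it with an explicit pattern instead. The aliases A_dates and B and the regular expression are illustrative assumptions, not part of the original scripts:

```pig
-- Keep the raw date text; an implicit cast to datetime expects ISO 8601 text (e.g. 2015-02-27).
A = LOAD 'UberDataSet.txt' USING PigStorage(',')
    AS (base:chararray, tripdate:chararray, cars:int, tripkms:int);

-- Keep only rows whose date field looks like M/d/yyyy, so ToDate never sees other text.
A_dates = FILTER A BY tripdate MATCHES '\\d{1,2}/\\d{1,2}/\\d{4}';

-- Parse text such as 2/27/2015 with an explicit pattern; 'M' and 'd' accept one or two digits.
B = FOREACH A_dates GENERATE base, ToDate(tripdate, 'M/d/yyyy') AS mytripdate, cars, tripkms;

DESCRIBE B;
DUMP B;
```

The filter is a defensive choice: any row that is not a date (a header line, for example) is dropped instead of being passed to ToDate.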
Output:
(B02682,,1395,12693)
(B02617,,1473,12811)
(B02764,,3934,31957)
(B02598,,1134,10661)
(B02617,,1539,14461)
(B02682,,1465,13814)
(B02512,,243,1797)
Then I tried loading the date as a chararray and converting it with ToDate:
Pig script 2:
A = Load 'UberDataSet.txt' using PigStorage(',') as
(base:chararray, tripdate:chararray, cars:int, tripkms:int);
B = FOREACH A GENERATE tripdate;
C = FOREACH B GENERATE ToDate(tripdate,'yyyy-MM-dd') as mytripdate;
DESCRIBE C;
DUMP C;
The job failed with this error:
Job DAG: job_1495878748804_1697
2017-06-10 16:58:32,785 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2017-06-10 16:58:32,790 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias C. Backend error : org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [POUserFunc (Name: POUserFunc(org.apache.pig.builtin.ToDate2ARGS)[datetime] - scope-25 Operator Key: scope-25) children: null at []]: java.lang.IllegalArgumentException: Invalid format: "date"
Details at logfile: /home/manasa.testing_gmail/pig_1497109612992.log
There is a related question, "Loading datetime format files using PIG", but I could not find a solution there that fits my problem.
I also tried changing the pattern to 'MM/dd/yyyy' in the line "C = FOREACH B GENERATE ToDate(tripdate,'yyyy-MM-dd') as mytripdate;" while keeping the rest of the script the same, but I get the same Invalid format error.
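One hedged reading of the log: Invalid format: "date" means ToDate received the literal text date, most likely from a header line at the top of the file, so changing the pattern alone cannot make that row parse. The pattern also has to match the data itself; 'yyyy-MM-dd' does not fit values such as 2/27/2015. A minimal sketch that drops non-date rows, parses the rest, and totals each day to find the busiest one. It assumes the fourth column (named tripkms in the scripts above) is the quantity to total; swap in cars if that is the intended measure. The aliases from A_dates onward and the regular expression are illustrative:

```pig
A = LOAD 'UberDataSet.txt' USING PigStorage(',')
    AS (base:chararray, tripdate:chararray, cars:int, tripkms:int);

-- Drop the header (or any other non-date line) before calling ToDate.
A_dates = FILTER A BY tripdate MATCHES '\\d{1,2}/\\d{1,2}/\\d{4}';

-- Pattern matching values like 2/27/2015.
B = FOREACH A_dates GENERATE ToDate(tripdate, 'M/d/yyyy') AS mytripdate, tripkms;

-- One group per calendar day, summed across all bases.
C = GROUP B BY mytripdate;
D = FOREACH C GENERATE group AS mytripdate, SUM(B.tripkms) AS total;

-- Highest total first; keep only the top day (ties are broken arbitrarily).
E = ORDER D BY total DESC;
F = LIMIT E 1;

DUMP F;
```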
Can anyone help me move forward?
Thanks in advance.