I'm trying to parse a date in a Pig script and I get the following error: "Hadoop does not return any error message".

Here is an example of the date format: 3/9/16 2:50 PM

And here is how I parse it:

data = LOAD 'cleaned.txt'
AS (Date, Block, Primary_Type, Description, Location_Description, Arrest, Domestic, District, Year);

times = FOREACH data GENERATE ToDate(Date, 'M/d/yy h:mm a') As Time;

You can see the data file here

Do you have any idea? Thanks.


EDIT:

It looks like the error is caused by the STORE command on "times".

If I do a DUMP instead, I get:

ERROR 1066: Unable to open iterator for alias times

It happens only when I use the ToDate function; I have other scripts that work perfectly.

Can you post how you load the data and perhaps a data snippet (2-3 rows)? - Ran Locar
Can you post it as CSV? It's unclear where your field delimiters are. When I tried loading using PigStorage with ' ' as the delimiter and parsing just the date (without the hour part), it worked OK. - Ran Locar
Unfortunately I can't, because it's for a school project and it's the only data file they gave us! I updated the information, by the way; it looks like the problem is elsewhere. - El Capitan

1 Answer

First of all, you need to specify the loader in the LOAD statement:

USING PigStorage('\t')

I assumed that you're using a tab separator. Then, if you have no schema, specify the schema with types!

So your load statement will be something like this:
data = LOAD 'SO/date2parse.txt' USING PigStorage('\t') AS (Date:chararray, Block:chararray, Primary_Type:chararray, Description:chararray, Location_Description:chararray, Arrest:chararray, Domestic:chararray, District:chararray, Year:chararray);

For now I just use the chararray type for everything, but you should specify whichever type is the right representation for each field.

After this, the date conversion works fine as you wrote it:

(2016-03-09T23:55:00.000Z)
(2016-03-09T23:55:00.000Z)
(2016-03-09T23:55:00.000Z)

My test script:

data = LOAD 'SO/date2parse.txt' USING PigStorage('\t') AS (Date:chararray, Block:chararray, Primary_Type:chararray, Description:chararray, Location_Description:chararray, Arrest:chararray, Domestic:chararray, District:chararray, Year:chararray);
times = FOREACH data GENERATE ToDate(Date, 'M/d/yy h:mm a') As Time;
DUMP times;

UPDATE: Some explanation

By the way, the default loader is PigStorage:

PigStorage is the default load function for the LOAD operator.

but it's nicer to specify it explicitly. The original issue was caused by the missing data types:

If you don't assign types, fields default to type bytearray

so ToDate failed on the input type.
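
If you want to confirm this on your side, here is a minimal sketch. It assumes the untyped LOAD from the question and a tab-delimited file, and that the missing type is indeed the cause. DESCRIBE shows what type Pig assigned to each field, and an explicit cast to chararray is an alternative to declaring the type in the schema:

```pig
-- Same file and field names as in the question; no types declared (tab delimiter is an assumption).
data = LOAD 'cleaned.txt' USING PigStorage('\t')
       AS (Date, Block, Primary_Type, Description, Location_Description, Arrest, Domestic, District, Year);

-- DESCRIBE prints the schema of the alias; untyped fields are reported as bytearray.
DESCRIBE data;

-- Casting the field to chararray passes ToDate a string instead of a bytearray.
times = FOREACH data GENERATE ToDate((chararray)Date, 'M/d/yy h:mm a') AS Time;
DUMP times;
```

Declaring the types in the LOAD schema, as in my test script, is usually cleaner, because every later statement then sees the right type without repeating the cast.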