Hey guys, I have one more question. I just can't understand this behavior of Pig.
I load data into Pig and, after some transformations, store it on HDFS (/user/sga/transformeddata) using PigStorage().
But when I load the data back from /user/sga/transformeddata and run:
temp = load '/user/sga/transformeddata' using PigStorage();
gen = foreach temp generate page_type;
dump gen;
I get the following error:
DataByteArray cannot be cast to java.lang.String
But if I run:
gen = foreach temp generate *;
dump gen;
it works fine.
Any help understanding this is much appreciated.
As requested, here is the code:
STORE union_of_all_records INTO '/staged/google/data_after_denormalization' using PigStorage('\t','-schema');
union_of_all_records is an alias in Pig.
Another script then consumes this data:
lookup_data =
LOAD '/staged/google/page_type_map_file/' using PigStorage() AS (page_type:chararray,page_type_classification:chararray);
load_denorm_clickstream_record =
LOAD '/staged/google/data_after_denormalization' using PigStorage('\t','-schema');
Then I limit the records and join these two aliases:
denorm_clickstream_record = LIMIT load_denorm_clickstream_record 100;
join_with_lookup =
JOIN denorm_clickstream_record BY page_type LEFT OUTER, lookup_data BY page_type;
Step x:
final_output =
FOREACH join_with_lookup
GENERATE denorm_clickstream_record::page_type as page_type;
At step x I get the error above.
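In case it helps anyone reproduce this, here is a minimal diagnostic sketch. It reuses the same paths and aliases as the scripts above and only adds DESCRIBE, which prints the schema Pig has attached to an alias. The idea is to check whether page_type is typed as chararray or bytearray on each side of the join. I have not included any output here, and this is only a way to inspect the schemas, not a fix.

```pig
-- Diagnostic sketch only: same paths and aliases as in the scripts above.
lookup_data =
LOAD '/staged/google/page_type_map_file/' using PigStorage() AS (page_type:chararray,page_type_classification:chararray);
load_denorm_clickstream_record =
LOAD '/staged/google/data_after_denormalization' using PigStorage('\t','-schema');

-- Print the schema picked up from the .pig_schema file written by the earlier STORE.
DESCRIBE load_denorm_clickstream_record;

-- Print the schema declared explicitly in the AS clause.
DESCRIBE lookup_data;
```

Comparing the declared type of page_type in both outputs should show whether the stored schema matches what the join expects.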