I have files named in the format test_YYYYMM.txt. I am using the '-tagFile' option of PigStorage together with SUBSTRING() to extract the year and month for use in my Pig script.
The file name gets added as a pseudo-column at the beginning of the tuple.
Before I do a DUMP, I would like to remove that column. A FOREACH ... GENERATE that lists only the columns I need does not work: the output still contains the pseudo-column.
Is there a way to remove this column?
My sample script is as follows:
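-- Load the pipe-delimited file; -tagFile prepends the file name to each tuple
raw_data = LOAD 'test_201501.txt' USING PigStorage('|', '-tagFile')
    AS (col1: chararray, col2: chararray);

-- Characters 5..10 of 'test_201501.txt' are the YYYYMM part
data_with_yearmonth = FOREACH raw_data GENERATE
    SUBSTRING($0, 5, 11) AS yearmonth,
    'TEST_DATA' AS test,
    col1,
    col2;

DUMP data_with_yearmonth;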
Expected Output: 201501, TEST_DATA, col1, col2
Current Output: 201501, TEST_DATA, test_YYYYMM.txt, col1, col2
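One thing that may be worth checking, sketched here under assumptions: since '-tagFile' puts the file name at the beginning of each tuple, an AS clause that lists only col1 and col2 may end up attaching the name col1 to the file-name field. The variant below declares a name for that field and then projects only the wanted columns. The name filename is made up for illustration (Pig does not require it), and the sketch has not been run against the data above.

```pig
-- Sketch only: 'filename' is an illustrative name for the field added by -tagFile.
raw_data = LOAD 'test_201501.txt' USING PigStorage('|', '-tagFile')
    AS (filename: chararray, col1: chararray, col2: chararray);

-- Use filename to derive yearmonth, but do not emit it in the output.
data_with_yearmonth = FOREACH raw_data GENERATE
    SUBSTRING(filename, 5, 11) AS yearmonth,
    'TEST_DATA' AS test,
    col1,
    col2;

DUMP data_with_yearmonth;
```

Running DESCRIBE raw_data; on both versions of the script should show which field each name is bound to.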