
I'm trying to filter out records with NULL or empty fields from a CSV file in Pig. I used CSVExcelStorage to load the data and skip the header. Below is the Pig script I tried.

REGISTER /usr/lib/pig/piggybank.jar;
inp = load 'test.csv' USING org.apache.pig.piggybank.storage.CSVExcelStorage(',','YES_MULTILINE','NOCHANGE','SKIP_INPUT_HEADER');
a = foreach inp generate (INT)$0 as id, (CHARARRAY)$1 as name, (CHARARRAY)$2 as dept;
b = filter a by (id is not null) AND (name is not null) AND NOT(name MATCHES '') AND (dept is not null) ;

Sample Input:

id,name,dept
1,Avy,NULL
2,,CS
3,Sam,Mech

After running DUMP b, I get the following output:

(1,Avy,NULL)
(3,Sam,Mech)

Ideally, the first record should be filtered out as well, because its dept field contains NULL. Can anyone suggest how to do this?

1 Answer


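Finally, this worked for me! The NULL in the CSV is loaded as the literal string 'NULL' rather than a Pig null, so "is not null" never catches it and it has to be compared as a string: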

b = filter a by (id is not null) AND (name is not null) AND NOT(name MATCHES '') AND (dept != 'NULL');
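If the data might also contain blank or whitespace-only values, or the text NULL in a different letter case or in the name column, a slightly more defensive variant could look like the sketch below. It is only a sketch under those assumptions, not something the original data requires: the relation name c is made up for illustration, and TRIM and UPPER are Pig's built-in string functions.

```pig
-- Sketch only: treat real nulls, empty/whitespace-only strings and the
-- literal text NULL (in any letter case) as missing values.
-- Relation "a" comes from the script above; "c" is a hypothetical name.
c = FILTER a BY (id IS NOT NULL)
    AND (name IS NOT NULL) AND (TRIM(name) != '') AND (UPPER(TRIM(name)) != 'NULL')
    AND (dept IS NOT NULL) AND (TRIM(dept) != '') AND (UPPER(TRIM(dept)) != 'NULL');
DUMP c;
```

On the sample input above this should leave only the record for Sam, since the first record has the text NULL in dept and the second has an empty name.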

Thanks, guys!