Apache Pig - How to read data from CSV file with data optionally enclosed within double quotes?
Sample data is provided below:
"Traditional",0.03,"Department, of Housing and Urban Development (HUD)",0.01
Expected output:
Traditional 0.03 Department, of Housing and Urban Development (HUD) 0.01
The example above has 4 columns: two are enclosed in double quotes, and two are unquoted floating-point values. In addition, the third column contains a comma within the data itself.
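To make the desired behavior unambiguous, here is a minimal sketch of the split I am after, written in Python with the standard `csv` module rather than in Pig. It is only an illustration of the quoting rules (a comma inside double quotes is data, not a delimiter); the helper name `split_row` is hypothetical and not part of any Pig or PiggyBank API.

```python
import csv
import io

# Sample row copied from the question.
SAMPLE = '"Traditional",0.03,"Department, of Housing and Urban Development (HUD)",0.01'

def split_row(line):
    """Split one CSV line, treating double quotes as optional field enclosures."""
    reader = csv.reader(io.StringIO(line), delimiter=',', quotechar='"')
    return next(reader)

fields = split_row(SAMPLE)

# Print each field with its position, mirroring Pig's $0..$3 notation.
for position, value in enumerate(fields):
    print(f"${position}: {value}")
```

The result should be exactly four fields, with the quotes stripped and the embedded comma preserved in the third field.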
Please suggest a Pig API (with sample code) that splits this data correctly, so that I can process the fields using positional notation, e.g. $0, $1, $2, $3.
I have explored CSVExcelStorage and CSVLoader from PiggyBank, but I am not able to split the data properly.