I receive multiple CSV files from third parties (it is hard to make them change the format). The files should all have the same columns, but sometimes one or more columns are missing. If I use the CDAP File source (reading as text) followed by a Wrangler with the following directives:
parse-as-csv :body '\\t' true
cleanse-column-names
it assumes that all files have the same columns as the first file read, and it scrambles the data of any file that has fewer or more columns than that first file.
So far I have tried configuring the File source to read each file as a blob and output bytes, followed by a Wrangler configured with these directives:
set-type :body string
parse-as-csv :body '\t' true
cleanse-column-names
But now I get no output at all (and no error), so I have no idea how to parse these non-uniform files. Is CDAP able to handle this case? If so, how?
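
To make the desired result concrete, here is a minimal sketch in plain Python (not CDAP or Wrangler) of the behaviour described above: each file is parsed against its own header row, and any column that a file lacks is filled with an empty value instead of shifting the data. The column names in `EXPECTED_COLUMNS` and the sample file contents are hypothetical, since the real schema is not shown here.

```python
import csv
import io

# Hypothetical target schema; the real column names are an assumption.
EXPECTED_COLUMNS = ["id", "name", "amount"]


def parse_tsv(text, expected=EXPECTED_COLUMNS):
    """Parse one tab-separated file using its own header row.

    Columns missing from this file come back as None; columns that
    are not part of the expected schema are dropped.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [{col: record.get(col) for col in expected} for record in reader]


# Two sample files: the second one is missing the "name" column.
full_file = "id\tname\tamount\n1\tfoo\t10\n"
partial_file = "id\tamount\n2\t20\n"

rows = parse_tsv(full_file) + parse_tsv(partial_file)
for row in rows:
    print(row)
```

The key point is that the header is resolved per file rather than once for the whole input, which is what the text-mode pipeline above does not do.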