
I receive multiple CSV files from third parties (it is hard to make them change the format). The files should all have the same columns, but sometimes one or more columns are missing. I read them with the CDAP File source (as text), followed by a Wrangler that uses these directives:

parse-as-csv :body '\t' true
cleanse-column-names

With this setup, Wrangler assumes that every file has the same columns as the first one, so the data is misaligned for any file that has fewer or more columns than the first file.
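The misalignment described here can be reproduced outside CDAP. The following is a minimal Python sketch that maps values to columns strictly by position, using the first file's header; it only illustrates positional mapping and is not a description of Wrangler's internals. The sample data and the helper `rows_by_position` are hypothetical.

```python
import csv
import io

# Hypothetical sample data: the second file lacks the "city" column.
FIRST = "id\tcity\tamount\n1\tParis\t10\n"
SECOND = "id\tamount\n2\t20\n"


def rows_by_position(text, header):
    """Map each data row onto `header` by position, ignoring the file's own header."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader)  # skip the file's own header line
    return [dict(zip(header, row)) for row in reader]


# Reuse the first file's header for the second file, as a positional parser would.
first_header = FIRST.splitlines()[0].split("\t")
misaligned = rows_by_position(SECOND, first_header)
print(misaligned)  # the amount "20" ends up under "city"
```

Because the second file has no "city" column, its "amount" value shifts one position to the left.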

So far I have tried using the File source to read each file as a blob, with the output as bytes, followed by a Wrangler configured with these directives:

set-type :body string
parse-as-csv :body '\t' true
cleanse-column-names

But now I get no output at all (and no error), so I have no idea how to parse these non-uniform files. Is CDAP able to handle this case? If so, how?


1 Answer


You can use the set-column directive to add the missing columns to the files that do not have all the required columns. More generally, I recommend going through the documentation for the Wrangler directives to find what you need to preprocess your files.
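The idea of adding the missing columns can be sketched as follows. This is plain Python, not Wrangler directive syntax, and it assumes you know the full list of expected columns in advance; `EXPECTED_COLUMNS`, `normalize_tsv`, and the sample column names are hypothetical. Each file is parsed with its own header, then every row is mapped onto the expected schema by column name.

```python
import csv
import io

# Hypothetical target schema; replace with the columns you actually expect.
EXPECTED_COLUMNS = ["id", "city", "amount"]


def normalize_tsv(text, expected=EXPECTED_COLUMNS, default=""):
    """Parse one tab-separated file and return rows that follow `expected`.

    Columns missing from the file are filled with `default`;
    columns not listed in `expected` are dropped.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for row in reader:
        normalized = {}
        for col in expected:
            value = row.get(col)  # None if the column or the value is absent
            normalized[col] = default if value is None else value
        rows.append(normalized)
    return rows


if __name__ == "__main__":
    # A file without the "city" column still yields all three columns.
    print(normalize_tsv("id\tamount\n2\t20\n"))
```

Matching by name rather than by position means the order and number of columns in each incoming file no longer matter, which is the same effect the answer aims for with set-column.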

I hope that helps.