0
votes

I'm downloading a huge CSV file from Azure Blob Storage, and I want to transform the data by adding a new column and then uploading the transformed CSV file to another location.

Since it is a huge file with around 42 columns, the application fails or restarts when trying to transform it.

Can someone suggest how I can achieve this use case?

Input in CSV:

col1,col2,col3,...,col41
10,23,asds,...,29
34,83,hdkd,...,57
and so on

Expected output in CSV:

NewCol,col1,col2,col3,...,col41
1023,10,23,asds,...,29
3483,34,83,hdkd,...,57
and so on

Thanks in advance

Could you please inform how the file is being downloaded? HTTP, SFTP, other? Thanks. - olamiral
How does it fail? Please add any error messages from the logs, complete, as text. - aled
Is your question about how to resolve the error or about how to add a column? - aled
I'm getting the data from the Azure Storage connector. I tried the transformation to add the column, but when I run it on CloudHub I get this error: "[warning] PersistedLongArray(fileName: dw-buffer-index-5.tmp is being GCed but is still open. It is going to be closed to avoid tmp leaks." and the application gets restarted. As mentioned, I have 41 columns. - user12277274
Could you please update your question with a screenshot of your flow? Thanks - olamiral

1 Answer

0
votes

To add a column to the CSV output you just need to add the field to each row. To match the expected output, prepend the new field so that NewCol appears first:

%dw 2.0
output application/csv
---
payload map ({ NewCol: $.col1 ++ $.col2 } ++ $)
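
Note that `++` here concatenates the column values as strings (for example "10" ++ "23" gives "1023", as in the expected output).

Since the failure appears to be memory-related on a huge file, it may also help to enable streaming so DataWeave processes the CSV record by record instead of loading it all into memory. A sketch, assuming a Mule 4 runtime where the payload's MIME type can be set with the `streaming` reader property and the output uses the `deferred` writer property:

%dw 2.0
output application/csv deferred=true
---
payload map ({ NewCol: $.col1 ++ $.col2 } ++ $)

With `streaming=true` set on the input (for example via the connector operation's outputMimeType, such as `application/csv; streaming=true`) and `deferred=true` on the output, each row is read, transformed, and written without buffering the whole file, which should avoid the dw-buffer temp-file pressure seen in the logs. Note that streaming requires accessing records sequentially, which `map` does.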