
I have a CSV file in GCS with hundreds of columns, each field enclosed in quotes, like below:

"John","Doe","5/15/2021 7:18:26 PM"

I need to load this into BigQuery using Data Fusion, and I have created a pipeline. My questions are:

  1. How do I trim the quotes from these columns in the Wrangler? I can't find much documentation on this beyond the basics.
  2. How do I apply this rule to all the columns in one shot?

Please guide me; any good reading on these kinds of operations would also be helpful.


1 Answer


For testing purposes I used your sample data and added a few more entries.


Remove quotes

If your data looks like this and your objective is just to remove the quotes, you can do the following:

  • Click the drop-down arrow beside body
  • Select Find and replace
  • In Find, put " and leave Replace blank
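If it helps to see what this step does, here is a plain-Python sketch of the same Find and replace operation (this is just an illustration of the substitution, not how Data Fusion implements it; the second row is an extra entry made up for testing):

```python
# Sketch of the Find-and-replace step: strip every double quote
# from each raw line (the "body" column in Wrangler).
rows = [
    '"John","Doe","5/15/2021 7:18:26 PM"',
    '"Jane","Smith","5/16/2021 8:02:11 AM"',  # hypothetical extra entry
]

# Find: "  Replace: (blank)  -> remove all double quotes in one shot
cleaned = [row.replace('"', '') for row in rows]
print(cleaned[0])  # John,Doe,5/15/2021 7:18:26 PM
```

Because the replacement runs over the whole raw line, it clears the quotes around every column at once, which answers the "one shot" part of the question.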


Your output will then look like this, with the quotes removed:

John,Doe,5/15/2021 7:18:26 PM


Parse CSV to split into columns

You can then parse the CSV into columns:

  • Click the drop-down beside body
  • Select Parse -> CSV
  • In the pop-up that appears, select "Comma" as the delimiter


This tells the Wrangler to read the data as CSV and split it on commas into separate columns. The original data will remain in the body column.
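As a side note, a plain-Python sketch of this parse step shows that a CSV reader configured with a comma delimiter and a double-quote quote character handles the quoted fields directly, so the quotes can also be consumed during parsing rather than stripped beforehand:

```python
import csv
import io

# Sketch of the Parse -> CSV step: split each line on commas.
# The csv reader honors the surrounding double quotes, so quoted
# fields come out unquoted in a single pass.
data = '"John","Doe","5/15/2021 7:18:26 PM"\n'
reader = csv.reader(io.StringIO(data), delimiter=',', quotechar='"')
row = next(reader)
print(row)  # ['John', 'Doe', '5/15/2021 7:18:26 PM']
```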


To delete body:

  • Select body by ticking the check box at the right
  • Click the drop down beside body
  • Select Delete column
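Conceptually, the delete step just drops the raw field and keeps the parsed columns. A minimal sketch in Python, treating each record as a dict (the body_1, body_2, ... column names are assumed here for illustration):

```python
# Sketch of the Delete-column step: after parsing, drop the
# original raw field ("body") and keep only the parsed columns.
record = {
    "body": '"John","Doe","5/15/2021 7:18:26 PM"',
    "body_1": "John",
    "body_2": "Doe",
    "body_3": "5/15/2021 7:18:26 PM",
}
record.pop("body")
print(sorted(record))  # ['body_1', 'body_2', 'body_3']
```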


Your data should now show only the parsed columns (John, Doe, and the timestamp), with the original body column gone.