
I have a few hundred files representing 400 GB of data in CSV format, with the specifications below:

  • enclosure: double quote
  • separator: comma
  • escape character: backslash

My data could look like this:

a,30,"product, A","my product : \"good product\""

I think BigQuery evaluates the data as:

col 1: a
col 2: 30
col 3: product
col 4: A
col 5: my product : "good product"

and I want:

col 1: a
col 2: 30
col 3: product, A
col 4: my product : "good product"

Is it possible to load this kind of file without using Dataflow or Dataprep? Here is what I tried:

bq load --noreplace --source_format=CSV --max_bad_records=1000000 --allow_jagged_rows ods.my_file gs://file/file.csv.gz

My data were shifted and BigQuery didn't load some rows. I get errors like:

Error while reading data, error message: Could not parse 'XXX' as int for field (position 49) starting at location 2121

Data between close double quote (") and field separator.

Are these CSVs exported from MySQL? Check medium.com/google-cloud/… - Felipe Hoffa
Thanks Felipe, it's an extract from another system which I can't change. Is it not possible to define the escape character? - Pierre.c
Probably the quickest path will be to use a CLI tool to transform the escaping - Felipe Hoffa
Agree with Felipe. I recently did a similar transformation. This works on Mac, but the syntax is slightly different on Linux: sed -i '' 's/\"\"STRING_WITH_EXTRA_QUOTE_HERE\"/\"STRING_WITH_EXTRA_QUOTE_HERE\"/g' YOUR_FILE.csv - Jas
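
Building on those comments, a rough sketch of a pre-load rewrite for the gzipped files: it converts the backslash-escaped quotes (\") into standard CSV doubled quotes (""), which BigQuery parses natively. This assumes backslashes appear only in front of quotes; file names and the destination path are just taken from the question:

# decompress, rewrite \" as "", recompress
gunzip -c file.csv.gz | sed 's/\\"/""/g' | gzip > file_fixed.csv.gz
# copy the fixed file back to Cloud Storage (hypothetical path)
gsutil cp file_fixed.csv.gz gs://file/file_fixed.csv.gz

After that, the bq load command above should parse the fields without shifting.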

1 Answer


Try the options in the comments, or, to parse inside BigQuery instead, load each row raw and run a UDF like this:

SELECT csv.cols[SAFE_OFFSET(0)] a
  , csv.cols[SAFE_OFFSET(1)] b
  , csv.cols[SAFE_OFFSET(2)] c
  , csv.cols[SAFE_OFFSET(3)] d
  , csv.cols[SAFE_OFFSET(4)] e
  , csv.cols[SAFE_OFFSET(5)] f
FROM (
  SELECT fhoffa.x.parse_csv('hello,"dealing here, with", some \" quoted words, all is fine, good one?,"even a , in between"') csv
)

Ref: https://medium.com/@hoffa/new-in-bigquery-persistent-udfs-c9ea4100fd83
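
To load each row raw in the first place, one option is to disable quoting and use a field delimiter that never occurs in the data, so the whole line lands in a single STRING column. A minimal sketch, assuming the rows contain no tab characters (the raw table name is hypothetical):

bq load --source_format=CSV \
  --field_delimiter='\t' \
  --quote='' \
  ods.my_file_raw \
  gs://file/file.csv.gz \
  line:STRING

Each loaded line can then be passed to the UDF above to split it into columns.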
