
I am trying to load data into BigQuery using the bq command-line tool. The data has the following schema (TSV):

time_stamp:INTEGER
id:INTEGER
url:STRING (-- unused/ignore)
domain:STRING
keyword:STRING
normalized_key:STRING (-- comma separated list)
is_natural:BOOLEAN (as "t"/"f")
category_code:STRING
p_id:STRING

But I am getting the following errors:

File: 0 / Line:120642 / Field:5: Data between close double quote
(") and field separator: field starts with: <massive >
File: 0 / Line:127690 / Field:1: Value cannot be converted to
expected type.

My understanding is:

  • File: 0 / Line:120642 / Field:5: Data between close double quote (") and field separator: field starts with: --> This is because the values of Field:5 are comma-separated lists.

  • File: 0 / Line:127690 / Field:1: Value cannot be converted to expected type. --> The actual field values are of a different type than expected.

How do I make BigQuery read the comma-separated list as the value of Field:5, and ignore the records whose field values are of a different type than expected?

Found this for the second error: --max_bad_records=xx, but it didn't work. – roy
Replace xx with a big number. That's what I do :) – Felipe Hoffa
Before that, to work around the first error I removed the double quotes using sed. – roy
Now I want to delete the records that start with a non-integer in the first column. – roy
Another option: import everything as STRING, then filter and cast inside BigQuery later. – Felipe Hoffa
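The two workarounds from the comments above (strip double quotes with sed, then drop rows whose first column is not an integer) can be sketched as a single shell pre-filter. The file names data.tsv and clean.tsv are hypothetical placeholders:

```shell
# Remove stray double quotes, then keep only rows whose first
# tab-separated field is a plain integer (hypothetical file names).
sed 's/"//g' data.tsv | awk -F'\t' '$1 ~ /^[0-9]+$/' > clean.tsv
```

Running this before bq load avoids relying on --max_bad_records to discard the malformed rows.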

1 Answer


Try setting the quote char to '\0' or something that doesn't appear in the table. TSV files don't usually quote fields (i.e. you won't have a line that looks like a\t"foo bar"\tbaz) so this should likely be ok unless you have tab chars that should be quoted (which is unlikely in TSV).
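Putting that together, a sketch of the load command under these assumptions (the dataset, table, and file names are hypothetical; --quote '' sets an empty quote character so embedded " characters pass through, and --max_bad_records tolerates rows that fail type conversion):

```shell
# Hypothetical mydataset.mytable and data.tsv; adjust to your project.
# --quote '' disables quote handling; --max_bad_records skips bad rows.
bq load \
  --source_format=CSV \
  --field_delimiter='\t' \
  --quote '' \
  --max_bad_records=1000 \
  mydataset.mytable \
  data.tsv \
  time_stamp:INTEGER,id:INTEGER,url:STRING,domain:STRING,keyword:STRING,normalized_key:STRING,is_natural:BOOLEAN,category_code:STRING,p_id:STRING
```

This is a command-line fragment, not a runnable test: it requires a Google Cloud project and credentials, and the exact flag spellings should be checked against your installed bq version.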