
I am having an issue loading larger files into Google BigQuery. When I load a file with fewer than 1000 lines, it loads without any error, but when I load a file with more than 10000 lines I get a load error.

BigQuery error in load operation: Error processing job '......': Too many errors encountered.

The job is to export data from Redshift and load it into BigQuery. Here are the steps I am following:

1. Using the "Unload" command, I export the Redshift table (which has more than 160 columns) as a CSV file to S3
2. Transferring the data to Google Cloud Storage
3. Creating a table in BigQuery, specifying the Google Cloud Storage bucket as the data source.
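For reference, the steps above can be sketched roughly as follows. All bucket, table, role, and dataset names here are hypothetical placeholders, and the `UNLOAD` options shown are just one plausible combination, not necessarily the ones the asker used:

```python
# Sketch of the Redshift -> S3 -> GCS -> BigQuery pipeline.
# Every name below (buckets, tables, IAM role, dataset) is a placeholder.

UNLOAD_SQL = (
    "UNLOAD ('SELECT * FROM my_table') "
    "TO 's3://my-s3-bucket/export/my_table_' "
    "IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-unload' "
    "DELIMITER ',' ADDQUOTES ESCAPE ALLOWOVERWRITE;"
)

def build_commands():
    """Return the shell commands for steps 2 and 3 as argument lists."""
    transfer = [
        # Step 2: copy the unloaded CSV parts from S3 to a GCS bucket
        # (gsutil can read s3:// URLs directly if AWS credentials are set).
        "gsutil", "cp", "s3://my-s3-bucket/export/*", "gs://my-gcs-bucket/export/",
    ]
    load = [
        # Step 3: load the CSV from GCS into a BigQuery table.
        # --max_bad_records lets a few bad rows through, which also helps
        # surface the per-row error messages in the job details.
        "bq", "load", "--source_format=CSV", "--max_bad_records=10",
        "mydataset.my_table", "gs://my-gcs-bucket/export/*",
    ]
    return [transfer, load]

if __name__ == "__main__":
    for cmd in build_commands():
        print(" ".join(cmd))
```

Inspecting the failed job's details (e.g. with `bq show -j` on the job ID from the error message) usually reveals which rows and columns BigQuery rejected, which is the quickest way to see whether this is a CSV-format mismatch.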

Note: I have tried almost all of the options of the Redshift "Unload" command, but it seems BigQuery does not understand the file format for bigger files.

Can anyone suggest what could be the issue here?

@josh-haberman: can you please look into my issue? - andy

1 Answer


Since the task at hand is to export from Redshift to BigQuery, have you tried a tool that automates this process?

https://github.com/iconara/bigshift

From their README:

The CSV produced by Redshift's UNLOAD can't be loaded into BigQuery no matter what options you specify on either end. Redshift can quote all fields or none, but BigQuery doesn't allow non-string fields to be quoted. The format of booleans and timestamps are not compatible, and they expect quotes in quoted fields to be escaped differently, to name a few things.

This means that a lot of what BigShift does is make sure that the data that is dumped from Redshift is compatible with BigQuery. To do this it reads the table schema and translates the different datatypes while the data is dumped. Quotes are escaped, timestamps formatted, and so on.
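As an illustration of the kind of translation the README describes, here is a rough sketch (not BigShift's actual code) that rewrites one Redshift-unloaded CSV row into a form BigQuery accepts: it re-escapes Redshift's backslash-escaped quotes as doubled quotes, converts `t`/`f` booleans to `true`/`false`, and quotes only string columns. Timestamp reformatting is omitted for brevity, and `string_cols` stands in for the schema information a real tool would read from the table:

```python
import csv
import io

def redshift_row_to_bigquery(line, string_cols):
    """Translate one CSV line unloaded with ADDQUOTES + ESCAPE.

    Redshift escapes embedded quotes as \" inside quoted fields, while
    BigQuery expects them doubled (""). Redshift writes booleans as t/f,
    but BigQuery expects true/false and rejects quotes around non-string
    fields. `string_cols` is the set of column indexes holding strings.
    """
    # Parse the Redshift-style line: all fields quoted, backslash escaping.
    reader = csv.reader(io.StringIO(line), quotechar='"',
                        escapechar='\\', doublequote=False)
    fields = next(reader)

    out = []
    for i, value in enumerate(fields):
        if i in string_cols:
            # Re-quote strings, doubling embedded quotes (RFC 4180 style).
            out.append('"' + value.replace('"', '""') + '"')
        elif value == 't':
            out.append('true')
        elif value == 'f':
            out.append('false')
        else:
            out.append(value)  # numbers etc. stay unquoted
    return ','.join(out)
```

For example, a Redshift line like `"say \"hi\"","t","42"` (column 0 being a string) would come out as `"say ""hi""",true,42`, which is the shape BigQuery's CSV loader expects.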