
I'm syncing a set of tables from MySQL into BigQuery using Spark and a simple wrapper library created by the folks at AppsFlyer (https://github.com/appsflyer-dev/spark-bigquery). This approach works like a charm for all of my tables except one. When importing that table, I get the following error back from BigQuery:

Exception in thread "main" java.io.IOException: JSON table encountered too many errors, giving up. Rows: 1; errors: 1.
    at com.google.cloud.hadoop.io.bigquery.BigQueryUtils.waitForJobCompletion(BigQueryUtils.java:95)
    at com.appsflyer.spark.bigquery.BigQueryClient.com$appsflyer$spark$bigquery$BigQueryClient$$waitForJob(BigQueryClient.scala:129)
    at com.appsflyer.spark.bigquery.BigQueryClient.load(BigQueryClient.scala:100)

The table schema on the MySQL side looks like this:

CREATE TABLE mytable (
  id bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  field1_id int(11) NOT NULL,
  created_at datetime(6) DEFAULT NULL,
  updated_at datetime(6) DEFAULT NULL,
  field2_id int(11) NOT NULL,
  hidden_at datetime(6) DEFAULT NULL,
  deleted_at datetime(6) DEFAULT NULL,
  field3 tinyint(4) NOT NULL,
  field4 tinyint(1) DEFAULT '1',
  PRIMARY KEY (id)
) ENGINE=InnoDB AUTO_INCREMENT=10193389 DEFAULT CHARSET=utf8mb4;
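One detail of this schema worth keeping in mind when it is mapped to BigQuery: bigint(20) unsigned holds values up to 2^64 - 1, while BigQuery's INTEGER type is a signed 64-bit integer, so whatever schema the loader generates has to pick some representation for id. The minimal sketch below, with hypothetical constant and function names, only makes that range comparison concrete; nothing in the thread says id was the column that failed.

```python
# MySQL BIGINT UNSIGNED range: 0 .. 2**64 - 1
MYSQL_BIGINT_UNSIGNED_MAX = 2**64 - 1

# BigQuery INTEGER (signed 64-bit) range: -2**63 .. 2**63 - 1
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def fits_int64(value: int) -> bool:
    """Return True if value can be stored in a signed 64-bit integer column."""
    return INT64_MIN <= value <= INT64_MAX


if __name__ == "__main__":
    # The table's current AUTO_INCREMENT value fits comfortably...
    print(fits_int64(10193389))  # True
    # ...but the full unsigned range of the MySQL column does not.
    print(fits_int64(MYSQL_BIGINT_UNSIGNED_MAX))  # False
```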

I'm at a loss to understand why this one table is causing a problem.

Without seeing the payload sent in the API call, it's hard to identify a bug in some conversion app. - Pentium10

Answer


Problem solved: there was a mismatch between the JSON table schema that was being sent and the JSON-encoded data that was being sent. This was fixed with this PR:

https://github.com/appsflyer-dev/spark-bigquery/pull/8

The library's code was creating the table with a text (STRING) column, but Spark's JSON encoding serialized the data for that column as a number. BigQuery's load job rejected the row and failed with the error posted above. A more descriptive error message would have been nice.
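Because BigQuery only reported "Rows: 1; errors: 1", a mismatch like this is easier to find by checking the rows against the declared schema before loading. The following is a minimal sketch, not the library's code and not what the PR changed. It assumes the schema is a list of BigQuery-style field dicts with name, type and mode keys and that the payload is newline-delimited JSON. The helper names find_type_mismatches and _json_type_ok, and the column text_col, are hypothetical. The check is deliberately strict and does not model any coercion BigQuery itself may perform.

```python
import json
from typing import Any, Dict, List, Tuple


def _json_type_ok(bq_type: str, value: Any) -> bool:
    """Strict check that a decoded JSON value matches a declared BigQuery type."""
    if value is None:
        return True  # NULL handling depends on the column mode; not checked here
    if bq_type == "STRING":
        return isinstance(value, str)
    if bq_type == "INTEGER":
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(value, int) and not isinstance(value, bool)
    if bq_type == "FLOAT":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if bq_type == "BOOLEAN":
        return isinstance(value, bool)
    if bq_type == "TIMESTAMP":
        return isinstance(value, str)  # this sketch only accepts string timestamps
    return True  # types outside this sketch are not checked


def find_type_mismatches(
    schema: List[Dict[str, str]], ndjson: str
) -> List[Tuple[int, str, str, str]]:
    """Return (line_number, column, declared_type, json_type) for each mismatch."""
    declared_types = {field["name"]: field["type"] for field in schema}
    mismatches = []
    for line_number, line in enumerate(ndjson.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        row = json.loads(line)
        for column, value in row.items():
            declared = declared_types.get(column)
            if declared is not None and not _json_type_ok(declared, value):
                mismatches.append((line_number, column, declared, type(value).__name__))
    return mismatches


if __name__ == "__main__":
    # Hypothetical schema: text_col is declared as STRING...
    schema = [
        {"name": "id", "type": "INTEGER", "mode": "REQUIRED"},
        {"name": "text_col", "type": "STRING", "mode": "NULLABLE"},
    ]
    # ...but the row encodes it as a JSON number.
    payload = '{"id": 1, "text_col": 1}\n'
    print(find_type_mismatches(schema, payload))  # [(1, 'text_col', 'STRING', 'int')]
```

Run against the payload a loader builds, a non-empty result names the line, column, declared type and JSON type involved, which is the detail the BigQuery error above leaves out.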