I am trying to write to BigQuery using Google Dataflow. However, some of the data is corrupt, so the values I am trying to write to a column of a BigQuery table do not match that column's datatype. As a result, the job logs show errors like the one below:
BigQuery job "dataflow_job_6154554826811450962-B" in project "analytics-repo-dev" finished with error(s): errorResult: JSON table encountered too many errors, giving up. Rows: 1951; errors: 1., error: JSON table encountered too many errors, giving up. Rows: 1951; errors: 1., error: JSON parsing error in row starting at position 32470424514 at file: gs://abhishek_bucket/staging_bucket/6154554826811451567/dax-tmp-2016-08-21_08_10_40-11073983896195810394-S122-1-e7a57fb7a3ac697f/-shard-00000-of-00001-try-2ac1f1192effdfbb-endshard.json. Could not parse '20-259228628. ' as a timestamp. Required format is YYYY-MM-DD HH:MM[:SS[.SSSSSS]] Field: Ts; Value: 20-259228628.
The problem is that these are "errors" and not "exceptions", so the moment Dataflow receives them, the entire Dataflow job stops. I need a mechanism to "catch" them. Had these messages been exceptions, I could have handled them separately and the job would not have failed completely.
So is there a way to get the actual exceptions that are raised when I try to write bad data to BigQuery in Google Dataflow?
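To illustrate the kind of workaround I am looking for, here is a rough sketch of what I would do today: validate the offending field myself in a ParDo before the write, so that malformed values never reach BigQuery at all. This assumes the Dataflow Java SDK 1.x, rows as TableRow objects, and the "Ts" column named in the error above; the class and field names are my own, not from the SDK.

import com.google.api.services.bigquery.model.TableRow;
import com.google.cloud.dataflow.sdk.transforms.DoFn;

import java.text.ParseException;
import java.text.SimpleDateFormat;

// Hypothetical pre-write validation step: rows whose "Ts" field does not parse
// as a BigQuery timestamp are dropped here instead of failing the load job.
public class ValidateTsFn extends DoFn<TableRow, TableRow> {
  @Override
  public void processElement(ProcessContext c) {
    TableRow row = c.element();
    Object ts = row.get("Ts");   // "Ts" is the column named in the error message
    if (ts == null) {
      return;                    // drop rows with no timestamp at all
    }
    try {
      // BigQuery expects YYYY-MM-DD HH:MM[:SS[.SSSSSS]]; a value like
      // "20-259228628." throws ParseException here.
      new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").parse(ts.toString());
      c.output(row);             // valid row flows on to BigQueryIO.Write
    } catch (ParseException e) {
      // invalid row is dropped (or could be logged / sent elsewhere)
    }
  }
}

I could wire this in with something like rows.apply(ParDo.of(new ValidateTsFn())) ahead of the BigQueryIO.Write step, but this only covers the one field I already know is bad. What I really want is a general way to catch the failures BigQuery itself reports, rather than re-implementing its type checks.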