0 votes

I am reading a Hive table and writing it to a Teradata table (column to column, no transformations).

 try {
   val df = spark.table("Hive Table")
   df.write.mode(SaveMode.Append).jdbc(jdbcURL, "TD Table", properties)
 } catch {
   case ex: java.sql.SQLException =>
     var e: java.sql.SQLException = ex
     while (e != null) { println(e.getMessage); e = e.getNextException }
 }

It runs for a while and fails with: [Teradata Database] [TeraJDBC 16.20.00.06] [Error 6706] [SQLState HY000] The string contains an untranslatable character

If I just insert the date/numeric columns, it works fine.

I have tried making Teradata table columns as UNICODE with no success.

Question is: how do I identify the errant record/column? There are hundreds of millions of rows and hundreds of columns, so running one at a time is not a viable solution. I have to either a) identify the record/column, or b) force a translation using whatever (junk) characters are needed.
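One way to pursue option (a) without Teradata in the loop is to test each string value for encodability before writing. A minimal plain-Scala sketch, assuming the target columns are CHARACTER SET LATIN and that ISO-8859-1 is a close enough stand-in for Teradata's LATIN repertoire (both are assumptions; field names are placeholders):

```scala
import java.nio.charset.Charset

// Report which fields of a record hold characters the target charset
// cannot represent. ISO-8859-1 approximates Teradata's LATIN character
// set here (an assumption, not a documented mapping).
def untranslatableFields(row: Map[String, String],
                         charset: String = "ISO-8859-1"): Seq[String] = {
  val encoder = Charset.forName(charset).newEncoder()
  row.collect {
    case (name, value) if value != null && !encoder.canEncode(value) => name
  }.toSeq
}
```

Wrapped in a UDF, a check like this could flag suspect rows and columns in a single pass over the DataFrame instead of reloading column subsets.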

I'm far from a Spark expert, but can't you encode your string in Spark however you want? In theory, you should be able to encode it as UTF-8. It will probably fail on the rows you can't insert into Teradata, but maybe you can print them out as part of your exception handling or something? - Andrew
You can add a trigger on insert and write a partial record (with an id, for example) to a log table. This will allow you to zoom in on the record in question. - access_granted
In general you cannot identify errors - some things continue without duplicate-key inserts being mentioned, for example - thebluephantom

1 Answer

0 votes

You can at least try column by column. But in my experience this error is caused about 80% of the time by pandas NaN / Null / None values. The other "usual suspect" is a column mix-up, so that you are (unintentionally) trying to put a string into a numeric column.

To check which column is the bad one, it's worth testing via a "binary search"-ish approach.
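The "binary search" idea above can be sketched generically: given an ordered list of columns and a predicate that attempts a test load with just that subset (e.g. `df.select(cols: _*).write.jdbc(...)` on a sample), you can narrow down a single failing column in O(log n) load attempts. This assumes exactly one bad column; the predicate and column names below are hypothetical:

```scala
// Narrow down one failing column by repeatedly test-loading halves of
// the column list. Returns None if every subset loads cleanly.
def findBadColumn(cols: Vector[String],
                  loadSucceeds: Vector[String] => Boolean): Option[String] = {
  if (cols.isEmpty) None
  else if (cols.size == 1) {
    if (loadSucceeds(cols)) None else Some(cols.head)
  } else {
    val (left, right) = cols.splitAt(cols.size / 2)
    if (!loadSucceeds(left)) findBadColumn(left, loadSucceeds)
    else findBadColumn(right, loadSucceeds)
  }
}
```

With several bad columns you would rerun after fixing each one found, or recurse into both halves when both fail.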

Besides this, Teradata Studio delivers a nice GUI to move data from Hadoop to Teradata called Smart Loader. There you can see the column mappings and define some Null handling (http://downloads.teradata.com/tools/articles/smart-loader-for-hadoop).

Horst