1 vote

sqoop import job failed caused by: java.sql.SQLException: Numeric Overflow

I have to load an Oracle table; it has a column of type NUMBER in Oracle, without scale, and it gets converted to DOUBLE in Hive. This is the biggest possible size for both Oracle and Hive numeric values. The question is: how do I overcome this error?

"NUMBER without scale" >> meaning without neither precision nor scale i.e. NUMBER i.e. NUMBER(38,*)?Samson Scharfrichter
Yes, just NUMBER. It would be nice to solve this on the Hadoop side, without talking to the Oracle people to fix the value. I can't find anything; Sqoop doesn't allow changing the data type during ingestion. – T_man
"Sqoop doesn't allow to change data type" -- really? Can you point to a source that contradicts the official documentation at sqoop.apache.org/docs/1.4.6/…?Samson Scharfrichter
I tried a free-form SQL query to select this column with TO_CHAR(mycolumn) and got an error: caused by: java.sql.SQLSyntaxErrorException: ORA-01722: invalid number. According to Oracle: "ORA-01722 invalid number Cause: The attempted conversion of a character string to a number failed because the character string was not a valid numeric literal. Only numeric fields or character fields containing numeric data may be used in arithmetic functions or expressions..." It failed on all mappers, not just on one mapper with a bad value. So I concluded... – T_man

2 Answers

0 votes

OK, my first answer assumed that your Oracle data was good, and your Sqoop job needed specific configuration to cope with NUMBER values.

But now I suspect that your Oracle data contains garbage, and specifically NaN values, as a result of calculation errors.
See this post, for example: When/Why does Oracle adds NaN to a row in a database table

And Oracle even has distinct "Not-a-Number" categories to represent "infinity", to make things even more complicated.

But on the Java side, BigDecimal does not support NaN; from the documentation, in all conversion methods...

Throws:
  NumberFormatException - if value is infinite or NaN.

Note that the JDBC driver masks that exception and reports a "Numeric Overflow" instead, which makes things even more complicated to debug...


See also: Solr Numeric Overflow (from Oracle)

In the end, you will have to "mask" these NaN values with the Oracle function NANVL, using a free-form query in Sqoop:

$ sqoop import --query 'SELECT x, y, NANVL(z, Null) AS z FROM wtf WHERE $CONDITIONS' 
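If you want to confirm that NaN values are indeed the culprit before re-running the import, here is a minimal sketch with sqoop eval, reusing NANVL to flag the offending rows (the connection string, credentials, table wtf and column z are placeholders, not taken from your job):

# count rows where z is NaN: NANVL maps NaN to NULL, so the row matches only when z itself is not NULL
$ sqoop eval \
    --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
    --username scott -P \
    --query "SELECT COUNT(*) FROM wtf WHERE NANVL(z, NULL) IS NULL AND z IS NOT NULL"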
0 votes

Edit: this answer assumed that your Oracle data was good, and that your Sqoop job needed specific configuration to cope with NUMBER values. That was not the case; see my other answer.


From the Oracle documentation about "Copying Oracle tables to Hadoop" (within their Big Data Appliance), section "Creating a Hive table" > "About datatype conversion"...

NUMBER

  • INT when the scale is 0 and the precision is less than 10
  • BIGINT when the scale is 0 and the precision is less than 19
  • DECIMAL when the scale is greater than 0 or the precision is greater than 19

So you must find out the actual range of values in your Oracle table; then you will be able to declare the target Hive column as a BIGINT, a DECIMAL(38,0), a DECIMAL(22,7), or whatever fits.
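For example, a minimal sketch with sqoop eval to probe the column (connection string, credentials, table wtf and column z are placeholders; the LENGTH of the truncated absolute value is only a rough proxy for the number of integer digits):

# rough probe of the value range and of the integer digit count of column z
$ sqoop eval \
    --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
    --username scott -P \
    --query "SELECT MIN(z), MAX(z), MAX(LENGTH(TRUNC(ABS(z)))) FROM wtf"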


Now, from the Sqoop documentation about "sqoop - import" > "Controlling type mapping"...

Sqoop is preconfigured to map most SQL types to appropriate Java or Hive representatives. However the default mapping might not be suitable for everyone and might be overridden by --map-column-java (for changing mapping to Java) or --map-column-hive (for changing Hive mapping).

Sqoop is expecting comma separated list of mappings (...) for example
$ sqoop import ... --map-column-java id=String,value=Integer


Caveat #1: according to SQOOP-2103, you need Sqoop V1.4.7 or above to use that option with Decimal, and you need to "URL Encode" the comma, e.g. for DECIMAL(22,7)
--map-column-hive "wtf=Decimal(22%2C7)"
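Put together, a hedged sketch of what the full import might look like under that caveat (connection string, credentials and names are placeholders; the column name passed to --map-column-hive must match the name as Sqoop sees it, which for Oracle is usually uppercase):

$ sqoop import \
    --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
    --username scott -P \
    --table WTF \
    --map-column-hive "Z=Decimal(22%2C7)" \
    --hive-import \
    --hive-table wtf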

Caveat #2: in your case, it is not clear whether the overflow occurs when reading the Oracle value into a Java variable, or when writing the Java variable into the HDFS file -- or even elsewhere. So maybe --map-column-hive will not be sufficient.
And again, according to that post, which points to SQOOP-1493, --map-column-java does not support the Java type java.math.BigDecimal until at least Sqoop V1.4.7 (and it's not even clear whether it is supported in that specific option, or whether it is expected as BigDecimal or java.math.BigDecimal).


So I would advise just hiding the issue by converting your rogue Oracle column to a String at read time.
Cf. the documentation about "sqoop - import" > "Free-form Query Imports"...

Instead of using the --table, --columns and --where arguments, you can specify a SQL statement with the --query argument (...) Your query must include the token $CONDITIONS (...) For example:
$ sqoop import --query 'SELECT a.*, b.* FROM a JOIN b ON a.id=b.id WHERE $CONDITIONS' ...

In your case: SELECT x, y, TO_CHAR(z) AS z FROM wtf, plus the appropriate format mask inside TO_CHAR so that you don't lose any information due to rounding.
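For illustration, a minimal sketch of such a free-form import (connection string, credentials, table wtf, columns x, y, z, the split column and the target directory are placeholders; the Oracle 'TM9' text-minimum format mask is just one assumption for keeping all digits):

# note: when the query is wrapped in double quotes, $CONDITIONS must be escaped as \$CONDITIONS
$ sqoop import \
    --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
    --username scott -P \
    --query "SELECT x, y, TO_CHAR(z, 'TM9') AS z FROM wtf WHERE \$CONDITIONS" \
    --split-by x \
    --target-dir /user/somebody/wtf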