Sqoop import job failed, caused by: java.sql.SQLException: Numeric Overflow
I have to load an Oracle table; it has a column of type NUMBER in Oracle, without scale, and it is converted to DOUBLE in Hive. These are the largest possible numeric types in both Oracle and Hive. The question is: how can I overcome this error?
2 Answers
OK, my first answer assumed that your Oracle data was good, and your Sqoop job needed specific configuration to cope with NUMBER values.
But now I suspect that your Oracle data contains garbage, and specifically NaN values, as a result of calculation errors.
See that post for example: When/Why does Oracle adds NaN to a row in a database table
And Oracle even has distinct "Not-a-Number" categories to represent "infinity", to make things even more complicated.
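If you want to confirm that hypothesis before changing anything, a quick sanity check can be run with sqoop eval -- just a sketch, where the connection string, credentials, table and column names are placeholders, and the IS NAN / IS INFINITE conditions assume the troublesome column z is (or is read as) a binary floating-point value (BINARY_FLOAT / BINARY_DOUBLE):
$ sqoop eval --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
    --username scott --password-file /user/scott/.pw \
    --query "SELECT COUNT(*) FROM wtf WHERE z IS NAN OR z IS INFINITE"
If the count comes back non-zero, the NANVL workaround below is the way to go.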
But on the Java side, BigDecimal does not support NaN -- from the documentation, in all conversion methods:
Throws: NumberFormatException - if value is infinite or NaN.
Note that the JDBC driver masks that exception and displays NumericOverflow instead, to make things more complicated to debug...
See also: Solr Numeric Overflow (from Oracle)
In the end, you will have to "mask" these NaN values with the Oracle function NANVL, using a free-form query in Sqoop:
$ sqoop import --query 'SELECT x, y, NANVL(z, Null) AS z FROM wtf WHERE $CONDITIONS'
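For the record, a fuller sketch of such a job, with placeholder connection details and paths (when using --query, Sqoop also needs --target-dir, plus --split-by unless you import with a single mapper):
$ sqoop import \
    --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
    --username scott --password-file /user/scott/.pw \
    --query 'SELECT x, y, NANVL(z, Null) AS z FROM wtf WHERE $CONDITIONS' \
    --split-by x \
    --target-dir /user/scott/wtf \
    --hive-import --hive-table wtf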
Edit: this answer assumed that your Oracle data was good, and your Sqoop job needed specific configuration to cope with NUMBER values. That was not the case; see the alternate answer.
From the Oracle documentation about "Copying Oracle tables to Hadoop" (within their Big Data appliance), section "Creating a Hive table" > "About datatype conversion"...
NUMBER
- INT when the scale is 0 and the precision is less than 10
- BIGINT when the scale is 0 and the precision is less than 19
- DECIMAL when the scale is greater than 0 or the precision is greater than 19
So you must find out the actual range of values in your Oracle table; then you will be able to specify the target Hive column as either a BIGINT, a DECIMAL(38,0), a DECIMAL(22,7), or whatever fits.
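One possible way to check is through sqoop eval -- a sketch with placeholder connection details, keeping in mind that the Oracle dictionary stores table and column names in upper case:
$ sqoop eval --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
    --username scott --password-file /user/scott/.pw \
    --query "SELECT data_precision, data_scale FROM all_tab_columns WHERE table_name = 'WTF' AND column_name = 'Z'"
$ sqoop eval ... --query "SELECT MIN(z), MAX(z) FROM wtf"
A plain NUMBER column will report NULL precision and scale in all_tab_columns, in which case only the actual MIN/MAX values can tell you which Hive type is safe.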
Now, from the Sqoop documentation about "sqoop - import" > "Controlling type mapping"...
Sqoop is preconfigured to map most SQL types to appropriate Java or Hive representatives. However the default mapping might not be suitable for everyone and might be overridden by --map-column-java (for changing mapping to Java) or --map-column-hive (for changing Hive mapping). Sqoop is expecting comma separated list of mappings (...) for example
$ sqoop import ... --map-column-java id=String,value=Integer
Caveat #1: according to SQOOP-2103, you need Sqoop V1.4.7 or above to use that option with Decimal, and you need to "URL Encode" the comma, e.g. for DECIMAL(22,7)
--map-column-hive "wtf=Decimal(22%2C7)"
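Put in context, that could look like this sketch (placeholder connection details; Z stands for the offending column):
$ sqoop import --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
    --username scott --password-file /user/scott/.pw \
    --table WTF \
    --map-column-hive "Z=DECIMAL(22%2C7)" \
    --hive-import --hive-table wtf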
Caveat #2: in your case, it is not clear whether the overflow occurs when reading the Oracle value into a Java variable, or when writing the Java variable into the HDFS file -- or even elsewhere. So maybe --map-column-hive will not be sufficient.
And again, according to that post which points to SQOOP-1493, --map-column-java does not support the Java type java.math.BigDecimal until at least Sqoop V1.4.7 (and it's not even clear whether it is supported in that specific option, and whether it is expected as BigDecimal or java.math.BigDecimal).
So I would advise just hiding the issue by converting your rogue Oracle column to a String at read time.
Cf. documentation about "sqoop - import" > "Free-form Query Imports"...
Instead of using the --table, --columns and --where arguments, you can specify a SQL statement with the --query argument (...) Your query must include the token $CONDITIONS (...) For example:
$ sqoop import --query 'SELECT a.*, b.* FROM a JOIN b ON a.id=b.id WHERE $CONDITIONS' ...
In your case: SELECT x, y, TO_CHAR(z) AS z FROM wtf, plus the appropriate formatting inside TO_CHAR so that you don't lose any information due to rounding.
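For instance, a sketch with an explicit format mask -- the connection details are placeholders, the mask assumes z holds whole numbers of up to 38 digits (add a D and fractional 9s if it doesn't), and since the query is wrapped in double quotes here, $CONDITIONS has to be escaped:
$ sqoop import ... \
    --query "SELECT x, y, TO_CHAR(z, 'FM99999999999999999999999999999999999999') AS z FROM wtf WHERE \$CONDITIONS" \
    --split-by x --target-dir /user/scott/wtf \
    --hive-import --hive-table wtf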
Comment: NUMBER, i.e. NUMBER(38,*)? – Samson Scharfrichter