
ISSUE

I'm using Sqoop to fetch data from Oracle and put it into HDFS. Unlike the other basic datatypes, SDO_GEOMETRY is, as I understand it, meant for spatial data.

My Sqoop job fails while fetching the SDO_GEOMETRY datatype.

I need help importing the column Shape, which has the SDO_GEOMETRY datatype, from Oracle to HDFS.

I have more than 1,000 tables that contain the SDO_GEOMETRY datatype. How can I handle this datatype in general when the Sqoop imports happen?

I have tried --map-column-java and --map-column-hive, but I still get the error.

Error:
ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Hive does not support the SQL type for column SHAPE

SQOOP COMMAND

Below is the Sqoop command that I have:

sqoop import \
  --connect 'jdbc:oracle:thin:XXXXX/xxxxx@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(Host=xxxxxxx)(Port=1521))(CONNECT_DATA=(SID=xxxxx)))' \
  -m 1 \
  --table PROD.PLAN \
  --create-hive-table --hive-import \
  --hive-table PROD.PLAN1 --hive-overwrite \
  --fields-terminated-by '^' \
  --null-string '\\\\N' --null-non-string '\\\\N' \
  --target-dir test/PLAN1 --delete-target-dir \
  --map-column-hive SE_XAO_CAD_DATA=BINARY \
  --map-column-java SHAPE=String \
  --map-column-hive SHAPE=STRING

3 Answers


The default type mapping that Sqoop provides between relational databases and Hadoop does not cover SDO_GEOMETRY, which is why your Sqoop job fails. You need to override the mapping, as geometry datatypes are not supported by Sqoop.

Use the below parameter in your Sqoop job.

Syntax: --map-column-java col1=javaDatatype,col2=javaDatatype,...

sqoop import \
  ... \
  --map-column-java columnNameforSDO_GEOMETRY=String

As your column name is Shape:

--map-column-java Shape=String
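For the many-tables case from the question, one option is to generate the mapping argument per table from Oracle's data dictionary (ALL_TAB_COLUMNS, rows with DATA_TYPE = 'SDO_GEOMETRY'). A minimal sketch of the flag-building step, assuming the geometry column names for a table have already been fetched:

```shell
# Sketch: turn a list of geometry column names into a
# --map-column-java argument such as "SHAPE=String,GEOM=String".
# Fetching the names themselves (e.g. via sqlplus from
# ALL_TAB_COLUMNS WHERE DATA_TYPE = 'SDO_GEOMETRY') is assumed
# to happen elsewhere in the per-table loop.
build_map_columns() {
  out=""
  for col in "$@"; do
    out="${out:+$out,}${col}=String"
  done
  printf '%s' "$out"
}

# Usage inside a per-table loop:
#   sqoop import ... --map-column-java "$(build_map_columns SHAPE GEOM)"
```

The function name and the sqlplus lookup are illustrative, not part of Sqoop itself; the point is that a single scripted loop over the dictionary saves writing 1,000 mappings by hand.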

Sqoop import to HDFS

Sqoop does not support all of the RDBMS datatypes.

If a particular datatype is not supported, you will get an error like:

No Java type for SQL type .....

Solution

Add --map-column-java to your Sqoop command.

Syntax: --map-column-java col-name=java-type,...

For example, --map-column-java col1=String,col2=String

Sqoop import to HIVE

You need the same --map-column-java option mentioned above.

By default, Sqoop supports these JDBC types and converts them to the corresponding Hive types:

 INTEGER 
 SMALLINT 
 VARCHAR 
 CHAR 
 LONGVARCHAR 
 NVARCHAR 
 NCHAR 
 LONGNVARCHAR 
 DATE 
 TIME 
 TIMESTAMP 
 CLOB 
 NUMERIC 
 DECIMAL 
 FLOAT 
 DOUBLE 
 REAL 
 BIT 
 BOOLEAN 
 TINYINT 
 BIGINT 

If your datatype is not in this list, you will get an error like:

Hive does not support the SQL type for .....

Solution

You need to add --map-column-hive to your sqoop import command.

Syntax: --map-column-hive col-name=hive-type,...

For example, --map-column-hive col1=string,col2='varchar(100)'


Add --map-column-java SE_XAO_CAD_DATA=String,SHAPE=String --map-column-hive SE_XAO_CAD_DATA=BINARY,SHAPE=STRING to your command.

Don't pass --map-column-java or --map-column-hive more than once; combine all columns into a single option of each kind.
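Applied to the command in the question, the combined options would look like the sketch below (connection string shortened to "...", other flags from the original omitted for brevity):

```
sqoop import \
  --connect 'jdbc:oracle:thin:...' \
  -m 1 \
  --table PROD.PLAN \
  --map-column-java SE_XAO_CAD_DATA=String,SHAPE=String \
  --map-column-hive SE_XAO_CAD_DATA=BINARY,SHAPE=STRING \
  --create-hive-table --hive-import \
  --hive-table PROD.PLAN1 \
  --target-dir test/PLAN1 --delete-target-dir
```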


For importing SDO_GEOMETRY from Oracle to Hive through Sqoop, use Sqoop's free-form query option along with Oracle's SDO_UTIL.TO_GEOJSON and SDO_UTIL.TO_WKTGEOMETRY functions.

The Sqoop --query option allows us to supply a SELECT query, so we can fetch only the required data from the table, and within that query we can call SDO_UTIL package functions such as TO_GEOJSON and TO_WKTGEOMETRY. It looks something like:

sqoop import \
...
--query 'SELECT ID, NAME, \
         SDO_UTIL.TO_GEOJSON(MYSHAPECOLUMN), \
         SDO_UTIL.TO_WKTGEOMETRY(MYSHAPECOLUMN) \
         FROM MYTABLE WHERE $CONDITIONS' \
...

This returns the SDO_GEOMETRY column as GeoJSON and WKT text, per the function definitions, and the values can be inserted directly into Hive STRING columns without any other type mapping in the Sqoop command.

Choose GeoJSON or WKT as your requirements dictate; this approach can also be extended to the other spatial functions available.
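The free-form query itself can also be generated per table, which helps with the many-tables situation from the question. A small sketch (the function name and arguments are illustrative, not Sqoop features) that builds a SELECT with the geometry column wrapped in SDO_UTIL.TO_WKTGEOMETRY:

```shell
# Sketch: build a free-form Sqoop query for one table, wrapping its
# geometry column in SDO_UTIL.TO_WKTGEOMETRY. Plain columns, geometry
# column, and table name are passed in by the caller; $CONDITIONS is
# kept literal for Sqoop (single quotes in the printf format).
build_select() {
  # $1 = comma-separated plain columns, $2 = geometry column, $3 = table
  printf 'SELECT %s, SDO_UTIL.TO_WKTGEOMETRY(%s) AS %s_WKT FROM %s WHERE $CONDITIONS' \
    "$1" "$2" "$2" "$3"
}

# Usage:
#   query="$(build_select 'ID, NAME' SHAPE PROD.PLAN)"
#   sqoop import ... --query "$query" ...
```

Passing the result through a variable (or command substitution) is safe: the shell does not re-expand `$CONDITIONS` inside a substituted value, so Sqoop receives it literally.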