
I want to import data from an Oracle database to Hive using Sqoop, and I want Sqoop to create the table in the destination Hive database.

I put the Oracle JDBC driver (ojdbc6.jar) into the Sqoop lib directory.

I tried these two methods, but neither of them works.

sqoop import \
    --connect jdbc:oracle:thin:@${DB_HOST}:${DB_PORT}:${DB_NAME} \
    --username ${DB_USER} \
    --password ${DB_PWD} \
    --table ${INPUT_TABLE} \
    --hcatalog-home /usr/hdp/current/hive-webhcat \
    --hcatalog-database ${OUTPUT_DB} \
    --hcatalog-table ${OUTPUT_TABLE} \
    --create-hcatalog-table \
    --num-mappers 1


sqoop import  \
    --connect jdbc:oracle:thin:@${DB_HOST}:${DB_PORT}:${DB_NAME} \
    --username ${DB_USER} \
    --password ${DB_PWD} \
    --hive-home /usr/hdp/current/hive \
    --hive-import \
    --create-hive-table \
    --hive-table "${OUTPUT_DB}.${OUTPUT_TABLE}" \
    --table ${INPUT_TABLE}

I have this error message:

ERROR tool.ImportTool: Imported Failed: There is no column found in the target table input_table. Please ensure that your table name is correct.

It seems Sqoop doesn't take the --create-hcatalog-table or --create-hive-table options into account.

However, when I import data from PostgreSQL with Sqoop, the table creation works fine. Any ideas? Thanks

For information, Sqoop reads the Oracle table without problems. I ran this command and got the expected result:

sqoop eval \
    --connect jdbc:oracle:thin:@${DB_HOST}:${DB_PORT}:${DB_NAME} \
    --username ${DB_USER} \
    --password ${DB_PWD} \
    --query "select count(1) from input_table"
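To see which column metadata Oracle actually exposes to the connecting user (one possible cause of a "no column found" error is a table visible under a different owner or case), a similar eval against the data dictionary can help. This is only a sketch assuming the same connection variables; note that all_tab_columns lists only the tables the connecting user can access:

```shell
# Sketch: list the columns Oracle reports for the table, as seen by ${DB_USER}.
# UPPER() matters because Oracle stores unquoted identifiers in upper case.
sqoop eval \
    --connect jdbc:oracle:thin:@${DB_HOST}:${DB_PORT}:${DB_NAME} \
    --username ${DB_USER} \
    --password ${DB_PWD} \
    --query "SELECT owner, column_name, data_type FROM all_tab_columns WHERE table_name = UPPER('input_table')"
```

If this returns rows under a different owner than the connecting user, the --table value may need the SCHEMA.TABLE form.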

The full log of my error:

16/07/21 18:08:29 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.0.0-169
16/07/21 18:08:29 DEBUG tool.BaseSqoopTool: Enabled debug logging.
16/07/21 18:08:29 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/07/21 18:08:29 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
16/07/21 18:08:29 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
16/07/21 18:08:29 DEBUG sqoop.ConnFactory: Loaded manager factory: org.apache.sqoop.manager.oracle.OraOopManagerFactory
16/07/21 18:08:29 DEBUG sqoop.ConnFactory: Loaded manager factory: com.cloudera.sqoop.manager.DefaultManagerFactory
16/07/21 18:08:29 DEBUG sqoop.ConnFactory: Trying ManagerFactory: org.apache.sqoop.manager.oracle.OraOopManagerFactory
16/07/21 18:08:29 DEBUG oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop can be called by Sqoop!
16/07/21 18:08:29 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
16/07/21 18:08:29 DEBUG sqoop.ConnFactory: Trying ManagerFactory: com.cloudera.sqoop.manager.DefaultManagerFactory
16/07/21 18:08:29 DEBUG manager.DefaultManagerFactory: Trying with scheme: jdbc:oracle:thin:@host:1521:sid
16/07/21 18:08:29 DEBUG manager.OracleManager$ConnCache: Instantiated new connection cache.
16/07/21 18:08:29 INFO manager.SqlManager: Using default fetchSize of 1000
16/07/21 18:08:29 DEBUG sqoop.ConnFactory: Instantiated ConnManager org.apache.sqoop.manager.OracleManager@7d8704ef
16/07/21 18:08:29 INFO tool.CodeGenTool: Beginning code generation
16/07/21 18:08:29 DEBUG manager.OracleManager: Using column names query: SELECT t.* FROM input_table t WHERE 1=0
16/07/21 18:08:29 DEBUG manager.SqlManager: Execute getColumnInfoRawQuery : SELECT t.* FROM input_table t WHERE 1=0
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.0.0-169/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.0.0-169/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/07/21 18:08:30 DEBUG manager.OracleManager: Creating a new connection for jdbc:oracle:thin:@host:1521:sid/user, using username: user
16/07/21 18:08:30 DEBUG manager.OracleManager: No connection paramenters specified. Using regular API for making connection.
16/07/21 18:08:30 INFO manager.OracleManager: Time zone has been set to GMT
16/07/21 18:08:30 DEBUG manager.SqlManager: Using fetchSize for next query: 1000
16/07/21 18:08:30 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM input_table t WHERE 1=0
16/07/21 18:08:30 DEBUG manager.SqlManager: Found column c1 of type [2, 10, 0]
16/07/21 18:08:30 DEBUG manager.SqlManager: Found column c2 of type [2, 10, 0]
16/07/21 18:08:30 DEBUG manager.SqlManager: Found column c3 of type [12, 20, 0]
16/07/21 18:08:30 DEBUG manager.SqlManager: Found column c4 of type [2, 0, -127]
16/07/21 18:08:30 DEBUG manager.SqlManager: Found column c5 of type [2, 0, -127]
16/07/21 18:08:30 DEBUG manager.SqlManager: Found column c6 of type [12, 80, 0]
16/07/21 18:08:30 DEBUG manager.SqlManager: Found column c7 of type [93, 0, 0]
16/07/21 18:08:30 DEBUG manager.SqlManager: Found column c8 of type [12, 20, 0]
16/07/21 18:08:30 DEBUG manager.OracleManager$ConnCache: Caching released connection for jdbc:oracle:thin:@host:1521:sid/user
16/07/21 18:08:30 ERROR tool.ImportTool: Imported Failed: There is no column found in the target table intput_table. Please ensure that your table name is correct.
Comments:

- --hive-import will automatically create the table in Hive mentioned in the --hive-table tag. Put --verbose at the end of your import command (to get extended logs) and share the complete logs. – Dev
- Edited my post with the complete logs with the verbose option. – Mouette
- The error in the log says target table input_table, but you have quoted output_table: make sure the ${OUTPUT_TABLE} and ${INPUT_TABLE} values are correct. Are all the columns of int type? Try to run without variables (using actual values) first. – Ronak Patel
- I corrected the typo; the error is "target table input_table". I tried without variables and the result is the same: I still get the same error. The columns are not all int (int, varchar and date). – Mouette

1 Answer


I found a solution: it seems the --table parameter didn't work for me, so I used the --query parameter instead.

sqoop import \
    --connect ${DB_CNX_STR} \
    --username ${DB_USER} \
    --password ${DB_PWD} \
    --query "SELECT * FROM ${INPUT_TABLE} WHERE \$CONDITIONS" \
    --target-dir ${TARGET_DIR}/${INPUT_TABLE} \
    --hive-import \
    --hive-home "/usr/hdp/current/hive" \
    --create-hive-table \
    --hive-table "${OUTPUT_DB}.${OUTPUT_TABLE}" \
    --num-mappers 1
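Note that Sqoop's free-form --query import requires the literal $CONDITIONS token in the WHERE clause (escaped here so the shell doesn't expand it), and with more than one mapper it also requires a --split-by column so the work can be partitioned. A hedged sketch of the parallel variant, assuming ID is a numeric primary-key column in your table:

```shell
# Sketch only: ID is an assumed numeric split column; adjust to your schema.
# Sqoop replaces $CONDITIONS with a range predicate per mapper.
sqoop import \
    --connect ${DB_CNX_STR} \
    --username ${DB_USER} \
    --password ${DB_PWD} \
    --query "SELECT * FROM ${INPUT_TABLE} WHERE \$CONDITIONS" \
    --split-by ID \
    --target-dir ${TARGET_DIR}/${INPUT_TABLE} \
    --hive-import \
    --create-hive-table \
    --hive-table "${OUTPUT_DB}.${OUTPUT_TABLE}" \
    --num-mappers 4
```

With --num-mappers 1 (as in the answer above) the --split-by column is not needed, which keeps the command simpler at the cost of a serial import.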