
I need to load data from Hive tables stored as Parquet files into a Teradata database using TDCH (Teradata Connector for Hadoop). I am using TDCH 1.5.3, CDH 5.8.3, and Hive 1.1.0.

I try to start TDCH using the hadoop jar command and get the following error:

java.lang.ClassNotFoundException: org.apache.parquet.hadoop.util.ContextUtil
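
For reference, the command I run is roughly the following (a sketch only; the TDCH jar path, host, credentials, and table names are placeholders, not my actual values):

hadoop jar /path/to/teradata-connector-1.5.3.jar \
com.teradata.connector.common.tool.ConnectorExportTool \
-libjars $LIB_JARS \
-url jdbc:teradata://<td-host>/database=<target_database> \
-username <user> \
-password <password> \
-jobtype hive \
-fileformat parquet \
-sourcedatabase <hive_database> \
-sourcetable <hive_table> \
-targettable <td_table>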

Does anybody have any idea why this happens?


1 Answer


Looking at your problem, it seems you might not have all the Hive libraries needed to be able to export to Teradata.

Here is an example of a script that could be used for exporting from Hive to Teradata.

#!/bin/bash

## Declare Hive Source and Teradata Target
Source_Database="???"
Source_Table="???"
Target_Database="???"
Target_Table="???"
JDBC="???"

## Format
Format="???"

## Declare User used to Connect and Load Data
MYUSER="???"
MYPASSWORD="???"

## Display configuration libraries.
echo $USERLIBTDCH
echo $LIB_JARS

## Define the connection option
hadoop jar $USERLIBTDCH \
com.teradata.connector.common.tool.ConnectorExportTool \
-libjars $LIB_JARS \
-url jdbc:teradata://$JDBC/logmech=ldap,database=$Target_Database,charset=UTF16 \
-username $MYUSER \
-password $MYPASSWORD \
-jobtype hive \
-fileformat $Format \
-sourcedatabase $Source_Database \
-sourcetable $Source_Table \
-targettable $Target_Table \
-method internal.fastload \
-nummappers 1
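
If the script above is saved as, say, export_hive_to_td.sh (a hypothetical name) and the placeholders are filled in, it can be run like this:

chmod +x export_hive_to_td.sh
./export_hive_to_td.sh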

Before using this script, check that the libraries you pass to hadoop jar are configured, i.e. that all path variables are set. You can verify this by calling (use your own variable name):

echo $USERLIBTDCH

Expected output of the path variable (this is how it looks in a Cloudera environment):

/opt/cloudera/parcels/CDH/lib/avro/avro.jar,
/opt/cloudera/parcels/CDH/lib/avro/avro-mapred-hadoop2.jar,
/opt/cloudera/parcels/CDH/lib/hive/conf,
/opt/cloudera/parcels/CDH/lib/hive/lib/antlr-runtime-3.4.jar,
/opt/cloudera/parcels/CDH/lib/hive/lib/commons-dbcp-1.4.jar,
/opt/cloudera/parcels/CDH/lib/hive/lib/commons-pool-1.5.4.jar,
/opt/cloudera/parcels/CDH/lib/hive/lib/datanucleus-api-jdo-3.2.6.jar,
/opt/cloudera/parcels/CDH/lib/hive/lib/datanucleus-core-3.2.10.jar,
/opt/cloudera/parcels/CDH/lib/hive/lib/datanucleus-rdbms-3.2.9.jar,
/opt/cloudera/parcels/CDH/lib/hive/lib/hive-cli.jar,
/opt/cloudera/parcels/CDH/lib/hive/lib/hive-exec.jar,
/opt/cloudera/parcels/CDH/lib/hive/lib/hive-jdbc.jar,
/opt/cloudera/parcels/CDH/lib/hive/lib/hive-metastore.jar,
/opt/cloudera/parcels/CDH/lib/hive/lib/jdo-api-3.0.1.jar,
/opt/cloudera/parcels/CDH/lib/hive/lib/libfb303-0.9.2.jar,
/opt/cloudera/parcels/CDH/lib/hive/lib/libthrift-0.9.2.jar,
/opt/cloudera/parcels/CDH/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar,
/opt/jars/parquet-hadoop-bundle.jar

I would expect that the path variable is not properly set. You could use the following commands to create the necessary paths.

PATH=$PATH:~/opt/bin
PATH=~/opt/bin:$PATH
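
Note that $USERLIBTDCH and $LIB_JARS are not set by the cluster automatically; you have to export them yourself. Here is a minimal sketch of how they could be assembled (the TDCH jar location is an assumption for a default TDCH 1.5.x install, the jar list is shortened, and parquet-hadoop-bundle.jar from the listing above is the jar expected to provide the missing org.apache.parquet.hadoop.util.ContextUtil class):

## Path to the TDCH connector jar (adjust to your install location)
export USERLIBTDCH=/usr/lib/tdch/1.5/lib/teradata-connector-1.5.3.jar
## Comma-separated list of jars passed to -libjars (shortened example)
export LIB_JARS=/opt/cloudera/parcels/CDH/lib/hive/conf,/opt/cloudera/parcels/CDH/lib/hive/lib/hive-cli.jar,/opt/cloudera/parcels/CDH/lib/hive/lib/hive-exec.jar,/opt/cloudera/parcels/CDH/lib/hive/lib/hive-metastore.jar,/opt/jars/parquet-hadoop-bundle.jar
## Make the same jars visible to the local JVM started by "hadoop jar"
export HADOOP_CLASSPATH=$(echo $LIB_JARS | tr ',' ':')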

If you look at the Teradata Connector documentation, you need to specify the following libraries (a quick way to check these jars is sketched after the list).

  Hive Job(version 0.11.0 as example):
         a) hive-metastore-0.11.0.jar
         b) hive-exec-0.11.0.jar
         c) hive-cli-0.11.0.jar
         d) libthrift-0.9.0.jar
         e) libfb303-0.9.0.jar
         f) jdo2-api-2.3-ec.jar
         g) slf4j-api-1.6.1.jar
         h) datanucleus-core-3.0.9.jar
         i) datanucleus-rdbms-3.0.8.jar
         j) commons-dbcp-1.4.jar
         k) commons-pool-1.5.4.jar
         l) antlr-runtime-3.4.jar
         m) datanucleus-api-jdo-3.0.7.jar

    HCatalog Job:
         a) above Hive required jar files
         b) hcatalog-core-0.11.0.jar
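
Since the original error is a ClassNotFoundException for org.apache.parquet.hadoop.util.ContextUtil, it is also worth confirming that one of the jars you pass via -libjars actually contains that class. A quick check could look like this (assuming the JDK jar tool is on the PATH):

## Scan every entry in LIB_JARS for the missing class
for j in $(echo $LIB_JARS | tr ',' ' '); do
  jar tf "$j" 2>/dev/null | grep -q 'org/apache/parquet/hadoop/util/ContextUtil.class' && echo "found in $j"
done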

Hope this helps.