I have been using the sqoop create-hive-table command with appropriate arguments to copy a table schema from MySQL (or any RDBMS) into Hive. Today I came across a requirement where I had to use
"sqoop create-hive-table" along with --hive-partition-key to create the table schema with a partition column specified. The sqoop command I used:
sqoop create-hive-table --connect jdbc:mysql://localhost/XYZ \
--username XYZ --password password \
--table POC_XYZ_Partition \
--hive-table POC_XYZ.POC_Datatype_Mapping_Sodhi \
--hive-partition-key join_date \
--fields-terminated-by ',' --lines-terminated-by '\n'
which errored out with the error message:
ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.IllegalArgumentException: Partition key join_date cannot be a column to import.
I know the partition key should not be one of the columns being imported, but I totally forgot that.
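To illustrate the point: in Hive DDL the partition column lives only in the PARTITIONED BY clause and is never repeated in the column list, which is why Sqoop rejects a --hive-partition-key that is also a column of the source table. A minimal sketch (the column names here are just illustrative, not my real schema):

-- partition column is declared separately, never in the column list
CREATE TABLE example_partitioned (
  id INT,
  name STRING
)
PARTITIONED BY (join_date STRING);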
I then found a work-around and did:
sqoop create-hive-table --connect jdbc:mysql://localhost/XYZ \
--username XYZ --password password \
--table POC_XYZ_Partition \
--hive-table POC_XYZ.POC_Datatype_Mapping_Sodhi \
--hive-partition-key join_date_partition \
--fields-terminated-by ',' --lines-terminated-by '\n'
The above created the schema, but later, when I tried to load data from MySQL into HDFS, Sqoop treated join_date_partition as a column of the MySQL table and did not get the data from MySQL.
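For what it's worth, the schema the work-around produces should look roughly like this (the non-date columns are placeholders for my real ones): join_date stays a regular column, while join_date_partition becomes an extra partition column that has no counterpart in the MySQL table:

CREATE TABLE POC_XYZ.POC_Datatype_Mapping_Sodhi (
  id INT,
  join_date STRING
)
PARTITIONED BY (join_date_partition STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';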
I tried several approaches, for example:
sqoop import --connect jdbc:mysql://localhost/XYZ \
--username XYZ --password password \
--query 'select a.*, a.join_date as join_date_partition from POC_XYZ_Partition a WHERE $CONDITIONS' \
--split-by id \
--hcatalog-database POC_XYZ \
--hcatalog-table POC_XYZ_Partition;
but could not import the data because of a bug in Sqoop that does not allow duplicating column names in the import.
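One direction I have been considering, but have not verified, is to load one partition at a time with a static partition value through --hcatalog-partition-keys / --hcatalog-partition-values, so the query never has to alias join_date at all (the date value below is just an example):

sqoop import --connect jdbc:mysql://localhost/XYZ \
--username XYZ --password password \
--query 'select a.* from POC_XYZ_Partition a where a.join_date = "2015-01-01" and $CONDITIONS' \
--split-by id \
--hcatalog-database POC_XYZ \
--hcatalog-table POC_XYZ_Partition \
--hcatalog-partition-keys join_date_partition \
--hcatalog-partition-values 2015-01-01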
Has anyone faced this issue and resolved it? If so, please suggest a solution.