I'm relatively new the process of sqooping so pardon any ignorance. I have been trying to sqoop a table from a data source as a parquet file and create an impala table (also as parquet) into which I will insert the sqooped data. The code runs without an issue, but when I try to select a couple rows for testing I get the error:
.../EWT_CALL_PROF_DIM_SQOOP/ec2fe2b0-c9fa-4ef9-91f8-46cf0e12e272.parquet' has an incompatible Parquet schema for column 'dru_id.test_ewt_call_prof_dim_parquet.call_prof_sk_id'. Column type: INT, Parquet schema: optional byte_array CALL_PROF_SK_ID [i:0 d:1 r:0]
I was mirroring the process I found on a cloudera guide here:https://www.cloudera.com/documentation/enterprise/5-8-x/topics/impala_create_table.html. Mainly the "Internal and External Tables" section. I've been trying to avoid having to infer the schema with a particular parquet file, since this whole thing will be kicked off every month with a bash script (and I also can't think of a way to point it to just one file if I use more than one mapper).
Here's the code I used. I feel like I'm either missing something small and stupid, or I've screwed up everything major without realizing it. Any and all help appreciated. thanks!
sqoop import -Doraoop.import.hint=" " \
--options-file /home/kemri/pass.txt \
--verbose \
--connect jdbc:oracle:thin:@ldap://oid:389/cn=OracleContext,dc=[employer],dc=com/EWSOP000 \
--username [userid] \
--num-mappers 1 \
--target-dir hdfs://nameservice1/data/res/warehouse/finance/[dru_userid]/EWT_CALL_PROF_DIM_SQOOP \
--delete-target-dir \
--table DMPROD.EWT_CALL_PROF_DIM \
--direct \
--null-string '\\N' \
--null-non-string '\\N' \
--as-parquetfile
impala-shell -k -i hrtimpslb.[employer].com
create external table test_EWT_CALL_PROF_DIM_parquet(
CALL_PROF_SK_ID INT,
SRC_SKL_CD_ID STRING,
SPLIT_NM STRING,
SPLIT_DESC STRING,
CLM_SYS_CD STRING,
CLM_SYS_NM STRING,
LOB_CD STRING,
LOB_NM STRING,
CAT_IND STRING,
CALL_TY_CD STRING,
CALL_TY_NM STRING,
CALL_DIR_CD STRING,
CALL_DIR_NM STRING,
LANG_CD STRING,
LANG_NM STRING,
K71_ATOMIC_TS TIMESTAMP)
stored as parquet location '/data/res/warehouse/finance/[dru_userid]/EWT_CALL_PROF_DIM_SQOOP';
--hive-importwhich immediately creates hive tables with underlying parquet files without you having to worry. Impala uses the same table metastore so no issues there. - spijs