HDP-2.5.0.0 using Ambari 2.4.0.1
There are several SQL Server and Oracle database schema that need to imported to HDFS/Hive.
The current approach is working fine :
- Sqoop import from RDBMS to HDFS in avro format
- Creation of a Hive external table atop the avro files viz. dataaggregate_avro_compressed
- Create the final table from the step.2. AUTOMATE this step
- Insert data from table in Step 2. to the final table
Now, the Step3. table has to be ORC + COMPRESSED + PARTITIONED and possibly, MANAGED. Manually, following can be done :
CREATE TABLE `dataaggregate_orc_empty`( ......)PARTITIONED BY (`datedimensionid` bigint) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' TBLPROPERTIES ('orc.compress'='ZLIB');
But it's AUTOMATIC creation is a challenge, I'm struggling with the following approaches :
CTAS empty table
CREATE TABLE dataaggregate_orc_empty LIKE dataaggregate_avro_compressed ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' TBLPROPERTIES ('orc.compress'='ZLIB');
Now, this table contains the partition column datedimensionid which needs to be removed from the table but an 'alter table drop column' is not supported
CTAS using REGEX column spec. :
set hive.support.quoted.identifiers=none; CREATE TABLE dataaggregate_orc_empty AS SELECT
(datedimensionid)?+.+
FROM dataaggregate_avro_compressed limit 0;
This creates the table without the partition column datedimensionid but now how to alter this empty table to include the partition column, this is where even the first approach hits the wall ! The documentation talks about adding partitions with a specification but I don't have one at this stage - I just want this table to be alike the one created manually(shown in the beginning of the post).
How shall I proceed ?
HiveMetaStoreClient
may help your requirement. I gave sample approach in this way we modified partitions. you can give a try - Ram Ghadiyaram