0
votes

I am a newbie in Hadoop.

We have issue in hive - data in our staging tables(external tables) and that data we are inserting in target table which is partitioned by month(managed table).

But after some time, when we do count(*) - can check the total count but select * from target table limit 2; shows no data.

Ran Analyze target table compute statistics command and describe the table that found number of rows: 0 ...

But unable to find how data got truncated.

Earlier thought it is metadata got corrupted. Don't know if same issue still exists. Has someone faced similar issue? Can someone help, as how table ddl exists but it contains no data

2
currently unable to find the reason how data got truncated again and again. What can be possible reasons in hive. - Richa Khandelwal
To get a better clarity - 1. You have external table and managed table. You are inserting data in to Managed table which is partitioned by month from the data in external table. Is that correct? 2. On which table are you trying to run the count(*) operation? Managed or External table? 3. FYI - Hive does not store Structure/DDL and the data of the table in same location. The strucutre/DDL is saved in Hive Metastore DB (RDBMS) and the data in HDFS. With the answers to above questions, I can give you some answers to your original question. - Gomz
Also, please update the question by adding the structure of the managed and external tables by running SHOW CREATE TABLE <table_name> for better understanding! - Gomz

2 Answers

0
votes

In order to insert the data from one non-partitioned table to partitioned table, need
to use either one of these : 1) Static partitioning and 2) Dynamic partitioning

By default static partitioning is enable in hive, you need to go with dynamic partitions.

Dynamic partitioning

  • Set below properties in hive

    hive> set hive.exec.dynamic.partition=true;
    hive> set hive.exec.dynamic.partition.mode=nonstrict;

  • Trigger Insert query with dynamic partitioning

    hive> insert into targetTable partition(Month) select * from sourceTable;

0
votes

We are doing count(*) on Managed table. and yes managed table is partioned bu month. Currently, asked the admin team for the logs as we don't have access on location where data is stored.

Location is shown when we do Show create table