0
votes
  • I have created two database DB1 and DB2 from HIVE terminal.
  • ran "use DB1" and has created ORC formatted table T1 having partition as Year, month and date and loaded it with some records.
  • ran "use DB2", and there I create ORC formatted Table T2 suing DDL script and it is having same structure as DB1.T1.
  • I inserted few records in DB2.T2 so that in HDFS- Hive warehouse some directory structure is created for the partition.
  • I ran Hadoop fs -cp /apps/hive/warehouse/DB1.db/T1/* /apps/hive/warehouse/DB2.db/T2
  • it copied the files and build remaining directory structure in /apps/hi../DB2.T2.

Problem is select count from DB2.T2 is not returning latest records which are copied from DB1.T1.

Can some one explain reason behind it please? It works for the normal Text formatted table.

Thanks,

Abhi

1
Can you show command by command exactly what you have run? - Jeremy Beard
1. Create database DB1; 2. use DB1; 3. CREATE TABLE e1( name string, PARTITIONED BY ( year int, month int, day int, id int) stored as orc; 4. insert into table e1 partition(year,month,day,site_id) SELECT name, year, month, day, id from master; 5. Create database DB2; 6. use DB2; 7. CREATE TABLE e_arc( name string, PARTITIONED BY ( year int, month int, day int, id int) stored as orc; 8. haddop fs -cp /apps/hive/warehouse/db1.db/e1/* /apps/hive/warehouse/db2.db/e_arc/ 9. select * from e_arc returns null.Normal table works with these steps. - Abhi

1 Answers

0
votes

So as to get the statistics like count of records faster, HIVE now gets that from metadata which is updated during the insert process(only exception is metadata does not have count populated). In case you are adding partitions or files to hive table manually make sure you run ANALYZE command as shown below.

analyze table t [partition p] compute statistics for [columns c,...];

Also in case of manual addition of partition outside not via hive command, run the repair table command

MSCK REPAIR TABLE table_name;
or 
ALTER TABLE table_name RECOVER PARTITIONS;