4 votes

I want to run some operations on files in HDFS using Hive, but only temporarily, so I don't want to use an internal (managed) table. However, my data is very large (around 1 TB), so I'm worried about the performance of an external table. What is the performance difference between an internal table and an external table in Hive?

I assume you are asking about the difference between an internal table and an external table in Hive. Please clarify. - Sandeep Singh
Yes, I used the wrong word ("extend"), sorry. I searched again with the right word and found answers saying there is no performance difference between them. Is that right? - ElapsedSoul
Refer to this answer of mine: stackoverflow.com/a/37192041/2142994 - Ani Menon
Yes, there is no major performance difference between the two table types. But since your data is large and you are using Hive only temporarily, you should use an internal table. - Sandeep Singh
Why should I use an internal table for large data if there's no difference between them? - ElapsedSoul

2 Answers

0 votes

You can simply create Hive external tables and use them. I haven't noticed any major performance difference between internal and external tables.
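A minimal sketch of an external table defined over files that already sit in HDFS. The table name `ext_events`, its columns, the comma delimiter and the path `/user/example/events` are hypothetical placeholders; adjust them to your data. Because the table is EXTERNAL, Hive only stores the schema, and dropping the table leaves the HDFS files in place.

```sql
-- Hypothetical names: ext_events, its columns, and the HDFS path are placeholders.
-- EXTERNAL: Hive records only the schema; the files under LOCATION are not moved.
CREATE EXTERNAL TABLE IF NOT EXISTS ext_events (
    event_id   BIGINT,
    event_type STRING,
    event_time STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/example/events';

-- Query the files in place.
SELECT event_type, COUNT(*) AS cnt
FROM ext_events
GROUP BY event_type;

-- Dropping an external table removes only the metadata; the HDFS files are kept.
DROP TABLE ext_events;
```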

To improve performance, you can create tables stored in the ORC file format and managed by Hive.

Create the ORC table:
CREATE TABLE IF NOT EXISTS <orc_table_name> (
    <col_name> <type>
)
COMMENT 'comments'
-- ROW FORMAT DELIMITED / FIELDS TERMINATED BY are omitted:
-- field delimiters only apply to text formats, not to ORC.
STORED AS ORC;

Then load the data from the external table into the ORC table:

INSERT OVERWRITE TABLE <orc_table_name> SELECT * FROM <external_table_name>;
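As a sketch of an alternative, Hive also supports CREATE TABLE ... AS SELECT (CTAS), which creates and fills the managed ORC table in one statement. The table name `orc_copy` is hypothetical, and `<external_table_name>` is the same placeholder used above. Because the copy is a managed table, dropping it removes its data files as well, which is convenient if you only need the table temporarily.

```sql
-- Hypothetical name: orc_copy. <external_table_name> is the placeholder from the answer.
-- CTAS: create a managed ORC table and populate it from the external table in one step.
CREATE TABLE orc_copy
STORED AS ORC
AS SELECT * FROM <external_table_name>;

-- Work with the ORC copy.
SELECT COUNT(*) FROM orc_copy;

-- Managed table: DROP removes the metadata and the ORC files in the warehouse directory
-- (moved to the HDFS trash if trash is enabled; add PURGE to skip the trash).
-- The files behind <external_table_name> are not affected.
DROP TABLE orc_copy;
```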

See also: HDFS to Hive external table and ORC

0 votes

In my experience, the performance difference between external and internal tables is:

Internal tables take more CPU time.

External tables take approximately 40% less CPU time.