I am new to HDFS and Hive. I got an introduction to both after reading some books and documentation. I have a question about creating a table in Hive for a file that is already present in HDFS. The file has 300 fields, and I want to create a Hive table over it but use only, say, 30 of those fields. My questions are:
1. Does Hive create a separate file directory?
2. Do I have to create the Hive table first and import the data from HDFS?
3. Since I want to create a table with 30 columns out of the 300, does Hive create a file with only those 30 columns?
4. Do I have to create a separate file with 30 columns, import it into HDFS, and then create a Hive table pointing to that HDFS directory?
2 Answers
My questions are
- Does Hive create a separate file directory? Yes, if you create a Hive table (managed or external) and load the data using the LOAD command.
No, if you create an external table and point it to the existing file.
- Do I have to create the Hive table first and import data from HDFS?
Not necessarily; you can create a Hive external table and point it to this existing file (see the sketch at the end of this answer).
- Since I want to create a table with 30 columns out of 300 columns, does Hive create a file with only those 30 columns?
You can do it easily using HiveQL. Follow the steps below (note: this is not the only approach):
- Create an external table with the 300 columns and point it to the existing file.
- Create another Hive table with the desired 30 columns and insert data into this new table from the 300-column table using "INSERT INTO table30col SELECT ... FROM table300col". Note: Hive will create the file with only those 30 columns during this insert operation (a sketch follows at the end of this answer).
- Do I have to create a separate file with 30 columns, import it into HDFS and then create a Hive table pointing to that HDFS directory?
Yes, this can be an alternative. I personally like the solution mentioned in question 3, as I don't have to recreate the file and I can do all of that in Hadoop without depending on some other system.
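Here is a minimal HiveQL sketch of the approach from question 3. The table names, column names, delimiter and HDFS path are placeholders for illustration, and only a few of the 300/30 columns are written out:

-- Hypothetical external table that simply points at the existing HDFS file.
-- In practice all 300 columns must be declared; only three are shown here.
CREATE EXTERNAL TABLE table300col (
  col1 STRING,
  col2 STRING,
  col3 INT
  -- ... the remaining columns ...
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','   -- match the delimiter of the existing file
LOCATION '/user/hive/data/source_dir';          -- HDFS directory that contains the file

-- Hypothetical managed table holding only the columns of interest.
CREATE TABLE table30col (
  col1 STRING,
  col2 STRING,
  col3 INT
  -- ... the rest of the 30 desired columns ...
);

-- Hive writes new files containing only these columns during this insert.
INSERT INTO TABLE table30col
SELECT col1, col2, col3   -- list the 30 desired columns here
FROM table300col;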
You have several options. One is to have Hive simply point to the existing file, i.e. create an external Hive table:
CREATE EXTERNAL TABLE ... LOCATION '<your existing hdfs file>';
This Hive table will, obviously, match your existing data exactly: you must declare all 300 columns. There is no data duplication; there is only one file, and Hive simply references the already existing file.
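If you want to confirm that Hive is only referencing the existing data rather than copying it, you can inspect the table's metadata (the table name below is a placeholder):

-- Shows Table Type (EXTERNAL_TABLE) and the Location pointing at the existing HDFS path.
DESCRIBE FORMATTED my_external_table;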
A second option would be to either IMPORT or LOAD the data into a Hive table. This would copy the data into a Hive table and let Hive control the location. But it is important to understand that neither IMPORT nor LOAD transforms the data, so the resulting table will have exactly the same structure, layout, and storage as your original file.
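As a small sketch of the LOAD option (the table name and HDFS path are placeholders): LOAD DATA INPATH moves the file from its current HDFS location into the location Hive manages for the table, without changing its content.

-- The table must already exist; the file is brought in as-is, no transformation is applied.
-- (LOAD DATA LOCAL INPATH would copy from the local filesystem instead.)
LOAD DATA INPATH '/user/you/existing_file' INTO TABLE my_staging_table;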
Another option, which I would recommend, is to create a specific Hive table and then import the data into it, either using a tool like Sqoop or by going through an intermediate staging table created with one of the methods above (preferably an external reference, to avoid an extra copy). Create the desired table, create the external staging table, insert the data into the target with INSERT ... SELECT, then drop the staging table. I recommend this because it lets you control not only the table structure/schema (i.e. keep only the 30 columns you need) but also, importantly, the storage. Hive has a highly performant columnar storage format, namely ORC, and you should strive to use it because it will give you a tremendous query performance boost.
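A minimal sketch of that recommended flow, assuming a 300-column external staging table like the one in the first answer already exists; the table and column names are placeholders and only a few columns are written out:

-- Target table with only the needed columns, stored as ORC for query performance.
CREATE TABLE target30 (
  col1 STRING,
  col2 STRING,
  col3 INT
  -- ... the rest of the 30 desired columns ...
)
STORED AS ORC;

-- Hive rewrites the selected data in ORC format during this insert.
INSERT INTO TABLE target30
SELECT col1, col2, col3   -- the 30 columns you need
FROM table300col;

-- Dropping an EXTERNAL staging table removes only its metadata, not the underlying file.
DROP TABLE table300col;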