3
votes

I want to know if its possible create a hive table from a file stored in hadoop file system (users.tbl) in ORC format. I read that ORC format its better than text in terms of optimization. So I would like to know if its possible create a hive table using stored as orc tblproperties and location attributes to create a table from the hdfs file but in orc format.

Something as:

create table if not exists users
(USERID BIGINT,
 NAME STRING,
 EMAIL STRING,
 CITY STRING)
STORED AS ORC TBLPROPERTIES ("orc.compress"="SNAPPY")
LOCATION '/tables/users/users.tbl';

Insted of text:

create table if not exists users
    (USERID BIGINT,
     NAME STRING,
     EMAIL STRING,
     CITY STRING)
     ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE 
     LOCATION '/tables/users/users.tbl';
4

4 Answers

2
votes

1.Create a table in hive.

 create table MyDB.TEST (
 Col1 String,
 Col2 String,
 Col3 String,
 Col4 String)
 STORED AS INPUTFORMAT
   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
 OUTPUTFORMAT
    'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';

2. Load data to the table.

 LOAD DATA INPATH '/hdfs/dir/folder/to/orc/files/' INTO TABLE MyDB.TEST;
2
votes

You can not do that in only one step. The create table statement doesn't process the data, just specify the format and the location.

My suggestion is that you create a temporal table using the "STORED AS TEXTFILE" create statement and create the final table using ORC as storage format (using an empty location).

Then insert in the "ORC table" all the rows from the temporal "text table".

Insert [overwrite] table orcTable select col1, col2 from textTable;

Insert Overwrite will replace all the data in the table with the new data. If you only want to add new data you will use "Insert table . . ."

After the import you could delete the temporal "text table".

1
votes

just create your table on existing data like below

CREATE EXTERNAL TABLE mytable
(
col1 bigint,
col2 bigint
) 
STORED AS ORC
location '<ORC File location';

Please refer this Link

https://community.hortonworks.com/questions/179897/hive-table-creation-from-orc-format-file.html

0
votes

How about just create your table on top of your location, and use msck repair table table_name, so your data will be loaded into your table ready for querying.