There are multiple steps to that. Follows the details.
Create a hive table which is able to read from the plain text file. Assuming that your file is a comma delimited file and your file is on HDFS on a location called /user/data/file1.txt, follows will be the syntax.
create table MyDB.TEST (
Col1 String,
Col2 String,
Col3 String,
Col4 String
)
row format delimited
fields terminated by ','
location '/user/data/file1.txt';
Now you have a schema which is in sync with the format of the data you posses.
- Create another table with ORC schema
Now you need to create the ORC table as you were creating earlier. Here is a simpler syntax for creating that table.
create table MyDB.TEST_ORC (
Col1 String,
Col2 String,
Col3 String,
Col4 String)
STORED AS ORC;
Your TEST_ORC table is an empty table now. You can populate this table using the data from TEST table using the following command.
INSERT OVERWRITE TABLE TEST_ORC SELECT * FROM TEST;
The aforementioned statement will select all the records from TEST table and will try to write those records to TEST_ORC table. Since TEST_ORC is an ORC table, the data will be converted to ORC format on the fly when written into the table.
You can even check the storage location of TEST_ORC table for ORC files.
Now your data is in ORC format and your table TEST_ORC has the required schema to parse it. You may drop your TEST table now, if not needed.
Hope that helps!