I am creating a Hive table on a .txt file placed in an HDFS directory. When I query the data, the last datetime column (order_dtm) comes back as NULL. I have searched and tried other options suggested on Google, but nothing has worked so far.
Hive query (tab delimited):
CREATE EXTERNAL TABLE Orders(
  order_id INT,
  cust_id INT,
  order_dtm TIMESTAMP)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/analyst/order/';
HDFS file (head):
>> hdfs dfs -cat /user/analyst/order/orders.txt | head -10
17/09/15 23:46:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
5000001 1133938 06-01-2008 00:03:35
5000002 1131278 06-01-2008 00:27:42
5000003 1153459 06-01-2008 00:49:37
5000004 1159099 06-01-2008 01:05:28
5000005 1020687 06-01-2008 01:08:36
5000006 1187459 06-01-2008 01:11:09
5000007 1048773 06-01-2008 01:36:35
5000008 1064002 06-01-2008 01:36:52
5000009 1096744 06-01-2008 01:49:46
5000010 1107526 06-01-2008 03:07:14
cat: Unable to write to output stream.
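The head output alone cannot show whether the fields are separated by tabs or by spaces. One way to check the real delimiter is to make control characters visible (a sketch, assuming GNU cat is available on the client machine):

>> hdfs dfs -cat /user/analyst/order/orders.txt | head -2 | cat -A

cat -A prints every tab as ^I and every line ending as $, so a genuinely tab-delimited row would look like 5000001^I1133938^I06-01-2008 00:03:35$. If only plain spaces appear between the fields, the FIELDS TERMINATED BY '\t' clause in the DDL does not match the data, and the columns after the first mismatch can come back NULL.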
Comment: … (3) You are guessing the delimiter instead of checking it. (4) Your delimiter is a space, which makes no sense since your data (order_dtm) contains spaces. In this specific use case there is a way to handle it, but it is bad practice. (5) Any format other than the ISO timestamp format, yyyy-MM-dd HH:mm:ss[.S*], will yield NULLs. – David דודו Markovitz
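Since Hive only parses a TIMESTAMP column when the text is in that ISO form, a common workaround is to load the raw text into a STRING column and convert it at query time. The sketch below makes two assumptions that need to be verified against the file: the delimiter ('\t' here, which must match whatever the check above shows) and the date pattern (dd-MM-yyyy, though 06-01-2008 could equally be MM-dd-yyyy); the table name Orders_raw is just illustrative.

-- Keep order_dtm as STRING so the non-ISO text loads without becoming NULL
CREATE EXTERNAL TABLE Orders_raw(
  order_id INT,
  cust_id INT,
  order_dtm STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/analyst/order/';

-- unix_timestamp(str, pattern) takes a Java SimpleDateFormat pattern;
-- from_unixtime() returns the ISO string that CAST(... AS TIMESTAMP) accepts.
SELECT order_id,
       cust_id,
       CAST(from_unixtime(unix_timestamp(order_dtm, 'dd-MM-yyyy HH:mm:ss')) AS TIMESTAMP) AS order_dtm
FROM Orders_raw;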