I am creating a Hive table over a .txt file in an HDFS directory. When I query the data, the last column, order_dtm (a datetime), comes back as NULL. I have searched and tried other options suggested on Google, but nothing has worked so far.

Hive query (tab-delimited):

Create EXTERNAL table Orders(
  order_id int, 
  cust_id int,
  order_dtm TIMESTAMP)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION '/user/analyst/order/';

First lines of the HDFS file:

>> hdfs dfs -cat /user/analyst/order/orders.txt | head -10
17/09/15 23:46:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
5000001 1133938 06-01-2008 00:03:35
5000002 1131278 06-01-2008 00:27:42
5000003 1153459 06-01-2008 00:49:37
5000004 1159099 06-01-2008 01:05:28
5000005 1020687 06-01-2008 01:08:36
5000006 1187459 06-01-2008 01:11:09
5000007 1048773 06-01-2008 01:36:35
5000008 1064002 06-01-2008 01:36:52
5000009 1096744 06-01-2008 01:49:46
5000010 1107526 06-01-2008 03:07:14
cat: Unable to write to output stream.
(1) "It shows the output as NULL": a single record with a single NULL value? Multiple rows where one column contains nothing but NULLs? All columns containing nothing but NULLs? (2) `'/t'`: use a backslash, not a slash (`\t`). (3) You are guessing the delimiter instead of checking it. (4) Your delimiter is a space, which makes no sense since your data (order_dtm) contains spaces. In this specific use case there is a way to handle it, but it is bad practice. (5) Any format other than the ISO timestamp format, yyyy-MM-dd HH:mm:ss[.S*], will yield NULLs. - David דודו Markovitz
Thanks for the reply. - Deepak
@DuduMarkovitz Thanks for the reply. (1) The last column, ORDER_DTM, was showing NULL values. (2) It was a typo; my original query had a backslash. (3) I wasn't guessing. I knew it was a space because I had tested sample data in a spreadsheet, but I was trying different options to fix the issue. (4) Understood. (5) Understood. I would appreciate it if you could remove the downvote; I posted the question after doing my research. - Deepak
(1) Do you wish to continue with this space-delimited data, or do you want to change the format? (2) The downvote was not for lack of effort but for the obscurity of the scenario. Decide about (1), improve the description of the NULL output within the post, and fix the typos or remove the irrelevant attempts, and I'll remove the downvote. - David דודו Markovitz
@DuduMarkovitz (1) Unfortunately, due to the nature of the project, I can't change the format. (2) I have corrected the scenario description. - Deepak

1 Answer

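-- Load order_dtm as a string: the file stores it as MM-dd-yyyy HH:mm:ss,
-- which Hive does not parse as a TIMESTAMP.
-- The data is space-delimited and order_dtm itself contains a space,
-- so the last column is told to take the rest of the line.
create external table orders
(
    order_id    int
   ,cust_id     int
   ,order_dtm   string
)
    row format delimited
    fields terminated by ' '
    location '/user/analyst/order'
    tblproperties ('serialization.last.column.takes.rest'='true')
;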

select * from orders
;

+-----------+----------+----------------------+
| order_id  | cust_id  |      order_dtm       |
+-----------+----------+----------------------+
| 5000001   | 1133938  | 06-01-2008 00:03:35  |
| 5000002   | 1131278  | 06-01-2008 00:27:42  |
| 5000003   | 1153459  | 06-01-2008 00:49:37  |
| 5000004   | 1159099  | 06-01-2008 01:05:28  |
| 5000005   | 1020687  | 06-01-2008 01:08:36  |
| 5000006   | 1187459  | 06-01-2008 01:11:09  |
| 5000007   | 1048773  | 06-01-2008 01:36:35  |
| 5000008   | 1064002  | 06-01-2008 01:36:52  |
| 5000009   | 1096744  | 06-01-2008 01:49:46  |
| 5000010   | 1107526  | 06-01-2008 03:07:14  |
+-----------+----------+----------------------+
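
The column is declared as a string because, as noted in the comments on the question, Hive only parses timestamp text in yyyy-MM-dd HH:mm:ss[.S*] format. A quick way to see this for yourself is sketched below. It assumes a Hive version that allows SELECT without a FROM clause (0.13 or later); the literals are taken from the sample data.

```sql
-- Month-first text is not in ISO format, so this cast is expected to give NULL
select cast('06-01-2008 00:03:35' as timestamp);

-- The same instant written in ISO format casts cleanly
select cast('2008-06-01 00:03:35' as timestamp);
```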

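-- Reformat the MM-dd-yyyy text into ISO yyyy-MM-dd HH:mm:ss.
-- from_unixtime returns a string, so order_dtm is still a string in this view.
create view orders_v
as
select  order_id
       ,cust_id
       ,from_unixtime(to_unix_timestamp(order_dtm,'MM-dd-yyyy HH:mm:ss')) as order_dtm
from    orders
;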

select * from orders_v
;

+-----------+----------+----------------------+
| order_id  | cust_id  |      order_dtm       |
+-----------+----------+----------------------+
| 5000001   | 1133938  | 2008-06-01 00:03:35  |
| 5000002   | 1131278  | 2008-06-01 00:27:42  |
| 5000003   | 1153459  | 2008-06-01 00:49:37  |
| 5000004   | 1159099  | 2008-06-01 01:05:28  |
| 5000005   | 1020687  | 2008-06-01 01:08:36  |
| 5000006   | 1187459  | 2008-06-01 01:11:09  |
| 5000007   | 1048773  | 2008-06-01 01:36:35  |
| 5000008   | 1064002  | 2008-06-01 01:36:52  |
| 5000009   | 1096744  | 2008-06-01 01:49:46  |
| 5000010   | 1107526  | 2008-06-01 03:07:14  |
+-----------+----------+----------------------+
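
If a real TIMESTAMP column is needed downstream rather than an ISO-formatted string, the converted value could be wrapped in a cast. This is only a sketch building on the string-typed orders table defined earlier in this answer; the view name orders_ts_v is hypothetical.

```sql
-- Hypothetical view name; reads from the string-typed orders table.
-- from_unixtime yields yyyy-MM-dd HH:mm:ss text, which Hive can cast to TIMESTAMP.
create view orders_ts_v
as
select  order_id
       ,cust_id
       ,cast(from_unixtime(to_unix_timestamp(order_dtm,'MM-dd-yyyy HH:mm:ss')) as timestamp) as order_dtm
from    orders
;
```

Running describe orders_ts_v should then report order_dtm as timestamp instead of string.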