0
votes

How i can create a timestamp field in pig from a string that hive accepts as timestamp?

I have formatted the string in pig to match timestamp format in hive, but after loading it is null instead of showing the date.

2014-04-10 09:45:56 this is how the format looks like in pig, and this is matching the format with hive timestamp, but cannot load. (only if i load into string field)

any ideas why?

quick update: no hcatalog is available

problem is some case the timestamp fields contains null values and all the filed become null when using timestamp data type. When putting timestamp to a column where all the row is in the above format it works fine. So the real question is how null values can be handle

1
Can you show some code snippets? Which Pig/Hive version do you use? - Lorand Bendig
problem is some case the timestamp fields contains null values and all the filed become null when using timestamp data type. When putting timestamp to a column where all the row is in the above format it works fine. So the real question is how null values can be handle - clairvoyant
(date == '' ? '' : CONCAT(date, ' 00:00:00')) as dt - clairvoyant
also pig version is 11 - clairvoyant

1 Answers

1
votes

I suspect you have written your data to HDFS using PigStorage and you want to load it into a Hive table. The problem is that a missing tuple field will be written by Pig as null which will be treated by Hive 0.11 as null. So far so good. But then all the subsequent fields will be treated as null, however they can have different values. Hive 0.12 doesn't have this issue.

Depending on the SerDe type, Hive can interpret different strings as null. In case of LazySimpleSerDe it is \N.

You have two option:

  • set the table's null format property to the empty string which is produced by Pig
  • or store \N in Pig for null fields

E.g:

Given the following data in Pig 0.11 :

A = load 'data' as (txt:chararray, ts:chararray);
dump A;
(a,2014-04-10 09:45:56)
(b,2014-04-11 10:45:56)
(,)
(e,2014-04-12 11:45:56)

Option 1:

store A into '/user/data';

Hive 0.11 :

CREATE EXTERNAL TABLE test (txt string, tms TimeStamp) 
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION '/user/data';

alter table test SET SERDEPROPERTIES('serialization.null.format' = '');

Option 2:

...
B = foreach A generate txt, (ts is null?'\\N':ts);
store B into '/user/data'; 

Then create the table in Hive without setting the serde property.