0
votes

I want to format date coming from source file. I am handling junk values as this column may contain 0 (it could be 0 or 00 or 000 etc.)

select case when capture_date in ('','00','000') then NULL else from_unixtime(unix_timestamp(capture_date,'yyyyMMdd'),'yyyy-MM-dd') end as capt_dt from test_table;

instead of increasing junk values in the list I want to handle it in a generic way meaning if we receive any number of 0's it should populate as NULL .

any solution?

1

1 Answers

1
votes

It seems there is no point dealing with illegal date literals since they will yield NULL values in any case (unless we might have 7 zeros or more)

hive> with test_table as (select stack(5,'','0','00','000','20170831') as capture_date)
    > select  from_unixtime(unix_timestamp(capture_date,'yyyyMMdd'),'yyyy-MM-dd') as capt_dt 
    > from    test_table
    > ;
OK
capt_dt
NULL
NULL
NULL
NULL
2017-08-31

If 7 or more zeros are optional -

hive> with test_table as (select stack(5,'','0','00','000000000000','20170831') as capture_date)
    > select  from_unixtime(unix_timestamp(regexp_replace(capture_date,'^0+$',''),'yyyyMMdd'),'yyyy-MM-dd') as capt_dt 
    > from    test_table
    > ;
OK
capt_dt
NULL
NULL
NULL
NULL
2017-08-31

or

hive> with test_table as (select stack(5,'','0','00','000000000000','20170831') as capture_date)
    > select  case 
    >             when capture_date not rlike '^0+$' 
    >             then from_unixtime(unix_timestamp(capture_date,'yyyyMMdd'),'yyyy-MM-dd')  
    >         end as capt_dt
    >         
    > from    test_table
    > ;
OK
capt_dt
NULL
NULL
NULL
NULL
2017-08-31