My table structure is like this:
hive> describe user_data2;
OK
received_at string
message_id string
type string
version string
timestamp_user string
user_id string
sent_at string
channel string
time_log string
And I am targetting this fields,
hive> select received_at, time_log, user_id from user_data2 limit 5;
OK
2016-01-08T12:27:05.565Z 1452256025 836871
2016-01-08T12:27:12.634Z 1452256033 800798
2016-01-08T12:27:12.632Z 1452256033 795799
2016-01-08T12:27:13.694Z 1452256033 820359
2016-01-08T12:27:15.821Z 1452256036 294141
On this I want to make time based query. like
- Avg Hours Active; per month; period: last 12 months
- % of users active between 0-1h / day
- % of users active between 1-2h / day
- % of users active between 2-4h / day
- % of users active between 4-8h / day
- % of users active between 8-12h / day
- % of users active between 12-16h / day
- % of users active between 16-24h / day
I got some clue of using Datetime UDF - https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions
But I am not aware how to use this function.
I tried:
select unix_timestamp(received_at) from user_data2 limit 5;
OK
NULL
NULL
NULL
NULL
NULL
Which gives none.
I appreciate if someone give example of using time UDF and getting records between two hours or some other time frame.
unix_timestamp()is explicit, it expects an ODBC format without time zone while your data uses ISO8601 format with TZ >>> you must first reformat the STRING with someregexp_replace(), then convert to a "local TZ" TIMESTAMP withfrom_utc_timestamp(), and only then you will be able to play withunix_timestamp()as a LONG. - Samson Scharfrichtertime_logI have timestamp, using that can I play what I want? - Feedly news