I have a Hive table 'Orders' with four columns (id String, name String, Order String, ts String). Sample data of table is as below.
-------------------------------------------
id name order ts
-------------------------------------------
1 abc completed 2018-04-12 08:15:26
2 def received 2018-04-15 06:20:17
3 ghi processed 2018-04-16 11:36:56
4 jkl received 2018-04-05 12:23:34
3 ghi received 2018-03-23 16:43:46
1 abc processed 2018-03-17 18:39:22
1 abc received 2018-02-25 20:07:56
The Order column has three states received -> processed -> completed. There are many orders for a single name and each has these three stages. I need the latest value of order for a given 'id' and 'name'. This may seem as a novice question for you but I am stuck with this.
I tried writing queries like below but they are not working and I couldn't use max function directly on 'ts' column as it is in String format. Please advice a best method. Thanks in advance.
Queries I tried
SELECT
ORDER
FROM Orders
WHERE id = '1'
AND name = 'ghi'
AND ts = (
SELECT max(unix_timestamp(ts, 'yyyy-MM-dd HH:mm:SS'))
FROM Orders
)
Error while compiling statement: FAILED: ParseException line 2:0 cannot recognize input near 'select' 'max' '(' in expression specification
SELECT
ORDER
FROM Orders
WHERE id = '1'
AND name = 'ghi'
AND max(unix_timestamp(ts, 'yyyy-MM-dd HH:mm:SS'))
Error while compiling statement: FAILED: SemanticException [Error 10128]: Line 1:93 Not yet supported place for UDAF 'max'
select o.order from Orders o
inner join (
select id, name, order, max(ts) as ts
from Orders
group by id, name, order
) ord on d.id = ord.id and o.name = ord.name and o.ts = ord.ts where o.id = '1' and o.name = 'abc'
This query was executed but the output is not a single latest order stage but of each order stage with corresponding latest timestamp.
Please help.