0
votes

I have a Hive table 'Orders' with four columns (id String, name String, Order String, ts String). Sample data of table is as below.

-------------------------------------------
id  name    order               ts
------------------------------------------- 
1   abc     completed       2018-04-12 08:15:26     
2   def     received        2018-04-15 06:20:17
3   ghi     processed       2018-04-16 11:36:56
4   jkl     received        2018-04-05 12:23:34
3   ghi     received        2018-03-23 16:43:46
1   abc     processed       2018-03-17 18:39:22
1   abc     received        2018-02-25 20:07:56

The Order column has three states received -> processed -> completed. There are many orders for a single name and each has these three stages. I need the latest value of order for a given 'id' and 'name'. This may seem as a novice question for you but I am stuck with this.

I tried writing queries like below but they are not working and I couldn't use max function directly on 'ts' column as it is in String format. Please advice a best method. Thanks in advance.

Queries I tried

SELECT
ORDER
FROM Orders
WHERE id = '1'
    AND name = 'ghi'
    AND ts = (
        SELECT max(unix_timestamp(ts, 'yyyy-MM-dd HH:mm:SS'))
        FROM Orders
        )

Error while compiling statement: FAILED: ParseException line 2:0 cannot recognize input near 'select' 'max' '(' in expression specification

SELECT
ORDER
FROM Orders
WHERE id = '1'
    AND name = 'ghi'
    AND max(unix_timestamp(ts, 'yyyy-MM-dd HH:mm:SS'))

Error while compiling statement: FAILED: SemanticException [Error 10128]: Line 1:93 Not yet supported place for UDAF 'max'

select o.order  from Orders o
inner join ( 
    select id, name, order, max(ts) as ts
    from Orders
    group by id, name, order
) ord on d.id = ord.id and o.name = ord.name and o.ts = ord.ts where o.id = '1' and o.name = 'abc'

This query was executed but the output is not a single latest order stage but of each order stage with corresponding latest timestamp.

Please help.

2

2 Answers

1
votes

For a given order, you want one row. Hence, you can use order by and limit:

SELECT o.*
FROM Orders o
WHERE id = 1 AND  -- presumably id is a number
     name = 'ghi'
ORDER BY ts DESC
LIMIT 1;

This should also have the best performance.

0
votes

You can use the RANK analytical function to get your problem resolved as below:

select id,name,order,ts
from (select id,name,order,ts,rank() over(partition by id,name order by ts) r from orders)k
where r = 1
and id = '1'
and name = 'ghi'

If you want to get the latest record for all the ID's and name then you don't need to pass the values for "ID" and "NAME" you will get your desired result easily.

All the best!!!