I have the following data (the data is available from 2017 - Present)

SELECT * FROM TABLE1 WHERE DATE > TO_DATE('01/01/2019','MM/DD/YYYY')

 Emp_ID         Date            Vehicle_ID        Working_Hours
 1005          01/01/2019         X500               7
 1005          01/02/2019         X500               6
 1005          01/03/2019         X700               7
 1005          01/04/2019         X500               5
 1005          01/05/2019         X700               7
 1005          01/06/2019         X500               7
 1006          01/01/2019         X500               7
 1006          01/02/2019         X500               6
 1006          01/03/2019         X700               7
 1006          01/04/2019         X500               5
 1006          01/05/2019         X700               7
 1006          01/06/2019         X500               7

I need to calculate two columns:

LAST_6M_UNIQ_Vehicle_Count ==> count of unique vehicle IDs for that employee in the past 6 months
LAST_6M_Vehicle_Count ==> count of all vehicle IDs for that employee in the past 6 months

Note: the past 6 months are counted back from each row's Date column.

Expected output:

 Emp_ID         Date            Vehicle_ID        Working_Hours     LAST_6M_UNIQ_Vehicle_Count     LAST_6M_Vehicle_Count
 1005          01/01/2019         X500               7                      6                       66
 1005          01/02/2019         X500               6                      7                       62
 1005          01/03/2019         X700               7                      6                       63
 1005          01/04/2019         X500               5                      7                       67
 1005          01/05/2019         X700               7                      7                       66
 1005          01/06/2019         X500               7                      7                       67
  .               .                .                 .
  .               .                .                 .
  .               .                .                 .
 1005          03/20/2019         X600               6                      12                      75
 1006          01/01/2019         X500               7                      11                      74
 1006          01/02/2019         X500               6                      10                      66
 1006          01/03/2019         X700               7                      11                      72
 1006          01/04/2019         X500               5                      13                      67
 1006          01/05/2019         X700               7                      12                      64
 1006          01/06/2019         X500               7                      12                      63

For example, in the first row, LAST_6M_UNIQ_Vehicle_Count is 6 because employee 1005 used 6 different vehicle IDs between (01/01/2019 - 6 months) and 01/01/2019.

I tried OVER with PARTITION BY, but this does not restrict the count to the 6-month interval:

 SELECT t.*, COUNT(DISTINCT t.VEHICLE_ID) OVER (PARTITION BY t.EMP_ID ORDER BY t.DATE) 
        AS LAST_6M_UNIQ_Vehicle_Count
        FROM TABLE1 t

I am not able to calculate the values based on a 6-month interval for each row.

Your help is much appreciated.

3 Answers

1
votes

Oracle does not allow COUNT( DISTINCT ... ) OVER ( ... ) when the analytic clause contains an ORDER BY with a range; it raises ORA-30487: ORDER BY not allowed here (otherwise, that would be the solution). The same window works without the DISTINCT keyword, but not with it.
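
For illustration, this is the kind of statement that fails. It is only a sketch, written against the same table_name and "DATE" column used in the rest of this answer:

```sql
-- Raises ORA-30487: DISTINCT cannot be combined with ORDER BY / RANGE
-- inside the analytic clause.
SELECT t.*,
       COUNT( DISTINCT t.vehicle_id ) OVER (
         PARTITION BY t.emp_id
         ORDER     BY t."DATE"
         RANGE BETWEEN INTERVAL '6' MONTH PRECEDING
               AND     CURRENT ROW
       ) AS last_6m_uniq_vehicle_count
FROM   table_name t;
```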

Instead, you can use a correlated sub-query:

SELECT t.*,
       ( SELECT COUNT( DISTINCT vehicle_id )
         FROM   table_name c
         WHERE  c.emp_id = t.emp_id
         AND    c."DATE" <= t."DATE"
         AND    ADD_MONTHS( t."DATE", -6 ) <= c."DATE"
       ) AS last_6m_uniq_vehicle_count,
       COUNT(t.vehicle_id) OVER (
         PARTITION BY t.emp_id 
         ORDER     BY t."DATE"
         RANGE BETWEEN INTERVAL '6' MONTH PRECEDING
               AND     CURRENT ROW
      ) AS last_6m_vehicle_count
FROM  table_name t

Which for the sample data:

CREATE TABLE table_name ( vehicle_id, emp_id, "DATE" ) AS
SELECT 1, 1, DATE '2020-08-31' FROM DUAL UNION ALL
SELECT 2, 1, DATE '2020-07-31' FROM DUAL UNION ALL
SELECT 1, 1, DATE '2020-06-30' FROM DUAL UNION ALL
SELECT 2, 1, DATE '2020-05-31' FROM DUAL UNION ALL
SELECT 2, 1, DATE '2020-04-30' FROM DUAL UNION ALL
SELECT 2, 1, DATE '2020-03-31' FROM DUAL UNION ALL
SELECT 2, 1, DATE '2020-02-29' FROM DUAL UNION ALL
SELECT 2, 1, DATE '2020-01-31' FROM DUAL UNION ALL
SELECT 3, 1, DATE '2020-01-31' FROM DUAL;

Outputs:

VEHICLE_ID | EMP_ID | DATE      | LAST_6M_UNIQ_VEHICLE_COUNT | LAST_6M_VEHICLE_COUNT
---------: | -----: | :-------- | -------------------------: | --------------------:
         2 |      1 | 31-JAN-20 |                          2 |                     2
         3 |      1 | 31-JAN-20 |                          2 |                     2
         2 |      1 | 29-FEB-20 |                          2 |                     3
         2 |      1 | 31-MAR-20 |                          2 |                     4
         2 |      1 | 30-APR-20 |                          2 |                     5
         2 |      1 | 31-MAY-20 |                          2 |                     6
         1 |      1 | 30-JUN-20 |                          3 |                     7
         2 |      1 | 31-JUL-20 |                          3 |                     8
         1 |      1 | 31-AUG-20 |                          2 |                     7
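
The drop in the last row (from 3 distinct vehicles back to 2) comes from the window's lower bound. As a rough sketch, reusing the sample table above, these two queries show that bound and the rows that fall inside it; the literal date is just the last sample row:

```sql
-- Lower bound of the six-month window for the 31-AUG-20 row.
-- ADD_MONTHS keeps a month-end date on a month end, so this is 29-FEB-2020.
SELECT ADD_MONTHS( DATE '2020-08-31', -6 ) AS window_start
FROM   DUAL;

-- Rows the correlated sub-query sees for that row.
SELECT c.vehicle_id, c."DATE"
FROM   table_name c
WHERE  c.emp_id = 1
AND    c."DATE" <= DATE '2020-08-31'
AND    ADD_MONTHS( DATE '2020-08-31', -6 ) <= c."DATE"
ORDER  BY c."DATE";
```

Both 31-JAN-20 rows fall before that bound, and one of them is the only row for vehicle 3, which is why the distinct count returns to 2 while the total is 7.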

db<>fiddle here

1
votes

You can do this with window functions, and a range frame specification.

Computing the distinct count is a bit tricky: Oracle does not support it directly, but we can proceed in two steps. First, perform a window count within employee/vehicle partitions; then take into account only the first occurrence of each vehicle in the employee partition.

So:

select vehicle_id, emp_id, "DATE",
    sum(case when flag = 1 then 1 else 0 end) over(
        partition by emp_id
            order by "DATE"
            range between interval '6' month preceding and current row
    ) as last_6m_uniq_vehicle_count,
    count(*) over (
        partition by emp_id 
        order by "DATE"
        range between interval '6' month preceding and current row
    ) as last_6m_vehicle_count
from (
    select t.*, 
        count(*) over (
            partition by emp_id , vehicle_id
            order by "DATE"
            range between interval '6' month preceding and current row
        ) as flag
    from table_name t
) t
order by "DATE", vehicle_id
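
To see what the inner flag column holds, it can help to run the subquery on its own. The sketch below assumes a table_name with emp_id, vehicle_id and "DATE" columns, as in the query above. A row with flag = 1 is the first row for that vehicle within its own six-month look-back.

```sql
-- Inspect the per-vehicle running count used as the "first occurrence" flag.
SELECT t.emp_id,
       t.vehicle_id,
       t."DATE",
       COUNT(*) OVER (
         PARTITION BY t.emp_id, t.vehicle_id
         ORDER BY t."DATE"
         RANGE BETWEEN INTERVAL '6' MONTH PRECEDING AND CURRENT ROW
       ) AS flag
FROM   table_name t
ORDER  BY t.emp_id, t."DATE", t.vehicle_id;
```

One thing worth checking on your own data: if a vehicle's flag = 1 row has already dropped out of a later row's window while newer rows for that vehicle (with flag > 1) are still inside it, the outer sum does not count that vehicle, so the result can come out lower than a true distinct count. Comparing a few rows against a correlated COUNT( DISTINCT ... ) sub-query is a cheap sanity check.
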
0
votes

As MTO points out, count(distinct) cannot be used as a window function to solve this.

For that reason, I would go for a lateral join:

select t.*, l.*
from t cross join lateral
     (select count(*) as last_6m_vehicle_count,
             count(distinct t2.vehicle_id) as last_6m_uniq_vehicle_count
      from t t2
      where t2.emp_id = t.emp_id and
            t2.dte <= t.dte and
            t2.dte > add_months(t.dte, -6)
     ) l;

Here is a db<>fiddle.