0
votes

Given the following table structure:

CAT NR DATE VERSION SOME_CODE
ABC 123 2009-02-19T00:00:00Z 1 OPP
ABC 456 2009-03-18T00:00:00Z 1 ZUM
ABC 444 2009-03-18T00:00:00Z 1 ZUM
ABC 456 2009-04-18T00:00:00Z 2 XXX
ABC 456 2009-04-18T00:00:00Z 3 XXX
ABC 456 2009-04-18T00:00:00Z 4 UIO
ABC 456 2009-05-18T00:00:00Z 5 RQA
ABC 444 2009-03-18T00:00:00Z 2 MUZ
DEF 789 2018-02-17T00:00:00Z 1 WER
DEF 637 2018-02-16T00:00:00Z 1 FAW
GHI 248 2018-02-17T00:00:00Z 1 QWE
GHI 248 2019-02-17T00:00:00Z 2 PPP
GHI 357 2020-02-16T00:00:00Z 1 FFF
GHI 357 2020-02-16T00:00:00Z 2 GGG
GHI 357 2020-02-16T00:00:00Z 3 LLL
GHI 357 2020-02-16T00:00:00Z 4 LLL
GHI 357 2020-08-16T00:00:00Z 4 FFF
GHI 357 2020-10-16T00:00:00Z 5 ZZZ
GHI 420 2020-02-16T00:00:00Z 1 QDS

I need to find out how many effective changes took place in column SOME_CODE (sometimes values are the same and therefore they should not count) per month and category (CAT).

So expected SQL query result should like this:

CAT MONTH SOME_CODE_CHANGES
ABC APR 2009 1
ABC MAR 2009 1
GHI FEB 2020 2

Does anybody have a hint how to start the query?

Within a category (e.g. ABC) I want to count number of changes in column SOME_CODE. When a changed happened (e.g. for NR=456) say from VERSION 1 to 2 ("ZUM" -> "XXX") I wanna count it but with two conditions:

  • must be a change within a month
  • must be an effective change. Means the change from VERSION 3 to VERSION 4 is a valid change ("XXX" -> "UIO") while the change from VERSION 2 to VERSION 3 is nonesense ("XXX" -> "XXX") and should not be counted

My approach:

select                                                                                                                                                                                                                                                                                      
trunc(DATE, 'MON'),                                                                                                                                                                                                                                                             
CAT,                                                                                                                                                                                                                                                                         
sum(case when _MY_CONDITION_ then 1 else 0 end) counter                                                                                                                                                                                                                                     
from table where date between TO_TIMESTAMP('01/06/2018 00:00:00.000000000', 'DD/MM/YYYY HH24:MI:SS.FF') and TO_TIMESTAMP('31/07/2018 00:00:00.000000000', 'DD/MM/YYYY HH24:MI:SS.FF')                                                                                           
group by                                                                                                                                                                                                                                                                                    
    trunc(DATE, 'MON'),                                                                                                                                                                                                                                                               
    CAT                                                                                                                                                                                   
order by                                                                                                                                                                                         
trunc(DATE, 'MON')    
2
Please explain how these numbers are calculated. Why are you choosing only two months for the results?Gordon Linoff
What should be the output if a value changes from ABC to DEF and then back to ABC? Is that one or two changes?MT0
@MT0 then it should count as twice changes (if within a month)DerBenniAusA
@GordonLinoff I added some more hintsDerBenniAusA
1. In your data you have three rows with the same CAT, same DATE and same SOME_CODE. How do you order such rows? "By date" doesn't suffice, since they have exactly the same timestamp. And, of course, different order may result in different counts of "changes". 2. If the value changes from 'ZUM' on February 26 to 'ZAM' on March 2, is that a "change" you count for March? In almost any sane business use case the answer should be "yes" - but your explanation is confusing enough that MT0 assumed "no" for his answer. Please clarify.mathguy

2 Answers

1
votes

From Oracle 12, you can use MATCH_RECOGNIZE:

SELECT cat,
       month,
       COUNT(*)
FROM   (
  SELECT t.*,
         TRUNC( "DATE", 'MM' ) AS month
  FROM   table_name t
)
MATCH_RECOGNIZE(
  PARTITION BY cat, month
  ORDER BY "DATE", version
  ONE ROW PER MATCH
  AFTER MATCH SKIP TO LAST change_code
  PATTERN ( strt change_code )
  DEFINE
    change_code AS change_code.some_code <> strt.some_code
)
GROUP BY cat, month

Which, for the sample data:

CREATE TABLE table_name ( CAT, NR, "DATE", VERSION, SOME_CODE ) AS
SELECT 'ABC',   123,    TIMESTAMP '2009-02-19 00:00:00 UTC',    1,  'OPP' FROM DUAL UNION ALL
SELECT 'ABC',   456,    TIMESTAMP '2009-03-18 00:00:00 UTC',    1,  'ZUM' FROM DUAL UNION ALL
SELECT 'ABC',   444,    TIMESTAMP '2009-03-18 00:00:00 UTC',    1,  'ZUM' FROM DUAL UNION ALL
SELECT 'ABC',   444,    TIMESTAMP '2009-03-18 00:00:00 UTC',    2,  'MUZ' FROM DUAL UNION ALL
SELECT 'ABC',   456,    TIMESTAMP '2009-04-18 00:00:00 UTC',    2,  'XXX' FROM DUAL UNION ALL
SELECT 'ABC',   456,    TIMESTAMP '2009-04-18 00:00:00 UTC',    3,  'XXX' FROM DUAL UNION ALL
SELECT 'ABC',   456,    TIMESTAMP '2009-04-18 00:00:00 UTC',    4,  'UIO' FROM DUAL UNION ALL
SELECT 'ABC',   456,    TIMESTAMP '2009-05-18 00:00:00 UTC',    5,  'RQA' FROM DUAL UNION ALL
SELECT 'DEF',   637,    TIMESTAMP '2018-02-16 00:00:00 UTC',    1,  'FAW' FROM DUAL UNION ALL
SELECT 'DEF',   789,    TIMESTAMP '2018-02-17 00:00:00 UTC',    1,  'WER' FROM DUAL UNION ALL
SELECT 'GHI',   248,    TIMESTAMP '2018-02-17 00:00:00 UTC',    1,  'QWE' FROM DUAL UNION ALL
SELECT 'GHI',   248,    TIMESTAMP '2019-02-17 00:00:00 UTC',    2,  'PPP' FROM DUAL UNION ALL
SELECT 'GHI',   357,    TIMESTAMP '2020-02-16 00:00:00 UTC',    1,  'FFF' FROM DUAL UNION ALL
SELECT 'GHI',   420,    TIMESTAMP '2020-02-16 00:00:00 UTC',    1,  'QDS' FROM DUAL UNION ALL
SELECT 'GHI',   357,    TIMESTAMP '2020-02-16 00:00:00 UTC',    2,  'GGG' FROM DUAL UNION ALL
SELECT 'GHI',   357,    TIMESTAMP '2020-02-16 00:00:00 UTC',    3,  'LLL' FROM DUAL UNION ALL
SELECT 'GHI',   357,    TIMESTAMP '2020-02-16 00:00:00 UTC',    4,  'LLL' FROM DUAL UNION ALL
SELECT 'GHI',   357,    TIMESTAMP '2020-08-16 00:00:00 UTC',    4,  'FFF' FROM DUAL UNION ALL
SELECT 'GHI',   357,    TIMESTAMP '2020-10-16 00:00:00 UTC',    5,  'ZZZ' FROM DUAL

Outputs:

CAT | MONTH     | COUNT(*)
:-- | :-------- | -------:
ABC | 01-MAR-09 |        1
ABC | 01-APR-09 |        1
DEF | 01-FEB-18 |        1
GHI | 01-FEB-20 |        3

If you want to see the changes then you can use:

SELECT *
FROM   (
  SELECT t.*,
         TRUNC( "DATE", 'MM' ) AS month
  FROM   table_name t
)
MATCH_RECOGNIZE(
  PARTITION BY cat, month
  ORDER BY "DATE", version
  MEASURES
    MATCH_NUMBER()     AS mn,
    FIRST( some_code ) AS change_from,
    LAST( some_code )  AS change_to
  ONE ROW PER MATCH
  AFTER MATCH SKIP TO LAST change_code
  PATTERN ( strt change_code )
  DEFINE
    change_code AS change_code.some_code <> strt.some_code
)

Which outputs:

CAT | MONTH     | MN | CHANGE_FROM | CHANGE_TO
:-- | :-------- | -: | :---------- | :--------
ABC | 01-MAR-09 |  1 | ZUM         | MUZ      
ABC | 01-APR-09 |  1 | XXX         | UIO      
DEF | 01-FEB-18 |  1 | FAW         | WER      
GHI | 01-FEB-20 |  1 | FFF         | QDS      
GHI | 01-FEB-20 |  2 | QDS         | GGG      
GHI | 01-FEB-20 |  3 | GGG         | LLL      

db<>fiddle here


If your requirement for "within a month" is that you want changes where there is at most one month's difference between the previous row to the changed row, even if the rows are in two different calendar months, (rather than just the changes that happen in the same calendar month) then you can use:

SELECT cat,
       TRUNC( change_date, 'MM' ) AS month,
       COUNT(*)
FROM   table_name
MATCH_RECOGNIZE(
  PARTITION BY cat
  ORDER BY "DATE", version
  MEASURES
    LAST( "DATE" ) AS change_date
  ONE ROW PER MATCH
  AFTER MATCH SKIP TO LAST change_code
  PATTERN ( strt change_code )
  DEFINE
    change_code AS (
      change_code.some_code <> strt.some_code
      AND MONTHS_BETWEEN( change_code."DATE", strt."DATE" ) <= 1
    )
)
GROUP BY cat, TRUNC( change_date, 'MM' )

Which outputs:

CAT | MONTH     | COUNT(*)
:-- | :-------- | -------:
ABC | 01-MAR-09 |        2
ABC | 01-MAY-09 |        1
ABC | 01-APR-09 |        2
DEF | 01-FEB-18 |        1
GHI | 01-FEB-20 |        3

db<>fiddle here

0
votes

I think you want one less than the number of distinct codes for each cat in each month:

select cat, trunc(date, 'MON') as yyyymm,
       count(distinct code) - 1
from t
group by cat, trunc(date, 'MON');

From what I can tell, this matches the results you specify.