1
votes

I am looking for a SQL query or a series of SQL queries.

Schema

  • I have a logging table with three columns: id, event_type, and timestamp
  • The IDs are arbitrary text, generated randomly at runtime and unknown to me
  • The event types are numbers from a finite collection of known event types
  • The timestamps are your typical int64 epoch timestamp
  • A single ID value may have 1 or more rows, each with some value for event_type. representing a flow of events associated with the same ID
  • For each ID, its collection of rows can be sorted by increasing timestamp
  • Most times, there will be only one occurrence of an ID + event type combination, but rarely, there could be two; not sure this matters

Goal

What I want to do is to query the number of distinct combinations of event types (sorted by timestamp). For example, provided this table:

id          event_type          timestamp
-----------------------------------------
foo         event_1             101
foo         event_2             102
bar         event_2             102
bar         event_1             101
foo         event_3             103
bar         event_3             103
blah        event_1             101
bleh        event_2             102
backwards   event_1             103
backwards   event_2             102
backwards   event_3             101

Then I should get the following result:

combination               count
-------------------------------
[event_1,event_2,event_3]   2    // foo and bar
[event_3,event_2,event_1]   1    // backwards
[event_1]                   1    // blah
[event_2]                   1    // bleh
2
Which dbms are you using? (The answer will probably be product specific.)jarlh

2 Answers

3
votes

You can do 2 levels of grouping to your data.
For Mysql use group_concat():

select t.combination, count(*) count
from (
  select
    group_concat(event_type order by timestamp) combination
  from tablename
  group by id
) t
group by t.combination
order by count desc

See the demo.
For Postgresql use array_agg() with array_to_string():

select t.combination, count(*) count
from (
  select
    array_to_string(array_agg(event_type order by timestamp), ',') combination
  from tablename
  group by id
) t
group by t.combination
order by count desc

See the demo.
For Oracle there is listagg():

select t.combination, count(*) count
from (
  select
    listagg(event_type, ',') within group (order by timestamp) combination
  from tablename
  group by id
) t
group by t.combination
order by count desc

See the demo.
For SQL Server 2017+ there is string_agg():

select t.combination, count(*) count
from (
  select
    string_agg(event_type, ',') within group (order by timestamp) combination
  from tablename
  group by id
) t
group by t.combination
order by count desc

See the demo.
Results:

| combination             | count |
| ----------------------- | ----- |
| event_1,event_2,event_3 | 2     |
| event_3,event_2,event_1 | 1     |
| event_1                 | 1     |
| event_2                 | 1     |
0
votes
SELECT
    "combi"."combination",
    COUNT(*) AS "count"
FROM 
    (
        SELECT
            GROUP_CONCAT("event_type" SEPARATOR ',') AS "combination"
        FROM
            ?table?
        GROUP BY
            "id"
    ) AS "combi"
GROUP BY
  "combi"."combination"

Note: GROUP_CONCAT(... SEPARATOR ...) syntax is not SQL standard, it's DB specific (in this case MySQL, other dbs have other aggregate functions). You might need to adjust for your DB of choice or specify in tags which DB you are actually using.

As for "sorted by timestamp", you need to define what this actually means. What is "sorted by timestamp" for a group of groups?