4
votes

We are struggling in trying to solve a problem that might simpler than I think. I am just stuck.

The problem consists of defining a DAX Measure that dynamically counts only first visits in the period for each patient. Visits that are not first in the period must be assigned BLANK() value. Please before discussing the goal let me introduce the scenario.

  1. Scenario:
    We have a model composed of one fact table (F_Visits) and two dimensions (D_Customer; D_Calendar). F_Visits has 1M records and contains all visits made by customers into the Clothing shop. Its fields are:
    Visit Id: Incremental unique number which is the natural key of the table
    Customer Fk: Foreign Key of dimension D_Customer in the fact table
    Calendar Fk: Foreign Key of dimension D_Calendar in the fact table
    Quantity: Number of cloth pieces purchased in the visit
    Amount: Amount of dollars spent in the visit
    Seller: Employee name who served the customer

  2. Goal:
    Create a Measure that:

    • dynamically (based on the time period selection)
    • for each visit within each customer return the value 1 if it is the first for that patient in the selected period (minimum Visit Id in the period), ignoring visits that are not the first in the selected time period for each customer
    • overall (in total) this measure needs to return count of total number of first visits made by all patients in the selected time period

I have been searching on online communities for the last two weeks but found no problem like mine. A similar (but different) one is the following: Sum distinct values for first occurance in Power BI The main difference compared to that question is the measure for this problem needs to be BLANK() for visits that are not the first of that patient in the selected period.
3. Example:

+----------+-------------+-------------+----------+--------+--------+
| Visit Id | Customer FK | Calendar FK | Quantity | Amount | Seller |
+----------+-------------+-------------+----------+--------+--------+
|        1 | John        |    20170101 |        1 |     10 | Rick   |
|        2 | John        |    20180101 |        2 |     15 | Morty  |
|        3 | John        |    20180101 |        3 |     17 | Eric   |
|        4 | John        |    20190101 |        2 |     17 | Eric   |
|        5 | Mark        |    20170101 |        1 |     17 | Eric   |
|        6 | Mark        |    20180101 |        3 |     12 | Eric   |
|        7 | Jack        |    20190101 |        0 |      0 | Rick   |
+----------+-------------+-------------+----------+--------+--------+
  1. Solution: The user now filter for Calendar FK = 2018. This will exclude some visit and the measure is dynamically calculated to count only first vists in the selected period for each patient:

Detail Table:

+----------+-------------+-------------+----------+--------+--------+---------+
| Visit Id | Customer FK | Calendar FK | Quantity | Amount | Seller | Measure |
+----------+-------------+-------------+----------+--------+--------+---------+
|        2 | John        |    20180101 |        2 |     15 | Morty  | 1       |
|        3 | John        |    20180101 |        3 |     17 | Eric   | BLANK() |
|        6 | Mark        |    20180101 |        3 |     12 | Eric   | 1       |
+----------+-------------+-------------+----------+--------+--------+---------+

Total:

+-------+
| Total |
+-------+
|     2 |
+-------+

Total by Seller:

+--------+---------+
| Seller | Measure |
+--------+---------+
| Morty  | 1       |
| Eric   | 1       |
| Rick   | BLANK() |
+--------+---------+

There is no specific requirements about the approach except it needs to be dynamically calculated on front end.

Update

please refer to question DAX: avoid ALL() causing incorrect show of dimensional attribute in matrix

1
is update clear?Seymour

1 Answers

6
votes
First Visit =
// Make a table of all patients in context and their first visit id in context
VAR PatientFirstVisits =
  ADDCOLUMNS (
    VALUES ( 'F_Visits'[Customer Fk] ),
    "MinVisitId", CALCULATE (
        MIN ( 'F_Visits'[Visit Id] ),
        ALL ( 'F_Visits'[Visit Id], 'F_Visits'[Seller] ), // note: fragile
        ALLSELECTED ( 'D_Calendar' )
    )
  )
RETURN
  // Count the Visit Ids that are associated with one of the patient-visit pairs in
  // PatientFirstVisits
  CALCULATE ( 
    DISTINCTCOUNT ( 'F_Visits'[Visit Id] ),
    KEEPFILTERS ( TREATAS ( PatientFirstVisits, 'F_Visits'[Customer Fk], 'F_Visits'[Visit Id] ) )
  )

Note on the fragile part, it is a best practice to construct fact tables of only FKs and aggregateable facts, i.e. do not have any descriptive attributes in the fact table. The reason the commented part is fragile is that you will need to add to the ALL any new descriptive columns you add to the fact that may end up in a visual with this measure. If you pull seller out to a dim, you can refactor as below:

First Visit = 
VAR PatientFirstVisits =
  ADDCOLUMNS (
    VALUES ( 'F_Visits'[Customer Fk] ),
    "MinVisitId", CALCULATE (
        MIN ( 'F_Visits'[Visit Id] ),
        ALL ( 'F_Visits'[Visit Id] ),
        ALLSELECTED ( 'D_Calendar' ),
        ALL ( 'Seller' )
    )
  )
RETURN
  CALCULATE ( 
    DISTINCTCOUNT ( 'F_Visits'[Visit Id] ),
    KEEPFILTERS ( TREATAS ( PatientFirstVisits, 'F_Visits'[Customer Fk], 'F_Visits'[Visit Id] ) )
  )

Based on my experience, columns within a table are about an order of magnitude more volatile than tables within a model. What I mean by this is that I typically see much more churn in columns (adding, removing, refactoring in some way) than churn in tables. By pulling out a 'Seller' dim, you can just do ALL ( 'Seller' ) and you don't have to worry about tracking columns.

And here's the measure in action:

measure in a visual

Edit: A version based on the comment asking if we can do a COUNTROWS version rather than DISTINCTCOUNT. The VAR would be the same, so I'm just showing the count after the RETURN.

...
RETURN
  // Count the Visit Ids that are associated with one of the patient-visit pairs in
  // PatientFirstVisits
  COUNTROWS ( 
    INTERSECT (
      GROUPBY( 'F_visits', 'F_visits'[Customer FK], 'F_visits'[Visit Id] ),
      PatientFirstVisits
    )
  )

Here we just intersect the VAR table and the values in context from the fact. That intersect is 1 row or empty for detail rows, and it is the total without being filtered by 'F_Visits'[Visit Id].