4
votes

Good day!

I have a relational data source with 3 fact tables which are related to each other and model

patient doctor visits (EncounterEventFact table), assignment of a diagnosis to patient (DiagnosisEventFact table) and collecting lab results for the patient (LabComponentResultFact table) in the picture below. They all share EncounterKey a key, specifying a unique visit to a doctor. All EncounterKeys are in EncounterEventFact table and only once. I am using SSAS tabular in memory to model the data.

Each fact table has a few million rows (2-4 mil). DiagnosisDim has few tens of thousands of rows. PatientDim has few millions rows (<10 mil). LabDim has few hundred records. This is a simplified model with just 3 main dimensions.

My measures are a distinct count of EncounterKeys on DiagnosisEventFact and distinct count of EncounterKeys on LabComponentResultFact.

The sample report which is being developed is given a selection on diagnosis dim side, show counts of unique visits for this diagnosis and counts of lab collection visits for this diagnosis.

diagnosis count_visits_diagnosis count_visits_labs


ABC 5 0

DEF 10 5

etc

The problem comes when I select ALL diagnoses in the report. Counts for diagnoses measure loads instantly, while counts of labs measure takes forever. When I select a few diagnoses report works fast.

How to improve my SSAS tabular model or computation to handle this sort of reporting efficiently?

Unfortunately I can not re-model data on the data source side.

enter image description here

1
Is the model simplified, or is every Key in your outer fact tables linked to a dimension, like Lab and Diagnosis? In general; it's not a good idea to link several fact tables directly together. Normally (in a perfect schema) you like facts via conformed dimensions. In theory, of course. You say you can't modify the data source side, but can you edit the query in SSAS? My first hunch, is to join the middle fact to the both outer joins (and fetch only the keys needed), so you end up with one fact in your model. Since they have the same granulity, this should be no problem.TJ_
In addition: If you make a query to join the fact tables: please include only the columns needed (just Encounter-, Patient-, Diagnosis- and Lab-Key, in this case). You should do this anyway, also in your current model: delete all the keys you don't need. This will speed up the engine -a lot-.TJ_
@TJ_ Thanks for your comments. I tried removing middle Encounter fact via left outer join in the SSAS indeed. But the catch here is to establish a filtering relationship between two outer facts and without using middle fact it is simply not possible in SSAS as far as I know as it is neither of supported SSAS relationships 1-Many, Many-1 or 1-1.YuGagarin
What I meant, is that you create the join via SQL, in the query of the model. Then you end up with just one fact table, and filtering will be fine.TJ_
@TJ_ Do you mean joing 3 fact tables from the picture in SQL to create single joined fact table?YuGagarin

1 Answers

3
votes

A schema can have one or more facts, but these facts are not linked by any key relationship. It is best practice not to join fact tables in a single query as you would whey querying a normalized/transnational database. Due to the nature of many to many joins, etc. - the results would be incorrect if attempted.

Instead suggest to follow a process that Kimball calls "drilling across". The drill-across process can be broken down into two phases. In the first phase, each star is queried, and results are aggregated to a common level of detail. In the second phase, these result sets are merged based on their common dimensions.

A good read about this is available in the Kimball site

A similar question is available here for reference: Design of a data warehouse with more than one fact tables