0
votes

I am new to SSAS Tabular and DAX. I have built a data model and processed successfully. Now, I am in the middle of recoding SSRS report to use SSAS Tabular instead of hitting tables and indexed views in data warehouse.

The fact table that this particular DAX query I am writing against has a size of ~220 million records. Below is the query:

DEFINE
VAR Ind_Fact_Summ_Matter_By_Day_By_Posted =     
//SAMPLE(1000,
    ADDCOLUMNS
    (
        SUMMARIZE
        (
            'Revenue Summary',
            'Revenue Summary'[AK_Tran_Date],
            'Revenue Summary'[AK_Post_Date],
            'Revenue Summary'[AK_Period],
            'Revenue Summary'[AK_Current_Matter],
            'Revenue Summary'[AK_Matter],
            'Revenue Summary'[AK_Exchange_DateRange]
        ),
        "Debt_Total_Outstanding", CALCULATE([Total Debt Total Outstanding]), --CALCULATE(SUM('Revenue Summary'[Debt_Total_Outstanding]))
        "Fees_WIP_Amount", CALCULATE(SUM('Revenue Summary'[Fees_WIP_Amount])),
        "Disbs_Soft_WIP", CALCULATE(SUM('Revenue Summary'[Disbs_Soft_WIP])),
        "Disbs_Hard_WIP", CALCULATE(SUM('Revenue Summary'[Disbs_Hard_WIP])),
        "Disbs_Total_WIP", CALCULATE(SUM('Revenue Summary'[Disbs_Total_WIP])),
        "Fees_Base_Amount", CALCULATE(SUM('Revenue Summary'[Fees_Base_Amount])),
        "Debt_Total_Billed", CALCULATE(SUM('Revenue Summary'[Debt_Total_Billed])),
        "Net_Revenue", CALCULATE(SUM('Revenue Summary'[Net_Revenue])),
        "Fees_Net_Revenue", CALCULATE(SUM('Revenue Summary'[Fees_Net_Revenue])),
        "Other_Revenue", CALCULATE(SUM('Revenue Summary'[Other_Revenue])),
        "Total_Cost_Amt", CALCULATE(SUM('Revenue Summary'[Total_Cost_Amt])),
        "Fees_WIP_reserve", CALCULATE(SUM('Revenue Summary'[Fees_WIP_Reserve])),
        "Other_Disbs_WIP_Reserve", CALCULATE(SUM('Revenue Summary'[Other_Disbs_WIP_Reserve])),
        "Debt_Total_Reserve", CALCULATE(SUM('Revenue Summary'[Debt_Total_Reserve])),
        "Billable_Hours", CALCULATE(SUM('Revenue Summary'[Billable_Hours])),
        "Partner_Billable_Hours", CALCULATE(SUM('Revenue Summary'[Partner_Billable_Hours])),
        "Legal_Staff_Billable_Hours", CALCULATE(SUM('Revenue Summary'[Legal_Staff_Billable_Hours])),
        "Standard_Revenue_No_Secondment", CALCULATE(SUM('Revenue Summary'[Standard_Revenue_No_Secondment])),
        "Standard_Revenue_Secondment", CALCULATE(SUM('Revenue Summary'[Standard_Revenue_Secondment])),
        "Fees_Standard_Revenue", CALCULATE(SUM('Revenue Summary'[Fees_Standard_Revenue])),
        "Fees_Billed_WIP_Writeoff", CALCULATE(SUM('Revenue Summary'[Fees_Billed_WIP_Writeoff])),
        "Debt_Writeoff_Secondment", CALCULATE(SUM('Revenue Summary'[Debt_Writeoff_Secondment])),
        "Debt_Writeoff_No_Secondment", CALCULATE(SUM('Revenue Summary'[Debt_Writeoff_No_Secondment])),
        "WIP_Reserve", CALCULATE(SUM('Revenue Summary'[Fees_WIP_Reserve]) + SUM('Revenue Summary'[Other_Disbs_WIP_Reserve])),
        "Total_Reserve", CALCULATE(SUM('Revenue Summary'[Fees_WIP_Reserve]) + SUM('Revenue Summary'[Other_Disbs_WIP_Reserve]) + SUM('Revenue Summary'[Debt_Total_Reserve]))
    )
//  ,RAND()
//)
EVALUATE 
    Ind_Fact_Summ_Matter_By_Day_By_Posted

As you can see, it is pretty straightforward DAX query - aggregating some metrics. 'Revenue Summary' is the big fact table I was talking about, which has 220 million records. When I ran the query in DAX Studio, it failed due to memory constraint. When I did SAMPLE or TOPN 1000, it took 5 minutes! and that was only returning 1 aggregated metrics (I commented the rest - just for testing purpose)!

I am not sure if it's the model, the DAX, the hardware or the combo of them, which cause such a poor performance. The dev server where the tabular model is hosted is virtual server with the following specs (based on CPU-Z):

  • Intel Xeon E5 (8 cores, 4 sockets)
  • 90 GB RAM
  • Memory type: EDO (not sure what this is?!)
  • DRAM Frequeny: 66.0 Mhz (slow as??)
  • At rest, 47% of RAM is used. When I ran the query (without the TOPN 1000), it went up until it failed due to memory constraint and CPU utilization was quite high ~80-90%.

Also, as I mentioned above, I am recoding SSRS report which has been written using indexed views. Note than the query is only part of the bigger query which a report uses. The report runs about a minute hence I am exploring a way to speed it up using SSAS Tabular. When I did TOP 1000 of the equivalent query in SQL using the indexed view, it only took 1 sec, so looks like SSAS Tabular fails me here?!

What should I do to improve the performance significantly? Anything I did wrong?

Any guidance will be much appreciated. Thanks

EDIT: Using SSAS 2016 in Tabular/In-Memory/Import mode (not DirectQuery)

1
Just wondering, DirectQuery mode is not enabled, right?Alexander Volok
@AlexanderVolok I just checked in SSMS, the database is in Import mode not DirectQuery mode.iKnowNothing
The first thing I would do is comment out all measures but one, run the query and time it. Then comment out all but the next measure and time it. I am assuming one or more measures are expensive or can be optimized. Once you have found the culprit post the DAX for that measure and any other measures it refers to.GregGalloway
@GregGalloway thanks for the tip. FYI, all of them are not measures but fields straight from the data source (SQL Server table), except the first one, [Total Debt Total Outstanding]. I am thinking the slowness could be caused by the fact that the fact table is quite big with lots of metrics fields, and each metrics fields' cardinality is high due to lots of distinct values, which does not help the compression hence performance of the tabular model. Not sure what to do to optimize. If I import just that fact table without any other tables, the model database uses ~38GB of RAM by itself.iKnowNothing
How many rows would the query return if it finished? Is the query fast without Total Debt Total Outstanding?GregGalloway

1 Answers

1
votes

Your error is that you use Summarize to sum some of the columns - but not all! you can use Summarizecolumns and simply summarize all of the columns in one go, so instead of:

VAR Ind_Fact_Summ_Matter_By_Day_By_Posted =     
//SAMPLE(1000,
    ADDCOLUMNS
    (
        SUMMARIZE
        (
            'Revenue Summary',
            'Revenue Summary'[AK_Tran_Date],
            'Revenue Summary'[AK_Post_Date],
            'Revenue Summary'[AK_Period],
            'Revenue Summary'[AK_Current_Matter],
            'Revenue Summary'[AK_Matter],
            'Revenue Summary'[AK_Exchange_DateRange]
        ),
        "Debt_Total_Outstanding", CALCULATE([Total Debt Total Outstanding]), --CALCULATE(SUM('Revenue Summary'[Debt_Total_Outstanding]))
        "Fees_WIP_Amount", CALCULATE(SUM('Revenue Summary'[Fees_WIP_Amount])),
        "Disbs_Soft_WIP", CALCULATE(SUM('Revenue Summary'[Disbs_Soft_WIP])),
        "Disbs_Hard_WIP", CALCULATE(SUM('Revenue Summary'[Disbs_Hard_WIP])),

Or whatever else you've got in there, just go for:

VAR Ind_Fact_Summ_Matter_By_Day_By_Posted =     
SUMMARIZE
            (
                'Revenue Summary',
                'Revenue Summary'[AK_Tran_Date],
                'Revenue Summary'[AK_Post_Date],
                'Revenue Summary'[AK_Period],
                'Revenue Summary'[AK_Current_Matter],
                'Revenue Summary'[AK_Matter],
                'Revenue Summary'[AK_Exchange_DateRange], 
'Revenue Summary'[Debt_Total_Outstanding],
'Revenue Summary'[Fees_WIP_Amount],
'Revenue Summary'[Disbs_Soft_WIP],
'Revenue Summary'[Disbs_Hard_WIP]
            )

There's absolutely no reason in the "add columns" section to add a calculated measure (unless it's already predefined). In the ADDCOLUMNS section we add the attributes that we want to group by them.