5
votes

How to construct filter tables for SUMMARIZECOLUMNS function?

The SUMMARIZECOLUMNS has the following pattern:

SUMMARIZECOLUMNS( 
    ColumnName1, ...
    ColumnNameN,
    FilterTable1,     -- my question concerns this line
    FilterTableN, 
    Name1, [measure1], 
    NameN, [measure2], 
)

I have checked that the following 3 patterns work. They return the same results, at least for the simple sample data I used.

SUMMARIZECOLUMNS (
    T[col],
    FILTER( T, T[col] = "red" ) 
)
SUMMARIZECOLUMNS (
    T[col],
    CALCULATETABLE( T, T[col] = "red" ) 
)
SUMMARIZECOLUMNS (
    T[col],
    CALCULATETABLE ( T, KEEPFILTERS ( T[col] = "red" ) )
)

Is any of these patterns superior over the other?

Reference: https://www.sqlbi.com/articles/introducing-summarizecolumns/

Update

I would be interested in an answer that contains a query plan analysis or link to credible source. I would be grateful if you mentioned using the SUMMARIZECOLUMNS function when grouping columns from multiple tables.

1
In general, I don't recommend adding bounty offers or negotiations into posts themselves, either for the time when a bounty is live, or in order to tempt new answers. This material can go into the bounty reason while a bounty is live, and is then removed automatically once the bounty expires; adding it to the question itself rather clutters up the problem with meta-commentary, which is not interesting to the broad audience (they do not have an account or sign in).halfer
@halfer regular bounty work very good with tags that have very broad audience. Tags such as dax or powerbi are not popular. There is narrow community of just a few people. Starting regular bounty does not even increase the views because those few people will see those few posts that are posted here every week. That is why I would like to start a bounty that may encourage people to tackle the problem. The question itself is not a bug correction request. There is a possibility ot reward existing answer and I would like to inform community that I am going to do it if answer appears. Why not?Przemyslaw Remin
Thank you Przemyslaw. I am working on the "first principle" that questions should contain question material only, and answers should contain answer material only. The Meta community and mods are generally in agreement with this - we keep material free from casual conversation, what time of day it is, greetings and Christmas wishes, hopes that the reader is well, etc. In the same way, commentary about voting is ephemeral and distracting.halfer
Since there are two other good ways to note bounty information (comments, bounty notice) I think they are preferable. I might see if I can find the time to ask Meta a question about it.halfer
(To a lesser degree I am conscious that bounties are used as a device to obtain free work - the rules for obtaining a bounty become highly detailed work orders and the OP feels that they no longer have to be an active participant in the work. I think perhaps we could discourage that too - they are after all just unicorn points, and helpers should be free to answer how they wish, to some extent).halfer

1 Answers

5
votes

You can also construct them the way PowerBI does, using VAR:

VAR  __MyFilterTable = FILTER( T, T[col] = "red" ) 

RETURN
SUMMARIZECOLUMNS (
    T[col],
    __MyFilterTable
)

Which is more efficient will depend on the complexity your filtering, so there is no "one size fits all" rule necessarily. For a simple table level filter, just FILTER will suffice. I caution you that Line 1, where you're filtering the entire table T, is a bad idea. It's much more performant to only filter a single column. When you filter the entire table, DAX materializes the entire table in memory, while the following just materializes the one value of T[col]:

VAR  __MyFilterTable = FILTER( ALL(T[col]), T[col] = "red" ) // This is better.

RETURN
SUMMARIZECOLUMNS (
    T[col],
    __MyFilterTable
)

You can do even better than that, conceptually. You can basically tell DAX, "I know this is a value, so don't even look in the table for it. Just make me a table and treat it as though I filtered it. Like this:

VAR  __MyFilterTable = TREATAS ({"red"}, T[col] )

RETURN
SUMMARIZECOLUMNS (
    T[col],
    __MyFilterTable
)

Again, this is the pattern that PowerBI uses when performing its filters.

BTW, Creating the filter tables a the top vs. creating them inline with SUMMARIZECOLUMNS() won't make any difference for speed. Avoid using CALCULATETABLE() as you've done here generally.

You can also do this as well, though you aren't likely to see a speed increase generally:

CALCULATETABLE(
    SUMMARIZECOLUMNS (
        T[col]
    ),
    KEEPFILTERS(T[col] = "red")
)