0
votes

I'm trying to figure out how I can create a calculated measure that produces a count of only unique facts in my fact table. My fact table basically stores events from a historical perspective. But I need the measure to filter out redundant events.
Using sales as an example(Since all material around OLAP always uses sales in examples):

The fact table stores sales EVENTS. When a sale is first made it has a unique sales reference which is a column in the fact table. A unique sale however can be amended(Items added or returned) or completely canceled. The fact table stores these changes to a sale as different rows.

If I create a count measure using SSAS I get a count of all sales events which means an unique sale will be counted multiple times for every change made to it (Which in some reports is desirable). However I also want a measure that produces a count of unique sales rather than events but not just based on counting unique sales references. If the user filters by date then they should see unique sales that still exist on that date (If a sale was canceled by that date if should not be represented in the count at all).

How would I do this in MDX/SSAS? It seems like I need have a count query work from a subset from a query that finds the latest change to a sale based on the time dimension. In SQL it would be something like:

SELECT COUNT(*) FROM SalesFacts FACT1 WHERE Event <> 'Cancelled' AND
Timestamp = (SELECT MAX(Timestamp) FROM SalesFact FACT2 WHERE FACT1.SalesRef=FACT2.SalesRef)

Is it possible or event performant to have subqueries in MDX?

3
BTW, that's not a performant way of doing the count in SQL. A distinct count of the eventID (esp. if it is indexed) would be much better. Or a 'Row_number over ...' type approach. The above will die with significant data volumes.piers7

3 Answers

0
votes

In SSAS, create a measure that is based on the unique transaction ID (The sales number, or order number) then make that measure a 'DistinctCount' aggregate function in the properties window.

Now it should count distinct order numbers, under whichever dimension slice it finds itself under.

0
votes

The posted query might probably be rewritten like this:

SELECT COUNT(DISTINCT SalesRef)
FROM SalesFacts
WHERE Event <> 'Cancelled'
0
votes

An simple answer would be just to have a 'sales count' column in your fact view / dsv query that supplies a 1 for an 'initial' event, a zero for all subsiquent revisions to the event and a -1 if the event is cancelled. This 'journalling' approach plays nicely with incremental fact table loads.

Another approach, probably more useful in the long run, would be to have an Events dimension: you could then expose a calculated measure that was the count of the members in that dimension non-empty over a given measure in your fact table. However for sales this is essentially a degenerate dimension (a dimension based on a fact table) and might get very large. This may be inappropriate.

Sometimes the requirements may be more complicated. If you slice by time, do you need to know all the distinct events that existed then, even if they were later cancelled? That starts to get tricky: there's a recent post on Chris Webb's blog where he talks about one (slightly hairy) solution:

http://cwebbbi.wordpress.com/2011/01/22/solving-the-events-in-progress-problem-in-mdx-part-2role-playing-measure-groups/