Typically measures in an OLAP cube are a numbers, and these numbers get aggregated via addition (or via some not-very-exotic function like times or MAX or MIN). I'm wondering if any of the major OLAP servers let you make measures that are sets of numbers, or sets of strings. ("Set" here is in the mathematical sense, not the OLAP jargon sense of "a list of tuples".) Whereas OLAP typically aggregates measures via numeric functions, my hypothetical "set measures" would be aggregated via set operations, e.g. set union or set intersection.
I'm interested both theoretically and practically. Theoretically/abstractly/mathematically, there's a nice parallel between addition-over-integers and union-over-sets, and it seems like someone could have considered this parallel in writing an OLAP server. (One potential implementation, if all the possible set members were known in advance, is to represent each set as a (potentially large) integer, and then to aggregate/union by performing bitwise OR.)
As for practice, I'll attempt to provide a concrete case where this might seem at least marginally useful: Suppose you had a dataset where each fact was the metadata associated with an academic paper. Each paper might have a date, a topic, and a set of one or more authors, like so:
- fact1: {"Year": 1997, "Topic" : "AI", "AuthorSet": ["Bill Jones", "Martha X"]}
- fact2: {"Year": 1997, "Topic" : "Linguistics", "AuthorSet": ["John Q", "Sam S"]}
- fact3: {"Year": 1997, "Topic" : "Linguistics", "AuthorSet": ["John Q", "Jack X"]}
- etc.
(I'm using quasi-JSON here only because it helps make it obvious what's multi-valued.)
If you created an OLAP cube around this data, it would seem very natural to be able to make a report showing how the set of authors writing on a given topic changed from year to year. In MDX, it might look like this:
select
[Measures].[AuthorSet] on columns,
[Year].[Year].All on rows
where ([Topic].[Topic].[AI])
For each year, this query would roll up the list of authors via set union.
To get the very most out of this feature you'd probably need custom OLAP client tools that knew about set measures in particular. But for existing clients you could probably just fall back to some string representation of a set. (e.g. the above query could return cells containing, e.g., the string "Bill Jones; Martha X; John Q; Sam S; Jack X")
I'm most familiar with SSAS, and SSAS doesn't seem to support anything like this out of the box. It seems like there might be a way to hack it together using measures of string type, maybe plus custom CLR functions, but I haven't figured it out yet.
This particular case you might also be able to solve in MDX only, maybe using custom members, Generate and SetToStr? (Solutions welcome!) But my intuition is that as things get more complicated it'd be more natural to have the set aggregation happen more naturally behind the scenes, with this "set member" functionality.