1
votes

I am trying to build an OLAP datasource from a bunch of binary files, and our current model just isn't working. We are using SSAS as our analysis / reporting model for our results, but aren't able to get the performance we want out of SQL.

Our main constraints are:

  1. The database is very large. We have huge dimension tables with millions of rows, and several smaller fact tables (<1,000,000 rows).

  2. We have a dynamic cube. B/C the fact tables are built dynamically, and often (possibly multiple times per day), there can't be any huge overhead in setting up the cube. Current deploy times on the cube can exceed 24 hours, and we need orders of magnitude increase in performance which hardware just isn't gonna give us.

Basically, we want a fast setup and deploy, which doesn't inherently lend itself to SSAS using SQL Server 2005, but we want to use SSRS for reporting and we want an OLAP model for analysis in Excel, so we'd still like to use SSAS to build the cube if possible.

The common solution in SSAS for a fast deploy is ROLAP, but we are getting execution errors on larger ROLAP queries, and we also don't like all the overhead involved in converting the binary data to SQL and loading it into the cube.

Has anyone done work on a custom OLAP datasource that SSAS can use? We are looking to create our own ROLAP engine that will query the binary source files directly.

4

4 Answers

2
votes

If you need a low latency cube (i.e. one showing up-to-date data) the canonical architecture for such things is thus:

  • Incrementally load a fact table with changed data from your source.
  • Build a partitioned cube with a process that generates new partitions every day or some other suitable period. The cube has have the most recent partition set up in a ROLAP mode and the older partitions built as MOLAP.
  • Set up a process that updates the partitions and changes the older partitions from ROLAP to MOLAP as it generates a new leading edge partition.

Queries against the cube will hit the relatively small ROLAP partition for the most recent data and the MOLAP partitions for the historical data. The MOLAP partitions can have aggregations. The process keeps ticking the leading edge ROLAP partition over and converting its predecessor. AS will keep the older partition around and use it until the replacement partition is built and comes on line.

If this type of architecture will fit your requirements you could consider doing it this way.

1
votes

Thanks for the response, Nigel.

I guess I need to explain this a little better. My source data is in a proprietary format, not a database, so getting to the fact table itself is taking quite a bit of time. Then we need to deploy the cube as quickly as possible (preferably within minutes) and have fast query responses, which we are not currently seeing even on a small dataset using SQL.

Because the structure of the cube is dynamic, we often have to rebuild every aspect of the cube, we don't introduce new data after the fact, so partitioning parts of it as MOLAP and other parts of ROLAP doesn't really help. We are looking for performance on the "Process Full".

We are beginning to realize that we just can't use SQL for querying, and want to know if anyone has created a custom ROLAP datasource that analysis services (or any OLAP tool) can read.

We can handle creating the result sets quickly; we just need to figure out how to get the query from SSAS and feed it back those results. We’re really just looking to use SSAS as an intermediary between our system and Excel, SSRS, etc. rather than using it to process or aggregate the data.

1
votes

Could you use something like R with a homebrew library (it supports C Extensions) to interface to your data sets. R would give you a fair amount of flexibility for building complex reports or data pre-processing libraries. It also has an interface to Excel.

This a somewhat different toolchain to the traditional DB/OLAP model but you could fairly easily write a fast dataset loader in C and skip the intermediate step of loading into the database.

0
votes

I haven't had any luck yet. We are going the route of either building our own Data Provider and building add-ins for excel to emulate the olap behavior, OR using CLR table-valued functions to emulate our data-sources and build the cube off of that. The one attempt I took at the CLR stuff had horrible performance and blew up though due to the amount of queries SSAS runs when building a cube. I am waiting to get a newer faster machine in the SQL 08 environment to see if this is feasible. Good luck Scott.