1
votes

Consider a crowdfunding system whereby anyone in the world can invest in a project.

I have the normalized database design in place and now I am trying to create a data warehouse it (OLAP).

I have come up with the following:

Star Schema

This has been denormalized and I have chosen Investment as the fact table because I think the following examples could be useful business needs:

  • Look at investments by project type
  • Investments by time periods i.e. total amount of investments made per week etc.

Having done some reading (The Data Warehouse Toolkit: Ralph Kimball) I feel like my schema isn't quite right. The book says to declare the grain (in my case each Investment) and then add facts within the context of the declared grain.

Some facts I have included do not seem to match the grain: TotalNumberOfInvestors, TotalAmountInvestedInProject, PercentOfProjectTarget.

But I feel these could be useful as you could see what these amounts are at the time of that investment.

Do these facts seem appropriate? Finally, is the TotalNumberOfInvestors fact implicitly made with the reference to the Investor dimension?

2

2 Answers

4
votes

I think "one row for each investment" is a good candidate grain.

The problem with your fact table design is that you include columns which should actually be calculations in your data-application ( olap cube ).

TotalNumberOfInvestors can be calculated by taking the distinct count of investors.

TotalAmountInvestedInProject should be removed from the fact table because it is actually a calculation with assumptions. Try grouping by project and then taking the sum of InvestmentAmount, which is a more natural approach.

PercentOfProjectTarget is calculated by taking the sum of FactInvestment.InvestmentAmount divided by the sum of DimProject.TargetAmount. A constraint for making this calculationwork is having at least a member of DimProject in your report.

Hope this helps,

Mark.

0
votes

Either calculate these additional measures in a reporting tool or create a set of aggregated fact tables on top of the base one. They will be less granular and will reference only a subset of dimensions.

Projects seem to be a good candidate. It will be an accumulating snapshot fact table that you can also use to track projects' life cycle.