0
votes

I'm new to datawarehousing and I have a star schema with a contract fact table. It holds basic contract information like Start date, end date, amount ...etc.

I have to link theses facts to a customer dimension. there's a maximum of 4 customers per contract. So I think that I have two options either I flatten the 4 customers into one row for ex:

DimCutomers

name1, lastName1, birthDate1, ... , name4, lastName4, birthDate4

the other option from what I've heard is to create a bridge table between the facts and the customer dimension. Thus complexifying the model.

What do you think I should do ? What are the advantages / drawbacks of each solution and is there a better solution ?

2
Can multiple customers be on multiple contracts? Like Customer 1 and 2 are on Contract A, and Customer 1 and 3 are on Contract B? Or is it truly one set of customers for every contract?N West

2 Answers

3
votes

I would start by creating a customer dimension with all customers in it, and with only one customer per row. A customer dimension can be a useful tool by itself for CRM and other purposes and it means you'll have a single, reliable list of customers, which makes whatever design you then implement much easier.

After that it depends on the relationship between the customer(s) and the contract. The main scenarios I can think of are that a) one contract has 4 customer 'roles', b) one contract has 1-4 customers, all with the same role, and c) one contract has 1-n customers, all with the same role.

Scenario A would be that each contract has 4 customer roles, e.g. one customer who requested the contract, a second who signs it, a third who witnesses it and a fourth who pays for it. In that case your fact table will have one row per contract and 4 customer ID columns, each of which references the customer dimension:

...
RequesterCustomerID int,
SignatoryCustomerID int,
WitnessCustomerID int,
BillableCustomerID int,
...

Of course, if one customer is both a requester and a witness then you'll have the same ID in both RequesterCustomerID and WitnessCustomerID because you only have one row for him in your customer dimension. This is completely normal.

Scenario B is that all customers have the same role, e.g. each contract has 1-4 signatories. If the number of signatories can never be more than 4, and if you're very confident that this will 'always' be true, then the simple solution is also to have one row per contract in the fact table with 4 columns that reference the customer dimension:

...
SignatoryCustomer1 int,
SignatoryCustomer2 int,
SignatoryCustomer3 int,
SignatoryCustomer4 int,
...

Even if most contracts only have 1 or 2 signatories, it's not doing much harm to have 2 less frequently used columns in the table.

Scenario C is where one contract has 1-n customers, where n is a number that varies widely and can even be very large (class action lawsuit?). If you have 50 customers on one contract, then adding 50 columns to the fact table becomes difficult to manage. In this case I would add a bridge table called ContractCustomers or whatever that links the fact table with the customer dimension. This isn't as 'neat' as the other solutions, but a pure star schema isn't very good at handling n:m relationships like this anyway.

There may also be more complex cases, where you mix scenarios A and C: a contract has 3 requesters, 5 signatories, 2 witnesses and the bill is split 3 ways between the requesters. In this case you will have no choice but to create some kind of bridge table that contains the specific customer mix for each contract, because it simply can't be represented cleanly with just one fact and one dimension table.

0
votes

Either way can work but each solution has different implications. Certainly you need customer and contract tables. A key question is: is it always a maximum of four or may it eventually increase beyond that? If it will stay at 4, then you can have a repeating group of 4 customer IDs in the contract. The disadvantage of this is that it is fixed. If a contract does not have four, there are some empty spaces. If, however, there might be more than 4, then the only viable solution is to use a bridge table because in a bridge table you add more customers by inserting new rows (with no change to the table structure). In the fixed solution, in this case you add more than 4 customers by altering the table. A bridge table is an example of what, for many decades now, ER modeling has called an associative entity. It is the more flexible of the two solutions. However, I worked on a margin system once wherein large margin amounts needed five levels of manager approval. It has been five and will always be five, they told me. Each approving manager represented a different organizational level. In this case, we used a repeating group of five manager IDs, one for each level, and included them in the trade. So it is important to understand the current business rules and the future outlook.