
Here is my use case. We are trying to implement a policy server using Drools. There could be a few hundred thousand (~200K) rules, all data-driven. A few example rules:

POLICY #1:  Gender = Male &&
            AGE BETWEEN 30 AND 40 &&
            BORN_STATE IN (NC,CA,PA)
===> [Outcome = Allow] 
POLICY #2:  (Gender = Male OR Gender = Female) &&
            AGE = 40 &&
            COUNTRY IN (USA,UK,BRA, IND)
===> [Outcome = Allow]
POLICY #3:  Gender = Female &&
            (AGE BETWEEN 10 AND 20 || BORN_YEAR > 1990) &&
            BORN_STATE IN (NC,CA,PA) &&
            BORN_STATE_SUPPORTIVE = TRUE
===> [Outcome = NotAllow]

NOTE: I am only using 4 parameters here, but there could be up to 20 parameters in any given rule.

Policies #1 and #2 look simple and straightforward. However, Policy #3 is tricky. Its last condition (BORN_STATE_SUPPORTIVE = TRUE) indicates that the states listed in the policy are "inclusive": if the rule matches, the Outcome is "NotAllow"; however, if all parameters match except BORN_STATE (e.g. Female, Age=15, State=NY), the Outcome should be "Allow". It's just a way for our users to avoid listing all the other 47 states for this condition.

There are a few other use cases:

  • Find and prevent duplicate rules while users are maintaining the rules.
  • As I mentioned, there could be thousands of these rules, so if there is a conflict (multiple conflicting rules match), the Outcome of the highest-priority rule should win.
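Both use cases can be sketched outside Drools, in Python, under a hypothetical representation in which each rule maps parameter names to sets of accepted values. Duplicates are found by reducing each rule to an order-independent key, and conflicts are settled by taking the matching rule with the highest numeric priority. `Rule`, `canonical_key`, `find_duplicates`, `resolve`, and the extra rule `P4` are illustrative names, not part of Drools.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    # Hypothetical representation: parameter name -> set of accepted values.
    conditions: dict
    outcome: str
    priority: int = 0


def canonical_key(rule):
    """Order-independent key: rules with equal conditions and outcome collide."""
    conds = frozenset((param, frozenset(values)) for param, values in rule.conditions.items())
    return (conds, rule.outcome)


def find_duplicates(rules):
    """Return (first rule, duplicate rule) name pairs."""
    seen = {}
    duplicates = []
    for rule in rules:
        key = canonical_key(rule)
        if key in seen:
            duplicates.append((seen[key].name, rule.name))
        else:
            seen[key] = rule
    return duplicates


def matches(rule, facts):
    return all(facts.get(param) in values for param, values in rule.conditions.items())


def resolve(rules, facts):
    """Outcome of the highest-priority matching rule, or None if nothing matches."""
    matching = [r for r in rules if matches(r, facts)]
    if not matching:
        return None
    return max(matching, key=lambda r: r.priority).outcome


rules = [
    Rule("P1", {"gender": {"Male"}, "age": set(range(30, 41)), "born_state": {"NC", "CA", "PA"}},
         "Allow", priority=1),
    # Same conditions as P1, entered in a different order by another user.
    Rule("P1-copy", {"born_state": {"PA", "CA", "NC"}, "gender": {"Male"}, "age": set(range(30, 41))},
         "Allow"),
    # Hypothetical conflicting rule with a higher priority.
    Rule("P4", {"gender": {"Male"}, "country": {"USA"}}, "NotAllow", priority=5),
]

print(find_duplicates(rules))  # [('P1', 'P1-copy')]
print(resolve(rules, {"gender": "Male", "age": 35, "born_state": "NC", "country": "USA"}))  # NotAllow
```

Because the key includes the outcome, two rules with identical conditions but different outcomes are not reported as duplicates; they surface as conflicts in `resolve` instead.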

Our users don't like Drools Guvnor and aren't convinced it's suitable for our needs, so we have been tasked with building a custom UI to manage the rules. I am planning to come up with a DSL for our users and to generate the DRL file programmatically. I am not sure whether the Drools DSL feature would work for this.
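Generating DRL from a custom rule model is largely string assembly. The Python sketch below assumes a hypothetical fact type `Applicant` with `gender`, `age`, and `bornState` fields and a `setOutcome` method; only the `rule`/`when`/`then`/`end` structure, the comma-separated (ANDed) constraints, and the `in`/`not in` operators are DRL syntax. The helper names are illustrative.

```python
def drl_literal(value):
    """Quote strings; leave numbers as-is."""
    return f'"{value}"' if isinstance(value, str) else str(value)


def drl_constraint(field_name, op, value):
    """Translate one (field, operator, value) condition into a DRL constraint."""
    if op == "==":
        return f"{field_name} == {drl_literal(value)}"
    if op == "between":
        low, high = value
        return f"{field_name} >= {low}, {field_name} <= {high}"
    if op in ("in", "not in"):
        values = ", ".join(drl_literal(v) for v in value)
        return f"{field_name} {op} ({values})"
    raise ValueError(f"unsupported operator: {op}")


def to_drl(rule_name, conditions, outcome, fact_type="Applicant"):
    """conditions: list of (field, operator, value) triples, all ANDed together."""
    constraints = ", ".join(drl_constraint(f, op, v) for f, op, v in conditions)
    return (
        f'rule "{rule_name}"\n'
        "when\n"
        f"    $a : {fact_type}( {constraints} )\n"
        "then\n"
        f'    $a.setOutcome("{outcome}");\n'
        "end\n"
    )


print(to_drl("Policy 1",
             [("gender", "==", "Male"),
              ("age", "between", (30, 40)),
              ("bornState", "in", ["NC", "CA", "PA"])],
             "Allow"))
```

Disjunctions such as Policy #3's `AGE BETWEEN 10 AND 20 || BORN_YEAR > 1990` are not handled by this sketch.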

I am familiar with Drools and DRL files. However, I am not sure whether some of this complexity (e.g. finding duplicate rules, and Policy #3) is solvable in Drools, or whether Drools would be able to handle the load. Any reference or direction would be greatly appreciated.

Update for BORN_STATE_SUPPORTIVE:

Policy #3 Expected Behavior:

Input #1:   Female, Age=15, State=NC
===> [Expected Outcome = NotAllow]
Input #2:   Female, Age=25, State=NC 
===> [Expected Outcome = <age not in range, rule didn't match>] (neither NotAllow nor Allow)
Input #3:   Female, Age=15, State=NY 
===> [Expected Outcome = Allow] (Opposite of NotAllow)

In short, if all other conditions match except BORN_STATE, the outcome must be the opposite of the rule's outcome. If any other condition doesn't match, the rule must be disregarded.
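The expected behavior can be pinned down as a small evaluation function. This Python sketch encodes one reading of the semantics described above (it is not an existing Drools feature): the non-state conditions form a gate, and for a supportive rule the BORN_STATE check picks between the rule's outcome and its opposite. `evaluate`, `policy3_gate`, and the born years in the inputs are hypothetical.

```python
from typing import Callable, Optional

OPPOSITE = {"Allow": "NotAllow", "NotAllow": "Allow"}


def evaluate(facts: dict, gate: Callable[[dict], bool], states: set,
             outcome: str, supportive: bool) -> Optional[str]:
    """Evaluate one policy; None means the rule is disregarded."""
    if not gate(facts):
        return None  # some non-state condition failed: rule does not apply
    if facts["state"] in states:
        return outcome
    # Only a "supportive" rule flips its outcome when just the state check fails.
    return OPPOSITE[outcome] if supportive else None


def policy3_gate(f: dict) -> bool:
    """All Policy #3 conditions except BORN_STATE."""
    return f["gender"] == "Female" and (10 <= f["age"] <= 20 or f["born_year"] > 1990)


# Born years are hypothetical; Input #2 uses one that keeps BORN_YEAR > 1990 false.
inputs = [
    {"gender": "Female", "age": 15, "born_year": 1998, "state": "NC"},
    {"gender": "Female", "age": 25, "born_year": 1988, "state": "NC"},
    {"gender": "Female", "age": 15, "born_year": 1998, "state": "NY"},
]
for facts in inputs:
    # Prints NotAllow, None, Allow (one per line)
    print(evaluate(facts, policy3_gate, {"NC", "CA", "PA"}, "NotAllow", supportive=True))
```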


1 Answer


You are creating a complication where there isn't one: BORN_STATE_SUPPORTIVE can be eliminated by providing a NOT IN operator.
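To illustrate the NOT IN suggestion, this Python sketch rewrites a supportive rule into two ordinary rules: the original with `IN` and its outcome, and a copy with `NOT IN` and the opposite outcome. The encoding (`param -> (op, values)`) and the helper names are assumptions, and Policy #3 is simplified by dropping its BORN_YEAR alternative.

```python
OPPOSITE = {"Allow": "NotAllow", "NotAllow": "Allow"}


def split_supportive(conditions, state_param, outcome):
    """Turn one 'supportive' rule into two plain rules: IN -> outcome, NOT IN -> opposite.

    conditions: param -> (op, values) with op in {"in", "not in"}.
    """
    op, states = conditions[state_param]
    if op != "in":
        raise ValueError("the state condition of a supportive rule must use IN")
    negated = dict(conditions)
    negated[state_param] = ("not in", states)
    return [(conditions, outcome), (negated, OPPOSITE[outcome])]


def matches(conditions, facts):
    for param, (op, values) in conditions.items():
        inside = facts[param] in values
        if inside != (op == "in"):
            return False
    return True


# Policy #3, simplified: the BORN_YEAR alternative is dropped to keep the example short.
policy3 = {
    "gender": ("in", {"Female"}),
    "age": ("in", set(range(10, 21))),
    "born_state": ("in", {"NC", "CA", "PA"}),
}
rules = split_supportive(policy3, "born_state", "NotAllow")

for facts in ({"gender": "Female", "age": 15, "born_state": "NC"},
              {"gender": "Female", "age": 25, "born_state": "NC"},
              {"gender": "Female", "age": 15, "born_state": "NY"}):
    # Prints ['NotAllow'], [], ['Allow'] (one per line)
    print([outcome for conds, outcome in rules if matches(conds, facts)])
```

The three inputs from the question's update give NotAllow, no match, and Allow, with no special flag left in the rule model.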

As for testing: you have a 20-dimensional space with discrete coordinates, which means that the number of points is at least 2^20 (each parameter has at least two values). Discovering multiple matches, or no match, at runtime won't be a problem, so providing a resolution that the boffins accept for this situation might be the way to go. If you need to make sure that this never occurs, you'll have to validate your specs.
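Detecting the no-match and multiple-match cases at runtime only requires counting the rules that match a given input. The engine-agnostic Python sketch below encodes Policies #1 and #2 as plain predicates; `classify_matches` and this encoding are illustrative assumptions.

```python
def classify_matches(rules, facts):
    """Return ('none' | 'unique' | 'multiple', names of matching rules)."""
    hits = [name for name, predicate in rules if predicate(facts)]
    if not hits:
        return "none", hits
    return ("unique" if len(hits) == 1 else "multiple"), hits


# Policies #1 and #2 written as plain predicates (a hypothetical encoding).
rules = [
    ("Policy 1", lambda f: f["gender"] == "Male" and 30 <= f["age"] <= 40
                           and f["born_state"] in {"NC", "CA", "PA"}),
    ("Policy 2", lambda f: f["gender"] in {"Male", "Female"} and f["age"] == 40
                           and f["country"] in {"USA", "UK", "BRA", "IND"}),
]

print(classify_matches(rules, {"gender": "Male", "age": 40, "born_state": "NC", "country": "USA"}))
# ('multiple', ['Policy 1', 'Policy 2'])
print(classify_matches(rules, {"gender": "Female", "age": 25, "born_state": "NY", "country": "FRA"}))
# ('none', [])
```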

I advise against using priorities.

Don't create O(100,000) rules. A similar case didn't work even with fewer rules.

As for validation: the logic you have is based on sets of points in the 20-dimensional space S. You have two sets, the allowed points A and the forbidden points F, with A ∪ F = S and A ∩ F = ∅. Each rule defines a subset A_i ⊂ A, and you need to check that ∪A_i = A and A_i ∩ A_j = ∅ for all i ≠ j. That's not a big problem and can be implemented in a day or two (unless you have something very weird in the 17 dimensions you haven't shown).
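These checks can be sketched in Python for the special case where A and every A_i are boxes, i.e. one set of allowed values per dimension (an assumption; rules with OR across parameters would need several boxes each). Two boxes intersect only if their value sets overlap in every dimension, which makes the disjointness test cheap; the coverage test enumerates points and is only meant for toy domains. All names and data are hypothetical.

```python
from itertools import combinations, product

# A "box" is a tuple with one set of allowed values per dimension.


def contains(box, point):
    return all(point[d] in box[d] for d in range(len(point)))


def boxes_intersect(a, b):
    """Two boxes intersect iff their value sets overlap in every dimension."""
    return all(a[d] & b[d] for d in range(len(a)))


def overlapping_pairs(boxes):
    """Index pairs (i, j) violating 'A_i and A_j are disjoint'."""
    return [(i, j) for i, j in combinations(range(len(boxes)), 2)
            if boxes_intersect(boxes[i], boxes[j])]


def strays_outside(boxes, allowed):
    """Indices i whose A_i is not contained in A (so it touches F)."""
    return [i for i, b in enumerate(boxes)
            if not all(b[d] <= allowed[d] for d in range(len(b)))]


def uncovered_points(boxes, allowed):
    """Points of A covered by no A_i; brute-force enumeration, small domains only."""
    return [p for p in product(*allowed) if not any(contains(b, p) for b in boxes)]


# Two dimensions (gender, age) with a toy allowed region A.
A = ({"F", "M"}, {1, 2, 3})
rules = [({"F"}, {1, 2, 3}), ({"M"}, {1, 2}), ({"M"}, {2})]
print(overlapping_pairs(rules))    # [(1, 2)]
print(strays_outside(rules, A))    # []
print(uncovered_points(rules, A))  # [('M', 3)]
```

For 20 realistic dimensions, enumerating points is impractical; a coverage check would instead compute A minus the union of the A_i using box differences and test whether anything remains.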

Update: There is one set of states for 15-year-old girls: {AL:NY, ND:WY} (ranges over state names in alphabetical order, i.e. every state except North Carolina). So you'll have one parameter set ({15}, {AL:NY, ND:WY}, ...), and probably others such as ({16:20}, {AL:WY}, ...) and ({21:99}, {AL:WY}, ...).

If the boffins say that all women (aged 15-99) from all states are eligible except 15-year-olds from North Carolina, you'll have to compute the set difference

({15:99},{AL:WY}) - ({15},{NC})

as a union of mutually disjoint sets (for which there is one optimum solution and many others besides).

I have only a layman's understanding of insurance policies, but I'm quite certain that a good strategy for handling these set-algebraic operations will be rather self-evident. After all, the "disallows" aren't the result of some random operation.
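One straightforward way to express a box difference as a union of disjoint boxes is to split along one dimension at a time. This Python sketch does that; it is a per-dimension split, not necessarily the optimum mentioned above, and its result depends on the order of the dimensions. The four-state list is a hypothetical stand-in for all 50 states, and `subtract` is an illustrative name.

```python
def subtract(a, b):
    """Return box A minus box B as a list of mutually disjoint boxes.

    Piece d keeps dimensions < d inside B and takes dimension d outside B,
    so pieces never overlap and together they cover exactly A - B.
    """
    pieces = []
    current = list(a)
    for d in range(len(a)):
        outside = current[d] - b[d]
        if outside:
            piece = list(current)
            piece[d] = outside
            pieces.append(tuple(piece))
        current[d] = current[d] & b[d]
        if not current[d]:
            break  # nothing of A is left inside B along this dimension
    return pieces


STATES = {"CA", "NC", "NY", "PA"}  # hypothetical, trimmed list of states
women = (set(range(15, 100)), STATES)  # ({15:99}, all states)
nc_15 = ({15}, {"NC"})                 # ({15}, {NC})

for ages, states in subtract(women, nc_15):
    print(min(ages), max(ages), sorted(states))
# 16 99 ['CA', 'NC', 'NY', 'PA']
# 15 15 ['CA', 'NY', 'PA']
```

The result matches the shape of the parameter sets in the update: every state for ages 16-99, and every state except NC for age 15.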