I am working on association rule mining using the arules package (Apriori algorithm). I want to classify each rule as either a main rule or a sub-rule.

library(arules)
data("Adult")
rules <- apriori(Adult, parameter = list(supp = 0.5, conf = 0.9, target = "rules"))
Rules_2 = as(rules, "data.frame")
nrow(Rules_2)
Rules_2 = Rules_2[order(-Rules_2$lift), ]

# Remove braces and double quotes from the rule text
Rules_2$rules = gsub("\\{", "", Rules_2$rules)
Rules_2$rules = gsub("\\}", "", Rules_2$rules)
Rules_2$rules = gsub("\"", "", Rules_2$rules)

#Split the rule
library(splitstackshape)
Rules_3=cSplit(Rules_2, "rules","=>")
names(Rules_3)[names(Rules_3) == 'rules_1'] <- 'LHS'
Rules_4=cSplit(Rules_3, "LHS",",")
Rules_5=subset(Rules_4, select= -c(rules_2))
names(Rules_5)[names(Rules_5) == 'rules_3'] <- 'RHS'

I want to add one more column at the right of the "Rules_5" table that labels each rule as "Main" or "Sub-Rule".

To decide whether a rule is a main rule or a sub-rule, we compare the rules with each other: if all the items in one rule (rule A) are contained in another rule (rule B), then A is a sub-rule of B.

The desired output for the first two rows:

support confidence  lift    RHS LHS_1   LHS_2   LHS_3   Classification
0.541542115 0.905108989 1.058554027 race=White  sex=Male    native-country=United-States    NA  Sub Rule
0.511363171 0.903258472 1.056389787 race=White  sex=Male    capital-loss=None   native-country=United-States    Main

The first rule (row) of the data frame "Rules_5" is a sub-rule because its items, "sex=Male, native-country=United-States", all appear in the second rule (row). The second rule is not contained in any other rule, so it is tagged as a main rule.
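
As a minimal illustration of that containment test, here is a base R sketch on toy data. The two item sets below are typed in by hand from the two example rows (LHS and RHS items combined); `rule_items`, `is_sub` and `classification` are hypothetical names, and this is not run against the actual "Rules_5" table.

```r
# Toy item sets copied by hand from the two example rows (LHS + RHS items).
rule_items <- list(
  r1 = c("sex=Male", "native-country=United-States", "race=White"),
  r2 = c("sex=Male", "capital-loss=None",
         "native-country=United-States", "race=White")
)

# A rule is a sub-rule if its items form a proper subset of another rule's items.
is_sub <- vapply(seq_along(rule_items), function(i) {
  any(vapply(seq_along(rule_items), function(j) {
    i != j &&
      length(rule_items[[i]]) < length(rule_items[[j]]) &&
      all(rule_items[[i]] %in% rule_items[[j]])
  }, logical(1)))
}, logical(1))

classification <- ifelse(is_sub, "Sub Rule", "Main")
result <- data.frame(rule = names(rule_items), Classification = classification)
print(result)
```

This pairwise comparison is quadratic in the number of rules, so it is only meant to pin down the definition, not to scale to a large rule set.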

1 Answer

This is probably too late. But for other people looking for similar things, I would suggest looking into closed and maximal rules.

As for Riya's question,

rules <- apriori(Adult, parameter = list(supp = 0.5, conf = 0.9, target = "rules")) # all rules

rules.maximal <- apriori(Adult, parameter = list(supp = 0.5, conf = 0.9, target = "maximal")) # main rules

Or, you can use the is.maximal function to find the set of maximal rules. These would be the Main Rules, and the other rules would be Sub-Rules.
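
A rough sketch of the is.maximal route, assuming the arules package is installed and that its notion of a maximal rule (based on the rule's full item set, LHS plus RHS) matches what you mean by a main rule. The `main` and `labelled` names are just for illustration, and the "Main" / "Sub Rule" labels are taken from the desired output in the question.

```r
library(arules)
data("Adult")

# Mine all rules with the same thresholds as in the question.
rules <- apriori(Adult, parameter = list(supp = 0.5, conf = 0.9, target = "rules"))

# TRUE for rules that arules considers maximal within this rule set.
main <- is.maximal(rules)

# Convert to a data frame as in the question and add the label column.
labelled <- as(rules, "data.frame")
labelled$Classification <- ifelse(main, "Main", "Sub Rule")

head(labelled[order(-labelled$lift), ])
```

Because the label is computed on the rules object before any string splitting, the Classification column can be attached first and the LHS/RHS columns split afterwards.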