I am working on association rule mining using the arules package (apriori algorithm). I want to classify each rule as either a main rule or a sub rule.
library(arules)
data("Adult")
rules <- apriori(Adult, parameter = list(supp = 0.5, conf = 0.9, target = "rules"))
Rules_2 = as(rules, "data.frame")
nrow(Rules_2)
Rules_2 = Rules_2[order(-Rules_2$lift), ]
# Remove braces and double quotes from the rule text
Rules_2$rules=gsub("\\{", "", Rules_2$rules)
Rules_2$rules=gsub("\\}", "", Rules_2$rules)
Rules_2$rules=gsub("\"", "", Rules_2$rules)
#Split the rule
library(splitstackshape)
Rules_3=cSplit(Rules_2, "rules","=>")
names(Rules_3)[names(Rules_3) == 'rules_1'] <- 'LHS'
Rules_4=cSplit(Rules_3, "LHS",",")
Rules_5=subset(Rules_4, select= -c(rules_2))
names(Rules_5)[names(Rules_5) == 'rules_3'] <- 'RHS'
I want to add one more column to the right of the "Rules_5" table that labels each rule as "Main" or "Sub Rule".
To decide whether a rule is a main rule or a sub rule, the rules have to be compared with each other: if all the items in one rule (rule A) are contained in another rule (rule B), then A is a sub rule of B.
The desired output for the first 2 rows:
support      confidence   lift         RHS         LHS_1     LHS_2                         LHS_3                         Classification
0.541542115  0.905108989  1.058554027  race=White  sex=Male  native-country=United-States  NA                            Sub Rule
0.511363171  0.903258472  1.056389787  race=White  sex=Male  capital-loss=None             native-country=United-States  Main
The first rule (row) of the data frame "Rules_5" is a sub rule because all of its items ("sex=Male, native-country=United-States") also appear in the second rule (row). The second rule is not contained in any other rule, so it is tagged as the "Main" rule.
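To make the rule above concrete, here is a minimal base R sketch of the subset test. It rests on a few assumptions: `classify_rules` is a hypothetical helper name (it is not part of arules), each rule is passed as a character vector of all its items (LHS and RHS together, without duplicates), and "contained in" is taken to mean a proper subset, so two identical rules would both be labelled "Main".

```r
# Hypothetical helper (not part of arules): label each rule "Main" or "Sub Rule".
# items: a list holding one character vector of items per rule
#        (assumption: LHS and RHS items together, no duplicates).
classify_rules <- function(items) {
  n <- length(items)
  vapply(seq_len(n), function(i) {
    # Rule i is a sub rule if some other rule j contains all of its items
    # and has at least one more item (i.e. j is a proper superset of i).
    is_sub <- any(vapply(seq_len(n), function(j) {
      j != i &&
        length(items[[j]]) > length(items[[i]]) &&
        all(items[[i]] %in% items[[j]])
    }, logical(1)))
    if (is_sub) "Sub Rule" else "Main"
  }, character(1))
}

# Toy input mirroring the two rows of the desired output
items <- list(
  c("sex=Male", "native-country=United-States", "race=White"),
  c("sex=Male", "capital-loss=None", "native-country=United-States", "race=White")
)
labels <- classify_rules(items)
print(labels)
```

For the two example rules this yields "Sub Rule" for the first and "Main" for the second. The pairwise comparison is quadratic in the number of rules, which should be fine for a small rule set like this one. For larger sets, the `is.subset(rules, rules, proper = TRUE)` function in arules appears to compute the same pairwise relation directly on the `rules` object, though I have not verified that it matches this sketch in every case.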