
I have a large data set (a matrix of 0s and 1s) with 20 variables (each variable is an item) and about 100 rows (each row is a transaction). I use the "arules" package in R for association rule mining.

I am only interested in rules where every item on the lhs and the item on the rhs equal 1 (all the items have to be true if I want to use the data later). I don't know how to select or filter my rules to keep only the ones I need.

For example with my data:

     lhs                                 rhs          support   confidence lift
     {hautvert=1,basintermediaire=1}  => {basvert=1}  0.1190476 1.0000000  4.941176
1235 {hautlarge=1,basbleu=0}          => {basvert=1}  0.1309524 0.9166667  4.529412
1274 {hautvert=1,basblanc=0}          => {basvert=1}  0.2023810 0.8947368  4.421053
1808 {hautlarge=1,pantalon=1}         => {baslarge=1} 0.1071429 1.0000000  4.421053
1811 {hautbleu=1,hautlarge=1}         => {baslarge=1} 0.1071429 1.0000000  4.421053
1889 {basbleu=1,pantalon=1}           => {baslarge=1} 0.1071429 1.0000000  4.421053
2261 {hautintermediaire=1,pantalon=1} => {basblanc=1} 0.1428571 1.0000000  4.200000
2291 {basserre=1,pantalon=1}          => {basblanc=1} 0.1428571 1.0000000  4.200000
2294 {hautbleu=0,pantalon=1}          => {basblanc=1} 0.1428571 1.0000000  4.200000
1256 {hautvert=1,basserre=0}          => {basvert=1}  0.2023810 0.8095238  4.000000

I need to keep only rules like the first line, where both items on the lhs are equal to 1 and the rhs is also equal to 1.

Thank you very much for your help.

Welcome to SO. To make it easier for everyone, you should always provide a reproducible example. - lukeA
OK, thank you for your advice, I have edited the question. - Stan
A reproducible example is one that anyone can copy, paste and run, like the one in my answer and unlike the data that you provided. - lukeA
How did you get the values of the variables? - Liger

1 Answer


Have a look at ?arules::subset, ?`%pin%` and ?apriori (minlen in the details section):

library(arules)
data("Adult")
rules <- apriori(Adult, parameter = list(minlen = 2)) 
rules.sub <- subset(rules, subset = lhs %pin% "relationship" & rhs %pin% "sex" & lift > 1.4 & support > 0.4)
as(rules.sub, "data.frame")
#                                                                      rules   support confidence     lift
# 80                                    {relationship=Husband} => {sex=Male} 0.4036485  0.9999493 1.495851
# 550 {marital-status=Married-civ-spouse,relationship=Husband} => {sex=Male} 0.4034028  0.9999492 1.495851
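
Applied to 0/1 data like yours, the same idea might look like the sketch below. The matrix `m` is toy data made up for illustration (only the column names come from your question), and the support and confidence thresholds are arbitrary. It assumes the columns are coded as factors, so that arules labels the items "name=0" and "name=1", and it relies on `%pin%` doing partial matching on those labels.

```r
library(arules)

# Toy 0/1 matrix, invented for illustration; replace with your own data
set.seed(1)
m <- matrix(rbinom(100 * 4, 1, 0.5), ncol = 4,
            dimnames = list(NULL, c("hautvert", "basvert", "pantalon", "basblanc")))

# Coding each column as a factor yields items labelled "name=0" and "name=1"
df <- as.data.frame(lapply(as.data.frame(m), factor))
trans <- as(df, "transactions")

# Arbitrary thresholds, chosen only so that the toy data produces some rules
rules <- apriori(trans, parameter = list(support = 0.05, confidence = 0.1, minlen = 2))

# Keep rules with no "=0" item on the lhs and a "=1" item on the rhs
rules.ones <- subset(rules, subset = !(lhs %pin% "=0") & rhs %pin% "=1")
inspect(rules.ones)
```

If the "=0" items never matter to you, another option may be to coerce the logical matrix `m == 1` to transactions instead, so that only present items exist and no filtering is needed afterwards.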