
I have a data frame df containing 3 numerical variables, 1 binary outcome, and 1 categorical variable.

I need to filter df by each level of the categorical variable (A or B) and pass each subset to a function such as binnedplot, to check for interactions between the categorical variable and the numerical variables.

sample df:

library(dplyr)  # for %>% and filter()
library(arm)    # for binnedplot()

set.seed(10)

df=data.frame(num1=sample(100,60), 
              num2=sample(100,60), 
              num3=sample(100,60),
              category=as.factor(rep(c("A","B"),30)),
              outcome=sample(c(0,1),60, replace=TRUE))

df1=df%>%filter(category=="A")
df2=df%>%filter(category=="B")

binnedplot(df1$num1, df1$outcome)
binnedplot(df2$num1, df2$outcome)

binnedplot(df1$num2, df1$outcome)
binnedplot(df2$num2, df2$outcome)

binnedplot(df1$num3, df1$outcome)
binnedplot(df2$num3, df2$outcome)

Update:

split.dfs<-split(df, df$category)
par(mar=c(1,1,1,1))
par(mfcol=c(2,1))
lapply(split.dfs, function(x) lapply(df[1:3], function(x) binnedplot(x, df$outcome, main=df$category)))

Initially I wondered how to do this with a function in a more scalable way, so that I can handle more numerical and categorical columns without too much repetition.

Now, with the updated code (which still has a bug), my main issue is how to label the three 2x1 panels with the correct category header, and how to label the x axis with num1/num2/num3 so the plots are clear.

You can split the 'df' by 'category'. - akrun
Thanks, I've updated my attempted code. The only issue left now is how to title the charts so I know which is which. - santoku

1 Answer


You can use a combination of by and lapply:

library(arm)

by(df, df$category,
   function(x) lapply(subset(x, select = -c(category, outcome)),
                      binnedplot, x$outcome))
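
To also get the titles and x-axis labels asked about in the question, one option is to loop over the column names and the category levels explicitly, so both are available when each plot is drawn. This is only a sketch, not part of the answer above: it assumes the sample df from the question, relies on the xlab and main arguments of binnedplot, and the names num_vars and split_dfs are made up for illustration.

```r
library(arm)  # provides binnedplot()

# Sample data from the question
set.seed(10)
df <- data.frame(num1 = sample(100, 60),
                 num2 = sample(100, 60),
                 num3 = sample(100, 60),
                 category = as.factor(rep(c("A", "B"), 30)),
                 outcome = sample(c(0, 1), 60, replace = TRUE))

num_vars  <- c("num1", "num2", "num3")  # numeric columns to plot (illustrative name)
split_dfs <- split(df, df$category)     # one data frame per category level

# 2 rows x 1 column per page: one page per numeric variable,
# with category A on top and category B below
op <- par(mfcol = c(2, 1), mar = c(4, 4, 2, 1))
for (v in num_vars) {
  for (lvl in names(split_dfs)) {
    d <- split_dfs[[lvl]]
    binnedplot(d[[v]], d$outcome,
               xlab = v,                         # label x axis with the column name
               main = paste("category =", lvl))  # title with the category level
  }
}
par(op)  # restore the previous graphics settings
```

The margins are wider here than the par(mar=c(1,1,1,1)) used in the question, because margins of 1 leave no room for axis labels. To handle more columns, extend num_vars; for another categorical column, split on that column instead.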