
I have a data frame with 526 observations of 83 variables. These observations come from two independent sources and all of the data are non-normally distributed. I would therefore like to do Wilcoxon rank-sum tests for all of the 83 variables to compare the two sources. I cannot, however, figure out how to craft a function to step through each of the variables. My first thought was to use a for loop to step through the variables, calling each column by its number.

I have no trouble with this if I use a number to designate the column. However, if I use a variable instead, I get an error message. For instance, the following works just fine to perform the Wilcoxon test on the 11th column of my data frame:

model<-wilcox.test(pool[ , 11] ~ group, data = pool, paired = FALSE)
model

However, the following returns an error:

i <- 11
model<-wilcox.test(pool[ , i] ~ group, data = pool, paired = FALSE)
model

The error I get is:

Error in model.frame.default(formula = pool[, i] ~ Church, data = pool) : invalid type (list) for variable 'pool[, i]'

Obviously I'm missing some key concept, but I'm at a loss for what it might be at this point.

akrun: Could you show a small example dataset?
Koundy: Try unlist(pool[ , i]).
akrun: I couldn't reproduce the error using R 3.1.2 on a normal data.frame. From the error, it seems that your columns are lists.
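To illustrate akrun's point, here is a minimal sketch on a hypothetical four-row data frame df (made up for illustration, not the asker's pool). It assumes the cause is that the left-hand side of the formula evaluates to a data frame, which is a list, rather than a vector; the question does not show how pool was built, so this is one plausible explanation, not a confirmed diagnosis. Single-bracket indexing with drop = FALSE returns a one-column data frame, while [[i]] extracts the column itself.

```r
# Hypothetical stand-in data: NOT the asker's `pool`
df <- data.frame(x = c(1.2, 3.4, 2.2, 5.1),
                 group = c("a", "a", "b", "b"))
i <- 1

one_col_df <- df[, i, drop = FALSE]  # single brackets + drop = FALSE: a one-column data.frame
col_vec    <- df[[i]]                # double brackets: the column itself, a numeric vector

is.list(one_col_df)  # TRUE, because a data.frame is a list
is.list(col_vec)     # FALSE

# A list on the left-hand side is rejected by model.frame(); capture the message
failed <- tryCatch(wilcox.test(one_col_df ~ group, data = df),
                   error = function(e) conditionMessage(e))

# Extracting the column with [[ ]] hands model.frame() a plain vector, so this runs
res <- wilcox.test(df[[i]] ~ group, data = df)
res$p.value
```

If pool's columns are themselves list-columns, [[i]] returns that list unchanged, and unlist(), as Koundy suggests, would be needed instead.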

1 Answer


You can construct the formula from each column's name, instead of indexing the data frame inside the formula:

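# Test every column except the grouping variable itself
vars <- setdiff(names(pool), "group")
modelList <- list()
for (v in vars) {
    # Build e.g. `x1` ~ group from the column name; backticks protect non-syntactic names
    fmla <- as.formula(paste0("`", v, "` ~ group"))
    modelList[[v]] <- wilcox.test(fmla, data = pool, paired = FALSE)
}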

The advantage of this approach is that each test result records the variable name (its data.name component reads "<variable> by group"), which makes it easier to tell which statistics belong to which variable.
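A minimal sketch of the same idea written with lapply(), which also gathers the statistics into one table keyed by variable name. The data frame pool_demo and its columns height and weight are hypothetical stand-ins for pool; substitute the real data. Using reformulate() with response = as.symbol(v) is one way to build the formula without pasting strings, and it also copes with non-syntactic column names.

```r
# Hypothetical stand-in for `pool`: two made-up variables and a two-level grouping column
set.seed(1)
pool_demo <- data.frame(
  height = rnorm(20),
  weight = rexp(20),
  group  = rep(c("A", "B"), each = 10)
)

vars <- setdiff(names(pool_demo), "group")

# One rank-sum test per variable; each formula is e.g. height ~ group
modelList <- lapply(vars, function(v) {
  wilcox.test(reformulate("group", response = as.symbol(v)), data = pool_demo)
})
names(modelList) <- vars  # look results up by variable, e.g. modelList[["height"]]

# Collect the statistics into one table, one row per variable
results <- data.frame(
  variable  = vars,
  W         = vapply(modelList, function(m) unname(m$statistic), numeric(1)),
  p.value   = vapply(modelList, function(m) m$p.value, numeric(1)),
  row.names = NULL
)
results
```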