0
votes

Using split() with two grouping variables leaves me with a list, which contains all my variables, including the ones I used to group with.

> s <- split(iris, list(iris$Sepal.Length, iris$Species), drop = TRUE)

$`4.3.setosa`
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
14          4.3           3          1.1         0.1  setosa

$`4.4.setosa`
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
9           4.4         2.9          1.4         0.2  setosa
39          4.4         3.0          1.3         0.2  setosa
43          4.4         3.2          1.3         0.2  setosa
...

Now I want to calculate the rowMeans() of iris$Sepal.Width and iris$Petal.Width. Since iris$Species is a factor, I can't simply

> sapply(s, rowMeans)

Error in FUN(X[[i]], ...) : 'x' must be numeric

I could subset and calculate the means for the variables I'm interested in, but then I lose my grouping variables (they are still in the resulting row names, but not in a directly usable format: "4.3.setosa" etc.)

> s <- lapply(s, subset, select = c("Sepal.Width", "Petal.Width"))
> t(sapply(s, colMeans))
               Sepal.Width Petal.Width
4.3.setosa        3.000000   0.1000000
4.4.setosa        3.033333   0.2000000
...

I've been stuck on this for 2 days now and can't think of an elegant solution. I know I could split the row names afterwards (https://stackoverflow.com/a/43431847/9015909), but that would break in case any variables come along that have a dot in their name. I think writing a for loop that uses c() to bind every single colMeans() result to s[[i]][1, c("Sepal.Length", "Species")] and then combines them into a data frame could work, but I feel like there is a more elegant solution I'm just not seeing. Thanks in advance for any advice.
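The idea described above can be sketched without an explicit for loop: summarise each piece of the split into a one-row data frame that carries its own grouping values, then stack the rows. This is only a minimal sketch; the names pieces, summarise_piece, and result are made up for illustration.

```r
# Split iris by both grouping variables, dropping empty combinations.
pieces <- split(iris, list(iris$Sepal.Length, iris$Species), drop = TRUE)

# Hypothetical helper: one row per piece, keeping the grouping values
# as real columns instead of relying on the list names.
summarise_piece <- function(d) {
  data.frame(
    Sepal.Length = d$Sepal.Length[1],   # constant within a piece
    Species      = d$Species[1],        # constant within a piece
    Sepal.Width  = mean(d$Sepal.Width),
    Petal.Width  = mean(d$Petal.Width)
  )
}

# Stack the one-row data frames and discard the "4.3.setosa"-style row names.
result <- do.call(rbind, lapply(pieces, summarise_piece))
rownames(result) <- NULL

head(result)
```

Because the grouping values are read from the data itself, nothing has to be parsed back out of names such as "4.3.setosa".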

1
I don't understand why you need a grouping variable if you are going for the rowMeans: lapply(split(iris, list(iris$Sepal.Length, iris$Species), drop = TRUE), function(x) data.frame(Species = x$Species, Mean = rowMeans(x[1:4]))) (akrun)

1 Answer

0
votes

You can use the aggregate() function, which keeps the grouping variable as a column in the result.

 aggregate(.~Species,iris,mean)
      Species Sepal.Length Sepal.Width Petal.Length Petal.Width
 1     setosa        5.006       3.428        1.462       0.246
 2 versicolor        5.936       2.770        4.260       1.326
 3  virginica        6.588       2.974        5.552       2.026
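
The same formula interface can probably be extended to the two grouping variables from the question, restricted to the two columns of interest. A minimal sketch, assuming the goal is the per-group mean of Sepal.Width and Petal.Width (the name means_by_group is made up here):

```r
# cbind() on the left-hand side selects the columns to average;
# the right-hand side lists the grouping variables.
means_by_group <- aggregate(
  cbind(Sepal.Width, Petal.Width) ~ Sepal.Length + Species,
  data = iris,
  FUN  = mean
)

head(means_by_group)
```

Sepal.Length and Species come back as ordinary columns, so there is no need to recover them from row names.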