I have an extensive block of code that I've written using dplyr syntax in R. However, I am trying to put that code in a loop, so that I can ultimately create multiple output files as opposed to just one. Unfortunately, I appear unable to do so.
For illustration purposes regarding my problem, let's refer to the commonly used "iris" dataset in R:
> data("iris")
> str(iris)
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num
$ Sepal.Width : num
$ Petal.Length: num
$ Petal.Width : num
$ Species : Factor w/ 3 levels "setosa","versicolor","virginica"
Let's say that I want to save the average Petal.Length of the species "versicolor". The dplyr code could look like the following:
MeanLength2 <- iris %>% filter(Species=="versicolor")
%>% summarize(mean(Petal.Length)) %>% print()
Which would give the following value:
mean(Petal.Length)
1 4.26
Lets attempt to create a loop to get the average petal length for all of the species.
From what little I know of loops, I would want to do something like this:
for (i in unique(iris$Species))
{
iris %>% filter(iris$Species==unique(iris$Species)[i]) %>%
summarize(mean(iris$Petal.Length)) %>% print()
print(i)
}
For some reason, I had to specify the data frame and the column inside the loop, which is generally not the case while using the piping functionality of dplyr. I'm assuming that this is indicative of the problem.
Anyways, the above code gives the following output:
mean(iris$Petal.Length)
1 3.758
[1] "setosa"
mean(iris$Petal.Length)
1 3.758
[1] "versicolor"
mean(iris$Petal.Length)
1 3.758
[1] "virginica"
So the code is outputting 3.758 three times, which is the average petal length across all species in the dataset. This indicates that the "filter" code did not work as expected. From what I can tell, it appears that the loop itself functioned as intended, as all three unique species names were printed in the eventual output.
How can one go about doing something like this with the use of for loops? I understand that this particular exercise does not require the use of fancy loops as one can easily get the average petal length of all the species by using, for example, the "group_by" function in dplyr, but I am looking to output close to a 100 unique table and PDF files with the dataset that I am working with and knowing how to use for loops would really help for that purpose.
group_by
and thensplit()
the result into a list with an element for each piece that you want. – joraniris %>% filter(Species == i) %>% summarize(mean(Petal.Length)) %>% print()
. That will make it produce different numbers for each species. – TJ Mahr