Here is an approach where we make a narrow format tidy data set to start, and then use pivot_wider()
to get the result with Length
and Width
columns.
library(tidyr)
id <- 1:nrow(iris)
data <- cbind(id,iris)
data %>% gather(.,key = "part.measurement",value = "value",-id,-Species) %>%
separate(.,part.measurement,c("part","measurement")) -> narrow_data
head(narrow_data[2:5])
> head(narrow_data[2:5])
Species part measurement value
1 setosa Sepal Length 5.1
2 setosa Sepal Length 4.9
3 setosa Sepal Length 4.7
4 setosa Sepal Length 4.6
5 setosa Sepal Length 5.0
6 setosa Sepal Length 5.4
At this point we can use pivot_wider()
to create the Length
and Width
columns. We'll add an arrange()
so the sort order matches the image posted with the question.
narrow_data %>% pivot_wider(.,names_from = measurement,values_from = value) %>%
arrange(Species,part)-> wide_data
head(wide_data[2:5])
...and the output:
> head(wide_data[2:5])
Species part Length Width
<fct> <chr> <dbl> <dbl>
1 setosa Petal 1.4 0.2
2 setosa Petal 1.4 0.2
3 setosa Petal 1.3 0.2
4 setosa Petal 1.5 0.2
5 setosa Petal 1.4 0.2
6 setosa Petal 1.7 0.4
>
The second output is tricky because it essentially merges the 200 observations of part and measurement for each species of flower into an output tibble of 200 rows, one for each combination of part
and length
for each of the 50 observations of each Species
.
speciesId <- c(1:200,1:200,1:200) # unique obs within species
narrow_species_data <- cbind(speciesId,narrow_data[order(narrow_data[,1],narrow_data[,3],narrow_data[,4]),c(2:5)])
narrow_species_data %>% pivot_wider(.,names_from= Species,values_from = value) %>%
arrange(part,measurement,speciesId) -> wide_data_species
head(wide_data_species[2:6])
...and the output:
> head(wide_data_species[2:6])
part measurement setosa versicolor virginica
<chr> <chr> <dbl> <dbl> <dbl>
1 Petal Length 1.4 4.7 6
2 Petal Length 1.4 4.5 5.1
3 Petal Length 1.3 4.9 5.9
4 Petal Length 1.5 4 5.6
5 Petal Length 1.4 4.6 5.8
6 Petal Length 1.7 4.5 6.6
>
A "completely tidyverse" version
Here is a version of both parts of the question that solely uses features from the tidyverse
family of packages.
For the first question, we use mutate()
and seq_along()
to create unique sequential numbers to identify each observation in the original data. We create a narrow form tidy data set with gather()
, and then convert it into the desired output with pivot_wider()
. To match the order of observations from the image in the original question, we arrange(Species,part)
.
library(tidyr)
library(dplyr)
# add an ID variable so we can pivot_wider and match measurement for correct observations
iris %>% mutate(id = seq_along(Species)) %>% gather(.,key = "part.measurement",value = "value",-id,-Species) %>%
separate(.,part.measurement,c("part","measurement")) -> narrow_data
narrow_data %>% pivot_wider(.,names_from = measurement,values_from = value) %>%
arrange(Species,part) -> wide_data
head(wide_data[2:5])
...and the output:
> head(wide_data[2:5])
id part Length Width
<int> <chr> <dbl> <dbl>
1 1 Petal 1.4 0.2
2 2 Petal 1.4 0.2
3 3 Petal 1.3 0.2
4 4 Petal 1.5 0.2
5 5 Petal 1.4 0.2
6 6 Petal 1.7 0.4
>
For the second question, instead of building a vector of sequential IDs for each species and using cbind()
it to the rest of the data, we can use dplyr
functions to create the sequences within a pipeline.
We use arrange()
to sort the data by Species, id, part, and measurement. Then we group_by(Species)
so we can use mutate()
to create a unique sequential ID with seq_along()
. The sort order is important because we want to merge the 1st observation with the 51st observation and the 101st observation.
Then we ungroup()
to clear the group_by()
and use pivot_wider()
with id_cols = speciesId
to create the desired output.
narrow_data %>% arrange(Species,id,part,measurement) %>% group_by(Species) %>% mutate(speciesId = seq_along(Species)) %>%
ungroup(.) %>% pivot_wider(.,id_cols=c("speciesId","part","measurement"),names_from= Species,values_from = value) %>%
arrange(part,measurement,speciesId) -> wide_data_species
head(wide_data_species[2:6])
...and the output:
> head(wide_data_species[2:6])
part measurement setosa versicolor virginica
<chr> <chr> <dbl> <dbl> <dbl>
1 Petal Length 1.4 4.7 6
2 Petal Length 1.4 4.5 5.1
3 Petal Length 1.3 4.9 5.9
4 Petal Length 1.5 4 5.6
5 Petal Length 1.4 4.6 5.8
6 Petal Length 1.7 4.5 6.6
>
dplyr
functionmutate
. – Matt