0
votes

I have seen the question "Subsetting a data frame based on a logical condition on a subset of rows" and this tutorial: https://statisticsglobe.com/filter-data-frame-rows-by-logical-condition-in-r

I want to subset a data.frame according to a specific value in one of its columns (group in the example below).

data <- data.frame(x1 = c(3, 7, 1, 8, 5),                    # Create example data
                   x2 = letters[1:5],
                   group = c("ga1", "ga2", "gb1", "gc3", "gb1"))
data                                                         # Print example data
# x1 x2 group
#  3  a    ga1
#  7  b    ga2
#  1  c    gb1
#  8  d    gc3
#  5  e    gb1

I want to subset data according to group. One subset should contain the rows with an a in their group value, one the rows with a b, and one the rows with a c. Maybe something with grepl?

The result should look like this

data.a                                                       
# x1 x2 group
#  3  a    ga1
#  7  b    ga2

data.b                                                      
# x1 x2 group
#  1  c    gb1
#  5  e    gb1

data.c
# x1 x2 group
#  8  d    gc3

I would be interested in how to subset one of these output examples, or perhaps a loop would work too.

I modified the example from here https://statisticsglobe.com/filter-data-frame-rows-by-logical-condition-in-r

3
Your question is answered exactly as requested in your request. Good question. – Gray

3 Answers

1
votes

Extract the part of group that you want to split on:

sub('\\d+', '', data$group)
#[1] "ga" "ga" "gb" "gc" "gb"

and use the above in split to divide the data into groups.

new_data <- split(data, sub('\\d+', '', data$group))
new_data
#$ga
#  x1 x2 group
#1  3  a   ga1
#2  7  b   ga2

#$gb
#  x1 x2 group
#3  1  c   gb1
#5  5  e   gb1

#$gc
#  x1 x2 group
#4  8  d   gc3

It is better to keep the data in a list; however, if you want a separate data frame for each group, you can use list2env.

list2env(new_data, .GlobalEnv)
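
As a small sketch of why keeping the list is convenient (the name n_rows is only illustrative and not part of the original code), individual groups can be pulled out by name, and the same operation can be applied to every group at once:

```r
data <- data.frame(x1 = c(3, 7, 1, 8, 5),
                   x2 = letters[1:5],
                   group = c("ga1", "ga2", "gb1", "gc3", "gb1"))

# Split on the group label with the digits removed
new_data <- split(data, sub('\\d+', '', data$group))

# Access a single group by its name in the list
new_data$gb

# Apply the same operation to every group, here counting rows
n_rows <- sapply(new_data, nrow)
n_rows
```

With the list, nothing is written into the global environment, so existing objects cannot be overwritten by accident.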
1
votes

We can use group_split with str_remove from the tidyverse:

library(dplyr)
library(stringr)
data %>% 
    group_split(grp = str_remove(group, "\\d+$"), .keep = FALSE)
1
votes

Good question. This solution uses inputs and outputs that closely match the request: "I want to subset data according to group. One subset should be the rows containing a in their group, one containing b in their group and one c. Maybe something with grepl?".

The code below uses the data frame that was provided (named data), and uses grep(), and subsets by group.

code:

ga <- grep("ga", data$group)   # separate the data by group type
gb <- grep("gb", data$group)
gc <- grep("gc", data$group)

ga1 <- data[ga,]                     # subset ga
gb1 <- data[gb,]                     # subset gb
gc1 <- data[gc,]                     # subset gc

print(ga1)
print(gb1)
print(gc1)
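
Since the question also mentions grepl and a loop, here is a minimal sketch of that variant. It assumes every group label starts with "g" followed by the letter of interest; the names keys and subsets are made up for this illustration.

```r
data <- data.frame(x1 = c(3, 7, 1, 8, 5),
                   x2 = letters[1:5],
                   group = c("ga1", "ga2", "gb1", "gc3", "gb1"))

# Letters to look for right after the leading "g" (assumed label format)
keys <- c("a", "b", "c")

subsets <- list()
for (k in keys) {
  # grepl() returns a logical vector: TRUE where group starts with "g" + k
  keep <- grepl(paste0("^g", k), data$group)
  subsets[[paste0("data.", k)]] <- data[keep, ]
}

subsets$data.a
subsets$data.b
subsets$data.c
```

Unlike grep(), which returns row indices, grepl() returns a logical vector, so it can also be combined with other conditions using & or |.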

Windows and Jupyter Lab were used. The output closely matches the output requested in the question.

Output shown at link: link1