1
votes

I have an example dataset that looks like this:

data <- as.data.frame(c("A","B","C","X1_theta","X2_theta","AB_theta","BC_theta","CD_theta"))
colnames(data) <- "category"
> data
  category
1        A
2        B
3        C
4 X1_theta
5 X2_theta
6 AB_theta
7 BC_theta
8 CD_theta

I am trying to create a logical variable that is TRUE when the category variable contains "theta". However, I would like the value to be FALSE when the cell value contains "X1" or "X2".

Here is what I did:

library(stringr)
data$logic <- str_detect(data$category, "theta")
> data
  category logic
1        A FALSE
2        B FALSE
3        C FALSE
4 X1_theta  TRUE
5 X2_theta  TRUE
6 AB_theta  TRUE
7 BC_theta  TRUE
8 CD_theta  TRUE

Here, every cell value that contains "theta" gets the logical value TRUE.

Then I wrote the line below to assign FALSE whenever the cell value has "X" in it.

data$logic <- ifelse(grepl("X", data$category), "FALSE", "TRUE")
> data
  category logic
1        A  TRUE
2        B  TRUE
3        C  TRUE
4 X1_theta FALSE
5 X2_theta FALSE
6 AB_theta  TRUE
7 BC_theta  TRUE
8 CD_theta  TRUE

But this, of course, overwrote the previous result.

What I would like is to combine the two conditions:

> data
  category logic
1        A FALSE
2        B FALSE
3        C FALSE
4 X1_theta FALSE
5 X2_theta FALSE
6 AB_theta  TRUE
7 BC_theta  TRUE
8 CD_theta  TRUE

Any thoughts? Thanks

2 Answers

2
votes

We can create 'logic' by detecting the substring 'theta' at the end of the string ($) while requiring that the first character (^) is not 'X' ([^X]):

library(dplyr)
library(stringr)
library(tidyr)
data %>%
    mutate(logic = str_detect(category, "^[^X].*theta$"))
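
To see what the pattern does on its own, here is a minimal sketch that applies it to the same values with base R's grepl. The names `x` and `pattern` are just for illustration and are not part of the original code.

```r
# The category values from the question, as a plain character vector
x <- c("A", "B", "C", "X1_theta", "X2_theta", "AB_theta", "BC_theta", "CD_theta")

# ^[^X]   first character is anything except X
# .*      any characters in between
# theta$  the string ends in "theta"
pattern <- "^[^X].*theta$"

matches <- grepl(pattern, x)

# Label each result with the value it came from
print(setNames(matches, x))
```

Only AB_theta, BC_theta and CD_theta come back TRUE; the plain letters fail the "theta" part and the X values fail the first-character part.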

If we need to split the column into separate columns based on the condition:

data %>%
   mutate(logic = str_detect(category, "^[^X].*theta$"),
          category = case_when(logic ~ str_replace(category, "_", ","),
           TRUE ~ as.character(category))) %>%
   separate(category, into = c("split1", "split2"), sep= ",", remove = FALSE)
#  category   split1 split2 logic
#1        A        A   <NA> FALSE
#2        B        B   <NA> FALSE
#3        C        C   <NA> FALSE
#4 X1_theta X1_theta   <NA> FALSE
#5 X2_theta X2_theta   <NA> FALSE
#6 AB,theta       AB  theta  TRUE
#7 BC,theta       BC  theta  TRUE
#8 CD,theta       CD  theta  TRUE

Or in base R:

data$logic <- with(data, grepl("^[^X].*theta$", category))

Another option is to combine two grepl conditions:

data$logic <- with(data, grepl("theta$", category) & !grepl("^X\\d+", category))
data$logic
#[1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE
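
The single pattern and the two-condition version agree on the example data, but they are not strictly equivalent. A small sketch of where they differ, using made-up values ("XY_theta" and "theta" are hypothetical and do not appear in the question):

```r
# Hypothetical extra values, chosen only to show the difference
x <- c("AB_theta", "X1_theta", "XY_theta", "theta")

# Single pattern: first character is not X, and the string ends in "theta"
single <- grepl("^[^X].*theta$", x)

# Two conditions: ends in "theta", and does not start with X followed by digits
combined <- grepl("theta$", x) & !grepl("^X\\d+", x)

print(data.frame(x, single, combined))
```

The single pattern rejects anything starting with X and also needs at least one character before "theta", so it is FALSE for both "XY_theta" and a bare "theta". The two-condition version only excludes X followed by digits, so it is TRUE for both. Which one is right depends on what other values the real data can contain.
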
0
votes

Not the cleanest in the world (since it adds 2 unnecessary cols) but it gets the job done:

data <- as.data.frame(c("A","B","C","X1_theta","X2_theta","AB_theta","BC_theta","CD_theta"))
colnames(data) <- "category"

# TRUE when the category does not contain "X"
data$logic1 <- !grepl('X', data$category)
# TRUE when the category contains "theta"
data$logic2 <- grepl('theta', data$category)
# TRUE only when both conditions hold
data$logic <- data$logic1 & data$logic2
print(data)

I think you can also remove the logic1 and logic2 cols if you want but I usually don't bother (I'm a messy coder haha).
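
If you do want to tidy up afterwards, one way (a sketch, assuming the helper columns are named logic1 and logic2 as above) is to assign NULL to them:

```r
# Rebuild the example data so this snippet runs on its own
data <- data.frame(category = c("A", "B", "C", "X1_theta", "X2_theta",
                                "AB_theta", "BC_theta", "CD_theta"))

data$logic1 <- !grepl("X", data$category)     # no "X" in the value
data$logic2 <- grepl("theta", data$category)  # contains "theta"
data$logic  <- data$logic1 & data$logic2      # both conditions hold

# Assigning NULL removes a column from a data frame
data$logic1 <- NULL
data$logic2 <- NULL

print(data)
```

After this, data only has the category and logic columns.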

Hope this helped!

EDIT: akrun's grepl solution does what I'm doing way more cleanly (as in, it doesn't require the extra cols). I definitely recommend that approach!