1
votes

I have a data frame, e.g.:

df <- data.frame(
        type = c("BND", "INV", "BND", "DEL", "TRA"),
        chrom1 = c(1, 1, 1, 1, 1),
        chrom2 = c(1, 1, 2, 1, 3)
        )

I want to reassign all df[df$type=='BND',] instances to either INV or TRA depending on the values in chrom1 and chrom2.

I am trying to use fct_recode from the forcats package like so:

library(forcats)

df$type <- ifelse(df$type=="BND", 
                  ifelse(df$chrom1 == df$chrom2,
                         fct_recode(df$type, BND="INV"),
                         fct_recode(df$type, BND="TRA")),
                  df$type)

However, this recodes my factors as numbers:

  type chrom1 chrom2
1    1      1      1
2    3      1      1
3    1      1      2
4    2      1      1
5    4      1      3

Here's my expected outcome:

  type chrom1 chrom2
1    INV      1      1 # BND -> INV as chrom1==chrom2
2    INV      1      1
3    TRA      1      2 # BND -> TRA as chrom1!=chrom2
4    DEL      1      1
5    TRA      1      3

How can I split a factor into two levels in this way?

4
Does it have to use fct_recode? And what should the output be? - DJV
Did you see this? - Sotos
@DJV - It doesn't have to use fct_recode. See the update for the expected output. - fugu

4 Answers

2
votes

You can also do it with case_when() from dplyr:

library(tidyverse)

df %>% 
  mutate(type = as.factor(case_when(
    type == 'BND' & chrom1 == chrom2 ~ 'INV', 
    type == 'BND' & chrom1 != chrom2 ~ 'TRA',
    TRUE  ~ as.character(type))))
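
A note on why the as.character() call matters, and why the attempt in the question produced numbers: ifelse() does not keep factor attributes, so when its yes/no values are factors only the underlying integer codes survive. Below is a minimal base R sketch of that behaviour, assuming type is stored as a factor (which is what fct_recode() returns). The names codes, labels and fixed are made up for this illustration.

```r
# Assumption: `type` is a factor, as it would be after fct_recode()
type   <- factor(c("BND", "INV", "BND", "DEL", "TRA"))
chrom1 <- c(1, 1, 1, 1, 1)
chrom2 <- c(1, 1, 2, 1, 3)

# ifelse() drops the factor class, so only the integer level codes remain
# (levels sort alphabetically: BND = 1, DEL = 2, INV = 3, TRA = 4)
codes <- ifelse(type == "BND", type, type)
print(codes)  # 1 3 1 2 4

# Working on the character labels avoids the problem
labels <- as.character(type)
is_bnd <- labels == "BND"
labels[is_bnd] <- ifelse(chrom1[is_bnd] == chrom2[is_bnd], "INV", "TRA")

# Convert back to a factor once the labels are final
fixed <- factor(labels)
print(fixed)
```

Also note that fct_recode() takes its arguments as new_level = "old_level", so BND = "INV" renames INV to BND rather than the other way round.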

data:

df <- data.frame(
  type = c("BND", "INV", "BND", "DEL", "TRA"),
  chrom1 = c(1, 1, 1, 1, 1),
  chrom2 = c(1, 1, 2, 1, 3)
)
1
votes

My approach is as follows: (1) index the rows you want to change, then (2) apply ifelse to just those rows. I hope this helps:

df <- data.frame(
  type = c("BND", "INV", "BND", "DEL", "TRA"),
  chrom1 = c(1, 1, 1, 1, 1),
  chrom2 = c(1, 1, 2, 1, 3)
)

# Row positions of the BND entries
indexBND <- which(df$type == "BND")
# Same chromosome -> INV, different chromosomes -> TRA
df$type[indexBND] <- ifelse(df$chrom1[indexBND] == df$chrom2[indexBND], "INV", "TRA")

df
#   type chrom1 chrom2
# 1  INV      1      1
# 2  INV      1      1
# 3  TRA      1      2
# 4  DEL      1      1
# 5  TRA      1      3
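
One caveat with the indexing approach: if type is a factor, assigning a label that is not already one of its levels gives NA with a warning. In the example above INV and TRA both exist already, so it works. Here is a hedged sketch for the case where they do not, assuming a smaller made-up data frame and stringsAsFactors = TRUE; is_bnd is just an illustrative name.

```r
# Assumption: `type` is a factor and lacks the target levels INV and TRA
df <- data.frame(
  type   = c("BND", "DEL", "BND"),
  chrom1 = c(1, 1, 1),
  chrom2 = c(1, 1, 2),
  stringsAsFactors = TRUE
)

# Register the new levels before assigning them
levels(df$type) <- union(levels(df$type), c("INV", "TRA"))

# Recode only the BND rows
is_bnd <- df$type == "BND"
df$type[is_bnd] <- ifelse(df$chrom1[is_bnd] == df$chrom2[is_bnd], "INV", "TRA")

# Drop the now-unused BND level
df$type <- droplevels(df$type)
print(df)
```

Converting the column with as.character() first and back with factor() afterwards is an equally valid route; this version just keeps the column a factor throughout.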

Cheers!

1
votes

For the sake of completeness, here is also a concise data.table solution:

library(data.table)
setDT(df)[type == "BND" & chrom1 == chrom2, type := "INV"][type == "BND", type := "TRA"][]
   type chrom1 chrom2
1:  INV      1      1
2:  INV      1      1
3:  TRA      1      2
4:  DEL      1      1
5:  TRA      1      3

The benefit is that type is updated by reference, i.e., without copying the whole object, and only in those rows where the condition applies.
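
To illustrate what "by reference" means in practice, here is a small sketch, assuming the data.table package is installed. The names dt, alias and snapshot are made up for this example. A plain assignment of a data.table does not copy it, so the alias sees the update, whereas copy() gives an independent object.

```r
library(data.table)

dt <- data.table(
  type   = c("BND", "INV", "BND", "DEL", "TRA"),
  chrom1 = c(1, 1, 1, 1, 1),
  chrom2 = c(1, 1, 2, 1, 3)
)

alias    <- dt        # same underlying object, no copy made
snapshot <- copy(dt)  # explicit, independent copy

# Update in place: first the same-chromosome BNDs, then the remaining ones
dt[type == "BND" & chrom1 == chrom2, type := "INV"]
dt[type == "BND", type := "TRA"]

print(alias$type)     # reflects the update made through dt
print(snapshot$type)  # still holds the original BND entries
```

So if you need to keep the original values around, take a copy() before using := on the table.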

0
votes

Or just

bnd <- df$type == "BND"
df$type[bnd] <- with(df[bnd, ], ifelse(chrom1 == chrom2, "INV", "TRA"))
> df
  type chrom1 chrom2
1  INV      1      1
2  INV      1      1
3  TRA      1      2
4  DEL      1      1
5  TRA      1      3