Recoding factor levels using dplyr or tidyverse

Question

I have a table that lists the risk alleles at different genomic loci, one row per risk level. Ultimately, I need to set up this table as a key to identify the prevalence of the different alleles, grouped by risk status, in a large number of samples. An example of the risk table is below:

genomic.stuff <- data.frame(c("A A", "A G", "G A", "G G"), c("T T", "C T", "T C", "C C"),
                        row.names= c("Risk Level 1", "Risk Level 2", "Risk Level 3", "Risk Level 4"),
                        stringsAsFactors = TRUE)
colnames(genomic.stuff) <- c("Gene A", "Gene B")

genomic.stuff
             Gene A Gene B
Risk Level 1    A A    T T
Risk Level 2    A G    C T
Risk Level 3    G A    T C
Risk Level 4    G G    C C

str(genomic.stuff)
'data.frame':   4 obs. of  2 variables:
 $ Gene A: Factor w/ 4 levels "A A","A G","G A",..: 1 2 3 4
 $ Gene B: Factor w/ 4 levels "C C","C T","T C",..: 4 2 3 1

So I have 2 things I would like to do with this data frame. Bear in mind I have a large mapping file with many genes, so if this can be done across the entire table in dplyr or tidyverse that would (I think?) be best.

1) I want to re-level the factors so that they are ranked according to risk status rather than automatically leveled in alphabetical order. (The data frame already exists, so I don't think I can do this when the data frame is constructed.)

2) I want to reassign the factor levels such that Risk Level 1 = 1, Risk Level 2 | 3 = 2, Risk Level 4 = 3.

Thank you all very much for your help!

A row name made sense to my peanut brain, but I can certainly make it into a column, as you did below. But I'm not sure where that gets me in terms of solving the problem. — Mark Z
Making it a column enables you to refactor based on this "Risk Level" if and only if it is numeric. However, the levels of the genes themselves don't change. — NelsonGon

NelsonGon · Accepted Answer · 2018-12-19T03:23:47

You will need to make Risk Level numeric and do the reordering as follows:

EDIT: You can choose to reclass Risk Level.

library(tidyverse)

genomic.stuff <- data.frame(c("A A", "A G", "G A", "G G"), c("T T", "C T", "T C", "C C"),
                            row.names= c("Risk Level 1", "Risk Level 2", "Risk Level 3", "Risk Level 4"),
                            stringsAsFactors = TRUE)
colnames(genomic.stuff) <- c("Gene A", "Gene B")

new_genome <- genomic.stuff %>%
  # Numeric risk score per row; Risk Levels 2 and 3 share the score 2
  mutate(RiskLevel = c(1, 2, 2, 3),
         # Order each gene's levels by risk score instead of alphabetically
         `Gene A` = fct_reorder(`Gene A`, RiskLevel),
         `Gene B` = fct_reorder(`Gene B`, RiskLevel))

# The gene levels now follow the risk order
levels(new_genome$`Gene B`)
# RiskLevel is numeric, so convert it to a factor before asking for its levels
levels(as.factor(new_genome$RiskLevel))
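
Since the question mentions a large mapping file with many genes, here is a minimal sketch of the same idea applied to every gene column at once. It is not part of the original answer. It assumes the rows are already sorted from lowest to highest risk, that all gene columns start with "Gene", and that dplyr 1.0 or later is available for across(). The names risk_map, risk_key and RiskGroup are made up for illustration.

```r
library(dplyr)
library(forcats)
library(tibble)

genomic.stuff <- data.frame(
  `Gene A` = c("A A", "A G", "G A", "G G"),
  `Gene B` = c("T T", "C T", "T C", "C C"),
  row.names = paste("Risk Level", 1:4),
  check.names = FALSE,        # keep the space in "Gene A" / "Gene B"
  stringsAsFactors = TRUE
)

# Hypothetical lookup: Risk Levels 2 and 3 collapse into group 2
risk_map <- c("Risk Level 1" = 1, "Risk Level 2" = 2,
              "Risk Level 3" = 2, "Risk Level 4" = 3)

risk_key <- genomic.stuff %>%
  # Move the row names into a regular column so they can be used in mutate()
  rownames_to_column("RiskLevel") %>%
  mutate(
    # Look up the collapsed risk group for each row
    RiskGroup = unname(risk_map[RiskLevel]),
    # Re-level every gene column by order of appearance, which equals
    # risk order under the assumption that rows are sorted by risk
    across(starts_with("Gene"), fct_inorder)
  )

levels(risk_key$`Gene B`)
# "T T" "C T" "T C" "C C"
risk_key$RiskGroup
# 1 2 2 3
```

fct_inorder() is used here instead of fct_reorder() because it needs no numeric helper column, but it only gives the right result while the rows stay in risk order. If the rows might be shuffled, reordering by a numeric column as in the answer above is the safer choice.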
