1
votes

I'm revisiting some older code in which I used a for loop to calculate a combined ranking of genes based on two columns. My end goal is a column that lists, for each gene, the proportion of genes in the dataset that it performs better than.

I have a data.frame called scores that contains two columns of relevant scores for my genes. To calculate the combined ranking I use the following for loop, and then I calculate the proportional score by dividing the resulting rank by the total number of observations.

scores <- data.frame(x = c(0.128, 0.279, 0.501, 0.755, 0.613), y = c(1.49, 1.43, 0.744, 0.647, 0.380))

#Calculate ranking
comb.score = matrix(0, nrow = nrow(scores), ncol = 1)
for(i in 1:nrow(scores)){
  comb.score[i] = length(which(scores[ , 1] < scores[i, 1] & scores[ , 2] < scores[i, 2]))
}

comb.score <- comb.score/length(comb.score) #Calculate proportion 

Now that I've become more familiar and comfortable with the tidyverse I want to convert this code to use tidyverse functions but I haven't been able to figure it out on my own, nor with SO or RStudio community answers.

The idea I had in mind was to use mutate() along with min_rank() but I'm not entirely sure of the syntax. Additionally the behavior of min_rank() appears to assess rank using a logical test like scores[ , 1] <= scores[i, 1] as opposed to just using < like I did in my original test.

My expected outcome is an additional column in the scores table that has the same output as the comb.score output in the above code: a score that tells me the proportion of genes in the whole dataset that the gene on a given row performs better than.

Any help would be much appreciated! If I need to clarify anything or add more information please let me know!

2
What does your people dataframe look like? What is your expected output? - Matt
The people data.frame was a typo. I've updated my question to be a little more specific and to state my expected output. - Jeffrey Brabec

2 Answers

3
votes

Interesting question. I propose this approach:

library(dplyr)

scores %>%
  rowwise() %>%
  mutate(comb_score = sum(x > .$x & y > .$y)) %>%
  ungroup() %>%
  mutate(comb_score = comb_score/n())

which gives

# A tibble: 5 x 3
      x     y comb_score
  <dbl> <dbl>      <dbl>
1 0.128 1.49         0  
2 0.279 1.43         0  
3 0.501 0.744        0  
4 0.755 0.647        0.2
5 0.613 0.38         0 
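
As a sanity check, here is a minimal sketch that compares the row-wise approach against the loop from the question. The names loop_score, tidy_score, all_x and all_y are only illustrative and do not come from the original code; the full columns are copied into all_x and all_y beforehand so the comparison inside mutate() does not rely on the `.` placeholder.

```r
library(dplyr)

scores <- data.frame(
  x = c(0.128, 0.279, 0.501, 0.755, 0.613),
  y = c(1.49, 1.43, 0.744, 0.647, 0.380)
)

# Reference result: the loop from the question, written with sum()
loop_score <- numeric(nrow(scores))
for (i in seq_len(nrow(scores))) {
  loop_score[i] <- sum(scores$x < scores$x[i] & scores$y < scores$y[i])
}
loop_score <- loop_score / nrow(scores)

# Row-wise dplyr version; all_x and all_y (illustrative names) hold the
# full columns, while x and y are the single values of the current row
all_x <- scores$x
all_y <- scores$y
tidy_score <- scores %>%
  rowwise() %>%
  mutate(comb_score = sum(all_x < x & all_y < y) / length(all_x)) %>%
  ungroup() %>%
  pull(comb_score)

# TRUE when both approaches agree
all.equal(loop_score, tidy_score)
```

Both versions use a strict `<` on both columns, so a gene never counts as performing better than itself.
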
3
votes

A bit similar to Martin's answer, but using pmap_dbl() instead.

library(tidyverse)

scores <- data.frame(
    x = c(0.128, 0.279, 0.501, 0.755, 0.613), 
    y = c(1.49, 1.43, 0.744, 0.647, 0.380)
)

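scores %>% 
  mutate(
    # pmap_dbl() returns a numeric vector; plain pmap() returns a list,
    # which cannot be divided by n()
    score = pmap_dbl(list(x, y), ~ sum(..1 > x & ..2 > y)) / n()
  )
#>       x     y score
#> 1 0.128 1.490   0.0
#> 2 0.279 1.430   0.0
#> 3 0.501 0.744   0.0
#> 4 0.755 0.647   0.2
#> 5 0.613 0.380   0.0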

Created on 2020-06-18 by the reprex package (v0.3.0)
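
On the min_rank() point raised in the question: as far as I can tell, for a single column min_rank(x) - 1 equals the number of values strictly less than each element, so it already behaves like the `<` test. The sketch below checks that on the example data. The helper strictly_below() and the vector joint_below are hypothetical names introduced only for this illustration. The combined score needs both conditions to hold for the same row, which the two separate ranks cannot express, so the pairwise comparison still seems to be needed.

```r
library(dplyr)

x <- c(0.128, 0.279, 0.501, 0.755, 0.613)
y <- c(1.49, 1.43, 0.744, 0.647, 0.380)

# Hypothetical helper: count the values strictly below each element
strictly_below <- function(v) {
  vapply(v, function(vi) sum(v < vi), numeric(1))
}

# For one column, min_rank() - 1 matches the strict "<" count
identical(as.numeric(min_rank(x) - 1L), strictly_below(x))

# The joint condition (x and y both smaller on the same row) is evaluated
# pair by pair, because it depends on how the two columns line up
joint_below <- vapply(
  seq_along(x),
  function(i) sum(x < x[i] & y < y[i]),
  numeric(1)
)
joint_below / length(x)
```
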