0
votes

I'm sorry for the basic question. I'm just struggling with something that should be simple. Say I have the the data frame "Test" that originally has three fields: Col1, Col2, Col3.

I want to create new columns based on each of the original columns. The values in each row of the new columns would specify whether the corresponding value in the matching row on the original column is above or below the initial column's median. So, for example, in the image attached, Col4 is based on Col1. Col5 is based on Col2. Col6 based on Col3.

test dataframe example:

test dataframe example

It's quite easy to perform this function on a single column and output a single column:

Test <- Test %>% mutate(Col4 = derivedFactor(

"below"= Col1 > median(Test$Col1),

"at"= Col1 == median(Test$Col1),

"above"= Col1 < median(Test$Col1)

.default = NA)

)

But if I'm performing this same operation over 50 columns, writing out/copy-paste and editing the code can be tedious and inefficient. I should mention that I am hoping to add the new columns to the data frame, not create another data frame. Additionally, there are about 200 other fields in the data frame that will not have this function performed on them (so I can't just use a mutate_all). And the columns are not uniformly named (my examples above are just examples, not the actual dataset) so I'm not able to find a pattern for mutate_at. Maybe there is a way to manually pass a list of column names to the mutate command?

There must be an easy and elegant way to do this. If anyone could help, that would be amazing.

1
Please do not post your data as a pictur, but in a easy reproducible format. Also, what does the function derivedFactor do? See also How to make a great R reproducible example Nevertheless: Have a look at mutate_allkath

1 Answers

0
votes

You can do the following using data.table.

Firstly, I define a function which is applied onto a numeric vector, whereby it outputs the elements' corresponding position in relation to the vector's median:

med_fn = function(x){

  med = median(x)
  unlist(sapply(x, function(x){
    if(x > med) {'Above'} 
    else if(x < med) {'Below'}
    else {'At'}
  }))

}

> med_fn(c(1,2,3))
[1] "Below" "At"    "Above"

Let us examine some sample data:

dt = data.table(
  C1 = c(1, 2, 3),
  C2 = c(2, 1, 3),
  C3 = c(3, 2, 1)
)

old = c('C1', 'C2', 'C3') # Name of columns I want to perform operation on
new = paste0(old, '_medfn') # Name of new columns following operation

Using the .SD and .SDcols arguments from data.table, I apply med_fn across the columns old, in my case columns C1, C2 and C3. I call the new columns C#_medfn:

dt[, (new) := lapply(.SD, med_fn), .SDcols = old]

Result:

> dt
   C1 C2 C3 C1_medfn C2_medfn C3_medfn
1:  1  2  3    Below       At    Above
2:  2  1  2       At    Below       At
3:  3  3  1    Above    Above    Below