I'm trying to convert a dataframe composed of both categorical and numeric columns into a dataframe where each categorical value is replaced by its relative frequency within the column. The solution needs to generalize to other datasets, so I can't hard-code the column names of the dataframe I'm practicing with.

As a toy example, please consider this dataframe:

df<-data.frame(fruit=c('apple','apple','pear','orange','apple','pear'),
           price=c(47,92,87,14,21,19),
           town=c('home','far','close','close','close','far'))

As a goal dataframe, I'm hoping to have the result:

goal<-data.frame(fruit=c(.50,.50,.33,.17,.50,.33),
                 price=c(.01,1.29,1.15,-0.93,-0.73,-0.79),
                 town=c(.17,.33,.50,.50,.50,.33))

In the goal data frame, I'm hoping to have numeric columns scaled, and columns that have categorical values transformed into the relative frequency of the value within the column. For example, "apple" appears for three of the six records in the dataframe, and thus .50 reflects the 3/6 within the column.

I am able to convert the price variable, and all numeric columns in my dataframe, to z-score using:

library(dplyr)

newdf <- df %>%
         mutate_if(is.numeric, scale)

This accomplishes my goal for numeric columns: the scaled value is more useful to me than a count of how often each value appears, since most of the datasets I'll use this on have many decimal places and no exact repeats.

I tried adapting the code from the answer to "dplyr: apply function table() to each column of a data.frame", but I couldn't get it to work. How can I reach my desired result?

Thank you in advance!

1 Answer

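Here's an alternative using lookup() from qdapTools:

library(dplyr)     # for `%>%` and `mutate_if`
library(qdapTools) # for `lookup` function

# `is.factor` only matches factor columns. Since R 4.0.0, data.frame() keeps
# strings as character by default, so build `df` with stringsAsFactors = TRUE.
df %>%
  mutate_if(is.numeric, scale) %>%   # z-score every numeric column
  mutate_if(is.factor, function(x) lookup(x, as.data.frame(prop.table(table(x))))) %>%  # value -> relative frequency
  round(2)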
#   fruit price town
# 1  0.50  0.01 0.17
# 2  0.50  1.29 0.33
# 3  0.33  1.15 0.50
# 4  0.17 -0.93 0.50
# 5  0.50 -0.73 0.50
# 6  0.33 -0.79 0.33
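
If you'd rather not depend on qdapTools, the same idea can be sketched in base R. This is only a minimal sketch, not the method used above: rel_freq, is_num and out are hypothetical names introduced for illustration, and it assumes every non-numeric column should be treated as categorical (character or factor).

```r
# Toy data from the question.
df <- data.frame(fruit = c("apple", "apple", "pear", "orange", "apple", "pear"),
                 price = c(47, 92, 87, 14, 21, 19),
                 town  = c("home", "far", "close", "close", "close", "far"))

# Hypothetical helper: replace each value by its relative frequency in x.
# table() counts each distinct value, prop.table() turns counts into
# proportions, and indexing by name maps the proportions back to each row.
rel_freq <- function(x) {
  x <- as.character(x)
  as.numeric(prop.table(table(x))[x])
}

# Pick columns by type rather than by name, so this generalizes to other data.
is_num <- vapply(df, is.numeric, logical(1))

out <- df
# scale() returns a one-column matrix; as.numeric() flattens it to a vector.
out[is_num]  <- lapply(df[is_num], function(x) as.numeric(scale(x)))
# Assumption: everything that is not numeric is categorical.
out[!is_num] <- lapply(df[!is_num], rel_freq)

out <- round(out, 2)
out
```

Because rel_freq converts to character first, it behaves the same whether the columns are factors or plain strings, so it doesn't depend on the stringsAsFactors default.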