
I have a large CSV file, and I am trying to find the median and mean of certain values in a column. One column is titled 'Race' and another is called 'debt_to_income_ratio'. The Race column takes one of four values: 'White', 'Black', 'Hispanic', and 'Other'. The 'debt_to_income_ratio' column holds the debt-to-income ratio for the corresponding row's race. I am trying to get the median and mean debt-to-income ratio for each race (White, Black, Hispanic, and Other).

The code I am currently using is:

df['race average'] = df.groupby('Race')['debt_to_income_ratio'].transform('mean') %>%
df['race median'] = df.groupby('Race')['debt_to_income_ratio'].transform('median')
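The pandas `transform('mean')` call above broadcasts each group's statistic back onto every row. A rough base-R counterpart is `ave()`, sketched here on a small made-up data frame (the values are hypothetical and stand in for the real CSV):

```r
# Hypothetical toy data standing in for the real CSV
df <- data.frame(
  Race = c("White", "White", "White", "Black", "Black", "Hispanic", "Other"),
  debt_to_income_ratio = c(0.2, 0.4, 0.9, 0.3, 0.5, 0.35, 0.25)
)

# ave() returns a vector the same length as its input, with each element
# replaced by the statistic of its group (like pandas transform)
df$race_average <- ave(df$debt_to_income_ratio, df$Race,
                       FUN = function(x) mean(x, na.rm = TRUE))
df$race_median  <- ave(df$debt_to_income_ratio, df$Race,
                       FUN = function(x) median(x, na.rm = TRUE))
print(df)
```

Every White row gets the White mean (0.5) and median (0.4), so the original row count is preserved.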

I'm not really sure what I should be doing, so thanks in advance for any help!

Is this Python or R? It seems like a chimera... Can you clarify which programming language this is intended for, and also share df by running dput(head(df)) and pasting the output? – StupidWolf
If this is a question about computing summary statistics by group of one variable, then it is a frequent duplicate. See 1, 2. – Rui Barradas
This is intended for R. – Lauren
I used the code suggested in 2, which was: group_by(Race) %>% mutate(Race.mean.values = mean(debt_to_income_ratio)). A new column was created, but all of its values were NA. – Lauren
We don't have your data, and your only code is both in Python (not R) and not completely correct Python code (%>%?). Please spend a moment to improve this question to be a minimal reprex, where we have some representative data to play with. (Unambiguous data is best served with dput(head(x)) or data.frame(...), depending on several factors.) From there, if you have preferences for R "ecosystems" like base, dplyr, or data.table, please be explicit, otherwise answers might encourage packages with which you are not familiar. – r2evans
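As the comments suggest, a reprex can be built with data.frame(...) and shared with dput(). The sample below is hypothetical; it includes one NA to illustrate one possible cause of the all-NA column mentioned above: mean() returns NA for any group that contains a missing value unless na.rm = TRUE is set.

```r
# Hypothetical sample data; one White row has a missing ratio
df <- data.frame(
  Race = c("White", "White", "Black", "Black", "Hispanic", "Other"),
  debt_to_income_ratio = c(0.2, NA, 0.3, 0.5, 0.35, 0.25)
)

# Print a copy-pasteable definition of the data for a question
dput(head(df))

# Without na.rm, the group containing NA yields NA
with_na <- tapply(df$debt_to_income_ratio, df$Race, mean)
# With na.rm = TRUE, missing values are dropped before averaging
without_na <- tapply(df$debt_to_income_ratio, df$Race, mean, na.rm = TRUE)
print(with_na)
print(without_na)
```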

2 Answers


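We can use dplyr to do this. Setting na.rm = TRUE makes mean() and median() skip missing values; without it, any group containing an NA returns NA, which may explain the all-NA column from the attempt in the comments.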

library(dplyr)

# Add per-race mean and median columns to every row, then drop the grouping
df <- df %>%
    group_by(Race) %>%
    mutate(Mean = mean(debt_to_income_ratio, na.rm = TRUE),
           Median = median(debt_to_income_ratio, na.rm = TRUE)) %>%
    ungroup()
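mutate() keeps every row and repeats each group's statistic on it. If the goal is one row per race, summarise() collapses each group instead. A sketch on hypothetical toy data:

```r
library(dplyr)

# Hypothetical toy data standing in for the real CSV
df <- data.frame(
  Race = c("White", "White", "White", "Black", "Black", "Hispanic", "Other"),
  debt_to_income_ratio = c(0.2, 0.4, 0.9, 0.3, 0.5, 0.35, 0.25)
)

# One row per Race with its mean and median ratio
race_stats <- df %>%
  group_by(Race) %>%
  summarise(Mean = mean(debt_to_income_ratio, na.rm = TRUE),
            Median = median(debt_to_income_ratio, na.rm = TRUE))
print(race_stats)
```

Here the White group has a mean of 0.5 but a median of 0.4, since the single 0.9 value pulls the mean upward.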
   

An option based on the base R aggregate function. Is this what you mean?

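# The formula interface drops rows with NA by default (na.action = na.omit);
# median() gives the same result as quantile(x, 0.5) with the default type
race_median <- aggregate(debt_to_income_ratio ~ Race, data = df, FUN = median)
race_mean   <- aggregate(debt_to_income_ratio ~ Race, data = df, FUN = mean)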
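The two aggregate() calls produce two separate tables. As a sketch, they can be joined into a single table by Race with merge(); the suffixes argument keeps the two debt_to_income_ratio columns apart. The toy data is hypothetical:

```r
# Hypothetical toy data standing in for the real CSV
df <- data.frame(
  Race = c("White", "White", "White", "Black", "Black", "Hispanic", "Other"),
  debt_to_income_ratio = c(0.2, 0.4, 0.9, 0.3, 0.5, 0.35, 0.25)
)

race_mean   <- aggregate(debt_to_income_ratio ~ Race, data = df, FUN = mean)
race_median <- aggregate(debt_to_income_ratio ~ Race, data = df, FUN = median)

# Join the two summaries on Race; suffixes label the clashing value columns
race_stats <- merge(race_mean, race_median, by = "Race",
                    suffixes = c("_mean", "_median"))
print(race_stats)
```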