I have a large CSV file, and I am trying to find the mean and median of one column, grouped by the values in another. One of my columns is titled 'Race' and another is called 'debt_to_income_ratio'. The 'Race' column takes one of four values: 'White', 'Black', 'Hispanic', or 'Other'. The 'debt_to_income_ratio' column holds a number giving the debt-to-income ratio for that row. I am trying to get the median and mean debt-to-income ratio for each race (White, Black, Hispanic, and Other).
The code I am currently using is:
df['race average'] = df.groupby('Race')['debt_to_income_ratio'].transform('mean') %>%
df['race median'] = df.groupby('Race')['debt_to_income_ratio'].transform('median')
I'm not really sure what I should be doing, so thanks in advance for any help!
%>%?). Please spend a moment to improve this question to be a minimal reprex, where we have some representative data to play with. (Unambiguous data is best served with dput(head(x)) or data.frame(...), depending on several factors.) From there, if you have preferences for R "ecosystems" like base, dplyr, or data.table, please be explicit, otherwise answers might encourage packages with which you are not familiar. – r2evans
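For what it's worth, here is a minimal sketch of one way to get a per-race mean and median. It assumes the code in the question is meant to be Python with pandas, since that is the syntax it uses. The sample values are made up for illustration; only the column names 'Race' and 'debt_to_income_ratio' come from the question, and the real data would presumably be loaded with pd.read_csv instead.

```python
import pandas as pd

# Hypothetical sample data standing in for the real CSV file.
df = pd.DataFrame({
    "Race": ["White", "White", "Black", "Black",
             "Hispanic", "Other", "Other", "Other"],
    "debt_to_income_ratio": [30.0, 40.0, 25.0, 45.0,
                             36.0, 20.0, 30.0, 55.0],
})

# One row per race, with the mean and median as columns.
summary = df.groupby("Race")["debt_to_income_ratio"].agg(["mean", "median"])
print(summary)

# transform() instead repeats the group statistic on every original row,
# which is what the code in the question does.
df["race average"] = df.groupby("Race")["debt_to_income_ratio"].transform("mean")
df["race median"] = df.groupby("Race")["debt_to_income_ratio"].transform("median")
```

If a single summary table is the goal, the agg version is probably closer to what is wanted. The transform version keeps the DataFrame at its original length and is more useful when each row needs to be compared against its group's statistic.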