0
votes

I need to calculate the mean per group (i.e, per Coordinates) of the sample table below without losing any of the columns (the real table has over 40,000 rows with different states,location coordinates and type) So this:

State Location Coordinates Type 2002 2003 2004 2005 2006 2007 2008 2009 2010
California West Debt 234 56 79 890 24 29 20 24 26
Nevada West Debt 45 54 87 769 54 76 90 87 98

Would become this :

State Location Coordinates Type 2002 2003 2004 2005 2006 2007 2008 2009 2010
West West Debt 234 56 79 890 24 29 20 24 26

When I use aggregate (df <- aggregate(df[,4:length(df)], list(df$Coordinates), mean). It removes the State and City columns.

Location Coordinates 2002 2003 2004 2005 2006 2007 2008 2009 2010
West 235 55 83 843 24 29 20 24 26 Debt 54 769 76 87

And when I use sqldf it averages the year and becomes this:

State Location Coordinates Type 2002 2003 2004 2005 2006 2007 2008 2009 2010
West West Debt 2002 2003 2004 2005 2006 2007 2008 2009 2010

Any suggestions?

2
Can you try aggregate(.~ State + Coordinates + Type, df, mean)akrun

2 Answers

1
votes

@akrun: Could this work:

df %>% 
  group_by(Coordinates) %>% 
  summarise(State=Coordinates, Type=Type, across(where(is.numeric), ~mean(.x, na.rm = TRUE))) %>% 
  filter(row_number() %% 2 == 0) ## Select even rows

Output:

# Groups:   Coordinates [1]
  Coordinates State Type  `2002` `2003` `2004` `2005` `2006` `2007` `2008` `2009` `2010`
  <chr>       <chr> <chr>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 West        West  Debt    140.     55     83   830.     39   52.5     55   55.5     62
1
votes

An option is to first change the 'State' as 'Coordinates' and then apply the formula method including the grouping variables 'State', 'Coordinates' and 'Type'

df$State <- df$`Location Coordinates`
aggregate(.~ State + `Location Coordinates` + Type,
            df, FUN = mean)