I need to calculate the mean per group (i.e, per Coordinates) of the sample table below without losing any of the columns (the real table has over 40,000 rows with different states,location coordinates and type) So this:
State | Location Coordinates | Type | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 |
---|---|---|---|---|---|---|---|---|---|---|---|
California | West | Debt | 234 | 56 | 79 | 890 | 24 | 29 | 20 | 24 | 26 |
Nevada | West | Debt | 45 | 54 | 87 | 769 | 54 | 76 | 90 | 87 | 98 |
Would become this :
State | Location Coordinates | Type | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 |
---|---|---|---|---|---|---|---|---|---|---|---|
West | West | Debt | 234 | 56 | 79 | 890 | 24 | 29 | 20 | 24 | 26 |
When I use aggregate (df <- aggregate(df[,4:length(df)], list(df$Coordinates), mean). It removes the State and City columns.
Location Coordinates | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | |||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
West | 235 | 55 | 83 | 843 | 24 | 29 | 20 | 24 | 26 | Debt | 54 | 769 | 76 | 87 |
And when I use sqldf it averages the year and becomes this:
State | Location Coordinates | Type | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 |
---|---|---|---|---|---|---|---|---|---|---|---|
West | West | Debt | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 |
Any suggestions?
aggregate(.~ State + Coordinates + Type, df, mean)
– akrun