I need to write a function that takes a numeric vector and computes four summary statistics: the minimum, mean, median, and maximum. The result should be a vector of length four. Then I need to apply it to every column of my data frame and produce a new data frame with 5 columns holding these results.
The original data frame looks like this (partial):
> dput(head(commodities,4))
structure(c(2054.86, 2131.01, 1978.38, 1932.46, 401.96, 372.19,
422.91, 395.9, 66.58, 66.58, 69.9, 69.9, 136.36, 134.55, 118,
114.51, 39.7, 40.26, 40.83, 41.41, 3167.16, 3236.82, 3091.1,
2910.1, 168.67, 164.83, 184.38, 180.81, 162.56, 162, 169.89,
162.9, 591.59, 596.6, 561.51, 541.45, 2592.63, 2916.71, 2303.83,
2074.55, 88.72, 97.21, 93.53, 90.56, 986.56, 1040.81, 960.43,
944.36, 980.08, 1000.38, 1009.4, 1015.04, 59.1, 48.7, 39.4, 38.1,
12.15, 12.15, 12.15, 12.15, 117.23, 122.01, 119.32, 132.72, 1111.13,
1166.24, 1117.74, 970.03, 84.39, 80.25, 80.25, 87.5, 146.08,
159.57, 155.28, 152.79, 105.51, 114.17, 109.84, 108.26, 6584.8,
6978.93, 6733.79, 6233.37, 35.64, 35.09, 36.01, 35.09, 40, 38.5,
38.25, 38.15, 38, 36, 35.75, 35, 37, 37.04, 39.52, 39.5, 2271.72,
2256.48, 2188.11, 2081.17, 347, 350, 338, 377, 547.05, 555.32,
518.13, 504.91, 72.42, 65.53, 65, 51.86, 33.9, 32.55, 31.83,
30.99, 395, 399, 415, 419, 68.82, 75.31, 66.35, 60.55, 7.45,
7.6, 7.4, 7.43, 297.61, 308.29, 304.92, 302.95, 138, 131.23,
131.23, 143.08, 13.34, 12.68, 12.79, 12.35, 201.76, 198.26, 186.1,
181.2, 525.58, 518.75, 486.78, 451.07, 238.77, 241.36, 227.08,
218.21, 17.3, 22.75, 19.63, 21.25, 566.93, 573.96, 535.28, 486.06,
225.18, 233.09, 226.83, 221.81, 16973.59, 17090.21, 17460.59,
17041.71, 40, 38, 35, 32, 175.63, 172.7, 163.51, 156.53, 553.12,
568.15, 552.75, 510.65, 684.28, 722.57, 695.96, 688.13, 773.82,
868.62, 740.75, 707.68), .Dim = c(4L, 48L), .Dimnames = list(
c("1980M1", "1980M2", "1980M3", "1980M4"), c("aluminum",
"bananas", "barley", "beef", "coal", "cocoa", "coffee_arabica",
"coffee_robusta", "rapeseed_oil", "copper", "cotton", "fishmeal",
"groundnuts", "hides", "iron_ore", "lamb", "lead", "logs_soft",
"logs_hard", "maize", "nickel", "crude_westtx", "crude_brent",
"oil_dubai", "oil_westtx", "olive_oil", "oranges", "palm_oil",
"pork", "poultry", "rice", "rubber", "fish", "sawn_wood_hard",
"sawn_wood_soft", "shrimp", "soybean_meal", "soybean_oil",
"soybeans", "sugar", "sunflower_oil", "tea", "tin", "uranium",
"wheat", "wool_coarse", "wool_fine", "zinc")))
I tried writing a function like this:
commodities_summary <- function(x) {
  com_min <- min(x)
  com_mean <- mean(x)
  com_med <- median(x)
  com_max <- max(x)
  c(Min=com_min, Mean=com_mean, Median=com_med, Max=com_max)
}
Then I tried to apply it to all the columns in the data frame like this:
commodities2 <- ddply(.data=commodities,
                      .variables=c(1:48),
                      .fun=commodities_summary)
But this does not give the result I want; the new data frame looks like this:
> dput(head(commodities2,4))
structure(list(X1 = 2054.86, X2 = 2131.01, X3 = 1978.38, X4 = 1932.46,
X5 = 1775.8, X6 = 1668.96, X7 = 1758.07, X8 = 1783.63, X9 = 1655.07,
X10 = 1626.15, X11 = 1503.91, X12 = 1430.66, X13 = 1430.83,
X14 = 1452.38, X15 = 1442.95, X16 = 1369.23, X17 = 1297,
X18 = 1232.41, X19 = 1175.31, X20 = 1230.83, X21 = 1168.91,
X22 = 1140.18, X23 = 1081.85, X24 = 1130.88, X25 = 1113.26,
X26 = 1087.71, X27 = 1029.24, X28 = 996.21, X29 = 973.29,
X30 = 918.85, X31 = 957.85, X32 = 958.86, X33 = 959.53, X34 = 951.48,
X35 = 965.15, X36 = 987.39, X37 = 1076.53, X38 = 1230.46,
X39 = 1301.73, X40 = 1363.57, X41 = 1453.43, X42 = 1465.58,
X43 = 1520.21, X44 = 1603.67, X45 = 1613.73, X46 = 1566.26,
X47 = 1516.73, X48 = 1549.34, Min = 7.45, Mean = 937.015208333333,
Median = 172.15, Max = 16973.59), .Names = c("X1", "X2",
"X3", "X4", "X5", "X6", "X7", "X8", "X9", "X10", "X11", "X12",
"X13", "X14", "X15", "X16", "X17", "X18", "X19", "X20", "X21",
"X22", "X23", "X24", "X25", "X26", "X27", "X28", "X29", "X30",
"X31", "X32", "X33", "X34", "X35", "X36", "X37", "X38", "X39",
"X40", "X41", "X42", "X43", "X44", "X45", "X46", "X47", "X48",
"Min", "Mean", "Median", "Max"), row.names = 1L, class = "data.frame")
I need a data frame with the column names (aluminum, bananas, barley, etc.) in the first column and each of the summary statistics in the following 4 columns.
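To make the target shape concrete, here is a minimal base R sketch of the layout described above. It is one possible approach rather than the only one. The three-column `commodities` object below is a small stand-in built from the first values of the `dput()` output, and the column name `commodity` is a hypothetical choice made for illustration.

```r
# Summary function from the question, repeated so the sketch runs on its own.
commodities_summary <- function(x) {
  c(Min = min(x), Mean = mean(x), Median = median(x), Max = max(x))
}

# Stand-in data (assumption): first four rows of three columns from the
# dput() output above, stored as a matrix like the original object.
commodities <- matrix(
  c(2054.86, 2131.01, 1978.38, 1932.46,
    401.96, 372.19, 422.91, 395.9,
    66.58, 66.58, 69.9, 69.9),
  nrow = 4,
  dimnames = list(c("1980M1", "1980M2", "1980M3", "1980M4"),
                  c("aluminum", "bananas", "barley"))
)

# sapply() over the columns gives a 4 x ncol matrix (one column per
# commodity); t() turns it into one row per commodity.
stats <- t(sapply(as.data.frame(commodities), commodities_summary))

# Move the commodity names from the row names into a first column.
# "commodity" is a hypothetical column name.
commodities2 <- data.frame(commodity = rownames(stats), stats,
                           row.names = NULL, stringsAsFactors = FALSE)

print(commodities2)
```

With the full 48-column object, the same two steps should give 48 rows and 5 columns: the commodity name followed by Min, Mean, Median, and Max.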
dplyr, i.e. commodities %>% group_by_(names(commodities)[1:48]) %>% summarise_each(fun(commodities_summary(.)))
– akrun
dput output, i.e. dput(droplevels(head(yourdata,10)))
– akrun