I want to determine the within, between and overall standard deviations of panel data, using R. I have found a very similar question, "Between/within standard deviations in R", but I don't know how to apply its solution to my data.
Let us use the following data as an example:
library(foreign)
Panel <- read.dta("http://dss.princeton.edu/training/Panel101.dta")
giving the following output:
country year y y_bin x1 x2 x3 opinion
1 A 1990 1342787840 1 0.27790365 -1.1079559 0.28255358 Str agree
2 A 1991 -1899660544 0 0.32068470 -0.9487200 0.49253848 Disag
3 A 1992 -11234363 0 0.36346573 -0.7894840 0.70252335 Disag
4 A 1993 2645775360 1 0.24614404 -0.8855330 -0.09439092 Disag
5 A 1994 3008334848 1 0.42462304 -0.7297683 0.94613063 Disag
6 A 1995 3229574144 1 0.47721413 -0.7232460 1.02968037 Str agree
7 A 1996 2756754176 1 0.49980500 -0.7815716 1.09228814 Disag
8 A 1997 2771810560 1 0.05162839 -0.7048455 1.41590083 Str agree
9 A 1998 3397338880 1 0.36641079 -0.6983712 1.54872274 Disag
10 A 1999 39770336 1 0.39584252 -0.6431540 1.79419804 Str disag
11 B 1990 -5934699520 0 -0.08184998 1.4251202 0.02342812 Agree
12 B 1991 -711623744 0 0.10616001 1.6496018 0.26036251 Str agree
13 B 1992 -1933116160 0 0.35378519 1.5937191 -0.23439877 Agree
14 B 1993 3072741632 1 0.72677696 1.6917576 0.25622433 Str disag
15 B 1994 3768078848 1 0.71939486 1.7414261 0.41174951 Disag
16 B 1995 2837581312 1 0.67154658 1.7083139 0.53584301 Str disag
17 B 1996 577199360 1 0.81985730 1.5324961 -0.49964902 Str agree
18 B 1997 1786851584 1 0.88016719 1.5021962 -0.57626772 Disag
19 B 1998 -149072048 0 0.70451611 1.4236463 -0.44841924 Agree
20 B 1999 -1174480128 0 0.23696731 1.4545859 -0.04936399 Str disag
21 C 1990 -1292379264 0 1.31256068 -1.2931356 0.20408297 Agree
22 C 1991 -3415966464 0 1.17748356 -1.3442180 0.28397188 Str agree
23 C 1992 -355804672 0 1.25640798 -1.2599510 0.37339270 Agree
24 C 1993 1225180032 1 1.42154455 -1.3117452 -0.37596563 Disag
25 C 1994 3802287616 1 1.11419308 -1.2849948 0.56046754 Str disag
26 C 1995 1959696640 1 1.15948391 -1.2188276 0.69540799 Agree
27 C 1996 530576672 1 1.16045427 -1.2350063 0.81689382 Agree
28 C 1997 3128852224 1 1.44641161 -1.3275964 -0.14206907 Str disag
29 C 1998 3201045760 1 1.15162671 -1.2061129 1.19458139 Str agree
30 C 1999 4663067648 1 1.19054413 -1.1266172 1.67016041 Disag
31 D 1990 1883025152 1 -0.31391269 1.7366557 0.64663702 Disag
32 D 1991 6037768704 1 0.36009100 2.1318641 1.09994173 Disag
33 D 1992 10244189 1 0.05188770 1.6816775 0.96976823 Str agree
34 D 1993 5067265024 1 0.20944354 1.6149769 -0.21257821 Str agree
35 D 1994 3882478336 1 0.38207000 1.5683011 -1.16538668 Disag
36 D 1995 8827006976 1 0.24208580 1.5412215 -0.18413101 Agree
37 D 1996 5782000128 1 0.48636678 1.7423391 -0.03731453 Str disag
38 D 1997 5090524160 1 0.35942599 1.8742865 0.08786795 Str agree
39 D 1998 1850565248 1 0.23220351 1.5953021 0.07247547 Disag
40 D 1999 -2025476864 0 -0.07998896 1.7047973 0.55843300 Str agree
41 E 1990 1342787840 1 0.45286715 1.7284026 0.59705788 Str disag
42 E 1991 2296009472 1 0.41904032 1.7068400 0.79313534 Str agree
43 E 1992 1737627776 1 0.38521346 1.6852775 0.98921281 Agree
44 E 1993 113973136 1 -0.24428773 1.6492835 1.22413278 Str agree
45 E 1994 260098048 1 1.39113998 2.5302765 -0.52620137 Str disag
46 E 1995 -7863482880 0 0.31968558 1.1890552 -0.48425370 Agree
47 E 1996 3520491520 1 0.61097682 1.4845277 -0.97895509 Agree
48 E 1997 5234565120 1 0.71761495 1.5544620 -0.98863661 Str disag
49 E 1998 344746176 1 0.69613826 1.7010406 -0.08965246 Disag
50 E 1999 243920688 1 0.60662067 1.6119040 -0.08929884 Str disag
51 F 1990 1342787840 1 -0.56757486 -0.3466710 1.25841928 Str agree
52 F 1991 3560401920 1 0.15974578 -0.4641182 0.32665297 Str disag
53 F 1992 3192281088 1 0.88706642 -0.5815655 -0.60511333 Agree
54 F 1993 8941232128 1 0.53241795 -0.7553238 -0.51157588 Agree
55 F 1994 8124504576 1 0.87260014 -0.7114431 0.20570269 Str agree
56 F 1995 491740096 1 0.91935229 -0.3697441 -0.01292755 Str agree
57 F 1996 3497164544 1 1.39689231 -0.3601406 0.67867643 Str agree
58 F 1997 4764803072 1 0.98688608 -0.3590902 0.24226174 Str agree
59 F 1998 -4671723520 0 0.78830910 -0.7556524 0.73347801 Agree
60 F 1999 6349319168 1 0.27938697 -0.4601679 1.17317200 Disag
61 G 1990 1342787840 1 0.94488174 -1.5150151 1.45265734 Str disag
62 G 1991 -1518985728 0 1.09872830 -1.4614717 1.43964469 Agree
63 G 1992 1912769920 1 1.25257492 -1.4079282 1.42663205 Str agree
64 G 1993 1345690240 1 0.76276451 -1.3519315 1.85448635 Str disag
65 G 1994 2793515008 1 1.20645559 -1.3252175 2.23653030 Str disag
66 G 1995 1323696384 1 1.08718646 -1.4098167 2.82980847 Str disag
67 G 1996 254524176 1 0.78107548 -1.3279996 4.27822399 Str agree
68 G 1997 3297033216 1 1.25787950 -1.5773667 4.58732557 Disag
69 G 1998 3011820800 1 1.24277663 -1.6012177 6.11376190 Disag
70 G 1999 3296283392 1 1.23420024 -1.6217614 7.16892195 Disag
The within standard deviation should capture the variation within a country over the years, whereas the between standard deviation should capture the variation between countries. The output should therefore be three standard deviations (within, between and overall) for each variable (here: x1, x2, x3). PS: I am also using the plm and reshape2 packages.
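To make the three quantities concrete, here is a minimal base-R sketch of one common convention (to my understanding, the one Stata's xtsum follows): the overall SD is the SD of all observations, the between SD is the SD of the country means, and the within SD is the SD of each observation's deviation from its country mean, with the grand mean added back. The helper name xt_sd and the toy data frame are made up for illustration, and missing values are not handled.

```r
# Hypothetical helper: overall, between and within SD of x for panel units g.
xt_sd <- function(x, g) {
  g <- factor(g)                     # re-factoring drops unused levels
  unit_means <- tapply(x, g, mean)   # one mean per unit (country)
  c(overall = sd(x),                           # all observations
    between = sd(unit_means),                  # variation of the unit means
    within  = sd(x - ave(x, g) + mean(x)))     # deviations from unit means
}

# Toy data (assumption: stands in for Panel so the example runs offline).
toy <- data.frame(
  country = rep(c("A", "B"), each = 2),
  x1 = c(1, 3, 5, 7),
  x2 = c(2, 2, 4, 8)
)

# One column per variable, one row per type of standard deviation.
res <- sapply(toy[c("x1", "x2")], xt_sd, g = toy$country)
print(res)
```

With the real data the same call would presumably be sapply(Panel[c("x1", "x2", "x3")], xt_sd, g = Panel$country).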
EDIT: As a second step, I calculate the mean for every country with
library(dplyr)
Panel_mean <- Panel %>% group_by(country) %>% summarise(mean(x1), mean(x2), mean(x3))
I get the variance between countries with:
# across() needs dplyr >= 1.0; it replaces the deprecated summarise_each(funs(...))
Panel %>% group_by(country) %>% summarise(across(c(x1, x2, x3), mean)) %>%
summarise(across(c(x1, x2, x3), var))
and the variance between years with:
Panel %>% group_by(year) %>% summarise(across(c(x1, x2, x3), mean)) %>%
summarise(across(c(x1, x2, x3), var))
EDIT 2: Because it was asked, here are my next steps: I want to determine country-specific regressors and plot the unconditional correlations between y and each of these regressors. I want three groups of plots for each variable:
1. the overall correlation
2. the correlation of the deviations of y and the regressors from their country means (within variation)
3. the correlation of the country means of the variables (between variation)
Here is an example of the output desired:
For the overall correlation I guess I could simply use lm (instead of the plm used for the panel data analysis), as in:
plot(Panel$x1, Panel$y)
abline(lm(y ~ x1, data = Panel))
Or am I completely on the wrong track?
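For what it is worth, the three groups of plots described above could be sketched in base R roughly as follows. This is only an illustration under my own assumptions: toy stands in for Panel, and the columns x1_within and y_within and the data frame between are names I made up.

```r
# Toy panel (assumption: stands in for Panel so the example runs offline).
toy <- data.frame(
  country = rep(c("A", "B", "C"), each = 3),
  x1 = c(1, 2, 3, 5, 6, 7, 9, 10, 11),
  y  = c(2, 1, 3, 8, 9, 7, 4, 6, 5)
)

# Within: deviations of each variable from its country mean.
toy$x1_within <- toy$x1 - ave(toy$x1, toy$country)
toy$y_within  <- toy$y  - ave(toy$y,  toy$country)

# Between: one row of means per country.
between <- aggregate(cbind(x1, y) ~ country, data = toy, FUN = mean)

# Three panels side by side, each with a simple lm() fit line.
op <- par(mfrow = c(1, 3))
plot(toy$x1, toy$y, main = "Overall")
abline(lm(y ~ x1, data = toy))
plot(toy$x1_within, toy$y_within, main = "Within")
abline(lm(y_within ~ x1_within, data = toy))
plot(between$x1, between$y, main = "Between")
abline(lm(y ~ x1, data = between))
par(op)
```

The within panel uses demeaned values, so its fitted slope should match the slope of a fixed-effects (within) regression of y on x1, while the between panel uses only one point per country.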
Comments:
Panel %>% group_by(country) %>% summarise_each(funs(mean), x1, x2, x3) %>% summarise_each(funs(var), x1, x2, x3) this? – ExperimenteR
group_by(year). – Gilles Cosyn