2
votes

This may be a naive question, but I am running a regression of one variable on a set of other variables.

Each country has several observations, and the original regression uses the pooled sample. Now I want to average the observations within each country and run a regression on those averages.

For example, I have 50 countries, and each country has either 3 or 4 observations. I want the average of each variable by country, so that in the end every independent and dependent variable has 50 observations, one per country.

Right now I am using the aggregate command, but it is creating a variable that contains both the country name and the average value, so I am not able to run a regression on these variables.

This, for example, is what I have:

Country / some-observation / some-other-observation / some-other-observation-2
Somalia / 3 / 7 / . ...
USA / 7 / 8 / ...
Nigeria / 5 / 8 / ...
Nigeria / 9 / 2 / ..
India / 4 / 7 / ..
India / 7 / 9 / ..
UK / 8 / 1 / ..
UK / 5 / 5 / ..

etc

1
The best way to do this (and what they usually insist on at StackOverflow) is to supply a small set of data that has the same issue as your problem, so we can show you how to do it. - Glen_b
This question appears to be off-topic because it is about how to use R, but without a reproducible example. - gung - Reinstate Monica

1 Answers

5
votes

One very good way to generate such things is to use tapply.

# Set up some example data (read from an inline string so it also runs as a script)
mycodat <- read.csv(text = "country,obsv
Spain,4
Spain,5
Portugal,3
Portugal,7
Venezuela,8
Zambia,2
Zambia,4
Zambia,3", header = TRUE)

regdat <- data.frame(country=unique(mycodat$country)) # the thing you're trying to get
                                                      # the country means into

At this point we have two sets of data, the first with multiple values per country and the second with one row per country where you want to put the mean data. So here's how to do it.

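# Now generate the summary and put it in the data. tapply orders its result by factor
# level (alphabetical by default), not by the row order of regdat, so match by country name.
countrymeans <- with(mycodat, tapply(obsv, country, mean))
regdat$meanobsv <- as.vector(countrymeans[as.character(regdat$country)])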
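
Since the question involves several variables and a regression on the averages, here is a minimal sketch of one way to do the whole thing with aggregate and lm. The column names y, x1 and x2 and all the numbers are made up for illustration; they are not from the question's data. Note that the formula interface of aggregate drops any row containing a missing value by default, which may or may not be what you want.

```r
# Hypothetical data: y, x1, x2 and their values are invented for this example
dat <- data.frame(
  country = c("Spain", "Spain", "Portugal", "Portugal", "Venezuela",
              "Zambia", "Zambia", "Zambia", "India", "India"),
  y  = c(4, 5, 3, 7, 8, 2, 4, 3, 6, 9),
  x1 = c(1, 3, 2, 6, 7, 1, 2, 3, 5, 8),
  x2 = c(10, 12, 8, 9, 4, 5, 7, 6, 11, 3)
)

# One row per country, one numeric column per averaged variable
country_means <- aggregate(cbind(y, x1, x2) ~ country, data = dat, FUN = mean)

# Regression on the country averages instead of the pooled observations
fit <- lm(y ~ x1 + x2, data = country_means)
```

The result country_means is an ordinary data frame with a country column and plain numeric columns, so it can be passed straight to lm. With 50 countries you would get 50 rows.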