Looping subsets in plm

Question

I'm trying to program something quite simple (I think) in R, but I can't seem to get it right. I have a dataset of 50 countries (numbered 1 to 50), with 15 years each and about 20 variables per country. For now I am only testing the effect of one variable (OS) on my dependent variable (SMD). I would like to do this with a loop, country by country, so that I get the output for each country instead of the overall output.

I thought it would be wise to create a subset first (to look at country 1 first, after which my loop should increase the country number and test country 2). I believe the regression at the bottom of the page should give me the output for country 1 instead of the overall result for the entire dataset. However, I keep getting these errors:

> pdata <- plm.data(newdata, index=c("Country","Date"))
  series    are constants and have been removed
> pooling <- plm(Y ~ X, data=pdata, model= "pooling") 
  series Country, xRegion are constants and have been removed
  Error in model.matrix.pFormula(formula, data, rhs = 1, model = model,  : 
  NA in the individual index variable
> summary(pooling)
  Error in summary(pooling) : object 'pooling' not found

I might be looking at this all wrong, but I believe that until I get this to work, there is no point in going further with programming the loop itself. Any advice on solving my errors, or on other ways of programming the loop, would be really appreciated.

My code:

rm(list = ls())
mydata <- read.table(file = file.choose(), header = TRUE, dec = ",")
names(mydata)
attach(mydata)

Y <- cbind(SMD)
X <- cbind(OS)

newdata <- subset(mydata, Country %in% c(1))

newdata

pdata <- plm.data(newdata, index=c("Country","Date"))
pooling <- plm(Y ~ X, data=pdata, model= "pooling") 
summary(pooling)

Edit: data sample of first 2 countries which causes same error

dput(mydata) structure(list(Region = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("NAF", "SAME"), class = "factor"), Country = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Date = c(1995L, 1996L, 1997L, 1998L, 1999L, 2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 1995L, 1996L, 1997L, 1998L, 1999L, 2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 2012L, 2013L, 2014L ), OS = structure(c(19L, 25L, 27L, 15L, 22L, 20L, 23L, 9L, 7L, 5L, 2L, 1L, 4L, 3L, 6L, 10L, 11L, 13L, 11L, 8L, 26L, 25L, 31L, 29L, 28L, 21L, 30L, 24L, 24L, 16L, 11L, 14L, 12L, 17L, 18L, 29L, 32L, 32L, 33L, 34L), .Label = c("51.5", "52.2", "55.6", "56.4", "56.7", "57.7", "57.8", "58.3", "59", "59.2", "59.6", "59.9", "60.2", "60.4", "61.1", "61.2", "62.2", "62.3", "62.8", "63.2", "63.3", "63.8", "63.9", "64.2", "64.3", "64.5", "64.7", "65.3", "65.5", "65.6", "66.4", "68", "69.6", "70.7"), class = "factor"), SMD = structure(c(7L, 12L, 20L, 21L, 17L, 15L, 13L, 10L, 14L, 22L, 23L, 33L, 1L, 32L, 29L, 34L, 28L, 25L, NA, NA, 9L, 6L, 8L, 4L, 2L, 35L, 3L, 36L, 5L, 11L, 16L, 18L, 24L, 19L, 26L, 31L, 27L, 30L, NA, NA), .Label = c("100.3565662", "13.44788845", "13.45858747", "13.56815534", "15.05892471", "17.63789658", "18.04088718", "18.3101351", "19.34226196", "21.25530884", "21.54423145", "23.75898948", "24.08770926", "26.39817342", "29.44079001", "31.40605191", "34.46667996", "34.52913657", "35.66070947", "36.4419931", "39.16875621", "44.0126137", "45.72949566", "49.13062679", "54.83730247", "56.87886311", "59.80971583", "60.5658962", "69.20148901", "70.91362874", "72.64845214", "73.97139238", "75.20140919", "76.18378138", "9.570435019", "9.867635305"), class = 
"factor")), .Names = c("Region", "Country", "Date", "OS", "SMD"), class = "data.frame", row.names = c(NA, -40L))

Welcome to Stack Overflow! Please read about how to provide a reproducible example including data and code. — Thomas
@josilber, I have added a subsample of two countries, hopefully in the right format — user3352474

jlhoward · Accepted Answer · 2014-02-26T18:07:05

Are you sure you need to use plm? With plain lm, the code below produces a list of regression summaries, one per country.

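# convert factors to numeric; go through as.character(), because
# as.numeric() on a factor returns the level codes, not the values
mydata$SMD <- as.numeric(as.character(mydata$SMD))
mydata$OS  <- as.numeric(as.character(mydata$OS))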

# Using lapply(...)
smry <- lapply(unique(mydata$Country),
               function(cntry)
                 summary(lm(SMD~OS,data=mydata[mydata$Country==cntry,])))
# Same thing, using a for loop
smry <- list()
for (cntry in unique(mydata$Country)) {
  # append one summary per country; list(smry, ...) would nest the lists instead
  smry[[length(smry) + 1]] <-
    summary(lm(SMD~OS,data=mydata[mydata$Country==cntry,]))
}
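
Once there is one fit per country, the coefficients are often easier to compare side by side than as a list of printed summaries. A minimal sketch of that idea, assuming made-up data (the data frame toy and its random values are purely illustrative and are not the asker's dataset):

```r
# Illustrative data only: two countries, ten observations each
set.seed(1)
toy <- data.frame(
  Country = rep(1:2, each = 10),
  OS      = runif(20, 50, 70),
  SMD     = runif(20, 10, 100)
)

# One lm fit per country; split() names the list elements by country
fits <- lapply(split(toy, toy$Country),
               function(d) lm(SMD ~ OS, data = d))

# Collect intercept and slope into a matrix with one row per country
coefs <- t(sapply(fits, coef))
coefs
```

Each row of coefs is one country. The same pattern should carry over to mydata once SMD and OS are numeric.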

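In your dataset, SMD and OS are factors, so they must be converted to numeric first. Convert via as.character(): calling as.numeric() directly on a factor returns the internal level codes rather than the values shown in the labels.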
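
To see why the route through as.character() matters when converting a factor, here is a small sketch on a toy factor (the three values are picked for illustration only):

```r
# A factor whose labels are numbers stored as text
f <- factor(c("51.5", "70.7", "59.6"))

# Direct conversion returns the integer level codes
codes <- as.numeric(f)
codes    # 1 3 2

# Converting the labels first returns the actual values
values <- as.numeric(as.character(f))
values   # 51.5 70.7 59.6
```

The levels are sorted as text ("51.5", "59.6", "70.7"), which is why the codes come out as 1, 3, 2 rather than anything resembling the data.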
