0
votes

I have panel data with income for individuals over several years. I am interested in each individual's income trend, i.e. an individual coefficient for income over years, and in the residuals for each individual in each year (the changes in income that my model does not predict). However, many observations are missing income for one or more years, so with a linear regression I lose the majority of my observations. The data structure is like this:

caseid<-c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4)
years<-c(1998,2000,2002,2004,2006,2008,1998,2000,2002,2004,2006,2008,
1998,2000,2002,2004,2006,2008,1998,2000,2002,2004,2006,2008)
income<-c(1100,NA,NA,NA,NA,1300,1500,1900,2000,NA,2200,NA, 
NA,NA,NA,NA,NA,NA, 2300,2500,2000,1800,NA, 1900)
df<-data.frame(caseid, years, income)

I decided to use a random effects model, which I think will still predict income for the missing years through a maximum likelihood approach. However, since the Hausman test gives a significant result, I decided to use a fixed effects model instead. I ran the code below, using the plm package:

library(plm)
inc.fe <- plm(income ~ years, data = df, index = c("caseid", "years"), model = "within", effect = "individual")

However, I get coefficients only for years and not for individuals; and I cannot get residuals. To maybe give an idea, the code in Stata should be

 xtest caseid
 xtest income year
 predict resid, resid

Then I tried to run the pvcm function from the same library, which is a function for variable coefficients.

inc.wi<-pvcm(Income~Year, data=ldf, model="within", effect="individual")

However, I get the following error message: "Error in FUN(X[[i]], ...) : insufficient number of observations".

How can I get individual coefficients and residuals with pvcm by resolving this error or by using some other function?

My original long form data has 202976 observations and 15 years.

I would also really appreciate any comments or suggestions on the method I chose for the analysis. Thank you very much.

2
So, just to clarify, your regression formula is income~years and you expect to get coefficients for something other than years, namely individuals? – coffeinjunky
@coffeinjunky I want to get coefficients for the change in income over years for every individual. I want to see whether it followed an upward or a downward path, and what the slope of that trend was for each individual (caseid 1, caseid 2, caseid 3, caseid 4) separately. Then, in a later step, I want to look at the relation of this personal income trend with some health variables. That's why I am interested in the coefficient of income~year for every individual. – Aslı Gürer
Could you provide some Stata output showing what the commands xtest caseid ; xtest income year would do? I am not a Stata expert, and neither does my Stata version recognize these commands, nor does Google. I furthermore have a hard time understanding what you want to do. Regressing A on B will never give you any coefficients other than a coefficient on B, and in particular not coefficients for individual observations. – coffeinjunky

2 Answers

2
votes

Does the fixef function from package plm give you what you are looking for? Continuing your example:

fixef(inc.fe)

Residuals are extracted by:

residuals(inc.fe)
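
To see what these two calls correspond to, here is a minimal base R sketch of the equivalent dummy-variable (LSDV) regression. It is not plm's own code; the names lsdv and resid_fe are made up for illustration. It shows that a within model yields one intercept per individual and a single common slope on years, not individual slopes. Using na.exclude is my own choice here, so that the residuals line up with the rows of df.

```r
# Example data from the question
caseid <- rep(1:4, each = 6)
years  <- rep(seq(1998, 2008, by = 2), times = 4)
income <- c(1100, NA, NA, NA, NA, 1300,
            1500, 1900, 2000, NA, 2200, NA,
            NA, NA, NA, NA, NA, NA,
            2300, 2500, 2000, 1800, NA, 1900)
df <- data.frame(caseid, years, income)

# LSDV regression: one intercept per individual, one common slope on years.
# na.exclude drops rows with missing income for fitting, but pads the
# residuals with NA so they stay aligned with the rows of df.
lsdv <- lm(income ~ 0 + factor(caseid) + years, data = df,
           na.action = na.exclude)

# Individual intercepts (only for individuals with observed income)
# followed by the common slope on years
coef(lsdv)

# Residual for every row of df; NA where income was not observed
df$resid_fe <- residuals(lsdv)
```

Note that caseid 3 has no observed income at all, so it gets neither an intercept nor residuals.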
0
votes

You have a random effects model with random slopes and intercepts. This is also known as a random coefficients regression model. The missingness is the tricky part, which (I'm guessing) you'll have to write custom code to solve after you choose how you wish to do so.

But you haven't clearly/properly specified your model (at least in your question) as far as I can tell. Let's define some terms:

Let Y_it = income for individual i (i = 1,..., N) in year t (t = 1,..., T). As I read your question, you have not specified which of the two models below you wish to have:

M1: random intercepts, global slope, random slopes

Y_it ~ N(\mu_i + B T + \gamma_i I T, \sigma^2) 
\mu_i ~ N(\phi_0, \tau_0^2) 
\gamma_i ~ N(\phi_1, \tau_1^2)

M2: random intercepts, random slopes

Y_it ~ N(\mu_i + \gamma_i I T, \sigma^2) 
\mu_i ~ N(\phi_0, \tau_0^2) 
\gamma_i ~ N(\phi_1, \tau_1^2)

Also, your example data is nonsensical (see below). As you can see, you don't have enough observations to estimate all parameters. I'm not familiar with library(plm) but the above models (without missingness) can be estimated in lme4 easily. Without a realistic example dataset, I won't bother providing code.

R> table(df$caseid, is.na(df$income))

    FALSE TRUE
  1     2    4
  2     4    2
  3     0    6
  4     5    1

Given that you do have missingness, you should be able to produce estimates for either hierarchical model via the typical methods, such as EM. But I do think you'll have to write the code to do the estimation yourself.
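
For comparison, the observation counts in the table above also decide which individuals could get their own slope from a plain per-individual regression. The following is only a rough base R sketch of that simpler alternative, not an estimator for M1 or M2. The threshold min_obs and the names by_id, fits, ind_coef and ind_resid are my own assumptions.

```r
# Example data from the question
caseid <- rep(1:4, each = 6)
years  <- rep(seq(1998, 2008, by = 2), times = 4)
income <- c(1100, NA, NA, NA, NA, 1300,
            1500, 1900, 2000, NA, 2200, NA,
            NA, NA, NA, NA, NA, NA,
            2300, 2500, 2000, 1800, NA, 1900)
df <- data.frame(caseid, years, income)

# Keep only rows with observed income, then split by individual
obs   <- df[!is.na(df$income), ]
by_id <- split(obs, obs$caseid)

# Assumption: require at least 3 observed years, so that the fitted line
# is not an exact fit through 2 points with all residuals equal to zero
min_obs <- 3
enough  <- sapply(by_id, nrow) >= min_obs

# Separate OLS fit of income on years for each remaining individual
fits <- lapply(by_id[enough], function(d) lm(income ~ years, data = d))

# One row per individual: intercept and slope of income on years
ind_coef <- t(sapply(fits, coef))
ind_coef

# Residuals per individual and observed year, in long format
ind_resid <- do.call(rbind, lapply(names(fits), function(id) {
  d <- by_id[[id]]
  data.frame(caseid = d$caseid, years = d$years,
             resid = residuals(fits[[id]]))
}))
ind_resid
```

With the example data only caseid 2 and caseid 4 pass the threshold. Unlike the hierarchical models, this borrows no information across individuals, so slopes based on few years will be noisy.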