0
votes

For a university project, I am trying to fit a regression model for demand to a number of independent variables. I tried to include a small example, but it didn't work as a figure (as I am new to this). Instead, see the following link to view a sample of the dataset that I am using:

In this table, the first column indicates country pairs, columns 2 to 6 are independent variables, and the final column is the depedent variable. What I would like to do, is to perform regression analysis, assuming that this data can be described by a gravity equation.

I know that people frequently use log-linearisation to solve this. However, as I am dealing with zeros in my data (and don't to distort the data by adding small constants), and as I assume that heteroskedasticity is in the data, I would like to use a different method. Based on what Santos 2006 described (in his paper "The log of gravity"), I would like to use Poisson Pseudo Maximum Likelihood Estimation.

However, I am fairly new to using R (or any statistical software), and I cant figure out how to do this in R. Can anybody help me with this? The only thing I've found so far is that the glm commands poisson and quasipoisson could be used (https://stat.ethz.ch/pipermail/r-help/2010-September/252476.html).

I've searched for help in documents on glm, but I don't understand how to use the glm function to solve this gravity model? How do I specify that I want the model in the form:

DEM = RP^alpha1 * GDPC_O^alpha2 * GDPC_D^alpha3 * POP_O^alpha4.... and then use the regression to solve for the different alphas?

1
Have a look at ?family. This question may be useful also: stats.stackexchange.com/q/20826/229 - James

1 Answers

1
votes

Hard to say precisely without more detail, but

glm(DEM ~ log(RP) + log(GDPC_O) + log(GDPC_D) + log(POP_O),
     data=your_data,
     family=quasipoisson(link="log"))

should work reasonably well. The intercept will be the log of the proportionality constant; the other coefficients will be the exponents of the respective terms (this works because the log link says that log(DEM) = beta_0 + beta_1*log(RP) + ...; if you exponentiate both sides you get DEM = exp(beta_0) * exp(log(RP)*beta_1) * ... or DEM = C*RP^beta_1*...

PS it is not necessary, but it may be helpful for interpretation to scale and center your predictor variables.