I'm trying to run plm to see effects of classes positive, negative and neutral on stock prices.

DATE <- c("1","2","3","4","5","6","7","1","2","3","4","5","6","7")
COMP <- c("A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "B")
RET <- c(-2.0,1.1,3,1.4,-0.2, 0.6, 0.1, -0.21, -1.2, 0.9, 0.3, -0.1,0.3,-0.12)
CLASS <- c("positive", "negative", "neutral", "positive", "positive", "negative", "neutral", "positive", "negative", "negative", "positive", "neutral", "neutral", "neutral")
df <- data.frame(DATE, COMP, RET, CLASS, stringsAsFactors=FALSE)

df

#    DATE COMP   RET    CLASS
# 1     1    A -2.00 positive
# 2     2    A  1.10 negative
# 3     3    A  3.00  neutral
# 4     4    A  1.40 positive
# 5     5    A -0.20 positive
# 6     6    A  0.60 negative
# 7     7    A  0.10  neutral
# 8     1    B -0.21 positive
# 9     2    B -1.20 negative
# 10    3    B  0.90 negative
# 11    4    B  0.30 positive
# 12    5    B -0.10  neutral
# 13    6    B  0.30  neutral
# 14    7    B -0.12  neutral

If I run the model, the output shows only two of the estimates (neutral and positive). How can I see the estimate for the negative class? I think it has something to do with the dummy variables. But still, shouldn't there at least be an "Intercept" line for the negative class?

library(plm)

mymodel <- plm(RET ~ CLASS, data=df,
              index = c("DATE", "COMP"), 
              model="within", 
              effect="time")

summary(mymodel)

# Oneway (time) effect Within Model

# Call:
# plm(formula = RET ~ CLASS, data = df, effect = "time", model = "within", 
#     index = c("DATE", "COMP"))

# Balanced Panel: n=7, T=2, N=14

# Residuals :
#    Min. 1st Qu.  Median 3rd Qu.    Max. 
# -2.1500 -0.4620 -0.0791  0.7540  1.9300 

# Coefficients :
#               Estimate Std. Error t-value Pr(>|t|)
# CLASSneutral   0.35818    0.81581  0.4390    0.670
# CLASSpositive -0.56418    0.81581 -0.6916    0.505

# Total Sum of Squares:    16.79
# Residual Sum of Squares: 14.694
# R-Squared      :  0.12486 
#       Adj. R-Squared :  0.089183 
# F-statistic: 0.713347 on 2 and 10 DF, p-value: 0.5133

Thank You!

1 Answer

As with most models with categorical covariates, the first level is used as the reference level. Here "negative" is the reference category because, by default, R sorts the levels of a factor alphabetically. With a categorical covariate you can't separate the group-specific mean from the mean for the reference category; the two are combined into the intercept term. So the coefficient for CLASSneutral isn't the effect of the neutral class; it's the difference between the effects of neutral and negative. The same goes for CLASSpositive: it's the difference between the effects of positive and negative.

Because a within model gives each group its own intercept (here the groups are defined by effect="time"; with the default individual effects, each individual would get one), there is no single overall intercept, and I'm assuming that's why none is printed in the summary.
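If you want a line for "negative" in the output, one option is to pick a different reference level with relevel(); fixef() from plm then shows the group-specific intercepts that the within transformation absorbs. The following is only a sketch under the question's setup: the names m_default and m_neutral are made up for illustration, and the index argument is copied from the question unchanged.

```r
library(plm)

DATE  <- c("1","2","3","4","5","6","7","1","2","3","4","5","6","7")
COMP  <- c("A","A","A","A","A","A","A","B","B","B","B","B","B","B")
RET   <- c(-2.0, 1.1, 3, 1.4, -0.2, 0.6, 0.1, -0.21, -1.2, 0.9, 0.3, -0.1, 0.3, -0.12)
CLASS <- c("positive","negative","neutral","positive","positive","negative","neutral",
           "positive","negative","negative","positive","neutral","neutral","neutral")
df <- data.frame(DATE, COMP, RET, CLASS = factor(CLASS))

# Default coding: levels are sorted alphabetically, so "negative" is the reference
m_default <- plm(RET ~ CLASS, data = df, index = c("DATE", "COMP"),
                 model = "within", effect = "time")

# Same model, but with "neutral" as the reference level instead
df$CLASS <- relevel(df$CLASS, ref = "neutral")
m_neutral <- plm(RET ~ CLASS, data = df, index = c("DATE", "COMP"),
                 model = "within", effect = "time")

coef(m_default)   # CLASSneutral, CLASSpositive (each relative to "negative")
coef(m_neutral)   # CLASSnegative, CLASSpositive (each relative to "neutral")

# Group-specific intercepts absorbed by the within transformation
fixef(m_neutral)
```

The fit is the same either way; only the baseline changes, so CLASSnegative in the second model is just the sign-flipped CLASSneutral from the first.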

This is not unique to plm. The same thing would happen with a standard lm.
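To see the same behaviour with lm, here is a small base-R sketch using the question's RET and CLASS columns (the object names fit and class_means are illustrative only). With a single factor as the covariate, the intercept is the mean of the reference level and the other coefficients are differences from it.

```r
RET   <- c(-2.0, 1.1, 3, 1.4, -0.2, 0.6, 0.1, -0.21, -1.2, 0.9, 0.3, -0.1, 0.3, -0.12)
CLASS <- c("positive","negative","neutral","positive","positive","negative","neutral",
           "positive","negative","negative","positive","neutral","neutral","neutral")
df <- data.frame(RET, CLASS = factor(CLASS))

levels(df$CLASS)   # "negative" "neutral" "positive": the first level is the reference

fit <- lm(RET ~ CLASS, data = df)
coef(fit)          # (Intercept), CLASSneutral, CLASSpositive

# Per-class means of RET, for comparison with the coefficients
class_means <- tapply(df$RET, df$CLASS, mean)

# The intercept is the mean of the reference class ("negative") ...
all.equal(unname(coef(fit)["(Intercept)"]), unname(class_means["negative"]))

# ... and CLASSneutral is the difference between the neutral and negative means
all.equal(unname(coef(fit)["CLASSneutral"]),
          unname(class_means["neutral"] - class_means["negative"]))
```

In lm the reference class shows up as "(Intercept)"; in the within model that role is taken over by the group-specific intercepts, so no such line appears.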