It's too bad this was migrated from stats.stackexchange.com, because you probably would have gotten a better answer there.
The mlogit package expects data on individuals, and can accept either "wide" or "long" data. In the wide format there is one row per individual indicating the mode chosen, with a separate column for every combination of mode and mode-specific variable (time and price in your example). In the long format there are n rows for every individual, where n is the number of modes, a column containing TRUE or FALSE indicating which mode was chosen for each individual, and one additional column for each mode-specific variable. Internally, mlogit uses long format datasets, but you can provide wide format and have mlogit transform it for you. In this case, with just two variables, that might be the better option.
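To make the two layouts concrete, here is a minimal sketch that uses base R's reshape() rather than mlogit's own conversion. The two individuals and the column name "alt" are made up for illustration; only the price and time values come from the question.

```r
# Toy wide-format data: one row per individual (hypothetical individuals)
wide <- data.frame(
  id = 1:2,
  mode = c("car", "bus"),          # the mode each individual chose
  price.car = 120, time.car = 5,
  price.bus = 60,  time.bus = 10
)

# Convert to long format: one row per individual per alternative.
# Column names of the form <variable>.<alternative> are split on "."
long <- reshape(
  wide,
  direction = "long",
  varying = c("price.car", "time.car", "price.bus", "time.bus"),
  sep = ".",
  timevar = "alt",                 # hypothetical name for the alternative column
  idvar = "id"
)

# TRUE/FALSE column marking the chosen alternative
long$chosen <- long$mode == long$alt
long <- long[order(long$id, long$alt), c("id", "alt", "chosen", "price", "time")]
print(long)
```

Each individual now occupies two rows (one per mode), with a single price column and a single time column, which is the shape described above for the long format.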
Since mlogit expects individuals, and you have counts of individuals, one way to deal with this is to expand your data to have the appropriate number of rows for each mode, filling out the resulting data.frame with the variable combinations. The code below does that:
# aggregate counts per month and mode, and the mode-specific variables
df.agg <- data.frame(month=1:4,car=c(3465,3674,3543,4334),bus=c(1543,2561,2432,1266),bicycle=c(453,234,123,524))
df.lvl <- data.frame(mode=c("car","bus","bicycle"), price=c(120,60,0), time=c(5,10,30))
# expand one month's counts into one row per individual
get.mnth <- function(mnth) data.frame(mode=rep(names(df.agg[2:4]),df.agg[mnth,2:4]),month=mnth)
df <- do.call(rbind,lapply(df.agg$month,get.mnth))
# build column names price.car, time.car, price.bus, ... and attach their values
cols <- unlist(lapply(df.lvl$mode,function(x)paste(names(df.lvl)[2:3],x,sep=".")))
cols <- with(df.lvl,setNames(as.vector(apply(df.lvl[2:3],1,c)),cols))
# append the (constant) mode-specific columns to every row
df <- data.frame(df, as.list(cols))
head(df)
#   mode month price.car time.car price.bus time.bus price.bicycle time.bicycle
# 1  car     1       120        5        60       10             0           30
# 2  car     1       120        5        60       10             0           30
# 3  car     1       120        5        60       10             0           30
# 4  car     1       120        5        60       10             0           30
# 5  car     1       120        5        60       10             0           30
# 6  car     1       120        5        60       10             0           30
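A quick way to convince yourself that this kind of expansion preserves the counts is to cross-tabulate the expanded rows. The sketch below uses a small, made-up aggregate table (the names agg, expanded and check are hypothetical) so it runs on its own.

```r
# Hypothetical two-month aggregate table (small counts, for illustration only)
agg <- data.frame(month = 1:2, car = c(3, 2), bus = c(1, 2), bicycle = c(0, 1))
modes <- c("car", "bus", "bicycle")

# One row per individual: repeat each mode name by its count, month by month
expanded <- do.call(rbind, lapply(seq_len(nrow(agg)), function(i) {
  counts <- unlist(agg[i, modes])          # named numeric vector of counts
  data.frame(mode = rep(modes, times = counts), month = agg$month[i])
}))

# Cross-tabulating the expanded rows should recover the original counts
check <- table(expanded$month, factor(expanded$mode, levels = modes))
print(check)
```

The same check, table(df$month, df$mode), can be run on the full expanded data.frame.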
Now we can use mlogit(...):
library(mlogit)
fit <- mlogit(mode ~ price+time|0 , df, shape = "wide", varying = 3:8)
summary(fit)
#...
# Frequencies of alternatives:
#  bicycle      bus      car
# 0.055234 0.323037 0.621729
#
# Coefficients :
#         Estimate Std. Error t-value  Pr(>|t|)
# price  0.0047375  0.0003936  12.036 < 2.2e-16 ***
# time  -0.0740975  0.0024303 -30.489 < 2.2e-16 ***
# ...
coef(fit)["time"]/coef(fit)["price"]
# time
# -15.64069
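So this suggests that reducing travel time by 1 (minute?) is worth about 15 (dollars)? Note that the estimated price coefficient is positive here, which is why the ratio comes out negative, so I would treat this number with caution.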
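The reasoning behind taking this ratio, as I understand it: with a utility that is linear in price and time, the ratio of the two coefficients is the rate at which one can be traded for the other while keeping utility constant. As a sanity check, the arithmetic can be redone from the rounded estimates printed by summary(fit); the variable names below are just for illustration.

```r
# Coefficients copied from the summary(fit) output (rounded as printed)
b_price <- 0.0047375
b_time  <- -0.0740975

# Utility change from one extra unit of time, expressed in price units
ratio <- b_time / b_price
print(round(ratio, 2))   # about -15.64, in line with coef(fit)["time"]/coef(fit)["price"]
```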
This analysis ignores the month variable. It's not clear to me how you would incorporate that, as month is neither mode-specific nor individual-specific. You could "pretend" that month is individual-specific, and use a model formula like mode ~ price+time|month, but with your dataset the system is computationally singular.
To reproduce the result from the other answer, you can use mode ~ 1|month with reflevel="car". This ignores the mode-specific variables and just estimates the effect of month (relative to mode = car).
There's a nice tutorial on mlogit here.
dput(mydata). That way, it's easy for us to import and manipulate it. – jlhoward