We are trying to estimate a travel mode choice model using the mlogit package. Ultimately we intend to set up a nested model with more variables, but first we want to get a very simple non-nested multinomial model working as a test. What we're trying to accomplish differs from the examples in the mlogit package in that our utility functions are alternative-specific (e.g. bike vs. walk vs. drive).
Our starting dataset (simplified) has this form:
"recid","mode","walk_mode_time","bike_mode_time","carsdivworkers"
254,"Bike",15.0666484832764,4.51999473571777,0.5
7,"SOV",17.9941387176514,5.39824199676514,2
40,"Walk",43,12.8999996185303,1
The utility functions that we want to specify for this test model are as follows:
Utility(SOV) = beta1 * carsdivworkers
Utility(Walk) = Constant(Walk) + beta6 * walk_mode_time + beta7 * carsdivworkers
Utility(Bike) = Constant(Bike) + beta8 * bike_mode_time + beta9 * carsdivworkers
To make our data look more like the examples in the mlogit documentation, we THINK we need to structure our data with:
- Each record (which lists a chosen alternative) replicated to also include the non-chosen alternatives for a given trip.
- Each alternative-specific variable zeroed out on the rows for the other alternatives (e.g. walk_mode_time is non-zero only on the Walk row)
This results in a data structure that looks like:
"recid","mode","choice","walk_mode_time","bike_mode_time","cardivwkr"
7,"Bike",FALSE,0,5.39824199676514,1
7,"DriveTransit",FALSE,0,0,1
7,"HOV2",FALSE,0,0,1
7,"HOV3",FALSE,0,0,1
7,"SOV",TRUE,0,0,1
7,"Walk",FALSE,17.9941387176514,0,1
7,"WalkTransit",FALSE,0,0,1
40,"Bike",FALSE,0,12.8999996185303,0.5
40,"DriveTransit",FALSE,0,0,0.5
40,"HOV2",FALSE,0,0,0.5
40,"HOV3",FALSE,0,0,0.5
40,"SOV",FALSE,0,0,0.5
40,"Walk",TRUE,43,0,0.5
40,"WalkTransit",FALSE,0,0,0.5
254,"Bike",TRUE,0,4.51999473571777,1
254,"DriveTransit",FALSE,0,0,1
254,"HOV2",FALSE,0,0,1
254,"HOV3",FALSE,0,0,1
254,"SOV",FALSE,0,0,1
254,"Walk",FALSE,15.0666484832764,0,1
254,"WalkTransit",FALSE,0,0,1
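For reference, the replication and zeroing steps described above can be sketched in base R roughly as follows. This is a minimal sketch, not our production code: the names `trips`, `alts` and `long` are hypothetical, the input values are the three sample rows from our starting dataset, and the choice set is assumed to be the seven modes shown in the listing.

```r
# Wide input: one row per trip (values copied from the three-row sample).
trips <- data.frame(
  recid = c(254, 7, 40),
  mode = c("Bike", "SOV", "Walk"),
  walk_mode_time = c(15.0666484832764, 17.9941387176514, 43),
  bike_mode_time = c(4.51999473571777, 5.39824199676514, 12.8999996185303),
  carsdivworkers = c(0.5, 2, 1),
  stringsAsFactors = FALSE
)

# Assumed full choice set, taken from the long-format listing.
alts <- c("Bike", "DriveTransit", "HOV2", "HOV3", "SOV", "Walk", "WalkTransit")

# Cartesian product: one row per (trip, alternative) pair.
long <- merge(data.frame(mode = alts, stringsAsFactors = FALSE),
              trips[, names(trips) != "mode"], by = NULL)

# Flag the alternative that was actually chosen for each trip.
chosen <- trips$mode[match(long$recid, trips$recid)]
long$choice <- long$mode == chosen

# Keep each mode-specific time only on the row for that mode.
long$walk_mode_time <- ifelse(long$mode == "Walk", long$walk_mode_time, 0)
long$bike_mode_time <- ifelse(long$mode == "Bike", long$bike_mode_time, 0)

# Order rows by trip, then alternative, and arrange the columns.
long <- long[order(long$recid, long$mode),
             c("recid", "mode", "choice", "walk_mode_time",
               "bike_mode_time", "carsdivworkers")]
rownames(long) <- NULL
print(long)
```

The cross join via `merge(..., by = NULL)` assumes every trip has the same seven alternatives available; if availability varied by trip, rows would need to be filtered afterwards.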
We then turn this into an mlogit data structure as follows:
logit_data <- mlogit.data(data = joined_data,
                          choice = "choice",
                          shape = "long",
                          alt.var = "mode",
                          chid.var = "recid",
                          drop.index = TRUE,
                          reflevel = "SOV")
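Before handing long-format data to mlogit.data(), it may be worth confirming that every choice situation has the same set of alternatives and exactly one chosen row. The helper below is a hypothetical base R sketch (`check_long` is not part of mlogit), shown on a small toy data frame rather than our real `joined_data`.

```r
# Hypothetical helper: basic structural checks on long-format choice data.
check_long <- function(df, chid = "recid", alt = "mode", choice = "choice") {
  # Number of distinct alternatives, rows, and chosen rows per choice situation.
  n_alts   <- tapply(df[[alt]], df[[chid]], function(a) length(unique(a)))
  n_rows   <- tapply(df[[alt]], df[[chid]], length)
  n_chosen <- tapply(df[[choice]], df[[chid]], sum)
  list(
    # Same number of alternatives everywhere, and no duplicated alternatives.
    balanced   = length(unique(n_alts)) == 1 && all(n_alts == n_rows),
    # Exactly one TRUE in the choice column per choice situation.
    one_choice = all(n_chosen == 1)
  )
}

# Toy data: two trips, three alternatives each (illustrative only).
toy <- data.frame(
  recid  = rep(c(7, 40), each = 3),
  mode   = rep(c("Bike", "SOV", "Walk"), times = 2),
  choice = c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)
res <- check_long(toy)
print(res)

# A deliberately broken copy: trip 7 has two chosen alternatives.
bad <- toy
bad$choice[1] <- TRUE
res_bad <- check_long(bad)
print(res_bad)
```

On our real data the call would be `check_long(joined_data)`, assuming the column names match the defaults above.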
And our model specification:
mc <- mlogit(formula = choice ~ 1 | carsdivworkers | walk_mode_time + bike_mode_time,
             data = logit_data, reflevel = "SOV")
Unfortunately, we get the following error when we run this against our full dataset:
Error in solve.default(H, g[!fixed]) : Lapack routine dgesv: system is exactly singular
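One way to probe this error, under the assumption that the singularity comes from the design matrix rather than from the optimizer, is to look for all-zero or redundant columns. As documented for mlogit formulas, the third part estimates a separate coefficient per alternative for each variable listed there, which is roughly like interacting the variable with alternative dummies. The base R sketch below mimics that with model.matrix() on toy data (it does not call mlogit, and `toy`, `X`, `zero_cols` and `rank_X` are hypothetical names); the walk times are the values from our sample, reduced to three alternatives for brevity.

```r
# Toy long-format data: three trips, three alternatives each.
alts <- c("Bike", "SOV", "Walk")
toy <- data.frame(
  recid = rep(c(7, 40, 254), each = 3),
  mode  = factor(rep(alts, times = 3), levels = alts),
  # walk_mode_time is zeroed everywhere except on the Walk rows.
  walk_mode_time = c(0, 0, 17.9941387176514,
                     0, 0, 43,
                     0, 0, 15.0666484832764)
)

# One column per alternative for the variable, approximating
# alternative-specific coefficients on walk_mode_time.
X <- model.matrix(~ 0 + mode:walk_mode_time, data = toy)

# Columns that are identically zero carry no information, so their
# coefficients cannot be identified.
zero_cols <- colnames(X)[colSums(abs(X)) == 0]
rank_X <- qr(X)$rank

print(zero_cols)
cat("columns:", ncol(X), " rank:", rank_X, "\n")
```

If the rank is lower than the number of columns, as in this toy case, a singular system when solving for the coefficients would be expected; whether this is what happens inside mlogit on our full dataset is something we have not confirmed.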
We think that this formula specifies the utility functions we want, but are not sure. Is this correct? Also, do we need to manually replicate our data records as we have done? Or is there a way of having mlogit.data() build a set of choice alternatives from our initial dataset?