
There are some linked questions, but I really cannot make sense of them. I am new to statistics, R, the mlogit package and also to Stack Overflow, so I will try to ask my question as precisely as possible. Here is [a link to the data](https://docs.google.com/spreadsheets/d/1IvN6ZgCgDERu3Mn4AglZMjicoXnFQQHc9GhAhbrpFRI/edit?usp=sharing). I have a data set from a discrete choice experiment with a dependent variable "choice" with two levels (yes/no) and 4 independent variables with 3 levels each.

I am trying to estimate the model with mlogit, but I have some real problems and my supervisor is not able to help. In my data set the values for each variable are 1, 2 or 3 (1 for brand 1, 2 for brand 2, etc.). I load and reshape the data like this:

    library(readr)
    library(mlogit)
    t1 <- read_csv("~/Dokumente/UvA/Thesis/R/t1.csv")
    t1 <- mlogit.data(data = t1, choice = "choice", shape = "long",
                      alt.levels = paste("pos", 1:4), id.var = "id")

To run the estimation I use the following call:

    m1 <- mlogit(choice ~ 0 + Brand + Features + Valence + Volume, data = t1)

and got the estimates for model 1. I noticed that RStudio interpreted my attribute variables as integers. Since the variables are 3 different brands, 3 different features and 3 categories each of valence and volume (low, med and high), I would like to get an estimate for each level. I therefore tried to read the data into RStudio again, specifying these columns as characters:

    t1 <- read_csv("~/Dokumente/UvA/Thesis/R/t1.csv",
                   col_types = cols(Brand = col_character(),
                                    Features = col_character(),
                                    Valence = col_character(),
                                    Volume = col_character()))
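Instead of re-reading the file with `col_character()`, the integer codes can also be turned into factors directly in R. A minimal sketch on a hypothetical toy data frame (the values and the level labels are invented for illustration); it only shows how R builds one dummy column per non-reference level, and is not a fix for the singularity error below.

```r
# Hypothetical toy data: attribute levels coded 1, 2, 3 as in the question
df <- data.frame(
  Brand  = c(1, 2, 3, 1, 2, 3),
  Volume = c(1, 1, 2, 3, 2, 3)
)

# Convert integer codes to labeled factors (labels are made up for illustration)
df$Brand  <- factor(df$Brand,  levels = 1:3, labels = c("brand1", "brand2", "brand3"))
df$Volume <- factor(df$Volume, levels = 1:3, labels = c("low", "med", "high"))

# With an intercept, R's default treatment contrasts drop the first level of each
# factor, so every remaining dummy is estimated relative to that reference level
X <- model.matrix(~ Brand + Volume, df)
print(colnames(X))
```

With numeric codes, a model estimates a single slope per attribute, which assumes that moving from level 1 to 2 has the same effect as moving from 2 to 3; factor coding drops that assumption.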

If I run the same mlogit function now, I get an error:

    Error in solve.default(H, g[!fixed]) : system is computationally singular: reciprocal condition number = 3.11303e-18
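"System is computationally singular" means the matrix being inverted during estimation could not be inverted. One common cause (not necessarily the one in this data) is a design matrix whose columns are linearly dependent, for example a full set of dummies alongside a constant column. A minimal base-R sketch of that check on a made-up toy data set: compare the QR rank with the number of columns. The same comparison could be run on the model matrix built from the real attributes.

```r
# Hypothetical toy design: one 3-level factor
df <- data.frame(f = factor(c("a", "b", "c", "a", "b", "c")))

# Full dummy set (one column per level) plus an explicit constant column:
# the three dummies sum to the constant, so the columns are linearly dependent
X <- cbind(const = 1, model.matrix(~ f - 1, df))

# A rank below the number of columns signals perfect collinearity
rank_X <- qr(X)$rank
cat("columns:", ncol(X), "rank:", rank_X, "\n")
if (rank_X < ncol(X)) message("design matrix is rank deficient")
```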

When I use text labels for the levels (e.g. brand names instead of 1, 2, 3; see data sheet 2, "t2"), I get the same singularity error. So: a) Does the outcome make any sense if I use the numbers from the first data set? b) How can I include my values as characters (categories) so that each attribute level gets its own estimate?

I hope someone can help me, because I am really confused and new to all of this. I am most likely making a very basic or silly mistake.



There are several issues. The first is that one choice value is labeled "10", although you say the variable should have only two levels.


    library(readxl)
    library(dplyr)
    t1 <- read_excel("~/Downloads/Data mlogit.xlsx", sheet = 1) %>% as.data.frame
    t1$choice %>% table

       0    1   10
    2770  925    1
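A minimal sketch, on a hypothetical toy vector, of how to list every value outside the expected 0/1 coding before fitting anything. The recode step assumes that the stray 10 is a typo for 1.

```r
# Hypothetical choice vector containing one stray code, like the "10" above
choice <- c(0, 1, 0, 10, 1, 0)

# List every value outside the expected 0/1 coding
bad <- setdiff(unique(choice), c(0, 1))
print(bad)

# Recode, assuming 10 is a typo for 1, then confirm only 0/1 remain
choice[choice == 10] <- 1
stopifnot(all(choice %in% c(0, 1)))
```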

Assuming it is just mislabeled, you should also not be running a multinomial logit, which only applies when the outcome has more than two levels. Instead, run a standard (binary) logistic regression or something similar. Example:

    library(glmnet)

    # Correct the mislabeled sample
    t1$choice[t1$choice == 10] <- 1

    # Make every column a factor
    for (i in seq_along(t1)) {
      t1[[i]] <- factor(t1[[i]])
    }

    # Run a logistic regression (alpha = 0 is ridge; lambda = 0 means no penalty)
    y <- t1$choice
    t1d <- dplyr::select(t1, Brand, Features, Valence, Volume)
    t1d <- model.matrix(~ . - 1, t1d)
    fit <- glmnet(t1d, y, family = "binomial", intercept = FALSE, lambda = 0, alpha = 0)
    coef(fit)

    (Intercept)  .
    Brand0      -2.0328103
    Brand1      -0.4518273
    Brand2      -1.4383109
    Brand3      -1.4903840
    Features1   -0.5857877
    Features2    0.2900501
    Features3    0.2717443
    Valence1     1.4788752
    Valence2    -0.1585652
    Valence3    -1.9390001
    Volume1     -0.6920187
    Volume2     -0.1013821
    Volume3      0.7010679
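The coefficient names above come from `model.matrix()`. A minimal sketch on hypothetical toy data (only two of the four attributes) showing how R names and drops dummy columns when the intercept is removed with `- 1`, and how `contrasts.arg` keeps every level of every factor. The listing above contains `Features1`, `Valence1` and `Volume1`, which matches the second variant rather than the first.

```r
# Hypothetical toy data with two factor attributes
df <- data.frame(
  Brand    = factor(c("0", "1", "2", "3", "1", "2")),
  Features = factor(c("1", "2", "3", "1", "2", "3"))
)

# Without an intercept, only the first factor gets one column per level;
# later factors still drop their first (reference) level
default_cols <- colnames(model.matrix(~ . - 1, df))
print(default_cols)

# Switching off the contrasts keeps one indicator column for every level
full <- model.matrix(~ . - 1, df,
                     contrasts.arg = lapply(df, contrasts, contrasts = FALSE))
print(colnames(full))
```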

There are lots of ways to run logistic regression in R; I tend to use the glmnet package.
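For comparison, a minimal sketch of the same kind of binary model with base R's `glm()` (no extra packages), on simulated data standing in for the real file; the variables and probabilities are invented. Unlike `glmnet` with `lambda = 0`, `summary()` on a `glm` fit reports standard errors and p-values.

```r
set.seed(1)

# Hypothetical simulated data standing in for the real file
n <- 200
df <- data.frame(
  Brand   = factor(sample(1:3, n, replace = TRUE)),
  Valence = factor(sample(1:3, n, replace = TRUE))
)
df$choice <- rbinom(n, 1, ifelse(df$Valence == "1", 0.6, 0.2))

# Binary logistic regression; factors are treatment-coded automatically
fit_glm <- glm(choice ~ Brand + Valence, data = df, family = binomial)
print(summary(fit_glm)$coefficients)
```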