
There are some linked questions, but I really cannot make any sense of them. I am new to statistics, R, the mlogit package and also to Stack Overflow. I will try to ask my question as precisely as possible. Here is [a link to the data](https://docs.google.com/spreadsheets/d/1IvN6ZgCgDERu3Mn4AglZMjicoXnFQQHc9GhAhbrpFRI/edit?usp=sharing). I have a data set from a discrete choice experiment with a dependent variable "choice" with two levels (yes/no) and 4 independent variables with 3 levels each.

I am trying to estimate the model with mlogit, but I have run into real problems and my supervisor is not able to help. In my data set the values of each variable are 1, 2 or 3 (1 for brand 1, 2 for brand 2, etc.).

    library(readr)
    library(mlogit)
    t1 <- read_csv("~/Dokumente/UvA/Thesis/R/t1.csv")
    t1 <- mlogit.data(data = t1, choice = "choice", shape = "long", alt.levels = paste("pos", 1:4), id.var = "id")

To run the estimation I use the following call:

    m1 <- mlogit(choice ~ 0 + Brand + Features + Valence + Volume, data = t1)
    summary(m1)

I got this outcome (model 1 estimates) and noticed that RStudio had read the variables in my data set as integers. Since the variables are 3 different brands, 3 different features, and 3 different categories of valence and volume (low, medium and high), I would like to obtain estimates for the individual levels. I therefore tried to import them into RStudio as characters using this call:

    library(readr)
    t1 <- read_csv("~/Dokumente/UvA/Thesis/R/t1.csv",
                   col_types = cols(Brand = col_character(),
                                    Features = col_character(),
                                    Valence = col_character(),
                                    Volume = col_character()))

If I run the same mlogit function now, I get an error:

    Error in solve.default(H, g[!fixed]) : system is computationally singular: reciprocal condition number = 3.11303e-18

When I use characters for the different levels (e.g. brand names instead of 1, 2, 3; see data sheet 2, "t2"), I get the same singularity problem. a) Does the outcome make any sense if I use the numbers in the first data set? b) How can I include my values as characters to estimate the attribute levels?

I hope someone can help me, because I am really confused and new to all of this. I am most certainly making a very basic or stupid mistake.

Cheers

1 Answer


There are several issues. The first is that one choice value is labeled "10", although you say the variable should have only two levels.

    library(readxl)
    library(dplyr)

    t1 <- read_excel("~/Downloads/Data mlogit.xlsx", sheet = 1) %>% as.data.frame
    t1$choice %>% table

       0    1   10 
    2770  925    1 
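
Before recoding anything, it may be worth looking at the offending row itself. The following is only a sketch on a small made-up data frame (`toy`, `allowed` and `bad_rows` are illustrative names, not part of the real data); on the real data the same check would be applied to `t1$choice`.

```r
# Hypothetical toy data standing in for t1; the values are made up
toy <- data.frame(id = 1:6, choice = c(0, 1, 0, 10, 1, 0))

# Values that are valid for a binary yes/no outcome
allowed <- c(0, 1)

# Indices of rows whose choice value falls outside the allowed set
bad_rows <- which(!toy$choice %in% allowed)

# Show the suspicious rows so they can be checked against the raw sheet
print(toy[bad_rows, ])
```

Seeing the full row makes it easier to judge whether the value is a typo for 1 or for 0 before overwriting it.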

Assuming that it's just mislabeled, you should also not be running a multinomial logit, which only applies if you have more than two levels. Instead, you should be running a standard logistic regression or similar. Example:

    # Correct mislabeled sample
    t1$choice[t1$choice == 10] <- 1

    # Make everything factors
    for (i in 1:ncol(t1)) {
      t1[[i]] <- factor(t1[[i]])
    }

    # Run logistic
    library(glmnet)

    y <- t1$choice
    t1d <- dplyr::select(t1, Brand, Features, Valence, Volume)
    t1d <- model.matrix(~ . - 1, t1d)
    fit <- glmnet(t1d, y, family = "binomial", intercept = FALSE, lambda = 0, alpha = 0)
    coefficients(fit)

    (Intercept)  .        
    Brand0      -2.0328103
    Brand1      -0.4518273
    Brand2      -1.4383109
    Brand3      -1.4903840
    Features1   -0.5857877
    Features2    0.2900501
    Features3    0.2717443
    Valence1     1.4788752
    Valence2    -0.1585652
    Valence3    -1.9390001
    Volume1     -0.6920187
    Volume2     -0.1013821
    Volume3      0.7010679

There are lots of ways to run logistic regression in R; I tend to use the glmnet package.
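
One of those other routes is base R's `glm()` with `family = binomial`. The sketch below is not run on the real data: it simulates a small stand-in data set (`toy`, with made-up effect sizes) just to show the shape of the call and how factor levels appear in the output. With R's default treatment contrasts, the first level of each factor is the reference category, so each coefficient is a log-odds difference relative to that level.

```r
# Simulated stand-in for the survey data; all values are made up
set.seed(1)
n <- 500
toy <- data.frame(
  Brand   = factor(sample(1:3, n, replace = TRUE)),
  Valence = factor(sample(1:3, n, replace = TRUE))
)

# Simulate a binary choice whose probability depends on Valence only
p <- plogis(-1 + 1.0 * (toy$Valence == "1") - 1.0 * (toy$Valence == "3"))
toy$choice <- rbinom(n, size = 1, prob = p)

# Logistic regression; level 1 of each factor is the reference category
fit_glm <- glm(choice ~ Brand + Valence, data = toy, family = binomial)

# One intercept plus one coefficient per non-reference level
print(coef(fit_glm))
```

On the real data the formula would presumably be `choice ~ Brand + Features + Valence + Volume`, after the attribute columns have been converted to factors.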