
I am new to Markov chains and have a problem that I haven't found a solution to. I am trying to fit a Markov chain to a dataset to estimate the probabilities that people switch from one state to another, and I am wondering how to take individual participants into account when building the model.

Here's an example:

# A data frame with participant numbers (pp) and their state: A, B or C
DF <- data.frame(pp = rep(1:10, each = 3),
                 state = sample(rep(LETTERS[1:3], each = 10)))

> head(DF)
   pp state
1  1     A
2  1     A
3  1     B
4  2     A
5  2     B
6  2     B

I can't just fit a Markov chain to the state column, because that ignores the participant information and counts transitions that run from the last state of one participant to the first state of the next, which doesn't make sense. For example:

library(markovchain)
mcFit <- markovchainFit(data = DF$state)

Do I need to fit a transition matrix to each participant individually and then average across them? And if so, how would I go about this?

Further, in this case, how would you deal with non-identical transition matrices across participants? E.g. some participants might not have any transitions between the states A and C, while others would:

#Example of Participant 1
     A    B
 A 0.25 0.50
 B 0.15 0.55

#Example of Participant 2
     A    C
 A 0.50 0.25
 C 0.25 0.50

Any help with this or recommendations of resources would be greatly appreciated.

I think if you separate each participant's states with NA, then markovchainFit will interpret the vector as a series of short sequences. See the help for details. You can also pass it a list. That would give you global transition probabilities: a null model against which you could test the hypothesis that anyone is different (somehow...) - Spacedman
@Spacedman Thanks! I'm only interested in the global transition probabilities; I just don't want to include transitions that never happened (e.g. the transition from B to A between participants 1 and 2 in my example). So converting my dataframe to a list, with each participant as an individual item in the list, would work? - Bjorn
Some simple tests seem to agree, yes, except that with a list argument I don't get a log-likelihood (of what?), but I do if I use NA. E.g. markovchainFit(c(1,2,1,NA,1,1,1,2)) and markovchainFit(list(c(1,2,1),c(1,1,1,2))) give the same outputs, but there is no loglik in the second case... - Spacedman
For some reason the code here github.com/spedygiorgio/markovchain/blob/… doesn't add the log-likelihood unless the data is a simple vector. Anyway, have we cracked this now? - Spacedman
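
The two input forms discussed in these comments (an NA-separated vector and a list with one element per participant) can be built from a participant/state data frame roughly as follows. This is a minimal base-R sketch: the toy data frame and the names toy, seqs and with_na are made up for illustration, and the markovchainFit calls are left commented out because they assume the markovchain package is installed and treats NA as a break between sequences, as the comments report.

```r
# Hypothetical toy data: 3 participants with 3 observations each.
toy <- data.frame(
  pp    = rep(1:3, each = 3),
  state = c("A", "A", "B",   # participant 1
            "B", "C", "C",   # participant 2
            "A", "C", "A"),  # participant 3
  stringsAsFactors = FALSE
)

# Form 1: a list with one state sequence per participant.
seqs <- split(toy$state, toy$pp)

# Form 2: a single vector with an NA between consecutive participants.
# Append NA to every sequence, flatten, then drop the trailing NA.
with_na <- unlist(lapply(seqs, function(s) c(s, NA)), use.names = FALSE)
with_na <- head(with_na, -1)
print(with_na)

# Assumption: with the markovchain package installed, either form could
# then be passed to markovchainFit, as described in the comments:
# library(markovchain)
# markovchainFit(with_na)$estimate
# markovchainFit(seqs)$estimate
```

Per the comments, the NA-separated vector is the form to use if the log-likelihood is also needed.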

1 Answer


If you run markovchainFit on the whole state vector, the transitions across participant boundaries are counted as well, and you get this estimate:

> markovchainFit(DF$state)$estimate
MLE Fit 
 A  3 - dimensional discrete Markov Chain defined by the following states: 
 A, B, C 
 The transition matrix  (by rows)  is defined as follows: 
          A         B         C
A 0.2000000 0.3000000 0.5000000
B 0.6000000 0.1000000 0.3000000
C 0.2222222 0.5555556 0.2222222

but you can use split to break the state vector up into a list of vectors based on the pp column, then pass that to markovchainFit:

> markovchainFit(split(DF$state,DF$pp))$estimate
MLE Fit 
 A  3 - dimensional discrete Markov Chain defined by the following states: 
 A, B, C 
 The transition matrix  (by rows)  is defined as follows: 
          A         B         C
A 0.4000000 0.2000000 0.4000000
B 0.7500000 0.0000000 0.2500000
C 0.1428571 0.5714286 0.2857143
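
To see why the two estimates differ: with 10 participants of 3 observations each there are only 20 within-participant transitions, whereas the pooled vector of 30 states contributes 29. The maximum-likelihood estimate of a transition matrix is the row-normalised table of transition counts, so the split-based fit should correspond to summing the counts within each participant and normalising afterwards. Below is a minimal base-R sketch of that calculation. It is not the markovchain implementation; the toy data and the helper count_transitions are hypothetical. Giving every participant the same factor levels also addresses the question about non-identical matrices: a participant who never visits a state simply contributes zeros for it.

```r
# Hypothetical toy data: 3 participants with 3 observations each.
toy <- data.frame(
  pp    = rep(1:3, each = 3),
  state = c("A", "A", "B",   # participant 1
            "B", "C", "C",   # participant 2
            "A", "C", "A"),  # participant 3
  stringsAsFactors = FALSE
)
states <- sort(unique(toy$state))

# Count from -> to transitions within one participant's sequence.
# Factors with the full set of levels keep every table 3 x 3,
# even when a participant never visits one of the states.
count_transitions <- function(s, levels) {
  from <- factor(head(s, -1), levels = levels)
  to   <- factor(tail(s, -1), levels = levels)
  table(from, to)
}

# Sum the per-participant count tables, then normalise each row.
per_pp <- lapply(split(toy$state, toy$pp), count_transitions, levels = states)
counts <- Reduce(`+`, per_pp)
trans  <- prop.table(counts, margin = 1)

print(counts)
print(round(trans, 3))
```

Note that a state with no outgoing transitions anywhere in the data would give a row of zero counts, and prop.table would return NaN for that row, so such states need separate handling.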