JAGS: variable number of clusters

Question

I am trying to run a Bayesian clustering model where the number of clusters is random with binomial distribution. This is my Jags model:

model{
    for(i in 1:n){
        y[ i ,1:M] ~ dmnorm( mu[z[i] , 1:M] , I[1:M, 1:M])      
        z[i] ~ dcat(omega[1:M])
    }
    for(j in 1:M){
        mu[j,1:M] ~ dmnorm( mu_input[j,1:M] , I[1:M, 1:M] )
    }
    M ~ dbin(p, Mmax)       
    omega ~ ddirich(rep(1,Mmax))
}

to run it, we need to define the parameters anche the initial values for the variables, which is done in this R script

Mmax=10

y = matrix(0,100,Mmax)
I = diag(Mmax)
y[1:50,] = mvrnorm(50, rep(0,Mmax), I)
y[51:100,] = mvrnorm(50, rep(5,Mmax), I)

plot(y[,1:2])

z = 1*((1:100)>50) + 1

n = dim(y)[1]

M=2
mu=matrix(rnorm(Mmax^2),nrow=Mmax)
mu_input=matrix(2.5,Mmax,Mmax) ### prior mean
p=0.5

omega=rep(1,Mmax)/Mmax

data = list(y = y, I = I, n = n, mu_input=mu_input, Mmax = Mmax, p = p)

inits = function() {list(mu=mu,
                         M=M,
                         omega = omega) }

require(rjags)
modelRegress=jags.model("cluster_variabile.txt",data=data,inits=inits,n.adapt=1000,n.chains=1)

however, running the last command, one gets

Error in jags.model("cluster_variabile.txt", data = data, inits = inits,
:   RUNTIME ERROR: Compilation error on line 6. 
Unknown variable M Either supply values 
for this variable with the data or define it  on the left hand side of a relation.

which for me makes no sense, since the error is at line 6 even if M already appears at line 4 of the model! What is the actual problem in running this script?

Robert Thomas Robert Thomas · Accepted Answer · 2020-12-30T19:53:17

So JAGS is not like R or other programming procedural languages in that it doesn't actually run line by line, it is a declarative language meaning the order of commands doesn't actually matter at least in terms of how the errors pop up. So just because it didn't throw an error on line 4 doesn't mean something isn't also wrong there. Im not positive, but I believe the error is occuring because JAGS tries to build the array first before inputting values, so M is not actually defined at this stage, but nothing you can do about that on your end.

With that aside, there should be a fairly easy work around for this, it is just less efficient. Instead of looping from 1:M make the loop iterate from 1:MMax that way the dimensions don't actually change, it is always an MMax x MMax. Then line 7 just assigns 1:M of those positions to a value. The downside of this is that it will require you to do some processing after the model is fit. So on each iteration, you will need to pull the sampled M and filter the matrix mu to be M x M, but that shouldn't be too tough. Let me know if you need more help.

JAGS: variable number of clusters

2 Answers