Bayesian Imputation Missing Data JAGS
Hi all,
I was wondering whether it is possible to conduct Bayesian imputation for missing data in JASP using the JAGS module. I was exploring this possibility using the synthetic data and model down below which seems to work when using the rjags package in R, but not when using the JAGS module in JASP. Since I am new to Bayesian imputation and new to JAGS, as I have previously used RStan, I'm unsure as to whether I'm making a mistake or not.
I'd really appreciate your help!
Thanks in advance
Jonathan
R code for synthetic data:
n = 100
x = rnorm(n, 0, 1)
y = rnorm(n, 1*x, 1) # simulate y as a function of x
d = data.frame(x=x, y=y)
d[sample(1:nrow(d), size = 10, replace = F), "x" ] = NA # drop 10 x values at random to create cases of missing data
d = d %>% arrange(x) # Arrange so that d[91:100, "x"] contains missing values
JAGS model code:
model {
# Prior on intercept and slope
intercept ~ dnorm(0, 1)
slope ~ dnorm(0, 1)
# Prior on sd of y
sigma ~ dunif(0, 5)
# Priors on mean + sd of x
mux ~ dnorm(0, 1)
sdx ~ dunif(0, 5)
# x[1:90] and y[1:90] are known
# Use known values to estimate mux, sdx, intercept, slope, sigma
for (i in 1:90) {
x[i] ~ dnorm(mux, sdx)
mu[i] = intercept + slope * x[i]
y[i] ~ dnorm(mu[i], sigma)
}
# x[91:100] are unknown (NA) and y[91:100] are known
Estimate unkown x
for (i in 91:100) {
x[i] ~ dnorm(mux, sdx)
mu[i] = intercept + slope * x[i]
y[i] ~ dnorm(mu[i], sigma)
}
}
Comments
why do you say it works in rjags but not in jasp? would it be possible to give the R code to run rjags as well so I can compare? I can see it doesn't work in JASP
Hey patc3, thanks for the reply! This is what I've been working with. (I adjusted the model code ever so slightly so that I can adjust the number of simulated data points)
Well, you're right that it runs in R but not in JASP, not sure why yet... this is the error I get (missing data go from x[991] to x[1000])
That's the same error I was getting. I'm not sure how exactly rjags handles the NA values under the hood, but maybe it's different from how the JAGS module in JASP does it? I was wondering whether JAGS inserts parameters for the NAs in the x-vector, (this is what I do manually in stan,) so I went ahead and replaced the "x" in the missing-data-loop with "z" to ensure that it is treated as a parameter. This seems to have done the trick, I think, as I'm getting estimates for z that are basically equal to the estimates I get when running the previous code in R. Do you know if this is the same thing conceptually?
# x[991:1000] are unknown (NA) and y[991:1000] are known
# Estimate unkown x
for (i in 991:1000) {
z[i-990] ~ dnorm(mux, sdx)
mu[i] = intercept + slope * z[i-990]
y[i] ~ dnorm(mu[i], sigma)
}
Hi Jonathan, you're very resourceful, yes I think conceptually this should be the same. Nicely done. Also it's a nice guess what you said about NAs, I don't know why it fails in JASP but it's a good guess.
By the way if you wanna check that the two models are approximately the same you should try with a LOT more missing data, e.g. nmiss=700 (or something extreme like that). You stand more chance of spotting a difference between the two models (if there is one) if there is more than 1% missing data (10/1000).
hey it just occurred to me, you might not need the last for loop at all if you run in R, for example just this model string
This still doesn't run in JASP and I think it's because where you can say variable.names="x" in coda.samples(), JASP doesn't see x as a parameter and doesn't let you monitor it
I'll forward this to our expert. I think it is a known bug!
EJ