Introduction to Monte Carlo simulation using JAGS
This practical gives an introduction to the Monte Carlo method using JAGS (Just Another Gibbs Sampler). First we have to install some R packages that act as frontenda for JAGS:
R2jags
allows fitting JAGS models from within R (depend onrjags
);coda
provides functions for output analysis and diagnostics for MCMC;lattice
provides functions for visualisation of an MCMC object;MCHCvis
in an R packages used to visualise, manipulate, and summarise MCMC output.
```{r eval=FALSE}
install.packages('R2jags', dependencies = TRUE) # install coda too
install,packages('coda', dependencies = TRUE)
install.packages('lattice', dependencies = TRUE)
install.packages('MCMvis', dependencies = TRUE)
```{r message=FALSE, warning=FALSE}
library(R2jags)
library(MCMCvis)
library(coda)
library(lattice)
Introduce
The main steps of fitting a Baysian model using R2jags
are the following:
- Define the JAGS model (as a function);
- Prepare the data , which should be passed on as a list. Note that this list should include other known variables not in the data, like the length N of the data vector, when this N is used in the JAGS model definition (e.g. in a for loop);
- Identify the parameters whose posterior distribution are of interest.
- Define starting values for JAGS. We need starting values for all the stochastic nodes (unknoen parameters, missing data) in each chain. Initial values should be passed on as a list. Note that JAGS runs without user-defined starting values, in which cases it generates its own initial values. But this is not recommended, especially for complex models;
- Fit the model in JAGS.
The most important arguments of the #jags# function includes:*data* (data list), inits (list of initial values), *parameters.to.save* (parameters of interest), *n.chains* (number of MC chains - each gives a sequence of simulated values), *n.iter* (number of iterations in the sampling process), *n.burnin* (number of initial samples to discard), *model.file* (the JAGS model definition);
- Convergence diagnostics. Update if convergence hasn't reached;
- Once we are satisfied that the chains have converged, we can start the statistical inference. E.g. we can plot the posterior density, get point estimates and interval estimates by extracting the posterior mean and credible intervals.
To demonstrate the structure of a Baysian model fit using R2jags we will consider some examples with increasing complexity.
For reproducibility we should always set the seed, e.g. set.seed(45621)
Coin example
Suppose we want to known the probability of getting 8 or more heads when we toss a fair coin 10 times.
When we toss a fair coin 10 times, the number of heads Y follows a binomial distribution with parameters n=10
and p=0.5
. We are interested in $P(Y>=8)$.
First we represent this model using the BUGS language. Recall that ~
is used to represent stochastic dependent, and <-
is used to represent logical dependence.
```{r include=TRUE}
model definition
jags.model.coin<- function(){
Y ~ dbin(0.5,10) # our data model
P8 <- ifelse(Y>7, 1, 0) # the probability of interest
}
For each iteration a sample from $Binom(10,0.5)$ is drawn, and it's checked whether this sample is strictly greater than 7 or not (since the distribution is discrete, this is equivalent to $>=8$). For example, a run of length five could give us
$$Y = (5,5,8,4,9)$$
$$P8 = (0,0,1,0,1)$$
In this simple example we have no data, and since this is a Monte Carlo sampling froma known distribution, there is no real need to provide initial values (although JAGS will still generate itsself). For the same reason we don't have to worry about convergence either (see later).
We will monitor both parameters (Y and P8). This can be specified separately or directly in the jags function. We will generate 100 samples without discarding any, using one chain.
```{r message=FALSE, include=FALSE, results='hide'}
jags.model.fit.coin<- jags(
data = list(),
model.file = jags.model.coin,
parameters.to.save = c('Y','P8'),
n.chains = 1,
DIC = FALSE,
n.burnin = 0,
n.iter = 100
)
Note that the only reason we need DIC=FALSE
in the above is that our model here doesn't contain a likelihood.
To get the numerical summary of the above model run we use the print function.
```{r message=TRUE, warning=FALSE}
print(jags.model.fit.coin)
This gives us the mean, standard deviation and some quantiles of the generated samples (along with some other information) for all the monitored parateters. *What is your point estimate for* $P(Y>=8)$?
We can also plot the simulated samples using the `traceplot` function.
```{r}
traceplot(jags.model.fit.coin)
For more diagnostics and visualisation we can convert the jags
function into an MCMC object. Using the MCMC
object we can look
at numerical summarises,traceplot and desity plots.
# convert into MCMC object
jags.fit.mcmc.coin<- as.mcmc(jags.model.fit.coin)
# get numerical summary
summary(jags.fit.mcmc.coin)
# get traceplots
xyplot(jags.fit.mcmc.coin)
# get density estimate
densityplot(jags.fit.mcmc.coin)
The function xyplot
and densityplot
of the lattice
package give us trace plots and density plots of all the parameters. If we want to concentrate on individual parameters, we can use the MCMCtrace
function of the MCMCvis packages
MCMCtrace(jags.fit.mcmc.coin,
params = 'Y', # parameters of interest
type = 'density', # density plot
ind = TRUE, # separate desnity lines for each chains
pdf = FALSE # plots are NOT exported into a pdf
)
MCMCtrace(jags.fit.mcmc.coin,
params = 'P8',
type = 'trace',
ind = TRUE,
pdf = FALSE
)
Note that the above outputs can also be generated manually using the simulated samples. These can be extracted from the jags object, e.g. jags.model.fit.coin$BUGSoutput$sims.list$Y
gives us the 100 binomial samples generated by the model fit.
We can also extract summaries of interest. For example, we can get the point estimate, and the two endpoints of a 95% credible interval using the following.
jags.model.fit.coin$BUGSoutput$summary[,1] # mean
jags.model.fit.coin$BUGSoutput$summary[,3] # 2.5 quantile
jags.model.fit.coin$BUGSoutput$summary[,7] # 97.5 quantile
Exercise
Increase the number of iterations when fitting the model. What effect will this have on your point estimate and why?
Produce output using 2 or 3 chains with varying number of iterations. How similar are the outputs produced by the different chains? Note that using multiple chains is crucial when assessing chain convergence.
By editing the model function, find the probability that a clinical trial with 30 subjects, each with peobability 0.7 of response, will show 15 or fewer response.
By writing your own model find the expectation of the cube of a random variable with mean 1 and standard deviation 2. Note that if only one node is monitored you might have to use
print(jags.model.fit.coin[drop=F])
to print the summary.
```{r message=TRUE, include=FALSE}
Solutions
1. Increase the number of iterations when fitting the model. What effect will this have on your point estimate and why?
We can improve the accuracy of the point estimate by taking a large sample size, say n.iter=10000.
2. Produce output using 2 or 3 chains with varying number of iterations. How similar are the outputs produced by the different chains? Note that using multiple chains is crucial when assessing chain convergence.
For small sample size the estiamtes produced by the two chains don't match up as well as it does for a large sample size.
3. By editing the model function, find the probability that a clinical trial with 30 subjects, each with peobability 0.7 of response, will show 15 or fewer response.
jags.model.coin2 <- function(){
Y ~ dbin(0.7, 30) # our data model
P15 <- ifelse(Y<=15, 1, 0) # the probability of interest
}
jags.model.fit.coin2<- jags(
data = list(),
model.file = jags.model.coin2,
parameters.to.save = c('Y','P15'),
n.chains = 1,
DIC = FALSE,
n.burnin = 0,
n.iter = 10000
)
print(jags.model.fit.coin2[drop=F]) # look at the mean of P15
4. By writing your own model find the expectation of the cube of a random variable with mean 1 and standard deviation 0.25. Note that if only one node is monitored you might have to use print(jags.model.fit.coin[drop=F])
to print the summary.
jags.model.normal1<- function(){
y~ dnorm(1, 0.25) # our data model
x<- y^3
}
jags.model.fit.normal1<- jags(
data = list(),
model.file = jags.model.normal1,
parameters.to.save = c('y','x'),
n.chains = 1,
DIC = FALSE,
n.burnin = 0,
n.iter = 10000
)
print(jags.model.fit.normal1[drop=F]) # look at mean of x
# Drug example without data
Consider a drug to be given for relief chronic pain. We want to know the probability that 15 or more patients will ecperience a positive response.
Even though the annual response rate of the drug is unknown, experience with similar compounds has suggested that a response rate between 0.2 and 0.6 could be feasible.
Note that in the Baysian world the responese rate, as an unknown parameter, is treated as a random variable, and needs a prior when we are defining the model.
Since we know that the response rate is feasible between 0.2 and 0.6, we can interpret this as a distribution with mean 0.4 and standard deviation 0.1, and thus suggest a Beta(9.2,13.8) distribution as prior for the annual response rate (it also has to be tetween 0 and 1).
The model definition has to include the data model, the priors for the unknown parameters, and any other quantities of interest. Thus we have the following.
```{r}
### Drug example in JAGS
# model
jags.model.drug<- function(){
theta ~ dbeta(9.2,13.8) # prior for the unknown parameter
y ~ dbin(theta, 20) # the data model
P.crit <- ifelse(y>=15, 1, 0)
}
Even though we don't have any available data, monitoring the parameter y and P.crit will allow us to plot the predictive distribution of y and get a point estimate for the predictive probability that 15 or more patients will ecperience a positive response.
To build the distribution we set the number of iterations to be 10000, which will provide a fairly large sample size.
```{r include=FALSE}
jags.model.fit.drug <- jags(
data = list(),
n.iter = 10000,
DIC = FALSE,
parameters.to.save = c('theta','y','P.crit'),
model.file = jags.model.drug,
n.chains = 1
)
The node *theta* will just gives us samples from the known Beta(9.3,13.8) distribution, y will give us samples from the predictive distribution, while the 0-1 indicator P.crit provides the estimated tail-area probability.
Once the models is fitted we can look at the outcome. The *print* function gives us some numerical summaries, including points estimates, while density plots can be used to visualise the distribution of the monitored parameters. What is the predictive probability of haveing 15 or more responese?
```{r}
# look at the outcome
print(jags.model.fit.drug)
# traceplot
traceplot(jags.model.fit.drug)
# convert into MCMC object for more visualisation tools
jags.fit.mcmc.drug <- as.mcmc(jags.model.fit.drug)
# get density plots of the monitored nodes
densityplot(jags.fit.mcmc.drug)
Since the predictive distribution is a discrete distribution in this case, we might also want to look at histograms. The jags.hist
function of the jagplot
package can be used to create histograms of the R2jags output.
# install the jagsplot packages
remotes::install_github('mbtyers/jagsplot')
library(jagsplot)
# plot histogram of the discrete distribution
jags.hist(jags.model.fit.drug, which.param= c('y', 'P.crit'))
The package has other visualisation tools too. E.g. jags.dens create density plots of selected parameters of an R2jags object.
Exercise
- Edit the model code to specify a Uniform(0,1) prior on the response rate theta, and re-run the analysis. Plot the predictive distribution for the number of the number of success. What is now the predictive paobability that 15 or more patients a positive response out of 20 new patients affected?
jags.model.drug2 <- function(){
theta ~ dunif(0,1) # prior for the unknown parameter
y ~ dbin(theta, 20) # the data model
P.crit <- ifelse(y>15, 1, 0)
}
jags.model.fit.drug2 <- jags(
data = list(),
n.iter = 10000,
DIC = FALSE,
parameters.to.save = c('theta', 'y', 'P.crit'),
model.file = jags.model.drug2,
n.chains = 1
)
print(jags.model.fit.drug2) # mean of the P.crit gives the required probability
# plot predicitive distribution
jags.hist(jags.model.fit.drug2, which.param = c('y'))
Inference on proportions - Drug example with data
In the previous example we consider the early investigation of a new drug, where we assumed that the response rate has a Beta(9.2,13.8) prior distribution.
Now suppose that upon treating 20 volunters with the compound we observe $y=15$ response. This is our data.
As a next step, we would consider continuing the development programme if the drug managed to achieve at least a further 25 success out of 40 future trails.
Thus, we are interested in the posterior response rate, and want to know what the probability that we can continue the drug deveploment programme is.
For this we add an extra node to the model definition. This represents the future trial that we are trying to predict. The model is this case will combine the prior on the response rate with the information from the data (which is captured by the likelihood) to get a posterior response rate. Then uses this posterior to make future predictions. In particular, we first find the posterior predicitive distribution in the case of 40 patients, then use this to find the probability that we have at least 25 response out of 40 in this future trial.
# model
jags.model.drug3 <- function(){
theta ~ dbeta(a, b) # prior
r ~ dbin(theta, n) # likelihood
r.pred ~ dbin(theta, m) # predictive distribution
P.crit <- ifelse(r.pred>= ncrit, 1, 0) # probability of interest
}
Note that in this case r is known (this is the data - not a random object), while r.pred
is unknown.
In the previous example the model definition included the values of all the known parameters. A more flexible approach is when dummy variable are used, and then their known values are later passed onto jags
as part of the data list. Thus the jags.data
variable below not only has our data y=15, but also all the known parameters in our model definition.
# data
a= 9.2
b= 13.8
r= 15 # response
n= 20 # patient number
m= 40 # future paient number
ncrit= 25 # threshold for future trial
jags.data.drug3 <- list('a', 'b', 'r', 'n', 'm', 'ncrit')
Next we choose the parameters we want to monitor: theta
will give us the posterior response rate, r.pred
will give the posterior predictive distribution, while P.crit
gives the probability of interest. (Again, r
is known, thus the monitoring this node would give us no extra information)
The stochatic nodes need initial values, thus we choose some feasible values for theta
and r.pred
, this time for two chains. Since the sampling is from an unknown distribution, convergence would have to be assessed. Convergence is the reason we would need at least twp chains with distinct initial values. Note however, that we won't assess convergence here, this is something that's covered later.
# parameters we want to monitor
jags.model.param.drug3 <- c('theta', 'r.pred', 'P.crit')
# Specify initial values
jags.inits1<- list('theta'= 0.7, 'r.pred'= 20)
jags.inits2<- list('theta'= 0.5, 'r,pred'= 28)
jags.inits.drug3<- list(jags.inits1, jags.inits2)
Finally we fit the model (using two chains this time)
```{r include=FALSE}
jags.model.fit.drug3 <- jags(
data = jags.data.drug3,
inits = jags.inits.drug3,
parameters.to.save = jags.model.param.drug3,
n.chains = 2,
n.iter = 10000,
model.file = jags.model.drug3
)
In order to produce summary statistics dor all the variables we have monitored, and provide a point and interval estimate for the treatment response rate we can use `print` function.
```{r}
print(jags.model.fit.drug3)
Exercise
Produce kernel density plots of the posterior distribution for
theta
and the predicitive distribution forr.pred
.Compare the model code with the code for the drug example without the data. Make sure you understand the different between the two analysis.
Edit the model code to specify a Uniform(0,1) prior on the response rate
theta
(or equivalently, a Beta(1,1) prior), and re-run the analysis. How is the posterior estimate oftheta
affected? How is the probability that 25 or more patients will experience a positive response out of 40 new patients affected?
# Solutions
## 1.
## kernel density plot for theta
jags.dens(jags.model.fit.drug3)
## predicitive detribution for r.pred
jags.hist(jags.model.fit.drug3, which.param = c('r.pred'))
## 2.
## 3. Use a=1, amd b=1 and return the previous code
# data
a= 1
b= 1
r= 15 # response
n= 20 # patient number
m= 40 # future paient number
ncrit= 25 # threshold for future trial
jags.data2.drug4 <- list('a', 'b', 'r', 'n', 'm', 'ncrit')
jags.model.fit.drug4 <- jags(
data = jags.data2.drug4,
inits = jags.inits.drug3,
parameters.to.save = jags.model.param.drug3,
n.chains = 2,
n.iter = 10000,
model.file = jags.model.drug3
)
print(jags.model.fit.drug4)
Inference using the model normal distribution - THM example
Regional water companies in the UK are require to take routine measurements of trihalomethane (THM) concentrations in tap water samples for regulatory purposes.
We want to estiamte the average THM concentration in particualr water zone, z. In particular we want to know the probability that the THM levels in the water supply exceed $145\mu g/l$.
Suppose we know that the assay measurement error has a standard deviation $5\mu g/l$. Thus means that we can model the THM concentration using a normal distribution with unknown mean, and standard deviation $5\mu g/l$.
To estimate the mean THM concentration, two independent meansurements were taken and their mean we found to be $130\mu g/l$. This is our data. Note however taht we will need to specify values for both the THM obervations when fiiing the model, so rather than just inputting their mean as the data, we will enter both values as 130.
Since the mean of the THM concentration is an unknown parameter, we also need to specify a prior for this. Historical data on THM levels in other zones supplied from the same souce showed that the mean THM concentration was $120\mu g/l$ with standard deviation $10\mu g/l$, which suggests a $N(120,10^2)$ prior.
Combining all the information gives the following model definition and data list. Note that for logical nodes we can take advantage of the vevtorised way R does computations, but in case of stochastic dependence we have to use a for loop.
# model
jags.model.thm <- function(){
theta ~ dnorm(120, 1/100) # prior
# We have N=2 observations, hence the for loop
for (i in 1:N){
y[i] ~ dnorm(theta, 1/(5^2)) # likelihood
}
ypred ~ dnorm(theta, 1/(5^2)) # predicitive distribution
}
# data (note that some known parameters were included in the model definition)
N <- 2
y <- c(130, 130)
thm.data <- list('N', 'y')
Exercise
Using what you've learnt so far finish the analysis of this example. In particular, edit the model definition and fit the model, so that you can answer the following.
(a) Calculate the probability that the zone mean THM concentration exceed $130\mu g/l$. (For this you will have to add an extra line to the model definition).
(b) Calcualte the probability that a future THM concentration measured in the zone exceeds $130\mu g/l$.Now try using a non-informative prior on the unknown mean THM concentration (assume either a uniform prior with a wide range, or a normal prior with a large variance (small precision)). Repeat your analysis.
# Solution
## 1. Add the following to the model definition:
jags.model.thm <- function(){
theta ~ dnorm(120, 1/100) # prior
# We have N=2 observations, hence the for loop
for (i in 1:N){
y[i] ~ dnorm(theta, 1/(5^2)) # likelihood
}
ypred ~ dnorm(theta, 1/(5^2)) # predicitive distribution
# find probability that future THM level will exceed 130
pred.crit <- ifelse(ypred>130, 1, 0)
# find probability that current mean THM level exceeds 130
theta.crit <- ifelse(theta>130, 1, 0)
}
thm.param <- c('theta', 'ypred', 'pred.crit', 'theta.crit')
# initial values
thm.inits <- function(){
list('theta'= dnorm(100, 5), 'ypred'= dnorm(120, 5))
}
# fit hte model
thm.model.fit <- jags(
data = thm.data,
inits = thm.inits,
parameters.to.save = thm.param,
n.chains = 2,
n.iter = 10000,
n.burnin = 0,
model.file = jags.model.thm
)
# the mean of the.crit and pred.crit gives the quantities of interest
print(thm.model.fit)
# 2. For example we can use the following prior
# theta ~ dunif(0, 10^6)
# return the code..
Inference using count data - Flying bombs
Exercise: From scrath, implement the following problem.
Data below are the number of flying bomb hits on London during World War II in a $36km^2$ area of South London. (Area was partitioned into $0.25 km^2$ grid squires and the number of bombs falling in each grid was counted).
Hist,x | 0 | 1 | 2 | 3 | 4 | 7 |
---|---|---|---|---|---|---|
number of areas,n | 229 | 211 | 93 | 35 | 7 | 1 |
(Total hits $\sum_in_ix_i=537$, and total number of areas, $\sum_in_i=576$).
If the hits are assumed to be random, a Poisson distribution with some unknown hot rate $\theta$ should fit the data. We use the dgamma(0.5,0.00001) distribution as prior for the mean $theta$ of the Poisson distribution (this is an approximation of the improper Gamma(0.5,0) distribution).
Find a point estimate for the unknown parameter and also plot its posterior distribution.
set.seed(123)
# model
jags.model.bomb <- function(){
# prior
lambda ~ dgamma(0.001, 0.001)
# We have N=576 obervations, hence he for loop
for(i in 1:N){
y[i] ~ dpois(lambda) # likelihood
}
}
# data
y <- c(rep(0,229), rep(1,211), rep(2,93), rep(3,35), rep(4,7), rep(7,1))
N <- length(y)
bomb.data <- list('N', 'y')
# initial values (for 2 chains)
bomb.inits1<- list('lambda'=0.8)
bomb.inits2<- list('lambda'=1)
bomb.inits<- list(bomb.inits1, bomb.inits2)
# parameters to monitor
jags.param<- c('lambda')
# fit model
jags.model.fit.bomb<- jags(
data= bomb.data,
inits = bomb.inits,
parameters.to.save = jags.param,
n.chains = 2,
n.iter = 10000,
model.file = jags.model.bomb
)
# look at numerical summary
print(jags.model.fit.bomb)
# point estiamte for the parameter is 0.993
# plot posterior density
bomb.mcmc<- as.mcmc(jags.model.fit.bomb)
MCMCtrace(
bomb.mcmc,
params = 'lambda',
type = 'density',
ind = TRUE,
pdf = FALSE
)