Data Science and Statistics: Fitting Bayesian Regression Models Using JAGS

Introduction: Fitting Bayesian Regression Models Using JAGS

Fitting Bayesian Regression Models Using JAGS

In Bayesian statistics I use samples from the posterior distribution to answer inferential questions. The samples are generated by a Markov chain whose stationary distribution is the posterior. This means that not all samples generated by the Markov chain are from the posterior: I usually have to let the chain run for some time before it reaches its stationary distribution.

Since I only want to use the samples that were drawn from the posterior in the analysis, it is crucial to be able to assess whether the target distribution has been reached, that is, whether the Markov chain has converged.

When assessing convergence I will do a 'visual inspection' of the traceplots (is there good mixing between the chains?) and will look at the so-called scale-reduction factors of the parameters. The latter is known as the Gelman-Rubin diagnostic. The scale-reduction factor quantifies the difference between the between-chain variance and the within-chain variance. If these are equal I get a scale-reduction factor of 1, while larger values mean that there is still a notable difference between the chains. Both the visual inspection and the Gelman-Rubin diagnostic require multiple chains with differing starting points.
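To make the idea concrete, here is a minimal sketch (not part of the analysis below; the two 'chains' are just independent normal draws) of what gelman.diag() from the **coda** package reports when two chains agree:

```{r}
# Minimal illustration of the Gelman-Rubin diagnostic on two toy chains.
# The values are made up purely for illustration; real chains come from MCMC.
library(coda)

set.seed(1)
chain1 <- mcmc(rnorm(1000))   # pretend this is chain 1 of a sampler
chain2 <- mcmc(rnorm(1000))   # chain 2, started from a different seed

# gelman.diag() needs at least two chains, collected in an mcmc.list
gelman.diag(mcmc.list(chain1, chain2))
# A point estimate close to 1 means the between-chain and within-chain
# variances agree, i.e. no evidence against convergence.
```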

I will demonstrate these convergence diagnostics with a Bayesian linear regression problem.

Union density data

Union density is defined as the percentage of the work force who belongs to a trade union. There are a number of competing theories on what explains cross-national variation in union density.

To explore the different theories I use data from $n=20$ countries with a continuous history of democracy since World War II. I have the following variables:

  • Union density (Uden);
  • (log) Labour force size (LabF);
  • Industrial concentration (IndC);
  • Left-wing government (LeftG);
    measured in the late 1970s.

I fit the following (centred) linear regression model:
$$Uden_i \sim N(\mu_i,\sigma^2)$$
$$\mu_i = b_0 + b_1(LeftG_i - \overline{LeftG}) + b_2(LabF_i - \overline{LabF}) + b_3(IndC_i - \overline{IndC})$$
with vague priors
$$ 1/\sigma^2 \sim Gamma(0.001,0.001) \\ b_0 \sim N(0,10^5) \\ b_1 \sim N(0,10^5) \\ b_2 \sim N(0,10^5) \\ b_3 \sim N(0,10^5) $$
I will fit the model, assess convergence, and discard any burn-in samples before making posterior inferences. For this I will run 2 separate chains and use the Gelman-Rubin convergence diagnostic. Recall that when fitting a model previously I defined the JAGS model, converted the data to a list, chose the parameters of interest and specified starting values for all the chains. Now I will also consider a burn-in period, that is, choose the number of initial samples to be discarded.

Firstly, I load the necessary packages:

```{r include=FALSE}
library(R2jags)
library(coda)
library(lattice)
library(MCMCvis)
library(tidyverse)
```


To define the model, I essentially translate the above model definition into the BUGS language line by line. Recall that for the stochastic part of the model I can't take advantage of the vectorised computation normally allowed by R, but have to use a for loop. Also remember that JAGS uses the precision instead of the variance to parametrise the normal distribution.

```{r}
jags.model<- function(){
    for (i in 1:20){
        Uden[i] ~ dnorm(mu[i], tau) # normal likelihood

        # linear predictor with centred covariates
        mu[i] <- b0+ b[1]* (LeftG[i]- mean(LeftG[]))+
            b[2]* (LabF[i]- mean(LabF[]))+
            b[3]* (IndC[i]- mean(IndC[]))
    }

    # prior on residual error variance
    tau ~ dgamma(0.001, 0.001)
    sigma2 <- 1/ tau # residual error variance

    # vague priors on regression coefficients
    b0 ~ dnorm(0, 0.00001)
    for (k in 1:3){
        b[k] ~ dnorm(0, 0.00001)
    }
}
```

I provide the data directly. Alternatively, a CSV file could be provided with columns Uden, LabF, LeftG and IndC. But here, since the dataset is small, I avoid having separate files. JAGS needs the data to be passed on as a list, so I create a list from the defined data vectors and call this list jags.data.

# the data matrix
tmpdata <- c(82.4,8.28,111.84,1.55,80.0,6.90,73.17,1.71,74.2,4.39,17.25,2.06,73.3,7.62,59.33,1.56,71.9,8.12,43.25,1.52,69.8,7.71,90.24,1.52,68.1,6.79,0.00,1.75,65.6,7.81,48.67,1.53,59.4,6.96,60.00,1.64,58.9,7.41,83.08,1.58,51.4,8.60,33.74,1.37,50.6,9.67,0.00,0.86,48.0,10.16,43.67,1.13,39.6,10.04,35.33,0.92,37.7,8.41,31.50,1.25,35.4,7.81,11.87,1.68,31.2,9.26,0.00,1.35,31.0,10.59,1.92,1.11,28.2,9.84,8.67,0.95,24.5,11.44,0.00,1.00)

tmpdata <- matrix(tmpdata, nrow= 20, ncol= 4, byrow= TRUE)

Uden<- tmpdata[, 1]
LabF<- tmpdata[, 2]
LeftG<- tmpdata[, 3]
IndC<- tmpdata[, 4]

# define the data that JAGS needs
jags.data<- list('Uden', 'LabF', 'LeftG', 'IndC')

Next I set the parameters I want to monitor. At this stage these are the regression coefficients. I also specify initial values. To check convergence I am planning to run 2 chains, thus I need two sets of initial values for the random parameters (ideally we should initialise all the stochastic nodes). These again should be passed on as a list. Note that the Gelman-Rubin convergence diagnostic is not possible with a single chain.

# Parameters I want to monitor
jags.param<- c('b0', 'b')

# Specify initial values
inits1<- list('tau'= 100, 'b0'=20, 'b'= c(10, -5, -25))
inits2<- list('tau'= 10000, 'b0'= -100, b= c(-100, 100, 500))
jags.inits<- list(inits1, inits2)

Sidenote: When defining initial values, I could specify the values of the variables separately, then add the variable names to the list (this is more convenient when the structure is more complex).

## Another way of creating the inits
# tau= 100
# b0= 40
# b= c(10, -5, -25)
# inits<- list('tau', 'b0', 'b')

Initial values can also be generated randomly, by wrapping the random number generators in a function:

## Another way of creating the inits
# jags.inits<- function(){
#    list('tau'= rnorm(1), 'b0'= rnorm(1), 'b'= rnorm(3))
# }

The advantage of this latter method is that we can change the number of chains without having to worry about defining more or fewer sets of initial values, since JAGS will just call the above function whenever it needs a new set of initial values. The disadvantage is that we have less control over the chosen initial values, and that this version can be harder to set up when the parameters (including missing values) have a slightly more complex structure.

When fitting the model I set the number of chains to two (n.chains=2), draw 6000 samples for each chain (n.iter=6000), but discard the first 500 samples as burn-in (n.burnin=500). I also set n.thin to 1, because at this stage I want to keep all the samples. (Recall that the samples generated by the Markov chain are dependent samples. Ideally we would want independent, or at least uncorrelated, samples. To reduce correlation I could thin the generated samples by taking only every nth sample, e.g. every 5th. This is what the thinning argument does.) Apart from the previously mentioned arguments, I also have to pass on the data, the list of initial values, the model definition and the vector of parameters I wish to monitor.
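As a side note, thinning does not have to happen at fitting time: coda's window() function can thin samples that have already been stored. A minimal sketch, where fit.mcmc stands for any mcmc.list (for example, the one created from the fitted model further below):

```{r}
## Sketch only: thin already-stored samples with coda
# fit.mcmc.thinned <- window(fit.mcmc, thin = 5)   # keep every 5th sample
# autocorr.diag(fit.mcmc.thinned)                  # check the drop in autocorrelation
```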

```{r include=FALSE}

# Fit the JAGS model

jags.model.fit <- jags(
data= jags.data,
inits = jags.inits,
parameters.to.save = jags.param,
n.chains = 2,
n.iter = 6000,
n.burnin = 500,
n.thin = 1,
model.file = jags.model,
DIC = FALSE
)
```


To assess convergence I first look at traceplots and check the mixing of the chains. Recall that there are multiple ways to generate traceplots. Here I use the **MCMCtrace** function of the **MCMCvis** package, which takes an MCMC object as its argument. The reason is that in the current version of **R2jags** there is a bug that, in certain cases, keeps some initial samples even when the burn-in value is high. These samples always show up when using **traceplot**, but with **MCMCvis** there is a better chance of seeing the true behaviour of the chains.
**Q**: **Produce trace plots using the traceplot function and compare the results.**
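As a hint for the question above, a sketch of the comparison (R2jags provides its own traceplot() method for the fitted object, while coda's traceplot() works on the converted MCMC object created below):

```{r}
## Sketch for the question above: two other ways to get traceplots
# traceplot(jags.model.fit)   # R2jags method on the fitted object
# traceplot(jagsfit.mcmc)     # coda method on the mcmc.list created below
```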

```{r}
jagsfit.mcmc <- as.mcmc(jags.model.fit)
MCMCtrace(jagsfit.mcmc, type= 'trace', ind= TRUE, pdf= FALSE)
```


Here I see good mixing between the chains (heuristically this means that the chains look like a random scatter around a stable value, and the behaviour of the two chains is indistinguishable).

Traceplots on their own are not enough to assess convergence; they should be combined with the Gelman-Rubin diagnostic. The gelman.diag() function can be used to extract the Gelman-Rubin statistic values. Just like the MCMCtrace function, it needs an MCMC object as its argument.

gelman.diag(jagsfit.mcmc)


We can store these in a dataframe, which can be useful when I have a large number of parameters.

gelman.df<- as.data.frame(gelman.diag(jagsfit.mcmc)$psrf)
gelman.df


If the gelman.diag command doesn't run I can try setting the multivariate argument to FALSE.
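That is, something along the lines of the following sketch:

```{r}
## If the multivariate PSRF cannot be computed, request only the
## univariate scale-reduction factors
# gelman.diag(jagsfit.mcmc, multivariate = FALSE)
```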

Here both the traceplots and the Gelman-Rubin values indicate convergence, but when there's no convergence I can draw new samples using the update function:

jags.model.fit.upd <- update(jags.model.fit, n.iter=10000)

## We can also try the following function, which will keep updating the
## model until it converges (assuming that convergence with the given 
## conditions is possible)
# jags.model.fit.upd <- autojags(jags.model.fit)

**Q**: *Produce traceplots of the updated samples. Is there still evidence of the bug?*

Once convergence has been reached I can look at summary statistics and density plots of the posterior distribution. Note that the Rhat column of the print output also provides estimates of the scale-reduction factor. However, the gelman.diag() estimates with the upper confidence intervals are more reliable.

# summary statistics
print(jags.model.fit)


If I use the summary function with the mcmc object, the MC standard error can also be assessed (this is given by the Time-series SE column of the output). As a rule of thumb, we want the MC error to be smaller than 1-5% of the posterior standard deviation (given in the SD column).

summary(jagsfit.mcmc)

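A quick way to check this rule of thumb numerically is to take the ratio of the two columns of the coda summary; a sketch, assuming jagsfit.mcmc from above:

```{r}
## Sketch: ratio of MC (time-series) standard error to posterior SD;
## values below roughly 0.01-0.05 satisfy the rule of thumb
# post.stats <- summary(jagsfit.mcmc)$statistics
# post.stats[, "Time-series SE"] / post.stats[, "SD"]
```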

**Q**: *Rerun the model fitting with a couple of different sample sizes and observe how the MC error changes in each case.*

Note that the samples used in the above summary are all from the posterior distribution, since convergence has been reached and the initial 500 samples that we used as burn-in were discarded.

When looking at density plots, the MCMCtrace function allows us to select a specific parameter from a parameter vector. For example, we might be interested in the coefficient of labour force size ($b_2$).

# posterior density
MCMCtrace(
    jagsfit.mcmc,
    params = c('b\\[2\\]'),
    type = 'density',
    ind = TRUE,
    ISB = FALSE,
    exact = FALSE,
    pdf = FALSE
          )


Extracting the simulated samples allows me to make further inferences. As an example, we will produce box plots of the posterior regression coefficients.

# boxplot
beta<- as.data.frame(jags.model.fit$BUGSoutput$sims.list$b)
colnames(beta)<- c("b1", "b2", "b3")
boxplot(beta, outline = FALSE)


The data tend to favour Wallerstein's theory (union density depends on labour force size), but I can't draw strong conclusions from this analysis (notice that the 95% credible intervals for LabF and IndC both contain 0).
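For reference, these intervals can be read off the summary table directly; a sketch (b[1], b[2] and b[3] correspond to the LeftG, LabF and IndC effects):

```{r}
## Sketch: 95% credible intervals for the three regression coefficients
# jags.model.fit$BUGSoutput$summary[c("b[1]", "b[2]", "b[3]"), c("2.5%", "97.5%")]
```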

Exercise

  1. Produce a box plot of the posterior distribution of the mean union densities. Edit the plot to order the box plots by the rank of the posterior mean union density. Hint: You will have to add mu to the list of monitored parameters and fit the model again. To create the box plots where the means are ordered according to their median, rearrange the simulated samples for mu using mu[, order(apply(mu, 2, FUN=median))].

  2. Edit the code to fit the same model, but with 'uncentred' covariates. How does this affect the convergence of the MCMC simulations and how does it affect the MC error?

  3. Rerun the analysis, first using the Wallerstein informative prior, then using the Stephens informative prior. Wallerstein and Stephens agree on the LeftG effect (and the vague intercept), but the two theories differ in their assumptions about the LabF and IndC effects.

  • Both Wallerstein and Stephens believe that left-wing governments assist union growth, which translates to a $N(0.3,0.15^2)$ prior on the LeftG effect.

  • Wallerstein believes in a negative labour force size effect, and thus assumes a $N(-5,2.5^2)$ prior on the LabF effect, while keeping the vague $N(0,10^5)$ prior for the IndC effect.

  • Stephens believes in a positive industrial concentration effect, and thus assumes a $N(10,5^2)$ prior for the IndC effect, while keeping the vague $N(0,10^5)$ prior for the LabF effect.

What conclusions can you draw?

## Solutions

# 1.
jags.param<- c("mu", "b0", "b")

jags.model.fit<- jags(
    data = jags.data,
    inits = jags.inits,
    parameters.to.save = jags.param,
    n.chains = 2,
    n.iter = 6000,
    n.burnin = 1000,
    n.thin = 1,
    model.file = jags.model
)

# Boxplot where the means are ordered according to their median
mu<- as.data.frame(jags.model.fit$BUGSoutput$sims.list$mu)
mu<- mu[, order(apply(mu, 2, FUN = median))]

boxplot(mu, outline = FALSE)


# 2. Uncentred covariates give the following model definition:
jags.model<- function(){
    for (i in 1:20){
        Uden[i] ~ dnorm(mu[i], tau) # normal likelihood

        # linear predictor with uncentred covariates
        mu[i] <- b0+ b[1]* LeftG[i]+ b[2]* LabF[i] + b[3]* IndC[i]
    }

    # prior on residual error variance
    tau ~ dgamma(0.001, 0.001)
    sigma <- 1/ tau # residual error variance

    # vague priors on regression coefficients

    b0 ~ dnorm(0, 0.00001)
    for (k in 1:3){
        b[k] ~ dnorm(0, 0.00001)
    }
}
# 3. Model for Wallerstein informative prior:
jags.model.wallerstein<- function(){
    for (i in 1:20){
        Uden[i] ~ dnorm(mu[i], tau) # normal likelihood
        # linear predictor with centered covariates
        mu[i] <- b0+ b[1]* (LeftG[i]- mean(LeftG[]))+
            b[2]* (LabF[i]- mean(LabF[]))+
            b[3]* (IndC[i]- mean(IndC[]))
    }
    # prior on residual error variance
    tau ~ dgamma(0.001, 0.001)
    sigma <- 1/tau # residual error variance
    # priors on regression coefficients (informative for b[1] and b[2])
    b0 ~ dnorm(0, 0.00001)

    b[1] ~ dnorm(0.3, 1/(0.15^2))
    b[2] ~ dnorm(-5, 1/(2.5^2))
    b[3] ~ dnorm(0, 0.00001)
}

 # Model for Stephens informative prior:
jags.model.stephens<- function(){
    for (i in 1:20){
        Uden[i] ~ dnorm(mu[i], tau) # normal likelihood
        # linear regression with centred covariates
        mu[i] <- b0+ b[1]* (LeftG[i]- mean(LeftG[]))+
            b[2]* (LabF[i]- mean(LabF[]))+
            b[3]* (IndC[i]- mean(IndC[]))
    }
    # prior on residual error variance
    tau ~ dgamma(0.001, 0.001)
    sigma <- 1/tau # residual error variance
    # priors on regression coefficients (informative for b[1] and b[3])
    b0 ~ dnorm(0, 0.00001)

    b[1] ~ dnorm(0.3, 1/(0.15^2))
    b[2] ~ dnorm(0, 0.00001)
    b[3] ~ dnorm(10, 1/(5^2))
}

Dugongs

Bayesian regression models can be easily generalised to the non-linear case. To demonstrate this we will use the dugong dataset. Furthermore, I will also show how to make predictions and impute missing data.

Carlin and Gelfand (1991) consider data on length $(y_i)$ and age $(x_i)$ measurements for 27 dugongs (sea cows) captured off the coast of Queensland.

To model the length of dugongs we can use a nonlinear growth curve
$$ y_i \sim N(\mu_i,\sigma^2) \\ \mu_i = \alpha - \beta\gamma^{x_i} $$
where $\alpha,\beta>0 \quad \gamma\in (0,1)$.

Vague prior distributions with suitable constraints may be specified:
$$ \alpha \sim Unif(0,100) \\ \beta \sim Unif(0, 100) \\ \gamma \sim Unif(0,1) $$

Note that for $\alpha$ and $\beta$ I could also use a $N(0,10^5)$ distribution restricted to the positive reals, that is dnorm(0, 0.00001) T(0,). (Note that T(,) is used for truncation, where T(L,) means that the distribution is truncated below at value L and T(,U) means it is truncated above at value U.)

For the sampling variance, we could specify a uniform prior on the log variance or log sd scale
$$\log\sigma \sim Unif(-10,10)$$
or a gamma prior on the precision scale.
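For reference, the alternatives just mentioned would look roughly like this in BUGS/JAGS syntax (a sketch of the model text only, not run here; note that T(0,) is not valid R syntax, so a model containing it would have to be supplied as a text file rather than an R function, or via a write.model-style workaround):

```{r}
## Sketch of the alternative prior specifications in BUGS/JAGS syntax
# alpha ~ dnorm(0, 0.00001) T(0,)   # N(0, 10^5) restricted to alpha > 0
# beta  ~ dnorm(0, 0.00001) T(0,)
# log.sigma ~ dunif(-10, 10)        # uniform prior on log(sigma)
# sigma <- exp(log.sigma)
# tau <- 1 / pow(sigma, 2)          # JAGS parametrises the normal by precision
```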

Just as before, the dataset is small, so I will read it into R directly.

# age of dugongs
x <- c( 1.0, 1.5, 1.5, 1.5, 2.5, 4.0, 5.0, 5.0, 7.0, 8.0, 8.5, 9.0, 9.5, 9.5, 10.0, 12.0, 12.0, 13.0, 13.0, 14.5, 15.5, 15.5, 16.5, 17.0, 22.5, 29.0, 31.5)
# length of dugongs
Y = c(1.80, 1.85, 1.87, 1.77, 2.02, 2.27, 2.15, 2.26, 2.47, 2.19, 2.26, 2.40, 2.39, 2.41, 2.50, 2.32, 2.32, 2.43, 2.47, 2.56, 2.65, 2.47, 2.64, 2.56, 2.70, 2.72, 2.57)
# number of measurements
N= 27
# add above to a list for JAGS
jags.data<- list('x', 'Y', 'N')

First we fit this model, check for convergence and look at posterior summary statistics.

To fit the model I just translate the above model definition line-by-line using the BUGS language.

jags.model<- function(){
    for (i in 1:N){
        Y[i] ~ dnorm(mu[i], tau)
        mu[i]<- alpha- beta* gamma^x[i]
    }
    # prior
    alpha ~ dunif(0, 100)
    beta ~ dunif(0, 100)
    gamma~ dunif(0, 1.0)

    tau ~ dgamma(0.001, 0.001)
    sigma<- 1/ tau
}

Next I set the parameters I want to monitor, choose initial values for two chains, and fit the model, where I generate 10000 samples and discard the first 1000 as burn-in.

```{r include=FALSE}

# parameters I want to monitor

jags.param<- c("mu", "alpha", "beta", "gamma", "sigma")

# mu is essentially the fitted model
# alpha, beta, gamma and sigma are parameters

# initial values for the parameters

inits1<- list(alpha= 1, beta= 1, tau= 1, gamma= 0.9)
inits2<- list(alpha= 10, beta= 3, tau= 5, gamma= 0.1)
jags.inits<- list(inits1, inits2)

# Fit JAGS

jags.model.fit<- jags(
data= jags.data,
parameters.to.save = jags.param,
inits = jags.inits,
n.chains = 2,
n.iter = 10000,
n.burnin = 1000,
n.thin = 1,
model.file = jags.model
)
```


Next I check for convergence by producing traceplots and looking at the Gelman-Rubin statistic values.

```{r}
# convert into MCMC object
jagsfit.mcmc<- as.mcmc(jags.model.fit)

# produce traceplots
MCMCtrace(jagsfit.mcmc, type= "trace", ind = TRUE, pdf = FALSE)

# GR diagnostics
gelman.diag(jagsfit.mcmc, multivariate = FALSE)
```


I can deduce that the model has converged, so I can move on to looking at summary statistics and posterior density plots. We can also plot the fitted model. The latter is done by extracting the mean and the relevant percentiles of the posterior distribution, then using ggplot to plot these along with the measurements. Here I will use the rows of the summary statistics that correspond to the parameter mu.

# summary statistics
print(jags.model.fit)

# extract position of the mu parameters (check which rownames start with mu)
pos<- substr(rownames(jags.model.fit$BUGSoutput$summary), 1, 2)== 'mu'

# extract posterior mean for mu
mu<- jags.model.fit$BUGSoutput$summary[pos, 1]
# extract 2.5 percentile of the posterior
lcr<- jags.model.fit$BUGSoutput$summary[pos, 3]
# extract 97.5 percentile of the posterior
ucr<- jags.model.fit$BUGSoutput$summary[pos, 7]

# add these to a dataframe (note that x and Y are our data)
df<- data.frame(x= x, y =Y, mu= mu, lower= lcr, upper= ucr)

# finally plot the output
ggplot(data= df)+
    geom_point(aes(x= x, y= Y))+
    geom_line(aes(x= x, y= mu))+
    geom_line(aes(x= x, y= lower), linetype= "dashed", color= "red")+
    geom_line(aes(x= x, y= upper), linetype= "dashed", color= "red")


Note that if I knew that the posterior was normal, I could have just used the mean and standard deviation of the posterior distribution to get the above plot. These can be extracted more easily using the following code.

# mu<- jags.model.fit$BUGSoutput$mean$mu
# sd<- jags.model.fit$BUGSoutput$sd$mu
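A sketch of that shortcut (only sensible if the posterior of each mu[i] is roughly normal):

```{r}
## Sketch: approximate 95% intervals from posterior mean +/- 1.96 * sd
# mu.mean <- jags.model.fit$BUGSoutput$mean$mu
# mu.sd   <- jags.model.fit$BUGSoutput$sd$mu
# df.approx <- data.frame(x = x, y = Y, mu = mu.mean,
#                         lower = mu.mean - 1.96 * mu.sd,
#                         upper = mu.mean + 1.96 * mu.sd)
```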

As we can see, fitting a non-linear model in JAGS is just as straightforward as fitting a linear one. Even if no closed-form solution is available, it is still straightforward to obtain samples from the posterior using MCMC.

I could explore the posterior further, but instead I now move onto discussing prediction.

It's important to be able to predict unobserved quantities for:

  • 'filling-in' missing or censored data;
  • model checking - are predictions 'similar' to observed data?
  • making predictions!

This is easy in JAGS; I just specify a stochastic node without a data value and it will automatically be predicted.

As an example, suppose we want to project beyond the current observations in the dugong data, e.g. I might be interested in the dugong length at ages 35 and 40.

There are two ways of doing this: I can set the prediction up as missing data, as mentioned before, and JAGS will automatically predict it; or I can set up the predictions explicitly.

When making direct predictions we add explicit statements to our model definition to predict the dugong length at ages 35 and 40. That is, I get the following model definition.

jags.model<- function(){
    # model set up
    for (i in 1:N){
        Y[i] ~ dnorm(mu[i], tau)
        mu[i]<- alpha- beta* gamma^x[i]
    }
    # prior
    alpha ~ dunif(0, 100)
    beta ~ dunif(0, 100)
    gamma~ dunif(0, 1.0)

    tau ~ dgamma(0.001, 0.001)
    sigma<- 1/ tau

    # prediction
    mu35<- alpha- beta*gamma^35
    mu40<- alpha- beta*gamma^40
    y35~ dnorm(mu35, tau)
    y40~ dnorm(mu40, tau)
}

Note that the interval around mu40 will reflect uncertainty concerning the fitted parameters. The interval around y40 will additionally reflect the sampling error and the uncertainty about sigma.

Since I now have two extra stochastic nodes in the model definition, when specifying initial values I also have to account for these, and add initial values for y35 and y40 to the existing list.

inits1<- list(alpha= 1, beta= 1, tau= 1, gamma= 0.9, y35= 2.4, y40= 2.62 )
inits2<- list(alpha= 10, beta= 3, tau= 5, gamma= 0.1, y35= 2.5, y40= 2.6)
jags.inits<- list(inits1, inits2)

Since I have already fitted the model without the prediction part, here I am only interested in the nodes I am predicting, so I only monitor these variables, and finally fit the model.

```{r include=FALSE}

# Monitor the parameters I am using for the prediction

jags.params<- c("y35", "y40", "mu35", "mu40")

# Fitting the model

jags.model.fit<- jags(
data = jags.data,
inits = jags.inits,
parameters.to.save = jags.params,
n.chains = 2,
n.iter = 10000,
n.burnin = 1000,
n.thin = 1,
model.file = jags.model
)
```


**Q**: *Check for convergence. Have all the nodes converged?* Looking at summary statistics of the posterior of the monitored nodes gives us point and interval estimates for our predictions. *What are these?*

```{r}
# get point and interval estimates
print(jags.model.fit)
```

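As a quick numerical check of the earlier remark about the two kinds of intervals, the posterior summaries of mu40 and y40 can be compared directly; a sketch, assuming the fit above:

```{r}
## Sketch: the interval for y40 should be wider than the one for mu40,
## since it also reflects sampling variability
# jags.model.fit$BUGSoutput$summary[c("mu40", "y40"), c("sd", "2.5%", "97.5%")]
```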

Next I move onto the second prediction method.

I can set up the prediction as missing data, and JAGS will automatically predict it. In this case I use the original model definition, but a modified dataset. Generally, imputing missing data as a method of prediction is easier, but I do have to do a little more work when initialising the chains.

# In the modified dataset I treat the observations corresponding to 
# x=35, and x=40 as missing

x <- c( 1.0, 1.5, 1.5, 1.5, 2.5, 4.0, 5.0, 5.0, 7.0, 8.0, 8.5, 9.0, 9.5, 9.5, 10.0, 12.0, 12.0, 13.0, 13.0, 14.5, 15.5, 15.5, 16.5, 17.0, 22.5, 29.0, 31.5, 35, 40)
# length of dugongs
Y = c(1.80, 1.85, 1.87, 1.77, 2.02, 2.27, 2.15, 2.26, 2.47, 2.19, 2.26, 2.40, 2.39, 2.41, 2.50, 2.32, 2.32, 2.43, 2.47, 2.56, 2.65, 2.47, 2.64, 2.56, 2.70, 2.72, 2.57, NA, NA)

# Notice the NAs corresponding to ages 35 and 40
# and note that I now have 29 data points (27 observed, 2 missing)
N<- 29

jags.data<- list("x", "Y", "N")

# use original model definition
jags.model<- function(){
    for (i in 1:N){
        Y[i] ~ dnorm(mu[i], tau)
        mu[i]<- alpha- beta* gamma^x[i]
    }
    # prior
    alpha ~ dunif(0, 100)
    beta ~ dunif(0, 100)
    gamma~ dunif(0, 1.0)

    tau ~ dgamma(0.001, 0.001)
    sigma<- 1/ tau
}

# parameters I want to monitor
jags.param<- c("mu", "Y")

Recall that all unknown quantities (including missing data!) should still be initialised. Since the 'missing values' are part of the data that I pass on to JAGS, I need to use this data structure when initialising the chains. That is, we use NA where the data is available (since these are not stochastic nodes), and initialise where the data is missing. In the model definition, where the data is missing, Y[i]~dnorm(mu[i],tau) is essentially a prior on these missing values.

```{r include=FALSE}
inits1<- list(alpha= 1, beta= 1, tau= 1, gamma= 0.9,
Y= c(rep(NA, 27), 2.4, 2.62))
inits2<- list(alpha= 10, beta= 3, tau= 5, gamma= 0.1,
Y= c(rep(NA, 27), 2.5, 2.6))
jags.inits<- list(inits1, inits2)
```


Now I am ready to fit the model.

```{r include=FALSE}
# Fit JAGS
jags.model.fit<- jags(
    data = jags.data,
    inits = jags.inits,
    parameters.to.save = jags.param,
    n.chains = 2,
    n.iter = 20000,
    n.burnin = 10000,
    n.thin = 1,
    model.file = jags.model
)
```

When looking at the outcome (you should also check for convergence) I extract point and interval estimates and also plot the fitted model. Note that the first 27 Y nodes are not stochastic, since I have data available there. But since I am monitoring these too, they will appear in the output of the print function.

# print model fit
print(jags.model.fit)

# extract position of the mu parameters (check which rownames start with mu)
pos<- substr(rownames(jags.model.fit$BUGSoutput$summary), 1, 2)== 'mu'

# extract posterior mean for mu
mu<- jags.model.fit$BUGSoutput$summary[pos, 1]
# extract 2.5 percentile of the posterior
lcr<- jags.model.fit$BUGSoutput$summary[pos, 3]
# extract 97.5 percentile of the posterior
ucr<- jags.model.fit$BUGSoutput$summary[pos, 7]
# data and predicted values
Ysim<- jags.model.fit$BUGSoutput$mean$Y

# add these to a dataframe
df<- data.frame(x= x, y= Ysim, mu= mu, lower= lcr, upper= ucr)

# plot using ggplot
ggplot(data= df)+
    geom_point(aes(x= x, y= Ysim))+
    geom_line(aes(x= x, y= mu))+
    geom_line(aes(x= x, y= lower), linetype= "dashed", colour= "red")+
    geom_line(aes(x= x, y= upper), linetype= "dashed", colour= "red")


Notice how in the above plot the two missing values were imputed. However, we have greater uncertainty around these nodes.

Exercise

  1. Change the prior in the original model (without prediction) to being uniform on $log(\sigma)$: is there any influence? (Note - you must always specify initial values for the stochastic parameters, i.e. the parameters assigned a prior distribution, so you will need to edit the initial values lists here as well as the model code.) Also remember that you cannot specify a function of parameters on the left side of a distribution in the BUGS language, so you will need to create a new variable equal to $log(\sigma)$ and assign this new variable an appropriate prior. Since BUGS does not allow the same variable to appear more than once on the left side of a statement, you will need to specify this new variable using the following format:
    function.of.sigma<- new.variable
    rather than
    new.variable<- function.of.sigma
    Check the posterior distribution for $\gamma$: do you think the uniform prior on this parameter is very influential?
jags.model<- function(){
    for (i in 1:N){
        Y[i] ~ dnorm(mu[i], tau)
        mu[i] <- alpha- beta*gamma^x[i]
    }
    # priors
    alpha ~ dunif(0, 100)
    beta ~ dunif(0, 100)
    gamma ~ dunif(0, 1.0)

    log.sigma ~ dunif(-100, 100)
    log(sigma) <- log.sigma
    tau <- 1/(sigma^2)
}

# Note that I also have to change the list of initial values,
# since I now need to initialise log.sigma (the stochastic node) and not tau
inits1<- list(alpha= 1, beta= 1, log.sigma= 1, gamma= 0.9)
inits2<- list(alpha= 10, beta= 3, log.sigma= 0.1, gamma= 0.1)
jags.inits<- list(inits1, inits2)

# parameters I want to monitor
jags.param<- c("mu", "Y", "gamma")

# Fit the model
jags.model.fit<- jags(
    data = jags.data,
    inits = jags.inits,
    parameters.to.save = jags.param,
    n.chains = 2,
    n.iter= 20000,
    n.burnin = 10000,
    n.thin = 1,
    model.file = jags.model
)

# look at outcome - point/interval estimate
print(jags.model.fit)

# Note: the results are very similar to those with the gamma prior on tau.
# You should also check for convergence, but the Rhat values look good.

# look at posterior of gamma
# could use MCMCtrace to create a density plot
# but here I extract the simulated values and use ggplot

gamma.sim<- data.frame(gamma= jags.model.fit$BUGSoutput$sims.list$gamma)

ggplot(gamma.sim, aes(x= gamma))+
    geom_density()

# posterior mass is not too close to the limits(0,1) imposed by the
# uniform prior on gamma, so doesn't appear to be too influential


  2. Use both methods to obtain predictions at ages 25 and 33. How do these compare to the predictions at ages 35 and 40?

  3. We can also use prediction for model checking. Edit the model definition so that it includes a statement to calculate the posterior predictive value Y.pred[i] for each observation.

# Here I work with the original dataset again
# But I add an extra variable to the for loop of the model definition

x <- c( 1.0, 1.5, 1.5, 1.5, 2.5, 4.0, 5.0, 5.0, 7.0, 8.0, 8.5, 9.0, 9.5, 9.5, 10.0, 12.0, 12.0, 13.0, 13.0, 14.5, 15.5, 15.5, 16.5, 17.0, 22.5, 29.0, 31.5)
# length of dugongs
Y = c(1.80, 1.85, 1.87, 1.77, 2.02, 2.27, 2.15, 2.26, 2.47, 2.19, 2.26, 2.40, 2.39, 2.41, 2.50, 2.32, 2.32, 2.43, 2.47, 2.56, 2.65, 2.47, 2.64, 2.56, 2.70, 2.72, 2.57)
# number of measurements
N= 27
# add above to a list for JAGS
jags.data<- list('x', 'Y', 'N')

jags.model<- function(){
    for (i in 1:N){
        Y[i] ~ dnorm(mu[i], tau)
        Y.pred[i] ~ dnorm(mu[i], tau) # extra node
        mu[i]<- alpha- beta* gamma^x[i]
    }
    # prior
    alpha ~ dunif(0, 100)
    beta ~ dunif(0, 100)
    gamma~ dunif(0, 1.0)

    tau ~ dgamma(0.001, 0.001)
    sigma<- 1/ tau
}

# I will only monitor the Y.pred node
jags.param<- c("Y.pred")

# initial values for the parameters
inits1<- list(alpha= 1, beta= 1, tau= 1, gamma= 0.9)
inits2<- list(alpha= 10, beta= 3, tau= 5, gamma= 0.1)
jags.inits<- list(inits1, inits2)
# In theory I need to initialise Y.pred, but if I don't specify it
# JAGS will just generate initial values

# Fit JAGS
jags.model.fit<- jags(
    data= jags.data,
    parameters.to.save = jags.param,
    inits = jags.inits,
    n.chains = 2,
    n.iter = 10000,
    n.burnin = 1000,
    n.thin = 1,
    model.file = jags.model
)

# Look at outcome - point/interval estimates
print(jags.model.fit)

# Plot the posterior prediction
# extract position of the Y.pred nodes (check which rownames start with Y.pred)
pos<- substr(rownames(jags.model.fit$BUGSoutput$summary), 1, 6)== 'Y.pred'

# extract posterior predictive mean (stored as mu for plotting)
mu<- jags.model.fit$BUGSoutput$summary[pos, 1]
# extract 2.5 percentile of the posterior
lcr<- jags.model.fit$BUGSoutput$summary[pos, 3]
# extract 97.5 percentile of the posterior
ucr<- jags.model.fit$BUGSoutput$summary[pos, 7]
# observed data (Y itself was not monitored in this run)
Ysim<- Y

# add these to a dataframe
df<- data.frame(x= x, y= Ysim, mu= mu, lower= lcr, upper= ucr)

# plot using ggplot
ggplot(data= df)+
    geom_line(aes(x= x, y= mu))+
    geom_line(aes(x= x, y= lower), linetype= "dashed", colour= "red")+
    geom_line(aes(x= x, y= upper), linetype= "dashed", colour= "red")

