Polya-Gamma Data Augmentation for Fully Bayesian Inference

The good old hardcore Bayesian James Scott's paper (with Nicholas Polson and Jesse Windle), Bayesian inference for logistic models using Polya-Gamma latent variables, presents an effective way to perform fully Bayesian posterior inference using the Polya-Gamma data-augmentation trick. So, what is that trick? The setting is, for example, a binomial likelihood on {y_i} given a p-dimensional input {x_i} and a p-dimensional weight vector {{\mathbf{\beta}}}, where {{\mathbf{\beta}}} has a Gaussian prior:

\displaystyle \begin{array}{rcl} \text{likelihood: } y_i|x_i, {\mathbf{\beta}} &\sim& \mbox{Binom}\left(n_i, \frac{1}{1+\exp(-\psi_i)}\right), \; \text{where } \psi_i = x_i^T {\mathbf{\beta}} \\ \text{prior: } {\mathbf{\beta}} &\sim& {\mathcal N}(b, B), \end{array}

and the goal is to sample from the posterior over {{\mathbf{\beta}}}. It turns out one can do this simply by iterating the following two steps, after introducing a Polya-Gamma latent variable {\omega_i} for each observation:

\displaystyle \begin{array}{rcl} \text{PG latent variable: } \omega_i|{\mathbf{\beta}} &\sim& \mbox{PG}\left(n_i, x_i^T {\mathbf{\beta}} \right), \\ \text{posterior: } {\mathbf{\beta}}| y, \omega &\sim& {\mathcal N}(m_\omega, V_\omega), \end{array}

where the mean and covariance are defined by

\displaystyle \begin{array}{rcl} V_\omega &=& (X^T \mbox{diag}(\omega) X + B^{-1})^{-1}, \; \text{where } X \text{ is the design matrix with rows } x_i^T, \\ m_\omega &=& V_\omega ( X^T \kappa + B^{-1} b), \end{array}

where {\kappa = (y_1 - \frac{n_1}{2}, \cdots, y_N - \frac{n_N}{2})^T}. For anyone who knows their Gaussian fun facts, this is just the standard conjugate update: conditioned on {\omega}, the likelihood term, viewed as a function of {{\mathbf{\beta}}}, coincides with a Gaussian,

\displaystyle \begin{array}{rcl} p(y | {\mathbf{\beta}}, \omega) &\propto& {\mathcal N}\left({\mathbf{\beta}} \,\big|\, (X^T \mbox{diag}(\omega) X)^{-1}X^T\kappa, \; (X^T \mbox{diag}(\omega) X)^{-1}\right). \end{array}
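Before getting to the why, here is what the two-step sampler looks like in code. This is a minimal NumPy sketch under my own naming (sample_pg and gibbs_pg_logistic are made-up helpers, not the paper's implementation): the PG(b, c) draw is approximated by truncating the infinite sum-of-Gammas representation of the Polya-Gamma distribution, whereas in practice one would use a dedicated exact sampler, e.g. from the BayesLogit or pypolyagamma packages.

```python
import numpy as np

def sample_pg(b, c, rng, K=200):
    """Approximate PG(b, c) draw by truncating the sum-of-Gammas representation
    omega = (1 / (2 pi^2)) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4 pi^2)),
    with g_k ~ Gamma(b, 1).  Keeping only K terms is an approximation."""
    k = np.arange(1, K + 1)
    g = rng.gamma(shape=b, scale=1.0, size=K)
    return np.sum(g / ((k - 0.5) ** 2 + c ** 2 / (4 * np.pi ** 2))) / (2 * np.pi ** 2)

def gibbs_pg_logistic(X, y, n, b_prior, B_prior, num_iters=1000, seed=0):
    """Two-step Gibbs sampler for the binomial logistic model with prior
    beta ~ N(b_prior, B_prior)."""
    rng = np.random.default_rng(seed)
    N, p = X.shape
    B_inv = np.linalg.inv(B_prior)
    kappa = y - n / 2.0                      # kappa_i = y_i - n_i / 2
    beta = np.zeros(p)
    samples = np.empty((num_iters, p))
    for t in range(num_iters):
        # Step 1: omega_i | beta ~ PG(n_i, x_i^T beta)
        psi = X @ beta
        omega = np.array([sample_pg(n[i], psi[i], rng) for i in range(N)])
        # Step 2: beta | y, omega ~ N(m_omega, V_omega)
        V = np.linalg.inv(X.T @ (omega[:, None] * X) + B_inv)
        m = V @ (X.T @ kappa + B_inv @ b_prior)
        beta = rng.multivariate_normal(m, V)
        samples[t] = beta
    return samples
```

Note that the {{\mathbf{\beta}}}-step is an exact Gaussian draw, so there is no Metropolis correction or step size to tune.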

Now, back to the math: let's figure out why the likelihood can be written as a Gaussian in {{\mathbf{\beta}}} once we condition on {\omega}.

Theorem 1. Let {p(\omega)} denote the density of the r.v. {\omega \sim \mbox{PG}(b, 0)}, {b>0}. Then, the following holds for all {a\in\mathbb{R}}:

\displaystyle \begin{array}{rcl} \frac{(e^\psi)^a}{(1+e^\psi)^b} &=& 2^{-b} e^{\kappa \psi} \int_0^\infty e^{-\frac{\omega\psi^2}{2}} p(\omega) d\omega, \end{array}

where {\kappa = a - \frac{b}{2}}, and the integrand can be viewed as (proportional to) the joint density over ({\psi, \omega}). Using Theorem 1, we can write the likelihood term for the {i}th observation as

\displaystyle \begin{array}{rcl} L_i({\mathbf{\beta}}) &=& \frac{(e^{x_i^T {\mathbf{\beta}}})^{y_i}}{(1+e^{x_i^T {\mathbf{\beta}}})^{n_i}}, \\ &\propto& e^{\kappa_i {x_i^T {\mathbf{\beta}}}} \int_0^\infty e^{-\frac{\omega_i({x_i^T {\mathbf{\beta}}})^2}{2}} \mbox{PG}(\omega_i|n_i, 0) d\omega_i, \end{array}

where {\kappa_i = y_i - \frac{n_i}{2}}. So, the posterior is given by

\displaystyle \begin{array}{rcl} p({\mathbf{\beta}}| y, \omega) &\propto& p( {\mathbf{\beta}}) \prod_{i=1}^N L_i( {\mathbf{\beta}}|\omega_i), \\ &\propto& p( {\mathbf{\beta}}) \prod_{i=1}^N e^{\kappa_i {x_i^T {\mathbf{\beta}}}} e^{-\frac{\omega_i({x_i^T {\mathbf{\beta}}})^2}{2}}, \\ &\propto& p( {\mathbf{\beta}}) \prod_{i=1}^N \exp\left[-\frac{\omega_i}{2}\left(x_i^T{\mathbf{\beta}} -\frac{\kappa_i}{\omega_i}\right)^2\right], \\ &\propto& p( {\mathbf{\beta}})  \exp\left[-\frac{1}{2}(z- X{\mathbf{\beta}})^T\mbox{diag}(\omega)(z- X{\mathbf{\beta}})\right], \end{array}

where {z = (\frac{\kappa_1}{\omega_1}, \cdots, \frac{\kappa_N}{\omega_N})^T}. Completing the square with the Gaussian prior {{\mathcal N}(b, B)} then gives exactly the conditional posterior {{\mathcal N}(m_\omega, V_\omega)} stated above.
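As a quick numerical sanity check of Theorem 1, one can compare its left-hand side against a Monte Carlo estimate of the right-hand side. The sketch below uses the same truncated sum-of-Gammas approximation of PG(b, 0) as above (sample_pg0 is again a made-up helper name), so the two numbers only agree up to Monte Carlo and truncation error.

```python
import numpy as np

def sample_pg0(b, rng, K=200):
    """Approximate PG(b, 0) draw via the truncated sum-of-Gammas series."""
    k = np.arange(1, K + 1)
    g = rng.gamma(shape=b, scale=1.0, size=K)
    return np.sum(g / (k - 0.5) ** 2) / (2 * np.pi ** 2)

rng = np.random.default_rng(1)
a, b, psi = 1.0, 3.0, 0.7
kappa = a - b / 2.0

# Left-hand side: (e^psi)^a / (1 + e^psi)^b.
lhs = np.exp(psi) ** a / (1.0 + np.exp(psi)) ** b

# Right-hand side: 2^{-b} e^{kappa psi} E[exp(-omega psi^2 / 2)], omega ~ PG(b, 0).
omega = np.array([sample_pg0(b, rng) for _ in range(20000)])
rhs = 2.0 ** (-b) * np.exp(kappa * psi) * np.mean(np.exp(-omega * psi ** 2 / 2.0))

print(lhs, rhs)  # should agree to roughly two or three decimal places
```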
