Let \(X\) be the set of vectors \(X = \left\{(x_1,x_2) \in \mathbb{R}^2 : 0 < x_2 < x_1 < 1 \right\}\) (a triangle in the unit square), and say we want to integrate a real function \(f:X\to \mathbb{R}\) over this set. Provided \(f\) is integrable on \(X\) (Fubini's theorem), the double integral equals an iterated integral taken in either order; only the bounds of integration change. That is:

\[\int_{X} f(x_1,x_2) ~ d(x_1,x_2) = \int_{0}^1 \int_{x_2}^1 f(x_1,x_2) ~ d{x_1} ~ dx_2 = \int_{0}^1 \int_{0}^{x_1} f(x_1,x_2) ~ dx_2 ~ dx_1\]
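As a quick sanity check that the two orders agree, here is a minimal sketch in plain Python (no external libraries) that evaluates both iterated integrals with a hand-rolled composite Simpson's rule. The integrand \(f(x_1,x_2) = x_1 x_2\) and the helper name `simpson` are illustrative choices, not part of the original notes.

```python
def simpson(g, a, b, n=200):
    """Composite Simpson's rule for the integral of g over [a, b]; n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        # Interior weights alternate 4, 2, 4, ..., 4.
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def f(x1, x2):
    # Illustrative integrand on X = {0 < x2 < x1 < 1}.
    return x1 * x2


# Order 1: x1 runs from x2 to 1 (inner), then x2 from 0 to 1 (outer).
order_x1_first = simpson(lambda x2: simpson(lambda x1: f(x1, x2), x2, 1.0), 0.0, 1.0)

# Order 2: x2 runs from 0 to x1 (inner), then x1 from 0 to 1 (outer).
order_x2_first = simpson(lambda x1: simpson(lambda x2: f(x1, x2), 0.0, x1), 0.0, 1.0)

print(order_x1_first, order_x2_first)
```

Both values should match the exact answer \(\int_0^1 \int_0^{x_1} x_1 x_2 ~ dx_2 ~ dx_1 = \int_0^1 \tfrac{x_1^3}{2} ~ dx_1 = \tfrac{1}{8}\) up to rounding, since Simpson's rule is exact for the cubic integrands that appear here.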

Integration over the simplex

In this class we often want to marginalize over a parameter \(\vec\theta = (\theta_1, \ldots, \theta_K)\) that is a \(K\)-dimensional probability vector. That is, \(\vec\theta\) is an element of \(\Delta_K\), the \((K-1)\)-dimensional probability simplex:

\[\Delta_K = \left\{ \vec\theta \in \mathbb{R}^K : \sum_{k=1}^K\theta_k = 1,~\theta_k \ge 0\text{ for all $k$} \right\}\]
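To make the definition concrete, here is a minimal membership test for \(\Delta_K\) in plain Python. The name `in_simplex` and the tolerance `tol` are illustrative assumptions; the tolerance is there because floating-point sums of probabilities rarely equal 1 exactly.

```python
def in_simplex(theta, tol=1e-12):
    """True if every coordinate is nonnegative and the coordinates sum to 1 (within tol)."""
    return all(t >= 0 for t in theta) and abs(sum(theta) - 1.0) <= tol


print(in_simplex([0.2, 0.3, 0.5]))   # True
print(in_simplex([0.6, 0.6, -0.2]))  # False: a negative coordinate
print(in_simplex([0.5, 0.4]))        # False: coordinates sum to 0.9
```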

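When we marginalize over \(\vec\theta\), we need to integrate over the simplex a function that factorizes as a product, \(f(\vec\theta)= \prod_{k=1}^K f_k (\theta_k)\). Since the constraint \(\sum_{k=1}^K \theta_k = 1\) fixes \(\theta_K = 1 - \sum_{k=1}^{K-1} \theta_k\), we integrate over the first \(K-1\) coordinates and substitute for \(\theta_K\), so we are computing an iterated integral of the following form: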

\[\begin{aligned}
\int_{\Delta_K} f(\vec\theta) ~ d\vec\theta &= \int_{\Delta_K} \prod_{k=1}^K f_k (\theta_k) ~ d \vec\theta \\
&= \int_{0}^1 \int_{0}^{1 - \theta_1} \dots \int_{0}^{1 - \sum\limits_{k=1}^{K-2} \theta_k} f_K\left(1 - \sum_{k=1}^{K-1} \theta_k \right)\prod_{k=1}^{K-1} f_k (\theta_k) ~ d \theta_{K-1} \dots d \theta_{2} ~ d \theta_{1}
\end{aligned}\]
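The nested bounds translate directly into a recursion: integrate \(\theta_1\) over \([0, 1]\), then each later coordinate over \([0, 1 - \text{(sum so far)}]\), and pin \(\theta_K\) to whatever remains. Below is a minimal sketch in plain Python, assuming each \(f_k\) is finite at the endpoints (Simpson's rule evaluates them). The names `simpson` and `simplex_integral` are hypothetical helpers, and the cost grows like \(n^{K-1}\), so this is only practical for small \(K\).

```python
def simpson(g, a, b, n=100):
    """Composite Simpson's rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def simplex_integral(fs, n=100):
    """Iterated integral over Delta_K of prod_k fs[k](theta_k), with theta_K = 1 - sum of the others."""
    K = len(fs)

    def integrate_from(k, remaining, partial):
        # k: 0-based index of the next coordinate, so fs[k] plays the role of f_{k+1}.
        # remaining: 1 minus the sum of the coordinates fixed so far.
        # partial: product of the f_j evaluated at the coordinates fixed so far.
        if k == K - 1:
            # The last coordinate is not integrated: theta_K is whatever remains.
            return partial * fs[K - 1](remaining)
        # The next coordinate ranges over [0, remaining], matching the nested bounds.
        return simpson(
            lambda t: integrate_from(k + 1, remaining - t, partial * fs[k](t)),
            0.0, remaining, n)

    return integrate_from(0, 1.0, 1.0)


one = lambda t: 1.0
# f_k = 1 for all k gives the area of {theta_1, theta_2 >= 0, theta_1 + theta_2 <= 1}, i.e. 1/2.
area = simplex_integral([one, one, one])
# f_1(t) = t, f_2 = f_3 = 1 gives the integral of theta_1(1 - theta_1) over [0, 1], i.e. 1/6.
first_moment = simplex_integral([lambda t: t, one, one])
print(area, first_moment)
```

Passing the partial product down the recursion keeps the code close to the formula: each level multiplies in one factor \(f_k(\theta_k)\), and only the innermost level touches \(f_K\).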

Dirichlet distribution application

For the Dirichlet distribution, the integral has exactly this form with \(f_k(\theta_k) = \theta_k^{\alpha_k - 1}\), where each \(\alpha_k > 0\).¹ For a derivation of the identity from class, namely that

\[\int_{\Delta_K}\prod_{k=1}^K \theta_k^{\alpha_k-1} ~ d \vec\theta = \frac{\prod_{k=1}^K \Gamma(\alpha_k)}{\Gamma\left(\sum_{k=1}^K \alpha_k\right)},\]

see this post (which uses the inverse Laplace transform to derive the identity).
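The identity can also be checked numerically. Here is a minimal sketch in plain Python that computes the right-hand side with `math.lgamma` (in log space, to avoid overflow for large \(\alpha_k\)) and compares it with a direct iterated integral for \(K = 2\) (the Beta function) and \(K = 3\). The helper names and the integer \(\alpha\) values are illustrative assumptions; integer exponents keep the integrand a polynomial, so Simpson's rule is accurate and never evaluates a negative power at zero.

```python
import math


def dirichlet_normalizer(alphas):
    """prod_k Gamma(alpha_k) / Gamma(sum_k alpha_k), computed via log-gamma."""
    log_value = sum(math.lgamma(a) for a in alphas) - math.lgamma(sum(alphas))
    return math.exp(log_value)


def simpson(g, a, b, n=200):
    """Composite Simpson's rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


# K = 2: theta_2 = 1 - theta_1, so the identity is the Beta function B(alpha_1, alpha_2).
alpha2 = (3.0, 4.0)
lhs_2 = simpson(lambda t: t ** (alpha2[0] - 1) * (1 - t) ** (alpha2[1] - 1), 0.0, 1.0)
rhs_2 = dirichlet_normalizer(alpha2)  # 2! * 3! / 6! = 1/60


# K = 3: theta_2 over [0, 1 - theta_1], then theta_1 over [0, 1]; theta_3 = 1 - theta_1 - theta_2.
alpha3 = (2.0, 3.0, 2.0)


def inner(t1):
    r = 1.0 - t1
    return simpson(
        lambda t2: t1 ** (alpha3[0] - 1) * t2 ** (alpha3[1] - 1) * (r - t2) ** (alpha3[2] - 1),
        0.0, r, 100)


lhs_3 = simpson(inner, 0.0, 1.0, 100)
rhs_3 = dirichlet_normalizer(alpha3)  # 1! * 2! * 1! / 6! = 1/360
print(lhs_2, rhs_2)
print(lhs_3, rhs_3)
```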

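  1. Note that these power functions are continuous on the interior of the simplex, so we don't need to worry about anything too fancy: Riemann integrals suffice. (When some \(\alpha_k < 1\), the factor \(\theta_k^{\alpha_k-1}\) is unbounded near \(\theta_k = 0\), so the Riemann integral is improper, but it still converges because \(\alpha_k > 0\).)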