In Principles of Inductive Inference, we saw how a principle of inductive inference could be formulated using a quality function. So far, we have built our quality function using just the likelihood of a model given the data. However, as we saw in the last chapter, it is also useful to introduce an intrinsic measure of the quality or complexity of a particular model, one that is independent of the data itself; there, we used our prior distribution on \(\vec{\theta}\) for this purpose.

The approach can be schematized in a general way as follows, for a model class indexed by \(\vec{\theta}\):

\[Q(\vec{\theta};\mathbf{C}) = f(\mathcal{F}(\vec{\theta};\mathbf{C}), \mathcal{S}(\vec{\theta}))\]

As before, \(Q(\vec{\theta};\mathbf{C})\) is the overall quality of the hypothesis given the corpus \(\mathbf{C}\). However, it now has two components. First, \(\mathcal{F}(\vec{\theta};\mathbf{C})\) is some measure of the fit to the corpus, such as the likelihood. Second, \(\mathcal{S}(\vec{\theta})\) is some a priori measure of the quality of the hypothesis, so called because it is independent of, and prior to, the data. It measures the intrinsic goodness of the hypothesis, without reference to the data. In practice, this measure is often used to implement notions such as simplicity, elegance, generalizability, parsimony, or plausibility of the hypothesis. The function \(f\) combines the fit to the data and the a priori measure in some way to produce a final value.
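To make the schema concrete, here is a minimal sketch in Python. It assumes a Bernoulli (coin-flip) model class indexed by a single parameter \(\theta\), takes the log-likelihood as the fit \(\mathcal{F}\), and uses a Beta(2, 2) prior density as the a priori measure \(\mathcal{S}\); the model, the prior, and the function names are illustrative assumptions, not taken from the text. Here \(f\) simply adds the two log-scale terms, which corresponds to multiplying likelihood and prior.

```python
import numpy as np

# Hypothetical names throughout: `fit`, `a_priori`, and `quality` are
# illustrative, and the Bernoulli model with a Beta(2, 2) prior is an
# assumption made for the sake of the example.

def fit(theta, corpus):
    """F(theta; C): log-likelihood of a 0/1 corpus under Bernoulli(theta)."""
    heads = sum(corpus)
    tails = len(corpus) - heads
    return heads * np.log(theta) + tails * np.log(1 - theta)

def a_priori(theta):
    """S(theta): log of the Beta(2, 2) prior density 6*theta*(1-theta),
    which favors moderate values of theta over extreme ones."""
    return np.log(6) + np.log(theta) + np.log(1 - theta)

def quality(theta, corpus):
    """Q(theta; C) = f(F, S), where f adds the two log-scale terms
    (i.e., multiplies likelihood and prior)."""
    return fit(theta, corpus) + a_priori(theta)

corpus = [1, 1, 0, 1, 1, 1, 0, 1]          # eight observations
thetas = np.linspace(0.01, 0.99, 99)       # candidate hypotheses
best = max(thetas, key=lambda t: quality(t, corpus))
print(f"theta maximizing Q: {best:.2f}")   # 0.70, pulled toward 0.5 by the
                                           # prior (the bare MLE is 0.75)
```

Working on the log scale is the usual design choice here: sums of log terms are numerically stabler than products of many small probabilities, and the schema is unchanged since the maximizing \(\vec{\theta}\) is the same.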

We now turn to one set of approaches to incorporating such prior constraints into our quality function, based on Bayesian methods.

