The first problem we turn to for context-free grammars is the problem of parsing. We saw in our study of HMMs/FSAs that the parsing problem can be formulated in several different ways, depending on the principle of inductive inference that one is using.

In all such cases, the problem arose because of the exponential number of possible derivation of individual sentences. In the case of HMMs/FSAs, the number of derivations was the number of categories/states raised to the length of the input string. For CFGs, derivation are of course tree shaped, and there are many many more possible trees than strings of categories/sybols. To get a sense for this, note that the number of unlabeled binary trees over a sequence grows like the Catalan numbers

In the next chapters, we will explore algorithms for parsing CFGs.


44 Parsing as Deduction