A normal form for some formalism refers to a standard format in which the formalism can be represented. In the theory of context-free languages, there are several famous normal form theorems which show that any context-free grammar can be expressed in a particular format. Some well known normal forms are Greibach normal form and Chomsky normal form.

Chomsky Normal Form

A context-free grammar is in Chomsky normal form if every rule is either of the form

  1. \[A \longrightarrow B C\]
  2. \[A \longrightarrow a\]
  3. \[S \longrightarrow \epsilon\]

where \(A\), \(B\), and \(C\), \(a\) is a terminal, \(S\) is a start symbol, and \(\epsilon\) is the empty string. In other words, every rule consists of a pair of nonterminals, a single terminal, or directly rewrites the start symbol to the empty string.

It can be proven that the string language generated by any context-free grammar can be generated by a grammar in Chomsky normal form. To see this, note that for any terminal that appears on the right-hand-side of a rule \(r: A \longrightarrow \alpha a \beta\), we can (i) introduce a new nonterminal \(X^{a}_{r}\), (ii) replace \(a\) with this nonterminal in \(r\), that is, \(A \longrightarrow \alpha X^{a}_{r} \beta\) and (iii) introduce a rule \(X^{a}_{r} \longrightarrow a\). By applying this process to every rule in the grammar we can make sure that terminals are only introduced by unary production rules.

Similarly, for any rule that has more than \(2\) nonterminals on its right-hand side \(r: A \longrightarrow A B \alpha\) where \(\alpha \in N^+\), we can introduce a new nonterminal \(X^{A B}_{r}\), rewrite the original rule as \(r: A \longrightarrow X^{A B}_{r} \alpha\) and introduce a new rule \(X^{A B}_{r} \longrightarrow A B\).

In addition to these transformations, some care must be taken to eliminate null productions and unary rules.

One reason that Chomsky normal form is important, is that it motivates a parsing algorithm known as the Cocke–Younger–Kasami algorithm or CYK algorithm.

CYK Parsing

Here is an example of a set of deductions that lead to a valid parse of the example sentence above.


44 Parsing as Deduction 46 Semirings