When we studied the strictly local languages, we introduced suffix substitution closure (SSC), which we termed an abstract characterization of the languages in the class. An abstract characterization of a class of languages is a set of necessary and sufficient conditions, stated only in terms of the strings in the language, that defines the class. Such characterizations can be used to prove that particular languages are or are not in the class.

The \(n\)-SSC condition said that if \(u_1 x v_1\) and \(u_2 x v_2\) are in the language for a substring \(x\) of length \(n-1\), then the string \(u_1 x v_2\) is also in the language. In a sense, it encodes the idea that a strictly-local grammar can only look \(n-1\) symbols backwards in the string. If the grammar has recognized \(u_1 x\) and \(u_2 x\), it cannot see further back than \(x\), and thus cannot tell which string it is in beyond this window of length \(n-1\). So if \(x v_1\) and \(x v_2\) are both possible completions from \(x\), then each can be combined with any valid prefix ending in \(x\). There is no way for the grammar to be sensitive to any of the context before \(x\).

We have seen that finite-state machines can recognize regular languages like \(L_{\le1b}\). This demonstrates that, unlike strictly-local grammars, finite-state machines can have memories which are unbounded in the length of the string. In particular, for \(L_{\le1b}\) the machine needs to remember that it has seen one \(b\) for as long as the string lasts, to rule out any future \(b\) from appearing. Giving ourselves hidden state like this fundamentally changes the capacity of the machine, because it provides a form of long-term memory.

Thus, something akin to SSC, with a bound on memory in terms of some number of string symbols, simply won't work for finite-state machines, whose memory can persist over any distance in the string. What kind of abstract characterization do we have for regular languages?

While an FSA has memory that isn't bounded by the length of the string, its memory can only occupy one of a finite number of states; this is the essential part of the definition. What this means is that no matter what the FSA remembers, it can only remember a finite number of different things.

This means that any two input string prefixes that drive the machine into the same state are indistinguishable from one another with respect to the remainder of the string. In other words, if prefixes \(x\) and \(y\) drive the machine into the same state \(q\), then for any suffix \(z\), either \(xz\) and \(yz\) are both in the language recognized by the automaton or they are both not in the language. This analog of SSC for regular languages is known as the Myhill-Nerode Theorem. We will now make this precise.

Myhill-Nerode Relations

Let \(L \subseteq V^*\) be any formal language. We define a Myhill-Nerode relation \(\equiv_L\) on all pairs of strings in \(V^*\) (i.e., \(\equiv_L \subseteq V^* \times V^*\)).

\[x \equiv_L y \iff \forall z\in V^*\,[xz \in L \iff yz \in L]\]

This relation says that two strings stand in the relation just in case, for every continuation \(z\), the resulting strings \(xz\) and \(yz\) are either both in the language or both out of it.
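The universal quantifier over \(z\) cannot be checked exhaustively, but the relation can be approximated in code by testing all suffixes up to a bounded length. Here is a minimal sketch in Python, using \(L_{\le1b}\) (at most one \(b\)) as the language; the function names and the bound are illustrative choices, not part of the formal definition:

```python
from itertools import product

def bounded_equiv(x, y, in_lang, alphabet, max_len=5):
    """Approximate x ==_L y by testing every suffix z with |z| <= max_len.

    A distinguishing suffix refutes equivalence; finding none only
    confirms it up to the bound, since the true relation quantifies
    over all of V*.
    """
    for n in range(max_len + 1):
        for tup in product(alphabet, repeat=n):
            z = "".join(tup)
            if in_lang(x + z) != in_lang(y + z):
                return False   # z distinguishes x from y
    return True

# L_{<=1b}: strings over {a, b} containing at most one b
in_le1b = lambda s: s.count("b") <= 1

bounded_equiv("a", "aa", in_le1b, "ab")   # True: no suffix separates them
bounded_equiv("a", "ab", in_le1b, "ab")   # False: the suffix "b" does
```

Note that this test can only refute equivalence conclusively; confirming it requires a proof about all suffixes, which is what makes the abstract characterization useful.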

Recall some basic properties that a binary relation \(R\) on a set \(S\) may satisfy:

- reflexivity: \(\forall x \in S,\ R(x,x)\)
- symmetry: \(\forall x, y \in S,\ R(x,y) \implies R(y,x)\)
- transitivity: \(\forall x, y, z \in S,\ R(x,y) \wedge R(y,z) \implies R(x,z)\)

When a relation satisfies these three properties, it is called an equivalence relation. Intuitively, an equivalence relation \(R\) classifies elements into sets which are all the “same” under the relation: things are the same as themselves; if \(x\) is the same as \(y\), then \(y\) is the same as \(x\); and sameness is transitive. Consider the set of things which stand in the equivalence relation with some element \(x\) of the set, \(\{y \mid R(x,y)\}\). This set is called the equivalence class of \(x\), and is denoted \([x]_R\).

\[[x]_R=\{y \mid R(x,y)\}\]

Any equivalence relation induces a partition of the set it is defined on into some number of disjoint equivalence classes. In other words, the relation separates the set \(S\) into some number of non-empty subsets which together contain every element and don’t overlap. This partition—the set of all equivalence classes—is denoted \({S}/{R}\).
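As a concrete illustration (not from the text), the relation “has the same remainder mod 3” on a small set of integers is an equivalence relation, and computing its classes yields a partition:

```python
# Illustrative example: "same remainder mod 3" on S = {0, ..., 8}
# is reflexive, symmetric, and transitive, so it partitions S.
S = range(9)
R = lambda x, y: x % 3 == y % 3

def eq_class(x):
    """[x]_R = {y in S | R(x, y)}"""
    return frozenset(y for y in S if R(x, y))

partition = {eq_class(x) for x in S}   # the partition S / R
# Three disjoint classes: {0,3,6}, {1,4,7}, {2,5,8}
```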

The Myhill-Nerode relation \(\equiv_L\) is an equivalence relation on \(V^*\). First, note that it is reflexive, since \(x \equiv_L x \iff \forall z[xz \in L \iff xz \in L]\). It is symmetric, since the condition on membership (\(xz \in L \iff yz \in L\)) is not sensitive to order. Finally, it is transitive, since if \(x \equiv_L y\) and \(y \equiv_L w\) then \(\forall z[xz \in L \iff yz \in L]\) and \(\forall z[yz \in L \iff wz \in L]\), and these last two statements together imply \(\forall z[xz \in L \iff wz \in L]\).

In other words, the Myhill-Nerode relation is an equivalence relation on \(V^*\) and thus partitions \(V^*\) into some number of non-empty, disjoint subsets the union of which contains all of \(V^*\). This set of equivalence classes is denoted \({V^*}/{\equiv_L}\).

Myhill-Nerode Theorem: A language \(L\) is regular if and only if the partition induced by \(\equiv_L\) has a finite number of components.

The intuition behind the Myhill-Nerode Theorem is that we can categorize every string in \(V^*\) by the state that it drives a finite-state automaton \(\mathcal{A}_L\) recognizing \(L\) into.

To prove the theorem, we need to show two things. First, we need to show that for every regular language \(L\) (that is, every language that can be represented with an FSA), the Myhill-Nerode relation partitions \(V^*\) into a finite number of components. To see intuitively why this is the case, observe that if the FSA ends up in the same state after reading \(x\) as after reading \(y\), then \(x \equiv_L y\). Since \(Q_{\mathcal{A}}\) is finite, the number of equivalence classes is also finite.
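This intuition can be made concrete with a small hand-written DFA for \(L_{\le1b}\); the state names and transition table below are illustrative choices, not taken from the text:

```python
# A DFA for L_{<=1b}: state 0 = "no b seen", 1 = "one b seen", 2 = dead.
DELTA = {(0, "a"): 0, (0, "b"): 1,
         (1, "a"): 1, (1, "b"): 2,
         (2, "a"): 2, (2, "b"): 2}

def run(s, start=0):
    """Return the state reached after reading the string s."""
    q = start
    for c in s:
        q = DELTA[(q, c)]
    return q

# Prefixes that reach the same state are Myhill-Nerode equivalent:
run("aaa") == run("a")    # both reach state 0
run("ab") == run("ba")    # both reach state 1
```

Since the machine has only three states, the prefixes fall into at most three equivalence classes, matching the claim that finitely many states force finitely many classes.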

Then we need to show that given a Myhill-Nerode relation \(\equiv_L\) with finitely many classes for the language \(L\), we can construct a finite-state automaton. This direction is harder, but intuitively, we can introduce a state \(q\) for each class in the partition, so long as there are only a finite number of them.
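A sketch of this construction for \(L_{\le1b}\): one state per equivalence class, with the transition \([x] \to [xc]\) on symbol \(c\). The representatives and test suffixes below are chosen by hand for this particular language, and the suffix-based class test is only a finite approximation of \(\equiv_L\):

```python
def in_lang(s):                        # membership test for L_{<=1b}
    return s.count("b") <= 1

reps = ["", "b", "bb"]                 # one representative prefix per class
suffixes = ["", "a", "b", "ab", "bb"]  # hand-picked finite set of test suffixes

def class_of(x):
    """Identify x's class by comparing it against each representative
    on the (finite, hence approximate) set of test suffixes."""
    for r in reps:
        if all(in_lang(x + z) == in_lang(r + z) for z in suffixes):
            return r
    raise ValueError("no matching class")

# delta([x], c) = [xc]; start state [""]; accept the classes inside L
delta = {(r, c): class_of(r + c) for r in reps for c in "ab"}
start = class_of("")
accepting = {r for r in reps if in_lang(r)}
```

The key fact making `delta` well-defined is that \(x \equiv_L y\) implies \(xc \equiv_L yc\) for every symbol \(c\), so the transition does not depend on which representative of the class we pick.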

An Example: \(a^n b^n\)

Consider the set \(L=\{a^n b^n\}\). For any two prefixes \(a^k\) and \(a^l\) with \(k \neq l\), \(a^k \not\equiv_L a^l\), because for the suffix \(z = b^k\) we have \(a^k b^k \in L\) but \(a^l b^k \notin L\). Since there are infinitely many such pairwise-inequivalent prefixes \(a^k\), the number of classes in the partition induced by \(\equiv_L\) cannot be finite.
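The distinguishing-suffix argument can be checked directly (a sketch; the membership test below takes \(n \geq 0\), a modeling choice):

```python
def in_anbn(s):
    """Membership test for L = {a^n b^n : n >= 0}."""
    n = len(s) // 2
    return s == "a" * n + "b" * n

# For k != l, the suffix b^k distinguishes the prefixes a^k and a^l:
k, l = 3, 5
in_anbn("a" * k + "b" * k)   # True:  a^k b^k is in L
in_anbn("a" * l + "b" * k)   # False: a^l b^k is not
```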

In fact, you can show that \(\equiv_L\) induces a partition of \(\{a,b\}^*\) into an infinite number of equivalence classes of three kinds: \(G_k=\{a^k\}\) for each \(k \geq 0\), \(H_k= \{a^{n+k} b^n \mid n \geq 1 \}\) for each \(k \geq 0\), and the single class of all remaining strings, \(E = V^* - \bigcup_{k \geq 0}(G_k \cup H_k) = V^* - \{a^n b^m \mid m \leq n\}\). In other words, the states of whatever machine recognizes \(a^n b^n\) have to be able to count! What kinds of machines do we need to do this?

Using Myhill-Nerode

Is natural language regular? Consider the following set of sentences.

I know that the cats [that the dogs pursue] hunt.

I know that the cats [that the dogs [that the cats [that the dogs pursue] hunt] pursue] hunt.

I know that the cats [that the dogs [that the cats [that the dogs [that the cats [that the dogs pursue] hunt] pursue] hunt] pursue] hunt.

This set of strings has the form

I know (that the cats that the dogs)\(^n\) (pursue hunt)\(^n\)

First, let’s intersect English with the regular language

I know (that the cats that the dogs)\(^i\) (pursue hunt)\(^j\)

This gives back the set of strings described above. Next, let’s apply the following homomorphism.

\[\begin{eqnarray} h(\textit{dogs}) & = &a\\ h(\textit{cats}) & = & a\\ h(\textit{hunt}) & = & b\\ h(\textit{pursue}) & = & b\\ h(x) & = & \epsilon \ \forall\ x \notin \{\textit{dogs},\textit{cats},\textit{hunt},\textit{pursue}\} \end{eqnarray}\]

This gives us back the set \(a^n b^n\), which, as we have already seen, is not a regular language. Since the regular languages are closed under intersection and under homomorphism, if English were regular then \(a^n b^n\) would be too. Thus, English cannot be a regular language.
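As a sanity check, the homomorphism \(h\) can be applied word by word in a few lines (a sketch; the dictionary simply mirrors the definition above):

```python
# The homomorphism h: dogs, cats -> a; hunt, pursue -> b; all other words -> ε
H = {"dogs": "a", "cats": "a", "hunt": "b", "pursue": "b"}

def h(sentence):
    return "".join(H.get(w, "") for w in sentence.split())

h("I know that the cats that the dogs pursue hunt")   # "aabb"
```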

This argument may seem suspect due to the unnaturalness of the center embedded sentences above. In reality, it is not arguments like this that lead us away from finite-state models of natural language syntax, but rather arguments based on parsimony or compression. There are many much more natural phenomena in natural language which can receive far more elegant treatment by the more powerful formalisms that we are about to introduce.

