As emphasized in Classes of Languages and Universal Grammar, we are really interested in the question of defining a modeling framework that captures possible human languages, rather than individual human languages. From this perspective, it is useful to ask what class of languages can be defined by strictly-local grammars and automata.

The strictly \(n\)-local languages over vocabulary \(V\) are those languages for which there exist strictly \(n\)-local grammars.

\[\mathrm{SL}_n = \left\{L \in \mathcal{P}({V^{*}}) \,\middle|\, (\exists \mathcal{G})[L=L(\mathcal{G}) \wedge \mathcal{G}\ \mathrm{is\ strictly}\ n\mathrm{-local} ]\right\}\]

In other words, the class of all SL\(_n\) languages (with respect to alphabet \(V\)) is the class of all strictly \(n\)-local languages.

The strictly local languages are the union of all the strictly \(n\)-local languages.

\[\mathrm{SL} = \bigcup\limits_{n \in \mathbb{N}} \mathrm{SL}_{n}\]

Hierarchies

One of the dual goals of this course is to understand how different kinds of approaches to grammar can define different kinds of patterns in strings or sentences. One of the most useful tools in the toolbox of formal language theory is the notion of a hierarchy of classes of formal languages. A hierarchy of formal languages (or models), is an ordered sequence of classes of formal languages where classes later in the sequence contain classes earlier in the sequence. This means that patterns that can be captured by models lower in the hierarchy can be captured by models higher in the hierarchy, but not vice versa.

The strictly-local languages represent our first example of a hierarchy of such formal languages.

To see this, consider the relationship between \(\mathrm{SL}_n\) and \(\mathrm{SL}_l\) for \(n \neq l\)? In general, we claim that \(\mathrm{SL}_n \subsetneq \mathrm{SL}_{n+i} \forall n > 1, i \geq 1\). In other words,

\[\mathrm{SL}_1 \subsetneq \mathrm{SL}_2 \subsetneq \mathrm{SL}_3 \subsetneq \dots\]

How can we prove this? We need to show two things.

First, we need to show that each class in the hierarchy is contained within the subsequent class. That is, \(\forall n[(\forall i\geq 1)[L \in \mathrm{SL}_n \implies L \in \mathrm{SL}_{n+i}]]\). Then we need to show that each class is proper subset of the next. That is, \(\forall n[(\exists L)[L \in \mathrm{SL}_{n+1} \wedge L \notin \mathrm{SL}_{n}]]\).

Weak Generative Capacity

It is useful at this point to introduce some additional terminology. The generative capacity of a formalism (type of grammar or type of automaton) defined as the sets that that formalism can be used to define. When we specifically care about the string languages definable by a formalism, the term weak generative capacity is used. In this course, we will often examine the weak generative capacity of various formalisms—finding surprising cases where similar formalisms define different formal languages and different formalisms define identical sets of formal languages. Weak generative capacity is often contrasted with strong generative capacity which is typically used to mean the capacity of a formalism to generate certain strings using some particular kind of computation (e.g., using certain computation graphs).


19 Dependencies between Words 21 N-gram Models