We have identified grammaticality as our empirical target phenomenon. Now we would like to build up a set of mathematically precise models of this phenomenon. To do so, we need a mathematical idealization of the concept. We build this idealization on the foundation of set theory.

Set Theory

The basic notion of set theory is the set. A set is just a collection of things, called objects or elements. The most fundamental relationship in set theory is membership: if an object \(o\) is a member of set \(S\), we write \(o \in S\) and also say that \(o\) is an element of \(S\).

We often write sets using bracket notation like this: \(\{ o_1, o_2\}\), which represents a set with two elements, \(o_1\) and \(o_2\). We can also define sets using set-builder notation:

\[\{x \mid x \in \mathbb{N} \wedge x \geq 10 \}\]

Here we use a variable \(x\) which ranges over set elements and put some logical conditions on those elements to the right of the symbol \(\mid\), which is read as “such that.” Thus we read the above notation as “all \(x\) such that \(x\) is a natural number and is greater than or equal to \(10\).”

An important thing about sets is that their elements are unique. That is, they don’t allow copies of the “same” element (it is up to you how to define “sameness,” of course). Since a set is determined entirely by its elements, there is exactly one set containing no elements: the empty set, which is written \(\{\}\) or \(\emptyset\) or \(\varnothing\). If a set contains only one element, we say it is a singleton set. Note that a singleton set containing an element is not the same thing as the element itself: \(\{1\} \neq 1\)!

In general, the size or cardinality of a set \(S\) is written \(|S|\). For instance, \(|\{1,2,3\}| =3\).

If one set \(A\) contains only elements that are also contained in another set \(B\), we say that \(A\) is a subset of \(B\) and write \(A \subseteq B\). If it is also the case that \(A \neq B\) (that is, there is at least one element of \(B\) that is not in \(A\)), then we say that \(A\) is a proper subset of \(B\) and write \(A \subset B\).
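
For instance, \(\{1,2\} \subset \{1,2,3\}\). Note that every set is a subset of itself, \(S \subseteq S\), though never a proper subset of itself.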

Set theory has a number of other operations on sets which will be important to our model building below. There are several binary operators on pairs of sets:

- The union of \(A\) and \(B\), written \(A \cup B\), is the set \(\{x \mid x \in A \vee x \in B\}\) of all elements in \(A\) or \(B\) (or both).
- The intersection of \(A\) and \(B\), written \(A \cap B\), is the set \(\{x \mid x \in A \wedge x \in B\}\) of all elements in both \(A\) and \(B\).
- The difference of \(A\) and \(B\), written \(A \setminus B\), is the set \(\{x \mid x \in A \wedge x \notin B\}\) of all elements of \(A\) that are not in \(B\).

There is also an important unary operator: the power set of a set \(A\), written \(\mathcal{P}(A)\), is the set of all subsets of \(A\), i.e., \(\mathcal{P}(A) = \{S \mid S \subseteq A\}\).

Note that any set has the empty set \(\emptyset\) as a subset, so the power set of any set always includes the empty set as an element.
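
For example, if \(A = \{1,2\}\) and \(B = \{2,3\}\), then \(A \cup B = \{1,2,3\}\), \(A \cap B = \{2\}\), \(A \setminus B = \{1\}\), and \(\mathcal{P}(A) = \{\emptyset, \{1\}, \{2\}, \{1,2\}\}\).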

Besides sets and elements of sets, another important kind of mathematical object in set theory is the tuple. A tuple is an ordered sequence of objects. We will write them as follows: \(\langle o_1, o_2 \rangle\). This tuple is an ordered pair, since it has two elements. Longer tuples are usually known as \(n\)-tuples, with \(n\) referring to the arity of the tuple.

An operation in set theory for forming tuples is the Cartesian product: the Cartesian product of two sets \(A\) and \(B\), written \(A \times B\), is the set of all ordered pairs whose first element comes from \(A\) and whose second element comes from \(B\), i.e., \(A \times B = \{\langle a, b \rangle \mid a \in A \wedge b \in B\}\).

We can also iterate the Cartesian product to generate all the \(n\)-tuples of the product of \(n\) sets \(A_1 \times \dots \times A_n\). When the Cartesian product is extended to \(n\) places on a single set, we often write \(A^n\).
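
For example, if \(A = \{1,2\}\) and \(B = \{a,b\}\), then \(A \times B = \{\langle 1,a \rangle, \langle 1,b \rangle, \langle 2,a \rangle, \langle 2,b \rangle\}\), and \(A^2 = A \times A = \{\langle 1,1 \rangle, \langle 1,2 \rangle, \langle 2,1 \rangle, \langle 2,2 \rangle\}\).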

This leads us to our final kinds of objects in set theory: relations and functions. A (two-place) relation on sets \(A\) and \(B\) is a set \(R \subseteq (A \times B)\). In other words, it is just a subset of the set of all possible pairs of elements from the two sets. An \(n\)-ary relation \(R\) on a sequence of sets \(A_i\), \(i=1,\dots,n\), is a subset of their Cartesian product, i.e., \(R \subseteq A_1 \times A_2 \times \dots \times A_n\). Often we talk about relations defined on a set with itself, for example, \(R \subseteq (A \times A)\).

A function is a relation where each element in the first position is associated with only a single element in the second: \(\langle x, y \rangle \in R \implies \langle x, y^\prime \rangle \notin R\) for all \(y^\prime \neq y\). That is, a function is a relation where each input is associated with a single output. The set from which the first element of a function is taken is called the domain of the function, and the set into which each element of the domain is mapped is called the codomain of the function. We often write functions as \(f \colon X \to Y\) when we want to specify the domain and codomain of the function. A function is partial if it is only defined for some elements of its domain, and total if it is defined for all elements of its domain. The image of a function \(f\) refers to the subset of elements \(y\) of the codomain \(Y\) for which there exists some element \(x\) of the domain such that \(f(x) = y\). The preimage of the codomain \(Y\) is the subset of the domain on which the function is defined.

A function whose image is equal to its codomain is called surjective or onto. A function for which every element of the codomain is the image of at most one element of the domain is called injective, or one-to-one. A function which is both surjective and injective is called bijective or invertible.
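
For example, the function \(f \colon \{1,2,3\} \to \{a,b\}\) with \(f(1)=a\), \(f(2)=a\), and \(f(3)=b\) is surjective, since every element of the codomain is in the image, but it is not injective, since \(a\) is the image of both \(1\) and \(2\).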

One special and important kind of function is the characteristic function or indicator function of some set \(A \subseteq B\), written \(\mathbf{1}_A \colon B \to \{ \top, \bot \}\). The characteristic function \(\mathbf{1}_A\) assigns true (\(\top\)) to every element of \(B\) that is also in \(A\) and false (\(\bot\)) to every element of \(B\) that is not in \(A\).
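
For example, if \(B = \{1,2,3\}\) and \(A = \{1,2\}\), then \(\mathbf{1}_A(1) = \top\), \(\mathbf{1}_A(2) = \top\), and \(\mathbf{1}_A(3) = \bot\).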

Lexica and Strings

To begin our idealized model of grammaticality, let’s assume that we have a fixed, finite set of words in our language. We will write this set \(V\), which stands for the vocabulary of our language. This set is sometimes called the lexicon—the technical term linguists use for the inventory of words (or morphemes) in a language. In formal language theory, this primitive set of symbols is often also called the alphabet and is typically written \(\Sigma\). But since we are focused on sentence structure in this course, we will use the term vocabulary and the symbol \(V\) for this set. As an example, we might have a small vocabulary consisting of only the three words John, loves, and Mary, that is, \(V = \{\mathrm{John}, \mathrm{loves}, \mathrm{Mary}\}\). When we don’t care much about the actual members of \(V\) and just want to illustrate a formal point, we will use symbols like \(a\) and \(b\) instead of real words.

To begin studying possible and impossible sentences, we will need some notion of a sequence of words. In formal language theory, this is called a string and is any finite sequence of symbols from \(V\). For simplicity in the following examples, let \(V = \{a,b\}\). Some strings over this vocabulary are \(a\), \(aaaab\), \(ababababa\), etc. Note that our notation (which is standard in formal language theory) is ambiguous in the case of single symbols: the symbol \(a\) can refer both to the single alphabet symbol \(a\) and to the string containing just that symbol. Most of the time this notation doesn’t lead to any confusion, but if we need to, we can use two special symbols, called the start-string symbol \(\rtimes\) and the end-string symbol \(\ltimes\), like this: \(\rtimes a \ltimes\). We will sometimes use variables \(x, y, z, \dots\) to refer to strings over the vocabulary \(V\), like this: \(x = abbabab\).

The length of a string \(x\), written \(|x|\), is the number of symbols in the string. For example, \(|aba|=3\). There is a unique string, called the null string or the empty string, that has length \(0\); it is usually written \(\epsilon\), i.e., \(|\epsilon|=0\), but could also be written \(\rtimes\ltimes\).

The most basic operation that we can perform on strings is concatenation. If we have two string variables \(x\) and \(y\) we write the concatenation like \(x \cdot y\), or sometimes just \(xy\). For instance, \(ab\cdot ba=abba\).

Concatenation is associative, which means it doesn’t matter how you group successive concatenations, i.e., \((xy)z=x(yz)\). Note, however, that it is not commutative: in general, \(xy \neq yx\). The null string is an identity for the operation, which means concatenating the empty string onto anything gives you back the same string: \(x \cdot \epsilon=\epsilon \cdot x=x\). The length of a concatenated pair of strings is the sum of the lengths of each: \(|xy|=|x| + |y|\).

A string \(x\) of length \(l\) is similar to an \(l\)-tuple of symbols, and throughout the course we will treat these two data structures as the same. By analogy with the Cartesian product, we define the concatenation of two sets of strings \(L\) and \(L^\prime\) as \((L \cdot L^\prime) =\{xy \mid x \in L \land y \in L^\prime \}\). When we have a set of symbols \(V\), we write \(V^n\) for the set of all possible strings of length \(n\) over this vocabulary. Two special cases are worth mentioning. First, \(V^0=\{\epsilon\}\). In other words, the set of all length-\(0\) strings over a vocabulary is always just the set containing the empty string. Second, \(V^1\) is just the set of all length-one strings, containing all of the vocabulary items. For example, if \(V\) were equal to \(\{a,b\}\), then \(V^1 =\{\rtimes a \ltimes, \rtimes b \ltimes\}\) and \(V^2 =\{\rtimes aa \ltimes, \rtimes ab \ltimes, \rtimes ba \ltimes, \rtimes bb \ltimes\}\).

The set of all possible strings of any length over a vocabulary \(V\) is written \(V^*\); the star symbol here is known as the Kleene star. Thus, \(\{a\}^*=\{\epsilon,a,aa,aaa,aaaa,...\}\) and \(\{a,b\}^*=\{\epsilon,a,b,aa,ab,ba,bb,aaa,...\}\). Note that there are an infinite number of strings in \(V^*\) whenever \(V\) is nonempty.

We abuse notation slightly and also write \(x^n\) for \(n\) concatenations of the string \(x\). So if \(x=ab\), then \(x^3=ababab\), \(x^1=ab\), and \(x^0=\epsilon\). Similarly, we will use \(x^*\) to mean the set of strings consisting of \(0\) or more copies of the string \(x\). So \((ab)^* = \{\epsilon, ab, abab, ababab, \dots\}\).

A prefix of a string \(x\) is an initial substring of \(x\). For instance, the string \(ab\) is a prefix of the string \(abababab\). Note that we consider the empty string a prefix of every other string and every string is a prefix of itself.

A suffix of a string \(x\) is a final substring of \(x\). For instance, the string \(ab\) is also a suffix of the string \(abababab\). We consider the empty string a suffix of every string, and every string is a suffix of itself as well.

Representing Formal Strings

We have now defined a simple idealization of a sentence as a string of words (or symbols). How can we represent this in LISP? In this course, we will use symbols to represent the atomic symbols in \(V\). As our notation has hopefully made clear, a natural representation for formal strings is the list datatype.

(def my-string '(a b b a))
my-string

Note that we will not use Clojure’s string datatype for the strings of our formal languages.

Concatenation

We have already seen how we can build up lists by using cons together with the empty list '(), which represents the null string. When using cons to build a string of symbols, we pass in two arguments: a symbol and another string.

(cons 'a my-string)

Can we use cons to define concatenation? What happens if we cons two strings together?

(cons '(a b) '(b a))

We have ended up with the list ((a b) b a). This is a list with another list (a b) in the first position, rather than a single list with the elements of both! This is not the desired behavior for an implementation of concatenate. What happened? Recall that cons always adds its first argument as a single element at the front of its second argument; it does not splice the two lists together.

Clojure has a primitive operation called concat, which takes two lists and puts them together into a single list; we will use it as our implementation of concatenation.

(concat '(a b) '(b a))
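
To see what concat is doing for us, here is a minimal sketch of how concatenation could be defined by hand using cons and recursion (the name my-concat is our own, not a Clojure built-in):

(defn my-concat [x y]
  (if (empty? x)
      y                                    ;; nothing left in x: the result is just y
      (cons (first x)                      ;; otherwise, put the first symbol of x
            (my-concat (rest x) y))))      ;; onto the concatenation of the rest with y

(my-concat '(a b) '(b a))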

We can also define a predicate prefix? which tests whether one string is a prefix of another.

(defn prefix? [pr str]
  (if (> (count pr) (count str))
      false
      (if (empty? pr)
          true
          (if (= (first pr) (first str))
              (prefix? (rest pr) (rest str))
              false))))

(prefix? '(a b c) '(a b))

A nicer way to write conditionals with multiple cases is with the cond special form.

(defn prefix? [pr str]
  (cond
   (> (count pr) (count str)) false
   (empty? pr) true
   (= (first pr) (first str)) (prefix? (rest pr) (rest str))
   :else false))

(prefix? '(a b c) '(a b c))

This special form has the following format, where each condition is a boolean-valued expression and each consequent is the expression to evaluate just in case that condition holds.

(cond 
  condition1 consequent1
  condition2 consequent2 
  ... 
  :else alternative)

The return value of cond is the consequent corresponding to the first condition which evaluates to true—or the alternative, if none of the conditions are met.
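
We can also define a suffix? predicate corresponding to our earlier definition of a suffix. One possible sketch (this definition is our own, not a Clojure built-in) reverses both strings and reuses prefix?, since a suffix of a string is exactly a prefix of its reversal:

(defn suffix? [sf str]
  (prefix? (reverse sf) (reverse str)))

(suffix? '(a b) '(a b a b))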

Formal Languages

So far we have defined the formal notion of a string as a model of word sequences in natural language. Our initial goal, however, was to be able to distinguish between grammatical strings, like colorless green ideas sleep furiously, and ungrammatical ones, like *furiously sleep green ideas colorless. How can we capture this distinction?

One simple way to distinguish between well-formed strings and ill-formed strings is to say that the well-formed strings constitute a set, called a formal language, \(L\). Another word that we will use for a formal language is stringset. Recall that \(V^*\) is the set of all possible strings over some vocabulary \(V\); thus, a formal language is simply a subset of \(V^*\). For any language or pair of languages, the set-theoretic operations that we discussed above apply.

In the case of formal languages, we often think about the language specifically with reference to \(V^*\). Thus, \(\mathord{\sim} L\) is the complement in \(V^*\) of \(L\), that is, \(V^* \setminus L\), the set of all possible strings over our lexicon \(V\) that are not in \(L\). Since there are an infinite number of strings in \(V^*\), this set may not be easy to compute.

The concatenation of two formal languages, \(L \cdot L'\), was defined above: Similar to a Cartesian product, the concatenation of two formal languages takes every string in the first language and concatenates it to every string in the second language. That is, it is the set \(\{xy \mid x \in L \wedge y \in L' \}\). Note that if there are \(5\) strings in \(L\) and \(10\) strings in \(L'\) there will be at most 50 strings in \(L \cdot L'\) (there will be fewer than 50 if there are strings that can be built in more than one way).
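
When both languages are finite, we can compute their concatenation directly from this definition. Here is a minimal sketch in Clojure (the name lang-concat, and the representation of a language as a Clojure set of lists, are our own choices):

(defn lang-concat [L L']
  (set (for [x L, y L'] (concat x y))))

(lang-concat #{'(a) '(a b)} #{'(b)})   ;; => #{(a b) (a b b)}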

We write \(L^n\) for \(n\) concatenations of the language \(L\) with itself and \(L^*\) for the union of \(L^n\) over all \(n \geq 0\). This is again a slight abuse of notation, since we have also defined \(L^n\) to mean the set of \(n\)-tuples of elements of \(L\). Usually the intended meaning will be clear in context, and when it is not, we will be careful to specify what we mean.
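
For example, if \(L = \{a, bb\}\), then \(L^2 = \{aa, abb, bba, bbbb\}\), and \(L^*\) contains \(\epsilon\) together with every string that can be built by concatenating \(a\)s and \(bb\)s in any order.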

Language Homomorphisms

A language homomorphism \(h\) is a function \(h \colon V_1^* \rightarrow V_2^*\) such that \(h(xy)=h(x)h(y)\) for all \(x,y \in V_1^*\). Intuitively, this property means that the function \(h\) is structure preserving: if a string in \(V_1^*\) is made up of \(x\) concatenated with \(y\) and is mapped to some string \(z \in V_2^*\), then the images of \(x\) and \(y\), when concatenated, also equal \(z\).

This property can be captured in the following diagram:

\[\matrix{ V_1^*\times V_1^* &\color{blue}{\xrightarrow{\color{black}{ h\times h}}} & V_2^*\times V_2^* \cr \phantom{\small\mathrm{concat}}\color{red}{\downarrow}{\small\mathrm{concat}} & & \phantom{\small\mathrm{concat}}\color{blue}{\downarrow}{\small\mathrm{concat}} \cr V_1^* & \color{red}{\xrightarrow{\color{black}{h}}} & V_2^* \cr }\]

where \(h\times h\) is just the function which applies \(h\) to each coordinate of a pair: \((h\times h)(\langle x,y\rangle) = \langle h(x), h(y) \rangle\).

This diagram is meant to illustrate that if you start with any pair of strings in \(V_1^*\times V_1^*\) (upper left), and first concatenate them and then apply \(h\) to the result (the red path), you’ll get the same exact result as if you had instead first applied \(h\) to each of them separately (getting a pair of elements of \(V_2^*\)), and then concatenated the results (blue path). This is what it means to say the diagram commutes, and this fact defines the homomorphism property.

Note that for all homomorphisms, \(h(\epsilon)=\epsilon\) follows from the above definition (since \(h(\epsilon) = h(\epsilon\epsilon) = h(\epsilon)h(\epsilon)\), which forces \(h(\epsilon) = \epsilon\)). That is, any homomorphism will map the null string to itself (however, other strings can also be mapped to \(\epsilon\)). The definition also implies that a homomorphism is completely determined by the string in \(V_2^*\) that it assigns to each individual symbol in \(V_1\). For example, if \(V_1=\{a,b\}\), and \(h\) takes \(a \mapsto ccc\) and \(b \mapsto d\), then \(h(abba) = cccddccc\). Finally, it can be shown that any map from the symbols of \(V_1\) to strings in \(V_2^*\) extends to a unique homomorphism. Thus, we can define a homomorphism by specifying such a map.
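
Since a homomorphism is determined by such a map, we can represent one in Clojure as a map from symbols to strings and apply it with mapcat. A minimal sketch (the names h and apply-hom are our own):

(def h '{a (c c c), b (d)})

(defn apply-hom [h s]
  (mapcat h s))

(apply-hom h '(a b b a))   ;; => (c c c d d c c c)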

Examples of Formal Languages

There are two different ways we may describe a formal language. We might simply specify the set of strings directly, perhaps by listing all of the strings in it; this is an extensional definition. On the other hand, we might find a property that is true of all the strings in the set and of none outside it; that is, we describe the conditions for membership, perhaps with a computer program. This is an intensional definition.
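
As an illustration of an intensional definition, we can describe a language by a membership predicate. Here is a sketch for the language \(\{a\}^*\) discussed above (the name all-as? is our own):

(defn all-as? [str]
  (every? #(= % 'a) str))

(all-as? '(a a a))   ;; => true
(all-as? '(a b a))   ;; => false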

Some important formal languages we will encounter in this course include the following:

We will see that many of these formal languages have special properties related to natural language.

