This is a course on the foundations of computational linguistics.

Goals

Most courses taught on computational linguistics or natural language processing tend to be fairly advanced survey courses, assuming a large amount of knowledge of computer science and covering a variety of somewhat unrelated modeling techniques, problem domains, and tasks. This course takes a different approach, seeking to build up a small set of conceptual tools and empirical methods from the ground up.

Some specific goals of the course include:

Content

The modern study of language originated in the 1950s when tools from the recently-emerged theory of computation started to be applied to the problem of language. One important early result in this area was the Chomsky Hierarchy which gave a containment hierarchy of phrase-structure grammars or grammars built out of rewrite rules like the following:

\[\texttt{S} \longrightarrow \texttt{NP}\ \texttt{VP}\]

Chomsky showed that restricting the form of these rules into four classes (left-linear, context-free, context-sensitive, and unrestricted) gave rise to classes of formal languages which were strictly nested.

\[\mathcal{L}_{\texttt{LL}} \subset \mathcal{L}_{\texttt{CF}} \subset \mathcal{L}_{\texttt{CS}} \subset \mathcal{L}_{\texttt{UR}}\]

The Chomsky hierarchy remains a foundational result in theoretical computer science, and forms a basic entry point for studying formal grammars in linguistics. The conceptual tools underlying the hierarchy are useful for understanding even modern systems such as recurrent or recursive neural networks.

In COMPLING 445/645 we will study the Chomsky Hierarchy up to the left-linear languages. We will cover refinements of this hierarchy and how to work with it extensions using probability. Along the way, we will introduce the foudnations of probability theory and Bayesian inference.

What this course is not

What you need: curiosity, ambition, fearlessness, and ability to use Google to find answers to technical problems.

What is the difference betwen COMP 550, COMP/LING 445/645, and COMP/LING 596/483/682?

These three courses cover somewhat overlapping but mostly complementary material, with differences in assumed background knowledge and focus. While some topics will appear in all courses, the emphasis will be different enough that you can fruitfully take all three courses (if you are an undergraduate student).

COMP 550 assumes more computational background, including deep familiarity with probability and algorithms. COMP 550 focuses on technological perspectives of natural language processing as a subdiscipline of artificial intelligence. I includes a strong machine learning component. It covers computational semantics, discourse, and applications such as automatic summarization and machine translation, which COMP/LING 445 does not. COMP/LING 445 would be a valuable class to have taken before 550.

COMP/LING 445 goes in depth into the fundamental and formal mathematical and linguistic principles that underlie modern computational linguistics, with an emphasis on applications in linguistic analysis. It draws connections to the theory of computation (automata theory), and rigorously derives some of models that form the basic analytical toolbox of computational linguistics, which COMP 550 does not. The tools taught in this course can serve as the foundation for later coursework in computational linguistics and NLP.

COMP/LING 596/483 is a hands-on introduction to working with linguistic data. It assumes less programming and mathematical background than either COMP 550 or COMP/LING 445. It is the least advanced and most applied of the three courses. It’s focus will be on the data science of language and as such it will make use of both engineering examples as well as scientific examples, like working with data derived from psycholinguistic experiments.

Programming

We will use Clojure, a LISP based on the \(\lambda\)-calculus. Clojure has several advantages.

Resources

General LISP and Clojure learning resources:


2 Models of Computation