# 1 Introduction

This is a course on the foundations of computational linguistics.

# Goals

Most courses taught on computational linguistics or natural language
processing tend to be fairly advanced *survey* courses, assuming a
large amount of knowledge of computer science and covering a variety
of somewhat unrelated modeling techniques, problem domains, and
tasks. This course takes a different approach, seeking to build up a
small set of conceptual tools and empirical methods from the ground
up.

Some specific goals of the course include:

- Learning foundational conceptual tools needed to understand computational models of linguistic structure.
- Understanding the notion of a model of computation and how it can be applied to understand aspects of linguistic structure.
- Understanding how to develop empirical evaluations of such models.
- Learning a bunch of specific, useful skills such as programming in ClojureScript.

# Content

The modern study of language originated in the 1950s when tools from
the recently-emerged theory of computation started to be applied to
the problem of language. One important early result in this area was
the *Chomsky Hierarchy* which gave a containment hierarchy of
*phrase-structure grammars* or grammars built out of *rewrite rules*
like the following:

Chomsky showed that restricting the form of these rules into four
classes (*left-linear*, *context-free*, *context-sensitive*, and
*unrestricted*) gave rise to classes of formal languages which were
strictly nested.

The Chomsky hierarchy remains a foundational result in theoretical computer science, and forms a basic entry point for studying formal grammars in linguistics. The conceptual tools underlying the hierarchy are useful for understanding even modern systems such as recurrent or recursive neural networks.

In COMPLING 445/645 we will study the Chomsky Hierarchy up to the
*left-linear* languages. We will cover refinements of this hierarchy
and how to work with it extensions using probability. Along the way,
we will introduce the foudnations of probability theory and Bayesian
inference.

# What this course is not

- A survey of current approaches in NLP (see COMP 550 below).
- A course about some specific software (e.g., Python libraries).
- A course that expects a lot of background knowledge (we will try to build everything up from first concepts, but it will be fast).

What you need: curiosity, ambition, fearlessness, and ability to use Google to find answers to technical problems.

# What is the difference betwen COMP 550, COMP/LING 445/645, and COMP/LING 596/483/682?

These three courses cover somewhat overlapping but mostly complementary material, with differences in assumed background knowledge and focus. While some topics will appear in all courses, the emphasis will be different enough that you can fruitfully take all three courses (if you are an undergraduate student).

COMP 550 assumes more computational background, including deep familiarity with probability and algorithms. COMP 550 focuses on technological perspectives of natural language processing as a subdiscipline of artificial intelligence. I includes a strong machine learning component. It covers computational semantics, discourse, and applications such as automatic summarization and machine translation, which COMP/LING 445 does not. COMP/LING 445 would be a valuable class to have taken before 550.

COMP/LING 445 goes in depth into the fundamental and formal mathematical and linguistic principles that underlie modern computational linguistics, with an emphasis on applications in linguistic analysis. It draws connections to the theory of computation (automata theory), and rigorously derives some of models that form the basic analytical toolbox of computational linguistics, which COMP 550 does not. The tools taught in this course can serve as the foundation for later coursework in computational linguistics and NLP.

COMP/LING 596/483 is a hands-on introduction to working with linguistic data. It assumes less programming and mathematical background than either COMP 550 or COMP/LING 445. It is the least advanced and most applied of the three courses. It’s focus will be on the data science of language and as such it will make use of both engineering examples as well as scientific examples, like working with data derived from psycholinguistic experiments.

# Programming

We will use Clojure, a LISP based on the \(\lambda\)-calculus. Clojure has several advantages.

- It is functional
- It is simple to learn, but powerful.
- It is increasingly important (especially thanks to ClojureScript).
- LISP and the \(\lambda\)-calculus are pervasive in linguistics and have been used frequently in the history of AI.
- It is related to a number of
*probabilistic programming languages*. - It allows metalinguistic abstraction.

# Resources

General LISP and Clojure learning resources:

- SICP — this is for MIT Scheme, but still one of the best programming books ever written.
- SICP in Clojure
- Clojure
- ClojureScript