Formal Languages and Chomsky Hierarchy (Alan Du, 3/19/13)

This was a pretty information dense lecture, which was more theoretical computer science oriented. Sorry that it was pretty disjoint – I was pretty tired.

TOPICS (thanks to Megan Chao): Formal language (subset of *), alphabets (), strings, strings as functions (length, position, concatenation, reversal, kleene operators), grammars (generative vs. analytical), finite state acceptors (FSAs), production rules, formal grammar (, N, P ,S), Chomsky-Schützenberger Hierarchy of formal grammars (from most general to least: type 0- recursively enumerable, type 1- context sensitive, type 2- context-free, type 3- regular), the Church-Turing thesis, lambda calculus, some basic LISP, and Peano axioms.

Powerpoint available (also in Google Drive Resources collection).

More detailed notes:

The Alphabet
We define an alphabet as an nonempty finite set of symbols, usually denoted with Σ. A couple of really common examples are the Roman alphabet (i.e. {a, b, c, d, …, z}) and the binary alphabet ({0, 1}). An important note is that the alphabet does not have to be made of letters. For example, with syntax, it’s usually more convenient to think of the alphabet of a list of words, rather than a list of letters.

A string is a sequence of symbols from an alphabet. We define concatenation, represented with the ○ symbol. We also define the Kleene operators, which are the sets of repeated strings, along with the empty string, ϵ.

Formal Languages and Grammars
We define a formal language as any subset of Σ* (the set of all strings). The rule(s) which determine whether a string is in this set is called the grammar. We talked briefly about analytic vs generative grammars. In a nutshell, an analytic grammar is like a function that takes a string and returns either grammatical or ungrammatical. Grammatical strings are included in the language, and ungrammatical ones are excluded.

Generative grammars are rules that allow you to generate all possible strings in the language. Generative grammars are usually composed of rewrite rules (A → B). Essentially, we replace all occurrences of A w/ B, where A and B are two arbitrary strings. We’ve seen these before in our syntactic trees and X-bar theory.

Chomsky Hierarchy
We talked briefly about the Chomsky-Schützenberger hierarchy, invented by Noam Chomsky and Marcel-Paul Schützenberger. It essentially is a broad classification for types of languages. I also enumerated, without proof, lots of different definitions of each language type. Because there are so many different definitions, we have some intuition that these language types are somehow important. For more info, check this or the powerpoint out.
Regular ⊂ Context-Free ⊂ Context-Sensitive ⊂ Recursively Enumerable.

We also branched out into a lot of theoretical computer science. We talked about the Church-Turing thesis, and showed that some problems are undecidable, and hence uncomputable (see the Halting problem). I briefly went over lambda calculus and defining arithmetic based on LISP list operations and counting empty lists. This is not how arithmetic is actually defined in LISP, and we glossed over lots of the details (e.g. dotted pairs). For more details about the original LISP, see here. I also briefly went over Peano’s axioms, which define the natural numbers, and how arithmetic could be defined based on the successor function. It seems that only Functions kids did the monsters in the water, so be psyched for that in Logic.

About Alan Du

I'm one of the founders and co-presidents of this club. I also maintain this website. My main interests are all about cognition and intelligence. The idea that a bunch of atoms can combine and form something self-aware is absolutely fascinating. Linguistically, I'm interesting in integrating theoretical syntax with NLP, grammar inference, figuring out how the brain processes language, and creating a program with true artificial language capacities.

Leave a Reply

Your email address will not be published. Required fields are marked *