Here’s a brief summary of the lecture I gave yesterday to the incoming freshmen. Slides can be found here.
We started by looking at question formation. We assumed there is some operation that transforms a declarative sentence into the corresponding yes/no question. For example:
1 a) John can sing.
1 b) Can John sing?
2 a) Mary is singing.
2 b) Is Mary singing?
3 a) They are sleeping.
3 b) Are they sleeping?
In all of these cases (and in fact, every case with auxiliaries), we move the auxiliary verb in front of the subject. But what happens when we have two auxiliaries? Well, then we can choose to move either the first auxiliary or the second auxiliary:
4 a) He is saying that she should sleep.
4 b) Is he saying that she should sleep?
*4 c) Should he is saying that she sleep?
5 a) I will get the books that are about computers.
5 b) Will I get the books that are about computers?
*5 c) Are I will get the books that about computers?
In these examples, and most examples in English, you move the first auxiliary to make a question. If you move the wrong auxiliary (marked with *), you get word salad: total nonsense. But what about these examples?
6 a) The dog that is sleeping will run.
*6 b) Is the dog that sleeping will run?
6 c) Will the dog that is sleeping run?
7 a) The fact that I am sleeping should surprise you.
*7 b) Am the fact that I sleeping should surprise you?
7 c) Should the fact that I am sleeping surprise you?
In these examples, we need to move the second auxiliary. So that means choosing which auxiliary to move must involve something more than just “pick the first auxiliary” or “pick the second auxiliary”.
To answer what kind of rule question formation needs, we need to turn to constituency. The two types of sentences we saw have structures like this:
(If you need a briefing on syntax trees, check out these notes from last year.)
From these syntax trees, it becomes clear that we need to choose the auxiliary that’s closest to the subject DP.
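To make the contrast concrete, here’s a minimal sketch (my own toy encoding, not from the slides): for the linear rule a sentence is just a flat word list, and for the structural rule it’s a (subject, auxiliary, rest) triple standing in for the tree.

```python
# Toy contrast between a linear rule ("front the first auxiliary")
# and the structure-dependent rule ("front the auxiliary closest to
# the subject DP", i.e. the main-clause auxiliary).

AUX = {"is", "are", "am", "will", "can", "should"}

def linear_question(words):
    """Linear rule: move the leftmost auxiliary to the front."""
    i = next(k for k, w in enumerate(words) if w in AUX)
    return [words[i]] + words[:i] + words[i + 1:]

def structural_question(tree):
    """Structural rule: front the auxiliary sister to the subject.
    tree = (subject_words, aux, rest_words)."""
    subject, aux, rest = tree
    return [aux] + subject + rest

# Example 6: "The dog that is sleeping will run."
words = "the dog that is sleeping will run".split()
tree = (["the", "dog", "that", "is", "sleeping"], "will", ["run"])

print(" ".join(linear_question(words)))
# -> is the dog that sleeping will run   (the word salad of *6b)
print(" ".join(structural_question(tree)))
# -> will the dog that is sleeping run   (the correct 6c)
```

On simple sentences like 1–5 the two rules happen to agree; examples 6 and 7 are exactly the cases that pull them apart.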
What we’re basically arguing is that English has hierarchical structure, rather than a strictly linear one. The constraints on question formation give us a pretty good argument for English, but what about other languages? Is hierarchical structure a natural law governing all language (i.e. a principle), or just something that happens to be true of English (i.e. a parameter)?
One way to answer this would be to look at every language in the world (about 7,000 living ones). But no one’s done a syntactic analysis of all of these languages; there’s just not enough time, and we don’t even know exactly how many languages there are. Instead, we’re going to use babies as our evidence. If babies consistently exhibit a pattern they couldn’t have learned from their input, we can be pretty sure it’s built-in.
In 1987, Crain and Nakayama published a paper called Structure Dependence in Grammar Formation. In their paper, they looked at baby corpora (large collections of transcribed speech to and by small children). They found two key things. First, babies almost never hear sentences like 8a, because people tend to simplify their sentence structure when talking to babies. Second, babies nevertheless never form questions like 8b.
8 a) The dog that is sleeping will run.
*8 b) Is the dog that sleeping will run?
8 c) Will the dog that is sleeping run?
Because the babies had almost never heard sentences like 8a, they had no reason not to just move the first auxiliary. But they always moved the second auxiliary in these types of sentences.
This is a type of poverty of the stimulus argument. A POS argument goes roughly like this: there’s some linguistic phenomenon (here, islands: bits of the sentence that can’t be moved to the front) that requires some kind of knowledge (hierarchical structure). Babies demonstrate the knowledge, despite having no way of actually learning it. The conclusion is that the knowledge is somehow innate to the baby; it’s “built in” to the universal grammar.
We now have evidence that all languages have hierarchical grammar. But why? Why should we have hierarchical grammar, rather than the (arguably simpler) linear grammar?
To answer that, we’re going to argue that hierarchical grammar is actually simpler than linear grammar. And because it’s simpler, a hierarchical grammar is more likely to evolve (via natural selection) than a linear one. Occam’s Razor is doubly important in linguistics because language is only ~100,000 years old (estimates range from 50,000 to 400,000 years). 100,000 years isn’t enough time to evolve a very complex system, so the language faculty must be as simple as possible.
To do this, we’re going to have to look back at rewrite rules. Rewrite rules are a way of describing all possible linguistic structures. If we take the set of hierarchical rewrite rules and the set of linear rewrite rules that each completely describe English, we could use Kolmogorov complexity to see which grammar (set of rules) is simpler. Unfortunately, Kolmogorov complexity is uncomputable. So we’ll approximate it with the Minimum Description Length (MDL): essentially, we’ll just count the rewrite rules. The grammar with the fewest rewrite rules is simpler.
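Here’s a toy illustration of that approximation (my own sketch; the rule fragments are illustrative, not a real analysis of English):

```python
# MDL approximation: a grammar's "size" is just its number of rewrite
# rules. The hierarchical fragment reuses rules (embedding goes through
# CP), so relative clauses of any depth cost nothing extra.

hierarchical = {
    "S  -> DP T'",
    "T' -> Aux VP",
    "DP -> D N",
    "DP -> D N CP",
    "CP -> C T'",
    "VP -> V",
}

# A linear grammar has to spell out each surface word-order pattern
# as its own flat rule, and each extra level of embedding adds more.
linear = {
    "S -> D N Aux V",
    "S -> Aux D N V",
    "S -> D N C Aux V Aux V",
    "S -> Aux D N C Aux V V",
    "S -> D N C Aux V C Aux V Aux V",
    "S -> Aux D N C Aux V C Aux V V",
    "S -> D N Aux V C D N Aux V",
    "S -> Aux D N V C D N Aux V",
}  # ...and a new rule for every further pattern

def mdl(grammar):
    """Approximate minimum description length: count the rules."""
    return len(grammar)

print(mdl(hierarchical), mdl(linear))  # -> 6 8
```

The gap only widens as you cover more of the language: the hierarchical set stays fixed while the linear set keeps growing.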
There’s a program that does this kind of inference, called the Linguistica project. But minimum description length grammatical inference (recovering the rewrite rules) is provably NP-hard, so for any reasonably sized set of sentences it would take almost literally forever to find the MDL grammar.
Instead, we’ll turn to the Chomsky hierarchy. By definition, linear grammars are regular, and hierarchical grammars are context-free. And because regular grammars are a subset of context-free grammars, we can prove that the MDL hierarchical grammar will be at least as simple as the MDL linear grammar.
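That subset argument fits in one line. Writing $|G|$ for the number of rules in a grammar $G$ and minimizing over grammars that generate English:

```latex
\min_{\substack{G \text{ context-free} \\ L(G) = \text{English}}} |G|
\;\le\;
\min_{\substack{G \text{ regular} \\ L(G) = \text{English}}} |G|
```

Since every regular grammar is already context-free, the left-hand minimum ranges over a superset of candidate grammars, so it can only be smaller or equal.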
As a side note, we can also prove that no regular grammar can describe English: center-embedded sentences (“the books that the students that the professor taught read were long”) create nested, aⁿbⁿ-style dependencies between subjects and their verbs, and the pumping lemma shows that no regular language can capture those.