Monthly Archives: March 2013

Formal Languages and Chomsky Hierarchy (Alan Du, 3/19/13)

This was a pretty information dense lecture, which was more theoretical computer science oriented. Sorry that it was pretty disjoint – I was pretty tired.

TOPICS (thanks to Megan Chao): Formal language (subset of *), alphabets (), strings, strings as functions (length, position, concatenation, reversal, kleene operators), grammars (generative vs. analytical), finite state acceptors (FSAs), production rules, formal grammar (, N, P ,S), Chomsky-Schützenberger Hierarchy of formal grammars (from most general to least: type 0- recursively enumerable, type 1- context sensitive, type 2- context-free, type 3- regular), the Church-Turing thesis, lambda calculus, some basic LISP, and Peano axioms.

Powerpoint available (also in Google Drive Resources collection).

More detailed notes:

The Alphabet
We define an alphabet as an nonempty finite set of symbols, usually denoted with Σ. A couple of really common examples are the Roman alphabet (i.e. {a, b, c, d, …, z}) and the binary alphabet ({0, 1}). An important note is that the alphabet does not have to be made of letters. For example, with syntax, it’s usually more convenient to think of the alphabet of a list of words, rather than a list of letters.

A string is a sequence of symbols from an alphabet. We define concatenation, represented with the ○ symbol. We also define the Kleene operators, which are the sets of repeated strings, along with the empty string, ϵ.

Formal Languages and Grammars
We define a formal language as any subset of Σ* (the set of all strings). The rule(s) which determine whether a string is in this set is called the grammar. We talked briefly about analytic vs generative grammars. In a nutshell, an analytic grammar is like a function that takes a string and returns either grammatical or ungrammatical. Grammatical strings are included in the language, and ungrammatical ones are excluded.

Generative grammars are rules that allow you to generate all possible strings in the language. Generative grammars are usually composed of rewrite rules (A → B). Essentially, we replace all occurrences of A w/ B, where A and B are two arbitrary strings. We’ve seen these before in our syntactic trees and X-bar theory.

Chomsky Hierarchy
We talked briefly about the Chomsky-Schützenberger hierarchy, invented by Noam Chomsky and Marcel-Paul Schützenberger. It essentially is a broad classification for types of languages. I also enumerated, without proof, lots of different definitions of each language type. Because there are so many different definitions, we have some intuition that these language types are somehow important. For more info, check this or the powerpoint out.
Regular ⊂ Context-Free ⊂ Context-Sensitive ⊂ Recursively Enumerable.

We also branched out into a lot of theoretical computer science. We talked about the Church-Turing thesis, and showed that some problems are undecidable, and hence uncomputable (see the Halting problem). I briefly went over lambda calculus and defining arithmetic based on LISP list operations and counting empty lists. This is not how arithmetic is actually defined in LISP, and we glossed over lots of the details (e.g. dotted pairs). For more details about the original LISP, see here. I also briefly went over Peano’s axioms, which define the natural numbers, and how arithmetic could be defined based on the successor function. It seems that only Functions kids did the monsters in the water, so be psyched for that in Logic.

UMD Visit

The UMD trip has been scheduled for 4/24/2013, which is a Wednesday (even day). There will be a talk at the LingBrains meeting, which focuses mostly neurolinguistics.

There will also be a a CLIP (Computer Linguistics & Information Processing) colloquium, although this may be too technical and inaccessible for us. However, there are lots of people from the CLIP Lab1 (ignore the security warning) that are willing to talk to us. Some other people have also volunteered to come talk with us, depending on our interests. So, send us a an email if you have a topic you really want to hear about at

For a general feel for the research that UMD does, take a look at the UMD Linguistics page and the IGERT page.

Cognitive Control and Bilingualism (Guest Speaker Susan Teubner-Rhodes 3/12/13)

Our visitor this Tuesday was Susan Teubner-Rhodes, a graduate student in the Psychology Department and Neuroscience and Cognitive Science Program at UMD who researches cognitive control and language processing. She spoke to us about cognitive control and bilingualism in general.

Susan asked that we read pages 240-245 of Bilingualism: consequences for the mind, which is available in the Google Drive Resources collection.

TOPICS: Bilingual cognition, balanced bilingualism, interference suppression, activation of both languages in bilinguals (cross-linguistic priming, verbal fluency), inhibition of irrelevant languages (asymmetrical/trilingual switch costs), conflict monitoring, interference vs. facilitation, low vs. high monitoring, and brain activity for conflict detection

Slideshow available in Google Drive Resources collection.

Thanks to Megan Chao for notes.

Jimmy Lin Study Group

Next Meeting: Tentatively on 4/6/13 at Rockville Library, 1-3. There’ll hopefully be some decent internet.

Homework: Download the Stanford toolkit. You’re looking for the jar files here, people. Start playing around with them. Also, if you haven’t already, install git and ant.

Daniela’s Notes

  • Will download and analyze tweets
  • NLP analysis in layers: stream of text -> tokenization -> morphological analysis -> POS tagging -> syntactic analysis -> named entity detection -> semantic role labeling
  • Stanford NLP toolkit does most of these for you. See demo and start playing around with it
  • Two types of parsing: constituent (trees) and dependency (links between words)
  • Most NLP tools are more engineering and statistically motivated, rather than linguistically motivated. So expect lots of counting, and not a lot of syntax
  • Hooking Stanford toolkit up to Twitter stream is a bad idea, because it’s made for newspaper-type sentences.
  • Amazon EC2 and cloud computing. You can rent virtual machines rather than buying a computer. Very useful if you need varying capacities, and $.05/machine-hour (extra for bandwidth).
  • Really easy sentiment analysis: separate Tweets with w/ 🙂 from those w/ 🙁
  • Mistakes in NLP not too worrying, as long as you make the same mistakes consistently.

Website Skeleton Up

We’ve got the website skeleton up and running. Now, we just need to fill in some meat. Here’s a quick to-do list:

  • Find a better theme. The default one’s too plain.
  • Fill in some lecture notes. Maybe integrate it with the schedule Google Doc, if possible, so we know what each lecture covered.
  • Make the website prettier. I’m sure there’s a lot of formatting things to be done

If you want to help with any of these, come talk to us (club presidents). Or shoot us an email

Sociolinguistics and Discourse Analysis (Guest Speaker Daniel Ginsberg, 3/5/13)

Daniel Ginsberg, a graduate student at Georgetown, came this Tuesday to give us an introduction to sociolinguistics, and some more detail about discourse analysis.

TOPICS: Context, Schiffrin’s model of discourse cohesion (exchange structure, action structure, ideational structure, information structure, and participation framework), and analyzing discourse cohesion in data using conversation transcripts

There was no slideshow.

Thanks to Megan Chao for notes.