Jimmy Lin Study Group

Next Meeting: Tentatively on 4/6/13 at Rockville Library, 1-3. There’ll hopefully be some decent internet.

Homework: Download the Stanford toolkit. You’re looking for the jar files here, people. Start playing around with them. Also, if you haven’t already, install git and ant.

Daniela’s Notes

  • Will download and analyze tweets
  • NLP analysis in layers: stream of text -> tokenization -> morphological analysis -> POS tagging -> syntactic analysis -> named entity detection -> semantic role labeling
  • The Stanford NLP toolkit does most of these for you. See the demo and start playing around with it
  • Two types of parsing: constituent (trees) and dependency (links between words)
  • Most NLP tools are motivated by engineering and statistics rather than by linguistics. So expect lots of counting, and not a lot of syntax
  • Hooking the Stanford toolkit up to a Twitter stream is a bad idea, because it’s made for newspaper-type sentences.
  • Amazon EC2 and cloud computing: you can rent virtual machines rather than buying a computer. Very useful if the capacity you need varies, and it costs $.05/machine-hour (extra for bandwidth).
  • Really easy sentiment analysis: separate Tweets w/ 🙂 from those w/ 🙁
  • Mistakes in NLP are not too worrying, as long as you make the same mistakes consistently.
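
The "really easy sentiment analysis" idea from the notes can be sketched in a few lines of Python. This is only an illustration, not anything shown at the meeting: the function names `label_tweet` and `split_by_emoticon` are made up, the emoticon lists are an assumption, and tweets are assumed to arrive as plain strings.

```python
# Hypothetical emoticon lists (an assumption, not from the talk).
POSITIVE = ("🙂", ":)", ":-)")
NEGATIVE = ("🙁", ":(", ":-(")


def label_tweet(text):
    """Return 'positive', 'negative', or None if there is no clear emoticon signal."""
    has_pos = any(e in text for e in POSITIVE)
    has_neg = any(e in text for e in NEGATIVE)
    if has_pos == has_neg:
        # Either no emoticon at all, or both kinds: skip the tweet.
        return None
    return "positive" if has_pos else "negative"


def split_by_emoticon(tweets):
    """Group tweets into positive and negative buckets, dropping unlabeled ones."""
    buckets = {"positive": [], "negative": []}
    for tweet in tweets:
        label = label_tweet(tweet)
        if label is not None:
            buckets[label].append(tweet)
    return buckets


if __name__ == "__main__":
    sample = ["great talk :)", "no wifi 🙁", "meeting at 1", "hmm :) :("]
    print(split_by_emoticon(sample))
```

Tweets with no emoticon, or with both kinds, are dropped instead of guessed at. That fits the last point in the notes: a crude rule is fine as long as it is applied the same way every time.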

About Alan Du

I'm one of the founders and co-presidents of this club. I also maintain this website. My main interests are all about cognition and intelligence. The idea that a bunch of atoms can combine and form something self-aware is absolutely fascinating. Linguistically, I'm interested in integrating theoretical syntax with NLP, grammar inference, figuring out how the brain processes language, and creating a program with true artificial language capacities.
