Dr. Lin’s NLP Study Group (4/13/13)

We will be meeting Dr. Lin from 1-3 on 4/13 at Quince Orchard Library. As a reminder, last time’s “homework” was to:

  • Look at this Java package. Try to get it running and tap into the public sample Twitter stream. (Sam has created a package to help.)
  • Download and play w/ the Stanford NLP tools: try some POS tagging, NE (named entity) tagging, parsing, etc., and learn the API.
  • Think of interesting project ideas.

Jimmy Lin Study Group

Next Meeting: Tentatively on 4/6/13 at Rockville Library, 1-3. Hopefully there will be decent internet access.

Homework: Download the Stanford toolkit. You’re looking for the jar files here, people. Start playing around with them. Also, if you haven’t already, install Git and Ant.

Daniela’s Notes

  • We will download and analyze tweets.
  • NLP analysis happens in layers: stream of text -> tokenization -> morphological analysis -> POS tagging -> syntactic analysis -> named entity detection -> semantic role labeling.
  • The Stanford NLP toolkit handles most of these layers for you. See the demo and start playing around with it.
  • Two types of parsing: constituency (phrase-structure trees) and dependency (links between words).
  • Most NLP tools are motivated more by engineering and statistics than by linguistics, so expect lots of counting and not a lot of syntax.
  • Hooking the Stanford toolkit up to the Twitter stream is a bad idea, because the toolkit was built for newspaper-style sentences.
  • Amazon EC2 and cloud computing: you can rent virtual machines rather than buying a computer. This is very useful if your capacity needs vary, and it costs $0.05 per machine-hour (bandwidth is extra).
  • Really easy sentiment analysis: separate tweets with 🙂 from those with 🙁.
  • Mistakes in NLP are not too worrying, as long as you make the same mistakes consistently.
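To make the layered picture of NLP analysis more concrete, here is a minimal Python sketch of three of those layers (tokenization, POS tagging, and named entity detection) chained so that each stage consumes the previous stage's output. It is a toy and is not how the Stanford toolkit works internally: the regex tokenizer, the tiny hypothetical `TOY_LEXICON`, and the "consecutive capitalized words form one entity" rule are illustrative assumptions standing in for trained statistical models.

```python
import re
from typing import Any, Callable, List, Tuple

# Hypothetical mini-lexicon; a real tagger learns tags from annotated text.
TOY_LEXICON = {"the": "DT", "a": "DT", "group": "NN", "meets": "VBZ",
               "at": "IN", ".": "."}


def tokenize(text: str) -> List[str]:
    """Layer 1: split raw text into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def pos_tag(tokens: List[str]) -> List[Tuple[str, str]]:
    """Layer 2 (toy): lexicon lookup, then capitalized -> NNP, otherwise NN."""
    tagged = []
    for tok in tokens:
        if tok.lower() in TOY_LEXICON:
            tagged.append((tok, TOY_LEXICON[tok.lower()]))
        elif tok[0].isupper():
            tagged.append((tok, "NNP"))
        else:
            tagged.append((tok, "NN"))
    return tagged


def find_entities(tagged: List[Tuple[str, str]]) -> List[str]:
    """Layer 3 (toy): treat each run of consecutive NNP tokens as one entity."""
    entities, current = [], []
    for tok, tag in tagged:
        if tag == "NNP":
            current.append(tok)
        elif current:
            entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities


def run_pipeline(text: str, stages: List[Callable[[Any], Any]]) -> Any:
    """Feed the output of each layer into the next one."""
    result: Any = text
    for stage in stages:
        result = stage(result)
    return result


print(run_pipeline("The group meets at Quince Orchard Library.",
                   [tokenize, pos_tag, find_entities]))
# ['Quince Orchard Library']
```

Because every stage has the same "previous output in, richer output out" shape, further layers (a parser, a semantic role labeler) could be appended to the list without changing `run_pipeline`. The capitalization rule also shows why errors propagate: a sentence-initial word missing from the lexicon would be mistagged as NNP and then reported as an entity.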
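The difference between constituency and dependency parsing is easiest to see on a single sentence. The sketch below writes both analyses of "Sam downloaded the toolkit" by hand as plain Python data. The constituency parse uses nested `(label, children...)` tuples. The dependency parse uses `(head, dependent, relation)` triples over 1-based word positions, with 0 standing for ROOT. The trees, the relation names (in the style of Stanford dependencies), and the helper functions are illustrative assumptions, not output copied from the toolkit.

```python
from typing import List, Tuple, Union

Tree = Union[str, tuple]

# Constituency parse: phrases nest inside phrases; a preterminal is (tag, word).
CONSTITUENCY: Tree = ("S",
                      ("NP", ("NNP", "Sam")),
                      ("VP", ("VBD", "downloaded"),
                             ("NP", ("DT", "the"), ("NN", "toolkit"))))

# Dependency parse: one labeled link per word, from its head to the word itself.
WORDS = ["Sam", "downloaded", "the", "toolkit"]
DEPENDENCY = [(0, 2, "root"), (2, 1, "nsubj"), (4, 3, "det"), (2, 4, "dobj")]


def leaves(tree: Tree) -> List[str]:
    """Read the words off a constituency tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    words: List[str] = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words


def phrases(tree: Tree, label: str) -> List[List[str]]:
    """Collect the word span of every constituent with the given label."""
    if isinstance(tree, str):
        return []
    found = [leaves(tree)] if tree[0] == label else []
    for child in tree[1:]:
        found.extend(phrases(child, label))
    return found


def dependents(head_word: str) -> List[Tuple[str, str]]:
    """List (relation, word) pairs for the words attached to head_word."""
    idx = WORDS.index(head_word) + 1
    return [(rel, WORDS[dep - 1]) for head, dep, rel in DEPENDENCY if head == idx]


print(phrases(CONSTITUENCY, "NP"))   # [['Sam'], ['the', 'toolkit']]
print(dependents("downloaded"))      # [('nsubj', 'Sam'), ('dobj', 'toolkit')]
```

The constituency view answers "which spans form phrases?", while the dependency view answers "which word attaches to which, and how?". Both cover exactly the same four words.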
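As an illustration of the "lots of counting" point about statistical NLP tools, the sketch below estimates bigram probabilities P(next word | previous word) by maximum likelihood. It counts each adjacent word pair and divides by how often the first word occurs as a history. The toy corpus and the `<s>`/`</s>` sentence-boundary markers are assumptions made for the example. Real systems also add smoothing so that unseen pairs do not get probability zero.

```python
from collections import Counter
from typing import Dict, List, Tuple


def bigram_probabilities(sentences: List[List[str]]) -> Dict[Tuple[str, str], float]:
    """Maximum-likelihood estimate of P(w2 | w1) from raw counts (no smoothing)."""
    histories: Counter = Counter()
    pairs: Counter = Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        histories.update(padded[:-1])          # every token that has a successor
        pairs.update(zip(padded, padded[1:]))  # adjacent (w1, w2) pairs
    return {(w1, w2): n / histories[w1] for (w1, w2), n in pairs.items()}


corpus = [["i", "love", "nlp"], ["i", "love", "twitter"], ["i", "hate", "spam"]]
probs = bigram_probabilities(corpus)
print(round(probs[("i", "love")], 3))  # 0.667: 2 of the 3 "i" tokens precede "love"
```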
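The argument for renting EC2 capacity instead of buying a computer comes down to simple arithmetic. The sketch below uses the $0.05/machine-hour figure from the notes. This is a 2013 number, and current prices vary by instance type and region. The sketch ignores bandwidth and storage charges and assumes whole hours. `Decimal` keeps the currency math exact. The function name and the example workloads are hypothetical.

```python
from decimal import Decimal

# Rate quoted at the meeting; bandwidth and storage are billed separately and ignored here.
RATE_PER_MACHINE_HOUR = Decimal("0.05")


def compute_cost(machines: int, hours: int,
                 rate: Decimal = RATE_PER_MACHINE_HOUR) -> Decimal:
    """Compute-only cost: machines x hours x hourly rate."""
    return machines * hours * rate


# The same 300 machine-hours cost the same whether spread out or run as a burst,
# but the burst finishes 100x sooner. This is the payoff when capacity needs vary.
print(compute_cost(1, 300))   # 15.00
print(compute_cost(100, 3))   # 15.00
```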
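The "really easy" emoticon approach to sentiment analysis works as a cheap labeling step. Each emoticon is treated as a noisy sentiment label and is then stripped, so that a classifier trained on the result has to learn from the other words. This is a minimal sketch. The marker lists (ASCII emoticons such as `:)` and `:(` plus the 🙂/🙁 emoji), the choice to leave out tweets containing both or neither, and the function names are assumptions, not part of the notes.

```python
from typing import Iterable, List, Tuple

# Assumed markers: ASCII emoticons plus their emoji counterparts.
POSITIVE = (":)", ":-)", "🙂")
NEGATIVE = (":(", ":-(", "🙁")


def strip_markers(text: str) -> str:
    """Remove every emoticon marker and collapse the leftover whitespace."""
    for marker in POSITIVE + NEGATIVE:
        text = text.replace(marker, "")
    return " ".join(text.split())


def split_by_emoticon(tweets: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (positive, negative) tweets, labeled by emoticon and then stripped."""
    positive, negative = [], []
    for tweet in tweets:
        has_pos = any(m in tweet for m in POSITIVE)
        has_neg = any(m in tweet for m in NEGATIVE)
        if has_pos and not has_neg:
            positive.append(strip_markers(tweet))
        elif has_neg and not has_pos:
            negative.append(strip_markers(tweet))
        # Tweets with both kinds of marker, or none, are left unlabeled.
    return positive, negative


tweets = ["Study group today :)", "No internet at the library :(",
          "Got the Stanford tagger running 🙂", "mixed feelings :) :(", "plain tweet"]
pos, neg = split_by_emoticon(tweets)
print(pos)   # ['Study group today', 'Got the Stanford tagger running']
print(neg)   # ['No internet at the library']
```

Stripping the emoticon matters. If it stayed in, a classifier could score perfectly just by spotting the emoticon and would learn nothing about the rest of the text.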