NACLO Round II results are out! Congratulations to Samuel Zbarsky, who’s an alternate for the US team, and Michelle Noh, who received an award for best solution of problem 1. Sam was on the US team last year, and received an honorable mention at the IOL.
We had six Round II qualifiers from Blair. In order of final ranking: Sam Zbarsky (#11), Victor Xu (#15), Alan Du (#24), Michelle Noh (#78), Daniela Ganelin (#82), and Daniel Amir (#115). Combined with a homeschooled student who qualified from our site, Blair was 5th in the nation for number of qualifiers.
Check back next year for 2014 NACLO information!
We will be meeting Dr. Lin from 1-3 on 4/13 at Quince Orchard Library. As a reminder, last time’s “homework” was to:
- Look at this Java package and try to get it running; tap into the public sample Twitter stream. (Sam has created a package to help.)
- Download and play with the Stanford NLP tools. Try some POS tagging, NE tagging, parsing, etc., and learn the API.
- Think of interesting project ideas.
The UMD trip has been scheduled for 4/24/2013, which is a Wednesday (even day). There will be a talk at the LingBrains meeting, which focuses mostly on neurolinguistics.
There will also be a CLIP (Computational Linguistics & Information Processing) colloquium, although this may be too technical and inaccessible for us. However, there are lots of people from the CLIP Lab (ignore the security warning) who are willing to talk to us. Some other people have also volunteered to come talk with us, depending on our interests. So send us an email at email@example.com if there's a topic you really want to hear about.
For a general feel for the research that UMD does, take a look at the UMD Linguistics page and the IGERT page.
Next Meeting: Tentatively on 4/6/13 at Rockville Library, 1-3. There’ll hopefully be some decent internet.
Homework: Download the Stanford toolkit. You’re looking for the jar files here, people. Start playing around with them. Also, if you haven’t already, install git and ant.
- Will download and analyze tweets
- NLP analysis in layers: stream of text -> tokenization -> morphological analysis -> POS tagging -> syntactic analysis -> named entity detection -> semantic role labeling
- Stanford NLP toolkit does most of these for you. See demo and start playing around with it
- Two types of parsing: constituent (trees) and dependency (links between words)
- Most NLP tools are engineering- and statistically motivated rather than linguistically motivated, so expect lots of counting and not a lot of syntax
- Hooking Stanford toolkit up to Twitter stream is a bad idea, because it’s made for newspaper-type sentences.
- Amazon EC2 and cloud computing. You can rent virtual machines rather than buying a computer. Very useful if you need varying capacities, and $.05/machine-hour (extra for bandwidth).
- Really easy sentiment analysis: separate tweets with 🙂 from those with 🙁
- Mistakes in NLP aren't too worrying, as long as you make the same mistakes consistently.
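To get a feel for the layered analysis described above, here's a toy sketch of the first two layers — tokenization and POS tagging — in plain Java. This is a made-up illustration (the class, the regex tokenizer, and the crude suffix-based tagger are all our own), not the Stanford toolkit's implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ToyPipeline {
    // Layer 1: tokenization -- split the raw text stream into words and punctuation.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        Matcher m = Pattern.compile("\\w+|[^\\w\\s]").matcher(text);
        while (m.find()) tokens.add(m.group());
        return tokens;
    }

    // Layer 2: a crude suffix-based POS tagger (illustration only -- real
    // taggers like Stanford's are statistical, trained on tagged corpora).
    static String tag(String token) {
        if (token.matches("[^\\w\\s]")) return ".";  // punctuation
        if (token.endsWith("ly")) return "RB";       // adverb (very rough)
        if (token.endsWith("s"))  return "NNS";      // plural noun (very rough)
        return "NN";                                 // default: noun
    }

    public static void main(String[] args) {
        for (String tok : tokenize("Dogs bark loudly.")) {
            System.out.println(tok + "/" + tag(tok));
        }
    }
}
```

The later layers (parsing, NE detection, semantic role labeling) build on these token/tag sequences, which is why the toolkit runs them as a pipeline.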
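The EC2 pricing above is easy to estimate in advance. A quick worked example at the quoted $0.05/machine-hour rate (class and method names are our own, and bandwidth charges are ignored):

```java
public class Ec2Cost {
    // Cost of renting `machines` instances for `hours` hours at `ratePerHour`
    // dollars per machine-hour. Bandwidth is billed separately.
    static double cost(int machines, double hours, double ratePerHour) {
        return machines * hours * ratePerHour;
    }

    public static void main(String[] args) {
        // e.g. 20 machines for a 3-hour crunch:
        System.out.println("$" + cost(20, 3, 0.05));  // prints $3.0
    }
}
```

This is what makes the "varying capacity" point attractive: you pay $3 for a one-off 20-machine job instead of owning 20 machines.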
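The emoticon trick at the end takes only a few lines. A minimal sketch (the class and method names are made up, and we've added the ASCII variants of the emoticons as an assumption):

```java
public class EmoticonSentiment {
    // Label a tweet by its emoticons: "pos", "neg", or "unknown".
    static String label(String tweet) {
        boolean pos = tweet.contains("🙂") || tweet.contains(":)");
        boolean neg = tweet.contains("🙁") || tweet.contains(":(");
        if (pos && !neg) return "pos";
        if (neg && !pos) return "neg";
        return "unknown";  // both emoticons, or neither
    }

    public static void main(String[] args) {
        System.out.println(label("Great game today 🙂"));  // prints pos
        System.out.println(label("Stuck in traffic :("));   // prints neg
    }
}
```

The labeled tweets can then serve as cheap training data for a real sentiment classifier — the emoticon stands in for a human annotation.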