Next Meeting: Tentatively on 4/6/13 at Rockville Library, 1-3. There’ll hopefully be some decent internet.
- Will download and analyze tweets
- NLP analysis in layers: stream of text -> tokenization -> morphological analysis -> POS tagging -> syntactic analysis -> named entity detection -> semantic role labeling
- Stanford NLP toolkit does most of these for you. See demo and start playing around with it
- Two types of parsing: constituent (trees) and dependency (links between words)
- Most NLP tools are engineering- and statistics-driven rather than linguistically motivated. So expect lots of counting, and not a lot of syntax
- Hooking the Stanford toolkit straight up to a Twitter stream is a bad idea, because it’s built for newspaper-type sentences, and tweets don't look like that.
- Amazon EC2 and cloud computing: you can rent virtual machines rather than buying a computer. Very useful if the capacity you need varies. Quoted price: about $0.05 per machine-hour (bandwidth is extra).
- Really easy sentiment analysis: separate Tweets w/ 🙂 from those w/ 🙁
- Mistakes in NLP are not too worrying, as long as you make the same mistakes consistently.
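
A minimal sketch of the "really easy" emoticon split from the notes above, in Python. Nothing here was shown at the meeting: the function names (`label_tweet`, `split_by_emoticon`), the ASCII emoticon variants, and the choice to skip tweets that contain both faces are all assumptions made for illustration. It works on a plain list of strings, so how the tweets get downloaded is left open.

```python
# Assumed emoticon sets: the notes only mention the two faces;
# the ASCII variants are added here as a guess at what tweets contain.
POSITIVE = ("🙂", ":)", ":-)")
NEGATIVE = ("🙁", ":(", ":-(")


def label_tweet(text):
    """Return 'positive', 'negative', or None.

    None means the tweet has no emoticon from either set, or has both
    (assumption: mixed tweets are too ambiguous to use as labels).
    """
    has_pos = any(e in text for e in POSITIVE)
    has_neg = any(e in text for e in NEGATIVE)
    if has_pos == has_neg:
        return None
    return "positive" if has_pos else "negative"


def split_by_emoticon(tweets):
    """Group tweets into positive and negative buckets, dropping the rest."""
    buckets = {"positive": [], "negative": []}
    for tweet in tweets:
        label = label_tweet(tweet)
        if label is not None:
            buckets[label].append(tweet)
    return buckets


if __name__ == "__main__":
    sample = [
        "Finally got the parser running :)",
        "Library wifi is down again 🙁",
        "Meeting moved to Saturday",
        "Great talk 🙂 but terrible coffee :(",
    ]
    for label, group in split_by_emoticon(sample).items():
        print(label, len(group))
```

The two buckets can then serve as rough labeled data for whatever counting comes next. The labels are noisy, but they are noisy in the same way for every tweet.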