Syllabus

Class Information:

CMPS 143 - Spring 2016: Introduction to Natural Language Processing

Schedule: Tuesday-Thursday 10:00 - 11:45 AM

Location: Jack Baskin room 165

 

Lab Sections:

Tuesday-Thursday 4:00 - 6:00 PM

Location: Social Science I Mac Computer Lab - Room 135

 

Instructor:

Elahe Rahimtoroghi

Office: E2 - 255

email: elahe@soe.ucsc.edu

Office hours: Thursday 2:00 - 3:30 PM

 

TA/Tutor:

Jiaqi Wu 

email: jwu64@ucsc.edu

 

Course Description:

This class introduces advanced undergraduates to the theory and practice of Natural Language Processing. We will focus on NLP programming for processing and generating narratively structured text, such as classic stories like Aesop's Fables and personal narratives that can be mined from the web. CMPS 143 combines homework assignments and exams aimed at learning the basics of NLP using the NLTK toolkit and other publicly available software. Previous experience with Python is a prerequisite.

Textbook:

Natural Language Processing with Python. Available electronically and from the bookstore. Henceforth referred to as NLPP:

http://www.nltk.org/book/

We will be using NLTK 3.0 and the updated version of the online book that corresponds to it. The version of the book in the bookstore is slightly out of date with respect to the online version.

Additional resources:

Speech and Language Processing, Jurafsky and Martin. Coursera online lectures and parts of the book are available online.
Mining the Social Web, 2nd Edition, companion code. Useful for collecting data from the web: https://github.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition

 

Grading Policy:

Attendance: 5%
Homework and in-class discussion: 45%
Project (project-related assignments and the final project presentation during the finals slot): 25%
Midterm: 25%
Final: 25%

Homework Delivery: Submit homework through eCommons Assignments. Include all code, files, and written documents in a single zip file. Written documents must be plain text or PDF only. Multiple uploads are enabled, so a new upload overwrites an earlier one. Late homework is accepted until noon the next day with a 10% penalty.

 

Special Accommodations:

If you have special needs, we will accommodate you. The Disability Resource Center (DRC) offers services that are confidential and free of charge. After you contact the DRC, bring your Accommodation Authorization form to me after class or during office hours and we will discuss your accommodations.

Student Responsibilities:

1. Students contact the DRC to determine their eligibility for accommodations. Once approved by the DRC, they will receive their Accommodation Authorization form.

2. Students then notify their instructor of their accommodations during office hours or after class, and provide their instructor with their Accommodation Authorization form.

3. Please note that it is the student's responsibility to contact the instructor about their accommodations. If they do not contact their instructor, accommodations will not be made.

4. Students should submit their requests to faculty no later than 7 days before a regular exam and 14 days before a final exam.

 

 

Weekly Syllabus:

This syllabus will be updated gradually and is subject to change!

Check the Announcements frequently! 

Week 1: NLP Pipeline and Basic Text Processing with Python

Lecture 1 - March 29: Overview of the course structure, NLP pipeline, Word Counts and frequency distributions
  • HW 0 published. 
Lecture 2 - March 31: Working with data in NLTK, tokenization, sentence segmentation, stemming, collocations, POS tagging, introduction to lexical resources (WordNet)
  • HW 1 posted. 

 

Week 2: What's Beyond Words: POS Tagging, More WordNet, Statistical NLP, Corpus-Based NLP, Language Models & N-Grams, Review of Probability, Regular Expressions

Lecture 3 - April 5: More WordNet and the NLTK API, lexical relations, semantic similarity, introduction to statistical NLP, corpus-based approaches, review of probability and conditional probability

 

Lecture 4 - April 7: Bayes' theorem, language models, Markov assumption, n-grams, maximum likelihood estimation, regular expressions
  • HW 1 due tomorrow 11:55 pm!
  • After you submit, send me an email (tomorrow or later) to volunteer for the HW 1 discussion in class.

 

Week 3: Natural Language Understanding I: Text Classification I, Using Sentiment Lexicons, Lexical Resources.

Lecture 5 - April 12: More Regex, Uses of regex in NLP, Classifying text, Supervised classification, Getting labels, feature extraction, overfitting, Train/dev/test sets, Classifying with NLTK, Evaluation measures
  • Abhishek Grover reviewed HW 1.

 

Lecture 6 - April 14: More classification for NLP, movie reviews classification example, feature selection methods, lexical resources, extracting features from LIWC, error analysis
  • I ran code from movie_reviews.py and word_category_counter.py in class today.
  • HW 2 due tomorrow 11:55 pm!
  • After you submit, send me an email (tomorrow or later) to volunteer for the HW 2 discussion in class.

 

Week 4: More Classification: Feature Analysis, Error Analysis. POS Tagging As A Classification Problem.

Lecture 7 - April 19: HW 2 review in class, Naive Bayes classifier, Bayesian classification, maximum likelihood estimation, smoothing, numerical stability, Labov's theory of narratives, using LIWC

 

Lecture 8 - April 21: Cancelled

 

Week 5: The Lexicon, Verbs And Their Subcategorization. Discourse & Narrative Meaning.

Lecture 9 - April 26: HW 3 review in class, supervised and unsupervised models, approaches to automatic POS tagging, n-gram tagging, introduction to SIG and Scheherazade, VerbNet, introduction to parsing, constituent structure, context-free grammars, ambiguity in parsing

 

Lecture 10 - April 28: Parsing (continued), syntactic ambiguity, treebanks, probabilistic CFGs, lexicalized PCFGs, dependency grammar, dependency parses, types of parsers, shift-reduce parsing, probabilistic dependency parsing, parsing as a classification problem, Story Intention Graph, SIG encodings, Scheherazade annotation tool

 

Week 6: Midterm

Lecture 11 - May 3: Midterm Review
Lecture 12 - May 5: Midterm exam

 

Week 7: Natural Language Understanding II: Chunking, Sentence Structure And Parsing, Natural Language Understanding For Q&A.

Lecture 13 - May 10: HW 3 held-out results, introduction to question answering, types of questions, IR-based factoid QA, question processing, answer type taxonomy, answer type detection approaches, Assignment 6 overview, evaluation metrics for QA, QA pipeline, question reformulation, introduction to using syntax for QA

 

Lecture 14 - May 12: What syntax is, using syntactic representations for question answering, chunking, constituency parses, dependency parses, data structures for working with parse trees, manipulating constituency trees, reading dependency graphs, increasing precision using syntax, Stanford Parser dependency structure, HW 6 stub code demo

 

Week 8: Question Answering II: Working With NLU Representations For Q&A.

 

Week 9: Question Answering III: Lexicons & Lexical Semantics for Q&A.

 

Week 10: Question Answering Competition & Final Exam In Class Slot.