CSE 5525: Foundations of Speech and Language Processing (AU20, Wed/Fri 12:45-2:05PM, Online)

Instructor: Prof. Huan Sun

Teaching Assistant: Mr. Dingkang Wang (dk.wang19920830@gmail.com)

Level and credits: U/G, 3

Prerequisites and Co-requisites: (CSE 3521 or CSE 5521) and (CSE 5522 or Stat 3460 or Stat 3470)

Office hours (Instructor, Online): Wed 4:30PM-5:30PM

Office hours (TA, Online): Fri 4:30PM-5:30PM

All lectures and office hours will take place via Zoom. The corresponding Zoom URLs are shared on Carmen. All lectures and the instructor's office hours share the same Zoom URL.

Description

Fundamentals of natural language processing and automatic speech recognition, with projects concentrating on building systems to process written language.

Grading Plan (Note: All deadlines are 11:59PM on the due date. No late submissions!)

  • Participation: 10% (e.g., actively asking and answering questions in class, on Piazza, or during office hours; attending classes; etc.)
  • Homework: 50%
  • Midterm Exam: 20%
  • Final Project: 20% [Final Project Expectation (see example requirements from Dr. Greg Durrett); Start from Day 1 ;-) ]
  • No Final Exam

Textbooks

We will assign readings from two great textbooks (freely available online):

Dan Jurafsky and James H. Martin. Speech and Language Processing (3rd Edition). We will refer to this book as "JM" in our lecture notes.

Jacob Eisenstein. Natural Language Processing. We will refer to this book as "Eisenstein" in our lecture notes.

Recommended books for reading:

Academic Integrity Policy

Academic integrity is essential to maintaining an environment that fosters excellence in teaching, research, and other educational and scholarly activities. Thus, The Ohio State University and the Committee on Academic Misconduct (COAM) expect that all students have read and understand the University’s Code of Student Conduct, and that all students will complete all academic and scholarly assignments with fairness and honesty. Students must recognize that failure to follow the rules and guidelines established in the University’s Code of Student Conduct and this syllabus may constitute “Academic Misconduct.” For more info, click here.

Anonymous Feedback/Comments/Suggestions?

Feel free to leave comments and suggestions about how the instructor/grader can better help you learn in this course, such as whether the lectures are clear, whether the examples are helpful, and whether questions are answered in a timely manner. Please use this anonymous form. Your input is highly appreciated! ;-)

Course Syllabus and Schedule (subject to updates)

Week Date Topic Assignment Out Assignment Due Lecture Notes
1 08/26 Class Outline + Introduction Assignment 1 & Final Project Expectation on Team Diversity Chapter 1 (Jurafsky and Martin)
1 08/28 Machine Learning (Binary Classification) classification notes, Eisenstein 2.0-2.6 (Algorithm 5), 4.2-4.4.1, JM 4, JM 5.0-5.5
2 09/02 Machine Learning (Binary/Multiclass Classification) JM 5.6, Eisenstein 4.2, structured SVM secs 1-2
2 09/04 Machine Learning (Multiclass Classification)
3 09/09 Sequence Labeling 1: HMMs Assignment 2 Eisenstein 7.0-7.4 (Alg. 11), 8.1, JM 8, Viterbi algorithm lecture note
3 09/11 Sequence Labeling 1: HMMs & Sequence Labeling 2: CRFs Assignment 1 Due Sutton CRFs 2.3, 2.6.1, Eisenstein 7.5, 8.3, Wallach CRFs tutorial, Illinois NER
4 09/16 Sequence Labeling 2: CRFs Sutton CRFs 2.3, 2.6.1, Eisenstein 7.5, 8.3, Wallach CRFs tutorial, Illinois NER
4 09/18 NN1: Feedforward + Word embeddings For Feedforward NNs: Eisenstein 3.0-3.3; Goldberg 1-4, 6; ffnn_example.py; For Word embeddings: Eisenstein 3.3.4, 14.5-14.6, JM 6, Goldberg 5, word2vec, GloVe
5 09/23 NN2: RNNs JM 9.1-9.4, Goldberg 10-11
5 09/25 NN4: Language Modeling and Pretraining (Guest Lecture: Xiang Deng) Eisenstein 6, JM 9.2.1, ELMo, BERT, Frozen or fine-tuned
6 09/30 Seq2seq 1 + semantic parsing Assignment 3 seq2seq, [Jia and Liang, 2016]
6 10/02 Seq2seq 2 (attention) Attention, [Luong Attention], Transformer
7 10/07 Syntax 1: Constituency, PCFGs Assignment 2 Due JM 12.1-12.6, 12.8, JM 14.1-14.4, Eisenstein 10.0-10.5
7 10/09 Syntax 2: Dependency 1 Eisenstein 11.1-11.2, JM 13.1-13.3, 13.5
8 10/14 Midterm Exam (No Class) Take-home
8 10/16 Syntax 3: Dependency Parsers JM 15.1-15.4
9 10/21 Semantics Final Project Proposal Due
9 10/23 Question Answering 1
10 10/28 Question Answering 2
10 10/30 HW and Midterm Discussion and Brief PyTorch Tutorial
11 11/04 Dialogue
11 11/06 Dialogue & Information Extraction Assignment 3 Due
12 11/11 No Class
12 11/13 Information Extraction & Pre-trained language models
13 11/18 Pre-trained language models & Machine translation
13 11/20 Speech recognition
14 11/25 Summary + Ethics in NLP
14 11/27 No Class
15 12/02 Final project presentations
15 12/04 Final project presentations

Course materials were largely adapted from Prof. Greg Durrett at UT Austin and were based on previous offerings by Prof. Alan Ritter, Prof. Wei Xu, and Prof. Eric Fosler-Lussier at OSU. Many thanks to them and to everyone who helped them develop the course.