Methods in Computational Linguistics I
CUNY Graduate Center – Fall 2025
- Instructor: Prof. Spencer Caplan
- Practicum leader: Mei Ying Ki
- Lecture: Monday 11:45-1:45, GC 7314
- Practicum: Wednesday 2:00-4:00, GC 7314
- Office hours: Thursdays 3:15-4:15, GC 7400.02 and by appt
Synopsis
This course is the first of a two-semester series introducing principles of computation and modern software development. The intended audience is students interested in speech and language processing technologies, though the materials will be beneficial to all language researchers.
Objectives
Using the Python programming language, students will be able to write programs which count the frequencies of various linguistic phenomena in text. They will be able to process text stored in various structured data formats. They will come to understand how computers encode multilingual text. They will learn the basic principles of command-line design and master regular expressions.
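As a rough illustration of the kind of program these objectives describe, here is a minimal sketch that counts word frequencies in a string. The function name `count_tokens` and the example sentence are hypothetical and are not taken from the course materials; the sketch assumes that a "word" is any run of Unicode word characters, and that text should be normalized to NFC before counting.

```python
import re
import unicodedata
from collections import Counter


def count_tokens(text: str) -> Counter:
    """Count case-folded word tokens in a string (illustrative only)."""
    # Normalize to NFC so that composed and decomposed encodings of the
    # same character (e.g. "é") are counted as a single type.
    normalized = unicodedata.normalize("NFC", text)
    # For str patterns, \w+ matches runs of Unicode word characters
    # (letters, digits, and the underscore).
    tokens = re.findall(r"\w+", normalized.casefold())
    return Counter(tokens)


counts = count_tokens("The cat saw the other cat.")
print(counts.most_common(2))  # [('the', 2), ('cat', 2)]
```

Real assignments will likely call for a more careful definition of a token than this regular expression provides.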
Materials
Readings will be assigned throughout the term and posted to the course schedule, although there will be no official "textbook." Students are strongly encouraged to bring a laptop computer to the lecture and practicum. Students are also welcome to use the Computational Linguistics Laboratory (7400.13) for practice and assignments.
Assignments
Regular assignments will take the form of small software development projects accompanied by a write-up describing the general approach taken and any challenges encountered. Students will usually be able to verify the technical correctness of their code by running a provided unit test. Students will also be graded on the readability of their code and the quality of the write-up. We will use GitHub for assignment turn-in. You will receive a link, either in class or via email, which will generate your GitHub repo for each assignment.
There will be an in-class (pencil-and-paper) midterm in the middle of the semester; details will follow early in the term.
The final assignment will be an open-ended project which will involve collecting basic statistics (e.g., counts) of some linguistic phenomenon from either raw text or structured data. Students are encouraged to conceive of projects relevant to their research interests. Students should discuss project plans with the instructor during office hours to confirm that the project is both feasible and of appropriate scope. Because of the open-ended nature of the final assignment, unit tests will not be provided.
Grading
50% of students' grades will be derived from the regular assignments; 15% will be from the in-class midterm; 15% will be from the open-ended final assignment; and the remaining 20% will be reserved for participation and attendance. Assignments must be submitted on time or will receive a 0 grade (barring a documented emergency).
Accommodations
The instructor will attempt to provide all reasonable accommodations to students upon request. If you believe you are covered under the Americans With Disabilities Act, please direct accommodations requests to Vice President for Student Affairs Matthew G. Schoengood.
Attendance
Students are expected to attend all lectures and practica (in person). There will in general be no accommodation to attend class online. However, students who feel unwell and have reason to believe they may be contagious with COVID-19 or another infectious disease should stay at home and contact the instructor. Other absences will not be excused, and the instructor reserves the right to tie grades to attendance records. The instructor and practicum leader are not responsible for reviewing materials missed due to absence.
Integrity
In line with the Student Handbook policies on plagiarism, students are expected to complete their own work. However, a student is permitted to collaborate with another student during the coding phase of an assignment so long as they: do not share lines of code with each other, mutually disclose their collaboration in their write-ups, and do not collaborate at all on their write-ups.
The general ethos of the integrity policy is that actions which shortcut the learning process are forbidden while actions which promote learning are encouraged. Studying lecture notes together, for example, provides an additional avenue for learning and is encouraged. Using a classmate’s solution to a homework or prompting an LLM about the assignment, however, is prohibited because it avoids the learning process entirely. If you have any questions about what is or is not permissible, please contact your instructor.
The instructor reserves the right to refer violations to the Academic Integrity Officer.
Respect
For the sake of privacy, students are asked not to record lectures. Students are expected to be considerate of their peers and to treat them with respect during class discussions.
Links and references
- Bird, S., Klein, E., and Loper, E. n.d. Natural Language Processing with Python. URL: https://www.nltk.org/book/.
- Church, K. W. n.d. Unix™ for poets. URL: http://doc.cat-v.org/unix/for-poets/kwc-unix-for-poets.pdf.
- Joyner, D. 2016. Introduction to Computing. McGraw-Hill Education.
- Jurafsky, D., and Martin, J. H. 2009. Speech and Language Processing. 2nd edition. Pearson. (See also: 3rd edition draft.)
- Kuchling, A. 2007. Python's dictionary implementation: being all things to all people. In A. Oram and G. Wilson (eds.), Beautiful Code, pages 293-301. O'Reilly.
- Shaw, Z. A. 2017. Learn Python 3 the Hard Way: A Very Simple Introduction to the Terrifyingly Beautiful World of Computers and Code. Addison-Wesley Professional.
Schedule
Week | Day | Date | Due | Class | Topics | Slides | Reading
W0 | W | 8/27 | | Lect. | Syllabus; CS Principles | Slides-L1 |
W1 | M | 9/1 | HW0 due | No class (Labor Day) | | | Bird §1; Joyner §1; Shaw Preface
W1 | W | 9/3 | | Prac. | First practice Jupyter notebooks | Notebook-P1 (solution) | Joyner §2; Shaw §1-14
W2 | M | 9/8 | HW1 due | Lect. | Literals vs. Variables; Control flow (start) | Slides-L2 | Joyner §3-3.3; Shaw §27-33
W2 | W | 9/10 | | Prac. | | Notebook-P2 (solution) |
W3 | M | 9/15 | | Lect. | More Control Flow; Indexing | Slides-L3 | Joyner §4.2-4.3; Shaw §34, §36-38
W3 | W | 9/17 | HW2 due | Prac. | | Notebook-P3 (solution) |
W4 | M | 9/22 | | No classes at the GC (Rosh Hashanah) | | |
W4 | W | 9/24 | | No classes at the GC (Rosh Hashanah) | | |
W5 | M | 9/29 | | Lect. | Functions | Slides-L4 | Joyner §3.4
W5 | W | 10/1 | | No classes at the GC (Yom Kippur) | | |
W5 | F | 10/3 | HW3 due | | | |
W6 | M | 10/6 | | Lect. | File I/O; Modules; Function Stubs | Slides-L5 | Joyner §4.4; Shaw §15-17; Bird §3-3.2
W6 | W | 10/8 | | Prac. | | Notebook-P4 (solution) |
W7 | M | 10/13 | HW4 due | No classes at the GC (Columbus Day) | | |
W7 | T | 10/14 (GC on a Monday schedule) | | Lect. | Text encoding; normalization; Basic Unix | Slides-L6 | Bird §3.3; Gorman; Spolsky; chardet; unicodedata
W7 | W | 10/15 | | Prac. | | Notebook-P5 |
W8 | M | 10/20 | | No classes at the GC (Diwali) | | |
W8 | W | 10/22 | | Prac. | Git; GitHub | | Chacon & Straub ch. 1.1-3.2, 6.1-6.3
W8 | F | 10/24 (GC on a Monday schedule) | | Midterm (in class) | | |
W9 | M | 10/27 | | Lect. | Searching; Sorting | | Joyner §5.2
W9 | W | 10/29 | | Prac. | | |
W10 | M | 11/3 | HW5 due | Lect. | More Sorting; Comprehensions | | No new readings
W10 | W | 11/5 | | Prac. | | |
W11 | M | 11/10 | | Lect. | BSTs; Dictionaries; Hash-Based Containers | |
W11 | W | 11/12 | | Prac. | | |
W12 | M | 11/17 | HW6 due | Lect. | Regular expressions; More Command-Line; Modules (time permitting) | |
W12 | W | 11/19 | | Prac. | Structured Data | |
W13 | M | 11/24 | Term paper specs | Lect. | ArgParse; Classes; OOP | |
W13 | W | 11/26 | HW7 due | No class (day before Thanksgiving) | | |
W14 | M | 12/1 | | Lect. | More OOP; "Blackjack"; Recursion | |
W14 | W | 12/3 | HW8 due | Prac. | | |
W15 | M | 12/8 | | Lect. | NLTK Abstract Data Types (ADTs) | |
W15 | W | 12/10 | | Prac. | | |
W16 | M | 12/15 | | Lect. | We always end with something fun :) | |
W16 | W | 12/17 | | No class (Reading Days); work on final projects | | |
W16 | M | 12/22 | Term paper due (official end of the GC Fall semester) | | | |