Statistics for Linguistics Research
CUNY Graduate Center – Fall 2025
- Instructor: Prof. Spencer Caplan
- Practicum leader: Matt Malone
- Lecture: Monday 4:15-6:15, GC 7314
- Practicum: Thursday 11:45-1:45, GC 7395
- Office hours: Thursdays 3:15-4:15, GC 7400.02 and by appt
Synopsis
This class provides an introduction to statistical and quantitative data analysis as used across various areas of linguistics research. Topics covered include probability, descriptive and inferential statistics, hypothesis testing, analyses of variance, regression models (linear, logistic, and mixed-effects), and approaches to corpus and experimental data. Students will learn to use the R statistical environment and a wide variety of methods for data wrangling, visualization, and statistical inference, and will gain experience with best practices for clearly and fairly reporting results. Emphasis will be placed on developing statistical reasoning, understanding the assumptions behind common tests, and critically evaluating methods in the (psycho)linguistics literature.
Objectives
By the end of the course, students will be able to:
- Use R to import, organize, and manipulate linguistic data.
- Summarize and visualize data using appropriate techniques (e.g., ggplot2, descriptive statistics).
- Select and apply suitable statistical tests (e.g., t-tests, chi-square, ANOVA, linear/logistic regression, mixed-effects models) to linguistic research questions.
- Interpret and report statistical results clearly, rigorously, and reproducibly (e.g., effect sizes, confidence intervals, assumptions).
- Critically evaluate statistical methods and claims in published linguistics research.
- Apply statistical reasoning to the design and analysis of both experimental and corpus-based studies.
Materials
Readings will be assigned throughout the term and posted to the course schedule, although there will be no official "textbook." Readings are intended to provide additional background and details to the lectures, and as such, students can choose to consume the readings either before or after the associated lecture depending on their personal preferences.
Assignments
Regular assignments will take the form of small software projects for which you will submit valid R code to solve some given problem. We will use Slack for assignment turn-in (messaging Spencer with your submission).
There will be one in-class (pencil and paper) midterm during the middle of the semester (exact timing and details to follow).
The final assignment will be an open-ended project which will involve statistical analysis and visualization of some linguistic phenomenon using existing experimental or corpus data. Students are encouraged to conceive of projects relevant to their research interests. Students should discuss project plans with the instructor during office hours to confirm that the project is both feasible and of appropriate scope. Because of the open-ended nature of the final assignment, unit tests will not be provided.
Grading
20% of students' grades will be derived from the in-class midterm; 30% will be from the open-ended final project; 30% will be from the regular R assignments; and the remaining 20% will be reserved for participation and attendance. Assignments must be submitted on time or will receive a grade of 0 (barring a documented emergency).
Accommodations
The instructor will attempt to provide all reasonable accommodations to students upon request. If you believe you are covered under the Americans With Disabilities Act, please direct accommodations requests to Vice President for Student Affairs Matthew G. Schoengood.
Attendance
Students are expected to attend all lectures and practica (in person). There will in general be no accommodation to attend class online. However, students who have reason to believe they may be contagious with COVID-19 or another infectious disease, or who are feeling unwell, should stay at home and contact the instructor. Other absences will not be excused, and the instructor reserves the right to tie grades to attendance records. The instructor and practicum leader are not responsible for reviewing materials missed due to absence.
Integrity
In line with the Student Handbook policies on plagiarism, students are expected to complete their own work. However, a student is permitted to collaborate with another student during the coding phase of an assignment so long as they: do not share lines of code with each other, mutually disclose their collaboration in their write-ups, and do not collaborate at all on their write-ups.
The general ethos of the integrity policy is that actions which shortcut the learning process are forbidden while actions which promote learning are encouraged. Studying lecture notes together, for example, provides an additional avenue for learning and is encouraged. Using a classmate’s solution to a homework or prompting an LLM about the assignment, however, is prohibited because it avoids the learning process entirely. If you have any questions about what is or is not permissible, please contact your instructor.
The instructor reserves the right to refer violations to the Academic Integrity Officer.
Respect
For the sake of privacy, students are asked not to record lectures. Students are expected to be considerate of their peers and to treat them with respect during class discussions.
Schedule
(The schedule is still being finalized and will be posted here when available.)
W0 | Date | Due | Class | Topics | Reading | |
M | 9/1 | No class (Labor Day) | ||||
R | 9/4 | No practicum until after first lecture | ||||
W1 | ||||||
M | 9/8 | Lect. | Course Intro; Descriptive Statistics; Lp norm | Navarro Ch. 1, Ch. 2 | ||
R | 9/11 | Prac. | First practice with R | Navarro Ch. 3 | ||
W2 | ||||||
M | 9/15 | Lect. | Probability & Statistics; Variables and Likelihood; Binomial Distribution | Navarro Ch. 9 | ||
R | 9/18 | Prac. | Data Frames | Winter (1.5-2.6) | ||
W3 | ||||||
M | 9/22 | Lect. | No classes at the GC (Rosh Hashanah) | |||
R | 9/25 | Prac. | More Data Frames; dplyr; RMarkdown | |||
W4 | ||||||
M | 9/29 | Lect. | Sampling Distributions; Confidence Intervals | |||
W | 10/1 | HW1 Due | ||||
R | 10/2 | Prac. | No classes at the GC (Yom Kippur) | |||
W5 | ||||||
M | 10/6 | Lect. | Discrete & Continuous data; t-distribution | Navarro Ch. 10 | ||
R | 10/9 | Prac. | Visualization; ggplot2; Handout; Markdown | Wickham (2010); Wickham (2011) | ||
W6 | ||||||
M | 10/13 | GC Closed (Columbus Day) | ||||
T | 10/14 (GC on a Monday schedule) | Lect. | Hypothesis testing | Navarro Ch. 11 | ||
R | 10/16 | Prac. | Data wrangling (Part I) | |||
F | 10/17 | HW2 Due | ||||
W7 | ||||||
M | 10/20 | GC Closed (Diwali) | ||||
R | 10/23 | Prac. | Data wrangling (Part II) | |||
F | 10/24 (GC on a Monday schedule) | Midterm (in class) | ||||
W8 | ||||||
M | 10/27 | Lect. | ||||
R | 10/30 | Prac. | ||||
W9 | ||||||
M | 11/3 | Lect. | ||||
R | 11/6 | Prac. | ||||
F | 11/7 | HW3 Due | ||||
W10 | ||||||
M | 11/10 | Lect. | ||||
R | 11/13 | Prac. | ||||
W11 | ||||||
M | 11/17 | Lect. | ||||
R | 11/20 | Prac. | ||||
W12 | ||||||
M | 11/24 | Lect. | ||||
R | 11/27 | No class (Thanksgiving) | ||||
W13 | ||||||
M | 12/1 | Lect. | ||||
R | 12/4 | Prac. | ||||
W14 | ||||||
M | 12/8 | Lect. | ||||
R | 12/11 | Prac. | (This day *might* be adjusted; I'll update with plenty of notice when certain) | |||
W15 | ||||||
M | 12/15 | Lect. | ||||
R | 12/18 | No class (Reading Days) | ||||
M | 12/22 | Final project due (Official end to the GC Fall Semester) |