STAT 992: Science of Large Language Models
Term: Spring 2026
Meeting time: Tue/Thu 4:00–5:15pm, from Jan 20 to Mar 8
Location: Morgridge Hall 2538, 1205 University Avenue, Madison
Instructor: Yiqiao Zhong
Email: MyFirstName dot MyLastName at wisc dot edu
Q&A: Canvas Discussion Page
Announcements
Feb 11, 2026
I will be giving a talk “Do LLMs reason as we do? A synthetic study of transformers’ learning dynamics for compositions” at Machine Learning Lunch and Meetings (MLLM) on Feb 17, 2026, starting at 12:15 pm in Morgridge Hall 7560.
Jan 29, 2026
An exciting local event, the AI Meets Society (AIMS) Symposium, is scheduled for February 21, 2026. Many great faculty and researchers will share perspectives on the future of AI. Consider registering NOW!
Jan 27, 2026
We are slightly changing the class format: at each meeting, we will now begin with a discussion of the previous lecture, followed by the new lecture.
Jan 20, 2026
Welcome! Please check out the schedule and add your info to the Google doc.
Course description
This is a new topics course focusing on interpretability and understanding the internal mechanisms of large language models (LLMs). Since GPT-3, LLMs have been advancing rapidly in their capabilities, yet we do not have a good understanding of how they operate or why they do or do not work. We will cover the fundamentals of LLMs, new phenomena, mathematical structures, statistical techniques, and their applications in the sciences.
The goal is to foster interaction across departments (Stats, CS, ECE, BMI, Math, etc.). This is a one-credit, seven-week course with minimal workload. We will meet twice each week, and each meeting will involve a mix of lectures and discussions.
A tentative outline of the course is the following; see Schedule for details.
- Week 1–2: Emergent phenomena in LLMs: basics of transformers, emergent abilities and grokking, prompting and in-context learning, out-of-distribution generalization, induction heads, chain-of-thought reasoning.
- Week 3–4: Mathematical structures of LLMs: linear representation hypothesis, feature superposition, sparsity and low-rankness in embeddings, layerwise analysis, near-orthogonal representations.
- Week 5–6: Statistical techniques for LLMs: PCA and factor analysis, dictionary learning (SAE), causal tracing and circuits, leave-one-out, influence functions.
- Week 7: Case studies in domain applications: genomics foundation models, watermarking, memorization and copyrights.
Logistics
- Class format: Each meeting consists of two parts: (i) a short lecture (mostly by me) that introduces the basics of a topic in LLMs, and (ii) a 20–30 minute in-class discussion led by two participants.
- In-class discussions: The discussion leaders need to prepare a few slides to initiate the discussion. They are encouraged to draw on their own research background and knowledge to provide complementary perspectives or critiques of the lectures; they are also welcome to raise concerns about LLMs or share preliminary research ideas.
- Grading: No homework, exam, or course projects. Grading is based on class participation and in-class discussions.
- Additional questions: Please use the Canvas page to ask additional questions outside of class.