# Mathematical Statistics

## Course information

**Instructor**: Jonathan Niles-Weed (jnw@cims.nyu.edu), Office hours: Friday 11 am-12 pm, Zoom

**Teaching Assistants**:

- Haoxiang Huang (recitation leader), Office hours: Tuesday 10 am-11 am, Zoom
- Aram-Alexandre Pooladian (grader)

#### Lecture

Friday 2-3:40 PM, 60 5th Avenue 150

#### Recitation

Friday 4:55-5:45 PM, GCASL 475

*Note*: In accordance with NYU policies, in-person attendance is expected
for Fall 2021 classes.
Until the end of the add/drop period (September 15), we will make recordings of the lectures and recitations available upon request.

#### Piazza

For announcements and questions, please sign up on Piazza.

### Description

The goal of this course is to develop mathematical tools for analyzing statistical procedures.

### Prerequisites

Probability, linear algebra, mathematical maturity (comfort with proofs)

### Books

While there is no required textbook, a portion of the material for this class
will be drawn from *All of Statistics* by Larry Wasserman.

This will be rigorous where possible, but several mathematically tricky issues will be ignored. A more rigorous treatment of some of the topics we cover is available in:

- *Theoretical Statistics* by Robert W. Keener
- *Mathematical Statistics* by Jun Shao

All three books are available for free via the links above with NYU credentials.

### Lectures

Lecture notes will be written for each week’s lecture. Do not print them out, since they will be updated continuously throughout the semester. We will follow this approximate schedule.

**Unit 1: Asymptotics and non-asymptotics**

- Week 1: Concentration inequalities
- Week 2: Maximal inequalities and uniform convergence
- Week 3: Asymptotics

**Unit 2: Classical statistical tasks**

- Week 4: Statistical modeling (regression, classification, and clustering)
- Week 5: Estimation
- Week 6: Testing
- Week 7: **Midterm exam**
- Week 8: Regularization

**Unit 3: Applications and extensions**

- Week 9: Monte-Carlo methods
- Week 10: Model selection and cross-validation
- Week 11: Non-parametric statistics
- Week 12: Causal inference
- Week 13: Bayesian statistics
- Week 14: High-dimensional regression

### Homeworks

The homework assignments will be drawn from the list of exercises at the end of each chapter of the lecture notes.
Homeworks are due Thursdays at **11:59 pm Eastern time** via Gradescope (entry code: 3Y236X).
Each student may request one homework extension over the course of the semester, with no excuse necessary.
Further requests will not be considered.

You may work with other students; however, you **must** a) write
solutions to the homework yourself and b) list the names of the students you
collaborated with.
If you consult any other sources (printed or online), you **must** cite those
in your homework as well.
Any violation of these policies will be considered cheating.

- HW 1 (due 9/9): Chapter 1, Exercises 1-6
- HW 2 (due 9/16): Chapter 2, Exercises 1-4
- HW 3 (due 9/23): Chapter 3, Exercises 1-5
- HW 4 (due 9/30): Chapter 4, Exercises 1-4
- HW 5 (due 10/7): Chapter 5, Exercises 1-5
- HW 6 (due 10/28): Chapter 7, Exercises 1-4
- HW 7 (due 11/4): Chapter 8, Exercises 1-4
- HW 8 (due 11/11): Chapter 9, Exercises 1-3
- HW 9 (due 11/18): Chapter 10, Exercises 1-3

## Grading

40% Homework + 30% Midterm + 30% Final project

### Homework

There will be approximately 10 homework assignments over the course of the semester. The lowest homework score will be dropped.

### Exams

There will be an in-class midterm exam on October 15.

### Final Project

In lieu of a final exam, this course will have a final project involving reading and summarizing a recent paper (or papers) of statistical interest.
Working in **groups of at most 2**, your job is to: a) summarize the main idea/question of the paper, connecting it to ideas we’ve discussed in this course and
b) carefully explain one part of the paper (by giving full details for one of the proofs, recreating an experiment, etc.).

Write-ups should be 5-10 pages, and are due on the last day of the semester (Dec 14).

You may choose any paper you like, but consider discussing it with me first to make sure it is of sufficient quality. The following example papers may be a good place to start:

- “Convexity, Classification, and Risk Bounds” (Bartlett, Jordan, McAuliffe; JASA 2006)
- “Robust Estimation of a Location Parameter” (Huber; Annals of Stats 1964)
- “Square-Root Lasso: Pivotal Recovery of Sparse Signals via Conic Programming” (Belloni, Chernozhukov, Wang; Biometrika 2011)
- “Honest variable selection in linear and logistic regression models via ℓ1 and ℓ1 + ℓ2 penalization” (Bunea; EJS 2008)
- “Universal inference” (Wasserman, Ramdas, Balakrishnan; PNAS 2020)
- “Controlling the false discovery rate via knockoffs” (Barber, Candes; Annals of Stats 2015)
- “Understanding Black-box Predictions via Influence Functions” (Koh, Liang; ICML 2017)
- “Inference for Empirical Wasserstein Distances on Finite Spaces” (Sommerfeld, Munk; JRSS-B 2018)

### Cheating

NYU policy prescribes strong punishments for students caught cheating. The course staff will be carefully monitoring assignments and exams for signs of academic dishonesty.