Master2 Data Sciences

Theoretical guidelines for high-dimensional data analysis


12/10: amphi Monge, Ecole Polytechnique, 14h-18h
09/11: amphi Carnot, Ecole Polytechnique, 14h-18h
16/11: amphi Carnot, Ecole Polytechnique, 14h-18h
23/11: amphi Carnot, Ecole Polytechnique, 14h-18h
30/11: amphi Carnot, Ecole Polytechnique, 14h-18h


This course is not suitable preparation for a PhD in mathematical statistics. If that is your goal, you should instead follow the course Statistiques en grande dimension from the Master2 MDA and MSV.


Goal of the lectures: The lectures are based on recent research papers (How to read a paper?). Attendance at the lectures is mandatory and is taken into account in the final evaluation.

Lecture | Paper(s) | Slides | Further reading
1. False discoveries, multiple testing, online issue | Paper 1 (short review) | Slides | Reliability of scientific findings?; Online FDR control
2. Strength and weakness of the Lasso | Paper 1 | Slides | No free computational lunch
3. Adaptive data analysis | Paper 1 | Slides | Kaggle overfitting
4. Curse of dimensionality, robust PCA, theoretical limits | Paper 1 (suppl. material) | Slides | Robust PCA
5. Robust learning | Paper 1 | Slides | Learning with Median Of Means


The reports must be sent by email by February 15 in a zip file including:
- the report in PDF format (8 to 12 pages)
- the source code for the numerical experiments