30607 - FOUNDATIONS OF DATA SCIENCE
Course offered to incoming exchange students
Department of Decision Sciences
OMIROS PAPASPILIOPOULOS
Suggested background knowledge
Mission & Content Summary
MISSION
CONTENT SUMMARY
For a video presentation of the course objectives, deliverables and the overall student experience see here:
http://datasciencebocconi.github.io/foundations.html
Part A: The basics
+ Intro to course and case studies
+ (Less) Basic Python programming
+ Basic data management and visualization with Python
+ Messy data and feature engineering
Part B: Predictive modelling
+ Fundamentals: supervised learning and optimization
+ Lasso regression
+ Classification
+ Representational learning: trees, bagging and boosting
Part C: Causal inference and machine learning
+ Elements of causal inference
+ Predictive models for causal inference with observational data
Part D: Projects
Intended Learning Outcomes (ILO)
KNOWLEDGE AND UNDERSTANDING
- define data analysis methodology
- carry out basic data warehousing to represent, visualize and transform data
- build, train and evaluate machine learning models and algorithms
- integrate machine learning with uncertainty quantification and basic causal inference
- develop models, algorithms and code
- understand the fundamental machine learning methodologies
APPLYING KNOWLEDGE AND UNDERSTANDING
- apply appropriate data analysis methodologies
- choose appropriate machine learning algorithms and evaluate their performance
- produce measures of uncertainty associated with the statistical learning
- carry out causal inference using appropriate assumptions and algorithms
- develop and adapt Python code for all the above tasks
Teaching methods
- Face-to-face lectures
- Exercises (exercises, database, software etc.)
- Case studies / Incidents (traditional, online)
- Individual assignments
- Group assignments
- Interactive class activities on campus/online (role playing, business game, simulation, online forum, instant polls)
DETAILS
Combination of 5 basic approaches:
0. Videos distributed before the course that review background knowledge in statistics, computing and Python
1. A few lectures on the foundations of the methodology
2. Most of the lectures are based on Jupyter notebooks, where models and algorithms are illustrated directly on data and students can interact with the code
3. Guided project sessions
4. TA sessions on more practical coding aspects
Assessment methods
Continuous assessment | Partial exams | General exam | |
---|---|---|---|
|
x | ||
|
x | ||
|
x |
ATTENDING AND NOT ATTENDING STUDENTS
- 9/31 of the mark is based on exercises given at the end of each theme, which correspond to the guided project sessions.
- 13/31 of the mark is for a group project for Part B, done in groups of 4. This will take the form of a hackathon managed through the Bocconi Data Science Challenges Platform.
- 9/31 of the mark is based on an individual final exam.
Students (attending or not) must submit their projects by the deadlines specified in the Blackboard announcements; otherwise their submission will be invalid.
Non-attending students are responsible for joining or creating a group for the group assignment; an individual submission for the group project will also be invalid.
Teaching materials
ATTENDING AND NOT ATTENDING STUDENTS
0. Videos distributed before the course
1. Jupyter notebooks
2. Lecture notes
Suggested references:
0. Art of Statistics
https://www.amazon.it/Art-Statistics-Learning-Data/dp/0241398630
This is an excellent book for understanding modern statistics, and it can serve as preparation before starting the course.
The following three books can be used to gain a deeper understanding of the machine learning methods we will cover:
1. Elements of Statistical Learning
https://www.amazon.it/Elements-Statistical-Learning-Inference-Prediction/dp/0387848576
2. Pattern recognition and machine learning
https://www.amazon.it/Pattern-Recognition-Machine-Learning-Christopher/dp/0387310738
3. Deep Learning
https://www.deeplearningbook.org/
Parts of the course will also be based on the forthcoming book:
5. Veridical Data Science: The Practice of Responsible Data Analysis and Decision Making
6. There will be references to certain articles. The following four are particularly relevant for the aims of this course:
+ Statistical Modeling: The Two Cultures (2001) by Leo Breiman
+ Prediction, Estimation and Attribution (2020) by Brad Efron
+ Statistics in the big data era: Failures of the machine (2018) by David Dunson
+ 50 years of Data Science (2017) by David Donoho