Course, academic year 2024-2025

30677 - MACHINE LEARNING (MODULE I - INTRODUCTION)

Department of Decision Sciences

Course taught in English

Code 30677 'Machine learning (Module I - Introduction)' and code 30678 'Machine learning (Module II - Deep learning)' are, respectively, the first and second modules of the course 'Machine learning'.


BIG (5 credits - II sem. - OB | SECS-S/01)

Course Director:
OMIROS PAPASPILIOPOULOS

Classes: 45 (II sem.)

Instructors:
Class 45: OMIROS PAPASPILIOPOULOS

Suggested background knowledge

Prerequisites for the course are basic probability (at the level of Chapter 6 of the first book listed under Teaching materials), very basic calculus and linear algebra, and computing with Python at the level reached in the first-semester course. Additionally, this course is in conversation with the concurrent course in Data Analytics, and each course will benefit from concepts developed earlier in the other.

Mission & Content Summary

MISSION

The course provides a hands-on introduction to Statistical Machine Learning. The priority is on implementing algorithms and illustrating the ideas with practical examples. All coding is done in Python, in particular with the numpy and sklearn modules, within Jupyter notebooks. Most of the fundamental code used in the course will be provided to the students. The mathematical aspects of statistical machine learning are kept to a minimum.

CONTENT SUMMARY

The course is organized along the following themes:

1. Introduction
- presentation of the goals of the course; statistics vs machine learning, data science vs artificial intelligence; supervised vs unsupervised machine learning - some case studies and some toy data sets
- overview of supervised learning by showcasing prediction results and challenges on the case studies and toy examples
- models for machine learning; loss functions; learning as an optimization problem

2. Predictive modelling pt 1
- a basic linear model; learning as a least squares problem; illustrations on case studies and toy examples
- feature engineering pt 1; models of increasing complexity; evaluating predictive performance pt 1

3. Preprocessing
- categorical (input/output) variables, transformations, basis functions, data splits
- case study: predicting with text and images
- dealing with missing data

4. Predictive modelling pt 2
- bias-variance tradeoff; best subset selection and the lasso
- optimizing hyperparameters: cross-validation
- classification pt 1: main concepts and algorithms
- classification pt 2: measuring performance and multiclass
- case studies

5. Smooth lines and curves
- regression splines

6. Predictive modelling pt 3
- regression and classification trees: concepts, interpretations and training algorithms
- bagging, random forests, and ensemble methods

7. Network data and algorithms
- introduction to networks
- network statistics and connectivity properties
- visualization and community detection
- basic models for networks
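
As an illustration of the idea in themes 1 and 2 that learning can be cast as an optimization problem, here is a minimal sketch of fitting a basic linear model by least squares. The synthetic data, the variable names, and the helper `squared_loss` are assumptions made for this example; they are not taken from the course material.

```python
import numpy as np

# Synthetic toy data (an assumption for illustration): y = 2 + 3x + noise
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
y = 2.0 + 3.0 * x + rng.normal(scale=0.1, size=200)

# Design matrix with an intercept column and one feature column
X = np.column_stack([np.ones_like(x), x])


def squared_loss(beta, X, y):
    """Mean squared error of the linear model X @ beta (hypothetical helper)."""
    residuals = y - X @ beta
    return float(np.mean(residuals ** 2))


# "Learning" here means choosing beta to minimize the squared loss;
# np.linalg.lstsq solves this least squares problem directly.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print("estimated intercept and slope:", beta_hat)
print("training loss at the estimate:", squared_loss(beta_hat, X, y))
```

Any other coefficient vector gives a training loss at least as large as the one at `beta_hat`, which is what makes the fit a solution of the optimization problem.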

Intended Learning Outcomes (ILO)

KNOWLEDGE AND UNDERSTANDING

At the end of the course, students will be able to...

understand basic predictive algorithms
distinguish between prediction and causal inference
appreciate what missing data are and how to deal with them
identify network structures
analyze network data

APPLYING KNOWLEDGE AND UNDERSTANDING

At the end of the course, students will be able to...

build predictive algorithms
evaluate predictive performance
smooth one- and two-dimensional data
carry out network analytics
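
To give a flavour of what evaluating predictive performance can involve, the sketch below compares polynomial models of increasing complexity by k-fold cross-validation. The synthetic data, the function `kfold_mse`, and the choice of five folds are assumptions made for this example, not the course's own code.

```python
import numpy as np

# Synthetic toy data (an assumption for illustration): a noisy sine curve
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=120)
y = np.sin(3.0 * x) + rng.normal(scale=0.2, size=120)


def kfold_mse(x, y, degree, n_folds=5):
    """Average held-out mean squared error of a polynomial fit of a given degree."""
    indices = np.arange(len(x))
    # x was drawn at random, so contiguous blocks of indices act as random folds
    folds = np.array_split(indices, n_folds)
    errors = []
    for test_idx in folds:
        train_idx = np.setdiff1d(indices, test_idx)
        # Fit on the training folds only, then predict on the held-out fold
        coefs = np.polyfit(x[train_idx], y[train_idx], deg=degree)
        preds = np.polyval(coefs, x[test_idx])
        errors.append(np.mean((y[test_idx] - preds) ** 2))
    return float(np.mean(errors))


# Models of increasing complexity: polynomial degrees 1 to 8
degrees = range(1, 9)
cv_errors = {d: kfold_mse(x, y, d) for d in degrees}
best_degree = min(cv_errors, key=cv_errors.get)

for d in degrees:
    print(f"degree {d}: cross-validated MSE = {cv_errors[d]:.4f}")
print("degree with the lowest cross-validated error:", best_degree)
```

The error is always measured on data that was not used for fitting, which is why it can be used to choose the degree without rewarding overfitting.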

Teaching methods

Lectures
Practical Exercises
Collaborative Works / Assignments

DETAILS

practical exercises: applying algorithms to real and synthetic datasets
collaborative work: a group project on either predictive analytics or network analytics

Assessment methods

Assessment method                                                                      Continuous assessment   Partial exams   General exam
Written individual exam (traditional/online)                                                                   x               x
Collaborative Works / Assignment (report, exercise, presentation, project work etc.)   x

Teaching materials

ATTENDING AND NOT ATTENDING STUDENTS

Slides will be provided for the methodological part of the course; they form an important part of the reading material for understanding the main concepts. One book reference is

https://press.princeton.edu/books/hardcover/9780691222271/quantitative-social-science

which is used in Data Analytics but will also be relevant here. An advanced but very classic textbook is

https://link.springer.com/book/10.1007/978-0-387-21606-5

which is also freely available online. Note, however, that the mathematical level of the course is much more elementary than that of this book. Additionally, the book

https://www.deeplearningbook.org/

is also freely available online and can be consulted.

Last change 29/11/2024 14:56