Course, academic year 2024-2025

30677 - MACHINE LEARNING (MODULE I - INTRODUCTION)

Department of Decision Sciences

Course taught in English
BIG (5 credits - II sem. - OB  |  SECS-S/01)
Course Director:
OMIROS PAPASPILIOPOULOS

Classes: 45 (II sem.)
Instructors:
Class 45: TO BE DEFINED


Suggested background knowledge

Prerequisites for the course are basic probability (at the level of Chapter 6 of the first book listed under Teaching materials below), very basic calculus and linear algebra, and computing with Python at the level reached in the first-semester course. Additionally, this course is in conversation with the concurrent course in Data Analytics, and each course will benefit from concepts developed earlier in the other.

Mission & Content Summary

MISSION

The course provides a hands-on introduction to Statistical Machine Learning. The priority is on implementing algorithms and illustrating the ideas with practical examples. All coding is done in Python, in particular with the numpy and sklearn modules, within Jupyter notebooks. Most of the fundamental code used in the course will be provided to the students. The mathematical aspects of statistical machine learning are kept to a minimum.
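To give a flavour of the kind of notebook cell this implies, the following is a minimal sketch (not taken from the course material) of fitting a linear model by least squares with numpy. The synthetic data, the true coefficients and all variable names are assumptions made purely for illustration.

```python
import numpy as np

# Synthetic toy data (assumed for illustration): y = 2 + 3x + small noise
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
y = 2.0 + 3.0 * x + rng.normal(scale=0.1, size=200)

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones_like(x), x])

# Learning as optimization: find the coefficients minimising ||y - X b||^2
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Mean squared error of the fitted values on the training data
mse = np.mean((y - X @ coef) ** 2)

print("intercept, slope:", coef)
print("training MSE:", mse)
```

The estimated intercept and slope should land close to the values used to generate the data; the training error alone says nothing about performance on new data, which is why data splits and cross-validation appear among the course themes.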

CONTENT SUMMARY

The course is organized along the following themes:

1. Introduction
   - presentation of the goals of the course; statistics vs machine learning, data science vs artificial intelligence; supervised vs unsupervised machine learning - some case studies and some toy data sets
   - overview of supervised learning by showcasing prediction results and challenges on the case studies and toy examples 
   - models for machine learning; loss functions; learning as an optimization problem

2. Predictive modelling pt 1
   - a basic linear model; learning as a least squares problem; illustrations on case studies and toy examples
   - feature engineering pt 1; models of increasing complexity; evaluating predictive performance pt 1

3. Preprocessing
   - categorical (input/output) variables, transformations, basis functions, data splits
   - case study: predicting with text and images
   - dealing with missing data

4. Predictive modelling pt 2
   - bias-variance tradeoff; best subset selection and the lasso
   - optimizing hyperparameters: cross-validation 
   - classification pt 1: main concepts and algorithms
   - classification pt 2: measuring performance and multiclass
   - case studies

5. Smooth lines and curves   
   - regression splines
   
6. Predictive modelling pt 3
   - regression and classification trees: concepts, interpretations and training algorithms
   - bagging, random forests, and ensemble methods

7. Network data and algorithms
   - introduction to networks    
   - network statistics and connectivity properties
   - visualization and community detection
   - basic models for networks
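
As a small illustration of the network statistics and connectivity properties listed under theme 7, here is a minimal sketch in plain Python. The toy edge list and the helper name `connected_components` are hypothetical and chosen for illustration only; the course may well rely on a dedicated library instead.

```python
from collections import deque

# Hypothetical toy network: undirected edges between labelled nodes
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e")]

# Build an adjacency list (node -> set of neighbours)
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

# Degree of each node: number of neighbours
degree = {node: len(nbrs) for node, nbrs in adj.items()}


def connected_components(adj):
    """Return the connected components as a list of node sets (breadth-first search)."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        comp = set()
        while queue:
            node = queue.popleft()
            comp.add(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        components.append(comp)
    return components


components = connected_components(adj)

# Density: fraction of possible undirected edges that are present
n = len(adj)
density = 2 * len(edges) / (n * (n - 1))

print("degrees:", degree)
print("components:", components)
print("density:", density)
```

Here the triangle a-b-c and the pair d-e form two separate components, so the network is not connected.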
 


Intended Learning Outcomes (ILO)

KNOWLEDGE AND UNDERSTANDING

At the end of the course students will be able to...

+ understand basic predictive algorithms

+ distinguish between prediction and causal inference

+ appreciate what missing data are and how to deal with them

+ identify network structures

+ analyze network data

APPLYING KNOWLEDGE AND UNDERSTANDING

At the end of the course students will be able to...

+ build predictive algorithms

+ evaluate predictive performance

+ smooth one- and two-dimensional data

+ carry out network analytics


Teaching methods

  • Lectures
  • Practical Exercises
  • Collaborative Works / Assignments

DETAILS

+ practical exercises applying algorithms to real and synthetic datasets

+ collaborative work in the form of a group project on either predictive analytics or network analytics


Assessment methods

  • Written individual exam (traditional/online): partial exams, general exam
  • Collaborative Works / Assignment (report, exercise, presentation, project work etc.): continuous assessment

Teaching materials


ATTENDING AND NON-ATTENDING STUDENTS

Slides will be provided for the methodological part of the course; they form an important part of the reading material for understanding the main concepts. In terms of book references, one is

https://press.princeton.edu/books/hardcover/9780691222271/quantitative-social-science

which is used in Data Analytics but will also be relevant here. An advanced but classic textbook is

https://link.springer.com/book/10.1007/978-0-387-21606-5

which is also freely available online. Note, however, that the mathematical level of the course is much more elementary than that of this book. Another book that can be consulted is

https://www.deeplearningbook.org/

which is also freely available online.
 

Last change 28/05/2024 18:21