Course, academic year 2023-2024

30401 - MATHEMATICS AND STATISTICS - MODULE 2 (STATISTICS)

Department of Decision Sciences

Course taught in English
BEMACS (8 credits - II sem. - OB  |  SECS-S/01)
Course Director:
OMIROS PAPASPILIOPOULOS

Classes: 25 (II sem.)
Instructors:
Class 25: OMIROS PAPASPILIOPOULOS


Suggested background knowledge

Basic mathematics and basic R programming

Mission & Content Summary

MISSION

The course develops the principles of scientific learning from data. For each topic, the course starts from actual data and a specific scientific question, then develops the statistical learning and uncertainty quantification tools needed to answer that question from the available data.

CONTENT SUMMARY

The course is organized in themes. Each theme starts with an overview, introduces some motivating data and the associated scientific questions, and then develops the statistical tools (models, algorithms, mathematical concepts) needed to gather knowledge from the data and address the motivating questions. Each theme finishes with a summary and exercises.

The themes are:

1. Data visualization and summarization

Data: heart attack study, Shipman's dead patients, daily homicides, test results, jelly beans competition

Concepts: bar plots, box plots, means and medians and their variational formulation, logarithmic scale, correlation and distance correlation
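
As an illustration of the variational formulation mentioned above, the following minimal Python sketch checks numerically that the mean minimizes the sum of squared deviations and the median minimizes the sum of absolute deviations. The data values and the grid search are made up for illustration and are not taken from the course material.

```python
import statistics

def squared_loss(c, xs):
    """Sum of squared deviations of the data from a candidate centre c."""
    return sum((x - c) ** 2 for x in xs)

def absolute_loss(c, xs):
    """Sum of absolute deviations of the data from a candidate centre c."""
    return sum(abs(x - c) for x in xs)

# Hypothetical data, chosen only for illustration.
data = [1.0, 2.0, 2.5, 4.0, 10.0]

# Candidate centres 0.00, 0.01, ..., 12.00 (a crude grid search).
grid = [i / 100 for i in range(0, 1201)]

best_sq = min(grid, key=lambda c: squared_loss(c, data))
best_abs = min(grid, key=lambda c: absolute_loss(c, data))

print("minimizer of squared loss: ", best_sq, " mean:  ", statistics.mean(data))
print("minimizer of absolute loss:", best_abs, " median:", statistics.median(data))
```

The single large value 10.0 pulls the squared-loss minimizer (the mean) upwards, while the absolute-loss minimizer (the median) stays with the bulk of the data.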

2. From randomization to randomness

Data: chocolate and Nobel prizes, university admission data, death penalty data

Concepts: spurious correlations, experimental vs observational data, random numbers, randomized controlled trials, confounders, Simpson's paradox

3. What is probability and what is it useful for

Concepts: Bernoulli distribution, probability densities, Poisson distribution, series and limits, learning a model from the data

4. The calculus of probability

Concepts: events, basic set theory, axioms of probability

5. More models for more data

Data: birth weights, human heights, heart transplant survival data

Concepts: density functions, Gaussian distribution, survival analysis, exponential distribution, censoring, gamma distribution and special functions, uniform distribution, transformation of variables, simulation of random variables

6. Joint distributions, independence and combinatorics

Data: 10-year maturity bonds, heights of fathers and sons, the Sally Clark story

Concepts: joint and marginal distributions, independence, statistical arguments in Law, the binomial distribution

7. Expectation

Concepts: expected value and interpretation, properties of expectation, moments, variance, standard deviation and interpretation, the uncertainty rule of thumb, skewness and interpretation, sample and population moments 

8. Elements of Network Science

Data: the Internet, employee communication network, the actor network

Concepts: Erdős-Rényi network model, degree distributions, six degrees of separation, heavy tails, scale-free property, power laws, the Student-t distribution

9. Concentration, inequalities and limit theorems

Concepts: Markov inequality, Chebyshev inequality, uncertainty quantification, weak law of large numbers, a basic understanding of the central limit theorem
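
To make the Chebyshev inequality concrete, here is a minimal Python sketch comparing the observed frequency of large deviations with the bound 1/k^2. The exponential samples, the seed and the function name `tail_frequency` are assumptions made for illustration only.

```python
import random
import statistics

def tail_frequency(samples, k):
    """Fraction of samples at least k standard deviations away from the mean."""
    mu = statistics.fmean(samples)
    sigma = statistics.pstdev(samples)
    far = sum(1 for x in samples if abs(x - mu) >= k * sigma)
    return far / len(samples)

# Simulated data: 10,000 draws from an exponential distribution with rate 1.
rng = random.Random(1)
samples = [rng.expovariate(1.0) for _ in range(10_000)]

for k in (1.5, 2.0, 3.0):
    # Chebyshev: P(|X - mu| >= k * sigma) <= 1 / k^2
    print(f"k={k}: observed {tail_frequency(samples, k):.4f}, bound {1 / k**2:.4f}")
```

Because the sample mean and the population-style standard deviation are computed from the same sample, the inequality holds exactly for the empirical distribution, so the observed frequency never exceeds the bound. The bound is typically far from tight.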

10. Statistical learning

Data: cholesterol and heart disease, arm-folding and sex, bowel cancer rates in the UK

Concepts: quantifying evidence in data about a hypothesis, p-value, Fisher's exact test, multiple testing, confidence intervals from concentration inequalities, bootstrap and confidence intervals, funnel plots
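
The bootstrap confidence interval listed above can be sketched in a few lines of Python. This is a minimal percentile bootstrap under stated assumptions: the data values, the function name `bootstrap_ci` and its parameters are hypothetical, and the course itself may present a different variant (and uses R rather than Python).

```python
import random
import statistics

def bootstrap_ci(xs, stat=statistics.mean, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for stat(xs)."""
    rng = random.Random(seed)
    n = len(xs)
    # Resample the data with replacement and recompute the statistic each time.
    estimates = sorted(stat(rng.choices(xs, k=n)) for _ in range(n_resamples))
    alpha = (1 - level) / 2
    lo_idx = round(alpha * n_resamples)
    hi_idx = round((1 - alpha) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]

# Hypothetical measurements, for illustration only.
data = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.1, 4.9, 6.2, 3.9]

lo, hi = bootstrap_ci(data)
print(f"sample mean: {statistics.mean(data):.2f}")
print(f"95% bootstrap interval: ({lo:.2f}, {hi:.2f})")
```

Passing `stat=statistics.median` gives an interval for the median instead, which is one reason the bootstrap is convenient when no simple formula for the standard error is available.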

 


Intended Learning Outcomes (ILO)

KNOWLEDGE AND UNDERSTANDING

At the end of the course students will be able to...

+ formulate statistical learning questions

+ identify appropriate data analysis methodologies

+ carry out uncertainty quantification

+ learn basic models from data

 

APPLYING KNOWLEDGE AND UNDERSTANDING

At the end of the course students will be able to...

+ choose appropriate data summaries and visualization

+ carry out basic network analysis

+ derive basic probability calculations

+ use statistical learning tools


Teaching methods

  • Face-to-face lectures
  • Exercises (exercises, database, software etc.)
  • Group assignments
  • Interactive class activities on campus/online (role playing, business game, simulation, online forum, instant polls)

DETAILS

Exercises (exercises, database, software etc.):

Special sessions will provide exercises, examples and illustrations of concepts and methods, including with the help of the statistical software R.

 

Group assignments:

Students will work in groups on a project that involves both methodology and data analysis.


Assessment methods

                                                                           Continuous assessment   Partial exams   General exam
  • Written individual exam (traditional/online)                                                   x               x
  • Group assignment (report, exercise, presentation, project work etc.)   x

ATTENDING AND NON-ATTENDING STUDENTS

Students may choose between the following two options:

- Two partial written exams (a mid-term and a final) that contribute to the final grade with a 50% weight each.

- A single general written exam (after the end of the course) that counts for 100% of the final mark.

 

The tests consist of exercises. They aim at ascertaining students' mastery of concepts and results discussed during lectures as well as an adequate knowledge of R.

In each test the maximum grade is 31.

 

The assessment method is the same for both attending and non-attending students.

 

Students who take the mid-term exam may still take the general exam instead of taking the final exam.

 

Importantly, access to the final (or second partial) exam follows the rules indicated in Section 7.6 of the Guide to the University.

 

An optional group project is worth a maximum of 1.5 points out of 31. These points are added to the mark obtained through either of the options above, but only for exams taken before the end of June; the project mark cannot be carried over to exams taken after June.


Teaching materials


ATTENDING AND NON-ATTENDING STUDENTS

The teaching material will be primarily that developed during the classes and distributed to the students in PDF format after each class.

 

The course will use examples and extracts primarily from the first book listed below. It is advisable to acquire this book either in its original publication or its Italian translation (it is also available as an e-book), since it is an excellent modern resource to learn Probability and Statistics and why these are fundamental in anything that has to do with learning from data. 

 

Early chapters of Grimmett and Stirzaker provide an excellent, more technical introduction to Probability. The introduction and some appendices of Bishop provide an excellent and accessible introduction to statistical machine learning and to the use of Probability and Statistics for designing and analyzing algorithms. Ross is a textbook whose syllabus closely matches the contents of this course. For a number of basic concepts the corresponding Wikipedia pages are a great resource. Please use those instead of random blogs, webpages or videos posted on YouTube.

 

  • Spiegelhalter, The Art of Statistics: How to Learn from Data, Penguin, 2019, ISBN 978-1541618510 (available also in Italian translation)
  • Barabasi, Network Science
  • Grimmett and Stirzaker, Probability and Random Processes, Oxford, Fourth Edition, 2020, ISBN 978-0198847595
  • Bishop, Pattern Recognition and Machine Learning, Springer, 2006, ISBN 978-0387310732
  • Ross, Introduction to Probability and Statistics for Engineers and Scientists, Fourth Edition, Academic Press, 2014
Last change 07/12/2023 13:27