20630 Introduction to Sport Analytics

This course covers the analytics requirements of a Sports Management program. It is also an opportunity for applied work for all students interested in Data Science. All applications in the course are based on the statistical software R. The course is taught through a combination of lectures, class discussion, and group presentations. Students are required to read assignments from the texts as well as additional sources provided by the instructor. Students must attend class prepared to engage in discussions; to have, articulate, and defend a point of view; and to ask questions and provide comments based on their reading and on their own R applications.
Projects will be allocated to groups of attending students. Project reports and their presentation will be part of the evaluation for attending students.

Pre-Requisites: Students are expected to have attended a core course in statistics and to be familiar with basic calculus and linear algebra.
Teaching Assistants: Office hours will be held online via Teams. Two Teaching Assistants will follow students on both projects and exercises:
Luca Badolato, luca.badolato@studbocconi.it, office hours Fridays 18-19.30
Ludovico de Fazio, ludovico.defazio@studbocconi.it, office hours Thursdays 18-19.30

Past Exams: 2019_1, 2019_2


Accessing Application Programme Interfaces with R
Accessing APIs from R: a tutorial, R code for the tutorial, and illustrative code to access data from Genius Sport C gold
Accessing data from GitHub using R code

Dynamic Documents with R Markdown
Build a report with all results and comments
An introduction to R Markdown
Illustrative R Markdown code

GitHub and GitHub Desktop
An online tutorial

Project 1: Creating Web Applications with RShiny (TA Luca Badolato)
The objective of this project is to create a sport-related web application with RShiny. An illustration based on NBA data is provided, together with projects produced in 2020. Students should feel free to choose their preferred field and application.
Online tutorials on mastering RShiny
Learning Shiny with NBA data (by Julia Wrobel)
Programmes for NBA Shiny (short version), Programmes for NBA Shiny (long version)
RShiny example

Instructions for those who have opted for the Shiny Project in 2020 are available HERE
Link to the recorded briefing session: https://eu-lti.bbcollab.com/recording/9e30b37ebe464bc1a84da816b371076e
Link to the recorded presentation of RShiny Project Group 1-2020: https://eu-lti.bbcollab.com/recording/32d77ba3a0ef4c69bc58b2bd8f915e9a
Link to the recorded presentation of RShiny Project Group 2-2020: https://eu-lti.bbcollab.com/recording/9de07214a1144749912973001aeb48c9

Project 2: An Application of Unsupervised Machine Learning to Sport Analytics (TA Luca Badolato)
The objective of this project is to apply unsupervised machine learning, and in particular cluster analysis, to finding groups in Sport Analytics data.
P. Zuccolotto and M. Manisera (2020) Basketball Data Science: With Applications in R, Chapman and Hall/CRC (Chapter 4)
Link to BasketballAnalyzeR: https://bdsports.unibs.it/basketballanalyzer/
James, Witten, Hastie and Tibshirani (2013) An Introduction to Statistical Learning: With Applications in R
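
As a rough illustration of the cluster analysis this project calls for, the sketch below groups simulated players by two standardised statistics using k-means and, as an alternative, hierarchical clustering. The data, the variable names (pts, reb) and the choice of two clusters are assumptions made for illustration only; they are not taken from the course material.

```r
# Minimal sketch: cluster analysis on simulated (hypothetical) player data.
set.seed(42)

# Two artificial player types: high scorers and strong rebounders.
players <- rbind(
  data.frame(pts = rnorm(20, mean = 25, sd = 2), reb = rnorm(20, mean = 5,  sd = 1)),
  data.frame(pts = rnorm(20, mean = 10, sd = 2), reb = rnorm(20, mean = 11, sd = 1))
)

# Standardise so that both variables contribute on the same scale.
z <- scale(players)

# k-means with several random starts to reduce sensitivity to initial centres.
km <- kmeans(z, centers = 2, nstart = 25)

# Hierarchical clustering (Ward linkage) as an alternative, cut into 2 groups.
hc <- hclust(dist(z), method = "ward.D2")
hc_groups <- cutree(hc, k = 2)

# Cross-tabulate the two partitions to see how far they agree.
table(kmeans = km$cluster, hclust = hc_groups)
```

With real data the number of clusters is not known in advance, so it would normally be chosen by comparing solutions for several values of k.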

LINK to the recorded Presentation of the Cluster Analysis project 2020: https://eu-lti.bbcollab.com/recording/8570729e9532435b951e9b40de8470a5
SLIDES and  Rmd codes  

Project 3: An Application of Supervised Machine Learning to Sport Analytics (TA Ludovico De Fazio)
The objective of this project is to apply supervised machine learning techniques, in particular techniques that address the many-predictor problem, to predict top athletes' compensation.
Students should use as a benchmark the model presented in the lectures and evaluate it against alternatives generated by modern machine learning techniques.
A further possibility for a group undertaking this project is the construction of a data challenge related to the topic of the project using the data challenge website of Bocconi University.

James, Witten, Hastie and Tibshirani (2013) An Introduction to Statistical Learning: With Applications in R
Stock J. and M. Watson (2020) Introduction to Econometrics, 4th edition, Chapter 14
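
One way to see the many-predictor problem is to compare ordinary least squares with a shrinkage estimator when the number of predictors is large relative to the sample. The sketch below uses ridge regression in closed form on simulated data; ridge is only one possible technique, and the data, the dimensions and the grid of penalty values are assumptions chosen for illustration, not the benchmark model from the lectures.

```r
# Minimal sketch: OLS versus ridge regression with many predictors (simulated data).
set.seed(7)
p <- 40                                   # number of predictors
beta <- c(rep(1, 5), rep(0, p - 5))       # only 5 predictors truly matter

# Hypothetical data generator: standard normal predictors, no intercept.
make_data <- function(n) {
  X <- matrix(rnorm(n * p), n, p)
  list(X = X, y = as.numeric(X %*% beta + rnorm(n)))
}
train <- make_data(60)                    # small estimation sample
test  <- make_data(500)                   # larger hold-out sample

# Closed-form ridge estimator; lambda = 0 gives OLS.
ridge <- function(X, y, lambda) {
  solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y))
}

b_ols   <- ridge(train$X, train$y, 0)
b_ridge <- ridge(train$X, train$y, 10)

# Out-of-sample mean squared error over a grid of penalties.
mse <- function(b) mean((test$y - test$X %*% b)^2)
lambdas  <- c(0, 1, 5, 10, 50)
test_mse <- sapply(lambdas, function(l) mse(ridge(train$X, train$y, l)))
data.frame(lambda = lambdas, test_mse = test_mse)
```

In practice the penalty would be chosen by cross-validation on the estimation sample rather than by looking at the hold-out sample.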

Project 4: Evaluating the Home Advantage Effect from Quasi-Natural Experiments (TA Ludovico De Fazio and Luca Badolato)
Following the COVID shock, many games in many sports were played without spectators, within "bubbles" in which no team enjoyed a home advantage. The objective of this project is to use sport data to construct a quasi-natural experiment for the evaluation of the Home Advantage Effect.
Stock J. and M. Watson (2020) Introduction to Econometrics, 4th edition, Chapter 13
Presentation of N. Sita (2020) thesis on Evaluating the Home Advantage in NBA
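
A very simple version of the comparison behind this project is the difference in home-team win rates between games played with and without spectators. The sketch below estimates it on simulated data by regressing a home-win indicator on a "bubble" dummy; the win probabilities (0.60 and 0.50), the sample sizes and the variable names are made-up assumptions, not empirical results.

```r
# Minimal sketch: home advantage as a difference in means (simulated data).
set.seed(2020)
n_pre    <- 1000   # hypothetical games played with spectators
n_bubble <- 200    # hypothetical games played in a bubble

games <- data.frame(
  bubble   = c(rep(0, n_pre), rep(1, n_bubble)),
  home_win = c(rbinom(n_pre, 1, 0.60), rbinom(n_bubble, 1, 0.50))
)

# Linear probability model: the coefficient on 'bubble' is the change in
# the home-team win rate when spectators are absent.
fit    <- lm(home_win ~ bubble, data = games)
effect <- coef(fit)[["bubble"]]

# The same number computed directly as a difference in sample means.
diff_means <- mean(games$home_win[games$bubble == 1]) -
              mean(games$home_win[games$bubble == 0])

summary(fit)$coefficients
```

A credible evaluation would also need to argue that bubble and non-bubble games are otherwise comparable, for example by controlling for team strength.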

Project 5: Measuring Competitive Balance and its Effects (TA Ludovico De Fazio)
The objective of this project is to introduce and discuss the concept of Competitive Balance in the Sport Industry. Both a discussion of the theory and applications are possible.

Berri D.J., M.B. Schmidt and S. Brook (2006) The Wages of Wins, Stanford University Press, Ch. 3, 4
Brandes L. and E. Franck (2007) "Who Made Who? An Empirical Analysis of Competitive Balance in European Soccer Leagues", Eastern Economic Journal
Haddock D. and L.P. Cain (2006) "Measuring Parity: Tying into the Idealized Standard Deviation", Journal of Sports Economics
Koning R.H. (2000) "Balance in Competition in Dutch Soccer", The Statistician, 49, Part 3, pp. 419-431
Szymanski S. (2001) "Income Inequality, Competitive Balance and the Attractiveness of Team Sports: Some Evidence and a Natural Experiment from English Soccer", The Economic Journal, 111, F69-F84
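
A common starting point for measuring competitive balance is to compare the observed standard deviation of team winning percentages with the "idealized" standard deviation that would arise if every game were a fair coin toss, 0.5 / sqrt(G) for a season of G games. The sketch below computes this ratio; the function name balance_ratio and the example winning percentages are hypothetical, and the formula assumes no ties and the same number of games for every team.

```r
# Minimal sketch: ratio of actual to idealized standard deviation of win percentages.
# Assumes no ties and an equal number of games per team.
balance_ratio <- function(win_pct, games) {
  actual_sd <- sd(win_pct)            # sample standard deviation across teams
  ideal_sd  <- 0.5 / sqrt(games)      # SD under equal team strength
  actual_sd / ideal_sd
}

# Hypothetical five-team league with an 82-game season.
win_pct <- c(0.70, 0.60, 0.50, 0.40, 0.30)
balance_ratio(win_pct, games = 82)
```

A ratio close to 1 indicates a league about as balanced as pure chance would produce; larger values indicate less balance.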


Course Content Summary

Section 1: Sport Analytics: An Introduction

The Questions in Sport Analytics. 
The Answers
Modelling Data in Sports
Theory Based Models
Supervised Machine Learning
Unsupervised Machine Learning

Berri D.J., M.B. Schmidt and S. Brook (2006) The Wages of Wins, Stanford University Press
Berri D.J. and M.B. Schmidt (2010) Stumbling on Wins: Two Economists Expose the Pitfalls on the Road to Victory in Professional Sports, FT Press
Goldsberry K. (2019) Sprawlball: A Visual Tour of the New Era of the NBA, Houghton Mifflin Harcourt
James, Witten, Hastie and Tibshirani (2013) An Introduction to Statistical Learning: With Applications in R
Shea S. (2014) Basketball Analytics: Spatial Tracking
P. Zuccolotto and M. Manisera (2020) Basketball Data Science: With Applications in R, Chapman and Hall/CRC
Winston W.L. (2009) Mathletics, Princeton University Press

Section 2: An introduction to R
Link to the recorded session: https://eu-lti.bbcollab.com/recording/c90e0504b9f447d092ad5f8fe32908dc

Install R and RStudio on your computer and learn how to run them
Learn what a package is and how to install it
Understand what a view is
Define a default directory
Have some fun with R Shiny
An online introduction to R
R Code
Torfs and Brauer, "A Very Short Intro to R", SOLUTIONS for the Torfs-Brauer ToDo list

Data-Objects in R
Data Objects in R (data types) and Data Structures In R (Vectors, Matrices, Arrays, Data Frames, Lists)
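
The data types and structures listed above can be illustrated with a few lines of base R. The object names and values below are invented for illustration.

```r
# Minimal sketch: basic data types and data structures in base R.

# Atomic data types
x_num <- 3.14        # numeric (double)
x_int <- 42L         # integer
x_chr <- "Lakers"    # character
x_lgl <- TRUE        # logical

# Vector: ordered collection of elements of the same type
pts <- c(112, 98, 105)

# Matrix: two-dimensional, one type
m <- matrix(1:6, nrow = 2, ncol = 3)

# Array: generalisation of a matrix to more than two dimensions
a <- array(1:24, dim = c(2, 3, 4))

# Data frame: columns may have different types, all of the same length
teams <- data.frame(team = c("A", "B", "C"),
                    wins = c(50L, 41L, 33L),
                    stringsAsFactors = FALSE)

# List: elements may be of any type and length
info <- list(name = "A", seasons = 2018:2020, stats = teams)

str(info)   # compact overview of a nested object
```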

Data Handling in R
Importing and Exporting, transforming and selecting data 
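
A minimal round trip through these steps might look as follows. The team records are invented, and a temporary file stands in for a real data file.

```r
# Minimal sketch: export, import, transform and select data in base R.
teams <- data.frame(team   = c("A", "B", "C", "D"),
                    wins   = c(50, 41, 33, 20),
                    losses = c(32, 41, 49, 62))

# Export to CSV (a temporary file is used here instead of a real path)
path <- tempfile(fileext = ".csv")
write.csv(teams, path, row.names = FALSE)

# Import the file back into R
teams_in <- read.csv(path, stringsAsFactors = FALSE)

# Transform: add a winning-percentage column
teams_in$win_pct <- teams_in$wins / (teams_in$wins + teams_in$losses)

# Select: keep teams with a winning record, and only two columns
winning <- subset(teams_in, win_pct > 0.5, select = c(team, win_pct))
winning
```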

Programming and Control Flow
if-else statements, using switch, loops, functions in R
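
The constructs above can be combined in a short example. The function names and the 3/1/0 points scheme are assumptions chosen for illustration.

```r
# Minimal sketch: if-else, switch, a for loop and user-defined functions.

# if-else: classify a game by its score margin
classify_margin <- function(margin) {
  if (margin > 0) {
    "win"
  } else if (margin < 0) {
    "loss"
  } else {
    "tie"
  }
}

# switch: map a result to league points (hypothetical 3/1/0 scheme);
# the final unnamed argument is the default branch
points_for_result <- function(result) {
  switch(result,
         win  = 3,
         tie  = 1,
         loss = 0,
         stop("unknown result: ", result))
}

# for loop: accumulate points over a sequence of games
margins <- c(5, -3, 0, 12)
total <- 0
for (m in margins) {
  total <- total + points_for_result(classify_margin(m))
}
total
```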

All R code used in Singh and Allen is available for download:
R CODES (from Singh and Allen): Data Objects, Data Handling, Programming, binomial model included

Singh A.K. and D.E. Allen (2017) R in Finance and Economics: A Beginner's Guide, World Scientific Publishing, Ch. 1, 2, 3, 4
Heiss F. (2016) Using R for Introductory Econometrics, http://urfie.net/read/mobile/index.html#p=4
Yihui Xie, Dynamic Documents with R and knitr, Chapman and Hall

EXERCISE 1 Write R code that answers all the ToDo points in Torfs P. and C. Brauer (2014) “A (very short) introduction to R”
EXERCISE 2 An introduction to Data Handling, SOLUTION

Section 3: Graphical and Descriptive Analysis of Sport Statistics (NBA data)
link to the recorded session 2020: https://eu-lti.bbcollab.com/recording/8a25c6bb1b41437da7678cbc2a47c5cd

Graphical Analysis
Correlation Analysis
QQ plots and Histogram
Subsetting data and TS plots
Introduction to model building and Simulation

The NBA database: download and import into R. teamsoverall2020.csv, NBAteams.zip (new, smaller file for faster download)
R CODES: code. Please note that you need to create Teams_overall2020.csv to run the code
EXERCISE 3:  text, code
link to part 1 of the recorded exercise discussion https://eu-lti.bbcollab.com/recording/9ee78218570f4cf79b199c8997f24ee3
link to part 2 of the recorded exercise discussion https://eu-lti.bbcollab.com/recording/9ac1e2afc9644375951ceac2b60faeec

Section 4: The Linear Regression Model 
SLIDES 1 link to recorded lecture: https://eu-lti.bbcollab.com/recording/222f72cc0a09482fb87c1663ba5483a9
link to recorded lecture part 1: https://eu-lti.bbcollab.com/recording/ae33d07795744151b2f423d4068f989a
link to recorded lecture part 2: https://eu-lti.bbcollab.com/recording/f5b2e2b6ea634665b33049447d8cb111
Models for Experimental and non-Experimental Data
Models as outcomes of reduction processes
Model Estimation: the OLS and its properties 
Interpreting Regression Results: Statistical Significance and Relevance 
The Effects of Model Misspecification 
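
As a small illustration of OLS estimation, the sketch below fits a linear regression on simulated data with lm() and reproduces the coefficients from the closed-form expression (X'X)^(-1) X'y. The data-generating process (intercept 1, slope 2) is an assumption made for the example.

```r
# Minimal sketch: OLS via lm() and via the closed-form estimator (simulated data).
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)     # assumed true intercept 1 and slope 2

# Estimation with the built-in linear model function
fit <- lm(y ~ x)

# The same estimates from the normal equations: solve (X'X) b = X'y
X <- cbind(1, x)              # design matrix with a column of ones
beta_hat <- solve(t(X) %*% X, t(X) %*% y)

# Coefficients, standard errors, t statistics and p-values
summary(fit)$coefficients
```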

EXERCISE 4: The Four Factor Model, NOTES, solution
Link to the virtual class https://eu-lti.bbcollab.com/recording/8a302141684d4d188c4f123966eb5f4c

Winston W.L.(2009) Mathletics, Princeton University Press, Chapter 28 

Section 5: Using Models to Weight NBA Statistics  
link to recorded lecture PART 1: https://eu-lti.bbcollab.com/recording/ee5f33b8368e4ee08f15e88207eedbcf
link to recorded lecture PART 2: https://eu-lti.bbcollab.com/recording/637c0540c9df42b7a5224160aa47ccfa
link to recorded lecture on player evaluation: https://eu-lti.bbcollab.com/recording/d24e358ced93401d8c65f12a96e357c3

Weighting Statistics to measure performance
Correlation analysis
The NBA Efficiency Measure
Using a Model based on Possession
Offensive Efficiency and   Defensive Efficiency
Modelling Wins
Evaluating Statistics by Simulation: Monte-Carlo and Bootstrap methods
Completing the Model
Evaluating Players' Efficiency: WINS, assists and WINS48    
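
The Monte-Carlo and bootstrap methods listed above can be sketched on a toy model of wins. Here a team's winning percentage is assumed to depend linearly on an efficiency differential; the coefficients, sample size and variable names (eff_diff, win_pct) are invented for illustration and are not the course's model.

```r
# Minimal sketch: bootstrap and Monte-Carlo evaluation of a regression slope.
set.seed(123)
n <- 30                                       # hypothetical number of teams
eff_diff <- rnorm(n, mean = 0, sd = 4)        # offensive minus defensive efficiency
win_pct  <- 0.5 + 0.03 * eff_diff + rnorm(n, mean = 0, sd = 0.05)
teams    <- data.frame(eff_diff, win_pct)

# Statistic of interest: slope of win_pct on eff_diff
slope <- function(d) coef(lm(win_pct ~ eff_diff, data = d))[["eff_diff"]]

# Bootstrap: resample the observed teams with replacement
B <- 1000
boot_slopes <- replicate(B, slope(teams[sample(n, replace = TRUE), ]))
boot_se <- sd(boot_slopes)
boot_ci <- quantile(boot_slopes, c(0.025, 0.975))

# Monte-Carlo: redraw data from the assumed model instead of resampling
simulate_once <- function() {
  x <- rnorm(n, mean = 0, sd = 4)
  y <- 0.5 + 0.03 * x + rnorm(n, mean = 0, sd = 0.05)
  coef(lm(y ~ x))[["x"]]
}
mc_slopes <- replicate(500, simulate_once())

c(estimate = slope(teams), boot_se = boot_se, mc_sd = sd(mc_slopes))
```

The bootstrap needs only the observed sample, whereas the Monte-Carlo experiment requires a fully specified data-generating process.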

R CODES: team_stat, players_stat, data on players, NOTES
EXERCISE 6: text, notes, SOLUTION
link to the recorded lecture discussing exercise 5-6 : https://eu-lti.bbcollab.com/recording/de664fe34f454bdd86545151f7b67ce2

Berri D.J.,M.B.Schmidt and S. Brook(2006), The Wages of Wins, Stanford University Press, Ch 6,7  

SPORTBUSINESS, https://www.sportbusiness.com/
PRESENTATION by Mark Nervegna (Head of Research and Analytics, SportBusiness, www.sportbusiness.com) 

Section 1
Introduction to SportBusiness, to give students an idea of who the firm works with and how clients use its data and services. In this first part Mark Nervegna will run through his role as Head of Research and Analytics and explain how the team collects, validates, and analyses the data it gathers each day and produces reports on it, highlighting the challenges of searching for and validating hard-to-find information.

Focus on:

  • Sponsorship Analysis
  • Media Rights Analysis
  • Ad-hoc Consulting Projects
  • Fan Analysis
  • Soccer Product

Section 2
Follow-up on the Soccer Product, showcasing the in-depth analysis conducted on the Soccer Sponsorship Data. The presentation will focus on the sources of information and on the Mixed Regression Model, coded in R, used to predict the sponsorship values within the rights holders' portfolios that are not made available to the market, with showcase examples from the SB exclusive platform.



Last updated 23/03/2021