Course, academic year 2023-2024

20570 - DATA ANALYTICS AND VISUALIZATION

Department of Decision Sciences

Course taught in English
EMIT (8 credits - II sem. - compulsory  |  SECS-S/01)
Course Director:
RAFFAELLA PICCARRETA

Classes: 27 (II sem.) - 28 (II sem.)
Instructors:
Class 27: FILIPPO TRENTINI, Class 28: RAFFAELLA PICCARRETA


Suggested background knowledge

For an effective learning experience, students are strongly recommended to have basic notions of statistics (as taught at undergraduate level at Bocconi), in particular of univariate and bivariate descriptive statistics (graphs, in particular boxplots; summary measures, in particular mean, variance, covariance and correlation) and of basic inference (in particular hypothesis testing and p-values). A basic knowledge of the software R (RStudio) is also expected (at least R objects, i.e. vectors, matrices, data frames and lists, and basic functions). The preparatory course 20354 provides material to help students get up to speed, and includes online tests to verify their knowledge and understanding of the concepts used during the course. Online meetings are organized in September and at the end of January/beginning of February.

Mission & Content Summary

MISSION

Modern graduates need to use data to a much greater extent than their predecessors. Data management (retrieving, filtering, or cleaning), exploratory data analysis, and appropriate data visualization are becoming more and more relevant in every field. In this course, students are introduced to the problems related to extracting information from data collected on a large number of variables and cases, and gain an applied understanding of the most relevant techniques of data analytics, with specific reference to unsupervised learning. The key goal of the course is to illustrate methods for analysing and summarizing the most salient features of datasets with respect to both the variables and the cases. The course features hands-on classes, where the application of each technique is discussed with reference to real datasets.

CONTENT SUMMARY

Data analytics is a broad term that covers the activities involved in analysing data to draw meaningful and actionable insights. It involves a number of steps and procedures, including:

•        Data manipulation and analysis, aimed at discovering the salient patterns in data

•        Visualisation (i.e. effective presentation) of results, and their interpretation and communication to stakeholders, in order to drive business strategy and outcomes.

 

The course introduces exploratory techniques to efficiently analyse, summarize and visualize (relatively) large datasets. The goal is to reduce the dimension of the data while preserving information about their most salient/distinctive features. Such simplification applies both to variables and to cases.

 

The course is articulated as follows:

·        Introduction to multivariate data

The first part of the course introduces summaries of data collected on many variables, by extending measures of central tendency and dispersion to the multivariate case.

·        Dimensionality reduction techniques

We will introduce Principal Component Analysis and Factor Analysis, two techniques aimed at discovering low-dimensional indicators/summaries that capture the structure underlying the (possibly high-dimensional) input data.

·        Clustering techniques

The last part of the course introduces techniques to group cases based on their similarities or differences.

 

In addition to traditional lectures, the course features hands-on classes, where the statistical software R, and in particular the integrated development environment (IDE) RStudio, is used to apply the techniques covered (Principal Component Analysis, Factor Analysis, and Cluster Analysis) to real data, and to properly interpret and present the results via suitable visualisation tools.


Intended Learning Outcomes (ILO)

KNOWLEDGE AND UNDERSTANDING

At the end of the course students will be able to...

·        Identify the technique most suitable for summarizing the relevant information in a dataset with reference to a specific goal of analysis.

·        Recognize appropriate and inappropriate applications and approaches with reference to a specific goal of analysis.

·        Justify the adoption of a specific path of analysis and of the choices made during the analysis.

·        Compare the results obtained using different approaches, and evaluate the stability of results.

·        Write R scripts to analyse data.

APPLYING KNOWLEDGE AND UNDERSTANDING

At the end of the course students will be able to...

·        Design/develop scripts in the R programming language to read, manipulate, analyse and visualise data.

·        Interpret and critically analyse results, emphasizing the most relevant conclusions both from a technical and from an interpretative point of view.

·        Effectively present the output, using suitable visualization tools allowing an immediate and unbiased understanding of the most salient features in data.


Teaching methods

  • Face-to-face lectures
  • Exercises (exercises, database, software etc.)
  • Group assignments

DETAILS

During the course, there will be 3 blocks of hands-on classes, one for each of the three techniques taught during the course. For each technique, teams of students will work on an assignment concerning a substantive problem using data analysis.

Such assignments aim at assessing the ability to design a workflow to analyse data using the software R, as well as the ability to draw substantive conclusions based on the software output.

During each hands-on class, teams will answer the specific questions presented in class by writing a memorandum, to be uploaded on Bboard by the end of the class.

Each block of hands-on classes will be followed by a session in which individual tests are administered, with questions on the theoretical aspects of the technique considered, on the results obtained during the hands-on classes, and on the choices made in developing the analysis presented by the student's team.

Students who actively participate in the group work and take the individual tests can take the exam as attending students (see the section on assessment methods for details and rules).


Assessment methods

  • Written individual exam (traditional/online): continuous assessment and general exam
  • Individual assignment (report, exercise, presentation, project work etc.): general exam
  • Group assignment (report, exercise, presentation, project work etc.): continuous assessment
  • Peer evaluation: continuous assessment

ATTENDING STUDENTS

Effective class participation includes attendance, preparation, making an active and constructive contribution to the class discussion, asking questions, making constructive comments, and having a positive attitude toward learning.

To be considered attending, students must participate in the activities described below.

  • During the course, there will be 3 blocks of hands-on classes, one for each of the three techniques taught during the course. For each technique, teams of students will work on an assignment concerning a substantive problem using data analysis.

Such assignments aim at assessing the ability to design a workflow to analyse data using the software R, as well as the ability to draw substantive conclusions based on the software output.

Students must be able and ready to contribute to their team's assignment, both with respect to the R commands needed to perform the required analyses and with respect to their knowledge of the technique, so as to contribute both to defining the path of analysis and to interpreting and critically evaluating the results obtained. During each hands-on class, teams will answer the specific questions presented in class by writing a memorandum, to be uploaded on Bboard by the end of the class.

  • Each block of hands-on classes will be followed by a session in which individual tests are administered, with questions on the theoretical aspects of the technique considered, on the results obtained during the hands-on classes, and on the choices made in developing the analysis presented by the student's team.

Such tests aim at assessing knowledge of the techniques introduced in the course, also with respect to the output obtained.

 

To measure the acquisition of the learning outcomes, the students’ assessment is based on three main components:

  • The team assignment counts for 20% of the final grade (6 points overall, 2 points for each block of hands-on classes). Students should be aware that a peer review process will be in place, and that critical situations reported by peers may lead to a substantial reduction of the final grade.
  • The individual tests taken during the course count for 30% of the final grade (9 points overall, 3 points for each individual test).
  • A final exam at the end of the course (denoted as S, for "scritto", i.e. written, on the Bocconi website), counting for 50% of the final grade (15 points overall), consists of an in-class (lab) computer assignment.

Students will use their own laptops to analyse a dataset using the techniques illustrated during the course, writing a script from scratch in R and preparing a short report on their analysis that also offers a substantive interpretation of the results obtained.

The exam aims at assessing the individual ability to apply the techniques illustrated during the course, to coherently design a workflow to analyse data using the software R, and to draw substantive conclusions on the data at hand based on the software output.

 

Below are the dates and times scheduled for the three hands-on classes and the individual test of each of the three modules:

PCA: 5/3/2024 (10.15),   6/3/2024 (8.30),   7/3/2024 (16.30),    8/3/2024 (14.45)

FA:    9/4/2024 (10.15), 10/4/2024 (8.30), 11/4/2024 (16.30), 12/4/2024 (14.45)

CA:    7/5/2024 (10.15),   8/5/2024 (8.30),   9/5/2024 (16.30), 10/5/2024 (12.00)

 

Important:

  • Students who skip more than one block of hands-on classes and more than one individual test cannot qualify as attending. Partial participation in team work will imply a proportional reduction of the grade on the team assignment.
  • Students who skip one or more hands-on classes will have their grade on the group work proportionally reduced.
  • Students who skip an individual test will not be allowed to retake it.
  • Students who sign in as attending but are not present in class will, in addition to the consequences stated in the Honour Code, not be allowed to take the exam as attending students.
  • There is no midterm exam.
  • To be admitted to the final exam, students must register for it. No exceptions will be made to this rule.
  • Students from past years who already sat the final (practical) exam and/or who participated in the team assignments in past years cannot qualify as attending. This is in line with the rules stated in the syllabi of past years. The same rule will apply to students enrolled in the current academic year.

NOT ATTENDING STUDENTS

Non-attending students can take a final exam at the end of the course. This exam consists of a practical exam, similar to the final exam for attending students, counting for 70% of the final grade (21 points), and a theoretical exam counting for 30% of the final grade (9 points).

 

Important:

  • There is no midterm exam.
  • To be admitted to the final exam, students must register for it. No exceptions will be made to this rule.
  • Students from past years who already sat the final (practical) exam and/or who participated in the team assignments in past years must take the exam as non-attending students. This is in line with the rules stated in the syllabi of past years, and is consistent with the structure of the exam in past years.

 


Teaching materials


ATTENDING AND NOT ATTENDING STUDENTS

To be defined.

Last change 12/12/2023 14:44