20570 - DATA ANALYTICS AND VISUALIZATION
Course taught in English
Class group: 22
Classes are taught in person.
For an effective learning experience, basic notions of statistics are strongly recommended, in particular univariate and bivariate descriptive statistics and the most relevant inferential concepts (samples, statistics, estimators, hypothesis testing, p-values). To this end, an online preparatory course (20354) is available, including online tests to check students' knowledge and understanding of the concepts used during the course. Students are also expected to have basic skills in Excel and Word.
Modern graduates need to use data to a much greater extent than their predecessors did. Data management (retrieving, filtering, or cleaning data), exploratory data analysis, and appropriate data visualization are increasingly relevant in every field. In this course, students are introduced to big datasets and gain an applied understanding of the most relevant techniques of multivariate data analysis, with specific reference to unsupervised learning. The key goal of the course is to illustrate methods for analyzing and summarizing the most salient features of large datasets with respect to both the variables and the cases. The course features hands-on classes, where the application of each technique is discussed with reference to real datasets.
The course covers the following topics:
- Description and summary of multivariate samples.
- Variable reduction: optimal indicators and Principal Component Analysis.
- Variable reduction: latent concepts and Factor Analysis.
- Case reduction: finding groups in data and Cluster Analysis.
- Summarizing association using Simple Correspondence Analysis.
At the end of the course, students will be able to:
- Identify the technique most suitable for summarizing the relevant information in a dataset, given a specific goal of analysis.
- Recognize appropriate and inappropriate applications and approaches for a specific goal of analysis.
- Justify the adoption of a specific path of analysis and the choices made during the analysis.
- Compare the results obtained using different approaches and evaluate their stability.
- Prepare data for the analysis.
- Analyze data using statistical software.
- Interpret and critically analyze results, emphasizing the most relevant conclusions from both a technical and an interpretative point of view.
- Effectively present the output, using suitable visualization tools that allow an immediate and unbiased understanding of the most salient features of the data.
Teaching methods:
- Face-to-face lectures
- Exercises (exercises, databases, software, etc.)
- Interactive class activities (role playing, business games, simulations, online forums, instant polls)
The course combines different types of teaching methods:
- Face-to-face classes introducing the most relevant theoretical concepts relative to each technique.
- Face-to-face classes discussing the appropriate application of each technique to a specific problem and dataset. The choices left to the analyst and the available methods are presented, and criteria for evaluating and comparing results are discussed.
- Exercises. Hands-on classes (in the lab), where students are guided through the analysis of a real dataset with reference to a specific goal of analysis. In particular, attention is focused on:
  - Proper definition of a suitable path of analysis.
  - Development, from scratch, of a program that analyzes the data using standard functions and macro functions created for the course.
  - Discussion of results and comparison of alternative strategies.
  - Criteria for choosing among the available solutions.
  - Visualization and interpretation of results.
- Discussion (interactive class activities). In these lessons, students are divided into small groups. A problem is described with reference to a dataset, and the output obtained under different choices and alternative approaches is made available. The groups discuss the path of analysis they would follow, and the alternative views and proposals of the different groups are then contrasted and discussed.
For attending students, the exam has the same structure as the one described for non-attending students. However, attending students can split the theoretical exam into two partial exams. Students qualify as attending if:
- They attend at least 4 of the labs dedicated to principal components and factor analysis.
- They attend at least 4 of the labs dedicated to cluster analysis and simple correspondence analysis.
- They attend at least one lab for each technique.
For a detailed description of the practical and theoretical exams, please refer to the exam description for non-attending students.
The exam consists of two parts:
- A theoretical exam (40%).
- A practical exam (60%).
- The practical exam (denoted as S, for "scritto", on the Bocconi website) is an in-class computer assignment held in the lab, and it is the same for attending and non-attending students. It consists of the analysis of a dataset using the techniques illustrated during the course and the interpretation of the results obtained. Students must write their own computer programs from scratch. The exam is closed-book and closed-notes, but students can consult documents illustrating the commands needed to obtain the different outputs.
- The general theoretical exam (denoted as O, for "orale", on the Bocconi website) is a written exam with open questions. It assesses knowledge of the techniques introduced in the course, including the quantities that can be obtained using software. The exam includes some “paper-and-pencil” derivation questions, as well as questions about results obtained by applying the illustrated techniques to specific datasets. The latter questions do not test knowledge of the software, but they do require an understanding of typical output. The exam is closed-book and closed-notes, but students need a simple calculator to answer some questions.
- Attending students can take two partial exams (20% each, denoted as I on the Bocconi website) instead of the general theoretical exam.
- The final grade is obtained by combining the grades of the different parts. Additional (decimal) points are assigned to students who took the final test of the preparatory course 20354 (Data Analysis) in one of the two dedicated time windows (around the first week of September, or about one week before the beginning of the course). Students whose final result is not an integer receive extra decimals (at most 0.5, depending on performance). In any case, the final grade obtained in this way is rounded up to the next integer only when its decimal part is higher than 0.7.
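The grade arithmetic above can be made concrete with a small sketch. It uses only the weights (40% and 60%, or 20% per partial), the 0.5 cap on extra decimals and the 0.7 threshold stated in this description; everything else is an assumption. The function `final_grade` and its parameters are hypothetical, the bonus is assumed to be simply added to the weighted sum, a decimal part of 0.7 or less is assumed to be dropped, and the grading scale is not specified here, so the numbers are purely illustrative. Exact fractions are used so that a decimal part of exactly 0.7 is not misjudged by floating-point error.

```python
from fractions import Fraction
import math

# Weights and thresholds stated in the course description.
THEORY_WEIGHT = Fraction(40, 100)      # general theoretical exam
PRACTICAL_WEIGHT = Fraction(60, 100)   # practical (lab) exam
PARTIAL_WEIGHT = Fraction(20, 100)     # each of the two partial exams
MAX_BONUS = Fraction(1, 2)             # extra decimals: at most 0.5
ROUND_UP_THRESHOLD = Fraction(7, 10)   # round up only above 0.7


def _exact(x):
    """Convert an int, float or string such as 0.3 to an exact Fraction."""
    return Fraction(str(x))


def final_grade(practical, theory=None, partials=None, bonus=0):
    """Hypothetical sketch of how the exam parts might be combined.

    Pass exactly one of `theory` (general theoretical exam) or
    `partials` (a pair of partial-exam grades).
    """
    if (theory is None) == (partials is None):
        raise ValueError("pass exactly one of theory or partials")
    if partials is not None:
        first, second = (_exact(p) for p in partials)
        theory_part = PARTIAL_WEIGHT * first + PARTIAL_WEIGHT * second
    else:
        theory_part = THEORY_WEIGHT * _exact(theory)
    raw = theory_part + PRACTICAL_WEIGHT * _exact(practical)
    # Assumption: the bonus is added to the weighted sum, capped at 0.5.
    raw += min(_exact(bonus), MAX_BONUS)
    whole = math.floor(raw)
    # Round up only when the decimal part exceeds 0.7;
    # assumption: otherwise the decimal part is simply dropped.
    return whole + 1 if raw - whole > ROUND_UP_THRESHOLD else whole
```

For example, a theoretical grade of 26 and a practical grade of 25 give 0.4 × 26 + 0.6 × 25 = 25.4, which stays 25 under the assumed truncation. A bonus of 0.3 gives exactly 25.7, which is still not above the threshold, while a bonus of 0.4 gives 25.8, which is rounded up to 26.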
Slides of the theoretical lessons are uploaded to Bboard. These slides are complete and cover the whole program. For a more detailed discussion of the topics, students can refer to:
- R.A. JOHNSON, D.W. WICHERN, Applied Multivariate Statistical Analysis, Prentice Hall, 2002, 5th ed. or subsequent editions, or
- J. LATTIN, J.D. CARROLL, P.E. GREEN, Analyzing Multivariate Data, Thomson, 2003.