20570 - DATA ANALYTICS AND VISUALIZATION
Course taught in English
Class lessons are delivered in blended format (partly online and partly in person)
For an effective learning experience, it is strongly recommended to have basic notions of statistics, in particular of univariate and bivariate descriptive statistics and of the most relevant inferential concepts (samples, statistics, estimators, hypothesis testing, p-values). To this end, an online preparatory course (20354) is available, including online tests to verify the level of knowledge and understanding of the concepts used during the course. Students are expected to be able to work with Excel and Word (basic skills).
Modern graduates need to use data to a much greater extent than their predecessors did. Data management (retrieving, filtering, or cleaning), exploratory data analysis, and appropriate data visualization are becoming more and more relevant in any field. In this course, students are introduced to big datasets and gain an applied understanding of the most relevant techniques of multivariate data analysis, with specific reference to unsupervised learning. The key goal of the course is to illustrate methods useful to analyze and summarize the most salient features of large datasets with respect to both the variables and the cases. The course features hands-on classes, where the application of each technique is discussed with reference to real datasets.
- Description and summary of multivariate samples.
- Variable reduction. Optimal indicators and Principal Components Analysis.
- Variable reduction. Latent concepts and Factor Analysis.
- Case reduction. Finding groups in data and Cluster Analysis.
- Identify the technique most suitable to simplify relevant information in a dataset with reference to a specific goal of analysis.
- Recognize appropriate and inappropriate applications and approaches with reference to a specific goal of analysis.
- Justify the adoption of a specific path of analysis and the choices made during the analysis.
- Compare the results obtained using different approaches and evaluate the stability of results.
- Write R scripts to analyze data.
- Design and develop a script in the R programming language that reads, manipulates, analyzes, and visualizes data.
- Interpret and critically analyze results, emphasizing the most relevant conclusions both from a technical and from an interpretative point of view.
- Effectively present the output, using suitable visualization tools allowing an immediate and unbiased understanding of the most salient features in data.
- Face-to-face lectures
- Online lectures
- Exercises (exercises, database, software etc.)
- Group assignments
- Interactive class activities (role playing, business game, simulation, online forum, instant polls)
The course combines different teaching methods:
- theory. Online lessons introducing the most relevant theoretical concepts relative to each technique.
- theory&app. Online lessons illustrating and discussing the appropriate application of the technique with reference to a specific problem and set of data. The choices left to the analyst and the possible available methods are presented. Criteria to evaluate and compare results are discussed.
- R-labs. In-class lessons illustrating and discussing the scripts in the R programming language employed to obtain the results discussed during the theory&app lessons.
- Hands-on. In-class assignments in which students, divided into groups, are required to analyze a set of data. In particular, attention is focused on:
  - Proper definition of a suitable path of analysis.
  - Development from scratch of a program allowing the analysis of data using standard functions and macro functions created for the course.
  - Discussion of results and comparison of alternative strategies.
  - Criteria to choose one out of the available solutions.
  - Visualization, interpretation and discussion of results.
- Discussion. Interactive class activities where students discuss the analysis developed during the hands-on classes. Instant polls and tests are used to identify alternative views and choices, which are then contrasted and discussed.
To be considered as attending, students must actively participate in the hands-on and discussion classes. Specifically, students must contribute to their group's assignment and must participate in the instant polls and tests run during the discussion classes.
In-class participation, the quality of the discussion, and the considerations supporting the answers given in tests and polls will be evaluated and will count for 9 points (30%) of the final grade.
In addition, attending students will be evaluated through a theoretical exam (counting for 12 points, 40% of the final grade) and a practical exam (counting for 9 points, 30% of the final grade). For a description of these exams, please refer to the assessment methods for non-attending students.
For non-attending students, the exam consists of two parts:
- A theoretical exam (40%).
- A practical exam (60%).
- The practical exam (denoted as S - scritto - on the Bocconi website) is an in-class (lab) computer assignment. It consists of the analysis of a dataset using the techniques illustrated during the course, and of the interpretation of the results obtained. Students must write their own computer programs from scratch. The exam is closed-book and closed-notes, but students can consult documents illustrating the commands needed to obtain different outputs.
- The theoretical exam (denoted as O - orale - on the Bocconi website) is an online exam with open-answer and closed-answer questions. It aims at assessing knowledge of the techniques introduced in the course, also with respect to the quantities that can be obtained using software. The exam includes some “paper-and-pencil” derivation questions, as well as questions about results obtained by applying the illustrated techniques to specific datasets. The latter questions do not test knowledge of the software, but do require an understanding of typical output. The exam is closed-book and closed-notes, but students need to use a simple calculator to answer some questions.
For both attending and non-attending students, the theoretical exam can be split into two partial exams. The first partial will cover Principal Components Analysis (and will count for 4 points). The second partial will cover Factor Analysis and Cluster Analysis.
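As a quick check of the grade composition described above, the following minimal sketch converts the stated percentage weights into points. It is written in Python purely for illustration (the course itself uses R), and it assumes a 30-point scale, as implied by "9 points (30%)". The point values for non-attending students are derived from the stated percentages under that assumption rather than quoted from the syllabus; all variable and function names are hypothetical.

```python
# Final grade scale, assumed from "9 points (30%)" in the assessment rules.
MAX_GRADE = 30

# Percentage weights as stated for each group of students.
ATTENDING = {"participation": 30, "theoretical": 40, "practical": 30}
NON_ATTENDING = {"theoretical": 40, "practical": 60}


def points(weights, max_grade=MAX_GRADE):
    """Convert percentage weights into points out of max_grade."""
    return {part: pct * max_grade / 100 for part, pct in weights.items()}


attending_points = points(ATTENDING)          # participation 9, theoretical 12, practical 9
non_attending_points = points(NON_ATTENDING)  # derived: theoretical 12, practical 18

if __name__ == "__main__":
    print("Attending:", attending_points)
    print("Non-attending:", non_attending_points)
```

In both cases the components add up to the full 30 points, and the attending figures agree with the 9, 12 and 9 points stated above.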
Slides of the theoretical lessons are uploaded to Bboard. These notes are complete and cover the whole program. For more detailed discussions of the topics, students can refer to either of the following:
- R.A. JOHNSON, D.W. WICHERN, Applied Multivariate Statistical Analysis, Prentice Hall, 2002, 5th ed. or subsequent editions.
- J. LATTIN, J.D. CARROLL, P.E. GREEN, Analyzing Multivariate Data, Thomson, 2003.