20150 - STATISTICS FOR ECONOMICS AND BUSINESS
EMIT
Department of Decision Sciences
Course taught in English
RAFFAELLA PICCARRETA
Course Objectives
The key aim of this course is to provide students with basic skills in multivariate data analysis. In particular, students learn techniques (e.g. cluster and factor analysis) for analyzing and summarizing data sets that are rich in both the number of variables and the number of observations. All methods are taught through hands-on classes, during which students analyze a number of databases relevant to their studies (e.g. R&D data, patent data, investment data).
Course Content Summary
Introduction: Data matrices
- Matrix algebra: basic notions.
- Multivariate random variables and multivariate samples. Summary statistics for multivariate samples. Geometric interpretation of data matrices. Space of the cases and distances. Total and generalized variance and their geometric interpretation.
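As a small illustration of the last topic, the sketch below computes the sample covariance matrix of a two-variable data set, then its total variance (the trace) and its generalized variance (the determinant). The data values and function names are illustrative assumptions, not course material, and the determinant helper only handles the two-variable case.

```python
# Illustrative data only: 4 cases (rows) measured on 2 variables (columns).
data = [
    [2.0, 1.0],
    [4.0, 3.0],
    [6.0, 2.0],
    [8.0, 6.0],
]


def covariance_matrix(rows):
    """Sample covariance matrix S (divisor n - 1) of an n x p data matrix."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    return [
        [
            sum((row[j] - means[j]) * (row[k] - means[k]) for row in rows) / (n - 1)
            for k in range(p)
        ]
        for j in range(p)
    ]


def total_variance(S):
    """Trace of S: the sum of the individual variances."""
    return sum(S[j][j] for j in range(len(S)))


def generalized_variance_2x2(S):
    """Determinant of a 2 x 2 covariance matrix."""
    return S[0][0] * S[1][1] - S[0][1] * S[1][0]


S = covariance_matrix(data)
print("total variance:", total_variance(S))
print("generalized variance:", generalized_variance_2x2(S))
```

The total variance ignores the covariances, while the generalized variance shrinks toward zero as the two variables become more strongly correlated.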
Factorial Techniques:
- Principal component (PC) analysis. PC transformation. Properties of PCs and their interpretation. Evaluation of results. Sample PCs.
- Factor analysis. The factor model: definition and assumptions. Parameter estimates: the principal component and the principal factor methods. Interpretation of factors: factor rotation. Factor scores.
- Simple correspondence analysis. Association between categorical variables. Profiles and Chi-square metric. Factors and their interpretation. Graphical representation and analysis of results.
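To give a flavor of the principal component transformation listed above, the following sketch extracts the two principal components of a 2 x 2 covariance matrix in closed form. The matrix values and the function name are hypothetical, and the code assumes a non-zero covariance. In practice a statistical package would be used instead.

```python
import math

# Hypothetical 2 x 2 sample covariance matrix (illustrative values only).
S = [[4.0, 2.0],
     [2.0, 3.0]]


def principal_components_2x2(S):
    """Eigenvalues (descending) and unit eigenvectors of a symmetric 2 x 2 matrix.

    Assumes the off-diagonal element is non-zero.
    """
    a, b, c = S[0][0], S[0][1], S[1][1]
    if b == 0:
        raise ValueError("this sketch assumes a non-zero covariance")
    trace, det = a + c, a * c - b * b
    root = math.sqrt(trace * trace - 4.0 * det)
    eigenvalues = [(trace + root) / 2.0, (trace - root) / 2.0]
    eigenvectors = []
    for lam in eigenvalues:
        # For b != 0, the vector (b, lam - a) solves (S - lam * I) v = 0.
        v = (b, lam - a)
        norm = math.hypot(v[0], v[1])
        eigenvectors.append((v[0] / norm, v[1] / norm))
    return eigenvalues, eigenvectors


variances, loadings = principal_components_2x2(S)
share_first_pc = variances[0] / sum(variances)
print("PC variances:", variances)
print("PC loadings:", loadings)
print("share of total variance explained by the first PC:", share_first_pc)
```

The PC variances add up to the total variance (the trace of S), which is why the ratio in the last line can be read as a proportion of explained variance.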
Dissimilarity matrices and clustering:
- Cluster analysis. Distance and dissimilarity matrices. Hierarchical and partitioning clustering methods. Choice of the number of clusters. Criteria for the evaluation of a partition.
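As a minimal sketch of two of these ideas, the code below builds a Euclidean distance matrix and compares two candidate partitions using the within-cluster sum of squares, one common criterion for evaluating a partition. The data points, cluster labels, and function names are illustrative assumptions only.

```python
import math

# Illustrative data only: 4 cases measured on 2 variables.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]


def distance_matrix(pts):
    """Symmetric matrix of pairwise Euclidean distances between cases."""
    return [[math.dist(p, q) for q in pts] for p in pts]


def within_cluster_ss(pts, labels):
    """Sum over clusters of squared distances from each case to its centroid."""
    total = 0.0
    for cluster in set(labels):
        members = [p for p, lab in zip(pts, labels) if lab == cluster]
        centroid = [sum(coord) / len(members) for coord in zip(*members)]
        total += sum(math.dist(p, centroid) ** 2 for p in members)
    return total


D = distance_matrix(points)
compact = within_cluster_ss(points, [0, 0, 1, 1])   # nearby cases grouped together
stretched = within_cluster_ss(points, [0, 1, 1, 1])  # one cluster spans distant cases
print("distance between case 1 and case 3:", D[0][2])
print("within-cluster SS, compact partition:", compact)
print("within-cluster SS, stretched partition:", stretched)
```

With the same number of clusters, the partition with the smaller within-cluster sum of squares groups the cases more tightly.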
Detailed Description of Assessment Methods
For non-attending students:
The final grade is based on:
- A practical exam: analysis of a real data set (PC-lab session).
- A theoretical exam (written exam concerning the methodological issues discussed during the course).
The practical and theoretical exams must be taken in the same session.
For attending students:
The exam is organized as follows. Four assignments are set during the course, one for each part (PCA, FA, CA, SCA). For each assignment, students receive a data set and a set of questions and produce a document with the output needed to answer them. The final grade is based on:
- Two partial written exams covering the homework assignments and the theory (methodological issues discussed during the course).
- A practical exam: analysis of a real data set (PC-lab session). This exam is much shorter than the one set for non-attending students.
A student is considered attending if he/she:
- attended at least 5 lab sessions in the first part of the course (labs dedicated to the introduction to SAS, principal components and factor analysis) and at least 4 lab sessions in the second part of the course (labs dedicated to cluster and simple correspondence analysis);
- sat both partial exams.
Textbooks
The course slides will be made available through the e-learning platform. For details and a more in-depth description of the techniques covered in the course, students can refer to:
- R.A. JOHNSON, D.W. WICHERN, Applied Multivariate Statistical Analysis, Prentice Hall, 2002, 5th ed.
or:
- J. LATTIN, J.D. CARROLL, P.E. GREEN, Analyzing Multivariate Data, Thomson, 2003.
Prerequisites
- Basic notions of statistics: univariate and bivariate descriptive statistics, and the most relevant inferential concepts (samples, statistics, estimators, hypothesis testing, p-values).
- Basic working knowledge of Excel and Word.
