20150 STATISTICS FOR ECONOMICS AND BUSINESS
EMIT
Course taught in English
Class group: 22
The key aim of this course is to provide students with basic skills in multivariate data analysis. In particular, students learn techniques and methods for analyzing and summarizing data sets that are rich in both the number of variables and the number of observations (e.g., cluster and factor analysis). All methods are taught through hands-on classes in which students analyze a number of databases relevant to their studies (e.g., R&D data, patent data, investment data).
Introduction: Data matrices

Matrix algebra: basic notions.

Multivariate random variables and multivariate samples. Summary statistics for multivariate samples. Geometric interpretation of data matrices. Space of the cases and distances. Total and generalized variance and their geometric interpretation.
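As a small illustration of these summary statistics, the sketch below computes the sample mean vector, the sample covariance matrix S, the total variance tr(S) and the generalized variance det(S) for a made-up data matrix (rows = cases, columns = variables). It is plain Python rather than the software used in the course labs. Geometrically, det(S) equals (n - 1)^(-p) times the squared volume of the parallelotope spanned by the p deviation vectors, so it shrinks towards zero when the variables are nearly collinear.

```python
# Minimal sketch with a hypothetical 4 x 2 data matrix (rows = cases, columns = variables).

def mean_vector(X):
    """Column means of the data matrix."""
    n = len(X)
    return [sum(row[j] for row in X) / n for j in range(len(X[0]))]

def covariance_matrix(X):
    """Unbiased sample covariance matrix S (divisor n - 1)."""
    n, p = len(X), len(X[0])
    m = mean_vector(X)
    return [[sum((row[i] - m[i]) * (row[j] - m[j]) for row in X) / (n - 1)
             for j in range(p)] for i in range(p)]

def determinant(A):
    """Determinant via Gaussian elimination with partial pivoting."""
    A = [row[:] for row in A]          # work on a copy
    n = len(A)
    det = 1.0
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(A[r][k]))
        if A[pivot][k] == 0:
            return 0.0
        if pivot != k:                 # a row swap flips the sign
            A[k], A[pivot] = A[pivot], A[k]
            det = -det
        det *= A[k][k]
        for r in range(k + 1, n):
            factor = A[r][k] / A[k][k]
            for c in range(k, n):
                A[r][c] -= factor * A[k][c]
    return det

X = [[2.0, 4.0], [4.0, 6.0], [6.0, 5.0], [8.0, 9.0]]
S = covariance_matrix(X)
total_variance = sum(S[i][i] for i in range(len(S)))   # tr(S)
generalized_variance = determinant(S)                   # det(S)
```

For this toy matrix, S = [[20/3, 14/3], [14/3, 14/3]], so the total variance is 34/3 and the generalized variance is 84/9.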
Factorial Techniques:

Principal component (PC) analysis. PC transformation. Properties of PCs and their interpretation. Evaluation of results. Sample PCs.
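The first sample PC is the unit-length eigenvector e1 of the covariance matrix S with the largest eigenvalue lambda1, and lambda1 / tr(S) is the share of total variance it explains. The sketch below finds that eigenpair by power iteration on a hypothetical 2 x 2 matrix; power iteration is chosen here only because it fits in a few lines of standard-library Python, whereas statistical packages normally compute a full eigendecomposition.

```python
import math

def leading_pc(S, iterations=1000, tol=1e-12):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix S (power iteration)."""
    p = len(S)
    v = [1.0] * p                      # starting vector; assumed not orthogonal to e1
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(S[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        # Rayleigh quotient w'Sw (w has unit length) estimates the eigenvalue
        new_eigenvalue = sum(w[i] * sum(S[i][j] * w[j] for j in range(p)) for i in range(p))
        if abs(new_eigenvalue - eigenvalue) < tol:
            return new_eigenvalue, w
        eigenvalue, v = new_eigenvalue, w
    return eigenvalue, v

S = [[4.0, 2.0], [2.0, 3.0]]           # hypothetical covariance matrix
lam1, e1 = leading_pc(S)
explained = lam1 / (S[0][0] + S[1][1])  # proportion of total variance tr(S)
```

Here lambda1 = (7 + sqrt(17)) / 2, roughly 5.56, so the first PC accounts for about 79% of the total variance; the PC scores would then be y = e1'(x - x_bar) for each case.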

Factor analysis. The factor model: definition and assumptions. Parameter estimation: the principal component and the principal factor methods. Interpretation of factors: factor rotation. Factor scores.
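With the principal component method and m factors, the loadings of factor j are sqrt(lambda_j) * e_j, the communality of variable i is the sum of its squared loadings, and its specific variance is psi_i = r_ii - h_i^2. The sketch below applies this with a single factor (m = 1) to a hypothetical 3 x 3 correlation matrix; rotation and factor scores are not shown.

```python
import math

def leading_eigenpair(A, iterations=500):
    """Power iteration: (largest eigenvalue, unit eigenvector) of a symmetric matrix."""
    p = len(A)
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(A[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(A[i][j] * v[j] for j in range(p)) for i in range(p))
    return eigenvalue, v

R = [[1.0, 0.6, 0.5],                  # hypothetical correlation matrix
     [0.6, 1.0, 0.4],
     [0.5, 0.4, 1.0]]

lam1, e1 = leading_eigenpair(R)
loadings = [math.sqrt(lam1) * x for x in e1]             # l_i1 = sqrt(lambda_1) * e_i1
communalities = [l ** 2 for l in loadings]               # h_i^2 with one factor
specific_variances = [R[i][i] - communalities[i] for i in range(len(R))]  # psi_i
```

Because e1 has unit length, the communalities sum to lambda1, i.e. the variance accounted for by the single factor.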

Simple correspondence analysis. Association between categorical variables. Profiles and the chi-square metric. Factors and their interpretation. Graphical representation and analysis of results.
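To make profiles and the chi-square metric concrete, the sketch below takes a made-up 2 x 3 contingency table, computes row profiles and column masses, and checks that the total inertia (Pearson chi-square divided by n) equals the mass-weighted sum of squared chi-square distances between each row profile and the average profile. The factors themselves (from a singular value decomposition) are not computed here.

```python
import math

# Hypothetical 2 x 3 contingency table of two categorical variables.
N = [[20, 10, 10],
     [10, 20, 30]]

n = sum(sum(row) for row in N)                        # grand total
row_tot = [sum(row) for row in N]
col_tot = [sum(row[j] for row in N) for j in range(len(N[0]))]
r = [t / n for t in row_tot]                          # row masses
c = [t / n for t in col_tot]                          # column masses = average row profile

# Row profiles: each row divided by its total
profiles = [[x / row_tot[i] for x in N[i]] for i in range(len(N))]

def chi2_distance(a, b, masses):
    """Chi-square distance: squared differences weighted by 1 / column mass."""
    return math.sqrt(sum((x - y) ** 2 / m for x, y, m in zip(a, b, masses)))

# Pearson chi-square statistic for independence
chi2 = sum((N[i][j] - row_tot[i] * col_tot[j] / n) ** 2 / (row_tot[i] * col_tot[j] / n)
           for i in range(len(N)) for j in range(len(N[0])))
total_inertia = chi2 / n

# Same quantity, from the profiles and the chi-square metric
inertia_from_profiles = sum(r[i] * chi2_distance(profiles[i], c, c) ** 2
                            for i in range(len(N)))
d_rows = chi2_distance(profiles[0], profiles[1], c)  # distance between the two row profiles
```

For this table chi2 = 475/36 and the total inertia is 475/3600, about 0.132, whichever way it is computed.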
Dissimilarity matrices and clustering:

Cluster analysis. Distance and dissimilarity matrices. Hierarchical and partitioning clustering methods. Choice of the number of clusters. Criteria for the evaluation of a partition.
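The sketch below illustrates a hierarchical (agglomerative) method on a made-up set of six points: starting from singletons, it repeatedly merges the two closest clusters under single linkage (min) or complete linkage (max) until k clusters remain, then reports the within-cluster sum of squares, which is one possible criterion for evaluating a partition. Euclidean distance and the function names are illustrative choices, not the course's software.

```python
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerative(points, k, linkage=min):
    """Agglomerative clustering stopped at k clusters.

    linkage=min gives single linkage, linkage=max gives complete linkage.
    Returns a list of clusters, each a list of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = linkage(euclidean(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]   # merge the closest pair (a < b)
        del clusters[b]
    return clusters

def within_ss(points, clusters):
    """Sum of squared Euclidean distances of points to their cluster centroid."""
    total = 0.0
    for members in clusters:
        centroid = [sum(points[i][d] for i in members) / len(members)
                    for d in range(len(points[0]))]
        total += sum(euclidean(points[i], centroid) ** 2 for i in members)
    return total

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
clusters = agglomerative(points, k=2)
wss = within_ss(points, clusters)
```

Both linkages recover the two obvious groups here. Comparing the within-cluster sum of squares across different values of k is one informal way to guide the choice of the number of clusters.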
For non-attending students:
The final grade is based on:

A practical exam: analysis of a real data set (PC lab session).

A theoretical exam (written exam concerning the methodological issues discussed during the course).
The practical and theoretical exams must be taken in the same exam session.
For attending students:
The exam is organized as follows. Four assignments will be given during the course, one for each part (principal component analysis, factor analysis, cluster analysis and simple correspondence analysis). Students will receive data and a set of questions and will produce a document containing the output needed to answer the questions. The final grade is based on:
- two partial exams (written) covering the assigned homework and the theory (the methodological issues discussed during the course);
- a practical analysis of a real data set (PC lab session). This exam is much shorter than the one taken by non-attending students.
A student is considered attending if:
- he/she attended at least 5 lab sessions in the first part of the course (labs devoted to the introduction to SAS, principal components and factor analysis) and at least 4 lab sessions in the second part (labs devoted to cluster and simple correspondence analysis);
- he/she sat both partial exams.
The course slides will be made available on the e-learning platform. For details and a more in-depth description of the techniques covered in the course, students can refer to either of the following texts:

R.A. JOHNSON, D.W. WICHERN, Applied Multivariate Statistical Analysis, Prentice Hall, 2002, 5th ed.
or:

J. LATTIN, J.D. CARROLL, P.E. GREEN, Analyzing Multivariate Data, Thomson, 2003.

Prerequisites: basic notions of statistics, including univariate and bivariate descriptive statistics and the main inferential concepts (samples, statistics, estimators, hypothesis testing, p-values).

Students are expected to have basic working skills in Excel and Word.