20564 - BIG DATA FOR BUSINESS DECISIONS
Course taught in English
Class group: 31
Good familiarity with statistics (hypothesis testing, linear modeling), a basic understanding of what a programming language is, and familiarity with the R programming language.
Today's world is full of interdisciplinary professionals, such as data scientists, who successfully combine different technical skills to extract powerful insights from data. The mission of this course is to teach students of business administration and related fields how to narrow the gap when interacting with more quantitative colleagues. The course provides basic knowledge of statistical modeling, data visualization, and computer programming, all of which are fundamental to developing impactful storytelling.
- Definition of Big Data.
- Parallel and distributed computing.
- Statistical modeling: from linear regression to clustering and classification.
- Model evaluation.
- Data visualization.
- Natural Language Processing.
- Artificial Neural Networks.
- Introduction to programming in R and Python.
- Statistical modeling in R and Python.
- Recognize business problems where Big Data can be applied.
- Recognize the main statistical models generally adopted to extract insights.
- Understand the complexity of analyzing textual data.
- Understand what an Artificial Neural Network is and what its main components are.
- Solve business problems by data-analytic thinking.
- Use several tools and techniques to practically implement solution methods.
- Use R to carry out simple statistical analyses and visualizations.
- Prepare and discuss a scientific report.
- Online lectures
- Guest speakers' talks (in class or remotely)
- Exercises (exercises, databases, software, etc.)
- Individual assignments
- Group assignments
- Speakers from both practice and academia are brought to class.
- Reviews of the programming lectures are given to students for home study.
- For attending students only, a practical group assignment is presented in class at the end of the course.
Continuous assessment | Partial exams | General exam | |
---|---|---|---|
x | |||
x | |||
x |
Attending students:
- One group assignment, worth 80% of the final grade, to be presented and discussed in class.
- One final multiple-choice written exam, worth 20% of the final grade.
Non-attending students:
- One individual assignment, worth 40% of the final grade, to be submitted to the instructor for review.
- One final multiple-choice written exam, worth 60% of the final grade.
Main source:
- Slides provided by the instructors.
- Papers will also be circulated by the instructors.
Additional sources:
- G. RUBERA, F. GROSSETTI (published by Egea BUP), Python for non-Pythonians: How to Win Over Programming Languages.
- J. SILGE, D. ROBINSON (published by O'Reilly), Text Mining with R: A Tidy Approach.
- G. GROLEMUND, H. WICKHAM (published by O'Reilly), R for Data Science.
- G. GROLEMUND (published by O'Reilly), Hands-On Programming with R: Write Your Own Functions and Simulations.
Advanced readings:
- T. HASTIE, R. TIBSHIRANI, J. FRIEDMAN, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition (available in PDF at https://web.stanford.edu/~hastie/ElemStatLearn/printings/ESLII_print12.pdf).
- F. CHOLLET (published by Manning Publications), Deep Learning with R.
- F. CHOLLET (published by Manning Publications), Deep Learning with Python.