Course 2023-2024 a.y.

20879 - LANGUAGE TECHNOLOGY

AI
Department of Computing Sciences

Course taught in English



AI (8 credits - II sem. - OB  |  ING-INF/05)
Course Director:
DIRK HOVY

Classes: 29 (II sem.)
Instructors:
Class 29: DIRK HOVY


Synchronous Blended: Lectures delivered synchronously in the classroom (max 1 hour per credit delivered synchronously online)

Suggested background knowledge

To feel comfortable in this course, you should have good knowledge of programming in Python, as well as basic linear algebra (what vectors and matrices are and how they are multiplied) and probability theory (what a probability distribution is, what conditional probability is). Additional knowledge of Python data structures (classes, Counter, defaultdict) makes many of the applications easier to solve.
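
As a rough self-check of the background listed above, the following minimal sketch touches each item: Counter and defaultdict for counting, a conditional probability estimated from counts, and a matrix-vector product. The toy sentence and the helper names (conditional_probability, mat_vec) are invented for illustration and are not taken from the course materials.

```python
from collections import Counter, defaultdict

# Toy corpus, made up for illustration only.
tokens = "the cat sat on the mat the cat ran".split()

# Counter: how often does each word occur?
unigram_counts = Counter(tokens)

# defaultdict of Counters: for each word, count the words that follow it.
bigram_counts = defaultdict(Counter)
for prev, nxt in zip(tokens, tokens[1:]):
    bigram_counts[prev][nxt] += 1


def conditional_probability(nxt, prev):
    """Estimate P(nxt | prev) as count(prev, nxt) / count(prev, anything)."""
    total = sum(bigram_counts[prev].values())
    return bigram_counts[prev][nxt] / total if total else 0.0


def mat_vec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector (a list of numbers)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


if __name__ == "__main__":
    print(unigram_counts.most_common(2))
    print(conditional_probability("cat", "the"))
    print(mat_vec([[1, 2], [3, 4]], [1, 1]))
```

If each line of this sketch reads naturally, the programming and math prerequisites are likely in place.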


Mission & Content Summary
MISSION

Natural Language Processing and language technology tools are becoming ubiquitous: from everyday tools like machine translation or smart speakers, to industry applications for hiring, customer analysis, etc. Machine-learning-based text analysis tools provide a range of possibilities and are a growing field of expertise. The advance of large language models (such as ChatGPT) has changed and greatly expanded NLP capabilities. This course provides an overview of, and hands-on experience with, the relevant techniques.

CONTENT SUMMARY

Information theory, basics and history of NLP, language models, representations, topic models, classification, NLP applications, ethics of AI and NLP.


Intended Learning Outcomes (ILO)
KNOWLEDGE AND UNDERSTANDING
At the end of the course, students will be able to...

- understand the power of large language models

- reason about the risks and benefits of various approaches

- come up with an appropriate method for a given problem

APPLYING KNOWLEDGE AND UNDERSTANDING
At the end of the course, students will be able to...

- implement various NLP methods

- develop, run, and analyze various tools


Teaching methods
  • Face-to-face lectures
  • Guest speaker talks (in class or remote)
  • Exercises (exercises, database, software etc.)
  • Individual assignments
  • Group assignments
DETAILS

The course consists of lectures, with slides and explanations, and associated practice Jupyter notebooks.


Each student completes individual assignments to get experience in implementation details, and students work together in groups to solve a joint task. If applicable/available, students have the option to participate in external competitions such as Kaggle competitions or shared tasks in natural language processing.


Assessment methods
  Assessment components and where they apply (continuous assessment, partial exams, general exam):
  • Individual assignment (report, exercise, presentation, project work etc.): continuous assessment, general exam
  • Active class participation (virtual, attendance): continuous assessment
    ATTENDING AND NOT ATTENDING STUDENTS

    Best two out of three individual assignments (50%)
    Final project (50%)

     

    Projects are graded on the performance of the system and the quality of the report. The assessment considers the clarity of presentation, the performance of the models used, and the ambitiousness of the project.


    Teaching materials
    ATTENDING AND NOT ATTENDING STUDENTS

    Jupyter notebooks are provided for each class, as well as class notes for required reading.
     

    OPTIONAL READING
    Hovy, Dirk. Text Analysis in Python for Social Scientists, Discovery and Exploration. Cambridge University Press, 2020.
    Jurafsky, Dan, and James H. Martin. Speech and Language Processing. Vol. 3. London: Pearson, 2014.
    Manning, Christopher D., and Hinrich Schütze. Foundations of Statistical Natural Language Processing. MIT Press, 1999.
    Marsland, Stephen. Machine Learning: An Algorithmic Perspective. CRC Press, 2015.
    Chollet, Francois. Deep learning with Python. Manning Publications Co., 2017.
    Goldberg, Yoav. A Primer on Neural Network Models for Natural Language Processing. ArXiv, 2015.
    Eisenstein, Jacob. Introduction to Natural Language Processing. MIT Press, 2019.
     

    Last change 06/02/2024 10:45