Project Overview

Project Code: CIT 01

Project name:

Studying Privacy in Large Language Models (SPILL)

TUM Department:

CIT - Informatics

TUM Chair / Institute:

Chair of Software Engineering for Business Information Systems (sebis, I19)

Research area:

Natural Language Processing

Student background:

Computer Science / Informatics

Further disciplines:

Participation also possible online only:

Planned project location:

TUM Garching Campus, Informatics building.

Project Supervisor - Contact Details


Title:

Prof. Dr.

Given name:

Florian

Family name:

Matthes

E-mail:

matthes@tum.de

Phone:

089 289 17132

Additional Project Supervisor - Contact Details


Title:

Given name:

Stephen

Family name:

Meisenbacher

E-mail:

stephen.meisenbacher@tum.de

Phone:

089 289 17137

Project Description


Project description:

SPILL: Studying Privacy in Large Language Models

Motivation
The age of LLMs has brought about remarkable advances in the capabilities of NLP techniques to help with everyday tasks, from mundane chores to complex tasks requiring reasoning. At the same time, however, it has been shown that the performance of such models relies on high-quality, large-scale training data, much of which is collected from humans.

As a result, an entire subfield working at the intersection of data protection (privacy) and NLP has gained popularity in recent years. Questions that have been asked include how we can train models privately, how to achieve an optimal privacy-utility trade-off, and what it even means to protect privacy in language. Indeed, the field is quite challenging, with many open research questions and much work still to do.

This is where you come in!

Project Details
In this project, you will participate in ongoing research at the intersection of privacy and NLP. Specific topics of interest include Differential Privacy, synthetic data, benchmarking text privatization, and improving the quality of generated private text. While the exact project details will be finalized in the months prior to the project start, you can expect to work with (large) Language Models, investigating both privacy in LLMs and the use of LLMs for privacy. In doing so, you can look forward to coding new techniques for text privatization (a small illustrative sketch follows the list below), as well as conducting experiments that critically analyze existing methods. Concretely, you can expect:
- Regular check-ins and communication with your project supervisor
- Individual coding assignments as part of the larger project
- Documenting your completed work
- Preparing the work for publication and participating in scientific writing
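
To give a flavor of what such coding assignments might involve, here is a minimal, purely illustrative sketch of word-level text privatization with metric Differential Privacy: a word embedding is perturbed with noise calibrated to a privacy parameter epsilon, and the noisy vector is mapped back to the nearest vocabulary word. The toy vocabulary, the random embeddings, and the helper functions are hypothetical placeholders, not part of the project codebase.

# Illustrative sketch of word-level metric Differential Privacy for text privatization.
# The vocabulary and embeddings below are toy placeholders; real mechanisms would
# use pretrained word embeddings and operate over full documents.
import numpy as np

rng = np.random.default_rng(0)

vocab = ["good", "great", "bad", "terrible", "okay"]
dim = 8
embeddings = {w: rng.normal(size=dim) for w in vocab}  # placeholder vectors

def sample_noise(dim: int, epsilon: float) -> np.ndarray:
    """Sample noise with density proportional to exp(-epsilon * ||z||):
    a uniformly random direction scaled by a Gamma(dim, 1/epsilon) magnitude."""
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    magnitude = rng.gamma(shape=dim, scale=1.0 / epsilon)
    return direction * magnitude

def privatize_word(word: str, epsilon: float) -> str:
    """Perturb the word's embedding and return the nearest vocabulary word."""
    noisy = embeddings[word] + sample_noise(dim, epsilon)
    return min(vocab, key=lambda w: np.linalg.norm(embeddings[w] - noisy))

print([privatize_word("good", epsilon=5.0) for _ in range(5)])

Smaller epsilon values produce noisier outputs and therefore stronger privacy, at the cost of utility; studying exactly this trade-off is one of the themes of the project.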

Expected Outcomes
As a result of the proposed project, you will perform a deep dive into a timely and important subfield of Natural Language Processing, one with implications for the advancement of AI going forward. In particular, you will learn not only how to implement technical privacy solutions, but also how to think critically about what it means to preserve privacy in the age of LLMs.

Tangible outcomes include a project report and code deliverables, which will be presented to the supervisor and research chair at the end of the project. In addition, you will have the opportunity to take part in scientific paper writing, including disseminating the results from this project’s work.

Finally, you will gain insights into the other exciting work happening within our research group through participation in our weekly meetings and seminar. This will broaden your perspective beyond the privacy domain!

Working hours per week planned:

35

Prerequisites


Required study level minimum (at time of TUM PREP project start):

3 years of bachelor studies completed

Subject related:

Fundamentals of Computer Science (algorithms, data structures, etc.)
Strong experience with coding, preferably in Python
Basic knowledge of Natural Language Processing (practical experience is beneficial)
Preferable: an introductory course in ML/DL
Nice to have: experience with (text) data processing and model training

Other:

Strong work ethic, an interest in privacy, and a willingness to learn!
