SPILL: Studying Privacy in Large Language Models
Motivation
The age of LLMs has brought about wonderful advances in the capabilities of NLP techniques to help with everyday tasks, from mundane chores to complex tasks requiring reasoning. At the same time, however, it has been shown that the performance of such models relies on high-quality, large-scale training data, much of it collected from humans.
As a result, an entire subfield of NLP working at the intersection of data protection (privacy) and NLP has gained popularity in recent years. Questions being asked include how we can train models privately, how to achieve an optimal privacy-utility trade-off, and what it even means to protect privacy in language. Indeed, the field is quite challenging, with many open research questions and plenty of work to do.
This is where you come in!
Project Details
In this project, you will be participating in ongoing research at the intersection of privacy and NLP. Specific topics of interest include Differential Privacy, synthetic data, benchmarking text privatization, and improving the quality of generated private text. While the exact project details will be finalized in the months prior to the project start, you can expect to be working with (large) Language Models, investigating both privacy in LLMs and ways to leverage LLMs for privacy. In doing this, you can look forward to coding new techniques for text privatization, as well as conducting experiments that critically analyze existing methods. Concretely, you can expect:
- Regular check-ins and communication with your project supervisor
- Individual coding assignments as part of the larger project
- Documenting your completed work
- Preparing the work for publication and participating in scientific writing
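To give a flavor of the kind of technique you might implement, below is a minimal, hypothetical sketch of word-level text privatization in the style of metric Differential Privacy: a word's embedding is perturbed with noise calibrated to a privacy parameter epsilon, and the nearest vocabulary word to the noisy vector is emitted. The toy vocabulary, 2-D embeddings, and function names are illustrative assumptions, not part of any specific project codebase.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary with hypothetical 2-D embeddings (real systems would use
# pretrained embeddings with hundreds of dimensions).
vocab = {
    "good": np.array([1.0, 0.0]),
    "great": np.array([0.9, 0.2]),
    "bad": np.array([-1.0, 0.0]),
    "terrible": np.array([-0.9, -0.2]),
}
words = list(vocab)
matrix = np.stack([vocab[w] for w in words])


def privatize_word(word: str, epsilon: float) -> str:
    """Metric-DP-style word replacement: perturb the word's embedding with
    noise whose magnitude shrinks as epsilon grows, then return the nearest
    vocabulary word to the noisy point."""
    # Planar Laplace-style noise: uniform random direction, Gamma radius,
    # giving density proportional to exp(-epsilon * ||z||) in 2-D.
    direction = rng.normal(size=2)
    direction /= np.linalg.norm(direction)
    radius = rng.gamma(shape=2, scale=1.0 / epsilon)
    noisy = vocab[word] + radius * direction
    # The privatized output is the vocabulary word nearest the noisy vector.
    dists = np.linalg.norm(matrix - noisy, axis=1)
    return words[int(np.argmin(dists))]


# Smaller epsilon => more noise => more likely the word is swapped.
print(privatize_word("good", epsilon=10.0))
print(privatize_word("good", epsilon=0.5))
```

With a large epsilon the word usually survives unchanged; with a small epsilon it is often replaced by a neighbor, which is the privacy-utility trade-off mentioned above in miniature.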
Expected Outcomes
As a result of the proposed project, you will perform a deep dive into a timely and important subfield of Natural Language Processing, one with implications for the advancement of AI going forward. In particular, you will learn not only to implement technical privacy solutions, but also to think critically about what it means to preserve privacy in the age of LLMs.
Tangible outcomes include a project report and code deliverables, which will be presented to the supervisor and research chair at the end of the project. In addition, you will have the opportunity to take part in scientific paper writing, including disseminating the results of this project.
Finally, you will gain insight into the other exciting work happening within our research group through participation in our weekly meeting and seminar, broadening your perspective beyond the privacy domain!