Background:
Deep learning methods have proven successful in many different fields, such as image generation (DALL-E, Midjourney), natural language processing (GPT), and operator learning (DeepONet, FNO), to name a few. Gradient-based optimization via backpropagation has been a key factor in this success. However, the training process and the resulting models remain hard to understand, and the networks are often referred to as "black-box models". Gaining a better understanding of the algorithms and models of deep learning is therefore an important scientific task.
Training a neural network means finding good parameters based on data. There are existing mathematical connections between the input space (or data space) and the parameter space (or weight space); namely, the two are dual to each other [1]. With this in mind, a method called "sampling where it matters" (SWIM) [2] was developed in the group where the project will be conducted. SWIM makes the connection between data and parameters even more explicit by taking an approach different from backpropagation when constructing feed-forward networks: for each neuron, one samples a pair of points from the data set and constructs the weight and bias parameters from this pair. The parameter space is thus directly related to the input space (specifically, to the space of pairs of input points). This seemingly reduced parameter space is, however, as expressive and flexible as that of regular neural networks (proven in [2]). This means one can start exploring the connection between the input space and the parameter space more explicitly.
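To make the construction concrete, the following is a minimal sketch in Python/NumPy of how one hidden layer could be built from sampled data pairs. The function name, the uniform pair sampling, and the scaling are simplifying assumptions for illustration; the actual SWIM algorithm in [2] samples pairs with a data-dependent probability and uses activation-specific scaling constants.

    import numpy as np

    def sample_layer_from_pairs(X, width, rng=None):
        # Sketch of a SWIM-style layer construction: each neuron's weight points
        # from one sampled data point to another, and the bias is chosen so that
        # the first point of the pair sits at pre-activation zero.
        # Uniform pair sampling is a simplification of the method in [2].
        rng = np.random.default_rng() if rng is None else rng
        n, d = X.shape
        W = np.empty((width, d))
        b = np.empty(width)
        for k in range(width):
            i, j = rng.choice(n, size=2, replace=False)        # sample a pair of inputs
            diff = X[j] - X[i]
            W[k] = diff / (np.linalg.norm(diff) ** 2 + 1e-12)  # weight defined by the pair
            b[k] = -W[k] @ X[i]                                # bias defined by the same pair
        return W, b

A hidden layer's output would then be, for example, np.tanh(X @ W.T + b), and deeper layers can be constructed in the same way from the previous layer's outputs.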
Specific project:
The focus of this work is to create a map from the parameter space found through iterative optimization (SGD, Adam) to the parameter space explicitly defined by the input-space sampling method SWIM [2]. This is done by identifying pairs of points in the data set that are important for the training. Such a map helps us understand gradient-based optimization techniques and explain the resulting model. It is also possible to create such mappings during the iterations of training and to observe how the emphasis on different data points changes over the iterations. Finally, we are also interested in the inverse map, where we start with weights constructed by SWIM and improve upon them by mapping them into the parameter space induced by optimization.
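As a purely illustrative example of what such a forward map could look like in its simplest form, one could search, for every gradient-trained neuron, for the data pair whose SWIM-style construction comes closest to the trained weights. The brute-force search and the simple mismatch measure below are assumptions made for this sketch, not the method of the thesis, which aims at better-founded and more scalable maps (e.g., via optimal transport).

    import numpy as np

    def match_neuron_to_pair(w, b, X):
        # Brute-force heuristic: find the pair (x_i, x_j) whose SWIM-style
        # weight/bias best reproduces a gradient-trained neuron (w, b).
        n = X.shape[0]
        best_pair, best_err = None, np.inf
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                diff = X[j] - X[i]
                w_pair = diff / (np.linalg.norm(diff) ** 2 + 1e-12)
                b_pair = -w_pair @ X[i]
                err = np.linalg.norm(w - w_pair) + abs(b - b_pair)  # simple mismatch measure
                if err < best_err:
                    best_pair, best_err = (i, j), err
        return best_pair, best_err

Applying such a matching to every neuron of a trained layer yields a candidate pair of data points per neuron, which could then be tracked over training iterations to see how the emphasis on different data points shifts.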
The initial groundwork has already been done in a previous thesis [3], which tested simple methods to create such maps for shallow neural networks and low-dimensional input spaces. The student(s) will continue this work by either extending these methods or developing new ones for deep neural networks and higher-dimensional input spaces. Depending on the student's preference, this can be achieved through principled and mathematically rigorous methods such as optimal transport theory, or through more heuristic approaches such as computational experiments.
Outcomes for the student:
1) Gain understanding of the underlying structure of deep neural networks and backpropagation, which is highly relevant in today's AI landscape.
2) The chance to dive deep into the mathematical foundation of mapping between probability spaces, such as optimal transport theory.
3) Make a meaningful contribution to explaining deep learning and to understanding both its training phase and the resulting models. Such understanding will be crucial in a world that uses more and more machine learning and AI.
4) Conduct research to explore whether a future in academia is of interest, and gain valuable experience in a scientific working environment.
Citations:
[1] L. Spek, T. Heeringa, C. Brune, "Duality for Neural Networks through Reproducing Kernel Banach Spaces", arXiv preprint, 2022.
[2] E. Bolager, I. Burak, C. Datar, Q. Sun, F. Dietrich, "Sampling weights of deep neural networks", NeurIPS, 2023.
[3] D. Bouassida, "Converting Neural Networks to Sampled Networks", Bachelor's thesis, Technical University of Munich (TUM), 2023. https://mediatum.ub.tum.de/1726944