Background:
Deep learning methods have proven successful in many different fields, 
such as image generation (DALL-E, Midjourney), natural language 
processing (GPT), and operator learning (DeepONet, FNO), to name a few. 
Gradient-based optimization via backpropagation has been a key factor in 
this success. However, the training process and the resulting models are 
still hard to understand, and the networks are often referred to as 
"black-box models". Gaining a better understanding of the algorithms and 
models of deep learning is therefore an important scientific task.
Training a neural network means finding good parameters based on data. 
There are existing mathematical connections between the input space (or 
data space) and the parameter space (or weight space); in particular, 
the two are dual to each other [1]. With this in mind, a method called 
"sampling where it matters" (SWIM) [2] was developed in the group where 
the project will be conducted. SWIM makes the connection between data 
and parameters even more explicit by taking an approach different from 
backpropagation when constructing feed-forward networks: for each 
neuron, one samples a pair of points from the data set and constructs 
the weight and bias parameters from this pair (a minimal sketch is given 
below). The parameter space is thus directly related to the input space 
(specifically, to the space of pairs of input points). Despite appearing 
reduced, this parameter space is as expressive and flexible as that of 
regular neural networks (proven in [2]). This means one can explore the 
connection between the input space and the parameter space much more 
explicitly.
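To make the construction concrete, the following is a minimal sketch of a 
pair-based layer in the spirit of SWIM. It is illustrative only: the 
function name, the scaling constants s1 and s2, and the exact placement of 
the bias are assumptions made for this sketch; the precise, 
activation-dependent choices are derived in [2].

```python
import numpy as np

def sample_pair_layer(X, n_neurons, s1=2.0, s2=-1.0, rng=None):
    """Illustrative pair-based construction of one hidden layer.

    For each neuron, a pair of data points (x1, x2) is sampled and the
    weight points along x2 - x1, so the activation varies most between
    the two points. The constants s1, s2 are placeholders; the exact,
    activation-dependent scalings are given in [2].
    """
    rng = rng or np.random.default_rng(0)
    n, d = X.shape
    W = np.zeros((n_neurons, d))
    b = np.zeros(n_neurons)
    for i in range(n_neurons):
        j, k = rng.choice(n, size=2, replace=False)  # sample a pair of data points
        diff = X[k] - X[j]
        W[i] = s1 * diff / (np.linalg.norm(diff) ** 2 + 1e-12)
        b[i] = -W[i] @ X[j] + s2                     # anchor the neuron at x1
    return W, b

# Usage: hidden features of a random 2-D data set
X = np.random.default_rng(1).normal(size=(200, 2))
W, b = sample_pair_layer(X, n_neurons=16)
H = np.tanh(X @ W.T + b)   # hidden-layer outputs, shape (200, 16)
```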
Specific project:
The focus of this work is to create a map from the parameter space found 
through iterative optimization (SGD, Adam optimizers) to the parameter 
space explicitly defined by the input-space sampling method (SWIM, [2]). 
This is done by identifying pairs of points in the dataset that are 
important for training (a toy version of this search is sketched below). 
Such a map allows us to gain an understanding of gradient-based 
optimization techniques and to explain the resulting model. It is also 
possible to create such mappings during training and observe how the 
emphasis on different data points changes over the iterations. Finally, 
we are also interested in the inverse map, where we start with the 
weights constructed by SWIM and improve upon them by mapping them into 
the parameter space induced by optimization.
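As a toy illustration of what "identifying pairs of points" could mean, 
the brute-force search below assigns to a trained neuron the data pair 
whose pair-based construction (e.g. the sketch above) lies closest in 
parameter space. The helper `construct` and the squared-distance cost are 
assumptions made for this sketch; the project itself is about doing this 
efficiently, for deep networks and high-dimensional inputs.

```python
import numpy as np
from itertools import combinations

def nearest_pair(w_trained, b_trained, X, construct):
    """Toy baseline: scan all data pairs and return the one whose
    constructed (weight, bias) is closest to the trained neuron.
    `construct(x1, x2)` is any pair-based rule, e.g. the sketch above.
    Quadratic in the number of data points, so only feasible on toy sets."""
    best_pair, best_cost = None, np.inf
    for j, k in combinations(range(len(X)), 2):
        w, b = construct(X[j], X[k])
        cost = np.linalg.norm(w - w_trained) ** 2 + (b - b_trained) ** 2
        if cost < best_cost:
            best_pair, best_cost = (j, k), cost
    return best_pair, best_cost
```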
The initial groundwork has already been done in a previous thesis [3], 
which tested simple methods for creating such maps for shallow neural 
networks and low-dimensional input spaces. This was built upon in a 
research internship, where algorithms from optimal transport theory were 
applied to higher-dimensional input spaces (a simple matching of this 
kind is sketched below). The student(s) will continue this work by either 
extending these methods or developing new strategies for deep neural 
networks and higher-dimensional input spaces. Depending on the student's 
preference, this can be achieved through principled and mathematically 
rigorous methods, such as optimal transport theory, or through more 
heuristic approaches, such as computational experiments.
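One very simple instance of an optimal-transport-style step is a discrete 
assignment between a set of trained neurons and a set of candidate, 
pair-constructed neurons. The sketch below uses SciPy's 
linear_sum_assignment and assumes neurons are compared by the squared 
Euclidean distance of their weight vectors; it is a minimal illustration, 
not a description of the methods developed in the internship.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_neurons(W_trained, W_candidates):
    """Match trained neurons to candidate (pair-constructed) neurons by
    solving the assignment problem on a squared-distance cost matrix.
    Returns the matched index pairs and the total matching cost."""
    # cost[i, j] = squared distance between trained neuron i and candidate j
    cost = ((W_trained[:, None, :] - W_candidates[None, :, :]) ** 2).sum(axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows, cols)), cost[rows, cols].sum()
```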
Outcomes for the student:
1) Gain an understanding of the underlying structure of deep neural 
networks and backpropagation, which is highly relevant in today's AI 
landscape.
2) The chance to dive deep into the mathematical foundations of mappings 
between probability spaces, such as optimal transport theory.
3) Make a meaningful contribution to explaining deep learning and to 
understanding its training phase and the resulting models. Such 
understanding will be crucial for a world that uses more and more 
machine learning and AI.
4) Conduct research to explore whether a future in academia is of 
interest, and gain valuable experience in a scientific working environment.
Citations:
[1] L. Spek, T. Heeringa, C. Brune, "Duality for Neural Networks through 
Reproducing Kernel Banach Spaces", arXiv, 2022.
[2] E. Bolager, I. Burak, C. Datar, Q. Sun, F. Dietrich, "Sampling 
weights of deep neural networks", NeurIPS, 2023.
[3] D. Bouassida, "Converting Neural Networks to Sampled Networks", 
Bachelor's thesis, TUM, 2023. https://mediatum.ub.tum.de/1726944