Instructors: Prof. Dr. Nassir Navab, Dr. Shahrooz Faghihroohi, Dr. Azade Farshad, Yousef Yeganeh

Time: TBA

Registration

Registration must be done through the TUM Matching Platform (please pay attention to the deadlines).
To increase your priority, please also apply via our own registration system.
Maximum number of participants: 14

Announcements

The presentation and blogpost guidelines are available here: TBA
The DLMA Introduction slides can be found here: DLMA-PreliminaryMeeting-WS25-26.pdf

Introduction

Deep learning is growing rapidly in computer vision and in medical imaging alike. High-impact journals in the medical imaging community, e.g. IEEE Transactions on Medical Imaging, recently published a special issue on deep learning [1]. The seminar offers a list of recent scientific articles on the main current research topics in deep learning for medical applications, together with some interesting papers from other communities (CVPR, NeurIPS, ICCV, ICLR, ICML, ...).

Course Structure

In this Master Seminar (Hauptseminar), each student selects one scientific topic from the list provided by the course organizers. Students should read the sample papers proposed by the tutors, find further articles related to the topic, and summarize and compare them in their presentation and blog post:

Presentation: The selected paper is presented to the other participants (at most 25 minutes of presentation, followed by 10 minutes of questions). You can use the CAMP template for PowerPoint: TUM-Template.pptx.
Blog Post: A blog post of 3000-3500 words, excluding references, should be submitted before the deadline. The blog post must include all references used and must be written completely in your own words. Copying and pasting will not be tolerated.
Attendance: Participants must take an active part in all seminar sessions. Each presentation is followed by a discussion, and everyone is encouraged to contribute.

Submission Deadline: You have to submit the blog post one week before the first presentation date. Minor revisions are allowed until the last session of the course.

Schedule

Date	Session: Topics	Students
11/12/2025	Surgical Video Understanding	Yugantseva Polina
	Understanding Neurosurgery (Pituitary Surgery)	Scheipl, Julian
	Tissue biomechanics modelling in Deep Learning	Barclay, Patrick
18/12/2025	Medical / Biomedical AI Agent	Abdelmoula Adem
	Any-to-Any Multimodal LLM	Quan, Zhiheng
08/01/2026	Adaptations of generalist segmentation models to medical scenes	Oueslati Talel
	Segmentation of delicate tissue layers in OCT images of the posterior eye	Dinh Trung Che
	OCT-based retina representation, but where to put it?	Vladimir Turov
15/01/2026	Implicit neural representation combined with diffusion models	Laarmann, Felix
	Audio Reconstruction/Generation from Video Sequences	Can, Hüseyin Bartu
	Gaussian Splatting in minimally invasive surgery	Amine Ben Dhiab

List of Topics and Materials

The proposed papers for each topic in this course are usually selected from the following venues/publications:

CVPR: Conference on Computer Vision and Pattern Recognition
ICLR: International Conference on Learning Representations
NeurIPS: Neural Information Processing Systems

TPAMI: IEEE Transactions on Pattern Analysis and Machine Intelligence

TMI: IEEE Transactions on Medical Imaging
JBHI: IEEE Journal of Biomedical and Health Informatics
MedIA: Medical Image Analysis (Elsevier)

MICCAI: Medical Image Computing and Computer-Assisted Intervention
BMVC: British Machine Vision Conference
MIDL: Medical Imaging with Deep Learning

List of topics

No	Topic	Sample Papers	Journal/Conference	Tutor	Student	Link
1	Medical / Biomedical AI Agent	CPathAgent: An Agent-based Foundation Model for Interpretable High-Resolution Pathology Image Analysis	NeurIPS 2025	Han Li	Abdelmoula Adem	https://arxiv.org/abs/2505.20510
		MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making	NeurIPS 2024			https://arxiv.org/abs/2404.15155
		Enhancing diagnostic capability with multi-agents conversational (MAC)	NPJ Digital Medicine			https://www.nature.com/articles/s41746-025-01550-0
2	Any-to-Any Multimodal LLM	NExT-GPT: Any-to-Any Multimodal LLM	ICML 2024	Han Li	Quan, Zhiheng	https://arxiv.org/abs/2309.05519
		Training Transitive and Commutative Multimodal (LoReTTa)	NeurIPS 2023			https://arxiv.org/abs/2305.14243
		Meta-Transformer: A Unified Framework for Multimodal Learning	arxiv			https://arxiv.org/abs/2307.10802
3	Audio Reconstruction/Generation from Video Sequences	Speech Audio Generation from Dynamic MRI via a Knowledge Enhanced Conditional Variational Autoencoder	MICCAI 2025	Luis David Reyes Vargas	Can, Hüseyin Bartu	https://papers.miccai.org/miccai-2025/paper/2374_paper.pdf
		MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis	CVPR 2025			https://arxiv.org/pdf/2412.15322
		FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds	arxiv			https://arxiv.org/pdf/2407.01494
4	Tissue biomechanics modelling in Deep Learning	Real-time simulation of viscoelastic tissue behavior with physics-guided deep learning	arxiv	Luis David Reyes Vargas	Barclay, Patrick	https://arxiv.org/pdf/2301.04614
		Data-Driven Tissue- and Subject-Specific Elastic Regularization for Medical Image Registration	MICCAI 2024			https://papers.miccai.org/miccai-2024/paper/3303_paper.pdf
		Review of Machine Learning Techniques in Soft Tissue Biomechanics and Biomaterials	Review article			https://link.springer.com/article/10.1007/s13239-024-00737-y
5	Gaussian Splatting in minimally invasive surgery	T2GS: Comprehensive Reconstruction of Dynamic Surgical Scenes with Gaussian Splatting	MICCAI 2025	Hannes Firzlaff	Amine Ben Dhiab	https://papers.miccai.org/miccai-2025/paper/5019_paper.pdf
		SurgTPGS: Semantic 3D Surgical Scene Understanding with Text Promptable Gaussian Splatting	MICCAI 2025			https://papers.miccai.org/miccai-2025/paper/1324_paper.pdf
		EndoSparse: Real-Time Sparse View Synthesis of Endoscopic Scenes using Gaussian Splatting	MICCAI 2024			https://papers.miccai.org/miccai-2024/paper/0791_paper.pdf
6	OCT-based retina representation, but where to put it?	Gaussian Primitive Optimized Deformable Retinal Image Registration	MICCAI 2025	Hannes Firzlaff	Vladimir Turov	https://papers.miccai.org/miccai-2025/paper/3875_paper.pdf
		Retinal OCT Image Registration: Methods and Applications	IEEE RBME Vol. 16 2021			https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9531445
		RetinaRegNet: A zero-shot approach for retinal image registration	Comput. Biol. Med 2025			https://www.sciencedirect.com/science/article/pii/S001048252401730X
7	Adaptations of generalist segmentation models to medical scenes	UM-SAM: Unsupervised Medical Image Segmentation using Knowledge Distillation from Segment Anything Model	MICCAI 2025	Diego Biagini	Oueslati Talel	https://papers.miccai.org/miccai-2025/paper/2296_paper.pdf
		ReSurgSAM2: Referring Segment Anything in Surgical Video via Credible Long-term Tracking	MICCAI 2025			https://papers.miccai.org/miccai-2025/paper/0617_paper.pdf
		SR-SAM: Subspace Regularization for Domain Generalization of Segment Anything Model	MICCAI 2025			https://papers.miccai.org/miccai-2025/paper/2210_paper.pdf
8	Segmentation of delicate tissue layers in OCT images of the posterior eye	Automated retinal boundary segmentation of optical coherence tomography images using an improved Canny operator	Nature Sci Rep 2022	Hannes Firzlaff	Dinh Trung Che	https://www.nature.com/articles/s41598-022-05550-y#citeas
		Retinal OCT image segmentation with deep learning: A review of advances, datasets, and evaluation metrics	CMIG 2025			https://www.sciencedirect.com/science/article/abs/pii/S0895611125000485
		Weakly supervised segmentation of retinal layers on OCT images with AMD using uncertainty prototype and boundary regression	MedIA 2025			https://www.sciencedirect.com/science/article/abs/pii/S1361841525001197
9	Understanding Neurosurgery (Pituitary Surgery)	F2PASeg: Feature Fusion for Pituitary Anatomy Segmentation in Endoscopic Surgery	MICCAI 2025	Maximilian Fehrentz	Scheipl, Julian	https://papers.miccai.org/miccai-2025/paper/1527_paper.pdf
		Automatic summarization of endoscopic skull base surgical videos through object detection and hidden Markov modeling	CMIG 2023			https://www.sciencedirect.com/science/article/pii/S0895611123000666
		SurgicalVLM-Agent: Towards an Interactive AI Co-Pilot for Pituitary Surgery	arxiv			https://arxiv.org/pdf/2503.09474
10	Surgical Video Understanding	Future Slot Prediction for Unsupervised Object Discovery in Surgical Video	MICCAI 2025	Maximilian Fehrentz	Yugantseva Polina	https://papers.miccai.org/miccai-2025/paper/4725_paper.pdf
		General surgery vision transformer: A video pre-trained foundation model for general surgery	arxiv			https://arxiv.org/pdf/2403.05949
		LLaVA-Surg: Towards Multimodal Surgical Assistant via Structured Surgical Video Learning	arxiv			https://arxiv.org/pdf/2408.07981
11	Implicit neural representation combined with diffusion models	Highly accelerated MRI via implicit neural representation guided posterior sampling of diffusion models	MedIA 2025	Mohammad Farid Azampour	Laarmann, Felix	https://www.sciencedirect.com/science/article/pii/S1361841524003232?casa_token=h74qJgOHDwYAAAAA:koHfi_l4bAZVBBE7hmP-WmtUEx-jqWuQBYmtS8XdqqhNgahHyXQ1I9cpnPCppDvf_KjxBYhYeg
		HyperDiffusion: Generating implicit neural fields with weight-space diffusion	ICCV 2023			http://openaccess.thecvf.com/content/ICCV2023/papers/Erkoc_HyperDiffusion_Generating_Implicit_Neural_Fields_with_Weight-Space_Diffusion_ICCV_2023_paper.pdf
		Boosting 3D Liver Shape Datasets with Diffusion Models and Implicit Neural Representations	MICCAI 2025			https://arxiv.org/pdf/2504.19402

Literature and Helpful Links

A lot of scientific publications can be found online.

The following list may help you to find some further information on your particular topic:

Some publishers:

Libraries (online and offline):

http://rzblx1.uni-regensburg.de/ezeit/ (Elektronische Zeitschriften Bibliothek, the Electronic Journals Library)
Verbundkatalog des Bibliotheksverbundes Bayern (BVB), the union catalogue of the Bavarian Library Network
Computer ORG
http://www.ub.tum.de/ (TUM Library)
- To get access to the electronic library, see http://www.ub.tum.de/medien/ejournals/readme.html
- Proxy: "proxy.biblio.tu-muenchen.de" on port 8080 (HTTP only). With this, at least portal.acm.org and computer.org usually work.
Various proceedings of conferences in our AR-Lab, 03.13.036 (These proceedings are not for lending!)
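
For those who prefer to fetch papers from a script rather than a browser, the proxy entry in the list above could be used roughly as follows. This is only a minimal sketch using Python's standard library: the helper name build_library_opener is hypothetical, and it is an assumption that the proxy host and port listed on this page are still reachable from your network.

```python
import urllib.request

# Proxy host and port as listed on this page.
# Assumption: the proxy is still in service and reachable from your network.
LIBRARY_PROXY = "http://proxy.biblio.tu-muenchen.de:8080"


def build_library_opener(proxy_url: str = LIBRARY_PROXY) -> urllib.request.OpenerDirector:
    """Return an opener that sends plain-HTTP requests through the proxy.

    build_library_opener is a hypothetical helper name, not an official TUM tool.
    """
    # Only the "http" scheme is mapped, since the page lists the proxy as HTTP only.
    proxy_handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(proxy_handler)


opener = build_library_opener()

# Uncomment to try a request (requires network access to the proxy):
# with opener.open("http://portal.acm.org/", timeout=10) as response:
#     print(response.status)
```

The request itself is left commented out so that the snippet can be run without network access; HTTPS URLs are not routed through the proxy in this sketch.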

Some further hints for working with references:

JabRef is a Java program for working conveniently with BibTeX literature databases. Handy feature: if you know the PubMed ID of an article, JabRef can import its data from there (via "Web Search/Medline").
Mendeley is a cross-platform program for organising your references.

If you find useful resources that are not already listed here, please tell us, so we can add them for others. Thanks.
