Instructors: Prof. Dr. Nassir Navab, Dr. Shahrooz Faghihroohi, Dr. Azade Farshad, Yousef Yeganeh
Time: TBA
Registration
- Registration must be done through the TUM Matching Platform (please pay attention to the deadlines).
- To increase your priority, please also apply via our own registration system.
- Maximum number of participants: 14
Announcements
- The presentation and blogpost guidelines are available here: TBA
- The DLMA Introduction slides can be found here: DLMA-PreliminaryMeeting-WS25-26.pdf
Introduction
- Deep Learning is growing rapidly in Computer Vision and in Medical Imaging. High-impact journals in the medical imaging community, e.g. IEEE Transactions on Medical Imaging, have recently published special issues on Deep Learning [1]. The seminar proposes a list of recent scientific articles covering the main current research topics in deep learning for medical applications, together with some interesting papers from other communities (CVPR, NeurIPS, ICCV, ICLR, ICML, ...).
Course Structure
In this Master Seminar (Hauptseminar), each student selects one scientific topic from the list provided by the course organizers. Students should read the sample papers proposed by the tutors, find further articles related to the topic, and summarize and compare them in their presentation and blog post:
- Presentation: The selected topic is presented to the other participants (maximum 25 minutes presentation, 10 minutes questions). You can use the CAMP PowerPoint template: TUM-Template.pptx.
- Blog Post: A blog post of 3000-3500 words, excluding references, must be submitted before the deadline. The blog post must include all references used and must be written entirely in your own words. Copying and pasting will not be tolerated.
- Attendance: Participants are expected to attend all seminar sessions. Each presentation is followed by a discussion, and everyone is encouraged to participate actively.
Submission Deadline: You have to submit the blog post one week before the first presentation date; minor revisions are allowed until the last session of the course.
Schedule
TBA
List of Topics and Material
The proposed papers for each topic in this course are usually selected from the following venues/publications:
CVPR: Conference on Computer Vision and Pattern Recognition
ICLR: International Conference on Learning Representations
NeurIPS: Neural Information Processing Systems
TPAMI: IEEE Transactions on Pattern Analysis and Machine Intelligence
TMI: IEEE Transactions on Medical Imaging
JBHI: IEEE Journal of Biomedical and Health Informatics
MedIA: Medical Image Analysis (Elsevier)
MICCAI: Medical Image Computing and Computer-Assisted Intervention
BMVC: British Machine Vision Conference
MIDL: Medical Imaging with Deep Learning
List of topics
| No | Topic | Sample Papers | Journal/Conference | Tutor | Student | Link |
|---|---|---|---|---|---|---|
| 1 | Medical / Biomedical AI Agent | CPathAgent: An Agent-based Foundation Model for Interpretable High-Resolution Pathology Image Analysis | NeurIPS 2025 | | | https://arxiv.org/abs/2505.20510 |
| | | MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making | NeurIPS 2024 | | | https://arxiv.org/abs/2404.15155 |
| | | Enhancing diagnostic capability with multi-agents conversational (MAC) | NPJ Digital Medicine | | | https://www.nature.com/articles/s41746-025-01550-0 |
| 2 | Any-to-Any Multimodal LLM | NExT-GPT: Any-to-Any Multimodal LLM | ICML 2024 | | | https://arxiv.org/abs/2309.05519 |
| | | Training Transitive and Commutative Multimodal (LoReTTa) | NeurIPS 2023 | | | https://arxiv.org/abs/2305.14243 |
| | | Meta-Transformer: A Unified Framework for Multimodal Learning | arXiv | | | https://arxiv.org/abs/2307.10802 |
| 3 | Audio Reconstruction/Generation from Video Sequences | Speech Audio Generation from Dynamic MRI via a Knowledge Enhanced Conditional Variational Autoencoder | MICCAI 2025 | | | |
| | | MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis | CVPR 2025 | | | https://arxiv.org/pdf/2412.15322 |
| | | FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds | arXiv | | | https://arxiv.org/pdf/2407.01494 |
| 4 | Tissue biomechanics modelling in Deep Learning | Real-time simulation of viscoelastic tissue behavior with physics-guided deep learning | arXiv | | | https://arxiv.org/pdf/2301.04614 |
| | | Data-Driven Tissue- and Subject-Specific Elastic Regularization for Medical Image Registration | MICCAI 2024 | | | |
| | | Review of Machine Learning Techniques in Soft Tissue Biomechanics and Biomaterials | Review article | | | https://link.springer.com/article/10.1007/s13239-024-00737-y |
| 5 | Gaussian Splatting in minimally invasive surgery | T2GS: Comprehensive Reconstruction of Dynamic Surgical Scenes with Gaussian Splatting | MICCAI 2025 | | | |
| | | SurgTPGS: Semantic 3D Surgical Scene Understanding with Text Promptable Gaussian Splatting | MICCAI 2025 | | | |
| | | EndoSparse: Real-Time Sparse View Synthesis of Endoscopic Scenes using Gaussian Splatting | MICCAI 2024 | | | |
| 6 | OCT-based retina representation, but where to put it? | Gaussian Primitive Optimized Deformable Retinal Image Registration | MICCAI 2025 | | | |
| | | Retinal OCT Image Registration: Methods and Applications | IEEE RBME Vol. 16 2021 | | | https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9531445 |
| | | RetinaRegNet: A zero-shot approach for retinal image registration | Comput. Biol. Med 2025 | | | https://www.sciencedirect.com/science/article/pii/S001048252401730X |
| 7 | Adaptations of generalist segmentation models to medical scenes | UM-SAM: Unsupervised Medical Image Segmentation using Knowledge Distillation from Segment Anything Model | MICCAI 2025 | Diego Biagini | | |
| | | ReSurgSAM2: Referring Segment Anything in Surgical Video via Credible Long-term Tracking | MICCAI 2025 | | | |
| | | SR-SAM: Subspace Regularization for Domain Generalization of Segment Anything Model | MICCAI 2025 | | | |
| 8 | Segmentation of delicate tissue layers in OCT images of the posterior eye | Automated retinal boundary segmentation of optical coherence tomography images using an improved Canny operator | Nature Sci Rep 2022 | | | |
| | | Retinal OCT image segmentation with deep learning: A review of advances, datasets, and evaluation metrics | CMIG 2025 | | | https://www.sciencedirect.com/science/article/abs/pii/S0895611125000485 |
| | | Weakly supervised segmentation of retinal layers on OCT images with AMD using uncertainty prototype and boundary regression | MedIA 2025 | | | https://www.sciencedirect.com/science/article/abs/pii/S1361841525001197 |
| 9 | Understanding Neurosurgery (Pituitary Surgery) | F2PASeg: Feature Fusion for Pituitary Anatomy Segmentation in Endoscopic Surgery | MICCAI 2025 | | | |
| | | Automatic summarization of endoscopic skull base surgical videos through object detection and hidden Markov modeling | CMIG 2023 | | | https://www.sciencedirect.com/science/article/pii/S0895611123000666 |
| | | SurgicalVLM-Agent: Towards an Interactive AI Co-Pilot for Pituitary Surgery | arXiv | | | https://arxiv.org/pdf/2503.09474 |
| 10 | Surgical Video Understanding | Future Slot Prediction for Unsupervised Object Discovery in Surgical Video | MICCAI 2025 | | | |
| | | General surgery vision transformer: A video pre-trained foundation model for general surgery | arXiv | | | https://arxiv.org/pdf/2403.05949 |
| | | LLaVA-Surg: Towards Multimodal Surgical Assistant via Structured Surgical Video Learning | arXiv | | | https://arxiv.org/pdf/2408.07981 |
| 11 | Implicit neural representation combined with diffusion models | Highly accelerated MRI via implicit neural representation guided posterior sampling of diffusion models | MedIA 2025 | Mohammad Farid Azampour | | |
| | | Hyperdiffusion: Generating implicit neural fields with weight-space diffusion | ICCV 2023 | | | |
| | | Boosting 3D Liver Shape Datasets with Diffusion Models and Implicit Neural Representations | MICCAI 2025 | | | https://arxiv.org/pdf/2504.19402 |
Literature and Helpful Links
Many scientific publications can be found online.
The following list may help you find further information on your particular topic:
- Microsoft Academic Search
- Google Scholar
- CiteSeer
- CiteULike
- Collection of Computer Science Bibliographies
Some publishers:
- ScienceDirect (Elsevier Journals)
- IEEE Journals
- ACM Digital Library
Libraries (online and offline):
- http://rzblx1.uni-regensburg.de/ezeit/ (Elektronische Zeitschriften Bibliothek)
- Verbundkatalog des Bibliotheksverbundes Bayern (BVB)
- Computer ORG
- http://www.ub.tum.de/ (TUM Library)
- To get access to the electronic library, see http://www.ub.tum.de/medien/ejournals/readme.html
- "proxy.biblio.tu-muenchen.de" with port 8080 (HTTP only). With this proxy, at least portal.acm.org and computer.org usually work.
- Various proceedings of conferences in our AR-Lab, 03.13.036 (These proceedings are not for lending!)
Some further hints for working with references:
- JabRef is a Java program for conveniently working with BibTeX literature databases. Handy feature: if you know the PubMed ID of an article, JabRef can import its data from there (via "Web Search/Medline").
- Mendeley is a cross-platform program for organising your references.
If you find useful resources that are not already listed here, please tell us, so we can add them for others. Thanks.
