Basic Information
- The seminar consists of 4 topical blocks spread over 4 block sessions.
- Each block consists of 4 presentations, for a total of 16 presentations.
- Since we have 8 participants, that means every participant presents twice.
- Each participant has to pick one topic from Blocks 1+2 and one topic from Blocks 3+4.
- Please email your topic preferences to lasser@cit.tum.de by August 19, 2025:
- pick your top-3 preferences from the topics in Blocks 1+2,
- pick your top-3 preferences from the topics in Blocks 3+4,
- and send the resulting 3+3 preferences in one email.
General materials
Please note: the following materials and papers listed for each topic are a selection only. You are encouraged to check out further resources, such as blog posts or other educational websites.
To experiment with diffusion models, have a look at HuggingFace Diffusers.
Block 1: Introduction to Basics of Diffusion Techniques (October 8, 2025)
Topic 1-1: Diffusion models - how it all started (Pablo)
This topic is the first introductory session of the seminar. It presents the very first seminal papers on diffusion models.
- “Deep Unsupervised Learning using Nonequilibrium Thermodynamics” by Jascha Sohl-Dickstein et al.
- https://arxiv.org/abs/1503.03585
- “Denoising Diffusion Probabilistic Models” (DDPM) by Jonathan Ho, Ajay Jain, and Pieter Abbeel (2020)
- https://arxiv.org/abs/2006.11239
- “Score-Based Generative Modeling through Stochastic Differential Equations” by Yang Song et al. (2021)
- https://arxiv.org/abs/2011.13456
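As a reading aid, the closed-form forward (noising) process from the DDPM paper can be sketched in a few lines of NumPy. The schedule values follow Ho et al. (2020); everything else (array sizes, the toy "image") is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear beta schedule as in Ho et al. (2020): beta_t rises from 1e-4 to 0.02.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)          # \bar{alpha}_t = prod_{s<=t} alpha_s

def q_sample(x0, t, noise):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

x0 = rng.standard_normal(8)             # a toy "image" of 8 pixels
noise = rng.standard_normal(8)
x_mid = q_sample(x0, 500, noise)        # halfway: signal and noise mixed

# By the final timestep almost all signal is gone: x_T is close to pure noise.
print(alpha_bar[-1])                    # ~ 4e-5
```

Training then amounts to predicting `noise` from `x_mid` and `t`; the papers above derive why this simple objective works.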
Topic 1-2: Sampling and acceleration (Borys)
This topic presents sampling techniques and noise schedules that made diffusion techniques practical to use.
- “Denoising Diffusion Implicit Models” (DDIM) by Jiaming Song, Chenlin Meng, and Stefano Ermon (2020)
- https://arxiv.org/abs/2010.02502
- “Improved Denoising Diffusion Probabilistic Models” by Alex Nichol and Prafulla Dhariwal (2021)
- https://arxiv.org/abs/2102.09672
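A sketch of the deterministic (eta = 0) DDIM update, which lets a model trained with 1000 steps sample with far fewer. The `eps_model` below is a stand-in for a trained noise-prediction network; the schedule follows the DDPM convention.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

def ddim_step(x_t, t, t_prev, eps):
    """One deterministic DDIM step (eta = 0), Song et al. (2020)."""
    ab_t, ab_prev = alpha_bar[t], alpha_bar[t_prev]
    # Predict x_0 from the current sample and the noise estimate ...
    x0_pred = (x_t - np.sqrt(1.0 - ab_t) * eps) / np.sqrt(ab_t)
    # ... then jump directly to the (less noisy) earlier timestep.
    return np.sqrt(ab_prev) * x0_pred + np.sqrt(1.0 - ab_prev) * eps

def eps_model(x_t, t):
    # Stand-in for a trained noise-prediction network (illustrative only).
    return 0.1 * x_t

rng = np.random.default_rng(0)
x = rng.standard_normal(8)              # start from pure noise x_T
# Sample with only 50 of the 1000 training steps: the key DDIM speed-up.
timesteps = np.linspace(T - 1, 0, 50, dtype=int)
for t, t_prev in zip(timesteps[:-1], timesteps[1:]):
    x = ddim_step(x, t, t_prev, eps_model(x, t))
```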
Topic 1-3: Conditioning and control (Julian)
This topic presents conditioning and classifier-free guidance for diffusion models.
- “GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models” by OpenAI (2021)
- https://arxiv.org/abs/2112.10741
- “Hierarchical Text-Conditional Image Generation with CLIP Latents” (DALL-E 2) by OpenAI (2022)
- https://arxiv.org/abs/2204.06125
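Classifier-free guidance, as used in GLIDE, combines an unconditional and a conditional noise prediction at sampling time. A minimal sketch with made-up prediction values:

```python
import numpy as np

def cfg_eps(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one.
    w = 1 recovers the plain conditional model; w > 1 strengthens
    the effect of the conditioning (at some cost in diversity)."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

eps_u = np.array([0.0, 0.0])            # prediction without the text prompt
eps_c = np.array([1.0, -1.0])           # prediction with the text prompt
print(cfg_eps(eps_u, eps_c, 1.0))       # -> [ 1. -1.]  (pure conditional)
print(cfg_eps(eps_u, eps_c, 3.0))       # -> [ 3. -3.]  (amplified guidance)
```

In practice both predictions come from one network, trained with the conditioning randomly dropped so it can act as its own unconditional model.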
Topic 1-4: Key architectural innovations (Mykyta)
This topic presents the seminal papers on image generation using diffusion models, first with U-Nets as the backbone, then with diffusion transformers as the backbone.
- “High-Resolution Image Synthesis with Latent Diffusion Models” (Stable Diffusion) by Robin Rombach et al. (2022)
- https://arxiv.org/abs/2112.10752
- “Scalable Diffusion Models with Transformers” (DiT) by William Peebles and Saining Xie (2022)
- https://arxiv.org/abs/2212.09748
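DiT replaces the U-Net with a transformer that operates on a patchified latent. A sketch of the patchify step, assuming a 4-channel 32x32 latent (the sizes are illustrative):

```python
import numpy as np

def patchify(latent, p):
    """Split a (C, H, W) latent into (H*W / p^2) tokens of dimension
    C*p*p -- the input representation used by diffusion transformers."""
    c, h, w = latent.shape
    assert h % p == 0 and w % p == 0
    return (latent
            .reshape(c, h // p, p, w // p, p)
            .transpose(1, 3, 0, 2, 4)        # (H/p, W/p, C, p, p)
            .reshape((h // p) * (w // p), c * p * p))

rng = np.random.default_rng(0)
latent = rng.standard_normal((4, 32, 32))    # e.g. a latent-space "image"
tokens = patchify(latent, p=2)
print(tokens.shape)                          # -> (256, 16): 16x16 tokens
```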
Block 2: Generating Images and Videos with Diffusion (October 9, 2025)
Topic 2-1: Generating images with Stable Diffusion (Fabio)
This topic presents image generation models and techniques from the Stable Diffusion model family.
- “High-Resolution Image Synthesis with Latent Diffusion Models” by Robin Rombach et al. (2022)
- https://arxiv.org/abs/2112.10752
- “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis” by Dustin Podell et al. (2023)
- https://arxiv.org/abs/2307.01952
- “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis” (Stable Diffusion 3) by Patrick Esser et al. (2024)
- https://arxiv.org/abs/2403.03206
Topic 2-2: ControlNets (Eslam)
This topic presents ControlNets, a technique for controlling the outputs of diffusion models.
- “Adding Conditional Control to Text-to-Image Diffusion Models” by Lvmin Zhang, Anyi Rao, and Maneesh Agrawala (2023)
- https://arxiv.org/abs/2302.05543
- “Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models” (2023)
- https://openreview.net/forum?id=VgQw8zXrH8
- "Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models" (2024)
- https://arxiv.org/abs/2411.07126
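The core ControlNet idea: a trainable copy of the base model's blocks injects the control signal through zero-initialized layers, so before training the combined model reproduces the frozen base exactly. A toy sketch where the "blocks" are plain linear maps (purely illustrative, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

W_base = rng.standard_normal((8, 8))    # frozen base block (stand-in)
W_copy = W_base.copy()                  # trainable copy fed the control signal
W_zero = np.zeros((8, 8))               # zero-initialized "zero convolution"

def base_block(x):
    return W_base @ x

def controlled_block(x, control):
    # Base output plus the copy's output passed through the zero-initialized
    # layer; at initialization this adds exactly nothing.
    return W_base @ x + W_zero @ (W_copy @ (x + control))

x = rng.standard_normal(8)
control = rng.standard_normal(8)        # e.g. an embedded edge map
# Before any training, the controlled model matches the base model exactly.
print(np.allclose(controlled_block(x, control), base_block(x)))  # -> True
```

Training updates `W_copy` and `W_zero` only, so the pretrained base is never damaged.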
Topic 2-3: FLUX model family (Tizian)
This topic presents image generation models and techniques from the FLUX model family.
- “FLUX.1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space” (2025)
- https://arxiv.org/abs/2506.15742
- https://github.com/black-forest-labs/flux
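FLUX models are trained with flow matching rather than the DDPM noise-prediction objective. A sketch of the rectified-flow interpolation and velocity target (conventions vary between papers; this uses t = 0 for noise and t = 1 for data):

```python
import numpy as np

rng = np.random.default_rng(0)

def fm_pair(x_noise, x_data, t):
    """Rectified-flow interpolation and its velocity target:
    x_t = (1 - t) * noise + t * data,  v_target = data - noise."""
    x_t = (1.0 - t) * x_noise + t * x_data
    v_target = x_data - x_noise
    return x_t, v_target

x_noise = rng.standard_normal(4)
x_data = rng.standard_normal(4)
x_t, v = fm_pair(x_noise, x_data, 0.5)

# Training minimizes || model(x_t, t) - v_target ||^2; sampling integrates
# dx/dt = model(x, t) from t = 0 (noise) to t = 1 (data), e.g. with Euler.
x = x_noise
for t in np.linspace(0.0, 1.0, 11)[:-1]:
    x = x + 0.1 * (x_data - x_noise)    # oracle velocity, for illustration
print(np.allclose(x, x_data))           # -> True: the flow reaches the data
```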
Topic 2-4: Video generation (Chen)
This topic presents how diffusion models are used for video generation.
- “Video Diffusion Models” by Jonathan Ho et al. (2022)
- https://arxiv.org/abs/2204.03458
- “Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models” by Andreas Blattmann et al. (2023)
- https://arxiv.org/abs/2304.08818
- "Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control" by Hassan Abu Alhaija et al. (2025)
- https://arxiv.org/abs/2503.14492
Block 3: Optimization Techniques for Diffusion Inference (December 10, 2025)
Topic 3-1: Quantization (Mykyta)
This topic presents an overview of quantization techniques for inference of diffusion models.
- “Q-Diffusion: Quantizing Diffusion Models” by Xiuyu Li et al. (2023)
- https://arxiv.org/abs/2302.04304
- “Diffusion Model Quantization: A Review” (2025)
- https://arxiv.org/abs/2505.05215
- “BitsFusion: 1.99 bits Weight Quantization of Diffusion Model” by Yang Sui et al. (2024)
- https://arxiv.org/abs/2406.04333
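A minimal sketch of symmetric int8 post-training quantization, the simplest member of the family these papers build on (per-tensor scaling; real systems add per-channel scales, calibration data, and timestep-aware corrections):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric post-training quantization of a weight tensor to int8."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# int8 storage is 4x smaller than float32, and the per-weight rounding
# error is bounded by scale / 2.
print(np.abs(w - w_hat).max() <= scale / 2 + 1e-6)  # -> True
```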
Topic 3-2: Distillation (Fabio)
This topic presents the technique of distillation to accelerate inference of diffusion models.
- “Progressive Distillation for Fast Sampling of Diffusion Models” by Tim Salimans and Jonathan Ho (2022)
- https://arxiv.org/abs/2202.00512
- “Consistency Models” by Yang Song et al. (2023)
- https://arxiv.org/abs/2303.01469
- “Simplifying, Stabilizing, and Scaling Continuous-time Consistency Models” by OpenAI (2024)
- https://arxiv.org/abs/2410.11081
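The idea behind progressive distillation: a student learns to reproduce two teacher sampling steps with a single step, halving the step count per distillation round. A toy sketch on a linear probability-flow ODE where the optimal student is known in closed form (purely illustrative):

```python
import numpy as np

def teacher_step(x, dt):
    # Stand-in teacher: one Euler step of a known ODE dx/dt = -x.
    return x + dt * (-x)

def teacher_two_steps(x, dt):
    return teacher_step(teacher_step(x, dt / 2), dt / 2)

dt = 0.1
x = np.array([1.0, -2.0, 0.5])
target = teacher_two_steps(x, dt)       # what the student must reproduce

# For this linear toy ODE the optimal one-step student is a single
# multiplication; in practice the student is a finetuned copy of the
# teacher network, trained on the same matching objective.
student_mult = (1.0 - dt / 2) ** 2
print(np.allclose(student_mult * x, target))  # -> True
```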
Topic 3-3: Caching (Pablo)
This topic presents various caching techniques to accelerate the inference of diffusion models.
- "Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model" by Feng Liu et al. (2025)
- https://arxiv.org/abs/2411.19108
- "Cache Me if You Can: Accelerating Diffusion Models through Block Caching" by Felix Wimbauer et al. (2024)
- https://openaccess.thecvf.com/content/CVPR2024/papers/Wimbauer_Cache_Me_if_You_Can_Accelerating_Diffusion_Models_through_Block_CVPR_2024_paper.pdf
- “CacheQuant: Comprehensively Accelerated Diffusion Models” (2025)
- https://openaccess.thecvf.com/content/CVPR2025/papers/Liu_CacheQuant_Comprehensively_Accelerated_Diffusion_Models_CVPR_2025_paper.pdf
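The common pattern behind these caching methods can be sketched as follows: block outputs change slowly across adjacent denoising steps, so an expensive block is recomputed only every few steps and its cached output reused in between. This sketch uses a fixed refresh interval; the papers above choose when to refresh adaptively.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 16))       # stand-in for an expensive block

computed = 0

def heavy_block(x):
    global computed
    computed += 1                       # count actual recomputations
    return W @ x

def cached_block(x, step, cache, refresh_every=3):
    """Recompute the block every `refresh_every` denoising steps and
    reuse the cached output in between."""
    if step % refresh_every == 0:
        cache["out"] = heavy_block(x)
    return cache["out"]

cache = {}
x = rng.standard_normal(16)
for step in range(30):                  # 30 denoising steps
    y = cached_block(x, step, cache)
print(computed)                         # -> 10: a 3x reduction in block calls
```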
Topic 3-4: Low-level optimizations (Tizian)
This topic presents various low-level optimization techniques to accelerate diffusion model inference.
Please note: this topic is special in that few, if any, papers concern themselves with these low-level optimization techniques. The sources here are therefore blog posts and websites, intended as starting points. Some of them also touch on the techniques of the previous topics (quantization, distillation, caching); please coordinate with the presenters of those topics so that you do not overlap too much.
- https://developer.nvidia.com/blog/optimizing-transformer-based-diffusion-models-for-video-generation-with-nvidia-tensorrt/
- https://developer.nvidia.com/blog/tensorrt-accelerates-stable-diffusion-nearly-2x-faster-with-8-bit-post-training-quantization/
- https://www.felixsanz.dev/articles/ultimate-guide-to-optimizing-stable-diffusion-xl
- https://www.vrushankdes.ai/diffusion-inference-optimization
Block 4: Applications of Diffusion Techniques (December 11, 2025)
Topic 4-1: Medical imaging (Julian)
This topic presents a selection of applications of diffusion techniques in the field of medical imaging.
- "MAISI: Medical AI for Synthetic Imaging" by Pengfei Guo et al. (2024)
- https://arxiv.org/abs/2409.11169
- "Text2CT: Towards 3D CT Volume Generation from Free-text Descriptions Using Diffusion Model" by Pengfei Guo et al. (2025)
- https://arxiv.org/abs/2505.04522
- Awesome Diffusion Models in Medical Imaging (note: this is just an overview - feel free to pick some of the references, or not - no need (or chance) to cover everything!)
- https://arxiv.org/abs/2211.07804
- https://github.com/amirhossein-kz/Awesome-Diffusion-Models-in-Medical-Imaging?tab=readme-ov-file
Topic 4-2: Structure prediction (Borys)
This topic presents how diffusion techniques are used for structure prediction in biology.
- "DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking" by Gabriele Corso et al. (2022)
- https://arxiv.org/abs/2210.01776
- "Accurate structure prediction of biomolecular interactions with AlphaFold 3" by Josh Abramson et al. (2024)
- https://www.nature.com/articles/s41586-024-07487-w
- "Boltz-2: Towards Accurate and Efficient Binding Affinity Prediction" by Saro Passaro et al. (2025)
- https://www.biorxiv.org/content/10.1101/2025.06.14.659707v1
Topic 4-3: Physical AI (Chen)
This topic presents how diffusion techniques matter for physical AI.
- "GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control" by Xuanchi Ren et al. (2025)
- https://research.nvidia.com/labs/toronto-ai/GEN3C/paper.pdf
- "Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control" by Hassan Abu Alhaija et al. (2025)
- https://arxiv.org/abs/2503.14492
- "Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models" by Jay Zhangjie Wu et al. (2025)
- https://arxiv.org/abs/2503.01774
Topic 4-4: Diffusion-based large language models (Eslam)
This topic presents how diffusion techniques can also work for large language models.
- “Large Language Diffusion Models (LLaDA)” by Shen Nie et al. (2025)
- https://arxiv.org/abs/2502.09992
- “Mercury: Ultra-Fast Language Models Based on Diffusion” by Inception Labs (2025)
- https://arxiv.org/abs/2506.17298