Project Overview

Project Code: CIT 19

Project name:

Vision-Language Modeling in Medical Imaging

TUM Department:

CIT - Informatics

TUM Chair / Institute:

TUINI32 Informatics 32 - Chair of Computational Imaging and AI in Medicine (Prof. Schnabel)

Research area:

Machine Learning; Medical Imaging

Student background:

Computer Science

Further disciplines:

Participation also possible online only:

Planned project location:

Garching, IAS

Project Supervisor - Contact Details


Title:

Given name:

Cosmin-Ionut

Family name:

Bercea

E-mail:

cosmin.bercea@tum.de

Phone:

089 289 10686

Additional Project Supervisor - Contact Details


Title:

Given name:

Family name:

E-mail:

Phone:


Project Description


Project description:

Project Title: Multi-Modal Learning for Enhanced Pre-Training of Medical Imaging Models Using Text and Image Fusion

Project Description:

The proposed research project focuses on developing and evaluating a novel multi-modal learning framework that integrates textual and visual data to pre-train deep learning models for medical imaging tasks. By leveraging the complementary nature of image and text features, the project aims to create more robust models that improve diagnostic accuracy and reduce the need for extensive labeled data.

Background:

Medical imaging, including modalities such as X-rays, MRIs, and CT scans, plays a critical role in modern healthcare, aiding in the diagnosis and treatment of various conditions. However, training highly effective models for medical image analysis typically requires large amounts of labeled data, which is often scarce or expensive to obtain. In contrast, a significant amount of unstructured textual data, such as radiology reports, is readily available. This project seeks to harness the potential of these textual descriptions to complement visual information, thereby enhancing the pre-training process of medical imaging models.

Objectives:

1. Create a framework that effectively integrates image and text features for the pre-training of deep learning models in the medical imaging domain.
2. Investigate various machine learning techniques for fusing image and text data to enhance model performance and generalization capabilities.
3. Assess the impact of multi-modal pre-training on different medical imaging tasks, focusing on improving accuracy, robustness, and data efficiency.
4. Experiment with different pre-training and fine-tuning strategies to maximize the benefits of the multi-modal approach across diverse imaging tasks.
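One widely used fusion technique relevant to objective 2 is contrastive image-text alignment (as popularized by CLIP-style pre-training), in which paired images and report texts are pulled together in a shared embedding space while mismatched pairs are pushed apart. The sketch below is illustrative only, assuming pre-computed embeddings; the function name `info_nce_loss` and all shapes are hypothetical, not part of the project specification.

```python
import numpy as np

def info_nce_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    Row i of each matrix is treated as a matching (positive) pair; every
    other pairing in the batch serves as a negative.
    """
    # L2-normalize so dot products become cosine similarities
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # (batch, batch) similarity matrix, scaled by temperature
    logits = image_emb @ text_emb.T / temperature

    def cross_entropy(lg):
        # the correct "class" for row i is column i (the paired sample)
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # average the image-to-text and text-to-image directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

# toy example: 4 paired embeddings of dimension 8
rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8))
txt = img + 0.01 * rng.normal(size=(4, 8))   # nearly aligned pairs
loss_aligned = info_nce_loss(img, txt)       # small: pairs match
loss_random = info_nce_loss(img, rng.normal(size=(4, 8)))  # larger
```

In a full pipeline, `image_emb` and `text_emb` would come from trainable encoders (e.g. a CNN or ViT for images, a transformer for radiology reports), and this loss would drive the pre-training before task-specific fine-tuning.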

Working hours per week planned:

35

Prerequisites


Required study level minimum (at time of TUM PREP project start):

2 years of bachelor studies completed

Subject related:

machine learning; natural language processing; computer vision

Other:

  • No keywords