
1.1. Time and Place

The July defense session will take place on Friday, July 26, 2024, at 10:00 AM. You can join the Zoom meeting room using the information below.

Join Zoom Meeting

https://tum-conf.zoom-x.de/j/61786165009?pwd=Zkg0R2VvbWJHd28vbmJCa2RHdkZEQT09

Passcode: i6defense

1.2. Schedule

10:00 - 10:20 Karl Kompatscher (BA Thesis)

Title: Internal Real-time Communication in the context of Vehicle Systems

Advisor: Sven Kirchner

Keywords: Real-Time Communication, Automotive

Abstract: In modern vehicles, internal real-time communication systems are crucial for seamless interaction among numerous interconnected components such as sensors, Electronic Control Units (ECUs), and Internet of Things (IoT) devices. These systems manage heavy data loads from Advanced Driver Assistance Systems (ADAS) and LiDAR (Light Detection and Ranging) sensors, requiring high throughput and low latency for reliable, safe operation under the ISO 26262 standard. This thesis explores the Data Distribution Service (DDS), a centralized communication protocol that meets these stringent requirements and is designed to be future-proof, compatible with existing standards, and flexible enough for the dynamic automotive market.

10:20 - 10:45 Qiwen Xu (MA Thesis)

Title: CLIP-Enhanced 3D Object Detection

Advisor: Xingcheng Zhou

Keywords: Autonomous Driving, Contrastive Learning, 2D Object Detection, 3D Object Detection, Pretrained Vision Language Model

Abstract: Object detection is a cornerstone of computer vision, but current approaches face significant challenges. Relying primarily on visual features, these methods often struggle to differentiate between visually similar objects or those partially occluded, highlighting a critical lack of contextual understanding. Addressing these limitations by manually adding semantic and contextual information is prohibitively labor-intensive, creating a pressing need for more efficient solutions. Therefore, we propose a CLIP-Enhanced method for object detection that leverages the rich semantic features and zero-shot capabilities of CLIP to acquire pseudo-segmentation information for any class at a low cost. Our approach seamlessly combines CLIP's semantic understanding with prototypical contrastive learning techniques, enhancing object differentiation and detection accuracy, particularly for confusing or partially obscured objects. By generating prototypes for object categories using 2D pseudo-semantic features from CLIP, our method incorporates vital contextual information without requiring extensive manual annotation. Crucially, our solution is designed as a plug-and-play enhancement, allowing for easy integration with existing 2D and 3D detection frameworks. Extensive experiments across multiple datasets and base detection frameworks demonstrate the effectiveness and versatility of our approach. We achieve significant improvements in both 2D and 3D detection tasks, including a 1.2 mAP increase on the Cityscapes dataset and a 1.4 mAP boost on the KITTI dataset. These results underscore the generalizability of our method across diverse urban scenarios and detection frameworks. Our CLIP-Enhanced method bridges the gap between visual and semantic features, advancing the state-of-the-art in object detection. By addressing key limitations of current approaches, we open new avenues for more accurate, context-aware, and efficient computer vision systems.
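For readers unfamiliar with prototypical contrastive learning, the following is a minimal, generic sketch of how class prototypes could be built from per-object pseudo-semantic features (e.g., pooled CLIP embeddings) and used in an InfoNCE-style loss. It is illustrative only and not the thesis implementation; the function names and the temperature value are assumptions.

```python
import torch
import torch.nn.functional as F

def build_prototypes(features: torch.Tensor, labels: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Average L2-normalized per-object features into one prototype per class.

    features: (N, D) pseudo-semantic features (e.g., pooled CLIP embeddings per box/mask)
    labels:   (N,)   integer class ids
    """
    features = F.normalize(features, dim=-1)
    prototypes = torch.zeros(num_classes, features.size(-1), device=features.device)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            prototypes[c] = features[mask].mean(dim=0)
    return F.normalize(prototypes, dim=-1)

def proto_contrastive_loss(det_feats: torch.Tensor, labels: torch.Tensor,
                           prototypes: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE-style loss pulling detector features toward their class prototype."""
    det_feats = F.normalize(det_feats, dim=-1)
    logits = det_feats @ prototypes.t() / temperature   # (N, num_classes) similarity scores
    return F.cross_entropy(logits, labels)
```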

10:45 - 11:10 Deyu Fu (MA Thesis)

Title: Domain Adaptation for Road-Side Vision-Based 3D Object Detection

Advisor: Xingcheng Zhou

Keywords: Roadside Perception, Domain Adaptation, Weak Supervision, 2D Object Detection, 3D Object Detection, Autonomous Driving, V2I 

Abstract: Existing roadside perception systems are significantly limited by the lack of large-scale, publicly available, high-quality 3D datasets, which are crucial for developing robust and accurate 3D object detection models. To address this challenge, we explore the use of cost-effective, large-scale synthetic datasets and weak supervision to enhance the performance of roadside monocular 3D detection. In this study, we introduce the TUMTraf Synthetic Dataset, a novel dataset offering a diverse and extensive collection of high-quality 3D annotations. This dataset serves as an essential supplement to the scarce real-world datasets currently available, providing a more comprehensive training resource. Additionally, we present WARM-3D, a concise yet highly effective framework designed to facilitate Sim2Real domain adaptation for roadside monocular 3D detection. WARM-3D leverages inexpensive synthetic datasets and 2D labels derived from an off-the-shelf 2D detector to provide weak supervision. Our method demonstrates significant performance enhancements on the TUMTraf Intersection Dataset. Specifically, WARM-3D achieves a 12.40% increase in mean Average Precision for monocular 3D object detection (mAP3D) over the baseline, attaining 39.17 mAP3D with only pseudo-2D supervision. Moreover, when employing 2D ground truth as weak labels, WARM-3D's 3D performance approaches the Oracle baseline, illustrating its effectiveness. Furthermore, WARM-3D significantly improves the ability of 3D detectors to recognize unseen samples across various real-world environments, highlighting its potential for practical applications. This robustness to out-of-distribution scenarios underscores the framework's utility in real-world deployments.
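For readers unfamiliar with weak 2D supervision of 3D detectors, the sketch below shows one generic way a predicted 3D box can be projected into the image with the camera intrinsics and scored against a 2D (pseudo-)label. This is not WARM-3D's code; all function names and the IoU-based penalty are illustrative assumptions.

```python
import numpy as np

def project_box_corners(corners_3d: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Project 3D box corners (8, 3) in camera coordinates to pixel coordinates (8, 2)."""
    pts = (K @ corners_3d.T).T              # (8, 3) in homogeneous image coordinates
    return pts[:, :2] / pts[:, 2:3]

def to_2d_box(corners_2d: np.ndarray) -> np.ndarray:
    """Axis-aligned 2D box [x1, y1, x2, y2] enclosing the projected corners."""
    return np.array([corners_2d[:, 0].min(), corners_2d[:, 1].min(),
                     corners_2d[:, 0].max(), corners_2d[:, 1].max()])

def iou_2d(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

# A weak-supervision signal could then penalize low overlap between the projected
# 3D prediction and the matched 2D label, e.g. loss = 1 - iou_2d(pred_box, label_box).
```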

11:10 - 11:35 Baran Ekin Özdemir (MA Thesis)

Title: Knowledge Distilled Traffic Environment Understanding

Advisor: Xingcheng Zhou

Keywords: Autonomous Driving, Vision Language Models, BEV Perception, Natural Language Processing, Traffic Environment Understanding

Abstract: Vision-Language Models (VLMs) have recently demonstrated impressive capabilities in challenging tasks that require comprehensive understanding and reasoning across both modalities, such as image captioning, text-guided image generation, and visual question-answering, leading to the widespread application of VLMs within computer vision. This work explores the utilization of VLMs for traffic environment understanding, where integrating linguistic and visual features can enhance generalization to unseen scenarios and improve the interpretability of autonomous driving systems. To achieve these goals, an efficient, lightweight VLM employing spatio-temporal Bird's Eye View (BEV) maps as visual features is developed, and various methods to exploit refined vision knowledge are studied. The proposed model undertakes a comprehensive Visual Question Answering task tailored for autonomous driving, achieving promising results on the DriveLM dataset with 54.9% accuracy on multiple-choice questions, 57.1% in GPT evaluation, 34.5% in matching, 61.7% in BLEU, and 68.9% in ROUGE-L. This study demonstrates the ability to answer questions by effectively utilizing vision and language features within a lightweight framework and showcases the potential of VLMs for traffic environment understanding.

11:35 - 12:00 Rabia Varol (MA Thesis)

Title: Leveraging Large Language Models in Artificial Intelligence Planning for Robotics

Advisor: Fengjunjie Pan

Keywords: LLMs, AI Planning, Planning Domain Definition Language (PDDL)

Abstract: Recent advances in large language models (LLMs) have demonstrated outstanding performance in natural language processing (NLP) tasks, such as machine translation and code generation. This has sparked excitement about their potential applications across various fields, including artificial intelligence (AI) planning for robotics. Despite their success, recent studies have revealed that LLMs alone cannot reliably generate executable plans from high-level natural language descriptions for robotic tasks. On the other hand, classical AI planners can effectively and reliably identify feasible plans when provided with appropriately formatted problems. This thesis proposes a hybrid approach that leverages the strengths of both LLMs and classical AI planners, utilizing LLMs as an intuitive language-based interface between non-expert users and robots and utilizing classical AI planners to find feasible plans in industrial automation scenarios. Our proposed pipeline involves taking a natural language description of a planning problem from a user, translating this description into Planning Domain Definition Language (PDDL) employing an LLM, using an AI planner to generate a plan, translating the PDDL plan back into natural language using the LLM to inform the user of the action steps, and finally executing the plan on a robot. We define three domains with varying complexities and use three prompting strategies to evaluate the capabilities of LLMs in generating PDDL files. Our experiments across diverse problem scenarios show that the proposed method consistently provides executable plans that fulfill user-specified goals. This demonstrates that our method enables non-expert users to successfully command robots using natural language, thereby significantly advancing the accessibility and practicality of automation tasks.
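As a rough illustration of such an LLM-plus-classical-planner pipeline (not the thesis implementation), the sketch below wires an LLM call to a PDDL problem generator and an external planner. The prompt wording, helper names, and the choice of Fast Downward as the planner are assumptions; any callable LLM interface and PDDL planner could be substituted.

```python
import subprocess
from pathlib import Path

def nl_to_pddl_problem(description: str, llm_complete) -> str:
    """Ask an LLM to translate a natural-language task into a PDDL problem file.
    `llm_complete` is any callable prompt -> text (hosted API, local model, ...)."""
    prompt = (
        "Translate the following task into a PDDL problem definition that matches "
        "the domain file 'domain.pddl'. Output only valid PDDL.\n\n" + description
    )
    return llm_complete(prompt)

def solve_with_planner(domain_file: str, problem_pddl: str) -> str:
    """Write the problem to disk and call an external classical planner.
    Fast Downward is used here as an example; any PDDL planner works."""
    Path("problem.pddl").write_text(problem_pddl)
    subprocess.run(
        ["fast-downward.py", domain_file, "problem.pddl", "--search", "astar(lmcut())"],
        check=True,
    )
    return Path("sas_plan").read_text()   # Fast Downward writes the found plan here

def plan_to_nl(plan: str, llm_complete) -> str:
    """Ask the LLM to describe the plan's action steps for a non-expert user."""
    return llm_complete("Explain these plan steps in plain English:\n" + plan)
```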

12:00 - 12:25 Chaima Ghaddab (MA Thesis)

Title: Supervised 3D Perception on Roadside LiDARs Under Different Weather Situations

Advisor: Walter Zimmer  

Keywords: Autonomous Driving, Deep Learning, LiDAR 3D Perception, 3D Object Detection, Adverse Weather, Data Augmentation, Labeling, Datasets, PointPillars, TensorRT, TUM Traffic Dataset, AUTOtech.agil

Abstract: LiDAR technology is necessary for applications such as autonomous driving. However, adverse weather conditions like rain, snow, and fog significantly impact the quality and reliability of state-of-the-art LiDAR 3D detectors. This thesis investigates the enhancement of LiDAR-based 3D object detection under such conditions by employing data augmentation and point cloud concatenation techniques to improve the performance of the baseline PointPillars model used in the AUTOtech.agil project. Realistic rain and snow effects are introduced into the point clouds using the LISA library, enhancing the model's resilience to weather-induced data distortions. Combined point cloud concatenation and point cloud filtering techniques are further used to improve LiDAR performance by increasing the density of data points. These methods fill gaps in the data, ensuring detailed environmental models even in adverse weather conditions. The experimental results demonstrate significant improvements in the PointPillars model's accuracy and robustness. While the point cloud concatenation method only increased the mean AP of the model by ~1%, data augmentation resulted in an average precision (AP) increase from 63.01% to 66.49%. We also show that the augmentation improves the model's resilience to unseen adverse weather, such as fog. By enhancing LiDAR's resilience to environmental factors, this research contributes to safer and more reliable autonomous systems capable of operating in various conditions. The code for our implementation will be open-sourced.
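The point cloud concatenation mentioned above can be illustrated generically (this is not the thesis code): consecutive LiDAR sweeps are transformed into a common coordinate frame and stacked to densify the input. The function names and the use of per-frame poses are assumptions; for a static roadside sensor the transform may be the identity.

```python
import numpy as np

def transform_points(points: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Apply a 4x4 homogeneous transform to (N, 3) points."""
    homo = np.hstack([points, np.ones((points.shape[0], 1))])
    return (homo @ T.T)[:, :3]

def concatenate_frames(frames, poses):
    """Merge several LiDAR frames into the coordinate frame of the last one.

    frames: list of (N_i, 4) arrays [x, y, z, intensity]
    poses:  list of 4x4 sensor-to-world transforms, one per frame
    """
    target_inv = np.linalg.inv(poses[-1])
    merged = []
    for pts, pose in zip(frames, poses):
        xyz = transform_points(pts[:, :3], target_inv @ pose)  # move points into the last frame
        merged.append(np.hstack([xyz, pts[:, 3:4]]))
    return np.vstack(merged)
```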

The I6 defense day is held monthly, usually on the last Friday of the month. The standard talk formats are:

Type                        | Time of presentation | Time for questions & answers
Initial topic presentation  | 5 min                | 5 min
BA thesis                   | 15 min               | 5 min
Guided Research             | 10 min               | 5 min
Interdisciplinary Project   | 15 min               | 5 min
MA thesis                   | 20 min               | 5 min

More information on preparing presentations can be found in the Thesis Submission Guidelines.
