Author:

Paula Krumm
Supervisor: Prof. Gudrun Klinker
Advisor: Christian Eichhorn (christian.eichhorn@uni-a.de)
Submission Date: [created]

Abstract

In computer vision, accurate segmentation of objects in images, especially from an egocentric perspective, poses significant challenges. These arise from dynamic environments and the frequent presence of objects that are small, thin, or fast-moving. This thesis fine-tunes the Segment Anything Model (SAM) to improve its performance and address these challenges by exploring three approaches: full training of the mask decoder, selective fine-tuning of adapter layers within the image encoder, and a hybrid approach that integrates both strategies. A diverse collection of egocentric video data supports a robust and targeted training framework. Across several experiments, fine-tuning only the adapter layers proves the most effective way to improve SAM's segmentation accuracy without sacrificing its adaptability. This fine-tuning approach can further improve various egocentric segmentation tasks, potentially benefiting applications in augmented reality and assistive technologies.
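The adapter-based strategy mentioned above follows the common bottleneck-adapter pattern: small trainable layers are inserted into the frozen encoder blocks, so only a few parameters are updated during fine-tuning. The sketch below is illustrative only (the class, dimensions, and initialization are assumptions, not the thesis's actual implementation); in practice SAM's ViT encoder blocks would wrap this pattern in PyTorch.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

class Adapter:
    """Bottleneck adapter (hypothetical sketch): down-project, nonlinearity,
    up-project, residual connection. Only these weights would be trained;
    the surrounding encoder block stays frozen."""

    def __init__(self, dim, bottleneck, rng):
        self.W_down = rng.standard_normal((dim, bottleneck)) * 0.02
        # Zero-init of the up-projection makes the adapter start as an
        # identity mapping, so fine-tuning begins from the pretrained behavior.
        self.W_up = np.zeros((bottleneck, dim))

    def __call__(self, x):
        return x + gelu(x @ self.W_down) @ self.W_up

# Usage: token features of shape (num_tokens, dim) pass through unchanged at init.
rng = np.random.default_rng(0)
adapter = Adapter(dim=8, bottleneck=2, rng=rng)
x = rng.standard_normal((4, 8))
y = adapter(x)
```

Because the up-projection is zero-initialized, the adapter is an identity function before training, which keeps the pretrained SAM features intact at the start of fine-tuning.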

Results/Implementation/Project Description

Conclusion

[ PDF (optional) ] 

[ Slides Kickoff/Final (optional)]