Project Overview

Project Code: CIT 23

Project name:

Clustering for DNA-Based Data Storage

TUM Department:

CIT - Electrical and Computer Engineering

TUM Chair / Institute:

Coding and Cryptography

Research area:

Coding Theory

Student background:

Computer Engineering, Computer Science / Informatics, Electrical Engineering, Mathematics

Further disciplines:

Participation also possible online only:

Planned project location:

Building N4 - Central City Campus
Theresienstrasse 90
80333 München

Project Supervisor - Contact Details


Title:

Given name:

Frederik

Family name:

Walter

E-mail:

Frederik.Walter@tum.de

Phone:

+49 (89) 289-23492

Additional Project Supervisor - Contact Details


Title:

Dr.

Given name:

Jessica

Family name:

Bariffi

E-mail:

jessica.bariffi@tum.de

Phone:

Additional Project Supervisor - Contact Details


Title:

Prof. Dr.

Given name:

Antonia

Family name:

Wachter-Zeh

E-mail:

antonia.wachter-zeh@tum.de

Phone:

Project Description


Project description:

Global data generation has grown exponentially in recent decades, but the capacity of contemporary storage media has not kept pace with the need to safely retain that data. Synthetic DNA has emerged as a potential solution to this challenge thanks to its extraordinary long-term stability and data density, with the potential to store at least two petabytes of data per gram. However, encoding, storing, and retrieving data in DNA poses complex challenges, particularly in managing the errors that can arise during these processes.
This project focuses on the critical steps of sampling and clustering in DNA-based data storage, which are essential for improving the efficiency and reliability of data retrieval. In the context of DNA storage, sampling refers to the selection and amplification of DNA sequences from a larger pool, often using methods such as polymerase chain reaction (PCR), to ensure that the correct sequences are available in sufficient quantities for decoding. Clustering, on the other hand, involves grouping similar DNA sequences to facilitate error correction and data reconstruction.

Errors such as mutations, duplications, or deletions of nucleotides can occur during DNA synthesis and sequencing, making accurate data retrieval a significant challenge. Effective sampling and clustering are crucial for mitigating these errors. Carefully selecting and amplifying the most representative sequences and organizing them into clusters makes errors easier to handle during decoding and improves the overall reliability of DNA as a storage medium.
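To make the clustering step concrete, the following is a minimal Python sketch, not the project's actual method. It assumes that noisy reads are compared using edit (Levenshtein) distance and grouped greedily around the first read seen in each cluster. The function names, the toy reads, and the threshold of 2 are all illustrative assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of substitutions,
    insertions, and deletions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def greedy_cluster(reads, threshold):
    """Hypothetical greedy scheme: put each read into the first cluster
    whose representative (its first read) is within the threshold."""
    clusters = []  # list of (representative, members)
    for read in reads:
        for representative, members in clusters:
            if edit_distance(read, representative) <= threshold:
                members.append(read)
                break
        else:
            # No existing cluster is close enough: start a new one.
            clusters.append((read, [read]))
    return [members for _, members in clusters]


# Toy reads (made up for illustration): two stored strands plus noisy copies.
reads = [
    "ACGTACGTAC",   # strand A
    "ACGTTCGTAC",   # A with one substituted nucleotide
    "ACGACGTAC",    # A with one deleted nucleotide
    "TTGGCCAATT",   # strand B
    "TTGGCCAAATT",  # B with one duplicated nucleotide
]
clusters = greedy_cluster(reads, threshold=2)
for members in clusters:
    print(members)
```

With these toy reads, the noisy copies are grouped with their originals. In a real pipeline the quadratic cost of pairwise edit-distance comparisons becomes a bottleneck, which is one reason scalable clustering techniques are a research question in their own right.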

This project aims to develop and optimize techniques for sampling and clustering DNA sequences in the context of DNA-based data storage. By improving these processes, it seeks to enhance the accuracy and efficiency of data retrieval, bringing us closer to realizing the full potential of DNA as a next-generation storage solution.

[1] Yazdi et al. “DNA-Based Storage: Trends and Methods”, 2015.
[2] https://youtu.be/r8qWc9X4f6k?si=_cxa5Dpb_WJRca0l

Working hours per week planned:

40

Prerequisites


Required study level minimum (at time of TUM PREP project start):

2 years of bachelor studies completed

Subject related:

Linear Algebra, Probability Theory

Other:

  • No keywords