Clustering for DNA Storage

dc.contributor.author: Luo, Youjun
dc.contributor.department: Chalmers tekniska högskola / Institutionen för elektroteknik [sv]
dc.contributor.examiner: Graell i Amat, Alexandre
dc.date.accessioned: 2023-06-15T18:55:38Z
dc.date.available: 2023-06-15T18:55:38Z
dc.date.issued: 2023
dc.date.submitted: 2023
dc.description.abstract: Deoxyribonucleic acid (DNA) has emerged as a potential storage medium due to its high information density and durability. Clustering plays a critical role in the DNA storage process: it partitions similar sequenced reads into groups for the decoder. However, the synthesis, storage, and sequencing of DNA introduce insertion, deletion, and substitution (IDS) errors, which make clustering the reads harder. Moreover, because of the large number of reads in DNA storage, traditional clustering methods from the biological domain become time-consuming. Recently, a trie-based algorithm called Clover was proposed to accelerate clustering in DNA storage by fuzzy searching the input reads on a trie structure. However, it considers only substitutions during the search; deletions and insertions are addressed afterwards through multiple tests on different regions of the input reads. In this thesis, we propose efficient clustering algorithms that optimize the trie search by taking the IDS channel into account. In our algorithm, discrete IDS errors are corrected with a depth-limited strategy, and a cluster merging method is developed to improve the search success rate. We validate the proposed methods on three real-world DNA storage datasets, achieving the lowest runtime and comparable accuracy relative to state-of-the-art DNA clustering tools.
dc.identifier.coursecode: EENX60
dc.identifier.uri: http://hdl.handle.net/20.500.12380/306253
dc.language.iso: eng
dc.setspec.uppsok: Technology
dc.subject: DNA storage, Indexing, Clustering, Trie, Levenshtein distance, Poucet search, Depth-limited search, Cluster merging.
dc.title: Clustering for DNA Storage
dc.type.degree: Examensarbete för masterexamen [sv]
dc.type.degree: Master's Thesis [en]
dc.type.uppsok: H
local.programme: Communication Engineering (MPCOM), MSc
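
The abstract describes clustering reads by fuzzy searching them on a trie while tolerating insertion, deletion, and substitution (IDS) errors. The following is a minimal sketch of that general idea only; it is not Clover and not the algorithm developed in the thesis. All names (`TrieNode`, `Trie`, `fuzzy_search`, `cluster_reads`, `max_edits`) are hypothetical, and bounding the search by a fixed edit budget is an assumed simplification that may differ from the thesis's depth-limited strategy. The search returns the first stored sequence found within the budget, not necessarily the closest one, and the sketch does no cluster merging.

```python
from typing import Dict, List, Optional


class TrieNode:
    """One trie node; cluster_id is set only where a stored sequence ends."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.cluster_id: Optional[int] = None


class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, seq: str, cluster_id: int) -> None:
        """Store seq as the representative of cluster_id."""
        node = self.root
        for base in seq:
            node = node.children.setdefault(base, TrieNode())
        node.cluster_id = cluster_id

    def fuzzy_search(self, read: str, max_edits: int) -> Optional[int]:
        """Return the cluster id of some stored sequence within max_edits
        insertions, deletions, or substitutions of read, or None."""
        return self._search(self.root, read, 0, max_edits)

    def _search(self, node: TrieNode, read: str, i: int, budget: int) -> Optional[int]:
        # Whole read consumed at the end of a stored sequence: match found.
        if i == len(read) and node.cluster_id is not None:
            return node.cluster_id
        # Try the error-free branch first; it costs no budget.
        if i < len(read):
            child = node.children.get(read[i])
            if child is not None:
                found = self._search(child, read, i + 1, budget)
                if found is not None:
                    return found
        if budget == 0:
            return None
        for base, child in node.children.items():
            # Substitution: read[i] stands in for a different stored base.
            if i < len(read) and base != read[i]:
                found = self._search(child, read, i + 1, budget - 1)
                if found is not None:
                    return found
            # Deletion: the read lost a base that the stored sequence has.
            found = self._search(child, read, i, budget - 1)
            if found is not None:
                return found
        # Insertion: the read has an extra base; skip it.
        if i < len(read):
            return self._search(node, read, i + 1, budget - 1)
        return None


def cluster_reads(reads: List[str], max_edits: int = 1) -> List[List[str]]:
    """Greedy clustering: a read joins the first cluster whose representative
    is found within max_edits; otherwise it starts a new cluster."""
    trie = Trie()
    clusters: List[List[str]] = []
    for read in reads:
        cid = trie.fuzzy_search(read, max_edits)
        if cid is None:
            cid = len(clusters)
            clusters.append([])
            trie.insert(read, cid)
        clusters[cid].append(read)
    return clusters


if __name__ == "__main__":
    example = ["ACGTACGT", "ACGTACGA", "ACGACGT", "ACGTTACGT", "TTTTGGGG", "TTTGGGG"]
    for members in cluster_reads(example, max_edits=1):
        print(members)
```

With `max_edits=1`, the example groups the first four reads (one substitution, one deletion, and one insertion relative to `ACGTACGT`) into one cluster and the last two into another. The recursion is exponential in the edit budget, so a sketch like this is only practical for small budgets.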

Download

Original bundle

Name: Youjun_MSc_Thesis__DNA_clustering_final_report.pdf
Size: 1.79 MB
Format: Adobe Portable Document Format
