Clustering for DNA Storage
dc.contributor.author | Luo, Youjun | |
dc.contributor.department | Chalmers tekniska högskola / Institutionen för elektroteknik | sv |
dc.contributor.examiner | Graell i Amat, Alexandre | |
dc.date.accessioned | 2023-06-15T18:55:38Z | |
dc.date.available | 2023-06-15T18:55:38Z | |
dc.date.issued | 2023 | |
dc.date.submitted | 2023 | |
dc.description.abstract | Abstract Deoxyribonucleic acid (DNA) has emerged as a potential storage medium due to its high information density and durability. Playing a critical role in the DNA storage process, the clustering partitions similar sequenced reads into groups for the decoder. However, the synthesis, storing, and sequencing of DNA introduce insertion, deletion and substitution (IDS) errors, making the clustering of reads harder. And because of the large numbers of reads in DNA storage, traditional clustering methods in biological domains become time-consuming. Recently, a trie-based algorithm called Clover is proposed to accelerate the clustering process in DNA storage by fuzzy searching the input reads on a trie structure. However, it only considers substitutions during the search, while deletions and insertions are addressed through multiple tests on different regions of input reads afterwards. In this thesis, we proposed efficient clustering algorithms that optimize the trie searching by considering the IDS channel. In our algorithm, discrete IDS errors are corrected with a depthlimited strategy. And a cluster merging method is developed to improve the success rate of searching. We validate the proposed methods on three real-world DNA storage datasets, achieving the lowest runtime and comparable accuracy compared to state-of-the-art DNA clustering tools. | |
dc.identifier.coursecode | EENX60 | |
dc.identifier.uri | http://hdl.handle.net/20.500.12380/306253 | |
dc.language.iso | eng | |
dc.setspec.uppsok | Technology | |
dc.subject | DNA storage, Indexing, Clustering, Trie, Levenshtein distance, Poucet search, Depth-limited search, Cluster merging. | |
dc.subject | DNA storage | |
dc.subject | Indexing | |
dc.subject | Clustering | |
dc.subject | Trie | |
dc.subject | Levenshtein distance | |
dc.subject | Poucet search | |
dc.subject | Depth-limited search | |
dc.subject | Cluster merging | |
dc.title | Clustering for DNA Storage | |
dc.type.degree | Examensarbete för masterexamen | sv |
dc.type.degree | Master's Thesis | en |
dc.type.uppsok | H | |
local.programme | Communication Engineering (MPCOM), MSc |