Parallel construction of variable length Markov models for DNA sequences
Type
Master's thesis
Abstract
Modern CPUs with multiple cores allow algorithms to be executed in parallel, but
existing implementations do not always take advantage of this. This project
investigates one such case, namely the construction of variable length Markov
models (VLMCs).
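The abstract does not spell out how a VLMC is built. As a point of reference, here is a minimal sketch assuming the common context-tree approach, in the spirit of the context algorithm: for every context of up to a maximum depth, count which nucleotide follows it, then keep only contexts whose next-symbol distribution differs enough from that of their shorter parent context (a count-weighted KL-divergence gain compared against a threshold). The function names, `max_depth` and `threshold` are hypothetical, and this is not the base implementation by J. Gustafsson.

```python
import math
from collections import Counter, defaultdict

# Hypothetical sketch of VLMC (context tree) construction; not the thesis code.

def count_contexts(seq, max_depth):
    """Map each context (up to max_depth symbols immediately preceding a
    position, oldest first) to a Counter of the symbol that follows it."""
    counts = defaultdict(Counter)
    for i, nxt in enumerate(seq):
        for d in range(min(max_depth, i) + 1):
            counts[seq[i - d:i]][nxt] += 1  # d == 0 gives the empty context ""
    return counts

def kl_gain(child, parent):
    """Count-weighted KL divergence of the child's next-symbol distribution
    from its parent's. parent[a] > 0 whenever child[a] > 0, because every
    occurrence of a context is also an occurrence of its shorter suffix."""
    n_child = sum(child.values())
    n_parent = sum(parent.values())
    return sum(c * math.log((c / n_child) / (parent[a] / n_parent))
               for a, c in child.items())

def build_vlmc(seq, max_depth, threshold):
    """Keep contexts whose gain over their parent exceeds the (hypothetical)
    threshold, plus all their suffixes so the context set stays suffix-closed."""
    counts = count_contexts(seq, max_depth)
    kept = {""}
    for ctx in list(counts):
        # The parent of a context drops its oldest (leftmost) symbol.
        if ctx and kl_gain(counts[ctx], counts[ctx[1:]]) > threshold:
            kept.update(ctx[k:] for k in range(len(ctx)))
    return {ctx: dict(counts[ctx]) for ctx in kept}

model = build_vlmc("ACGT" * 50, max_depth=2, threshold=1.0)
print(sorted(model))  # ['', 'A', 'C', 'G', 'T']
```

For the periodic input `"ACGT" * 50`, each single-nucleotide context already determines the next symbol, so the length-2 contexts carry no extra information and are pruned.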
This work builds upon unpublished work by J. Gustafsson: a base implementation
for constructing VLMCs from DNA sequences. In addition to implementing a parallel
variant, the project has focused on constructing models for large genomes,
something not previously attempted within the base project. The report presents
two practical parallel variants of this base implementation and, early on, selects
the more promising one for further analysis. For the selected approach, multiple
tests are performed to measure runtime, speedup and memory consumption. The load
distribution is also analysed, revealing an opportunity for future improvement.
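Counting context occurrences over the input string is typically the dominant cost of such a construction. One generic way to spread that work over several cores, sketched below under this assumption, is to split the string into chunks that overlap by `k - 1` characters, count each chunk independently and merge the partial counts. This only illustrates data-parallel counting; it is not necessarily either of the two variants presented in the report, and `count_chunk` and `parallel_kmer_counts` are hypothetical names.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_chunk(chunk, k):
    """Serially count every length-k substring (k-mer) of one chunk."""
    counts = Counter()
    for i in range(len(chunk) - k + 1):
        counts[chunk[i:i + k]] += 1
    return counts

def parallel_kmer_counts(seq, k, workers=4):
    """Hypothetical data-parallel k-mer count: split the k-mer start positions
    into one range per worker, count each range, then merge the Counters."""
    n_starts = len(seq) - k + 1  # number of k-mer start positions
    if n_starts <= 0:
        return Counter()
    step = -(-n_starts // workers)  # ceiling division
    # A chunk owning starts s..e-1 needs characters s..e+k-2, i.e. it overlaps
    # the next chunk by k-1 characters so no k-mer is lost at a boundary.
    chunks = [seq[s:min(s + step, n_starts) + k - 1]
              for s in range(0, n_starts, step)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for part in pool.map(count_chunk, chunks, [k] * len(chunks)):
            total.update(part)  # merge step: sum the per-chunk counts
    return total
```

Threads keep the example short, but in CPython the global interpreter lock prevents them from speeding up this pure-Python loop; a process pool or a compiled language would be needed for real speedup, with the same decomposition. Each worker also holds its own count table until the merge, which is one reason such schemes can raise peak memory use.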
The highest speedup, approximately a factor of 7 on 32 cores compared to serial
execution, was obtained with an input string of 22 GB. The memory footprint of the
implementation is high, but this is expected given its adaptation to large input
sizes.
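To put the reported figure in perspective, the parallel efficiency is E = S / p, and the Karp-Flatt metric e = (1/S - 1/p) / (1 - 1/p) estimates the experimentally determined serial fraction. The helper below is a hypothetical illustration applied to the numbers from the abstract (S of approximately 7, p = 32); since S is approximate, so are the results.

```python
def speedup_metrics(speedup, cores):
    """Return (parallel efficiency, Karp-Flatt serial fraction) for a measured
    speedup on a given number of cores. Hypothetical helper for illustration."""
    efficiency = speedup / cores
    serial_fraction = (1 / speedup - 1 / cores) / (1 - 1 / cores)
    return efficiency, serial_fraction

eff, frac = speedup_metrics(7, 32)
print(f"efficiency = {eff:.2f}, Karp-Flatt serial fraction = {frac:.3f}")
```

With these numbers the efficiency is 7/32, roughly 0.22, and the estimated serial fraction is 25/217, roughly 0.115. The metric lumps together genuinely serial work, parallel overhead and load imbalance, so on its own it cannot say which of these limits the speedup; the analysis of the load distribution is what addresses that question.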
Subject/keywords
variable length Markov models, VLMC, parallel computation