Machine learning for big sequence data: Wavelet-compressed Hidden Markov Models
Type
Master's thesis
Abstract
Hidden Markov models (HMMs) are among the most important machine learning methods
for the statistical analysis of sequential data, but they struggle when applied to
big data. Their relative inefficiency has been addressed several times through
compression techniques, either for the data or for the computation. This thesis explores the
former, applying a data compression technique based on wavelets and
then adapting the main HMM algorithms from the literature: the
forward, Viterbi and Baum-Welch algorithms, used to solve the evaluation, decoding
and training problems, respectively. The testing phase shows that this new technique
generally yields equal or better results and obtains some extremely high speedups in
the training problem, making it even thousands of times faster; this makes it easy to
train an HMM with big data on a commodity laptop.
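
For orientation, the evaluation problem mentioned above can be illustrated with the standard, uncompressed forward algorithm. The sketch below is the textbook version for a discrete-emission HMM, not the wavelet-compressed variant developed in the thesis; the function name, the two-state model and its probabilities are hypothetical values chosen only for illustration. Its cost grows linearly with the sequence length (and quadratically with the number of states), which is the kind of per-observation work that compression aims to reduce.

```python
def forward_likelihood(init, trans, emit, observations):
    """Return P(observations | model) for a discrete-emission HMM.

    init[i]     : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][o]  : probability that state i emits symbol o
    """
    n_states = len(init)
    # alpha[i] = P(observations seen so far, current state = i)
    alpha = [init[i] * emit[i][observations[0]] for i in range(n_states)]
    for obs in observations[1:]:
        # One forward step: propagate through the transitions, then weight
        # each state by how likely it is to emit the new observation.
        alpha = [
            emit[j][obs] * sum(alpha[i] * trans[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
    return sum(alpha)


# Hypothetical two-state model over the symbols {0, 1} (illustrative values only).
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
likelihood = forward_likelihood(init, trans, emit, [0, 1, 0])
```

This unscaled form underflows on long sequences; practical implementations rescale alpha at every step or work in log space.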
Subject/keywords
machine, learning, sequence, wavelet, compression, hidden, markov, models, viterbi, training