Off-policy latent variable modeling for fast bandit personalization
Type
Master's thesis
Published
2022
Authors
Liljeqvist, Ludvig
Truvé, Viktor
Abstract
Medical treatments are decided based on a patient's medical history and current
symptoms. For chronic illnesses this can be difficult, as long-term patients
accumulate more medical data than is easy to grasp. We propose using
machine-learning methods to condense this information and then use it to
recommend medical treatments. Our goal is thus to develop an efficient method for
finding optimal treatments for patients, one that does so in as few rounds
of treatment as possible.
We do this in a two-step process. The first step is to develop a generalist model for
treatment recommendation by combining a seq2seq model with a Variational
Autoencoder (VAE). The VAE condenses intricate patient information into
an encoding and can reconstruct that information from the encoding.
We can thus regard each possible encoding as a patient type that indicates which
treatment is best, on average, for that type. The seq2seq component makes the VAE
applicable to sequential data, in our case medical records. The second step is
to use the generalist model to produce specialized policies for individual patients
inside a latent bandit model.
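As a rough illustration of the second step, the sketch below shows one common way a latent bandit can personalize. The abstract does not specify the algorithm, so everything here is an assumption: a finite set of patient types, Bernoulli treatment outcomes, and Thompson sampling over the type posterior. The name `run_latent_bandit` and the table `reward_means` (standing in for what a generalist model might supply) are hypothetical.

```python
import numpy as np

def run_latent_bandit(reward_means, true_type, n_rounds, rng):
    """Thompson sampling over a finite set of latent patient types.

    reward_means[k, a] is the assumed probability that treatment a succeeds
    for patient type k (hypothetical stand-in for the generalist model).
    Entries must lie strictly between 0 and 1.
    """
    n_types, _ = reward_means.shape
    # Uniform prior over patient types, kept in log space for stability.
    log_post = np.full(n_types, -np.log(n_types))
    chosen = []
    for _ in range(n_rounds):
        post = np.exp(log_post - log_post.max())
        post /= post.sum()
        # Sample a plausible type, then pick the best treatment for it.
        k = rng.choice(n_types, p=post)
        arm = int(np.argmax(reward_means[k]))
        # Simulated patient response, drawn from the true (hidden) type.
        reward = rng.random() < reward_means[true_type, arm]
        # Bayes update: likelihood of the observed outcome under each type.
        p = reward_means[:, arm]
        log_post += np.log(p if reward else 1.0 - p)
        chosen.append(arm)
    post = np.exp(log_post - log_post.max())
    return chosen, post / post.sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    means = np.array([[0.9, 0.1], [0.1, 0.9]])  # two types, two treatments
    chosen, post = run_latent_bandit(means, true_type=1, n_rounds=200, rng=rng)
    print("posterior over types:", post)
```

Because the learner only has to identify which type a patient belongs to, rather than estimate every treatment's effect from scratch, the posterior can concentrate after few rounds when the types are well separated.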
The ambition is that this solution will lead to faster personalization than
simpler methods such as contextual bandits and multi-armed bandits.
We present results showing that the proposed model performs better in earlier
rounds of treatment than other bandit algorithms and also converges to a
near-optimal policy faster.
Subject/keywords
Machine Learning, Health Care, AI, Latent Variable Models, Multi-armed Bandits, VAE, Latent Bandit