Off-policy latent variable modeling for fast bandit personalization
Master's thesis
Medical treatments are chosen based on a patient's medical history and current symptoms. For chronic illnesses this can be difficult, because long-term patients accumulate a volume of medical data that is hard to take in. We propose using machine-learning methods both to condense this information and to use it to recommend medical treatments. Our goal is thus to develop an efficient method for finding the optimal treatment for a patient in as few rounds of treatment as possible. We do this in a two-step process. The first step is to develop a generalist model for treatment recommendation by combining a sequence-to-sequence (seq2seq) model with a Variational Autoencoder (VAE). The VAE condenses intricate patient information into an encoding and can reconstruct that information from the encoding. We can thus regard each possible encoding as a patient type that indicates which treatment is best, on average, for patients of that type. The seq2seq component makes the VAE applicable to sequential data, in our case medical records. The second step is to use the generalist model to produce specialized policies for individual patients inside a latent bandit model. The ambition is that this solution leads to faster personalization than simpler methods such as contextual bandits and multi-armed bandits. We present results showing that the proposed model performs better in early rounds of treatment than other bandit algorithms and also converges to a near-optimal policy faster.
Machine Learning, Health Care, AI, Latent Variable Models, Multi-armed Bandits, VAE, Latent Bandit
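The second step described in the abstract, personalizing treatment inside a latent bandit, can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes a small finite set of patient types (standing in for the VAE encodings), a hypothetical table `reward_means[k][a]` giving the success probability of treatment `a` for type `k` (standing in for the offline generalist model), and binary treatment outcomes. The names `LatentBandit` and `simulate` are illustrative only.

```python
import random


class LatentBandit:
    """Posterior-sampling bandit over a finite set of latent patient types.

    Assumption: reward_means[k][a] is a known success probability of
    treatment (arm) a for patient type k, e.g. estimated offline.
    """

    def __init__(self, reward_means, prior=None, rng=None):
        self.reward_means = reward_means
        n_types = len(reward_means)
        # Start from a uniform belief over types unless a prior is given.
        self.posterior = list(prior) if prior else [1.0 / n_types] * n_types
        self.rng = rng or random.Random()

    def select_arm(self):
        # Thompson sampling: draw a type from the current belief,
        # then pick the treatment that is best for that type.
        types = range(len(self.posterior))
        k = self.rng.choices(types, weights=self.posterior)[0]
        means = self.reward_means[k]
        return max(range(len(means)), key=means.__getitem__)

    def update(self, arm, reward):
        # Bayes' rule with a Bernoulli likelihood for the observed outcome.
        likelihoods = [
            m[arm] if reward else 1.0 - m[arm] for m in self.reward_means
        ]
        unnorm = [p * l for p, l in zip(self.posterior, likelihoods)]
        total = sum(unnorm)
        if total > 0:
            self.posterior = [u / total for u in unnorm]


def simulate(true_type, reward_means, rounds, seed=0):
    """Run the bandit against a simulated patient of a fixed true type."""
    rng = random.Random(seed)
    bandit = LatentBandit(reward_means, rng=rng)
    for _ in range(rounds):
        arm = bandit.select_arm()
        reward = int(rng.random() < reward_means[true_type][arm])
        bandit.update(arm, reward)
    return bandit.posterior


# Hypothetical example: three patient types, each with one effective treatment.
example_means = [
    [0.9, 0.1, 0.1],
    [0.1, 0.9, 0.1],
    [0.1, 0.1, 0.9],
]
final_belief = simulate(true_type=1, reward_means=example_means, rounds=200)
```

Because the belief is over a handful of types rather than over every treatment's reward separately, each observed outcome is informative about all treatments at once, which is the intuition behind the faster personalization claimed in the abstract.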