Off-policy latent variable modeling for fast bandit personalization

Liljeqvist, Ludvig; Truvé, Viktor

Download

CSE 22-67 Liljeqvist Truve.pdf (2.93 MB)

Published

2022

Authors

Liljeqvist, Ludvig

Truvé, Viktor

Type

Master's thesis

Abstract

Medical treatments are chosen based on a patient's medical history and current symptoms. For chronic illnesses this can be difficult, because long-term patients accumulate an amount of medical data that is hard to take in. We propose using machine-learning methods both to condense this information and to use it to recommend medical treatments. Our goal is to develop an efficient method for finding the optimal treatment for a patient in as few rounds of treatment as possible. We do this in two steps. The first step is to develop a generalist model for treatment recommendation that combines a seq2seq model with a Variational Autoencoder (VAE). The VAE condenses intricate patient information into an encoding and can reconstruct that information from the encoding. We can thus treat each possible encoding as a patient type that indicates which treatment is best, on average, for patients of that type. The seq2seq component makes the VAE applicable to sequential data, in our case medical records. The second step is to use the generalist model to produce specialized policies for individual patients inside a latent bandit model. The ambition is that this solution leads to faster personalization than simpler methods such as contextual bandits and multi-armed bandits. We present results showing that the proposed model performs better in early rounds of treatment than other bandit algorithms and also converges to a near-optimal policy faster.
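To make the second step more concrete, the following is a minimal sketch of a latent bandit loop, not the thesis implementation. It assumes a small, discrete set of patient types with a fixed table of average treatment success rates (the hypothetical `TYPE_MEANS`, standing in for what a generalist model would supply), Bernoulli rewards, and Thompson sampling over types. The names `update_posterior` and `run_latent_bandit` are illustrative only.

```python
import random


def update_posterior(posterior, type_means, arm, reward):
    """Bayes update of the belief over patient types after one 0/1 reward."""
    likelihoods = [
        means[arm] if reward == 1 else 1.0 - means[arm]
        for means in type_means
    ]
    unnormalized = [b * l for b, l in zip(posterior, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def run_latent_bandit(type_means, true_type, rounds, rng):
    """Thompson sampling over latent types with a fixed reward table."""
    n_types = len(type_means)
    posterior = [1.0 / n_types] * n_types  # uniform prior over types
    arms = []
    for _ in range(rounds):
        # Sample a plausible type from the current belief.
        sampled_type = rng.choices(range(n_types), weights=posterior)[0]
        # Act as if that type were the true one: pick its best treatment.
        row = type_means[sampled_type]
        arm = max(range(len(row)), key=row.__getitem__)
        # Simulated patient response (assumption: Bernoulli outcome).
        reward = 1 if rng.random() < type_means[true_type][arm] else 0
        posterior = update_posterior(posterior, type_means, arm, reward)
        arms.append(arm)
    return posterior, arms


# Hypothetical table: rows are patient types, columns are treatments,
# entries are average success probabilities.
TYPE_MEANS = [
    [0.9, 0.2, 0.1],
    [0.2, 0.8, 0.3],
    [0.1, 0.3, 0.7],
]

posterior, arms = run_latent_bandit(
    TYPE_MEANS, true_type=1, rounds=50, rng=random.Random(0)
)
```

Because the learner only has to identify which of a few types the patient belongs to, rather than estimate every treatment's effect from scratch, each observation can shift the belief substantially. This is the intuition behind the faster personalization described above.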

Subject/keywords

Machine Learning, Health Care, AI, Latent Variable Models, Multi-armed Bandits, VAE, Latent Bandit

URI

https://hdl.handle.net/20.500.12380/304918

Collections

Master's theses
