Off-policy latent variable modeling for fast bandit personalization

dc.contributor.author: Liljeqvist, Ludvig
dc.contributor.author: Truvé, Viktor
dc.contributor.department: Chalmers tekniska högskola / Institutionen för data och informationsteknik
dc.contributor.examiner: Johansson, Moa
dc.contributor.supervisor: Johansson, Fredrik
dc.date.accessioned: 2022-06-28T11:19:04Z
dc.date.available: 2022-06-28T11:19:04Z
dc.date.issued: 2022
dc.date.submitted: 2020
dc.description.abstract: Medical treatments are decided based on the medical history and current symptoms of a patient. For chronic illnesses this can be difficult, as long-term patients accumulate a volume of medical data that is hard to grasp. We propose the use of machine-learning methods both to condense this information and to use it to recommend medical treatments. Our goal is thus to develop an efficient method for finding optimal treatments for patients, optimized for doing so in as few rounds of treatment as possible. We do this in a two-step process. The first step is to develop a generalist model for treatment recommendation using a combination of a seq2seq model and a Variational Autoencoder (VAE). The VAE condenses intricate patient information into an encoding and can reconstruct that information from this encoding. We can thus consider each possible encoding as a patient type, which indicates which treatment is best for that particular type, on average. Seq2seq adapts the VAE to sequential data, in our case medical records. The second step is to use the generalist model to produce specialized policies for individual patients inside a latent bandit model. The ambition is that this solution will lead to faster personalization than simpler methods such as contextual bandits and multi-armed bandits. We present results showing that the proposed model performs better in earlier rounds of treatment than other bandit algorithms, and also converges to a near-optimal policy faster.
dc.identifier.coursecode: DATX05
dc.identifier.uri: https://hdl.handle.net/20.500.12380/304918
dc.language.iso: eng
dc.setspec.uppsok: Technology
dc.subject: Machine Learning
dc.subject: Health Care
dc.subject: AI
dc.subject: Latent Variable Models
dc.subject: Multiarmed Bandits
dc.subject: VAE
dc.subject: Latent Bandit
dc.title: Off-policy latent variable modeling for fast bandit personalization
dc.type.degree: Master's thesis
dc.type.uppsok: H
Download
Original bundle
Name: CSE 22-67 Liljeqvist Truve.pdf
Size: 2.93 MB
Format: Adobe Portable Document Format