Comparison of Arm Selection Policies for the Multi-Armed Bandit Problem

dc.contributor.author: Johansson, Fifi
dc.contributor.author: Mchome, Miriam
dc.contributor.department (sv): Chalmers tekniska högskola / Institutionen för data- och informationsteknik (Chalmers)
dc.contributor.department (en): Chalmers University of Technology / Department of Computer Science and Engineering (Chalmers)
dc.date.accessioned: 2019-07-03T14:56:24Z
dc.date.available: 2019-07-03T14:56:24Z
dc.date.issued: 2018
dc.description.abstract: Web content optimization involves deciding what content to put on a web page and how to lay out and design it, all of which require selecting a few options among many. With the advent of personalization, many companies seek to make these decisions on a per-user basis in order to improve customer experience and satisfaction. The contextual multi-armed bandit framework provides several strategies for addressing this online decision-making problem at a lower experimental cost than traditional A/B testing. In this study, we compare three common contextual bandit strategies from the literature, namely E-greedy, LinUCB, and Thompson Sampling, and apply two of them, E-greedy and LinUCB, to three datasets. In doing so, we offer further empirical evidence on the performance of these strategies and insights for practitioners on which strategy might work for them. Our results suggest that both E-greedy and LinUCB are effective at improving click-through rate compared to a random policy. The more sophisticated approach, LinUCB, achieves better results on large datasets but performs quite unstably when the number of data points is small. We also find that it is more sensitive to parameter tuning and can produce significantly worse outcomes when its parameters are poorly chosen. Our study further finds that LinUCB can have higher data requirements when evaluated offline. Collectively, the varying performance of these approaches across datasets signals the need for better tools and procedures to help practitioners choose the appropriate approach.
dc.identifier.uri: https://hdl.handle.net/20.500.12380/256336
dc.language.iso: eng
dc.setspec.uppsok: Technology
dc.subject: Data- och informationsvetenskap
dc.subject: Computer and Information Science
dc.title: Comparison of Arm Selection Policies for the Multi-Armed Bandit Problem
dc.type.degree (sv): Examensarbete för masterexamen
dc.type.degree (en): Master Thesis
dc.type.uppsok: H
local.programme: Software engineering and technology (MPSOF), MSc
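
For readers unfamiliar with the policies named in the abstract, the following is a minimal, illustrative sketch of two of them, E-greedy and disjoint LinUCB (after Li et al., 2010), written in Python with NumPy. The arm count, context dimension, epsilon, alpha, and the logistic reward model in the toy loop are assumptions made here for illustration only, not the settings or datasets used in the thesis.

import numpy as np

rng = np.random.default_rng(0)

class LinUCB:
    # Disjoint LinUCB: an independent ridge-regression reward model per arm.
    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha                                # exploration strength
        self.A = [np.eye(dim) for _ in range(n_arms)]     # per-arm X^T X + I
        self.b = [np.zeros(dim) for _ in range(n_arms)]   # per-arm X^T r

    def select(self, x):
        # Score each arm by its estimated reward plus an upper-confidence bonus.
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

class EGreedy:
    # E-greedy over the same per-arm linear reward estimates.
    def __init__(self, n_arms, dim, epsilon=0.1):
        self.epsilon = epsilon                            # exploration rate
        self.A = [np.eye(dim) for _ in range(n_arms)]
        self.b = [np.zeros(dim) for _ in range(n_arms)]

    def select(self, x):
        if rng.random() < self.epsilon:                   # explore uniformly
            return int(rng.integers(len(self.A)))
        estimates = [np.linalg.solve(A, b) @ x            # exploit best estimate
                     for A, b in zip(self.A, self.b)]
        return int(np.argmax(estimates))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Toy usage: 3 arms, 5-dimensional contexts, clicks drawn from an assumed
# logistic reward model (purely illustrative; not the thesis datasets).
true_theta = rng.normal(size=(3, 5))
policy = LinUCB(n_arms=3, dim=5, alpha=1.0)
for t in range(1000):
    x = rng.normal(size=5)
    arm = policy.select(x)
    p_click = 1.0 / (1.0 + np.exp(-(true_theta[arm] @ x)))
    policy.update(arm, x, float(rng.random() < p_click))

Both policies share the same linear reward model; they differ only in how they trade exploration against exploitation, which is exactly the axis the thesis compares.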
Original bundle
Name: 256336.pdf
Size: 1.5 MB
Format: Adobe Portable Document Format
Description: Fulltext