Modulating Reinforcement-Learning Parameters Using Agent Emotions

Master Thesis

Type: Master Thesis
Title: Modulating Reinforcement-Learning Parameters Using Agent Emotions
Authors: von Haugwitz, Rickard
Abstract: When learning a strategy for social interaction in a multi-agent environment, it is often difficult to define clear goals satisfactorily, and it may not be clear what constitutes a “good” course of action in most situations. In this case, using a computational model of emotion to provide an intrinsic reward function shifts the task to optimising emotional feedback, which allows more high-level goals to be defined. Although such a model is of most interest in a general, not necessarily competitive, social setting on a continuing task, it can be better compared with more conventional reward functions on an episodic competitive task, where its benefit is not as readily visible. A reinforcement-learning system based on the actor-critic model of temporal-difference learning was implemented using a fuzzy inference system functioning as a normalised radial-basis-function network, capable of dynamically allocating computational units as needed and of adapting its features to the actually observed input. While adding some computational overhead, such a system requires less manual tuning by the programmer and is able to make better use of existing resources. Tests were carried out on a small-scale multi-agent system with an initially hostile environment, first with fixed learning parameters and then, separately, with modulated parameters that were allowed to deviate from their base values depending on the emotional state of the agent. The latter approach gave marginally better performance once the hostile elements were removed from the environment, indicating that emotion-modulated learning may lead to a somewhat closer approximation of the optimal policy in a difficult environment by focusing learning on more useful input and increasing exploration when needed.
Keywords: Information & Communication Technology; Human Computer Interaction (interaction design)
Issue Date: 2012
Publisher: Chalmers University of Technology / Department of Applied Information Technology (Chalmers)
Series/Report no.: Report - IT University of Göteborg, Chalmers University of Technology and the University of Göteborg
Collection: Master Theses
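
Illustrative sketch (not taken from the thesis): the abstract describes an actor-critic temporal-difference learner over normalised radial-basis-function features, with learning parameters that deviate from their base values depending on the agent's emotional state. The Python code below is a minimal sketch of that idea under stated assumptions. The fixed RBF centres, the scalar state, the `modulate` rule, and its `gain` parameter are all hypothetical. The thesis's dynamic unit allocation, fuzzy inference system, and emotion model are not reproduced here.

```python
import math
import random

def normalised_rbf(x, centres, width):
    """Gaussian RBF activations for a scalar input, normalised to sum to 1."""
    raw = [math.exp(-((x - c) ** 2) / (2.0 * width ** 2)) for c in centres]
    total = sum(raw)
    return [r / total for r in raw]

def modulate(base, emotion, gain=0.5):
    """Hypothetical rule: scale a base parameter by an emotion signal in [-1, 1].
    The actual modulation rule is not given in the abstract."""
    emotion = max(-1.0, min(1.0, emotion))
    return base * (1.0 + gain * emotion)

class ActorCritic:
    """TD(0) actor-critic with linear critic and softmax actor preferences."""

    def __init__(self, centres, width, n_actions, alpha=0.1, beta=0.1, gamma=0.9):
        self.centres, self.width = centres, width
        self.alpha, self.beta, self.gamma = alpha, beta, gamma
        self.v = [0.0] * len(centres)  # critic weights
        self.theta = [[0.0] * len(centres) for _ in range(n_actions)]  # actor weights

    def features(self, x):
        return normalised_rbf(x, self.centres, self.width)

    def value(self, x):
        return sum(w * f for w, f in zip(self.v, self.features(x)))

    def policy(self, x, temperature=1.0):
        """Softmax over action preferences; higher temperature explores more."""
        phi = self.features(x)
        prefs = [sum(w * f for w, f in zip(row, phi)) / temperature
                 for row in self.theta]
        top = max(prefs)
        exps = [math.exp(p - top) for p in prefs]
        norm = sum(exps)
        return [e / norm for e in exps]

    def act(self, x, temperature=1.0):
        probs = self.policy(x, temperature)
        return random.choices(range(len(probs)), weights=probs)[0]

    def update(self, x, action, reward, x_next, done, emotion=0.0):
        """One TD(0) step; both learning rates are modulated by the emotion signal."""
        alpha = modulate(self.alpha, emotion)
        beta = modulate(self.beta, emotion)
        phi = self.features(x)
        target = reward + (0.0 if done else self.gamma * self.value(x_next))
        delta = target - self.value(x)  # TD error
        for i, f in enumerate(phi):
            self.v[i] += alpha * delta * f               # critic update
            self.theta[action][i] += beta * delta * f    # actor preference update
        return delta
```

In this sketch an emotion signal of 0 leaves the base parameters unchanged, which corresponds to the fixed-parameter condition. Passing a modulated value as `temperature` to `act` would be one way to raise exploration when needed, again as an assumption rather than the thesis's method.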
