Learning to Play Games from Multiple Imperfect Teachers
Master's thesis
Complex adaptive systems (MPCAS), MSc
This project evaluates the modularity of a recent Bayesian Inverse Reinforcement Learning approach by inferring the sub-goals correlated with winning board games from observations of a set of agents. A feature-based architecture is proposed together with a method for generating the reward function space, making inference tractable in large state spaces and allowing it to be combined with models that approximate state-action values. Further, a policy prior is suggested that allows for least-squares policy evaluation using sample trajectories. The model is evaluated on randomly generated environments and on Tic-tac-toe, showing that a combination of the intentions inferred from all agents can generate strategies that outperform the corresponding strategies from each individual agent.
Computer and Information Science