Efficient Solving Methods for POMDP-based Threat Defense Environments on Bayesian Attack Graphs
Master's thesis
Computer science – algorithms, languages and logic (MPALG), MSc
In this work, we show how to formulate a threat defense environment as a Partially Observable Markov Decision Process (POMDP) that admits fast approximate defense algorithms against multiple attackers. This is achieved through an action extension, coined the Inspect action, which allows the agent to reveal the true state of the environment, thereby reducing the problem to a traditional Markov Decision Process (MDP) for the current time-step. The work extends previous definitions of the same problem. Furthermore, based on the new definition, we define and present the optimal policy and introduce two new solving algorithms, n-Myopic and n-Lookahead. To evaluate their performance, we compare these new algorithms against more standard solving algorithms, such as Q-learning and Policy Gradients. The experimental results show that the new algorithms perform better than previous approaches and, thanks to the approximate MDP reduction, allow for larger-scale threat environments. Additionally, to facilitate future research, two OpenAI Gym environments were developed and made publicly available for new research to build upon. We encourage future work on similar problems to use this software library, enabling standardized and comparable performance results.
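The idea behind the Inspect action can be illustrated with a minimal sketch. It assumes that a state is the set of compromised attack-graph nodes and that the belief is a dictionary of probabilities over such states. Revealing the true state collapses the belief to a point mass, after which the agent can plan on an ordinary, fully observed MDP. The names `inspect`, `lookahead_value`, the two-node toy graph, and its transition and reward functions are hypothetical and are not taken from the thesis. The planner is a generic depth-limited expectimax, not the thesis's n-Lookahead algorithm, whose exact definition is not given here.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

State = FrozenSet[str]       # hypothetical: set of compromised attack-graph nodes
Belief = Dict[State, float]  # probability distribution over hidden states


def inspect(belief: Belief, true_state: State) -> Belief:
    """Hypothetical Inspect action: reveal the true state, collapsing the belief."""
    assert belief.get(true_state, 0.0) > 0.0, "true state must have non-zero belief"
    return {true_state: 1.0}


def lookahead_value(
    state: State,
    depth: int,
    actions: List[str],
    transition: Callable[[State, str], List[Tuple[State, float]]],
    reward: Callable[[State, str], float],
    gamma: float = 0.95,
) -> float:
    """Generic depth-limited expectimax on a fully observed MDP (not the thesis algorithm)."""
    if depth == 0:
        return 0.0
    best = float("-inf")
    for a in actions:
        # Immediate reward plus discounted expected value of successor states.
        q = reward(state, a)
        for nxt, p in transition(state, a):
            q += gamma * p * lookahead_value(nxt, depth - 1, actions, transition, reward, gamma)
        best = max(best, q)
    return best


# Hypothetical two-node toy environment used only for illustration.
ACTIONS = ["wait", "restore"]


def toy_transition(s: State, a: str) -> List[Tuple[State, float]]:
    if a == "restore":
        return [(frozenset(), 1.0)]           # defender cleans every node
    if "a" not in s:
        return [(s | {"a"}, 0.5), (s, 0.5)]   # attacker compromises node "a" w.p. 0.5
    return [(s, 1.0)]


def toy_reward(s: State, a: str) -> float:
    # Cost of one per compromised node, plus a fixed cost for restoring.
    return -float(len(s)) - (0.5 if a == "restore" else 0.0)


belief = {frozenset(): 0.3, frozenset({"a"}): 0.7}
revealed = inspect(belief, frozenset({"a"}))  # belief is now a point mass
(state,) = revealed
print(lookahead_value(state, 2, ACTIONS, toy_transition, toy_reward))
```

With a two-step horizon from the compromised state, restoring (value -1.5) beats waiting (value -1.95), so the sketch prints -1.5. The design choice mirrors the abstract's argument: once the belief is a point mass, planning no longer needs to reason over the full belief space.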
Computer and Information Science