Shaping Rewards with Temporal Information to Guide Reinforcement Learning
Type
Master's Thesis
Abstract
Reinforcement learning (RL) methods that use pretrained Vision-Language Models (VLMs) to compute rewards typically rely on a single observation of the environment. This is problematic because any information arising from the sequential nature of RL, i.e. temporal information, is disregarded. This thesis explored how temporal information can be incorporated into the VLM reward computation, first by distinguishing between fixed and adaptive temporal information. With fixed temporal information, additional inputs describe the environment's progression through time but remain constant throughout each episode. In contrast, adaptive temporal methods take additional inputs that can change as the episode progresses. Positional and directional rewards were defined to exploit fixed and adaptive temporal information, respectively, along with new supervised finetuning methods for the directional reward functions. Evaluated with a sample-efficiency metric over six robotic manipulation tasks, the best new positional rewards performed 18.4% better than previous methods, while directional rewards performed 23.0% better. Combining positional and directional rewards yielded a 25.4% improvement, the best performance achieved by any method in this thesis.
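
As a rough illustration of the fixed/adaptive distinction described above (not code from the thesis), the sketch below shows one way a VLM-style reward could incorporate temporal information: a positional reward conditions the goal prompt on a fixed, per-step time fraction, while a directional reward scores the change in image-goal similarity between consecutive observations. The embed_image and embed_text functions are hypothetical stand-ins for a pretrained VLM encoder (e.g. a CLIP-like model); all names and prompts are illustrative assumptions.

import numpy as np

# Hypothetical stand-ins for a pretrained VLM encoder (e.g. CLIP-like).
# In practice these would return normalized image/text embeddings.
def embed_image(observation: np.ndarray) -> np.ndarray:
    rng = np.random.default_rng(int(observation.sum()) % 2**32)
    v = rng.normal(size=512)
    return v / np.linalg.norm(v)

def embed_text(prompt: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(prompt)) % 2**32)
    v = rng.normal(size=512)
    return v / np.linalg.norm(v)

def positional_reward(observation, goal_prompt, step, horizon):
    # Fixed temporal information: the extra input is the normalized step
    # index, which follows the same fixed schedule in every episode.
    progress = step / horizon
    prompt = f"{goal_prompt}, at {progress:.0%} of the episode"
    return float(embed_image(observation) @ embed_text(prompt))

def directional_reward(prev_observation, observation, goal_prompt):
    # Adaptive temporal information: the extra input (the previous
    # observation) changes as the episode unfolds; the reward is the
    # change in image-goal similarity between consecutive steps.
    goal = embed_text(goal_prompt)
    return float(embed_image(observation) @ goal
                 - embed_image(prev_observation) @ goal)

# Example usage with dummy image observations.
obs_prev = np.zeros((64, 64, 3))
obs_curr = np.ones((64, 64, 3))
r_pos = positional_reward(obs_curr, "a robot arm lifting a red block",
                          step=10, horizon=100)
r_dir = directional_reward(obs_prev, obs_curr,
                           "a robot arm lifting a red block")
print(r_pos, r_dir)

A combined reward, as evaluated in the thesis, would simply mix the two signals (for example a weighted sum), though the specific weighting and finetuning procedure are not reproduced here.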
Keywords
VLM, reinforcement learning, machine learning, transfer learning, neural networks
