Shaping Rewards with Temporal Information to Guide Reinforcement Learning
| Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Lundgren, Linus | |
| dc.contributor.department | Chalmers tekniska högskola / Institutionen för elektroteknik | sv |
| dc.contributor.examiner | Ramirez-Amaro, Karinne | |
| dc.contributor.supervisor | Lu, Wenhao | |
| dc.contributor.supervisor | Liang, Zhitao | |
| dc.date.accessioned | 2025-12-22T08:19:24Z | |
| dc.date.issued | 2025 | |
| dc.date.submitted | | |
| dc.description.abstract | Reinforcement learning (RL) methods that apply pretrained Vision-Language Models (VLMs) to compute rewards typically do so from a single observation of the environment. This is problematic because it disregards any information arising from the sequential nature of RL, i.e. temporal information. This thesis explored how temporal information can be incorporated into the VLM reward computation, first distinguishing between fixed and adaptive temporal information. With fixed temporal information, additional inputs are provided to describe the environment’s progression through time, but these inputs remain unchanged throughout each episode. In contrast, adaptive temporal methods take additional inputs that can change as the episode progresses (see the sketch after this record). Positional and directional rewards were defined to exploit fixed and adaptive temporal information respectively, along with new supervised finetuning methods for the directional reward functions. Evaluated with a sample-efficiency metric over six robotic manipulation tasks, the best new positional rewards performed 18.4% better than previous methods, while directional rewards performed 23.0% better. Combining positional and directional rewards yielded a 25.4% improvement, the best performance achieved by any method in this thesis. | |
| dc.identifier.coursecode | EENX30 | |
| dc.identifier.uri | http://hdl.handle.net/20.500.12380/310826 | |
| dc.language.iso | eng | |
| dc.setspec.uppsok | Technology | |
| dc.subject | VLM | |
| dc.subject | reinforcement learning | |
| dc.subject | machine learning | |
| dc.subject | transfer learning | |
| dc.subject | neural networks | |
| dc.title | Shaping Rewards with Temporal Information to Guide Reinforcement Learning | |
| dc.type.degree | Examensarbete för masterexamen | sv |
| dc.type.degree | Master's Thesis | en |
| dc.type.uppsok | H | |
| local.programme | Data science and AI (MPDSC), MSc | |
| local.programme | Systems, control and mechatronics (MPSYS), MSc |
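
The abstract's distinction between fixed and adaptive temporal information can be made concrete with a minimal sketch. The sketch below assumes a CLIP-style VLM whose image and text encoders map inputs to a shared unit-norm embedding space; everything here is hypothetical illustration, not the thesis's actual code. The encoders `embed_image` and `embed_text` are stand-ins that return pseudo-random unit vectors so the script runs without model weights, and the names `positional_reward`, `directional_reward`, and the subgoal prompt schedule are invented for this example.

```python
import numpy as np

_DIM = 512  # embedding dimension of the assumed VLM

def embed_text(prompt: str) -> np.ndarray:
    # Placeholder for a VLM text encoder: identical prompts embed
    # identically within a run, via a prompt-seeded pseudo-embedding.
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    v = rng.standard_normal(_DIM)
    return v / np.linalg.norm(v)

def embed_image(frame: np.ndarray) -> np.ndarray:
    # Placeholder for a VLM image encoder: seeded from the pixel content
    # so identical frames embed identically within a run.
    rng = np.random.default_rng(abs(hash(frame.tobytes())) % (2**32))
    v = rng.standard_normal(_DIM)
    return v / np.linalg.norm(v)

def positional_reward(frame: np.ndarray, t: int,
                      prompt_schedule: list[str]) -> float:
    # Fixed temporal information: the extra input (which prompt to score
    # against) depends only on the timestep t, via a schedule decided
    # before the episode starts; it never reacts to what actually happens.
    prompt = prompt_schedule[min(t, len(prompt_schedule) - 1)]
    return float(embed_image(frame) @ embed_text(prompt))

def directional_reward(prev_frame: np.ndarray, frame: np.ndarray,
                       goal_prompt: str) -> float:
    # Adaptive temporal information: the extra input is the previous
    # observation, which changes as the episode unfolds. The reward is the
    # change in goal similarity, i.e. the direction of progress.
    goal = embed_text(goal_prompt)
    return float(embed_image(frame) @ goal) - float(embed_image(prev_frame) @ goal)

if __name__ == "__main__":
    prev = np.zeros((64, 64, 3), dtype=np.uint8)
    curr = np.ones((64, 64, 3), dtype=np.uint8)
    schedule = ["gripper above the block",       # hypothetical subgoal prompts
                "gripper grasping the block",
                "block lifted off the table"]
    print(positional_reward(curr, t=1, prompt_schedule=schedule))
    print(directional_reward(prev, curr, "block lifted off the table"))
```

Note the asymmetry the sketch is built around: `positional_reward` consults only the fixed schedule index, while `directional_reward` consults the realized previous frame, mirroring the abstract's fixed-versus-adaptive split.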
