Adaptive KV Cache Management for Efficient Transformer-based LLM Inference - Leveraging Attention Sparsity for Memory Optimization

Xu, Dikai


Download

CSE 25-152 DX.pdf (1.06 MB)

Published

2025

Author

Xu, Dikai

Type

Master's thesis

Program

Computer systems and networks (MPCSN), MSc

Abstract

This Master's thesis addresses the critical challenge of memory inefficiency in Transformer-based Large Language Models (LLMs) during inference. It focuses specifically on the prohibitive memory footprint of the Key-Value (KV) cache. As LLMs scale, the KV cache becomes a significant bottleneck that limits context-window length and overall operational efficiency. To mitigate this issue, we propose and evaluate Adap-KV, a novel adaptive memory-management strategy for the KV cache. Adap-KV employs a layer-aware dynamic allocation approach that adjusts KV cache size in real time, guided by observed attention sparsity patterns. The method aims to optimize memory utilization without compromising the performance or output quality of LLM inference. Experimental results demonstrate that Adap-KV significantly reduces KV cache memory consumption. This improves the efficiency and scalability of Transformer-based LLMs and makes them better suited to real-world deployments with extended context lengths.
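The abstract does not spell out how Adap-KV sizes each layer's cache. The Python sketch below is therefore only an illustration under stated assumptions, not the thesis's algorithm. It contains three functions:

- `kv_cache_bytes` shows why the cache grows so large: its size scales linearly in layers, heads, head dimension and sequence length.
- `tokens_for_mass` is one assumed way to quantify attention sparsity: it counts how many keys cover a given share of the attention weight.
- `allocate_layer_budgets` is a hypothetical proportional split of a fixed token budget across layers.

All three function names, the sparsity metric and the 32-layer, 32-head, 128-dimension configuration in the usage lines are illustrative choices.

```python
from typing import List


def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int = 1,
                   bytes_per_elem: int = 2) -> int:
    """Size of a full (uncompressed) KV cache in bytes.

    Each layer stores one key and one value vector per head per cached token,
    hence the leading factor of 2. bytes_per_elem=2 assumes fp16/bf16.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


def tokens_for_mass(attn_row: List[float], mass: float = 0.95) -> int:
    """Smallest number of keys whose attention weights sum to at least `mass`.

    A small result relative to len(attn_row) indicates a sparse attention row.
    (Assumed sparsity metric, for illustration only.)
    """
    acc = 0.0
    for k, w in enumerate(sorted(attn_row, reverse=True), start=1):
        acc += w
        if acc >= mass:
            return k
    return len(attn_row)


def allocate_layer_budgets(density: List[float], total_tokens: int,
                           min_tokens: int) -> List[int]:
    """HYPOTHETICAL layer-aware split of a global KV token budget.

    density[i] > 0 is an assumed per-layer score (e.g. tokens_for_mass / seq_len).
    Every layer gets min_tokens; the remainder is split in proportion to density,
    so denser (less sparse) layers keep more cache entries.
    """
    n = len(density)
    if n == 0 or any(d <= 0 for d in density):
        raise ValueError("density must be a non-empty list of positive scores")
    if total_tokens < n * min_tokens:
        raise ValueError("total_tokens cannot cover min_tokens for every layer")
    spare = total_tokens - n * min_tokens
    total_w = sum(density)
    budgets, cum_w, prev_cut = [], 0.0, 0
    for i, d in enumerate(density):
        cum_w += d
        # Cumulative cut points keep the integer budgets summing exactly to total_tokens.
        cut = spare if i == n - 1 else min(spare, int(spare * cum_w / total_w))
        budgets.append(min_tokens + cut - prev_cut)
        prev_cut = cut
    return budgets


# Illustrative usage (assumed configuration, not taken from the thesis):
full = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128, seq_len=4096)
print(full / 2**30)  # 2.0 (GiB for a single 4096-token sequence in fp16)
print(tokens_for_mass([0.5, 0.25, 0.125, 0.125], mass=0.8))  # 3
print(allocate_layer_budgets([0.1, 0.3, 0.6], total_tokens=100, min_tokens=10))  # [17, 31, 52]
```

The cumulative-cut allocation is chosen here because it always hands out exactly `total_tokens` slots while giving each layer at least `min_tokens`. A real system might instead re-estimate densities during decoding and evict tokens per head.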

Subject/keywords

Large Language Models, Transformers, KV Cache, Memory Optimization, Adaptive Memory Management, Attention Sparsity, Deep Learning Inference, Resource Efficiency

URI

https://hdl.handle.net/20.500.12380/310903

Collections

Master's theses
