Optimizing latency in multi-agent systems
Type
Bachelor Thesis
Abstract
Large Language Model (LLM)-based multi-agent systems are increasingly used to
solve complex tasks through collaboration between specialized agents. However,
the use of multiple agents, tool invocations, and inter-agent communication can
introduce significant latency and cost, limiting practical deployment.
This thesis investigates how architectural optimizations affect the performance of an
LLM-based multi-agent system. A financial analysis pipeline was implemented using
the Agent-to-Agent (A2A) protocol for inter-agent communication and the Model
Context Protocol (MCP) for tool use. Four cumulative optimization techniques
were evaluated: agent parallelization, tool batching, schema pruning, and model
assignment. Performance was assessed using end-to-end latency, inference cost, and
output quality.
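As a rough illustration of what tool batching means here, the sketch below contrasts one request per item with a single batched request. It is a minimal sketch under stated assumptions: `FakeToolServer`, `get_price`, `get_prices`, and the ticker names are hypothetical stand-ins invented for this example, not the MCP API and not the thesis implementation.

```python
from dataclasses import dataclass


@dataclass
class FakeToolServer:
    """Hypothetical tool server that counts round trips (not the MCP API)."""
    prices: dict[str, float]
    round_trips: int = 0

    def get_price(self, ticker: str) -> float:
        # Unbatched tool: every ticker costs one round trip.
        self.round_trips += 1
        return self.prices[ticker]

    def get_prices(self, tickers: list[str]) -> dict[str, float]:
        # Batched tool: one round trip regardless of how many tickers.
        self.round_trips += 1
        return {t: self.prices[t] for t in tickers}


def fetch_unbatched(server: FakeToolServer, tickers: list[str]) -> dict[str, float]:
    """Issue one tool call per ticker."""
    return {t: server.get_price(t) for t in tickers}


def fetch_batched(server: FakeToolServer, tickers: list[str]) -> dict[str, float]:
    """Issue a single tool call covering all tickers."""
    return server.get_prices(tickers)
```

Both functions return the same data; only the number of round trips differs, which is the overhead that batching is meant to reduce.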
The results show that agent parallelization provides negligible latency improvement
under the evaluated deployment conditions due to shared model endpoint contention.
In contrast, tool batching reduces median latency by up to 27.4% and inference cost
by 54.6% while improving the output quality score from 4.18 to 5.00. Schema pruning and
model assignment further reduce inference cost, by up to 77.1% compared
to the baseline, without degrading quality. Overall, the results suggest that reducing
tool invocation overhead and unnecessary context transfer provides greater benefits
than agent-level parallelization in the evaluated multi-agent architecture.
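One way to see why parallel agents may gain little behind a shared model endpoint is a toy queueing model. The sketch below is only an assumption-laden illustration: the function `makespan`, its parameters, and the idea that the endpoint serves a fixed number of equal-length calls at a time are simplifications introduced here, not measurements or methods from the thesis.

```python
import math


def makespan(n_agents: int, call_seconds: float,
             endpoint_slots: int, parallel: bool) -> float:
    """Wall-clock time for n_agents that each make one equal-length model call.

    Toy model (assumption): the endpoint serves at most `endpoint_slots`
    calls at once, and extra calls wait for the next free slot.
    """
    if n_agents < 0 or endpoint_slots < 1:
        raise ValueError("n_agents must be >= 0 and endpoint_slots >= 1")
    if not parallel:
        # Sequential agents: calls run strictly one after another.
        return n_agents * call_seconds
    # Parallel agents still queue at the endpoint, in waves of `endpoint_slots`.
    waves = math.ceil(n_agents / endpoint_slots)
    return waves * call_seconds
```

In this model, parallel agents sharing a single-slot endpoint finish no sooner than sequential ones, while an endpoint with one slot per agent lets all calls overlap.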
Subject/keywords
Agent-to-Agent, Multi-agent system, Optimization, Latency, MCP
