Chalmers Open Digital Repository
Welcome to Chalmers' open digital repository!
Here you will find:
- Student theses published at the university, including both bachelor's theses and degree projects at undergraduate and master's level
- Digital special collections, such as the Chalmers modellkammare (model chamber)
- Selected project reports
Research publications, reports, and dissertations can be found at research.chalmers.se
Units in Chalmers ODR
Select a unit to see all collections.
Recently added
Automation of Antenna Isolation Modeling
(2025) Damaraju, Sri Sai Satyanarayana
This thesis enhances an existing Python-based automation framework, developed by Ericsson Research, for antenna isolation modeling in large-scale antenna arrays. The framework drives ANSYS HFSS through the PyAEDT API to automate electromagnetic workflows, enabling systematic variation of antenna array parameters such as geometry, element spacing, and polarization configuration, which are loaded for model creation. From these parameters the framework builds a complete 3D antenna array model in HFSS with appropriate excitations and boundary conditions, extracts comprehensive S-parameter data from the simulation results, and provides visualization capabilities together with mathematical error metrics for evaluating mesh convergence and simulation reliability.
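As a rough illustration of this kind of PyAEDT-driven parameter sweep (not the Ericsson Research framework itself, which is not public), the following minimal sketch varies a hypothetical element-spacing design variable and pulls isolation data from the solved model; the project name, variable names, and sweep values are assumptions, and exact constructor argument names differ between PyAEDT releases.

```python
# Minimal PyAEDT sketch of a parametric HFSS run (illustrative only; project
# name, variable names and sweep values are assumptions, and argument names
# differ slightly between PyAEDT releases).
from pyaedt import Hfss

hfss = Hfss(project="array.aedt", non_graphical=True)
results = {}
for spacing in ["0.45*lambda0", "0.50*lambda0", "0.55*lambda0"]:
    hfss["element_spacing"] = spacing          # update the parametric spacing variable
    hfss.analyze()                             # adaptive meshing + frequency sweep
    data = hfss.post.get_solution_data(expressions="dB(S(2,1))")
    results[spacing] = data.data_real()        # port-to-port isolation vs. frequency
hfss.release_desktop()
```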
This work extends the framework with advanced performance evaluation and additional comparative analysis capabilities. Key contributions include the development of simulation performance evaluation tools that systematically analyze HFSS log files to extract information such as resource utilization and convergence behavior, and that estimate error-free dynamic range and computational efficiency, enabling quantitative comparison of simulation performance across different array configurations and computational runs. This is critical for optimizing the simulation workflow and validating result accuracy in large-scale antenna modeling. Additionally, the extended framework provides comprehensive S-parameter comparison tools that perform quantitative analysis between different antenna simulations, facilitating systematic evaluation of design variations and isolation performance characteristics.
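A hedged sketch of the comparison idea, assuming two runs have been exported as Touchstone files (the file names and the use of scikit-rf are assumptions, not part of the thesis framework):

```python
# Hedged sketch of an S-parameter comparison between two simulation runs,
# assuming results were exported as Touchstone files (file names are hypothetical).
import numpy as np
import skrf as rf

ref = rf.Network("array_baseline.s4p")     # reference configuration
new = rf.Network("array_variant.s4p")      # modified configuration

# Per-port-pair RMS deviation in dB across the common frequency axis.
ref_db = 20 * np.log10(np.abs(ref.s))      # ref.s has shape (freq, ports, ports)
new_db = 20 * np.log10(np.abs(new.s))
rmse = np.sqrt(np.mean((new_db - ref_db) ** 2, axis=0))
print("worst-case deviation [dB]:", rmse.max())
```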
ML assisted circuit design using active learning
(2026) Bark, Omar
This thesis explores the use of active learning (AL) to reduce the amount of training data required by machine learning (ML) models for circuit design, by selectively sampling the most informative data points. An uncertainty-based AL approach was implemented, which leverages the model's prediction uncertainty to decide which data points to sample next. A convolutional neural network (CNN) is trained to predict scattering parameters (S-parameters) from pixelated representations of 2-port passive microwave circuits, allowing the ML model to act as a fast surrogate for electromagnetic (EM) solvers. The goal of this project is to speed up the training process for the ML models using AL.
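The general shape of an uncertainty-based AL loop is sketched below; this is not the thesis code, and a random-forest regressor stands in for the CNN surrogate so that the disagreement between its trees can serve as a cheap uncertainty proxy. All data shapes and round sizes are arbitrary.

```python
# Illustrative uncertainty-based active-learning loop (not the thesis code).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X_pool = rng.random((2000, 64))            # stand-in for pixelated circuit layouts
y_pool = np.sin(X_pool.sum(axis=1))        # stand-in for a scalar S-parameter target

labeled = list(range(50))                  # small initial training set
for _ in range(10):                        # AL rounds
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(X_pool[labeled], y_pool[labeled])

    # Uncertainty = spread of the individual trees' predictions on the pool.
    per_tree = np.stack([t.predict(X_pool) for t in model.estimators_])
    uncertainty = per_tree.std(axis=0)
    uncertainty[labeled] = -np.inf         # never re-select already-labeled points

    # Query the most informative (most uncertain) points next.
    labeled.extend(np.argsort(uncertainty)[-25:].tolist())
```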
The performance of the AL-based model is compared to a baseline model trained
using random sampling. Evaluation is conducted on a fixed test set, as well as across
different frequency ranges and S-parameters. Results show that AL consistently outperforms
the baseline in terms of root mean square error (RMSE), particularly at
higher frequencies where EM behavior becomes more complex.
Ensemble models, i.e., collections of individual models whose predictions are combined to improve overall performance and robustness, were also investigated to assess their potential for improving the sampling strategy. However, they did not yield better results, and each ensemble run required over two weeks of computation, limiting further experimentation.
Finally, the models were tested in a design task where a genetic algorithm generated
circuits from targeted S-parameters. The AL model achieved a 32.9% lower mean
RMSE than the baseline when comparing predicted and simulated S-parameters.
These findings highlight AL as a promising approach for improving data efficiency
in ML-based circuit design.
Reducing MPI Communication Latency with FPGA-Based Hardware Compression
(2025) Bourbia, Anis
High-performance computing (HPC) clusters face significant communication overhead in distributed deep learning, where frequent data exchanges via the Message Passing Interface (MPI) can bottleneck overall training. This thesis explores an
FPGA-based hardware compression approach to reduce MPI communication latency. We prototype the integration of an FPGA compression module into the MPI stack, enabling on-the-fly compression of message payloads using the fast lossless algorithms LZ4, Snappy, and Zstd. This hardware-accelerated compression offloads work from CPUs/GPUs and shrinks data volume before network transmission, thereby speeding up inter-node communication. In our evaluation, LZ4/Snappy/Zstd achieved
compression ratios of 1.53x/1.51x/1.84x and reduced communication time by 34.6%, 33.8%, and 45.7%, yielding overall training speedups of 1.34x, 1.32x, and 1.50x, respectively. Experimental evaluation on representative deep learning workloads
demonstrates up to a 1.50x improvement in end-to-end training time with the FPGA compression enabled. Among the tested compressors, Zstd achieved the highest compression ratio, translating to the greatest latency reduction and performance gain. These results highlight that FPGA-based compression can substantially improve throughput in distributed training by alleviating network delays, with negligible added overhead. The proposed method offers a practical path to accelerate HPC communications and scale deep learning workloads more efficiently.
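To make the compress-before-send pattern concrete, here is a small software analogue using mpi4py and the lz4 package; the thesis performs the compression step on an FPGA inside the MPI stack, whereas this sketch runs LZ4 on the CPU and is purely illustrative.

```python
# CPU-side analogue of compressing an MPI payload before transmission
# (the thesis offloads this step to an FPGA). Run with: mpirun -n 2 python demo.py
import numpy as np
import lz4.frame
from mpi4py import MPI

comm = MPI.COMM_WORLD
if comm.Get_rank() == 0:
    grads = np.zeros(1 << 20, dtype=np.float32)   # stand-in gradient tensor;
    grads[::16] = 1.0                             # sparse-ish data compresses well
    payload = lz4.frame.compress(grads.tobytes()) # shrink bytes before the wire
    comm.send(payload, dest=1, tag=0)
    print(f"compression ratio: {grads.nbytes / len(payload):.2f}x")
elif comm.Get_rank() == 1:
    payload = comm.recv(source=0, tag=0)
    grads = np.frombuffer(lz4.frame.decompress(payload), dtype=np.float32)
```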
Full System-Level Simulation of Neural Compute Architectures
(2025) Kalamkar, Arjun
The proliferation of large-scale artificial intelligence models necessitates specialized hardware like Neural Processing Units (NPUs) to achieve efficient computation. However, an NPU’s real-world performance is deeply influenced by system-level
effects that are often overlooked. Existing simulation tools typically lack the ability to model a detailed NPU microarchitecture within a full-system context, obscuring critical performance bottlenecks arising from the operating system, device drivers, and memory contention. This thesis introduces gem5-fsnpu, a novel simulation framework that bridges this gap by integrating a reconfigurable, transaction-level cycle-accurate NPU model into the gem5 full-system simulator [1], [2]. The framework includes a complete, vertically-integrated software stack, featuring a custom Linux driver and a user-space library with an intelligent, hardware-aware tiling algorithm, enabling realistic hardware-software co-design studies. We demonstrate the framework’s capabilities through a comprehensive Design Space Exploration, evaluating NPU performance on benchmarks including general matrix multiplication (GEMM) and complex Transformer layers like Multi-Head Attention (MHA). Architectural parameters such as systolic array dimensions (2D vs. 3D), on-chip memory size, and dataflow are systematically varied. The results reveal that system-level overheads are frequently the dominant performance bottleneck. For
instance, the framework shows how, for command-intensive workloads like MHA, the software control-path latency can eclipse the hardware computation time, becoming the primary performance limiter. The study also quantifies the critical relationship between on-chip memory size and software tiling efficiency, demonstrating that an undersized memory can nullify the benefits of a powerful compute core. This work validates the necessity of full-system simulation for accelerator design and provides a powerful tool for researchers, proving that a holistic, hardware-software co-design approach is paramount to achieving efficient AI acceleration.
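The "hardware-aware tiling" idea can be illustrated with a small, hypothetical helper that grows the GEMM tile until the A, B, and C blocks no longer fit in the on-chip buffer; the function name, parameters, and 16-bit element size are assumptions, not the algorithm actually used in gem5-fsnpu.

```python
# Hypothetical sketch of a hardware-aware tile choice for a GEMM offloaded to
# an NPU with a fixed on-chip buffer (illustrative only; 16-bit elements and a
# 16x16 processing-element array are assumed).
def pick_tile(m, n, k, sram_bytes, elem_bytes=2, pe_dim=16):
    """Largest tile whose A (tile x k), B (k x tile) and C (tile x tile) blocks fit on chip."""
    tile = pe_dim                                # start at one PE-array worth of rows/cols
    while True:
        nxt = tile + pe_dim                      # grow in multiples of the PE array
        footprint = (nxt * k + k * nxt + nxt * nxt) * elem_bytes
        if footprint > sram_bytes or nxt > max(m, n):
            return tile
        tile = nxt

# e.g. a 1024x1024x1024 GEMM with a 256 KiB on-chip buffer
print(pick_tile(1024, 1024, 1024, sram_bytes=256 * 1024))
```

An undersized buffer forces small tiles, so each output block re-reads the same operand data many times, which is one way the memory-size vs. tiling-efficiency relationship noted above can show up.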
Evaluating Guest Isolation on a Hypervised System
(2025) Asp, Agnes; Karlsson, Alfred
As mixed-criticality systems become more prevalent in the automotive domain, virtualization has emerged as a promising solution to reduce system complexity and improve cost-efficiency. This thesis investigates the ability of hypervisors to maintain temporal isolation between virtual machines (VMs) under conditions that simulate disturbances. Two general-purpose hypervisors, Xen and QEMU/KVM, are evaluated on an ARM-based Raspberry Pi 4B using ZephyrOS as a Real-Time Operating System (RTOS) in both measurer and stressor roles. A test framework was developed to benchmark low-level latency operations and application-level performance using adapted MiBench workloads (Qsort and Basicmath), and long-term scheduling behavior through thread metrics. Performance metrics were collected under various configurations, including stressed and unstressed scenarios across different CPU core assignments. The results show that while both hypervisors provide a baseline level of temporal isolation, their behaviors diverge under stress. QEMU/KVM generally demonstrates better raw performance and responsiveness, whereas Xen offers more predictable behavior in specific scheduling configurations. These findings underscore the trade-offs involved in selecting a hypervisor for real-time automotive applications and contribute to a broader understanding of how virtualization affects temporal determinism in embedded systems.
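As a loose, host-side analogue of the latency benchmarking idea (the thesis's actual measurer runs as a Zephyr RTOS guest, and the period and sample count below are arbitrary), one can record how late periodic sleeps wake up and look at the spread of that lateness under load:

```python
# Host-side analogue of a periodic-wakeup jitter measurement (illustrative only;
# the thesis's measurer workload runs inside a Zephyr guest, not in Python).
import time
import statistics

PERIOD_S = 0.001               # request a 1 ms sleep each iteration
lateness_ns = []
for _ in range(2_000):
    t0 = time.perf_counter_ns()
    time.sleep(PERIOD_S)       # scheduler decides when we actually wake up
    lateness_ns.append(time.perf_counter_ns() - t0 - int(PERIOD_S * 1e9))

print(f"max oversleep: {max(lateness_ns)} ns, "
      f"stdev: {statistics.pstdev(lateness_ns):.0f} ns")
```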
