Methods for Optimizing BERT Model on Edge Devices - Accelerating Biomedical NLP with Pruned and Quantized BERT Models

Type

Master's Thesis

Abstract

Named-entity recognition (NER) of clinical efficacy endpoints in oncology abstracts supports downstream discovery pipelines at AstraZeneca. However, the fine-tuned transformer models currently in use are too slow and overparameterized for large-scale CPU deployment. This thesis evaluates whether post-training model compression can accelerate inference without retraining and without harming extraction quality. In the first stage, standard BERT and BioBERT were each pruned with a three-stage, Fisher-guided structured pruning workflow at three sparsity levels. In the second stage, dynamic 8-bit integer quantization with ONNX Runtime was applied to standard BERT, BioBERT, and DistilBERT. In the third stage, pruning and quantization were combined to further optimize the pre-trained standard BERT and BioBERT transformers. Experiments were run on annotated MEDLINE sentences covering 25 efficacy labels, with F1 score and per-sample inference latency as the primary metrics. A 25% structured-sparsity level yielded no measurable drop in F1 score, and the additional dynamic 8-bit integer quantization step cut latency further. The best configuration, 25%-pruned BioBERT with 8-bit integer quantization, reduced mean CPU inference time from 32.52 ms to 12.02 ms (2.6-fold speed-up), while accuracy fell only from 0.982 to 0.980 and F1 score from 0.954 to 0.948. Post-training structured pruning combined with dynamic 8-bit integer quantization thus makes the oncology NER pipeline about 2.6 times faster on standard CPUs, with only a marginal loss in extraction quality and no need for special hardware or libraries.
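
As a rough illustration of the 8-bit integer quantization step mentioned in the abstract, the sketch below maps a small list of floating-point weights to signed 8-bit integers and back. It is a minimal sketch under stated assumptions, not the thesis's implementation and not ONNX Runtime's internal algorithm: the symmetric per-tensor scheme, the function names `quantize_int8` and `dequantize`, and the weight values are all hypothetical and chosen for brevity.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to signed 8-bit integers (assumed scheme)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # Map the largest magnitude to 127; use 1.0 for an empty or all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the symmetric int8 range.
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate floating-point values from the int8 codes."""
    return [q * scale for q in quantized]


# Hypothetical weight values, for illustration only.
weights = [0.42, -1.27, 0.003, 0.9, -0.55]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error per weight is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each weight is stored in one byte instead of four, and the reconstruction error stays within half of `scale`. This is why an 8-bit representation can preserve quality metrics such as F1 score when the weight distribution is well behaved.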

Subject/keywords

Natural language processing, Named entity recognition, Post-training quantization, Structured pruning, Model compression
