Key Sentence Extraction From CRISPR-Cas9 Articles Using Sentence Transformers
Examensarbete för masterexamen
Computer science – algorithms, languages and logic (MPALG), MSc
Stranden Lae, Brage
The annotation of CRISPR-related articles and extraction of key content has traditionally relied on manual efforts. Manual annotation is error-prone and timeconsuming. This thesis presents an alternative approach using transfer learning and pre-trained models based on the Transformer architecture. Specifically, Sentence Transformer models are fine-tuned using a CRISPR-related dataset. The dataset contains articles and key sentences, enabling automatic extraction of keyphrases. The study explores various modifications to the models and data to enhance performance for this task. The results demonstrate the effectiveness of fine-tuning Sentence Transformer models for keyphrase extraction, achieving an Average R-precision of 90.4 %. Future research could focus on alternative approaches or further automation to identify entities and relations within key sentences. Key sentence extraction is complex due to the varying definitions of key content, content location, and specific use cases. However, the potential benefits of time savings and improved workflow efficiency make this approach highly valuable.
NLP , Transformers , CRISPR , semantic search , keyphrase extraction