Pharmaceutical assay search with AI

Type
Master's Thesis
Program
Data science and AI (MPDSC), MSc
Published
2024
Author
Alladin, Ali
Abstract
Retrieval of historical assay data in pharmaceutical research often relies on specific metadata fields and overlooks the contextual information in the associated protocol documents. This thesis investigates whether these plain-English protocol documents, combined with Natural Language Processing (NLP) techniques, can be used to implement semantic search for assays. A baseline TF-IDF model and the Transformer models BERT, SBERT, and Longformer were used to embed protocol documents from a corpus of historical protocols. Their performance in retrieving relevant historical protocols was evaluated against key technical criteria; the TF-IDF models and BERT with the chunking technique showed the best results. However, the limited scope of the evaluation introduces some uncertainty into these findings and highlights the need for more rigorous validation. Nevertheless, the conclusions suggest that NLP-driven semantic search could reduce the time and manual effort required for assay retrieval, although the current approach may need further refinement before practical use. These insights are a promising foundation for developing AI-powered search systems for pharmaceutical texts.
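The retrieval idea summarized in the abstract (embed each protocol, embed the query, rank by similarity) can be illustrated with a minimal sketch. This is not the thesis implementation: the toy protocol texts, the function names, the chunk sizes, and the choice to average chunk vectors are all assumptions made for illustration, and a simple TF-IDF weighting stands in for the encoder so the example runs with the Python standard library only. With a Transformer such as BERT, the `encode` callable would be replaced by the model, and chunking would keep each piece within the model's input limit.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk(tokens, size, overlap):
    # Split a token list into overlapping windows of at most `size` tokens.
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]


def fit_idf(docs_tokens):
    # Smoothed inverse document frequency over the corpus.
    n = len(docs_tokens)
    df = Counter(t for doc in docs_tokens for t in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_vector(tokens, idf):
    # Sparse TF-IDF vector; tokens unseen in the corpus are ignored.
    counts = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


def pooled_vector(tokens, encode, size=128, overlap=32):
    # Assumed aggregation: average the vectors of all chunks.
    chunks = chunk(tokens, size, overlap)
    sums = Counter()
    for c in chunks:
        for t, v in encode(c).items():
            sums[t] += v
    return {t: v / len(chunks) for t, v in sums.items()}


def cosine(a, b):
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def search(query, corpus, top_k=3):
    # Rank corpus documents by cosine similarity to the query.
    docs_tokens = [tokenize(d) for d in corpus]
    idf = fit_idf(docs_tokens)
    encode = lambda toks: tfidf_vector(toks, idf)
    doc_vecs = [pooled_vector(t, encode) for t in docs_tokens]
    q = encode(tokenize(query))
    scored = sorted(((cosine(q, v), i) for i, v in enumerate(doc_vecs)), reverse=True)
    return [(i, s) for s, i in scored[:top_k]]


# Hypothetical toy protocols, not taken from the thesis corpus.
corpus = [
    "Kinase inhibition assay measuring luminescence after compound incubation",
    "Cell viability assay using a fluorescent dye in 384-well plates",
    "Binding assay with radiolabelled ligand and membrane preparation",
]
results = search("luminescence kinase inhibition", corpus)
print(results)
```

Each result is an `(index, score)` pair, so `results[0][0]` is the position of the best-matching protocol in `corpus`.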
Subject/keywords
Pharmaceutical texts, Assays, Semantic Textual Similarity (STS), Artificial Intelligence (AI), Natural Language Processing (NLP), Large Language Model (LLM), TF-IDF, BERT, SBERT, Longformer