Pharmaceutical assay search with AI
Type
Master's Thesis
Program
Data science and AI (MPDSC), MSc
Published
2024
Author
Alladin, Ali
Abstract
Retrieving historical assay data in pharmaceutical research is often restricted by
reliance on specific metadata, overlooking the contextual information in associated
protocol documents. This thesis investigates the potential of utilizing these plain
English protocol documents alongside Natural Language Processing (NLP) techniques
to implement semantic search for assays. A baseline TF-IDF model and the
Transformer models BERT, SBERT, and Longformer were used to compute embeddings
for each document in a corpus of historical protocols. Their performance in
retrieving relevant historical protocols was evaluated based on key technical criteria,
where the TF-IDF models and BERT using the chunking technique showed the best
results. However, limitations in the evaluation scope introduce some uncertainty
to the findings, highlighting the need for more rigorous validation. Nevertheless,
the conclusions suggest that integrating NLP-driven semantic search systems could
reduce the time and manual effort required for assay retrieval, even though the
current approach may need further refinement for practical application. These insights
provide a promising foundation for developing AI-powered search systems
for pharmaceutical texts.
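
The TF-IDF baseline mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names (`tokenize`, `tfidf_vectors`, `embed_query`, `search`), the smoothed IDF formula, and the tokenizer are all assumptions chosen for illustration. It only shows the general idea of embedding protocol texts as TF-IDF vectors and ranking them by cosine similarity to a free-text query.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphanumeric runs (a simplifying assumption).
    return re.findall(r"[a-z0-9]+", text.lower())


def _normalize(vec):
    # Scale a sparse vector to unit length so dot product equals cosine similarity.
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {term: w / norm for term, w in vec.items()}


def tfidf_vectors(docs):
    # Build one unit-length sparse TF-IDF vector per document.
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumed variant; other weightings are possible).
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append(_normalize({t: f * idf[t] for t, f in tf.items()}))
    return vectors, idf


def embed_query(query, idf):
    # Embed the query with the corpus IDF; terms unseen in the corpus are dropped.
    tf = Counter(t for t in tokenize(query) if t in idf)
    return _normalize({t: f * idf[t] for t, f in tf.items()})


def search(query, docs, top_k=3):
    # Return (document index, cosine similarity) pairs, best match first.
    vectors, idf = tfidf_vectors(docs)
    q = embed_query(query, idf)
    scored = [
        (i, sum(w * dv.get(t, 0.0) for t, w in q.items()))
        for i, dv in enumerate(vectors)
    ]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]
```

Calling `search("fluorescence kinase assay", protocols)` on a list of protocol strings returns the indices of the most similar protocols with their scores. A Transformer-based variant would replace `tfidf_vectors` and `embed_query` with a model's embedding step while keeping the same ranking logic.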
Subject/keywords
Pharmaceutical texts, Assays, Semantic Textual Similarity (STS), Artificial Intelligence (AI), Natural Language Processing (NLP), Large Language Model (LLM), TF-IDF, BERT, SBERT and Longformer