Pharmaceutical assay search with AI

Type

Master's Thesis

Abstract

Retrieving historical assay data in pharmaceutical research is often limited by reliance on specific metadata, which overlooks the contextual information in the associated protocol documents. This thesis investigates whether these plain-English protocol documents, combined with Natural Language Processing (NLP) techniques, can be used to implement semantic search for assays. A baseline TF-IDF model and the Transformer models BERT, SBERT, and Longformer were used to generate embeddings of protocol documents from a corpus of historical protocols. Their performance in retrieving relevant historical protocols was evaluated against key technical criteria; the TF-IDF models and BERT with the chunking technique showed the best results. However, limitations in the evaluation scope introduce some uncertainty into the findings, highlighting the need for more rigorous validation. Nevertheless, the conclusions suggest that integrating NLP-driven semantic search systems could reduce the time and manual effort required for assay retrieval, although the current approach may need further refinement for practical application. These insights provide a promising foundation for developing AI-powered search systems for pharmaceutical texts.
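
The retrieval idea summarized above can be illustrated with a minimal sketch of a TF-IDF baseline: each protocol is turned into a weighted term vector, and protocols are ranked by cosine similarity to a free-text query. This is not the thesis implementation. The whitespace tokenizer, the smoothed IDF formula, the function names, and the example protocol texts are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return sparse TF-IDF vectors (dict: term -> weight) and the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (assumption), so that no corpus term gets weight zero.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: f * idf[t] for t, f in tf.items()})
    return vectors, idf


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, docs, top_k=3):
    """Rank documents by similarity to the query; returns (index, score) pairs."""
    vectors, idf = tfidf_vectors(docs)
    q_tf = Counter(tokenize(query))
    # Query terms that never occur in the corpus are ignored.
    q_vec = {t: f * idf[t] for t, f in q_tf.items() if t in idf}
    scored = [(cosine(q_vec, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(i, score) for score, i in scored[:top_k]]


# Hypothetical protocol snippets, invented for this example.
protocols = [
    "fluorescence binding assay measuring kinase inhibition in buffer",
    "cell viability assay using luminescence readout after compound incubation",
    "kinase inhibition assay with radiometric readout",
]
results = search("luminescence cell viability", protocols, top_k=2)
```

In this toy corpus only the second protocol shares terms with the query, so it is ranked first. Transformer models such as BERT have a limited input length, which is why long protocols are typically split into chunks before embedding; that step is not shown here.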

Subject/keywords

Pharmaceutical texts, Assays, Semantic Textual Similarity (STS), Artificial Intelligence (AI), Natural Language Processing (NLP), Large Language Model (LLM), TF-IDF, BERT, SBERT and Longformer
