Pharmaceutical assay search with AI

dc.contributor.author: Alladin, Ali
dc.contributor.department: Chalmers tekniska högskola / Institutionen för matematiska vetenskaper (Chalmers University of Technology / Department of Mathematical Sciences)
dc.contributor.examiner: Jonasson, Johan
dc.contributor.supervisor: Jonasson, Johan
dc.date.accessioned: 2024-11-28T12:42:44Z
dc.date.available: 2024-11-28T12:42:44Z
dc.date.issued: 2024
dc.date.submitted:
dc.description.abstract: Retrieving historical assay data in pharmaceutical research is often restricted by reliance on specific metadata, which overlooks the contextual information in the associated protocol documents. This thesis investigates the potential of using these plain-English protocol documents together with Natural Language Processing (NLP) techniques to implement semantic search for assays. A baseline TF-IDF model and the Transformer models BERT, SBERT, and Longformer were used to compute embeddings of the documents in a corpus of historical protocols. Their performance in retrieving relevant historical protocols was evaluated against key technical criteria; the TF-IDF models and BERT with the chunking technique gave the best results. However, the limited scope of the evaluation introduces some uncertainty into the findings and highlights the need for more rigorous validation. Nevertheless, the conclusions suggest that integrating NLP-driven semantic search could reduce the time and manual effort required for assay retrieval, although the current approach may need further refinement before practical use. These insights provide a promising foundation for developing AI-powered search systems for pharmaceutical texts.
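To make the TF-IDF baseline mentioned in the abstract concrete, here is a minimal sketch of TF-IDF retrieval over protocol texts, ranked by cosine similarity. It is an illustration under stated assumptions, not the thesis's implementation: the regex tokenizer, the smoothed IDF formula (the same form scikit-learn uses by default), the functions `tokenize`, `tfidf_vectors`, `cosine`, and `search`, and the protocol snippets are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical, deliberately simple tokenizer: lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list[str]) -> tuple[list[dict[str, float]], dict[str, float]]:
    # Build sparse TF-IDF vectors (term -> weight) for every document.
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Assumed smoothed IDF: log((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)  # raw term counts
        vectors.append({t: count * idf[t] for t, count in tf.items()})
    return vectors, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Cosine similarity between two sparse vectors; 0.0 if either is empty.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def search(query: str, docs: list[str], top_k: int = 3) -> list[tuple[int, float]]:
    # Return (document index, score) pairs, best match first.
    vectors, idf = tfidf_vectors(docs)
    # Query terms never seen in the corpus carry no weight.
    q_vec = {t: c * idf[t] for t, c in Counter(tokenize(query)).items() if t in idf}
    scored = [(cosine(q_vec, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda s: (-s[0], s[1]))  # score descending, index as tie-break
    return [(i, score) for score, i in scored[:top_k]]


# Hypothetical protocol snippets, for illustration only.
protocols = [
    "Cell viability assay using MTT reagent in HepG2 cells, absorbance read at 570 nm",
    "Kinase inhibition assay measuring IC50 with ATP at Km concentration",
    "Plasma protein binding determined by equilibrium dialysis",
]
print(search("IC50 kinase inhibition", protocols, top_k=1))
```

Here the query shares terms only with the second snippet, so index 1 is returned first. A Transformer-based variant would typically keep the same ranking step but replace the sparse TF-IDF vectors with dense embeddings; with BERT, a long protocol would first be split into chunks that fit its input limit, and the chunk embeddings combined.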
dc.identifier.coursecode: MVEX03
dc.identifier.uri: http://hdl.handle.net/20.500.12380/309014
dc.language.iso: eng
dc.setspec.uppsok: PhysicsChemistryMaths
dc.subject: Pharmaceutical texts, Assays, Semantic Textual Similarity (STS), Artificial Intelligence (AI), Natural Language Processing (NLP), Large Language Model (LLM), TF-IDF, BERT, SBERT and Longformer
dc.title: Pharmaceutical assay search with AI
dc.type.degree: Master's Thesis
dc.type.uppsok: H
local.programme: Data science and AI (MPDSC), MSc
Original bundle
Name: Master_Thesis_Ali_Alladin_2024.pdf
Size: 1.89 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 2.35 KB
Format: Item-specific license agreed upon to submission