Exploring Image-to-Text Visual Search Using Open Source Models
Published
Author
Type
Master's Thesis
Model builder
Journal Title
ISSN
Volume Title
Publisher
Abstract
Visual search refers to performing a search using visual data, typically images,
rather than textual input. Most visual search implementations rely on similarity
search over image features, in which a user-submitted query image is compared
against the features of all searchable entries before sufficiently similar results
are returned. This thesis explores a different method that uses image descriptions
generated by vision-language models instead of image features; the descriptions are
converted into embeddings in order to match against other search entries. Evaluation
data indicate that the method can provide satisfactory retrieval performance while
maintaining a low search query execution time, provided that an adequate
vision-language model is employed and sufficient server capacity is available.
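
As a concrete illustration of the pipeline the abstract describes, the sketch below captions an image with an off-the-shelf vision-language model, embeds the caption, and ranks catalog entries by cosine similarity. The specific models (BLIP and all-MiniLM-L6-v2) and the file names are illustrative assumptions, not the configuration evaluated in the thesis.

```python
# Minimal sketch of description-based visual search: image -> caption ->
# embedding -> similarity ranking. Model choices here are assumptions.
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration
from sentence_transformers import SentenceTransformer, util

# Vision-language model that generates a textual description of an image.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
captioner = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base"
)

# Text embedding model that maps descriptions into a shared vector space.
embedder = SentenceTransformer("all-MiniLM-L6-v2")

def describe(image_path: str) -> str:
    """Generate an image description with the vision-language model."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    output_ids = captioner.generate(**inputs, max_new_tokens=40)
    return processor.decode(output_ids[0], skip_special_tokens=True)

# Index phase: describe and embed all searchable entries once, offline.
catalog_paths = ["item1.jpg", "item2.jpg"]  # hypothetical catalog images
catalog_embeddings = embedder.encode([describe(p) for p in catalog_paths])

# Query phase: only the query image is captioned and embedded at search
# time, then ranked against the precomputed catalog embeddings.
query_embedding = embedder.encode(describe("query.jpg"))
scores = util.cos_sim(query_embedding, catalog_embeddings)[0]
best = scores.argmax().item()
print(f"Best match: {catalog_paths[best]} (score {scores[best].item():.3f})")
```

Because the expensive captioning and embedding of catalog entries happens at index time, the per-query cost reduces to one caption, one text embedding, and a vector similarity lookup, which is consistent with the low query execution times reported in the abstract.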
Description
Subject/Keywords
visual search, machine learning, deep learning, embedding, vision-language model, transformer, e-commerce
