Exploring Image-to-Text Visual Search Using Open Source Models

dc.contributor.author: Liu, Tommy
dc.contributor.department: Chalmers University of Technology / Department of Electrical Engineering
dc.contributor.examiner: Häggström, Ida
dc.contributor.supervisor: Häggström, Ida
dc.contributor.supervisor: Dahlin, Albert
dc.date.accessioned: 2026-02-09T13:23:57Z
dc.date.issued: 2026
dc.date.submitted:
dc.description.abstract: Visual search refers to searching with visual data, typically images, rather than textual input. Most visual search implementations perform similarity search over image features: a user-submitted query image is compared against the features of all searchable entries, and sufficiently similar results are returned. This thesis explores a different method that uses image descriptions generated by vision-language models instead of image features; the descriptions are converted into embeddings in order to match them against other search entries. Evaluation data indicate that the method can provide satisfactory retrieval performance while maintaining a low search query execution time, provided that an adequate vision-language model is employed and sufficient server capacity is available.
dc.identifier.coursecode: EENX30
dc.identifier.uri: http://hdl.handle.net/20.500.12380/310968
dc.language.iso: eng
dc.setspec.uppsok: Technology
dc.subject: visual search
dc.subject: machine learning
dc.subject: deep learning
dc.subject: embedding
dc.subject: vision-language model
dc.subject: transformer
dc.subject: e-commerce
dc.title: Exploring Image-to-Text Visual Search Using Open Source Models
dc.type.degree: Examensarbete för masterexamen (sv)
dc.type.degree: Master's Thesis (en)
dc.type.uppsok: H
local.programme: Data science and AI (MPDSC), MSc
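
The method summarized in the abstract (describe each image in text, embed the descriptions, match by embedding similarity) can be illustrated with a minimal sketch. This is not the thesis implementation. Everything in it is an assumption made for illustration: describe_image is a hypothetical placeholder for a vision-language model call and returns a hard-coded caption, embed is a toy bag-of-words stand-in for a real text embedding model, and the catalog entries and file names are invented.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a text embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def describe_image(image_id: str) -> str:
    """Hypothetical placeholder for a vision-language model call.

    A real system would pass image pixels to a captioning model; here the
    description is hard-coded so the example runs without any model.
    """
    canned = {"query.jpg": "red leather handbag with gold buckle"}
    return canned[image_id]


def search(query_image: str, index: dict, top_k: int = 2) -> list:
    """Rank catalog entries by similarity between description embeddings."""
    query_vec = embed(describe_image(query_image))
    ranked = sorted(index, key=lambda name: cosine(query_vec, index[name]), reverse=True)
    return ranked[:top_k]


# Invented catalog descriptions, embedded once ahead of query time.
catalog = {
    "bag_01": "red leather handbag with silver zipper",
    "shoe_07": "black running shoe with white sole",
    "bag_02": "blue canvas backpack with leather straps",
}
index = {name: embed(text) for name, text in catalog.items()}

results = search("query.jpg", index)
print(results)
```

Because the catalog descriptions are embedded ahead of time, the per-query work in this sketch is one description, one embedding, and a similarity ranking. In a real system the description step is a model inference, which is presumably where the abstract's conditions on model quality and server capacity come in.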

Download

Original bundle

Name: Thesis_Final_Report-Tommy_Liu.pdf
Size: 2.14 MB
Format: Adobe Portable Document Format
