Cross-modal image feature matching between infrared and visual images. Adapting intra-modal feature matching models for cross-modal matching
Type
Master's Thesis
Abstract
Image feature matching is an essential component of many computer vision applications. Many modern solutions apply machine learning techniques to achieve state-of-the-art results. A less studied problem is matching image features between images of different modalities. This thesis investigates this problem for the visual–LWIR (long-wave infrared) case by utilizing the matching capabilities of the pre-trained intra-modal models SuperPoint and SuperGlue. This is done by adding interfacing models and additional layers to mitigate problems such as catastrophic forgetting and dataset bias in the pre-trained models. These techniques prove only marginally successful compared to the pre-trained models themselves. For training these models, a method for generating sparse pseudo-ground-truth point correspondences is proposed, and evaluation is done via pose estimation. This thesis provides insight into specific methods of transfer learning for the SuperPoint and SuperGlue models and for ground-truth estimation, and discusses the difficulties this problem poses. Further study may yield improved models for LWIR–visual matching, which would enable more reliable methods for cross-modal camera calibration and registration, localization, and image retrieval, with numerous applications in the automotive, defense, and healthcare industries.
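The transfer-learning setup summarized above can be illustrated with a short sketch: the pre-trained network's weights are frozen and a small trainable interfacing layer is placed in front of it, which is one common way to limit catastrophic forgetting. The following is a minimal PyTorch sketch under those assumptions; ModalAdapter and build_adapted_model are illustrative names, not code from the thesis.

    # Minimal sketch of the interfacing-layer idea: freeze the pre-trained
    # model and train only a small adapter in front of it. Illustrative only.
    import torch
    import torch.nn as nn

    class ModalAdapter(nn.Module):
        """Small trainable layer placed before a frozen pre-trained model."""
        def __init__(self, channels: int = 1):
            super().__init__()
            # Shallow convolutional stack intended to map LWIR input toward
            # the statistics the visual-domain model was trained on.
            self.net = nn.Sequential(
                nn.Conv2d(channels, 16, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(16, channels, kernel_size=3, padding=1),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Residual connection keeps the adapter near identity at init.
            return x + self.net(x)

    def build_adapted_model(pretrained: nn.Module) -> nn.Module:
        # Freeze every pre-trained parameter; only the adapter gets gradients,
        # so the pre-trained matching behavior cannot be forgotten.
        for p in pretrained.parameters():
            p.requires_grad = False
        return nn.Sequential(ModalAdapter(channels=1), pretrained)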
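Likewise, evaluation via pose estimation typically means recovering the relative camera pose from the matched keypoints and comparing it against a known reference pose. Below is a hedged sketch using OpenCV's standard essential-matrix pipeline; the function names and the assumption that camera intrinsics K are available are this page's illustration, not the thesis's exact procedure.

    # Sketch of pose-estimation-based evaluation from matched keypoints.
    import cv2
    import numpy as np

    def relative_pose_from_matches(pts_visual: np.ndarray,
                                   pts_lwir: np.ndarray,
                                   K: np.ndarray):
        """pts_* are (N, 2) pixel coordinates of matched features."""
        # Estimate the essential matrix with RANSAC to reject outlier matches.
        E, inliers = cv2.findEssentialMat(pts_visual, pts_lwir, K,
                                          method=cv2.RANSAC, threshold=1.0)
        # Decompose into a rotation and a unit-scale translation.
        _, R, t, _ = cv2.recoverPose(E, pts_visual, pts_lwir, K, mask=inliers)
        return R, t

    def rotation_error_deg(R_est: np.ndarray, R_gt: np.ndarray) -> float:
        # Angle of the residual rotation R_est^T @ R_gt, a common pose metric.
        cos = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
        return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))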
Keywords: feature matching, deep learning, computer vision, pose estimation, multimodal, infrared imaging, graph neural networks.