Similarity-Based Patent Selection using Natural Language Processing

Type
Master's thesis
Programme
Computer science – algorithms, languages and logic (MPALG), MSc
Published
2021
Author
Aliyev, Elmar
Abstract
Many companies devote substantial resources and effort to R&D activities to keep themselves informed of the latest advances in technology. As patent data is the world’s largest technology repository, it is frequently used by technology managers for this purpose. Patent analysis of this kind usually involves much manual work, for example collecting patents to represent technology fields. It has been observed that creating such patent sets is the most critical part, since poor patent selection leads to biased results no matter how well the analysis is performed. The manual nature of the process, however, makes the quality of the patent selection questionable. This thesis studied the subject and proposed a novel method (called “SBPS”) that assists users in building effective queries and, based on these queries, finds representative patents for technology fields. The proposed method is divided into three main stages: query building, similarity calculation, and threshold finding. The first stage suggests synonyms for the terms in the user’s query using trained word embeddings. The second stage employs a keyword extraction algorithm to calculate document vectors and the cosine similarity measure to rank documents by similarity to the query. The third stage requires adjusting the similarity threshold within the range 0 to 1. This manual step lets users define the degree of patent relatedness to the query. To evaluate the method, four technology battles were studied from the viewpoint of their development history and compared with the histograms and growth curves extracted for the corresponding technologies using the SBPS method. The comparative analysis showed significant agreement between the historical events and the graphs, demonstrating the potential of the proposed method.
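The second and third stages described in the abstract (ranking by cosine similarity, then keeping documents above a threshold) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names `cosine_similarity` and `select_patents`, the toy three-dimensional vectors, and the threshold value 0.4 are all assumptions made for illustration, and the vectors are assumed to hold non-negative keyword weights so that scores fall between 0 and 1.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def select_patents(query_vec, doc_vecs, threshold):
    """Return (doc_id, score) pairs with score >= threshold, best first.

    Hypothetical helper: doc_vecs maps a patent id to its document vector.
    """
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    selected = [(doc_id, score) for doc_id, score in scored if score >= threshold]
    return sorted(selected, key=lambda pair: pair[1], reverse=True)


# Toy data (illustrative only): one query vector and three document vectors.
query = [1.0, 0.0, 1.0]
docs = {
    "P1": [1.0, 0.0, 1.0],  # same direction as the query
    "P2": [1.0, 1.0, 0.0],  # partially overlapping
    "P3": [0.0, 1.0, 0.0],  # no overlap with the query
}
result = select_patents(query, docs, threshold=0.4)
```

Raising the threshold toward 1 keeps only patents closely related to the query, while lowering it admits more loosely related ones, which is the trade-off the manual third stage leaves to the user.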
Subject/keywords
Patent, NLP, word2vec, similarity, technology, patent search, technology watch