Similarity-Based Patent Selection using Natural Language Processing
Type
Master's thesis
Programme
Computer science – algorithms, languages and logic (MPALG), MSc
Published
2021
Author
Aliyev, Elmar
Abstract
Many companies spend a lot of resources and put significant effort into R&D activities
to keep themselves informed of the latest advances in technology. As patent
data is the world’s largest technology repository, it is frequently utilized by technology
managers for this purpose. Patent analysis of this kind usually involves
much manual work, for example collecting patents to represent technology fields.
It has been observed that creating such patent sets is the most critical part, since
poor patent selection leads to biased results, no matter how well the analysis
is performed. The manual nature of this step, however, makes the quality of the
patent selection process questionable.
This thesis studied the subject and proposed a novel method (called “SBPS”)
that assists users in building effective queries and, based on these queries, finds
representative patents for technology fields.
The proposed method is divided into three main stages: query building,
similarity calculation, and threshold finding. The essence of the first stage is
suggesting synonyms for the terms in the user’s query through the use of trained
word embeddings. The second stage employs a keyword extraction algorithm to compute
document vectors and the cosine similarity measure to rank documents by their
similarity to the query. The third stage requires adjusting the similarity
threshold within the range 0 to 1. This manual step lets users define
the degree of patent relatedness to the query.
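The second and third stages can be illustrated with a minimal sketch. It is not the thesis implementation: plain term counts stand in for the keyword-extraction-based document vectors, and the function names, patent identifiers, texts, and threshold value are all hypothetical, chosen only to show how cosine-similarity ranking and a user-set threshold might fit together.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts; an assumed stand-in for keyword-based document vectors."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_patents(query, patents, threshold):
    """Rank patents by similarity to the query; keep those at or above the threshold."""
    q = term_vector(query)
    scored = [(pid, cosine_similarity(q, term_vector(text)))
              for pid, text in patents.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(pid, score) for pid, score in scored if score >= threshold]


# Hypothetical toy corpus; identifiers and texts are invented for illustration.
patents = {
    "P1": "lithium ion battery electrode",
    "P2": "battery charging circuit",
    "P3": "wind turbine blade",
}

# The threshold (between 0 and 1) is the value the user would tune by hand.
selected = select_patents("lithium battery", patents, threshold=0.3)
for pid, score in selected:
    print(pid, round(score, 3))
```

Raising the threshold narrows the selection to patents closest to the query, while lowering it admits more loosely related ones.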
To evaluate the method, four technology battles were studied from the viewpoint
of their development history and compared with the histograms and growth curves
extracted for the corresponding technologies using the SBPS method. The results
of the comparative analysis showed significant agreement between the historical
events and the graphs and demonstrated the potential of the proposed method.
Subject/keywords
Patent, NLP, word2vec, similarity, technology, patent search, technology watch