Dental Implant Technology Clustering and Technology Life-span Analysis Using Ontology-based Patent Intelligence

 
Master of Science Thesis in the Master Degree Programme, Biotechnology  

 
SEAN LONG HOANG 

 
Department of Chemical and Biological Engineering  

Division of Applied Surface Chemistry   

CHALMERS UNIVERSITY OF TECHNOLOGY 

Göteborg, Sweden, 2012 

  
Title: Dental Implant Technology Clustering and Technology Life-span Analysis Using 

Ontology-based Patent Intelligence 

Author: Sean Long Hoang 

 
© SEAN LONG HOANG, 2012 

 
Department of Applied Surface Chemistry  

Chalmers University of Technology 

SE-412 96 Göteborg 

Sweden 

 
Supervisor: Dr. Charles V. Trappey 

Department of Management Science 

National Chiao Tung University 

300, Hsinchu 

Taiwan 

 
Examiner: Professor Krister Holmberg 

Department of Chemical and Biological Engineering 

Division of Applied Surface Chemistry 

SE-412 96 Göteborg 

Sweden 

 
Göteborg, Sweden 2012 




 

Dental Implant Technology Clustering and Technology Life-span Analysis 

Using Ontology-based Patent Intelligence 

 
Student: Sean Long Hoang
Advisor: Dr. Charles V. Trappey, National Chiao Tung University
Examiner: Prof. Krister Holmberg, Chalmers University of Technology

ABSTRACT 

Rapid technology development, shorter product life cycles, and fierce competition in the
marketplace have made patent analysis an important strategic tool for R&D management.
This thesis develops a technology clustering and life-span analysis framework based on data
mining techniques to help companies effectively and rapidly gain domain-specific knowledge
and technology insight. Patent documents contain complex terminology, so patent analysis
normally requires experts. This research applies patent analysis methodologies to create
domain-specific ontologies. The advantage of using an ontology is that it contains specific
domain concepts and helps researchers understand the relationships between concepts. In
addition, ontologies are used to effectively extract domain knowledge, cluster patents, and
create graphs for trend recognition. Life-span analysis of technology clusters helps companies
gain a quick snapshot of their own patent portfolio and identify potential technology clusters
for investment.

This thesis proposes a process of knowledge extraction for domain-specific patents using
patent analysis methodologies that improve the understanding of domain knowledge. The
methodologies proposed in this research include key phrase analysis, patent technology
clustering, patent document clustering, domain-specific ontology, and life-span analysis. With
these methodologies, companies can quickly derive domain-specific ontologies that help R&D
engineers relate data and increase their understanding of a specific domain and of the
relationships between its concepts. Life-span analysis helps companies direct strategic R&D
plans and evaluate the timing of investments. The validity and reliability of the methodology
are tested by applying it to a set of dental implant patents.

Keywords: Dental implant, ontology, key phrase analysis, clustering, life-span analysis 


 

ACKNOWLEDGEMENTS

I would like to thank my advisor, Dr. Charles V. Trappey, for his guidance and advice, and
for helping me excel to a higher level. I deeply appreciate his sharing of knowledge and
experience, and his fun stories were always enjoyable. I would also like to thank a very kind
person, Dr. Chun-Yi Wu, for his guidance, time, and effort throughout my work on this
thesis. Without the IPDSS software developed by Dr. Wu, this thesis would not have been
possible.

I would also like to thank my family and friends! My time in Taiwan has been wonderful 

thanks to your kindness and hospitality.  

 
 “We make a living by what we get, but we make a life by what we give” 

By Sir Winston Churchill 

 
Sean Long Hoang 

Chalmers University of Technology 

Hsinchu, Taiwan  

December 2011  


 

TABLE OF CONTENTS 

 
ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
1. INTRODUCTION
1.1. Research background
1.2. Research motivation
1.3. Research procedure
1.4. Research objectives
1.5. Research limits
2. BACKGROUND
2.1. Dental implant
2.2. Data mining
2.2.1. Text mining
2.2.2. Limitations in text mining
2.3. Ontology
2.4. Patent analysis
2.4.1. Data in patent documents
2.4.2. Limitations in patent analysis
2.5. Patent clustering
2.6. Key phrase analysis
2.6.1. Term frequency approach
2.6.2. Key phrase correlation matrix
2.6.3. Key phrase and patent correlation matrix
2.6.4. Limitations in key phrases extraction of textual documents
2.7. Technology life cycle analysis
2.8. Research framework
3. METHODOLOGY
3.1. Patent domain definition
3.2. Key phrase analysis process
3.4. Processing domain-specific ontologies
3.5. Processing life-span analysis in patent clusters
4. CASE STUDY AND ANALYSIS
4.1. Dental patent documents samples
4.2. Dental implant technology key phrase analysis processing results
4.3. Dental implant technology key phrases
4.4. Dental technology ontology
4.5. Ontology-based technology clustering of dental implant patents
5. DISCUSSION AND CONCLUSION
5.1. Discussion
5.2. Conclusion
5.3. Future research suggestions
REFERENCES
APPENDIX 1 Key phrase and patent correlation matrix
APPENDIX 2 Sub-domain key phrase matrix
APPENDIX 3 Validation and modification of ontology

 
 

LIST OF TABLES 

Table 1. Description of a typical patent analysis scenario
Table 2. Definition of patent classifications for dental implants
Table 3. Key phrases and patent correlation matrix
Table 4. Key phrases correlation matrix
Table 5. Key phrases and patent correlation matrix
Table 6. Methodology outline of this research
Table 7. Key phrases and patent correlation matrix
Table 8. Key phrases correlation matrix
Table 9. Patent and UPC matrix
Table 10. Key phrase and UPC matrix
Table 11. Key phrase organization matrix
Table 12. Key phrases and patent correlation matrix
Table 13. Key phrases of each cluster
Table 14. Patent information and average age
Table 15. List of patents in each classification and each dimension
Table 16. Part of dental implant key phrase and patent correlation matrix
Table 17. Part of dental implant sub-domain key phrase matrix
Table 18. Part of key phrase matrix for each dimension
Table 19. Key phrases for improvement of implant ontology
Table 20. List of phrases for ontological sub-domains of test patents
Table 21. Patent information and dental implants patents over the years
Table 22. Implant assembly sub-domain patent information
Table 23. Screw device sub-domain patent information
Table 24. Implant fixture sub-domain patent information


 

LIST OF FIGURES 

Figure 1. Research procedure and framework
Figure 2. The construction of dental implants
Figure 3. Overview of data mining
Figure 4. An example ontology tree: RFID technology
Figure 5. General process of clustering data
Figure 6. S-curve of TLC stages
Figure 7. Login page of IPDSS
Figure 8. Workspace of IPDSS
Figure 9. Processing of patent domain definition
Figure 10. The key phrase analysis process
Figure 11. Processing of ontology
Figure 12. Processing TYPE II ontologies
Figure 13. Improvement steps of domain-specific ontologies
Figure 14. Processing of life-span analysis
Figure 15. Proposed life-span analyses of dental implant patent clusters
Figure 16. Dental implant ontology tree
Figure 17. Dental technology ontology tree
Figure 18. Life-span of dental implant clusters without expired patents
Figure 19. Life-span of dental implant clusters including expired patents
Figure 20. Life-span comparisons of dental implants over the years
Figure 21. Process for building a domain-specific ontology

 
 

1. INTRODUCTION 

This chapter describes the value of patent analysis and the importance of ontologies. The 

background, motivations, procedures, and objectives of this research are also discussed.   

1.1. Research background  

The global dental equipment and product market is estimated to grow at a compound annual
growth rate (CAGR) of 7%, reaching US$27.6 billion by 2015 (Salisonline 2011). Some
countries face challenges in the dental medical device industry because they lack sufficient
expertise and skills, lack public and private funding, especially in the early phases when
innovation carries a high level of risk, and lack a national priority or strategy for the
developing sector (University of Ottawa 2011). Other issues are litigation, government
regulations and, most importantly, Intellectual Property (IP) management of innovative
technologies as well as existing core technologies (University of Ottawa 2011).

It is important to maintain a high-quality patent portfolio in order to create innovative
products quickly, advance innovative technology development, protect innovations, and avoid
litigation (Trappey et al. 2011). Traditional patent analysis requires time, effort and
expertise to interpret the results. Recent patent analysis techniques use data mining as a
tool to extract information from large volumes of data, which decreases the time and effort
needed to analyze patents (Lee et al. 2009). These techniques apply statistical analysis to
automatically perform key phrase analysis, document clustering, life cycle analysis, and so
on. High technology companies strive to orient R&D and strategic plans toward emerging
technologies. Patent documents are available through databases and are rich sources of
information that provide a foundation for technology trend analysis.

Patent documents in specific technology domains contain domain-specific terms, so patent
analysis requires domain experts and their experience (Jun & Uhm 2010). This limits the
opportunity for researchers or R&D engineers to explore and understand patent information
using data mining techniques. According to Wanner et al. (2008), however, data mining
techniques should be used together with an ontology. An ontology can be seen as an organized
hierarchical structure of abstracted domain concepts and relations, expressed in terms of
domain terminology. The advantage of using an ontology is that it contains a domain-specific
corpus that conveys the meaning of this terminology. The ontology helps researchers and R&D
engineers relate the data, increase their understanding of a specific domain, and understand
the relationships between domain concepts.
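As a minimal sketch of the hierarchical structure described above, an ontology can be held as a nested mapping from each concept to its narrower concepts. The component names are borrowed from the dental implant description in Section 2.1; the grouping into branches and the helper names `find_path` and `siblings` are assumptions made for illustration and do not reproduce the ontology built in this thesis.

```python
# Illustrative ontology as a nested dict: each key is a concept and each
# value holds its narrower concepts. The grouping below is hypothetical.
ONTOLOGY = {
    "dental implant": {
        "implant body": {},
        "abutment": {
            "transmucosal abutment": {},
            "healing abutment": {},
        },
        "screw": {
            "cover screw": {},
        },
        "prosthesis": {
            "crown": {},
            "bridge": {},
        },
    }
}


def find_path(tree, concept, path=()):
    """Return the chain of concepts from the root down to `concept`, or None."""
    for name, children in tree.items():
        current = path + (name,)
        if name == concept:
            return list(current)
        found = find_path(children, concept, current)
        if found is not None:
            return found
    return None


def siblings(tree, concept):
    """Return the concepts that share a parent with `concept` (sorted)."""
    path = find_path(tree, concept)
    if path is None or len(path) < 2:
        return []
    node = tree
    for name in path[:-1]:
        node = node[name]
    return sorted(name for name in node if name != concept)
```

Looking up the path to a term shows the broader concepts it belongs to, and listing its siblings shows related concepts at the same level, which is the kind of relationship an ontology is meant to expose to a non-expert reader.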

The current life cycle stage of a technology influences companies when they invest R&D
capital in it (Haupt et al. 2007). A life cycle can be divided into four stages:
introduction, growth, maturity and decline. The introduction stage of a new technology often
involves technical or fundamental scientific problems and is associated with high-risk
investment of R&D capital. A technology at the mature stage requires companies to evaluate
the boundaries of patents to avoid infringement issues, trade IP, or license IP.

1.2. Research motivation  

A company must meet the market demand for new products. Designs that maintain market
position consistently require advancement in technology and innovation. Companies with a
strong patent portfolio use intellectual property as leverage in the marketplace to gain
competitive advantage. In addition, companies face increased competition and budget
constraints, and must allocate resources effectively to specific technology areas for
research and development. Rapid advancement in technology and shorter product life cycles
leave companies little time to plan R&D activities strategically. Moreover, patent documents
contain complex terminology, and because R&D engineers often do not have the skills to
perform patent analysis, companies must hire expensive experts who need considerable time
for the analysis. The main advantage of an ontology is that it contains domain-specific
knowledge and concepts, which enable R&D engineers to gain valuable insight into the domain
and understand concept relationships. Companies also need help to direct R&D plans and
evaluate the timing of future technology investments; a life-span analysis provides a quick
snapshot of the technology clusters with investment potential. Patent analysis helps
companies target R&D plans towards recent technology trends and identify future R&D plans
(Trappey et al. 2011). Companies that hold a strong patent portfolio and conduct strategic
patenting activities are more successful than companies that remain inactive, for example in
mechanical engineering and in the biotechnology sector (Ernst 1995; Austin 1993).

Companies using patent analysis can observe technology development and identify potential
competitors in the marketplace, since a patent grants the inventor the exclusive right, for
a limited time period, to exclude anyone else from producing or using a specific device,
apparatus, process, or design (Grilliches 1998). In addition, the patent or patentability of
a technology is one of the preconditions of its commercial potential. Moreover, patents can
be used to generate life cycle analyses that monitor technology development and help
companies identify potential R&D investment opportunities or strengthen their IP position
(Haupt et al. 2007). Patent analysis can be used to study a country's performance
(Grilliches 1990), to help governments allocate resources to specific technology areas
(Yoon & Yongtae 2007), and to monitor the technology development of competitors (Jun & Uhm
2010). Some companies avoid filing patents and patent only their most successful
innovations. Subramanian & Soh (2010) explain that most high technology firms are in a
patent race and that patents are considered to be linked to the firms' performance.
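Because patent rights last only for a limited period, the age of the patents in a technology cluster can be summarized numerically. The following is a minimal sketch under stated assumptions: a 20-year term counted from the filing date (the general rule for US utility patents filed since June 1995, ignoring term adjustments, extensions, and lapses for unpaid maintenance fees) and hypothetical filing dates. It illustrates the idea only and is not the life-span procedure developed later in this thesis.

```python
from datetime import date

TERM_YEARS = 20  # assumption: 20-year term from filing, no adjustments


def age_in_years(filing_date, today):
    """Age of a patent in whole years since its filing date."""
    years = today.year - filing_date.year
    if (today.month, today.day) < (filing_date.month, filing_date.day):
        years -= 1
    return years


def summarize_cluster(filing_dates, today):
    """Return (average age, number of patents still in force) for a cluster."""
    ages = [age_in_years(d, today) for d in filing_dates]
    in_force = sum(1 for a in ages if a < TERM_YEARS)
    return sum(ages) / len(ages), in_force


# Hypothetical filing dates for one technology cluster.
cluster = [date(1989, 5, 2), date(1998, 11, 20), date(2005, 3, 14)]
average_age, active = summarize_cluster(cluster, today=date(2011, 12, 1))
```

A cluster with a low average age and many patents still in force suggests recent activity, whereas a high average age with few active patents suggests an older technology area.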

1.3. Research procedure    

The research procedure is divided into six phases:

• Phase 1: Motive and objectives
• Phase 2: Literature review
• Phase 3: Selection of research direction
  The research direction is selected according to the predefined objectives and the
  literature review. The selected areas are data mining, key phrase analysis, technology
  clustering, patent document clustering, and technology life-span analysis.
• Phase 4: Methodology development
• Phase 5: Case study
• Phase 6: Evaluation

The research procedure and framework are shown in Figure 1.


 

[Figure 1: flowchart of the six phases (1. Motive and objectives, 2. Literature review, 3. Selection of research direction, 4. Methodology development, 5. Case study, 6. Evaluation) together with the five selected areas: data mining, key phrase analysis, technology clustering, patent document clustering, and technology life-span analysis.]

Figure 1. Research procedure and framework 

1.4. Research objectives  

The goal of this research is to develop a domain-specific ontology framework that will be
used for technology clustering and life-span analysis of clusters. This research intends to:

1. Synthesize methodologies that generate the information needed to construct a
   domain-specific patent ontology structure and procedure based on dental implant patents
2. Develop a domain-specific patent ontology framework based on patent classifications to
   gain domain-specific knowledge and relationships between concepts
3. Develop a procedure for ontological sub-domain technology clustering and technology
   life-span analysis

1.5. Research limits 

This research is subject to the following limitations. First, the patent data come only from
the United States Patent and Trademark Office (USPTO). Second, the ontologies are built from
USPTO patent data only, from the researcher's experience and knowledge of the dental implant
domain, and with the help of a dictionary (WordNet) to link the concepts in the ontology.


 

2. BACKGROUND 

This chapter has eight sections, which discuss dental implants, data mining, ontologies,
patent analysis, patent clustering, key phrase analysis, technology life cycle analysis, and
the research framework. The background covers the foundation methodologies (data mining, key
phrase analysis, patent technology clustering, patent document clustering and technology
life cycle analysis) used to develop the methodology for creating a domain-specific patent
ontology.

2.1. Dental implant 

The largest segment of the global market is restorative and preventive dentistry, but the
fastest growing segment is dental implantology. Dental implants hold an 18% share of the
global dental device market (Markets and Markets 2010). The global dental equipment and
product market is experiencing steady growth due to factors such as demand for dental
implants, the desire for aesthetics, and increased use of preventive dental care (Brocair
Partners 2010; Salisonline 2011). The popularity of cosmetic dental treatments and implants,
along with increasing demand for better dental care, will drive growth in new, innovative
dental implant technologies and products (Salisonline 2011).

Currently, millions of people benefit from dental implants, which are the ideal option for
people who have lost a tooth or teeth due to gum disease, an injury, or some other reason. A
dental implant is an artificial tooth root that is placed into the jaw to hold a replacement
tooth. Dental implants are the only option for replacing missing teeth that closely
resembles a natural tooth: the implant behaves like a real root and bonds naturally to the
jaw bone. The crown is then bonded to the top of the dental implant.

Paterson and Zamanian (2009) predict that the global dental implant industry will experience
strong growth from 2011 to 2015 and that emerging technologies will improve the efficiency
of dental procedures and reduce the time they take. For example, Salisonline (2011) points
out that 3D imaging techniques have improved patient diagnosis and procedure planning. Other
emerging technologies in the dental implant and biotechnology industries are dental
biomaterials and tissue regenerative materials, which offer a more natural and long-term
solution. In the U.S. and EU markets, these trends are shifting customer needs towards
cosmetic dentistry and driving the dental implant market towards high-end dental solutions
and products (Salisonline 2011).

A dental implant is defined as an implant that replaces a natural tooth (WordNet Princeton
University, 2011). The main components of a dental implant include a screw that connects to
a custom-made crown. Figure 2 shows the main components of a dental implant compared with a
natural tooth.

 
Figure 2. The construction of dental implants (Puja Dental Group 2011). 

The components of dental implants include the implant body, cover screw (prevents access to
the bone), transmucosal abutment (links the implant body to the mouth), healing abutment
(temporarily placed on the implant to maintain patency of the mucosal penetration), healing
caps (temporary covers for abutments), crowns, bridges, gold cylinder (fits an abutment and
forms part of the prosthesis), laboratory analogue (a base metal replica of an implant or
abutment), etc. (AstraTech Dental 2011; Free Dental Implant 2011).

2.2. Data mining 

Data mining has recently become a popular way to extract information from databases. It
applies machine learning and statistical analysis techniques to access and extract
information from databases with large volumes of patent documents (Lee et al. 2009). It
automatically discovers patterns in databases and is useful for mapping scientific and
technical information in complex analyses of large volumes of information (Kim et al. 2008).
Fayyad et al. (1996) state that methods and techniques must be developed to interpret the
data so that it makes sense to humans. Data volumes are growing rapidly, both in the number
of objects or records and in the number of fields per record; in the medical field, for
example, a record can easily have hundreds of different fields (Fayyad et al. 1996).

Data are captured for different purposes: in business to gain competitive advantage, or, for
environmental data, to better understand environmental effects. Data mining has been applied
in marketing, investment, fraud detection, manufacturing, and telecommunications. In
marketing, for example, the primary application of data mining is to analyze a database to
identify customer groups and forecast future buying patterns or behavior (Fayyad et al.
1996). In this context, data are a set of facts, and a pattern is an expression in some
language that describes a model of the data or a structure found in it. The value of a
pattern is judged by its validity, novelty, usefulness, and simplicity. These conditions do
not define “knowledge” itself but rather the framework in which a pattern is recognized as
knowledge; the notion is purely user oriented and domain specific, and the functions are
determined by the user (Fayyad et al. 1996).

Fayyad et al. (1996) state that Knowledge Discovery in Databases (KDD) is the overall
process of discovering knowledge from a database and that data mining is one step in that
process. Data mining and text mining share the same concept: historically the activity has
been given a variety of names, but the idea of finding patterns in data is the same. Data
mining uses algorithms to extract patterns from data, which is also known as information
retrieval (Fayyad et al. 1996). KDD adds further steps, such as data preparation (storage
and access), data selection, data cleaning, incorporation of appropriate prior knowledge,
and logical interpretation of the results, which are needed to ensure that understandable
knowledge is extracted from the data; statistics provides the framework and language for the
discovery of patterns (Fayyad et al. 1996). The overall data mining framework is shown in
Figure 3.

 
Figure 3. Overview of data mining, adapted from Han J. et al. (2011)  
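The KDD steps described above can be sketched as a small pipeline of selection, cleaning, mining, and interpretation. This is a toy illustration, not the IPDSS implementation: the records, the stop-word list, and the `min_docs` threshold are hypothetical, and simple document-frequency counting stands in for the data mining step.

```python
import re
from collections import Counter

# Assumed stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "for", "is", "of", "the", "to", "with"}


def select(records, field):
    """Data selection: keep only the non-empty field of interest."""
    return [r[field] for r in records if r.get(field)]


def clean(texts):
    """Data cleaning: lowercase, tokenize, and drop stop words."""
    cleaned = []
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        cleaned.append([t for t in tokens if t not in STOP_WORDS])
    return cleaned


def mine(token_lists):
    """Data mining (stand-in): count in how many documents each term occurs."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(set(tokens))
    return counts


def interpret(counts, min_docs):
    """Interpretation: report terms found in at least `min_docs` documents."""
    return sorted(term for term, n in counts.items() if n >= min_docs)


# Hypothetical records standing in for a patent database.
records = [
    {"id": 1, "abstract": "A dental implant with a threaded screw."},
    {"id": 2, "abstract": "An abutment for a dental implant."},
    {"id": 3, "abstract": ""},
]
patterns = interpret(mine(clean(select(records, "abstract"))), min_docs=2)
```

Keeping each step as a separate function mirrors the point made by Fayyad et al. (1996) that the mining algorithm is only one stage; the selection and cleaning choices made before it determine which patterns can be found at all.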


 

2.2.1. Text mining 

Text mining is a technique developed from data mining to analyze textual data, especially
unstructured textual documents (free text, abstracts, etc.) such as patent documents (Lee et
al. 2009; Kostoff et al. 2009). Text mining labels each document and links the labels to
specific words, which allows discovery to be based on the labels (Lee et al. 2009). A text
document can be unclear and, according to Nasukawa and Nagano (2001), contains various types
of information, is rich in information, and represents factual information. Most information
stored in databases is in the form of text documents. Text mining is used to automatically
discover knowledge and patterns in a database by applying statistical algorithms (Weiss et
al. 2005). It is a broad field that involves information retrieval, text analysis,
information extraction, clustering, categorization, visualization, machine learning, and
data mining (Tan 2011; Lee et al. 2005).
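The idea of labelling documents and linking the labels to words can be illustrated with an inverted index, which maps each word to the labels of the documents that contain it. This is a minimal sketch with hypothetical labels and texts; the function names `build_index` and `search` are assumptions, and real systems add stemming, phrase detection, and weighting.

```python
from collections import defaultdict


def build_index(documents):
    """Map each word to the set of document labels in which it occurs."""
    index = defaultdict(set)
    for label, text in documents.items():
        for word in text.lower().split():
            index[word.strip(".,;:()")].add(label)
    return index


def search(index, *words):
    """Return labels of the documents that contain every given word."""
    hits = [index.get(w.lower(), set()) for w in words]
    return sorted(set.intersection(*hits)) if hits else []


# Hypothetical document labels and texts.
documents = {
    "P1": "Dental implant with a healing abutment.",
    "P2": "Screw device for a dental implant.",
    "P3": "Healing cap for an abutment.",
}
index = build_index(documents)
```

With such an index, a query such as `search(index, "dental", "implant")` is answered from the labels alone, without rereading the document texts.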

Patent documents contain detailed information in complex technical and legal terms that only
experts in the specific field understand, which makes them difficult for non-specialists to
read and analyze (Tseng et al. 2005). Patent documents are often lengthy, and traditional
patent analysis is inefficient: analyzing the contents requires a long time and considerable
human effort, which is also expensive to maintain (Lee et al. 2009; Tseng et al. 2005).

On August 16, 2011, the United States Patent and Trademark Office (USPTO) issued patent
number 8,000,000, and according to USPTO statistics a few hundred thousand patent
applications are pending examination each year (USPTO 2011). The accumulation of patent
documents at the USPTO has increased at a striking pace due to more patent applications and
granted patents (USPTO 2011). There is therefore great demand for automated data mining
techniques that condense the rapidly growing volume of data into a compact form that is
easier to absorb. Large text databases such as the USPTO database potentially contain a
great amount of information (knowledge), but only if it can be interpreted. The traditional
method of turning raw data into knowledge relies on manual analysis and a huge amount of
reading and organizing of content, so a heavy workload is needed to analyze only a tiny
fraction of the database (Lee et al. 2009). Furthermore, manual analysis is expensive and
highly subjective (Fayyad et al. 1996).

Recently, researchers have begun to apply text mining to patent analysis (Kim et al. 2008;
Yoon and Park 2004). For example, Tseng et al. (2007) used text mining techniques to create a
patent map for the technology domain of carbon nanotubes. Tseng et al. (2005) applied text
mining techniques to automatically create important categorization features that might be as
good as human-derived ones or even better. One advantage of using text mining techniques in
patent analysis is that they can handle large volumes of patent documents and extract useful
information (Lee et al. 2009). Since patent documents are lengthy but contain significant
technical information, automatic text mining will assist researchers, engineers, and
decision-makers in patent analysis (Lee et al. 2009). However, the extracted data has to meet
specific quality criteria: it must be comprehensible to humans and must represent the concept
of the text or benefit the user (Yoon and Park 2004). Furthermore, text mining techniques have
been applied to summarization, term association, cluster generation, topic identification,
information mapping, technology trend analysis, automatic patent classification, and so on
(Lee et al. 2009; Yoon and Park 2004; Tseng et al. 2007).

2.2.2. Limitations in text mining 

Although text mining is a very promising technique for analyzing textual data, it has
limitations, for example in areas that require high accuracy. It is therefore most useful for
providing supportive information for analysis (Smith 2002). Text mining techniques face a
significant challenge in dealing with patent documents because the algorithms have difficulty
identifying compound words and cannot consider synonyms (Lee et al. 2009; Smith 2002).
Furthermore, a large number of keywords is required to distinguish accurately between
documents (Smith 2002). Additionally, when text mining is applied to unstructured data in
patent documents, it can be difficult to distinguish text or keywords that describe “prior art”
from text that describes the invention (Smith 2002). This distinction is important since the
description addresses the technical characteristics of the patented invention. Despite these
limitations, advances in computer science and better text mining algorithms are expected to
make text mining more efficient and accurate.

2.3. Ontology 

Huang et al. (2008) describe an ontology as a model that contains the concepts, links, and
relationships of a specific domain and reflects the reality of the world. WordNet from
Princeton University (2011) defines ontology as a rigorous and exhaustive organization of
some knowledge domain that is usually hierarchical and contains all the relevant entities and
their relations. An ontology provides a unified knowledge base for an information domain that
integrates information from various sources (Taduri et al. 2011). For example, a company that
needs to collect information on a specific technology but has only initial knowledge can use
a domain-specific ontology to collect relevant information much faster than with existing
systems (Taduri et al. 2011). The ontology contains domain concepts and relations that can be
reused, modified, and shared among R&D engineers (Soo et al. 2006). An ontology provides an
organized framework with a hierarchical structure and relationships of the domain, which
makes it possible to understand the relations between concepts (Rubin et al. 2007).

Ontology links the semantic data between concepts, which makes it possible to perform
pattern recognition, similarity analysis, and clustering of patent documents with respect to
their content (Wanner et al. 2008). A variety of methods have been proposed to create
knowledge domains; one suggests a single ontology that integrates all knowledge domains
(Taduri et al. 2011). The potential drawback of this method is that it creates a very large set
of knowledge domains which, depending on the application, may be unnecessary and
inefficient (Lau et al. 2011). Alternative ontology architectures propose separate,
domain-specific ontologies tailored to the application (Noy & McGuiness 2001). Many
methodologies have been proposed and used to create ontologies that capture the domain
knowledge of patents to enhance information retrieval (Taduri et al. 2011).

Ontology-based patent intelligence 

Gathering relevant patent information across different patent databases requires significant
effort (Khelif et al. 2007). For example, a start-up company that wants to patent its
technology in the field of dental implants needs to search patent databases and scientific
publications and to perform patent analysis for infringement purposes and competitor analysis
(Taduri et al. 2011). It is challenging to search thoroughly for patent documents, and the
large volume of patent documents makes this an almost impossible task (Lau et al. 2011).
Patent documents in specific technology domains contain domain-specific terms that are not
covered by common dictionaries; an advantage of an ontology is therefore that it contains
domain-specific terminology (Soo et al. 2006). Trappey et al. (2010) point out that ontologies
are useful for extracting concepts related to key phrases. An ontology serves as an organized
structure for arranging or classifying a domain. In addition, an ontology is a way of formally
representing knowledge domains with concepts, their attributes, and the relations between
terminologies expressed in some well-defined logic (Rubin et al. 2007). According to Wanner
et al. (2008), when text mining techniques such as key phrase extraction are used to represent
the content of a patent document, the representation should also include an important
feature: ontology. The goal of using a domain-specific ontology is to reduce conceptual and
terminological confusion among R&D engineers (Navigli and Velardi, 2004). In addition,
sharing the domain-specific ontology can improve communication and cooperation among
people, enterprise organization, and system engineering (reusability, reliability, and
specification) (Navigli and Velardi, 2004).

The ontology approach to knowledge extraction has been applied in various fields. For
example, Trappey et al. (2009) applied an ontology tree for automatic patent document
summarization, which extracts key information into a shortened abstract containing the key
concepts of the patent document. The goal was to use the ontology to create a knowledge base
for their software program to improve its architecture and consistency in capturing the
knowledge of their information system. Figure 4 shows the example RFID technology ontology
tree from Trappey et al. (2010) that was used for patent document summarization. Rubin et al.
(2007) describe how biomedical ontologies can help researchers accelerate their research as
the amount of available biomedical information grows rapidly. Biomedical ontologies help
researchers structure complex biological domains and relate the data; for example, the Gene
Ontology has gained huge attention in the biomedical community (Rubin et al. 2007).

 
Figure 4. An example ontology tree: RFID technology (Trappey et al. 2010)

[Figure not reproduced. Node labels in the tree include RFID Ontology, RFID Device, RFID
Application, Tag, Reader, Antenna, Memory, Processor, Circuit, Frequency Band,
Communication, Protocol, Security, Encoding, Tracking, Identification, Inventory, Access,
and Distribution.]



Creating a domain-specific patent ontology requires phrases that describe the concepts of the
patent documents (Trappey et al. 2010). It requires identifying and defining the relevant
concepts and relating them to a given application (Navigli and Velardi, 2004). The challenge
in specific technical domains such as telecommunications, biotechnology, and biomedicine is
that patent documents use technical, domain-specific terminology (Taduri et al. 2011). The
terms are often a challenge since they appear in several forms, such as synonyms and
hyponyms, which makes general-language comparison of patent documents inefficient. Taduri
et al. (2011) and Mukherjea & Bamba (2007) point out the advantages of using an ontology: it
captures the rich information available and allows the application to understand the semantic
associations, avoiding terminological inconsistencies. It also allows users to reason across
the knowledge domain; where an application requires only small fragments of information,
users can choose to work with only the information that is needed (Lau et al. 2011). For
example, R&D engineers may be interested in only one technological sub-domain and ignore
the other knowledge sub-domains.
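To make the role of hierarchy and synonyms concrete, the following minimal Python sketch represents a tiny ontology as two dictionaries. All concept names, synonyms, and function names are hypothetical and chosen only for illustration; they are not taken from the ontology developed in this thesis or from the cited studies.

```python
# Hypothetical mini-ontology: each concept maps to its parent concept.
PARENT = {
    "dental implant": None,
    "implant body": "dental implant",
    "abutment": "dental implant",
    "surface treatment": "implant body",
}

# Hypothetical synonym table: surface form -> canonical concept.
SYNONYMS = {
    "fixture": "implant body",
    "implant fixture": "implant body",
    "post": "abutment",
    "surface coating": "surface treatment",
}


def canonical(term):
    """Map a raw term to its ontology concept, or None if it is unknown."""
    cleaned = term.strip().lower()
    cleaned = SYNONYMS.get(cleaned, cleaned)
    return cleaned if cleaned in PARENT else None


def ancestors(concept):
    """Return the chain of parent concepts, nearest parent first."""
    chain = []
    node = PARENT.get(concept)
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def in_subdomain(term, subdomain):
    """True if the term resolves to the sub-domain or to one of its descendants."""
    concept = canonical(term)
    return concept is not None and (
        concept == subdomain or subdomain in ancestors(concept)
    )
```

With these assumed entries, `in_subdomain("surface coating", "implant body")` resolves the synonym first and then walks up the hierarchy, which is the kind of sub-domain filtering an R&D engineer might apply when only part of the knowledge domain is of interest.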

2.4. Patent analysis  

Patent documents contain rich, detailed information about research results, expressed in
complex technical and legal terms, that is valuable to the industry, business, law, and
policy-making communities (Tseng et al. 2007; Choi et al. 2007). Thus, the detailed content in
patent documents, if carefully analyzed, can reveal technology development, inspire novel
technical solutions, show technical relations, or help investment policy (Tseng et al. 2005).
Tseng et al. (2007) point out that patent analysis has become important even at the
government level in some Asian countries such as China, Japan, Korea, Singapore, and Taiwan.
These countries have invested various resources to create visualized results of patent
analysis (Lee et al. 2009). Patents are a useful vehicle for R&D and technology management
research since they are a source of technical and commercial information that can be turned
into knowledge (Choi et al. 2007; Lee et al. 2009). Patent documents are often lengthy, and it
takes time, effort, and expertise to turn the research results into a technology development
analysis. Tseng et al. (2007) also emphasize that patent analysts need a certain degree of
expertise in information retrieval, domain-specific technologies, legal knowledge, and
business intelligence. A typical patent analysis scenario is shown in Table 1. Because the
work spans multiple disciplines, suitable analysts are hard to find and costly to train and
retain. Thus, automated technologies that assist patent analysis are in great demand.



Patent analysis can be divided into two levels: macro-level research, which covers national
or industrial analysis, and micro-level research, which covers the development or forecasting
of specific technologies (Choi et al. 2007). Macro-level analysis evaluates the major economic
effects of technological innovations, technological development, and the competitiveness of
countries (Grilliches 1990). At the micro level, the focus is on identifying the technological
development of specific areas or technologies, the advantages and disadvantages of
competitors, and the strategic planning of R&D activities; patent data are analyzed to find
the relation between companies and technologies (Haupt et al. 2006).

Many researchers have tried to identify indicators or determinants of patent value (Sapsalis
et al. 2006), using different types of patent data sets, such as data from regional patent
offices, a particular sample, a specific sector such as biotechnology, or a particular company
in a given country. Several researchers have studied patents and their relationship to, or
effect on, the economy, technological innovation and development, or a country’s
competitiveness (Grilliches 1990). This is important since on average only 1-3 patents out of
100 generate significant financial returns. Although only a few patents have commercial
success, most patents are developed by follow-up patenting into significantly important
technologies (Ernst 1997). High-value patents often have broad technical claims and a high
citation index, which increases the financial value of the company (Lerner 1995). In
mechanical engineering and the biotechnology sector, companies that hold a strong patent
portfolio and conduct strategic patenting activities are more successful than companies that
remain inactive (Ernst 1995, 2001; Austin 1993). Patent analysis can be used effectively by
companies to gain competitive advantages in the marketplace (Grilliches 1990). Moreover,
patents are easily accessible throughout the world through databases in most countries (Lee
et al. 2009).

Before a patent is issued by the USPTO, each patent document is given one or several patent
classifications based on the invention, claims, and content (Tseng et al. 2007). These
classifications, denoted UPC (U.S. Patent Classification) and IPC (International Patent
Classification), are given in most patent documents. According to Tseng et al. (2007), patent
classifications are sometimes too broad or cannot meet the requirements of a particular
analysis. In this research, UPC and IPC are used for analysis, and examples of UPC and IPC
definitions for dental implants are shown in Table 2.

  

Table 1. Description of a typical patent analysis scenario adapted from Tseng et al. (2007).

A typical patent analysis scenario

Step                     Description
1. Task identification   Define the scope, concepts, and purposes for the analysis task
2. Searching             Iteratively search, filter, and download related patents
3. Segmentation          Segment, clean, normalize structured and unstructured parts
4. Abstracting           Analyze the patent content to summarize their claims, topics, functions, or technologies
5. Clustering            Group or classify analyzed patents based on some extracted attributes
6. Visualization         Create technology-effect matrices or topic maps
7. Interpretation        Predict technology or business trends and relations

Source: Tseng et al. (2007).

Table 2. Definition of patent classifications for dental implants

Patent classifications

International Patent Classification (IPC)
Class              Dentistry; Apparatus or methods for dental hygiene
A61C 8/00          Means to be fixed to the jaw-bone for consolidating natural teeth or for fixing dental prostheses thereon; Dental implants; Implanting tools
A61C 13/00         Dental prostheses; Making same (tooth crowns for capping teeth; dental implants)

U.S. Patent Classification (UPC)
Class 433          Dentistry
Subclass 433/173   By fastening to jawbone: This subclass is indented under subclass 172. Subject matter wherein the denture is secured directly to the jawbone of the patient.
Subclass 433/174   By screw: This subclass is indented under subclass 173. Subject matter wherein the denture is secured to the jawbone by an elongated helically ribbed member

Source: USPTO and WIPO (2011)



2.4.1. Data in patent documents 

A patent document contains items that can be divided into two groups, structured and
unstructured data (Tseng et al. 2005). Structured data are uniform across most patents and
include the patent number, filing date, inventor, and assignee. Unstructured data are free
text of varying length and content, such as claims, abstracts, or the description of the
invention. Patent analysis using structured information such as filing dates, assignees, and
citations has been in practice and in the literature for years (Ernst 1997; Lai & Wu 2005).
The visualized results of structured data (patent number, filing date, etc.) are called patent
graphs; most use the bibliometric data of patent documents to provide statistical results for
patent analysis (Lee et al. 2009). The visualized results of unstructured data (abstract, free
text, etc.) are called patent maps. However, the general term patent map can be used for
visualizations of both structured and unstructured data (Tseng et al. 2007). Patent maps
correspond to the visualization step in Table 1. They can be used for decision-making about
future R&D directions (understanding patent relations and how patents were invented in the
past), for predicting technology or business trends (for example, the trends of major
competitors in the same industry), and for discovering technological trends and
opportunities as well as technological holes for future innovations (Tseng et al. 2007; Choi
et al. 2007).

Bibliometrics is defined as the measurement of texts and information (Norton 2001). In
general, most patent analyses utilize bibliometric data (structured data) to explore,
organize, and analyze large amounts of data in order to identify patterns in authors,
technology fields, citations, and so on (Daim et al. 2006). Although there are many items for
analysis, one in particular has been employed more frequently: citation analysis. The patent
citation count is defined as the number of times a patent is cited in subsequent patents, and
thus the citations per patent represent the relative importance of the patent (Lee et al.
2009). One possible reason for its popularity, as Sapsalis et al. (2006) point out, is that
citations are closely associated with patent value (an increase in the financial value of a
company). However, although analyses based on bibliometric data are easy to create and
understand, they have limited access to the richness of information in patent documents since
they use only bibliometric fields (Lee et al. 2009). Text mining has been proposed as an
alternative for analyzing the unstructured textual data in patent documents (Kim et al. 2008).
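The citation count described above can be illustrated with a short Python sketch. The patent identifiers and the function name are hypothetical, and the sketch assumes that citation data are available as (citing patent, cited patent) pairs; it is an illustration of the counting idea rather than a procedure taken from the cited studies.

```python
from collections import Counter


def forward_citation_counts(citations):
    """Count how often each patent is cited by subsequent patents.

    `citations` is an iterable of (citing_patent, cited_patent) pairs.
    Duplicate pairs are counted once.
    """
    unique_pairs = set(citations)
    return Counter(cited for _citing, cited in unique_pairs)


# Hypothetical citation pairs; ("P5", "P1") is listed twice on purpose.
pairs = [("P3", "P1"), ("P4", "P1"), ("P4", "P2"), ("P5", "P1"), ("P5", "P1")]
counts = forward_citation_counts(pairs)
```

In this toy data set, patent P1 is cited by three later patents and P2 by one, so under the citation-based view P1 would be regarded as relatively more important.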



2.4.2. Limitations in patent analysis 

There are certain limitations to using patent analysis as an indicator for forecasting
technology development or business trends. First, not every company or organization patents
its inventions: Choi et al. (2007) mention, for example, that not all inventions meet the
criteria set by patent offices, that some companies or industries rely on secrecy, and that
not patenting an invention can be a strategic decision. Second, the results of patent analysis
are interpreted differently across industries and companies, which leads to inconsistent
analyses. Third, patent laws change over time, which makes it difficult to compare patents
across periods; recently, however, companies have become more inclined to file patents,
mainly to protect their inventions from competitors (Choi et al. 2007).

2.5. Patent clustering 

Recently, cluster analysis has become an important topic because of a decade of advancement
in data mining, increased computing power, and a growing number of statistical software
packages that include cluster analysis algorithms (Kettenring 2009). Given a set of
documents, there is often a need to categorize the documents into groups or clusters. For a
small set of documents this can be done manually, but for a large set the process is
time-consuming and inefficient (Shahnaz et al. 2006). A patent document usually consists of a
title, an abstract, claims, a detailed description of the invention, and bibliographic
information. Moreover, all patent documents have manually assigned International Patent
Classifications (IPCs), and those issued by the USPTO also carry a United States Patent
Classification (UPC). The classification codes, IPC and UPC, are assigned manually by patent
specialists or examiners. This type of classification is called supervised since it has
predefined categories or topics for classification (Shahnaz et al. 2006). Unsupervised
classification often deals with unstructured data. The goal is to organize the unstructured
data into groups or clusters based on the patterns of the collection itself (Dunham 2003).
According to Trappey et al. (2010), patent documents with the same classification codes may
be entirely different.

Clustering is an important data analysis technique that classifies patterns of key phrases
into categories based on the characteristics of their relationships (Trappey et al. 2009). The
main concept is to measure the similarity in the data, assign each item to the most suitable
cluster, and maximize the similarity of specified variables within the same cluster; in other
words, to create homogeneous clusters. The patent documents belonging to a cluster must be
similar to one another. According to Almeida et al. (2007), what matters is high connectivity
among these patent documents, that is, a high association between objects.

Clustering has been applied in numerous fields. For example, Taiwan Semiconductor
Manufacturing Company, Ltd. uses cluster analysis to detect errors in the manufacturing
process by isolating and separating failure symptoms and grouping suspicious process steps
for evaluation by the process engineer (Kettenring 2009). It has also been used to predict
consumer behavior by creating shopping clusters from consumers’ previous purchasing behavior
or patterns in order to forecast future shopping behavior (Kettenring 2009). A general
clustering approach is shown in Figure 5.

[Figure not reproduced. Box labels in the flowchart: Data Collection, Data Conversion,
Similarity Evaluation, Clustering, Analysis, Results.]

Figure 5. General process of clustering data. Adapted from Trappey et al. (2009)

Patent technology clustering  

Patent technology clustering is a method for grouping similar or technologically related
patent documents into clusters rather than by UPC (Trappey et al. 2010). It makes it possible
to analyze the relationships between patent documents in a specific technology domain and to
analyze patent trends and development (Trappey et al. 2008). Patent technology clustering is
performed by applying the K-means algorithm to the key phrase correlation matrix (Trappey et
al. 2010; Trappey et al. 2009). A more complete discussion of the K-means algorithm is
provided by Han et al. (2011). Furthermore, Trappey et al. (2009) use the Root Mean Square
Standard Deviation (RMSSTD) and R-Squared (RS) to find the optimal number of clusters in a
data set. RMSSTD is the pooled standard deviation of all variables within the clusters and
measures the homogeneity within clusters; its value should therefore be as small as possible.
RS measures the separation between different clusters and should be as large as possible,
because RS is the sum of squares between clusters divided by the total sum of squares of the
data set. More detailed descriptions of the equations for RMSSTD, RS, and K-means are given
by Trappey et al. (2009), Trappey et al. (2010), and Trappey et al. (2008).
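The interplay between K-means, RMSSTD, and RS can be sketched in a few lines of Python. The sketch below is a minimal illustration under stated assumptions and not the implementation used by Trappey et al.: the six two-dimensional points are invented, the first k points serve as initial centroids so that the run is reproducible, and RMSSTD and RS follow their common textbook definitions (pooled within-cluster standard deviation, and between-cluster sum of squares divided by total sum of squares).

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def kmeans(points, k, max_iter=100):
    """Plain K-means; the first k points are the initial centroids (assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean(members)
    return labels


def validity(points, labels, k):
    """Return (RMSSTD, RS) for a given clustering."""
    dims = len(points[0])
    ss_total = sum(sq_dist(p, mean(points)) for p in points)
    ss_within, dof = 0.0, 0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue
        centre = mean(members)
        ss_within += sum(sq_dist(p, centre) for p in members)
        dof += dims * (len(members) - 1)
    rmsstd = math.sqrt(ss_within / dof) if dof else 0.0
    rs = (ss_total - ss_within) / ss_total if ss_total else 0.0
    return rmsstd, rs


# Invented data: two visually separate groups of three points each.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
for k in (1, 2):
    labels = kmeans(points, k)
    rmsstd, rs = validity(points, labels, k)
    print(k, round(rmsstd, 3), round(rs, 3))
```

Moving from one cluster to two on this toy data lowers RMSSTD and raises RS, which is the pattern one looks for when choosing the number of clusters. In practice a library implementation with better initialization would normally be preferred over this naive version.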



Patent document clustering  

Patent document clustering uses the correlation matrix generated from patent technology
clustering as input to the K-means algorithm (Trappey et al. 2010). It is a method that
measures the internal relationship of the key points of the patent documents and classifies
the documents based on the similarity of their technologies (Taghaboni-Dutta et al. 2009;
Trappey et al. 2010). As a result, it becomes easier for the patent analyst to analyze the
characteristics of the patent documents in each cluster. This also addresses a problem of
patent classification systems (IPC and UPC), which may assign the same code to patent
documents that are entirely different in technology (Taghaboni-Dutta et al. 2009). The matrix
shown in Table 3 is used as the input for patent document clustering.

Table 3. Key phrases and patent correlation matrix (Trappey et al. 2010)

        Patent1   Patent2   Patent3   …   Patentn
TC1     N1,1      N1,2      N1,3      …   N1N
TC2     N2,1      N2,2      …         …   N2N
TC3     N3,1      …         …         …   N3N
…       …         …         …         …   …
TCn     …         …         …         …   Nnm

Source: Trappey et al. (2010)

 
2.6. Key phrase analysis 

Key phrase extraction is useful for document or information retrieval, document clustering,
summarization, text mining, and so on (Matsuo and Ishizuka 2003). Turney (2000) also points
out a dozen useful applications of key phrase extraction, for example highlighting key
phrases in text, document classification, text compression, and constructing human-readable
text. Most information stored in databases consists of textual documents. Because it extracts
the relevant key phrases, key phrase extraction makes it possible to determine which
documents are important and to identify the relations among several documents (Matsuo and
Ishizuka 2003; Hammouda et al. 2005). According to Voorhees (1999), the majority of
information retrieval (key phrase extraction) methods use statistical approaches, based on
the assumption that two texts on the same topic use the same key phrases. Statistical
approaches measure the similarity of key phrases between textual documents. There are
different approaches to key phrase extraction; the most commonly used are the lexical
approach, natural language processing (NLP), and the term frequency approach (Trappey et al.
2008). Hammouda et al. (2005) divide key phrase extraction algorithms into two categories:
algorithms that require supervised learning and are applied to single documents, and
algorithms that operate on a set of documents and are unsupervised, discovering key phrases
rather than learning them from examples, which is also known as knowledge discovery.

Research indicates that the main goal of key phrases is to represent the topics discussed in
a text document (Turney 2000). Furthermore, Turney (2000) points out that key phrase
extraction enables the user to determine quickly whether a document is in the field of
interest and that the key phrases can be used for indexing. Key phrase extraction has been
applied in many different fields, although mainly for summarization purposes (Turney 2000).
For example, Nenkova et al. (2006) studied the impact of automatic summarization systems
based on key phrase extraction and its role in human summarization; the results showed that
the key phrase frequency methodology generated summaries comparable with state-of-the-art
systems. Trappey et al. (2008) use a hierarchy and semantic relationship concept to create a
summarization system that uses key phrases to summarize any patent document based on the
specific domain of the patent document.

2.6.1. Term frequency approach  

The term frequency (TF) approach is based on the assumption that key phrases occurring with
high frequency in a text document are more relevant to the concept of the content (Trappey et
al. 2008). Robertson (2004) also points out that a term with high frequency represents a
document better. Furthermore, in information retrieval, the most common terms (key phrases)
are used in weighting schemes to represent text documents (Aizawa 2002). For example,
Robertson and Sparck Jones (1976) study the relevance of key phrase weighting methods that
use term frequency weighted with the inverse document frequency (TF-IDF). Trappey et al.
(2008) use a normalized TF-IDF to extract key phrases for the clustering of patents.



The concept of TF-IDF is that it weights frequent key terms across a series of documents to
determine their relevance. Frequent key terms in a single document cannot represent a domain,
but frequent key terms in a series of documents might represent the concept of the domain
(Robertson and Sparck Jones, 1976).

The basic formula of IDF used by Robertson and Sparck Jones (1976) and Trappey et al.
(2007) is expressed as:

$$ idf_k = \log\left(\frac{n}{df_k}\right) \qquad (1) $$

where $n$ is the total number of documents in the collection and $df_k$ is the number of
documents in the collection that contain term $k$. $idf_k$ itself represents the inverse
document frequency (IDF) of term $k$. Trappey et al. (2007) describe $idf_k$ as a measure of
how well term $k$ represents a document: if $idf_k$ becomes significantly high, term $k$ can
represent a specific document.

The weighting of key phrases using TF-IDF in text documents, where TF is weighted by IDF, is
according to Trappey et al. (2007) expressed as:

$$ w_{jk} = tf_{jk} \times idf_k \qquad (2) $$

where $w_{jk}$ is defined as the weight of term $k$ in document $j$ of the collection,
$tf_{jk}$ is the number of times term $k$ occurs in document $j$ of the collection, and
$idf_k$ is the inverse document frequency of term $k$. Therefore, the term with the highest
value of $w_{jk}$ in a specific text document is identified as the key phrase for document
$j$.
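The IDF and TF-IDF weighting described above can be illustrated with a short Python sketch. The three tokenized documents and the function name `tf_idf` are invented for illustration, and the natural logarithm is an assumption, since the base of the logarithm is not fixed by the description above.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return (idf, weights) for a list of tokenized documents.

    idf[term] = log(n / df), where df is the number of documents containing
    the term; weights[j][term] = tf * idf[term] for document j.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / count) for term, count in df.items()}
    weights = [
        {term: tf * idf[term] for term, tf in Counter(doc).items()}
        for doc in docs
    ]
    return idf, weights


# Invented toy collection of three tokenized documents.
docs = [
    ["implant", "screw", "implant"],
    ["implant", "abutment"],
    ["screw", "thread", "thread"],
]
idf, weights = tf_idf(docs)
top_term_doc3 = max(weights[2], key=weights[2].get)
```

In this toy collection "thread" occurs twice in the third document and in no other document, so it receives both a high term frequency and a high IDF and ends up as that document's highest-weighted term, whereas "implant" is penalized for appearing in two of the three documents.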

Furthermore, because TF-IDF does not account for differences in the number of words in each
document, Trappey et al. (2008) normalize it: Trappey et al. (2010) normalize the frequency
of key phrases by the number of words in each document. According to Trappey et al. (2010),
the normalized TF-IDF (NTF) can be expressed as follows:

$$ NTF_{jk} = tf_{jk} \times \frac{\left(\sum_{s=1}^{n} WN_s\right)/n}{WN_j} \qquad (3) $$

where $tf_{jk}$ is the number of times term $k$ occurs in document $j$ of the collection,
$WN_j$ is the number of words in document $j$, and $n$ is the total number of documents in
the document collection.
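The effect of the length normalization can be sketched in Python as follows. The sketch assumes that the normalization scales each term frequency by the ratio of the average document length to the length of the document in question, as suggested by the variable definitions above; the documents and the function name `ntf` are hypothetical.

```python
from collections import Counter


def ntf(docs):
    """Length-normalized term frequencies for a list of tokenized documents.

    Each raw count is multiplied by (average word number) / (word number of
    the document), so long documents do not dominate through length alone.
    """
    n = len(docs)
    avg_wn = sum(len(doc) for doc in docs) / n
    return [
        {term: tf * avg_wn / len(doc) for term, tf in Counter(doc).items()}
        if doc else {}
        for doc in docs
    ]


# Invented example: "implant" makes up half of each document.
docs_ntf = [
    ["implant", "implant", "screw", "thread"],
    ["implant", "screw"],
]
values = ntf(docs_ntf)
```

Here "implant" occurs twice in the four-word document and once in the two-word document. Its raw counts differ, but after normalization both documents yield the same value, which reflects that the term has the same relative weight in each.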



2.6.2. Key phrase correlation matrix  

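The key phrase correlation matrix holds the correlations between the important key phrases
(KP) of the patent documents and is used to understand the logical link between concepts and
methodologies (Trappey et al. 2010). Trappey et al. (2010) describe a methodology that uses
TF-IDF and NTF to calculate the correlation between key phrases and creates the key phrase
correlation matrix using the inner product of vectors, expressed as:

$$ Correlation(KP_k, KP_l) = \frac{KP_k \cdot KP_l}{\|KP_k\| \, \|KP_l\|} = \frac{\sum_{j=1}^{n} NTF_{jk} \times NTF_{jl}}{\sqrt{\sum_{j=1}^{n} NTF_{jk}^2 \times \sum_{j=1}^{n} NTF_{jl}^2}} \qquad (4) $$

where $KP_k = (NTF_{1k}, NTF_{2k}, \ldots, NTF_{nk})$ and $\frac{\sum_{s=1}^{n} WN_s}{n}$ is
the average Word Number (WN).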

Trappey et al. (2010) use an algorithm with four stages. First, the algorithm transforms the
patent document into a key phrase vector and analyzes the frequency of key phrases. Second,
it derives the key phrase vector by eliminating unnecessary key phrases. Third, the
correlation values between key phrases are calculated using Equation (4). Fourth, the
correlation coefficients are derived from the number of different key phrases occurring in
each patent document. The key phrase correlation matrix is shown in Table 4.

Table 4. Key phrases correlation matrix

       KP1    KP2    KP3    …    KPn
KP1    R1,1   R1,2   R1,3   …    …
KP2    R2,1   R2,2   …      …    …
KP3    R3,1   …      …      …    …
…      …      …      …      …    …
KPm    …      …      …      …    …

Source: Trappey et al. (2010)
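A minimal sketch of how such a matrix could be filled is given below. It assumes that each key phrase is represented by a vector of per-document weights and that the correlation is the cosine of the angle between two such vectors. The weights and the function names are hypothetical; the actual IPDSS implementation is not described in the sources.

```python
import math


def correlation(u, v):
    """Cosine of the angle between two key phrase weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0  # an all-zero vector correlates with nothing


def correlation_matrix(vectors):
    """Square matrix R where R[i][j] is the correlation of key phrases i and j."""
    return [[correlation(u, v) for v in vectors] for u in vectors]


# Hypothetical weights of three key phrases over three patent documents.
weights = [[1.0, 0.0, 2.0], [2.0, 0.0, 4.0], [0.0, 3.0, 0.0]]
matrix = correlation_matrix(weights)
for row in matrix:
    print([round(value, 3) for value in row])
```

Key phrases that always occur together in the same proportions score close to 1, and key phrases that never share a document score 0.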

The key phrase correlation matrix is used as an input for patent technology clustering. The matrix represents the technology in each patent document and thus provides the internal relationships among patent documents, instead of clustering patents according to classification codes such as UPC or IPC.

2.6.3. Key phrase and patent correlation matrix 

In the key phrase and patent correlation matrix, the frequency (Fnm) of each key phrase (KP) in each patent document is calculated, together with NTF, Rate (%) and NTFR. The Rate is the percentage of patents, from Patent1 to Patentn, in which KPm occurs. NTFR is the product of NTF and Rate and expresses the relevance of KPm across the patent collection, as shown in Equation (5). If the frequency Fnm is large enough across Patent1 to Patentn, then KPm is a representative phrase of Patentn (Trappey et al. 2010). The key phrase and patent correlation matrix is shown in Table 5.

\[ NTFR_m = NTF_m \times Rate_m \tag{5} \]

\[ Rate_m = \frac{\sum_{j=1}^{n} X_{jm}}{n} \times 100\%, \qquad X_{jm} = \begin{cases} 0 & \text{if } F_{jm} = 0 \\ 1 & \text{if } F_{jm} > 0 \end{cases} \]

 
Table 5. Key phrases and patent correlation matrix

       Patent1   Patent2   Patent3   …   Patentn   NTF   Rate (%)   NTFR
KP1    F1,1      F1,2      F1,3      …   …         …     …          …
KP2    F2,1      F2,2      …         …   …         …     …          …
KP3    F3,1      …         …         …   …         …     …          …
…      …         …         …         …   …         …     …          …
KPm    …         …         …         …   Fnm       …     …          …

Source: Trappey et al. (2010)
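The Rate and NTFR columns of such a table can be sketched as follows. The sketch assumes that Rate is the percentage of patents in which the key phrase occurs at least once and that the NTF value of the key phrase is supplied by the caller, because the sources do not spell out how the per-document values are aggregated. All names and numbers are hypothetical.

```python
def rate(frequencies):
    """Percentage of patents in which the key phrase occurs at least once."""
    present = sum(1 for f in frequencies if f > 0)  # X = 1 when F > 0, else 0
    return 100.0 * present / len(frequencies)


def ntfr(ntf_value, frequencies):
    """Product of the NTF value of a key phrase and its Rate."""
    return ntf_value * rate(frequencies)


# Hypothetical row of the matrix: frequencies of one key phrase in four patents.
freq_kp1 = [3, 0, 1, 0]
print(rate(freq_kp1), ntfr(2.0, freq_kp1))
```

A key phrase that is frequent in one patent but absent from most of the collection is therefore ranked below a key phrase that is spread across many patents.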


 

2.6.4. Limitations in key phrases extraction of textual documents 

Nasukawa and Nagano (2001) describe some issues with using key phrases to represent a textual document. Textual documents can be unclear because natural language is ambiguous, and the same key phrase may have different meanings within the same document (Nasukawa and Nagano 2001). For example, the word “watch” can mean a timepiece, to look, to observe, or to pay attention. Different words can also carry the same meaning, for example “laptop” and “notebook”, or “cellular phone” and “mobile phone”.

2.7. Technology life cycle analysis 

Life cycle analysis, as the name implies, is a methodology that assesses all impacts of a product or service, from the initial extraction of raw material to the final output or disposal of the product (Ayres 1995). Whether companies invest R&D capital in a technology often depends on the current life cycle stage of that technology (Haupt et al. 2007). According to Haupt et al. (2007) and Ernst (1997), patent documents inform us about technical development and the life cycle stage of an industry, since patent documents contain core technology information. The patentability of a technology is also one of the preconditions for its commercial potential. In addition, patent documents contain the patent application date, which informs us about the life cycle of the products based on the technology before they are commercialized (Haupt et al. 2007). The concept of the technology life cycle is similar to that of the product life cycle and can be divided into four stages: introduction, growth, maturity, and decline or saturation (Haupt et al. 2007; Trappey et al. 2010). Haupt et al. (2007) also point out that, regardless of the reference factor used for the technology life cycle, and although the patent-based life cycle starts earlier than the product- or sales-based one, the principles of the product life cycle still apply to the technology life cycle.

Several studies of the technology life cycle based on patent document information show that an S-shaped curve can represent the technology life cycle. The S-shaped curve includes the four stages: introduction, growth, maturity, and decline (Haupt et al. 2007). Andersen (1999) studied the S-curve with examples from the pharmaceutical industry. Trappey et al. (2010) studied RFID technology in China and forecast potential market and R&D opportunities. Another study, by Trappey & Wu (2008), used S-curve analysis to evaluate products with short life cycles, such as electronics.

The life cycle of a new technology begins with the introduction stage, in which the fundamental scientific problems are worked out. These technical problems have to be solved before the technology can advance rapidly, and radical innovations are still awaited during this period. At this stage the number of patent applications is low but slowly increasing, because there is a lot of uncertainty and only pioneer firms are willing to take the R&D risk (Haupt et al. 2007; Trappey et al. 2010; Trappey & Wu 2008). The number of patent applications per applicant is relatively high compared with the other stages of the life cycle, because of the problems of new innovative technologies and because the cost is still too high for customer acceptance or the product has not yet been standardized.

In the growth stage, the fundamental technical problems have been solved and the market uncertainty has “vanished”. Many products are developed based on the technology and the R&D risk decreases, resulting in an increase in patent applications (Haupt et al. 2007; Trappey et al. 2010). As the number of patent applications grows, the number of patent applications per applicant decreases due to new competitors. The technology enters the mature stage when the number of patent applications is constant and new features are now developed for this technology. Thereafter the technology enters the decline or saturation stage.

Patent activity is an important indicator of the current technology life cycle stage. Haupt et al. (2007) and Ernst (1997) have applied the S-curve methodology to niche technologies such as pacemaker technology. Ernst (1997) proposed that the cumulative patent applications per year for a specific technology over a certain period of time can be plotted as an S-curve, from which the different technology life cycle stages can be analyzed. An example of the principles of the S-curve is shown in Figure 6.

[Figure 6: S-curve with the introductory, growth, maturity, and decline stages marked along the curve. Curve label: total market sales. X-axis: time. Y-axis: accumulated patents per year.]
 
Figure 6. S-curve of TLC stages. X-axis represents a period of time and Y-axis 

represents accumulated patents over the time period (Adapted from Trappey et al. 2010). 
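The data points behind such a curve are cumulative counts of patent applications per year. The following sketch shows one way to derive them from a list of filing years. The filing years are made up for illustration, and the function name `cumulative_patents` is hypothetical.

```python
from collections import Counter
from itertools import accumulate


def cumulative_patents(filing_years):
    """Return (year, accumulated patent count) pairs for every year in the range."""
    counts = Counter(filing_years)
    years = list(range(min(counts), max(counts) + 1))
    # Years without filings contribute zero, so the curve stays flat there.
    totals = list(accumulate(counts.get(year, 0) for year in years))
    return list(zip(years, totals))


# Hypothetical filing years of six patents in one technology.
series = cumulative_patents([2001, 2001, 2003, 2004, 2004, 2004])
for year, total in series:
    print(year, total)
```

Plotting the second value against the first gives the curve whose slope is read as the life cycle stage: a slow start, a steep rise, and a flattening as filings level off.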


 

Technology life cycle analysis is important for companies evaluating the timing of R&D or other investment opportunities in technologies, and it is strategically important to take the life cycle stage into account. For example, at the introduction stage, companies should aggressively apply for patent families around their core invention (patent) to strengthen their position in the marketplace. If, however, the technology is in the growth stage, it is important for companies to search for core technologies in that field and develop their own applications. At the mature stage it is important to evaluate the boundaries of patents to avoid infringement issues, or to create alliances to trade IP. Finally, the decline stage implies that a new technology is replacing the old one, which marks the beginning of a new technology life cycle (Trappey et al. 2010; Trappey and Wu 2008; Haupt et al. 2007; Ernst 1997).

2.8. Research framework  

This research proposes a domain-specific patent ontology methodology for technology clustering and life-span analysis. The methodology steps are based on previous research by Trappey et al. (2010), as shown in Table 6. Trappey et al. (2010) referenced a modified ontology to extract key phrases from patent documents. The first step is to define a patent domain and select the IPCs and UPCs in this domain. The second step is to collect domain-specific patent documents from the USPTO database. These steps are completed using the data mining software Intellectual Property Defense-based Support System (IPDSS) (Wheeljet, 2011).

 
Table 6. Methodology outline of this research

Method by Trappey et al. (2010)       Method in this research
1. Data preprocessing                 1. Define domain (IPC & UPC)
2. Key phrase analysis (TF-IDF)       2. Data preprocessing
3. Key phrase correlation measure     3. Key phrase analysis (NTFR)
4. Patent technology clustering       4. Process ontology
5. Patent document clustering         5. Ontological sub-domain technology clustering
6. Lifecycle analysis                 6. Life-span analysis of clusters

 
 

 3. METHODOLOGY 

This section describes the methodology developed in this research and is divided into four parts: patent domain definition, the key phrase analysis process, the processing of the domain-specific ontology, and the technology life-span analysis of patent clusters. IPDSS (Wheeljet, 2011) is used as a tool for data mining, key phrase extraction, and clustering. Figures 7 and 8 show the IPDSS software used in this research.

 
Figure 7. Login page of IPDSS 

 
Figure 8. Workspace of IPDSS  


 

3.1. Patent domain definition  

The first step is to define a specific patent domain and select the relevant UPCs or IPCs, as shown in Figure 9. As described in the literature review, patents under the same classification code may be entirely different in technology (Taghaboni-Dutta et al. 2010).

[Figure 9: flowchart in two stages. Stage 1 (USPTO website): study relevant UPCs, patent search, select relevant UPCs, study UPC patents, delete or add UPCs, resulting in the defined patent domain. Stage 2 (IPDSS): from the patent database (USPTO) and the defined patent domain, build the patent document collection, study patent figures and abstracts, and separate the technology-specific patents from the other patents, resulting in the domain-specific patents on IPDSS that feed the key phrase analysis process.]
 
Figure 9. Processing of patent domain definition 

Stage 1 selects the patent domain based on patent classifications, following these steps:

1. Use the USPTO or WIPO website to study the UPC or IPC definitions for the chosen domain. The UPC and IPC patent classifications are described by USPTO and WIPO, respectively (USPTO 2011; WIPO 2011).
2. Select relevant IPCs or UPCs. (Note: Use either IPC or UPC, not both.)
3. Include between 5 and 15 patent classifications.
4. Search for 3-4 patents for each individual IPC or UPC on USPTO or WIPO.
5. Study those patents to determine whether the chosen IPCs or UPCs (from step 2) are relevant to the chosen domain. Delete or add patent classifications to the domain.
6. The patent domain is now defined.


 

Stage 2 collects domain-specific training patent documents according to the UPCs or IPCs, then identifies the technology-specific patents and excludes the others, following these steps:

1. Use the Intellectual Property Defense-based Support System (IPDSS) to download 150 training patents from USPTO if UPC is chosen (WIPO if IPC is chosen). (Note: The patents have to fall under the chosen patent classifications.)
2. Study the patent figures and abstracts to identify the technology-specific patents according to the chosen IPCs or UPCs. Delete the other patents. (Note: Patents under the same classification code might represent different technologies.)
3. Run the key phrase analysis process on the domain-specific patents.

In this research, a dimension represents a domain; for example, the dental implant domain can also include dental implant tools and dental implant materials. One dimension includes several important classification codes that represent its key concepts and technology. However, the classifications must be limited so that the dimension remains technology specific. The choice of IPC or UPC depends on the specific domain and technology.

The IPDSS software can connect to the USPTO database (or WIPO) to perform patent searches and download patents to the IPDSS database. IPDSS also automatically preprocesses all patent documents into a standard format, which means that the spaces between words and phrases are removed so that the frequency of the words and phrases in each patent document can be counted automatically. For each dimension, key phrases are extracted separately from the dental patent documents to build a domain-specific ontology of dental implants.

3.2. Key phrase analysis process 

The key phrase analysis process generates a list of frequent and important phrases from each patent document. These phrases are used to form logical links between concepts. In this research, the key phrase analysis, the key phrase correlation matrix, and the key phrase and patent correlation matrix are derived using IPDSS, which applies the normalized term frequency – inverse document frequency (NTF) methodology, as shown in Figure 10. The following sections discuss NTF, the key phrase correlation measure, and the key phrase and patent correlation matrix.


 

[Figure 10: flowchart of the key phrase analysis process in IPDSS. Input: raw patent documents (the defined domain-specific patents). Steps: data preprocessing, NTF calculation, key phrase correlation measure, and key phrase and patent correlation matrix. Output: key phrase vector.]

 
Figure 10. The key phrase analysis process 

Normalized term frequency – inverse document frequency (NTF) 

After IPDSS performs data preprocessing, it calculates the weight of each term using normalized term frequency – inverse document frequency (Trappey et al. 2008). As described in the literature review, TF-IDF is a weighting method that weights frequent terms across a series of documents to represent text documents (Aizawa 2002). However, TF-IDF does not account for differences in the number of words in each document (Trappey et al. 2010). Therefore, the weighted frequency of key phrases is normalized by the number of words in each document. Robertson (2004) points out that highly frequent terms represent a document better. The normalized TF-IDF (NTF) can be expressed as follows:

\[ NTF_{ik} = tf_{ik} \times \frac{\left(\sum_{s=1}^{n} WN_s\right)/n}{WN_k} \tag{6} \]

$tf_{ik}$ = the number of times term $i$ occurs in document $k$ of the collection

$WN_k$ = the number of words in document $k$

$n$ = the total number of documents in the document collection.

The frequency (Fnm), Rate, NTF, and NTFR are calculated and tabulated in Table 7, which is the output of the key phrase analysis.


 

Table 7. Key phrases and patent correlation matrix

       Patent1   Patent2   Patent3   …   Patentn   NTF   Rate (%)   NTFR
KP1    F1,1      F1,2      F1,3      …   …         …     …          …
KP2    F2,1      F2,2      …         …   …         …     …          …
KP3    F3,1      …         …         …   …         …     …          …
…      …         …         …         …   …         …     …          …
KPm    …         …         …         …   Fnm       …     …          …

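The formulas are as follows:

\[ F_{nm} = \sum_{i=1}^{t(n,m)} KPF_i \]

$t(n,m)$ = the number of key phrases (belonging to KPm) that are included in Patentn

$KPF_i$ = the frequency of key phrase $i$ (belonging to KPm) in that patent document

The NTFR value is expressed as

\[ NTFR_m = NTF_m \times Rate_m \tag{7} \]

\[ Rate_m = \frac{\sum_{j=1}^{n} X_{jm}}{n} \times 100\%, \qquad X_{jm} = \begin{cases} 0 & \text{if } F_{jm} = 0 \\ 1 & \text{if } F_{jm} > 0 \end{cases} \]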

Key phrase correlation measure 

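IPDSS calculates the correlation values between key phrases to create a key phrase correlation matrix, using the inner product of the key phrase vectors:

\[ Correlation(KP_i, KP_j) = \frac{KP_i \cdot KP_j}{\|KP_i\| \, \|KP_j\|} = \frac{\sum_{k=1}^{n} w_{ik} \, w_{jk}}{\sqrt{\sum_{k=1}^{n} w_{ik}^2 \, \sum_{k=1}^{n} w_{jk}^2}} \tag{8} \]

where $KP_i = (w_{i1}, w_{i2}, \ldots, w_{in})$ = the vector of key phrase $i$

$KP_j = (w_{j1}, w_{j2}, \ldots, w_{jn})$ = the vector of key phrase $j$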


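$\frac{\sum_{s=1}^{n} WN_s}{n}$ = the average Word Number (WN)

\[ w_{ik} = tf_{ik} \times idf_i \tag{9} \]

$tf_{ik}$ = the number of times term $i$ occurs in document $k$ of the patent collection

$df_i$ = the number of documents in the collection that contain term $i$

\[ idf_i = \log_2\left(\frac{n}{df_i}\right) \tag{10} \]

$n$ = the total number of documents in the collection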

First, the algorithm transforms each patent document into a key phrase vector and analyzes the frequency of key phrases. Second, it derives the key phrase vector by eliminating unnecessary key phrases. Third, it calculates the correlation values between key phrases using Equation (8). Fourth, it derives the correlation coefficients from the number of different key phrases occurring in each patent document. The correlation coefficient is calculated according to the formula below:

\[ Coefficient(KP_i, KP_j) = \frac{\sum_{k=1}^{n} Correlation_k(KP_i, KP_j)}{n} \tag{11} \]

where $Correlation_k(KP_i, KP_j)$ = the correlation value of key phrase $i$ and key phrase $j$ in document $k$

$n$ = the total number of documents in the patent collection

After the correlation coefficients are calculated, they can be presented as the key phrase correlation matrix in Table 8. The frequency is calculated for all terms: KPfi denotes the frequency of key phrase KPi in the document, RPfij denotes the frequency of the related phrase RPij, and Rij denotes the correlation between RPij and KPi. The final frequency of KPi is calculated as follows:

\[ KPF_i = KPf_i + \sum_{j} RPf_{ij} \times R_{ij} \tag{12} \]

After KPF has been calculated for all key phrases, the values are collected in a vector:

\[ [KPF_1, KPF_2, \ldots, KPF_n] \tag{13} \]

This vector is used as the input for patent technology clustering and patent document clustering.
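The final frequency can be illustrated with a short sketch. It assumes that the final frequency of a key phrase is its own frequency plus the frequency of each related phrase weighted by its correlation with the key phrase, as the definitions of KPfi, RPfij, and Rij suggest. The function name and the numbers are hypothetical.

```python
def final_frequency(kp_freq, related):
    """Own frequency of a key phrase plus correlation-weighted related phrases.

    related -- list of (RPf_ij, R_ij) pairs: the frequency of a related phrase
               and its correlation with the key phrase.
    """
    return kp_freq + sum(rp_freq * r for rp_freq, r in related)


# Hypothetical document with two key phrases; the second has no related phrases.
kpf_vector = [
    final_frequency(10, [(4, 0.5), (2, 0.25)]),
    final_frequency(3, []),
]
print(kpf_vector)
```

Related phrases with a weak correlation add little to the total, so the vector stays dominated by the key phrases themselves.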

Table 8. Key phrases correlation matrix

       KP1    KP2    KP3    …    KPn
KP1    R1,1   R1,2   R1,3   …    …
KP2    R2,1   R2,2   …      …    …
…      …      …      …      …    …
KPm    …      …      …      …    Rmn

Source: Trappey et al. (2010)

3.4. Processing domain-specific ontologies 

The domain-specific ontology is built using Microsoft Visio 2007 (MS Visio) as a visualization tool for transferring the domain-specific ontological schema (Huang et al. 2008). An ontology can be visualized as a pyramid with the domain concept at the top. In this research, the ontology structure is based on the RFID ontology tree in Figure 4, and the key phrases with the top 50 NTFR values in the key phrase matrix (from the key phrase analysis process) are chosen to build the ontology. The processing of the ontology is described in the following steps, and an overview is shown in Figure 11.

Step 1: Organize patents with the same patent classification codes 

This step uses the key phrase matrix output from the key phrase analysis process. First, patents with the same classification are grouped together; for example, Patent1, Patent6-10, and Patent23-28 may share the same patent classification. Second, the frequency (Fnm) of each key phrase (KPn) in the key phrase matrix is analyzed to determine which KPn is expressed in which UPC. This gives an overview of which classifications use the same key phrases, which can be tabulated as shown in Table 9.

Step 2: Map classification codes – TYPE I ontology 

From the previous step, a new matrix is constructed, as shown in Table 10, to visualize which key phrases are expressed in each UPC. The Type I ontology is constructed using MS Visio, as shown in Figure 11: the key phrases are laid out to visualize the common phrases among patents. Key phrases that are expressed in the same UPCs, for example UPC1, UPC2, and UPC3, are given one color from a user-defined coloring scheme, and key phrases expressed only in UPC1 and UPC2 are given a different color. The coloring scheme helps R&D engineers to visualize the common phrases in the domain.

[Figure 11: overview flowchart of the ontology processing. Patent domain definition (patent database (USPTO), defined UPC, domain-specific patents) and the key phrase analysis process in IPDSS produce the key phrase matrix. Processing ontology: Step 1, organize patent classifications; Step 2, map classification codes (TYPE I ontology, Microsoft Visio); Step 3, organize key phrases; Step 4, pre-define ontological sub-domains (TYPE II ontology, Microsoft Visio). Improvement steps: new domain-specific patents pass through the key phrase analysis process, patent technology clustering (key phrase correlation vector), and patent document clustering in IPDSS, giving patent document clusters; the key phrases of each cluster are then used to improve and define the ontological sub-domains (TYPE III ontology, Microsoft Visio).]
 
Figure 11. Processing of ontology 

Step 3: Organize key phrases  

The key phrases from Table 10 are organized and grouped according to their relationships and logical links, as shown in Table 11. The online dictionary WordNet (Princeton University, 2011) and a study of the patent documents are used to understand the meaning of the key phrases. This procedure relies on personal experience and interpretation of words. The goal is to create 4-5 large groups of key phrases. For example, in Table 11, KP3, KP2, and KP6 form one group. The first draft of the ontology is then improved using the grouped key phrases to provide better concept relationships.

Table 9. Patent and UPC matrix

       Patent1   Patent2   Patent3   Patent4   …   Patentn   NTF   Rate (%)   NTFR
UPC    UPC1      UPC1      UPC1      UPC2      …   UPCz
KP1    F1,1      F1,2      F1,3      …         …   …         …     …          …
KP2    F2,1      F2,2      …         …         …   …         …     …          …
KP3    F3,1      …         …         …         …   …         …     …          …
…      …         …         …         …         …   …         …     …          …
KP50   …         …         …         …         …   Fn50      …     …          …

Note: UPCz = the UPC code of each patent

 
Table 10. Key phrase and UPC matrix

KP1    UPC1   UPC2   UPC3   …   …
KP2    UPC1   UPC2   …      …   …
KP3    UPC1   …      …      …   …
…      …      …      …      …   …
KP50   …      …      …      …   UPCn

Note: Each column has the same classification code (IPC or UPC)
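The step from the patent and UPC matrix to the key phrase and UPC matrix can be sketched as a simple lookup. The sketch assumes that a key phrase counts as expressed in a UPC when it has a non-zero frequency in at least one patent of that UPC. The data and the function name `phrases_by_upc` are hypothetical.

```python
def phrases_by_upc(frequencies, patent_upc):
    """Map each key phrase to the UPC codes in which it is expressed.

    frequencies[phrase][patent] -- frequency of the key phrase in the patent
    patent_upc[patent]          -- UPC code assigned to the patent
    """
    mapping = {}
    for phrase, per_patent in frequencies.items():
        upcs = {patent_upc[p] for p, f in per_patent.items() if f > 0}
        mapping[phrase] = sorted(upcs)
    return mapping


# Hypothetical excerpt of a patent and UPC matrix.
patent_upc = {"Patent1": "UPC1", "Patent2": "UPC1", "Patent3": "UPC2"}
frequencies = {
    "KP1": {"Patent1": 5, "Patent2": 0, "Patent3": 2},
    "KP2": {"Patent1": 0, "Patent2": 1, "Patent3": 0},
}
table = phrases_by_upc(frequencies, patent_upc)
print(table)
```

Key phrases that map to the same list of UPCs are the ones that would share a color in the Type I ontology.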


 

Table 11. Key phrase organization matrix

KP3    UPC1   …      …      …   …
KP2    UPC1   UPC2   …      …   …
KP6    UPC1   UPC2   UPC3   …   …
…      …      …      …      …   …
KP50   …      …      …      …   UPCn

Note: Each column has the same classification code (IPC or UPC)

Step 4: Pre-define ontological sub-domain – TYPE II ontology 

The final step is to pre-define the key phrase groups in Table 11 by studying the phrases, the patent classification definitions, and the patent technology, in order to assign an appropriate definition to each ontological sub-domain. The processing of these steps is shown in Figure 12.

[Figure 12: the TYPE I ontology (key phrases grouped by shared classification codes) is converted in MS Visio into the TYPE II ontology, with the patent domain at the center and the key phrases linked around it. Tools used to understand the key phrases: WordNet, patent classifications, patent documents, and your own logic and experience.]
 

Figure 12. Processing TYPE II ontologies 

The following steps describe the construction of the Type II ontology using MS Visio:

1. Use MS Visio to group the key phrases of the Type I ontology according to the groups from Step 3.
2. Start with the patent domain at the center in MS Visio.
3. Use WordNet, patent classifications, patent documents, and your own logic to determine which key phrases are most strongly associated with the patent domain.
   a. Link the key phrases from the center (patent domain) outwards using MS Visio. An ontology is hierarchical, so use your own logic to link the phrases of the pre-defined sub-domains. (Note: How the key phrases are linked depends on the patent domain, the technology, and your understanding, which requires study of the patent documents and the domain.)
   b. Use a color scheme in MS Visio to color the key phrases of the pre-defined ontological sub-domains.

Improvement of domain-specific ontologies 

This stage uses several methods to improve the Type II ontology, as shown in Figure 13. The following steps describe the process of creating the Type III ontology.

Step 1: New domain-specific patents  

Collect 50 new domain-specific patent documents from USPTO based on the patent domain definition, and use IPDSS to preprocess the data and carry out the key phrase analysis process, patent technology clustering, and patent document clustering.

Step 2: Patent technology clustering   

The correlation matrix derived from the key phrase correlation analysis in IPDSS is used as input for technology clustering. One purpose of patent technology clustering is to discover the relationships among patents. Since the key phrases represent the concept or technology of each patent document, the key phrase correlation matrix and the extracted key phrases are the central inputs to this step. Patent technology clusters are generated by applying the K-means algorithm to the key phrase correlation matrix. This technique can help researchers to select technology clusters to analyze; in this research, however, it is used as an input for patent document clustering, which is described in the next step.
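For readers unfamiliar with K-means, the following is a minimal sketch of the algorithm applied to the rows of a small correlation matrix. It is not the IPDSS implementation. The initial centroids are simply the first k rows, which is an assumption made to keep the example deterministic, and the matrix values are invented.

```python
def kmeans(rows, k, iterations=20):
    """Cluster rows (lists of floats); return one cluster label per row."""
    centroids = [list(r) for r in rows[:k]]  # assumption: first k rows seed the centroids
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: each row joins its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(row, centroids[c])))
            for row in rows
        ]
        # Update step: each centroid moves to the mean of its member rows.
        for c in range(k):
            members = [row for row, label in zip(rows, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical correlation matrix of four key phrases forming two groups.
correlation_matrix = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.0, 0.1, 0.8, 1.0],
]
labels = kmeans(correlation_matrix, k=2)
print(labels)
```

Key phrases whose rows look alike, that is, key phrases that correlate with the same other key phrases, end up in the same technology cluster.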

Step 3: Patent document clustering 

The vector output of patent technology clustering is used as input for patent document clustering. A matrix is constructed as input for patent document clustering, as shown in Table 12. Patents under the same classification code can be entirely different, and patent document clustering derives the internal relationships based on technologies. The output of this method is that similar patents are clustered together to create homogeneous clusters. It is therefore useful for researchers who want to group technologies for analysis. Patents are clustered according to the formula below.

\[ N_{nm} = \sum_{i=1}^{t(n,m)} KPF_i \tag{14} \]

$t(n,m)$ = the number of key phrases (belonging to TCm) that are included in Patentn

$KPF_i$ = the frequency of key phrase $i$ (belonging to TCm) in that patent document
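One entry of the input matrix for patent document clustering can be sketched as a sum over the key phrases of a technology cluster. The sketch assumes that the entry for a technology cluster and a patent is the total frequency of the cluster's key phrases that occur in that patent. The cluster contents, the frequencies, and the function name are hypothetical.

```python
def cluster_frequency(kpf, cluster_phrases):
    """Sum the frequencies of a cluster's key phrases that occur in one patent."""
    return sum(kpf[p] for p in cluster_phrases if p in kpf)


# Hypothetical technology clusters and per-patent key phrase frequencies.
technology_clusters = {"TC1": ["KP1", "KP2"], "TC2": ["KP3"]}
patent_kpf = {"Patent1": {"KP1": 4, "KP3": 1}, "Patent2": {"KP2": 2.5}}

matrix = {
    tc: {patent: cluster_frequency(kpf, phrases) for patent, kpf in patent_kpf.items()}
    for tc, phrases in technology_clusters.items()
}
print(matrix)
```

Each patent is thus described by how strongly it expresses each technology cluster, rather than by its classification code.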

 
[Figure 13: improvement steps for the domain-specific ontology. Step 1: collect new domain-specific patents from the patent database (USPTO) under the defined UPC. Steps 2 and 3: patent technology clustering and patent document clustering in IPDSS, giving patent document clusters. Step 4: key phrase analysis process for each cluster, giving the key phrase matrix of each cluster. Step 5: use MS Visio to modify the TYPE II ontology structure (add new key phrases, work from the center outwards, apply a color scheme, define ontological sub-domains), with WordNet, patent classifications, patent documents, and your own logic and experience as supporting tools, giving the TYPE III ontology with its sub-domains.]
 
Figure 13. Improvement steps of domain-specific ontologies 


 

Step 4: Key phrase extraction of document clusters 

Patent document clustering groups the new patent documents into clusters, as described in Step 3. Each patent document cluster is separately subjected to the key phrase analysis process in IPDSS, which applies NTFR, and the 15 key phrases with the highest NTFR values are extracted and tabulated as in Table 13. These key phrases are used to improve the phrases of the pre-defined ontological sub-domains.

Table 12. Key phrases and patent correlation matrix

       Patent1   Patent2   Patent3   …   Patentn
TC1    N1,1      N1,2      N1,3      …   N1n
TC2    N2,1      N2,2      …         …   …
…      …         …         …         …   …
TCm    …         …         …         …   Nnm

 
Table 13. Key phrases of each cluster

Cluster 1   Cluster 2   Cluster 3   …   Clustern
KP1,1       KP1,2       KP1,3       …   KP1n
KP2,1       KP2,2       …           …   …
…           …           …           …   …
…           …           …           …   KPn

 
Step 5: Modification and define ontology sub-domain – TYPE III ontology 

The key phrase collection of each individual cluster in Table 13 is compared with the key phrases in the Type II ontology. The following steps describe the procedure.


 

1. Use MS Visio to add the key phrases from Table 13 to the Type II ontology.
   a. Note: Do not link the phrases yet; group them according to their clusters.
2. Compare each cluster of key phrases from Table 13 with the Type II ontological sub-domains.
   a. For example, the key phrases of cluster 1 in Table 13 are compared, key phrase by key phrase, with those of the Type II ontology. The more matching phrases, the better the cluster represents that sub-domain.
   b. Assign the best matching cluster to each Type II ontological sub-domain.
3. Use WordNet, patent classifications, patent documents, and your own logic to understand and determine whether the clusters from Table 13 are relevantly grouped with the Type II ontology.
   a. Use MS Visio to rearrange the key phrases of the Type II ontology and modify the structure of the pre-defined sub-domains (if it makes sense).
   b. Use MS Visio to link the new key phrases from each assigned cluster into the sub-domains of the Type II ontology. Work from the center outwards. Try to create sub-domains and, from each sub-domain, a hierarchical tree structure.
      Comments: The pre-defined sub-domains from the Type II ontology can be deleted and new sub-domain definitions created. New key phrases from Table 13 are also added and linked, as are the key phrases shared with other sub-domains. Shared key phrases are usually the most common key phrases in a patent domain and can help engineers to understand the domain concept better.
   c. The Type II ontology structure is modified and colored with a color scheme according to the sub-domain definitions (next step).
4. Define the ontological sub-domains. The definitions depend on the previous step. Depending on the case, each sub-domain can, for example, describe one of the main components of a technology, so that the technology is separated into several major parts.
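The phrase-by-phrase comparison in step 2 above can be approximated by counting shared key phrases. The sketch below is only an aid to the manual procedure, which also relies on WordNet and on the researcher's judgment. The sub-domain names, the key phrases, and the function name are hypothetical.

```python
def best_sub_domain(cluster_phrases, sub_domains):
    """Return the sub-domain sharing the most key phrases with a cluster.

    Also returns the overlap count per sub-domain. On a tie, the sub-domain
    listed first wins.
    """
    overlap = {
        name: len(set(cluster_phrases) & set(phrases))
        for name, phrases in sub_domains.items()
    }
    return max(overlap, key=overlap.get), overlap


# Hypothetical Type II sub-domains and one cluster of extracted key phrases.
sub_domains = {"A": ["KP1", "KP2", "KP3"], "B": ["KP6", "KP12"]}
choice, scores = best_sub_domain(["KP2", "KP3", "KP12"], sub_domains)
print(choice, scores)
```

The overlap counts give a first ranking only; a cluster with few exact matches may still belong to a sub-domain if its phrases are synonyms of the sub-domain's phrases.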


 

3.5. Processing life-span analysis in patent clusters 

In this research, the procedure for technology clustering and life-span analysis is demonstrated through a case study. The ontology is used to cluster patents according to its sub-domain concepts by assigning each patent individually to an ontological sub-domain cluster, as shown in Figure 14. The key phrases from each patent are compared with each sub-domain cluster of the ontology, which contains the key phrases that are considered to be key concepts.

Step 1: Test patents 

Thirty new domain test patents are downloaded from USPTO and processed in IPDSS. These test patents are new and have not been used to train the system or to build or improve the ontology. The patent classifications of these patents are not restricted.

Step 2: Key phrase analysis process 

IPDSS applies the NTFR methodology to analyze the key phrases of the patent documents, and the output is the key phrase matrix. The key phrase matrix lists the frequencies of all key phrases for each patent.

Step 3: Ontological sub-domain clustering  

The list of key phrases for each patent is compared, key phrase by key phrase, with the
key phrases of the sub-domains of the Type III ontology. A patent is assigned to the
sub-domain whose concepts and relationships are described most consistently by the key
phrases of the patent. The patents assigned to each sub-domain are clustered together,
and this is called ontological sub-domain clustering. The clusters are named after the
sub-domain definition.
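The assignment rule of Step 3 can be sketched in Python as follows. No numeric matching rule is stated here, so this sketch assumes that a patent is assigned to the sub-domain whose key phrase set overlaps most with the key phrases of the patent, weighted by frequency. The function name and the sample sub-domains and key phrases are hypothetical.

```python
from collections import defaultdict


def assign_to_sub_domains(patent_phrases, sub_domains):
    """Cluster patents by their best-matching ontological sub-domain.

    patent_phrases: {patent_id: {key_phrase: frequency}}
    sub_domains:    {sub_domain_name: set of key phrases}
    Assumed score: summed frequency of the patent's key phrases that also
    appear in the sub-domain. Patents with no match stay unassigned, and
    on a tie the first sub-domain listed wins.
    """
    clusters = defaultdict(list)
    for patent_id, phrases in patent_phrases.items():
        best_name, best_score = None, 0
        for name, domain_phrases in sub_domains.items():
            score = sum(f for p, f in phrases.items() if p in domain_phrases)
            if score > best_score:
                best_name, best_score = name, score
        if best_name is not None:
            clusters[best_name].append(patent_id)
    return dict(clusters)


# Hypothetical sub-domains and patent key phrases for illustration only.
sub_domains = {
    "Implant body": {"thread", "implant body", "surface coating"},
    "Abutment": {"abutment", "abutment screw"},
}
patent_phrases = {
    "P1": {"thread": 4, "implant body": 3, "abutment": 1},
    "P2": {"abutment": 5, "abutment screw": 2},
    "P3": {"drill guide": 2},
}
clusters = assign_to_sub_domains(patent_phrases, sub_domains)
```

In this example P1 falls into the "Implant body" cluster, P2 into the "Abutment" cluster, and P3 remains unassigned because none of its key phrases occurs in either sub-domain.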

Step 4: Life-span analysis  

For each ontological sub-domain cluster, the age of each patent is calculated from the
filing date, not the issue date, up to today. Patents are protected from the filing date,
and the date on which a patent is granted is called the issue date. It can take up to two
or three years before a patent is issued. The average age is the sum of the patent ages
divided by the number of patents in the sub-domain cluster, and an example of the
information is shown in Table 14. The average age is calculated according to the formula
below:

            
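$$\text{Average age} = \frac{1}{n} \sum_{i=1}^{n} \text{Age}_i$$

where $\text{Age}_i$ = the age of patent $i$ and $n$ = the total number of patents in a sub-domain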
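The age and average-age calculation can be sketched in Python as follows. The sketch assumes that age is expressed in years as the number of days divided by 365.25, and it takes the reference date ("today") as an explicit argument so that results are reproducible. The function names and filing dates are hypothetical.

```python
from datetime import date


def patent_age_years(filing_date, today):
    """Age in years, counted from the filing date (not the issue date)."""
    return (today - filing_date).days / 365.25


def average_age(filing_dates, today):
    """Sum of the patent ages divided by n, the number of patents in the cluster."""
    ages = [patent_age_years(d, today) for d in filing_dates]
    return sum(ages) / len(ages)


# Hypothetical filing dates for one sub-domain cluster.
cluster = [date(2002, 6, 1), date(2008, 6, 1)]
aa = average_age(cluster, today=date(2012, 6, 1))
```

With these two patents, filed roughly ten and four years before the reference date, the average age is about seven years. An empty cluster raises `ZeroDivisionError`, so empty sub-domains need to be filtered out beforehand.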



[Figure 14 (flowchart): Patent documents are collected from the patent database (USPTO).
Step 2, key phrase analysis process: IPDSS produces the key phrase matrix (key phrases KP1
to KP50 against Patent1 to Patentn, with frequencies F and NTFR). Step 3, ontological
sub-domain clustering: the key phrases of each individual patent are compared with the
sub-domains of the Type III ontology, and patents are assigned to sub-domains if their key
phrases match the concepts of the sub-domains; the patents assigned to each sub-domain are
clustered together, and each cluster is named after the sub-domain definition. Step 4,
life-span analysis: the average age of each sub-domain cluster is calculated and each
cluster is plotted against its average age (average life-span of dental patent clusters),
ranging from the introductory or growth stage to the mature or decline stage.]
 
Figure 14. Processing of life-span analysis 



Table 14. Patent information and average age 

Sub-domain 1

Patent No.   Patent title (PT)   UPC                Filing date   Age
P1           PT1                 UPC1; UPC2; etc.   Month, Year   Age
P2           PT2                 UPC1; UPC2; etc.   Month, Year   Age
                                                    Average age   AA

The average age of each cluster is plotted against the ontological sub-domain clusters.
Figure 15 illustrates the analysis of potentially emerging or declining clusters depending
on their average age. The size of each bubble represents the number of patents, the Y-axis
shows the ontological sub-domain clusters, and the X-axis shows the average age from 0 to
20 years (from right to left on the X-axis). Cluster 4 in Figure 15 represents a young
cluster of a specific ontological sub-domain, in other words, a specific sub-domain
technology in dental implants. This mapping

method allows researchers to explore which