Autonomous Topic-Based Website Categorization

Master Thesis

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12380/183760
File: 183760.pdf (Fulltext, 996.28 kB, Adobe PDF)
Type: Master Thesis
Title: Autonomous Topic-Based Website Categorization
Authors: Saberi, Golnaz
Abstract: The internet has influenced many aspects of our social, economic, educational and professional lives. Because of the unique means of communication it offers, it has grown dramatically since its advent. This ever-growing volume of data has in turn created a demand for structure. Ranking and indexing of web pages by search engines, hierarchical taxonomies of web resources, and research on autonomous web page and website classification are all attempts to construct such structure. This project is a study of autonomous website classification, which has been researched for various purposes and on different levels, especially to improve search engines and directory services. The idea for this project, however, comes from a different active area of the internet: online advertising. Banner ads, one of the most common forms of online advertising, are published essentially at random; ad servers, however, try to improve their effectiveness by using algorithms to place them intelligently. One way to do this is to match the topic of an ad to the topic of the website it is placed on. The project is therefore an attempt to classify websites by their main topic. The work contains a brief survey of the web page and website categorization methods developed to date, as well as an implementation of a classification algorithm and an analysis of its effectiveness. The implementation creates a graph model of websites and uses their link structure to prune noisy web pages. A brief description of text classification methods and their relation to the purpose of this project is also presented. In this study, the textual content and hyperlink information contained in a website are used to construct a vector space model, which is then used for classification with a support vector machine (SVM) learning model.
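The pipeline outlined in the abstract (graph model, link-based pruning, vector space model) can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not say how noisy pages are identified, so pruning by link distance from the home page is an assumption made here, and the toy `SITE` data and the function names `prune_by_depth` and `tfidf_vectors` are hypothetical. The SVM training step is omitted.

```python
from collections import Counter, deque
import math

# Hypothetical toy website: page -> (page text, internal links).
SITE = {
    "/": ("football news and match results", ["/scores", "/about"]),
    "/scores": ("football scores league table", ["/", "/archive"]),
    "/about": ("contact privacy cookies terms", ["/"]),
    "/archive": ("old football match reports", ["/archive/2001"]),
    "/archive/2001": ("misc forum chatter", []),
}


def prune_by_depth(site, root="/", max_depth=1):
    """Keep pages reachable within max_depth links of root (BFS).

    Assumption: pages far from the home page are treated as noise.
    """
    depth = {root: 0}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        if depth[page] == max_depth:
            continue  # do not expand pages at the depth limit
        for target in site[page][1]:
            if target in site and target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return sorted(depth)


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of token lists."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            [tf[t] / len(doc) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors


kept = prune_by_depth(SITE, root="/", max_depth=1)
vocab, vectors = tfidf_vectors([SITE[page][0].split() for page in kept])
```

The resulting vectors, one per retained page, are the kind of input a linear SVM could be trained on. How the thesis combines page text with hyperlink information into a single website representation is not stated in the abstract and is not modeled here.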
Keywords: Interaction Technologies
Issue Date: 2013
Publisher: Chalmers tekniska högskola / Institutionen för tillämpad informationsteknologi (Chalmers)
Chalmers University of Technology / Department of Applied Information Technology (Chalmers)
URI: https://hdl.handle.net/20.500.12380/183760
Collection: Master Theses


