Autonomous Topic-Based Website Categorization

Saberi, Golnaz

Download

Primary file: 183760.pdf (996.28 KB)

Published

2013

Author

Saberi, Golnaz

Type

Master's thesis

Abstract

The internet has influenced many aspects of our social, economic, educational and professional lives. Because of the unique means of communication it offers, the internet has grown dramatically since its advent. At the same time, the ever-growing volume of data on the internet has created a demand for imposing structure on this data. The ranking and indexing of web pages by search engines, the creation of hierarchical taxonomies of web resources, and research on autonomous web page and website classification are all examples of attempts to construct such structure. This project studies autonomous website classification. This process has been researched for various purposes and at different levels, especially to improve search engines and directory services. However, the idea for this project comes from a different active area on the internet, namely online advertising. One of the most common forms of online advertising is the banner ad, which is largely published at random. Ad servers, however, try to improve the effectiveness of banner ads by using algorithms to place them intelligently. One way to do this is to match the topic of an ad with the topic of the website it is placed on. The current project is an attempt towards classifying websites by their main topic. This work contains a brief review of the web page and website categorization methods proposed to date, as well as the implementation of a classification algorithm and an analysis of its effectiveness. The implementation consists of creating a graph model of each website and leveraging its link structure to prune noisy web pages. In addition, a brief description of text classification methods and their relation to the purpose of this project is presented. In this study, the textual content as well as the hyperlink information contained in a website are used to construct a vector space model, which is then used for classification with a support vector machine (SVM) learning model.
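The pipeline summarized in the abstract can be sketched roughly in three steps. First, a link graph is used to prune noisy pages. Second, a vector space model is built from text and hyperlinks. Third, an SVM classifies the result. This is a minimal, hypothetical Python sketch and not the thesis's implementation. The following are all assumptions made for illustration:

- the pruning rule (keep pages within `max_depth` clicks of the home page);
- the use of words from link URLs as hyperlink features;
- the Pegasos-style subgradient solver for a linear SVM without a bias term;
- the function names (`prune_by_link_depth`, `site_vector`, `train_linear_svm`, `classify_site`) and the toy data.

```python
import math
import random
import re
from collections import Counter, deque

# Hypothetical website representation: {page_url: (page_text, [outgoing_link_urls])}.


def prune_by_link_depth(site, home, max_depth=2):
    """Keep only pages reachable from `home` within `max_depth` clicks (BFS).

    Assumption: unreachable or deeply buried pages are treated as noise;
    the thesis's actual pruning rule may differ.
    """
    depth = {home: 0}
    queue = deque([home])
    while queue:
        url = queue.popleft()
        if depth[url] == max_depth:
            continue  # do not expand beyond the depth limit
        for target in site[url][1]:
            # Follow only internal links to pages not yet visited.
            if target in site and target not in depth:
                depth[target] = depth[url] + 1
                queue.append(target)
    return {url: site[url] for url in depth}


def site_vector(site):
    """Build one L2-normalized term-frequency vector for a whole website.

    Features are words from page text plus words from outgoing link URLs,
    prefixed with "url:" (an assumed stand-in for hyperlink information).
    """
    counts = Counter()
    for text, links in site.values():
        counts.update(re.findall(r"[a-z]+", text.lower()))
        for link in links:
            counts.update("url:" + w for w in re.findall(r"[a-z]+", link.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {term: c / norm for term, c in counts.items()}


def dot(w, x):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_linear_svm(xs, ys, lam=0.01, epochs=200, seed=0):
    """Train a linear SVM (no bias term) with Pegasos stochastic subgradients.

    Approximately minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y * <w, x>)),
    with labels y in {+1, -1}.
    """
    rng = random.Random(seed)
    w = {}
    order = list(range(len(xs)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = ys[i] * dot(w, xs[i])
            for k in w:  # shrink step from the L2 regularizer
                w[k] *= 1.0 - eta * lam
            if margin < 1.0:  # hinge loss active: step towards y * x
                for k, v in xs[i].items():
                    w[k] = w.get(k, 0.0) + eta * ys[i] * v
    return w


def classify_site(w, site, home="/"):
    """Prune a website by link depth, vectorize it, and return +1 or -1."""
    return 1 if dot(w, site_vector(prune_by_link_depth(site, home))) > 0 else -1


# Hypothetical toy data: +1 = sports websites, -1 = cooking websites.
TRAIN = [
    ({"/": ("football match results", ["/league"]), "/league": ("league table goals", [])}, 1),
    ({"/": ("tennis tournament scores", ["/players"]), "/players": ("player rankings", [])}, 1),
    ({"/": ("pasta recipe garlic tomato", ["/recipes"]), "/recipes": ("baking bread dough", [])}, -1),
    ({"/": ("vegan soup recipe", ["/kitchen"]), "/kitchen": ("oven baking tips", [])}, -1),
]

if __name__ == "__main__":
    xs = [site_vector(prune_by_link_depth(site, "/")) for site, _ in TRAIN]
    ys = [label for _, label in TRAIN]
    weights = train_linear_svm(xs, ys)
    print(classify_site(weights, {"/": ("football goals tonight", [])}))
```

A practical system would more likely use an established SVM library and richer features, such as TF-IDF weighting and anchor text. The solver here is kept dependency-free so that the sketch runs on the standard library alone.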

Subject/keywords

Interaction Technologies

URI

https://hdl.handle.net/20.500.12380/183760

Collections

Master's theses
