Distributed Web Crawler

Type
Master's thesis
Published
2014
Authors
Bjerkander, Hans
Karlsson, Erik
Abstract
This thesis investigates possible improvements to distributed web crawlers. Web crawling is a cornerstone of search engines and a well-defined part of Internet technology. Because of the size of the Web, a web crawler must be fast and efficient so that it can find interesting sites before they change or disappear. The thesis focuses on crawler distribution with respect to modularity, fault tolerance and group membership services. It also covers the download order of crawlers, since this greatly influences a crawler's efficiency. In addition to the theoretical basis of the thesis, a prototype has been constructed in Java. The prototype is efficient, modular, fault-tolerant and configurable. The results indicate that using a membership service is a good way to distribute a crawler. Finally, the thesis demonstrates a way to improve the crawling order compared to a breadth-first ordering.
Subject/keywords
Computer and Information Science