Distributed Web Crawler

Master's thesis

Use this link to cite or link to this document: https://hdl.handle.net/20.500.12380/193680
Download:
File: 193680.pdf | Description: Full text | Size: 677.95 kB | Format: Adobe PDF
Type: Master Thesis
Title: Distributed Web Crawler
Authors: Bjerkander, Hans
Karlsson, Erik
Abstract: This thesis investigates possible improvements to distributed web crawlers. Web crawling is the cornerstone of search engines and a well-defined part of Internet technology. Because of the size of the Web, a web crawler must be fast and efficient so that it can find interesting sites before they change or disappear. The thesis focuses on crawler distribution with respect to modularity, fault tolerance and group membership services. The download order of crawlers is also covered, since it greatly influences a crawler's efficiency. In addition to the theoretical basis of the thesis, a prototype has been constructed in Java. The prototype is efficient, modular, fault-tolerant and configurable. The results indicate that using a membership service is a good way to distribute a crawler; the thesis also demonstrates a way to improve the crawling order compared to a breadth-first ordering.
Keywords: Computer and Information Science
Publication date: 2014
Publisher: Chalmers University of Technology / Department of Computer Science and Engineering (Chalmers)
URI: https://hdl.handle.net/20.500.12380/193680
Collection: Master Theses



The material in Chalmers' open archive is protected by copyright and may not be used for commercial purposes.