Distributed Web Crawler

dc.contributor.author: Bjerkander, Hans
dc.contributor.author: Karlsson, Erik
dc.contributor.department: Chalmers tekniska högskola / Institutionen för data- och informationsteknik (Chalmers) [sv]
dc.contributor.department: Chalmers University of Technology / Department of Computer Science and Engineering (Chalmers) [en]
dc.date.accessioned: 2019-07-03T13:21:04Z
dc.date.available: 2019-07-03T13:21:04Z
dc.date.issued: 2014
dc.description.abstract: This thesis investigates possible improvements to distributed web crawlers. Web crawling is the cornerstone of search engines and a well-defined part of Internet technology. Because of the size of the Web, a web crawler must be fast and efficient so that it can find interesting sites before they change or disappear. The thesis focuses on crawler distribution with respect to modularity, fault tolerance and group membership services. It also covers the download order of crawlers, since this greatly influences a crawler's efficiency. In addition to the theoretical basis of the thesis, a prototype has been constructed in Java. The prototype is efficient, modular, fault-tolerant and configurable. The results indicate that using a membership service is a good way to distribute a crawler. Finally, the thesis demonstrates a way to improve the crawling order compared with breadth-first ordering.
dc.identifier.uri: https://hdl.handle.net/20.500.12380/193680
dc.language.iso: eng
dc.setspec.uppsok: Technology
dc.subject: Data- och informationsvetenskap
dc.subject: Computer and Information Science
dc.title: Distributed Web Crawler
dc.type.degree: Examensarbete för masterexamen [sv]
dc.type.degree: Master Thesis [en]
dc.type.uppsok: H
Download
Name: 193680.pdf
Size: 677.95 KB
Format: Adobe Portable Document Format
Description: Fulltext