A Webpage Structure Processing Algorithm - Extending the Page Tailor Toolkit

Master's thesis

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12380/74444
Download file(s):
File: 74444.pdf | Description: Fulltext | Size: 1.67 MB | Format: Adobe PDF
Type: Master Thesis
Title: A Webpage Structure Processing Algorithm - Extending the Page Tailor Toolkit
Authors: Andrén, Lars
Abstract: Research into user-preference-based automatic processing on the web, web page content adaptation for small screens, and the informative value of web pages has resulted in the design and implementation of an algorithm called the Domain Heritage-algorithm. This algorithm extends the functionality of the Page Tailor toolkit, a program that is the result of C-Y Tsai’s thesis “Web Page Tailoring Tool for Mobile Devices”. The extension enables automatic processing of web pages for which no preferences on which parts to display have been stored. The Domain Heritage-algorithm will not work unless at least one web page of the visited domain has previously been personalised. The extended toolkit was tested on ten subjects and a number of web sites. The test results were largely in accordance with expectations, but the test subjects’ experience in using the Page Tailor toolkit was found to have a considerable influence on how often the algorithm ran successfully. Three major conclusions are drawn. First, too much editing of the appearance of web page content can result in loss of informative value, and successful, fully automatic extraction of web page content needs semantic processing. Second, XPaths have been a good choice of data for the algorithm to process: the results of the Big O analysis of the running time were acceptable, and it was possible to implement the algorithm in the existing software. Finally, previous experience in using the Page Tailor toolkit, as well as more than one personalised web page, is essential to the successful running of the Domain Heritage-algorithm.
Keywords: Information Technology
Issue Date: 2007
Publisher: Chalmers tekniska högskola / Institutionen för tillämpad informationsteknologi (Chalmers)
Chalmers University of Technology / Department of Applied Information Technology (Chalmers)
Series/Report no.: Master thesis - Technical Communication, Centre for Digital Media and higher education, Chalmers University of Technology : 2007:2
URI: https://hdl.handle.net/20.500.12380/74444
Collection: Master Theses


