Expressive corpus search in a modern framework - Developing expressiveness for Korpsearch, a more efficient tool by which to query a corpus
dc.contributor.author | SALOMONSSON, VICTOR | |
dc.contributor.author | THORESSON, MIJO | |
dc.contributor.department | Chalmers tekniska högskola / Institutionen för data och informationsteknik | sv |
dc.contributor.department | Chalmers University of Technology / Department of Computer Science and Engineering | en |
dc.contributor.examiner | Kemp, Graham | |
dc.contributor.supervisor | Smallbone, Nicholas | |
dc.date.accessioned | 2025-02-25T13:13:59Z | |
dc.date.available | 2025-02-25T13:13:59Z | |
dc.date.issued | 2024 | |
dc.date.submitted | ||
dc.description.abstract | In this thesis we developed a new corpus querying program designed to be both fast and expressive. We achieved this by implementing queries on prefix, suffix, and substring (contains) matches, Python regular expressions, and disjunctions, while maintaining high speed. With our program, query execution times were on average half those of an established corpus querying program, Corpus Workbench. For instance, our program executed disjunctive queries in 4% of the time Corpus Workbench required. This thesis shows that one can implement disjunction by dividing disjunctive queries into all possible permutations of subqueries, and then finding their results with a fast intersection-finding program. This approach extends to finding all words that contain a certain string, and to general Python regex matches. Our program also shows that, in special cases, one can depart from the disjunction solution path, that is, finding all matching words and then forming a disjunctive query between them. Special solutions were implemented for prefix and suffix queries, which achieved shorter execution times for those kinds of queries. | |
dc.identifier.coursecode | DATX05 | |
dc.identifier.uri | http://hdl.handle.net/20.500.12380/309157 | |
dc.language.iso | eng | |
dc.setspec.uppsok | Technology | |
dc.subject | corpus | |
dc.subject | corpora | |
dc.subject | Corpus Workbench | |
dc.subject | Korpsearch | |
dc.subject | efficiently | |
dc.subject | query | |
dc.subject | Computer | |
dc.subject | science | |
dc.subject | computer science | |
dc.subject | engineering | |
dc.subject | project | |
dc.subject | thesis | |
dc.title | Expressive corpus search in a modern framework - Developing expressiveness for Korpsearch, a more efficient tool by which to query a corpus | |
dc.type.degree | Examensarbete för masterexamen | sv |
dc.type.degree | Master's Thesis | en |
dc.type.uppsok | H | |
local.programme | Data science and AI (MPDSC), MSc | |
local.programme | High-performance computer systems (MPHPC), MSc |
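
The disjunction approach summarized in the abstract can be illustrated with a small sketch. This is not the Korpsearch implementation: the in-memory index, the function names (build_index, match_sequence, match_disjunctive, expand_prefix) and the reading of "permutations of subqueries" as one choice per disjunctive token are all assumptions made for illustration. Each disjunctive query is expanded into concrete subqueries, each subquery is answered by intersecting position sets, and the results are merged. Prefix matching is shown here via the generic "find matching words, then form a disjunction" path, not the specialised prefix solution the thesis describes.

```python
from itertools import product


def build_index(tokens):
    """Map each word to the list of corpus positions where it occurs (toy index)."""
    index = {}
    for pos, word in enumerate(tokens):
        index.setdefault(word, []).append(pos)
    return index


def match_sequence(index, words):
    """Return start positions where the concrete word sequence occurs.

    Each word's positions are shifted back by its offset in the query,
    so a match is a start position common to all shifted sets.
    """
    starts = None
    for offset, word in enumerate(words):
        candidates = {pos - offset for pos in index.get(word, [])}
        starts = candidates if starts is None else starts & candidates
        if not starts:
            return set()
    return starts or set()


def match_disjunctive(index, query):
    """query is a list with one set of alternative words per token position.

    Assumption: the query is expanded into every combination of
    alternatives, and the results of the subqueries are merged.
    """
    results = set()
    for words in product(*query):
        results |= match_sequence(index, words)
    return sorted(results)


def expand_prefix(index, prefix):
    """Generic path: collect all indexed words with the prefix as alternatives."""
    return {word for word in index if word.startswith(prefix)}


corpus = "the cat sat on the mat the dog sat".split()
index = build_index(corpus)
hits = match_disjunctive(index, [{"the"}, {"cat", "dog"}])
prefix_hits = match_disjunctive(index, [{"the"}, expand_prefix(index, "ma")])
```

In this sketch the number of subqueries grows with the product of the alternatives per token, which is one reason a generic expansion can be slower than a dedicated prefix or suffix lookup.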