Expressive corpus search in a modern framework - Developing expressiveness for Korpsearch, a more efficient tool by which to query a corpus

Type

Master's Thesis

Abstract

In this thesis we develop a new corpus querying program designed to be both fast and expressive. It supports queries on prefix, suffix, and substring ("contains"), Python regular expressions, and disjunctions, while maintaining high speed. On average, query execution times with our program were half those of an established corpus querying program, Corpus Workbench; for disjunctive queries, for instance, our program needed only 4% of the time Corpus Workbench required. The thesis shows that disjunction can be implemented by dividing a disjunctive query into all possible permutations of subqueries, whose results are then found by a fast intersection-finding program. The same approach extends to finding all words that contain a given string and to general Python regex matches: all matching words are found first, and a disjunctive query is then formed over them. Our program also shows that, in special cases, one can depart from this disjunction-based solution path. Dedicated solutions for prefix and suffix queries achieve shorter execution times for those kinds of queries.
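
The disjunction approach described in the abstract can be illustrated with a minimal Python sketch. It is not the Korpsearch implementation: the in-memory index, the function names (`build_index`, `match_sequence`, `match_disjunctive`, `expand_regex`), and the toy corpus are all assumptions made for illustration. Each disjunctive query is expanded into plain word-sequence subqueries, each subquery is solved by intersecting position sets, and a regex is handled by first collecting the matching vocabulary words and then treating them as a disjunction.

```python
import itertools
import re
from collections import defaultdict


def build_index(tokens):
    """Map each word to the set of corpus positions where it occurs."""
    index = defaultdict(set)
    for pos, word in enumerate(tokens):
        index[word].add(pos)
    return index


def match_sequence(index, words):
    """Start positions of an exact word sequence, found by intersecting
    each word's positions shifted back by its offset in the query."""
    result = None
    for offset, word in enumerate(words):
        starts = {pos - offset for pos in index.get(word, ())}
        result = starts if result is None else result & starts
    return result or set()


def match_disjunctive(index, alternatives):
    """alternatives holds one collection of allowed words per query token.
    Every combination becomes one subquery; the hits are unioned."""
    hits = set()
    for words in itertools.product(*alternatives):
        hits |= match_sequence(index, words)
    return sorted(hits)


def expand_regex(index, pattern):
    """Vocabulary words that fully match a Python regular expression."""
    compiled = re.compile(pattern)
    return sorted(w for w in index if compiled.fullmatch(w))


# Hypothetical toy corpus, for illustration only.
tokens = "the cat sat on the mat and the dog sat".split()
index = build_index(tokens)

print(match_disjunctive(index, [["the"], ["cat", "dog"]]))  # [0, 7]
print(expand_regex(index, r".at"))  # ['cat', 'mat', 'sat']
print(match_disjunctive(index, [["the"], expand_regex(index, r".at")]))  # [0, 4]
```

The number of subqueries grows with the product of the alternatives per token, which is one reason dedicated solutions for common cases such as prefix and suffix queries can pay off.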

Subject/keywords

corpus, corpora, Corpus Workbench, Korpsearch, efficiently, query, Computer, science, computer science, engineering, project, thesis
