Solving Problems, One Role at a Time
Type
Master's Thesis
Program
Data science and AI (MPDSC), MSc
Published
2023
Authors
Dunér, Felix
Johansson, Eric
Abstract
For large companies, leveraging internal knowledge and existing information within
the organization has proved to be difficult for several reasons. In this thesis, which
is conducted in collaboration with Ericsson, an attempt to facilitate the extraction
of internal knowledge is made, more specifically by matching new issues that employees
face with pre-existing, solved ones. The issues are represented by so-called
‘support tickets’ and partly consist of manually entered text where the user describes
the problem. The support process could be optimized by automatically identifying
what kind of issue the user experiences.
This study aims to investigate whether it is possible to extract semantic information from
the text contained in support tickets through semantic role labeling (SRL), and to leverage
that information to match similar issues related to Ericsson’s cloud infrastructure
branch. SRL is often used for information extraction and question-answering, but
not in a technical domain. Two pre-trained SRL models were tested: one based on
FrameNet and the other based on PropBank. Ultimately, the FrameNet model was
used throughout the thesis.
After initial preprocessing and standardization of technical jargon, pre-trained state-of-the-art
(SOTA) models were used to extract semantic information, and visual
analysis and overall statistics supported the idea that they could identify relevant
targets in sentences and populate frames with roles accordingly. The information
yielded through SRL allowed for new ways of representing the support tickets. However,
further experiments with topic modeling and classification indicated that the
information produced by the FrameNet SRL model was not useful for grouping support
tickets according to the categorizations provided by Ericsson. It is suggested
that the FrameNet model may be too general for the specific context and that customization
of the semantic framework may be a possible solution. It is also noted
that the categorizations used as similarity proxies for the support tickets may be
based on information outside of the text used to represent the support tickets.
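To make the idea of a frame-based ticket representation concrete, the following is a minimal sketch and not the method used in the thesis. It assumes each ticket has already been reduced to a list of frame names by some SRL model, represents a ticket as a bag of those frames, and matches a new ticket to the most similar solved one by cosine similarity. The ticket IDs, the frame labels, and the function names are hypothetical placeholders written by hand, not actual model output.

```python
from collections import Counter
from math import sqrt


def frame_vector(frames):
    """Count how often each frame name occurs in one ticket."""
    return Counter(frames)


def cosine(a, b):
    """Cosine similarity between two frame-count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(new_frames, solved):
    """Return (ticket_id, score) for the solved ticket closest to the new one."""
    query = frame_vector(new_frames)
    scored = [(tid, cosine(query, frame_vector(fr))) for tid, fr in solved.items()]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical, hand-written frame annotations (assumption: not real SRL output).
solved_tickets = {
    "T-1": ["Attempt", "Connecting", "Failure"],
    "T-2": ["Installing", "Request"],
}
new_ticket = ["Connecting", "Failure"]

best_id, best_score = most_similar(new_ticket, solved_tickets)
print(best_id, round(best_score, 3))
```

A bag of frame names discards the roles that fill each frame, so this sketch uses less information than SRL provides. It is meant only to show how frame output could feed a similarity measure.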
Even though the semantic information yielded through SRL did not improve the
ability to match similar support tickets in this case, we firmly believe that these
features can be helpful. Since the semantic frames provide information otherwise
not present in the text, they should be able to enrich the representation.
Subject/keywords
Semantic Role Labeling, Machine Learning, Transformers, Information Extraction, Issue Resolution, Sentence Analysis, Natural Language Processing