Bridging AI Readiness and Application: Prototyping a Strategy-Aligned Language Model for Quality Insights at Skanska
A Comprehensive Study of Organizational AI Maturity, Applied NLP Development, and Scalable Implementation in Construction Quality Management
Master’s Thesis in Complex Adaptive Systems, and Quality and Operations Management
LISA LÖVGREN
OLIVIA TURUNEN
Department of Civil and Environmental Engineering
Chalmers University of Technology
Gothenburg, Sweden 2025
www.chalmers.se

Master’s Thesis 2025

© LISA LÖVGREN, OLIVIA TURUNEN, 2025.

Supervisor: Rasmus Rempling, Department of Civil and Environmental Engineering
Supervisor: Peter Samuelsson, Skanska AB
Examiner: Rasmus Rempling, Department of Civil and Environmental Engineering

Department of Civil and Environmental Engineering
Chalmers University of Technology
SE-412 96 Gothenburg
Telephone +46 31 772 1000

Cover: Image generated with ChatGPT 4o, April 2025, prompted: "Construction and AI", merged with resulting category clusters of the data subset.
Typeset in LaTeX, template by Kyriaki Antoniadou-Plytaria
Printed by Chalmers Reproservice
Gothenburg, Sweden 2025

Bridging AI Readiness and Application: Prototyping a Strategy-Aligned Language Model for Quality Insights at Skanska
A Comprehensive Study of Organizational AI Maturity, Applied NLP Development, and Scalable Implementation in Construction Quality Management
LISA LÖVGREN, OLIVIA TURUNEN
Department of Civil and Environmental Engineering, Chalmers University of Technology

Abstract

The construction industry is under increasing pressure to improve efficiency, reduce costs, and enhance sustainability. While other sectors have advanced in AI adoption, construction remains comparatively behind. This thesis explores how artificial intelligence (AI) can support decision-making in construction, with a focus on Quality Management at Skanska Sweden AB. First, organizational AI readiness was assessed through interviews and workshops using established organizational frameworks. This revealed both strategic interest and practical challenges in applying AI. Second, an operational use case was explored by developing an AI prototype that processes historical quality deviation texts. The prototype was developed with the purpose of creating value for the Quality Department by providing insights. Using natural language processing (NLP), the prototype explored a weakly supervised classification approach combining unsupervised clustering, pseudo-labelling via zero-shot learning, and a fine-tuned transformer classifier (XLM-R and SBERT). Two promising category types, incident type and affected building component, were identified and co-developed with domain experts to structure the data. The results show that while AI readiness is moderate, initiatives often remain siloed due to limited infrastructure, resources, and unclear ownership. Skanska shows a growing awareness and curiosity around AI, and there is potential to learn from international practices within the company.
However, although large volumes of data are available, barriers remain, particularly regarding the availability of structured and labelled data. There is also a need for further AI-specific expertise, and it remains challenging to integrate new tools into established workflows. The prototype demonstrates practical value by visualizing patterns in text data, enabling the Quality Department to adopt a more data-driven and preventive approach. While weak supervision proved challenging due to limited label quality and model sensitivity, the final classifier achieved approximately 67% accuracy through fine-tuning on a manually labelled dataset corresponding to 6‰ of the data. Despite this, the approach successfully enabled structured insights into issue frequency, duration, and distribution across projects. The prototype also serves as a scalable proof of concept, illustrating how tailored AI solutions can accelerate digital transformation in construction.

Keywords: Artificial Intelligence (AI), Natural Language Processing (NLP), Language Model Prototype, Text Classification, AI Readiness, Quality Management, Change Management, Construction Industry, Digital Transformation.

Acknowledgements

We would like to express our sincere gratitude to everyone who contributed to the completion of this thesis. First and foremost, we would like to thank our academic supervisor at Chalmers University of Technology, Rasmus Rempling, for their guidance and encouragement throughout the research process, and our company supervisor, Peter Samuelsson, for their engagement and reliability. We are also grateful to Skanska for providing the opportunity to conduct this thesis within the organization and for sharing valuable insights, supporting data access, and engaging in fruitful discussions. We also want to express our gratitude towards Brosamverkan for enabling our international exchange.
Finally, we want to thank all participants in the workshops and interviews for their time and expertise, which significantly enriched the outcomes of this work.

Lisa Lövgren, Olivia Turunen, Gothenburg, May 2025

List of Acronyms

Below are the acronyms used throughout this thesis, listed in alphabetical order:

AEC Architecture, Engineering and Construction
AI Artificial Intelligence
BERT Bidirectional Encoder Representations from Transformers
BIM Building Information Modeling
DL Deep Learning
GDPR General Data Protection Regulation
GPT Generative Pre-trained Transformer
IoT Internet of Things
KMeans K-Means Clustering
KPI Key Performance Indicator
LLM Large Language Model
LOTClass Learning with Out-of-the-box Classifier for Text Classification
LSTM Long Short-Term Memory
MEGClass Mixed Expert Guided Classification
ML Machine Learning
NLI Natural Language Inference
NLP Natural Language Processing
NN Neural Network
PCA Principal Component Analysis
POS Part-of-speech
RNN Recurrent Neural Network
RQ Research Question
SBERT Sentence-BERT (Bidirectional Encoder Representations from Transformers for Sentence Embeddings)
SQL Structured Query Language
STS Semantic Textual Similarity
t-SNE t-distributed Stochastic Neighbour Embedding
X-Class Explainable Classifier for Weakly Supervised Text Classification
XLM-R Cross-lingual Language Model - RoBERTa
ZSL Zero-Shot Learning

Contents

List of Acronyms ix
List of Figures xv
List of Tables xvii
1 Introduction 1
1.1 Background 1
1.1.1 Construction Industry 1
1.1.2 Artificial Intelligence 2
1.1.3 Case Company 2
1.2 Purpose 3
1.3 Research Questions 4
1.4 Limitations 4
2 Frame of Reference 7
2.1 Construction 7
2.1.1 Construction Industry Characteristics 7
2.1.2 Problem Area and Applications 9
2.2 Business Frameworks 12
2.2.1 Change Management 12
2.2.2 AI Readiness Framework 13
2.2.3 AI Adoption Strategies 17
2.3 Prototype 19
2.3.1 Artificial Intelligence 19
2.3.2 Language Models 21
2.3.3 Models in Selection 22
2.3.4 Data Criteria 25
3 Methods 27
3.1 Research Process 27
3.2 Research Design 28
3.3 Qualitative Methods Used 28
3.3.1 Workshops 29
3.3.2 Interviews 30
3.3.3 Literature Search 31
3.3.4 Qualitative Data Analysis 32
3.4 Defining the Research Area 32
3.4.1 Quality Management in the Construction Process as the Chosen Area 32
3.4.2 Other Areas of Interest 34
3.5 Ensuring High-Quality Research 35
3.5.1 Reliability 36
3.5.2 Replicability 37
3.5.3 Validity 37
3.5.4 Ethical Aspects 38
3.5.5 Use of AI Tools During the Thesis 38
4 Implementation of AI Prototype 39
4.1 Prototype 39
4.1.1 Data and Data Collection 40
4.1.2 Initial Clustering Model 41
4.1.3 Workshop-Based Label Design and Manual Annotation 42
4.1.4 Category Classification through Pseudo-Labelling and Classifier Implementation 43
4.1.5 Visualization of Results 47
4.2 Limitations in Prototype Implementation 47
4.3 Ensuring High-Quality Data 49
4.3.1 Reliability 49
4.3.2 Replicability 49
4.3.3 Validity 50
4.3.4 Ethical Aspects 50
5 Results 53
5.1 Skanska’s AI-readiness 53
5.2 The Value to the Quality Department 56
5.2.1 Insights from the Quality Department 57
5.2.2 Illustrating Text-Based Data Results for Quality Insights 58
5.3 Classification Performance and Output Examples 65
5.3.1 Classification Performance via Confusion Matrices 65
5.3.2 Model Interpretability and Semantic Visualization 70
5.3.3 Example of Word Importance for Classification 70
6 Discussion 75
6.1 Discussion on AI Readiness 75
6.1.1 Organizational Readiness Gaps and Ownership Challenges 75
6.1.2 Employee Trust and Safety 77
6.1.3 Technical and Data Foundation Gaps 78
6.1.4 Early Signs of Adoption and Competitive Opportunity 80
6.2 Discussion on Value Generating Results by AI-Prototype 81
6.2.1 Organizational Value and Strategic Implications for the Quality Department 81
6.2.2 Operational Value for the Quality Department 82
6.2.3 Future Opportunities within Quality 83
6.3 Prototype Performance and Designing Approach 83
6.3.1 Model Selection and Training Strategy 84
6.3.2 Model Performance and Operational Robustness 84
6.3.3 Language and Domain Adaptation through Transfer Learning 85
7 Conclusion 87
Bibliography 89
A List of Interviewees I
B Additional Interview Guide III
C Interview Results V

List of Figures

2.1 Tech investments in Construction [1]. 10
2.2 AI Adoption in Construction vs Other Industries [1]. 10
2.3 Overview of AI Use in the Construction Industry [2]. 12
2.4 Schematic model of AIR [3]. 14
2.5 Adoption Strategies [3]. 18
2.6 Overview of models implemented. 19
3.1 Gantt chart illustrating the research process timeline. 28
3.2 Overview of the current reporting structure, representing each reported issue and focusing on the construction process phases, e.g. inspection and production. 33
3.3 Overview of the desired reporting structure, representing each reported issue and focusing on the category and nature of the issue, e.g. construction part and action. 33
4.1 Overview of the 5 stages of implementation, including the 3 models: raw Swedish construction data, clustering via SBERT + KMeans, category definition through workshop, zero-shot pseudo-labels, and transformer-based XLM-R classifier. 39
4.2 Overview of the initial clustering method, using KMeans clustering. 41
4.3 Overview of the weakly supervised MEGClass-inspired model, utilizing pseudo-labels. Approach (a) is categorization only, while model (b) is a semi-supervised classifier based on transformers. 45
5.1 Confusion matrices showing the accuracies for different classification methods. 59
5.2 Distribution of stage of construction process per category. 60
5.3 Distribution of the categories reported as a deviation per construction parts and incidents. 61
5.4 Frequency over time of reported issues in Hus Göteborg. 62
5.5 The median duration of open issues over all regions. 63
5.6 The geographical locations of projects with reported issues, per category. 64
5.7 Confusion matrices showcasing the accuracy of the category predictions, based on a sample of 25 data points per predicted category. A darker colour represents a higher accuracy. 69
5.8 2D visualization of the categorized sentence embeddings using t-SNE for the entire dataset of over 100 000 data points. 71
5.9 The words in the sentence "[TITLE] Balkong [DESC] Fogrester invändigt och utvändigt generellt" and their effect on the model and categorization. 73
5.10 The words in the sentence "[TITLE] Miljöhus [DESC] Vilken tjoclek yttervägg ska det vara? Vad ska det vara för tak och vilken tjocklek?" and their effect on the model and categorization. 74

List of Tables

2.1 Overview of models, tools, and algorithms used in the NLP classification pipeline. 23
4.1 Overview of key data features available in the dataset and used in the prototype. 40
4.2 Overview of chosen labels through workshop; building elements and common issue categories. 43
4.3 Distribution of manually labelled quality issues across construction parts and incident types. 44
5.1 Distribution of AI-classified quality issues by construction part and incident type. 58
A.1 Participants’ roles and their organizational affiliations within the Skanska group and external experts. I

1 Introduction

As one of the world’s largest and most traditional sectors, the construction industry is facing a pivotal moment in its digital evolution. It is increasingly challenged by global conditions such as rising labor and material costs, supply chain disruptions, and growing sustainability demands. While these external pressures cannot be fully controlled, the industry can respond by transforming its internal processes and accelerating artificial intelligence (AI) adoption.
While artificial intelligence has already transformed industries such as finance and manufacturing, construction continues to lag behind, held back by fragmented processes and limited standardization. This thesis explores how AI can become a catalyst for change in construction, not by replacing human expertise, but by supporting it. Focusing on a case within Skanska Sverige AB, the study investigates both organizational readiness and practical application through the development of an AI prototype for quality and deviation management. The goal is to demonstrate how AI can unlock insights, enhance preventive work, and lay the foundation for scalable, long-term value creation with AI, achieved here by implementing an AI prototype for natural language processing (NLP) of large volumes of text.

1.1 Background

The background section introduces the broader characteristics and challenges of the construction industry, outlines the transformative potential of AI, and presents the case company and the organizational setting in which this research is conducted.

1.1.1 Construction Industry

The construction industry is characterized by complexity, fragmentation, and a high degree of project uniqueness. Standardization is difficult to achieve, as even buildings based on identical designs need to be adapted to varying geological and climatic conditions. Each project involves new owners, contractors, subcontractors, and suppliers working together. This setup makes it challenging to implement repeatable processes, which limits the potential for automation and the integration of intelligent technologies. [4]

In addition, construction projects follow non-linear and unstructured workflows, where tasks are interdependent and often executed in parallel by subcontractors with varying digital maturity. The physical environment is constantly changing, making coordination and real-time information sharing critical and difficult.
External factors such as noise, dust, and instability further complicate data collection and reduce the reliability of technical systems. These characteristics create uncertainty, inefficiencies, and barriers to innovation. [4]

Some of the challenges currently facing the construction industry are linked to broader societal issues and global conditions, for example, rising labour and material costs caused by worldwide shortages. While these external factors cannot be fully controlled, the industry can respond by transforming its internal processes and operations. By adopting new technologies, the construction sector can increase efficiency, improve resource management, and become more adaptable in a rapidly changing world. A shift in mindset is also essential to remain competitive and sustainable. [5]

1.1.2 Artificial Intelligence

In recent years, AI has shifted from a theoretical concept to a practical tool reshaping industries. Rather than referring to a single technology, AI includes a collection of methods, such as machine learning, computer vision, and natural language processing, that allow systems to interpret data, recognize patterns, and support decision-making. These capabilities are now used to optimize supply chains, predict equipment failures, and automate complex tasks across sectors.

Although construction has historically been one of the least digitized industries, this is starting to change. With a global value exceeding USD 12 trillion (at the time corresponding to SEK 126 trillion), the construction sector is under increasing pressure to modernize in response to labour shortages, rising costs, and sustainability goals [6]. AI offers a way forward: from enhancing safety through image recognition on job sites, to optimizing schedules using predictive algorithms, to integrating real-time data with Building Information Modeling (BIM) [2].
Investment trends reflect this growing interest: between 2020 and 2022, over USD 50 billion (at the time corresponding to SEK 550 billion) was invested globally in AEC (architecture, engineering, and construction) technologies, with a significant portion directed toward late-stage ventures [6]. Although still in the early stages of adoption compared to other industries, AI holds considerable promise in addressing long-standing challenges in construction. From improving schedule reliability and reducing safety risks to enhancing resource planning and data integration, AI-based solutions offer a pathway toward greater efficiency and control, if the industry can overcome barriers related to culture, fragmentation, and digital maturity.

1.1.3 Case Company

Skanska AB is one of the world’s leading construction and project development companies, founded in Sweden in 1887. With over 135 years of experience, the company has grown into a global player operating in selected markets in the Nordics, Europe, and the United States. Skanska Sweden employs approximately 6 700 people and had an operating income of SEK 1.2 billion in 2024 [7]. Their focus is on building a better society and being a leader in sustainable solutions, quality, safety, and ethics [8].

In Sweden, Skanska operates through several business units under Skanska Sweden AB. The core construction and civil engineering operations consist of four main divisions: Skanska Hus (Building Construction), Skanska Väg och Anläggning (Civil Infrastructure), Skanska Industrial Solutions, and Skanska Rental. Construction and civil engineering activities account for approximately 85% of Skanska Sweden’s total revenue [9].

Skanska Sweden’s operations are structured around three overarching business streams: Construction, Residential Development, and Commercial Property Development. The Construction business stream includes both Civil construction, i.e.
roads, tunnels, and bridges, and Building construction (Hus). This study focuses exclusively on Skanska Hus, where operations are divided into regional units. The study takes its starting point in the Skanska Hus Göteborg region, broadening its focus to the national quality function within Skanska Hus’s building operations in Sweden, specifically within this unit.

Furthermore, interviews have also been conducted with representatives from Skanska’s U.S. organization in New York City, and Skanska UK, to explore successful applications of artificial intelligence in other parts of the world. This provided a broader perspective on how AI readiness and implementation strategies may differ across geographical and organizational contexts.

1.2 Purpose

The overall purpose of this study is to explore how AI can enhance strategic decision-making and operational efficiency within the construction industry, focusing specifically on Skanska Hus, Skanska Sverige. The study takes its starting point in the recognition that, while many industries have already made significant progress in adopting AI technologies, the construction sector remains comparatively behind, largely due to its project-based nature, fragmented workflows, and varying levels of digital maturity. In this context, the study seeks to understand how the construction industry can begin to close this gap by identifying concrete opportunities, organizational prerequisites, and value-generating use cases for AI. Through an exploratory, inductive approach, the research is divided into three interconnected tracks, each addressing a distinct aspect of AI adoption and aligned with the study’s three research questions.

The first focus is to analyse organizational AI readiness at a high level, assessing the current state and maturity level within Skanska from an external perspective. By applying established frameworks and theory on digital and AI transformation, and learning from other industries, this part of the study identifies internal conditions, barriers, and enablers for AI implementation.

The second focus narrows in on the area of quality management, identified during the first exploratory phase as an area with both business value and technical feasibility. Through creating an AI prototype, this part of the study analyses how AI-generated insights can be integrated into decision-making and knowledge-sharing in a construction context. It also reflects on broader organizational implications for scaling AI solutions beyond isolated use cases.

The third focus concerns the practical implementation of AI and the technical performance and potential of the prototype itself. Here, an AI prototype is developed using NLP and ML to structure and analyse text-based quality data. Through iterative testing, evaluation, and visualization of the results produced by the prototype, the goal is to create an efficient and accurate model.

In conclusion, the study aims to contribute to a more nuanced understanding of how the construction industry, in this case Skanska AB, can begin to adopt, implement, and benefit from AI, specifically using text processing to interpret quality data.

1.3 Research Questions

Against the background presented in Chapter 1, the report addresses three research questions (RQs), each accompanied by related theory, implementation, results, and analysis. The RQs are the following:

• RQ 1: What is Skanska’s current state of AI readiness, and what organisational strengths and challenges exist in terms of AI adoption?
• RQ 2: How can an AI prototype create value within the department of Quality Management?
• RQ 3: How can a language model prototype be developed and used to efficiently extract and visualize insights from quality-related construction texts?

1.4 Limitations

This study is subject to several limitations that should be acknowledged.
First, the findings are primarily based on data and observations from a single Swedish construction company, which may limit the generalizability of the conclusions to the broader industry. Although the company is an important actor on the Swedish market, its internal processes, digital maturity, and AI-related initiatives may not fully reflect those of smaller firms or companies operating in other regions or contexts.

Second, the assessment is conducted from the perspective of an external party and does not include full access to internal strategic documentation, proprietary datasets, or project-level decision-making. This means that some conclusions, particularly those related to organizational readiness, internal barriers, or technology adoption trajectories, are based on secondary sources or publicly available information rather than first-hand implementation data.

Finally, while the analysis attempts to incorporate a broad view of AI technologies, it does not encompass all emerging fields or niche use cases. The emphasis is placed on applications that are currently most relevant to the construction sector, based on industry reports and peer-reviewed literature.

2 Frame of Reference

This Section outlines the theoretical foundation for the study, starting by covering the construction industry’s unique characteristics. To understand how AI can be successfully adopted, the Section further explores theoretical frameworks for managing organizational change and assessing AI readiness, along with strategic approaches to AI adoption. Finally, it also introduces key AI components relevant to the prototype, including data criteria, language models, and pre-trained architectures. Together, these perspectives offer a comprehensive reference for understanding how construction firms can approach and implement AI effectively.
2.1 Construction

The construction industry differs from many other industries due to its fragmented and project-based nature, where buildings are assembled through sequential, yet disconnected, processes. This discontinuity has contributed to the industry’s slow industrial and digital development. While technologies such as the Internet of Things (IoT), big data, cloud computing, and AI have driven digitalization across many industries, particularly manufacturing, the construction sector remains behind. In recent years, however, these technologies have begun to be applied in construction. Despite this progress, their adoption remains limited to isolated areas, highlighting the need for a more integrated and systemic implementation [4]. The following Section outlines key characteristics of the construction industry to provide a foundation for understanding the potential barriers to the application of AI.

2.1.1 Construction Industry Characteristics

Standardized construction plans are rare, as buildings based on identical designs still differ due to varying geological and climatic conditions. As a result, each construction project is unique, making standardization difficult. For example, generating detailed and reusable bills of materials for future projects is highly complex. This uniqueness also comes from the dynamic composition of project teams, where owners, contractors, subcontractors, and suppliers vary from project to project. The construction industry operates through project-unique production systems. The development of automated and integrated intelligent systems requires modular and repeatable components and processes, an approach that remains difficult to implement in this highly fragmented sector.

Furthermore, construction projects follow non-linear processes and are typically organized in an unstructured way. Rather than forming a sequential chain, tasks are interlinked through shared resources and parallel activities.
A significant part of the work is carried out by subcontractors, whose varying levels of digital maturity and project involvement affect the reliability of the information provided, both during execution and in the aftermarket. These fragmented information flows contribute to miscommunication among project participants and hinder effective documentation for future use.

Coordination between participants is crucial, as overlapping workspaces, task sequences, and movement paths often create conflicts. This becomes even more challenging because both the location and the environment change throughout the project. In addition, construction equipment, materials, and labor need to be continuously relocated as the work progresses.

Furthermore, construction projects are highly complex and uncertain. Although construction plans are often detailed, they are frequently modified during execution to adapt to dynamic environments, which can result in delays, rework, quality deficiencies, and claims. To manage this uncertainty, project managers tend to incorporate large margins and risk buffers into the planning. While this helps prevent issues, it can also lead to inefficient use of resources.

Unlike the clean and controlled environments found in other industries, construction sites are often harsh and, as mentioned, unpredictable. Factors like noise, dust, mud, and even risks of geological disasters pose significant challenges for data collection, network communication, and the reliability of intelligent systems. Moreover, workers exposed to such environments may lack real-time information, limiting their ability to respond to dangerous situations. This insecurity also reduces their willingness to engage with technical or automated equipment. [4]

Furthermore, the construction sector faces persistent challenges related to organizational inertia and data fragmentation, both of which limit the adoption and scalability of digital technologies, including AI.
Organizational inertia in construction refers to the sector's resistance to change, driven by old practices, fragmented project structures, and legacy systems that slow the uptake of new technologies. At the same time, construction data is often siloed across different platforms and stored in non-standardized formats, restricting opportunities for cross-project integration and advanced analytics [10]. While construction companies generate large volumes of data, this data is often fragmented and lacks standardized formats, which hinders cross-project learning, collaboration, and data-driven decision-making. Low digital maturity in construction firms compounds this challenge, as proactive leadership and clear data governance structures are often missing [11]. These combined insights underscore a dual challenge for construction firms seeking to leverage AI: they must overcome cultural and structural barriers while simultaneously ensuring that project data is consistent, accessible, and actionable [10, 11].

2.1.2 Problem Area and Applications

AI has emerged as a cornerstone technology in the Fourth Industrial Revolution, transforming the way industries operate through data-driven decision-making, automation, and predictive capabilities [2]. Despite these advancements, the construction industry, valued at over USD 12 trillion globally (at the time corresponding to SEK 126 trillion) [6], has traditionally lagged behind in terms of digitization and AI adoption. Characterized by fragmented stakeholders, manual workflows, and analogue tools, the construction sector has historically demonstrated resistance to technological change [2]. McKinsey research highlights that, as of 2018, construction ranked among the least digitized sectors when compared with twelve other industries [1].
However, growing demands for infrastructure, increasing labour shortages, and regulatory pressures for transparency have begun to catalyse a digital transformation across the Architecture, Engineering, and Construction (AEC) industries. This transformation is being accelerated by AI technologies that promise to address many of the sector's most persistent challenges, such as cost overruns, safety incidents, and inefficiencies in planning and resource allocation. For instance, AI-powered image recognition can identify unsafe worker behavior from site footage, while machine learning models can optimize scheduling by evaluating millions of potential timelines. Enhanced analytics platforms are being used to monitor sensor data in real time, improving both predictive maintenance and operational decision-making [1]. McKinsey estimates that from 2020 to 2022, global investment in AEC technology reached USD 50 billion (at the time corresponding to SEK 550 billion), an 85% increase compared to the previous three years, with 1,229 deals closed during that period [6]. This trend is illustrated in Figure 2.1, which shows both the sharp increase in funding and the growing number of deals in AEC technology over the past few years. These investments underscore a growing recognition that AI represents an opportunity for the future of construction. However, the construction industry shows both low current AI adoption and low future investment compared to other industries, positioning it closer to the "Falling behind" quadrant in Figure 2.2. This suggests a slow pace of digital transformation, which may limit the industry's ability to capitalize on AI. While enthusiasm around AI in construction is growing, the industry's structural and operational characteristics pose unique challenges to large-scale AI adoption.
Construction firms are often small, project-based entities with limited digital infrastructure and constrained IT budgets, typically spending only 1-2% of their revenue on IT, compared to 3-5% in other sectors [6].

AI applications in construction can be categorized across multiple domains: project planning and scheduling, site monitoring, resource and waste optimization, health and safety analytics, contract management, and supply chain logistics [2]. For example, AI enables more accurate cost estimation and scheduling by leveraging large datasets from previous projects and external factors such as weather or site conditions [2].

Figure 2.1: Tech investments in Construction [1].

Figure 2.2: AI Adoption in Construction vs Other Industries [1].

AI is also being used to optimize resource allocation and minimize material waste. By analyzing historical and real-time data, intelligent algorithms can forecast material demand, suggest optimal delivery timing, and reduce storage costs, contributing to both sustainability and profitability. Furthermore, AI can analyze data from sensors, drones, and connected machines to provide construction site analytics. These tools support real-time monitoring of productivity, safety risks, and performance bottlenecks, enabling more responsive site management [2]. Additionally, AI is used to read and analyze complex construction contracts. These AI tools can point out risky clauses, unclear parts, or mistakes, which helps make better purchasing decisions and lowers the risk of legal problems. AI-powered audit systems are also being implemented to ensure financial accuracy by cross-referencing billing data, flagging anomalies, and aligning invoices with real-world progress. This strengthens financial governance and supports more transparent reporting structures [2]. Despite these opportunities, several barriers remain, as seen in Figure 2.3.
Cultural resistance to change, high initial deployment costs, and a shortage of AI talent continue to hinder adoption. These challenges are illustrated in the lower left quadrant of the figure, which highlights issues such as ethics, governance, and limited internet connectivity. In addition, concerns around data ownership, transparency, and cybersecurity (shown in the top right quadrant) remain unresolved. Nevertheless, the trajectory is clear: the construction industry stands at a pivotal moment where AI can redefine its operational norms. Companies that act early and strategically to incorporate AI will gain competitive advantages in cost efficiency, project reliability, and overall value delivery [2].

Figure 2.3: Overview of AI Use in the Construction Industry [2].

2.2 Business Frameworks

This section establishes the theoretical foundation for understanding how businesses adopt and integrate new technologies. It begins with a broad overview of change management, then narrows down to established frameworks for technology adoption, ultimately focusing on AI readiness and AI adoption strategies. Together, these perspectives provide a basis for analysing how businesses can integrate AI effectively, ensuring alignment between technology, organizational structures, and industry conditions. The frame of reference presented here will later be used in the analysis to assess how construction companies like the case company, Skanska Sweden, can structure their approach to AI use.

2.2.1 Change Management

Change management focuses on how organizations and individuals adapt to organizational transitions and change. One of the earliest models, proposed by Lewin in 1947 [12], includes the stages of unfreezing, moving, and refreezing. Since then, several other models have been introduced in the literature.
While these models vary, they all emphasize the need for a structured approach to managing change and recommend appointing a change agent to lead the process. However, applying traditional change models directly to the construction industry can be challenging due to its unique characteristics; the industry's specific nature requires adaptations to these frameworks for them to be effective [12].

Successfully adopting AI within complex, project-oriented organizations in the construction industry requires not only technical capability but also a structured approach to change management. Kotter's Eight-Stage Model for Leading Change offers an approach for this transition. His model emphasizes the importance of establishing a sense of urgency, creating a guiding coalition, and developing a vision for change, all of which are critical for aligning AI initiatives with organizational priorities and engaging employees in the transformation process. Importantly, Kotter also highlights the need for short-term wins and continuous reinforcement, both of which can help ensure that AI is seen as a credible, valuable tool rather than an abstract or disruptive innovation. This model underscores the idea that even well-designed AI tools will struggle to gain traction if the organizational environment is not ready to support them [13].

Technology Acceptance Model

A widely recognized framework for understanding how individuals adopt and use new technologies or services is the Technology Acceptance Model, first introduced by Davis in 1989 [14]. It is based on the premise that a user's decision to accept and use a technology is primarily influenced by two factors: perceived usefulness and perceived ease of use. Perceived usefulness refers to the degree to which an individual believes that using a particular technology will improve their performance in a specific task or job.
Perceived ease of use refers to the degree to which an individual believes that using a particular system will be free of effort [14].

AI Readiness

The AI Readiness Framework, developed by Holmström in 2022, evaluates an organization's ability to implement and use AI in a way that adds value to the organization. It is structured around four key dimensions: technologies, activities, boundaries, and goals. The framework also helps organizations address key bottlenecks in AI adoption. This framework serves as the basis for subsequent research on AI readiness, which is further elaborated in Section 2.2.2 [15].

2.2.2 AI Readiness Framework

Building on Holmström's study, Tehrani et al. examined the organizational AI readiness of 52 multinational corporations by conducting 52 semi-structured in-depth interviews with decision makers across different organizations. The findings identify eight key dimensions that influence an organization's ability to successfully implement AI. The study primarily focuses on, but is not strictly limited to, the following technologies: natural language processing, computer vision, image recognition, and deep learning. The research proposes one model for organizational readiness and one for AI adoption strategies [3].

Organizational readiness refers to how prepared an organization is, in terms of cognitive, emotional, and behavioural preparedness toward a change, before the activity is started. Organizational readiness should be acquired before starting a change initiative, and therefore consists of a pre-assessment of organizational capabilities, to help identify what is needed and to control the risk factors of a change initiative.

AI readiness refers to how prepared an organization is to adopt and use AI. Many organizations expect AI to increase their productivity; however, many struggle with adopting AI due to a lack of important infrastructure and organizational readiness.
Many managers are also unsure of whether their organization is ready to implement AI, and if so, how to do it. Due to AI's complex nature, its readiness cannot rely solely on traditional readiness theories [3].

The AI Readiness Framework is an organizational framework that can be divided into eight categories, as shown in Figure 2.4: (1) Environmental Readiness, (2) Technological Readiness, (3) Informational Readiness, (4) Infrastructural Readiness, (5) Data Readiness, (6) Participants' Readiness, (7) Customers' Readiness, and (8) Process Readiness.

Figure 2.4: Schematic model of AIR [3].

(1) Environmental Readiness

This dimension refers to the organizational, technical, competitive, cultural, and regulatory environment within which an organization operates. This includes the macro-economic environment, organizational culture, and leadership. The regulatory environment affects AI implementation, as some countries have more supportive or restrictive regulations in areas such as personal data usage, cloud computing, and AI-driven automation. These regulations determine whether companies can freely adopt AI or must navigate legal constraints. An open organizational culture fosters collaboration, tech-friendliness, and knowledge-sharing, making AI adoption easier. Leaders who convince employees of AI's benefits and address their concerns increase willingness to adopt AI. Ensuring alignment among employees and management further simplifies implementation. In summary, companies need a supportive regulatory environment, an open culture, and strong leadership to successfully implement AI.

(2) Technological Readiness

This dimension focuses on an organization's level of technological maturity, which includes the availability of necessary resources, a strong track record of using advanced technologies, and sufficient IT support, as successful adoption often relies on a well-established technological foundation.
Emphasis is placed on the organization's historical experience and culture of working with technology, where AI solutions such as chatbots or virtual assistants are not viewed as standalone fixes, but rather as advanced tools that build upon an already mature digital environment. Moreover, the ability to process and manage data is essential. Organizations that are actively adopting AI typically already have these technical competencies. Lastly, sufficient and reliable IT support is crucial to avoid technical bottlenecks that could otherwise hinder the smooth implementation and operation of AI technologies.

(3) Informational Readiness

Distinct from Data Readiness, this dimension refers to "people's meaningful understanding of a specific issue." In this context, it concerns the decision maker's knowledge of the relevant AI use case, how AI is applied within the industry, the specific problem at hand, and how AI can be used to address that problem. The focus lies specifically on the decision maker, rather than the broader organization or team. To make informed and strategic decisions, decision makers must have a deep knowledge of AI's practical applications and potential. Furthermore, the decision to implement AI should be strategically significant, with the potential to substantially impact operations and increase organizational profitability. Since AI adoption is cost-intensive, it is essential that the decision maker has a clear understanding of the problem to be solved, as well as an awareness of the AI solutions available on the market, in order to identify the most suitable option.

(4) Infrastructural Readiness

The availability and suitability of foundational resources are crucial for successful AI implementation. This dimension includes three main categories: human resources, financial resources, and IT resources. Financial resources are considered one of the most critical and challenging components, as AI implementation is costly. It is not only about having access to sufficient funds, but also about ensuring that funding is continuous and flexible, enabling quick adaptation to changing market conditions. AI requires ongoing investment for updates and maintenance, given the rapid pace of technological advancement. In terms of human resources, organizations must ensure access to both internal and external talent with the necessary skills to support AI initiatives. This readiness can be developed by training existing employees and by recruiting experts who already have relevant knowledge. A key part of infrastructural readiness is also the ability to bridge the HR gap, not only by hiring individuals with strong technical skills, but also by seeking those with cross-industry domain knowledge who can contextualize AI applications within the organization's specific field. IT resources include the organization's technical infrastructure, such as computers, networks, and programming environments. Important capabilities here involve storage capacity, computing power, scalability, and security, all of which are necessary for AI to function effectively. The absence of the right IT infrastructure can significantly hinder a company's ability to adopt and benefit from AI technologies.

(5) Data Readiness

This dimension refers to the availability of the large amounts of high-quality, relevant data required to feed and support AI. It is important to distinguish data from information, as previously explained: data can be seen as raw, often meaningless symbols, while information is the result of organizing and interpreting data to create meaning, something that humans use to solve problems or make decisions. In the context of AI, data serves as the input that enables algorithms to function and learn, whereas information is primarily used and created by humans.
The volume and quality of data are crucial for the performance of AI models.

(6) Participants' Readiness

This dimension concerns the psychological and behavioural preparedness of individuals within and around the organization to adopt and work with AI. For employees, this readiness includes three key aspects: acceptance, trust, and knowledge and skills. In many organizations, staff are not yet sufficiently familiar with AI, making knowledge and training critical factors for successful adoption. A common barrier is the lack of trust, as some employees fear that AI may replace their jobs. This can create resistance and act as a bottleneck in the implementation process. Therefore, participants must not only be capable, but also willing to embrace change and see AI as a supportive tool rather than a threat. Managerial readiness is equally important. For AI to create value, it must be aligned with the organization's strategic goals, both in the short and long term. Managers play a crucial role in shaping attitudes toward AI and setting the direction for its integration. Finally, readiness among external stakeholders and partners is also essential, although it is more difficult to influence. Partners must be willing to work with AI-based systems and adapt their processes accordingly. Traditional mindsets and reluctance to change among partners can slow down or even block AI adoption, highlighting the need for alignment beyond the organization's boundaries.

(7) Customers' Readiness

This dimension refers to how well organizations are prepared to address customers' needs, privacy concerns, and acceptance of AI technologies. It is essential that companies have clear plans in place for managing potential issues related to AI use, in order to minimize risks, particularly in industries handling customer transactional data. In such cases, organizations must communicate transparently with customers to build trust and avoid misunderstandings.
While customers may not require information about AI used internally, their acceptance becomes more critical when AI is used in customer-facing touchpoints or involves the use of customer data.

(8) Process Readiness

The last dimension includes three key components: operational integration, feedback mechanisms, and integrated communication. To fully leverage the value of AI, operational integration must be in place, meaning that different teams and functions within the organization collaborate effectively. This cross-functional integration is essential for spreading the benefits of AI across departments and ensuring consistent outcomes. A strong feedback mechanism is also critical. Continuous feedback allows AI systems to improve their performance and better adapt to the specific needs of teams or the organization as a whole. Since these needs may evolve over time, a constant feedback loop helps maintain the relevance and effectiveness of AI solutions. Finally, integrated communication plays a vital role in preventing system failures caused by miscommunication. Clear and consistent communication across the organization supports smoother implementation and operation of AI technologies.

2.2.3 AI Adoption Strategies

To successfully adopt AI, organizations must define a clear strategy, as the absence of one is a major barrier to implementation. McKinsey research underscores that lacking a defined AI strategy is among the most significant challenges faced by managers [16]. Ultimately, the value of AI lies not in the technology itself, but in how effectively it is integrated into organizational processes. Building on this, Tehrani et al. [3] identified AI adoption strategies that organizations can apply individually or in combination, depending on their context and readiness profile, as shown in Figure 2.5.
Each strategy aligns with specific AI readiness dimensions, meaning that the strength or weakness of certain factors can shape which approach is most appropriate to use. In addition to internal capabilities, strategy selection also depends on external factors: (1) whether the organization's main goal is cost efficiency or differentiation, and (2) the perceived level of risk in AI adoption. Cost-driven firms might prioritize partnerships, while differentiation-focused firms may prefer crawling or guinea pig approaches to test new innovations. In this study, four of the five strategies will be presented.

Figure 2.5: Adoption Strategies [3].

Adoption Strategies

• The Low-Hanging Fruit Strategy: This is a practical starting point for firms with strong data and informational readiness but a reluctance to take on significant risk. It involves identifying straightforward use cases, such as reporting or marketing automation, that can be implemented quickly and deliver early wins, helping build momentum for further adoption.

• The Crawling Strategy: This strategy focuses on gradual, iterative AI adoption. Organizations begin with smaller-scale pilots and expand based on lessons learned. This approach is best suited to organizations with limited financial flexibility but a strong willingness to experiment and learn, requiring strong process and participant readiness.

• The Guinea Pig Approach: This approach suits larger firms that learn indirectly by observing or partnering with smaller, more agile actors who experiment with AI. This enables risk transfer while still gaining insights. The approach is especially relevant when internal readiness is moderate, but there is willingness to innovate.

• The Partnership Strategy: Lastly, this strategy focuses on engaging external AI consultants or technology partners to compensate for limited internal capabilities.
These partnerships provide access to both infrastructure and expertise, while supporting shared capability development. Environmental readiness and trust in external consultants are essential enablers.

2.3 Prototype

The following section covers relevant theory connected to the implementation of the prototype, with the purpose of explaining, motivating, and demonstrating the use of different methods and models. As the foundation of the prototype is AI and machine learning, the framework naturally consists of AI in general and text-processing models in particular. Hence, overall theory is described first, followed by an introduction of the models and algorithms used.

2.3.1 Artificial Intelligence

Artificial Intelligence (AI) is a broad concept, spanning multiple fields such as machine learning (ML), neural networks (NNs), and deep learning (DL), see Figure 2.6. The umbrella term refers to the area of computer science in which machines and computers are built to perform complex tasks, mimicking human intelligence and the ability to reason, predict, and analyse. AI is not a system on its own; rather, it is implemented in machines or systems, unlocking their potential and sense of intelligence [17].

Figure 2.6: Overview of models implemented.

Machine Learning: ML is a subset of AI that enables systems to automatically learn from data and improve their performance without being explicitly programmed. Instead of following hardcoded rules, ML algorithms identify patterns and relationships in data to make predictions or decisions. ML models can be trained under various learning paradigms depending on the availability and quality of labeled data. While supervised and unsupervised learning remain foundational approaches, recent developments in weakly supervised learning have introduced scalable solutions for real-world data constraints:

Supervised Learning: Supervised learning involves training a model using fully
labelled data, where both the input and the corresponding output (or ground truth) are known. This approach enables models to learn direct mappings from inputs to outputs, and is widely used in classification and regression tasks. However, supervised learning relies heavily on high-quality, annotated data, a resource that is often costly and labour-intensive to produce at scale. In this paradigm, common techniques include decision trees, support vector machines, and regression models such as linear or logistic regression.

Unsupervised Learning: In contrast, unsupervised learning is performed without labelled data. The model is presented only with input features and must uncover patterns or structures autonomously. Common applications include clustering, where data points are grouped based on similarity, e.g., using K-Means Clustering (KMeans), and association analysis, where relationships between variables are discovered without human supervision. This makes unsupervised learning well-suited for exploratory analysis, anomaly detection, and image segmentation, among other tasks.

Semi-supervised/Weakly Supervised Learning: Weakly supervised learning refers to a spectrum of learning strategies that leverage noisy, incomplete, or imprecise labelling to train predictive models. This paradigm addresses the challenge of limited or imperfect supervision, a common scenario in real-world applications where manual labelling is expensive or impractical. Three primary forms of weak supervision are identified:

• Incomplete Supervision: Training on datasets where only a portion of the data is labelled. Techniques like semi-supervised learning, transfer learning, and active learning fall under this category.

• Inexact Supervision: Using coarse-grained labels, such as class-level annotations instead of instance-level, which may lack detailed specificity but still provide learning signals.
• Inaccurate Supervision: Occurs when labels contain errors or noise, often due to human mistakes or automatic labelling heuristics. Models must then learn to tolerate or correct for mislabelled examples [18].

Zero-shot and Few-shot Learning: Zero-shot learning (ZSL) is a supervised machine learning paradigm in which a model is expected to correctly classify data from previously unseen classes, that is, categories that were not represented in the labelled training data. Unlike traditional supervised learning, which relies on annotated examples for each class, zero-shot learning requires the model to generalize to entirely new concepts without direct exposure during training. This is particularly important in real-world scenarios where obtaining labelled examples is impractical, expensive, or impossible. Examples include rare disease detection, emerging topics in text classification, or recognizing obscure object categories in vision tasks. For instance, while humans can distinguish tens of thousands of object categories, it is infeasible to provide labelled data for every possible class a model might encounter. To achieve generalization without labelled examples, ZSL methods typically rely on auxiliary information, such as textual descriptions of the target classes and semantic embeddings or class attributes [19].

Transfer Learning: Transfer learning, or domain adaptation, has the purpose of reusing an already trained model for a new task, instead of training a model from scratch. Common in ZSL, transfer learning is often used in methods that focus on semantic embeddings. An example would be to use Bidirectional Encoder Representations from Transformers (BERT), which is pre-trained on language data, to convert newly seen words into vector embeddings. With transfer learning, it is also possible, by recognizing one type of text or image, to simultaneously identify other, unseen ones [?].
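The embedding-based zero-shot idea described above can be sketched in a few lines: documents and class descriptions are mapped into a shared vector space, and each document is assigned to the class whose auxiliary description it is most similar to. The sketch below is a minimal, dependency-free illustration in which simple word-count vectors stand in for real embeddings; an actual pipeline would use a pre-trained sentence encoder such as SBERT, and the example texts and class names here are invented.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector.
    A real ZSL pipeline would use a pre-trained sentence encoder here."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def zero_shot_classify(doc: str, class_descriptions: dict) -> str:
    """Assign the class whose auxiliary description is most similar to the document."""
    doc_vec = embed(doc)
    return max(class_descriptions,
               key=lambda c: cosine(doc_vec, embed(class_descriptions[c])))

# Hypothetical classes described by short texts (the auxiliary information).
classes = {
    "water damage": "water leakage moisture damp pipe",
    "crack": "crack fracture concrete wall surface",
}

print(zero_shot_classify("moisture found near a leaking pipe", classes))  # water damage
```

Note that no labelled examples of either class are used; the only supervision comes from the class descriptions, which is exactly what makes the approach zero-shot.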
Neural Nets and Deep Learning: NNs are a type of ML algorithm inspired by the structure of the human brain, composed of layers of interconnected nodes (neurons). These networks are particularly effective at learning complex, non-linear relationships in data. DL is a specialized form of neural networks with many hidden layers. It excels in handling unstructured data such as text, audio, and images. DL is the foundation of many recent breakthroughs in AI, including speech recognition, image generation, and language models like BERT and the Generative Pre-trained Transformer (GPT) [20].

Embeddings as a Foundation for Language Understanding

Embeddings are a method of representing objects, such as words, sentences, images, or audio, as vectors in a continuous numerical space. These representations are designed so that semantically similar inputs are located close together in this space. In the context of NLP, embeddings allow machine learning models to capture contextual and semantic relationships between words and texts.

Unlike manual feature engineering, embeddings are learned directly from data using neural networks or other algorithms. This allows the model to identify complex patterns in language that are not easily defined by rules or discrete categories. Embeddings enable tasks such as text classification, clustering, and semantic search by converting raw text into a format that machine learning models can process effectively [21].

2.3.2 Language Models

Language models (LMs) are probabilistic models that assign likelihoods to sequences of words. Rather than assessing grammatical correctness, they measure how "natural" a word sequence is based on patterns learned from real-world text. This capability enables a wide range of natural language processing tasks, such as part-of-speech tagging, lemmatization, summarization, translation, and question answering. There are two main categories of language models:

Neural Network-based Models
Neural network-based models use word embeddings to represent words as vectors and capture semantic meaning. Early models like Word2Vec improved representation but struggled with deeper context. Recurrent neural networks (RNNs) introduced the ability to handle sequential data by maintaining memory over time, allowing them to process inputs like text and speech. However, RNNs are limited by their strictly sequential nature, which slows down training on longer sequences and makes parallelization difficult. Transformers addressed these limitations by enabling parallel processing and introducing attention mechanisms that weigh the importance of each word relative to others in a sentence. This architecture powers state-of-the-art models like BERT and GPT, which are pre-trained on large-scale text corpora and can perform a wide range of tasks by leveraging deep contextual understanding and large parameter capacities [22].

Natural Language Processing

NLP refers to the field of artificial intelligence focused on enabling machines to understand, interpret, and generate human language. Since its origins in the 1950s, NLP has evolved into a suite of algorithms and tools that support tasks such as part-of-speech (POS) tagging, sentiment analysis, named entity recognition, parsing, and machine translation. NLP powers a wide range of applications including speech recognition, text classification, customer service chatbots, and content recommendation systems.

At its core, NLP applies structured linguistic rules and statistical techniques to textual data. While effective in specific tasks, traditional NLP models often require domain-specific tuning and can struggle with language ambiguity, contextual subtleties, and low-resource languages.

Large Language Models

Large language models, such as OpenAI's GPT and Google's BERT, represent a major advancement in language understanding and generation.
Built on transformer architecture and trained on massive text corpora using deep learning, LLMs can generate coherent, contextually relevant text, answer questions, summarize content, and more, often with little or no task-specific fine-tuning. LLMs differ from traditional NLP systems in scale and generality. They are not limited to specific rule-based tasks but can perform a wide range of language tasks through learned contextual understanding. Core technologies behind LLMs include self-attention mechanisms, DNNs, and massive parallel training, which allow them to adapt flexibly to diverse domains [23].

2.3.3 Models in Selection

The AI prototype is built on algorithms and pre-trained models, which are covered in this section. An overview can be seen in Table 2.1.

Component | Type | Function
MEGClass | Weakly Supervised Classifier | Combines clustering, embeddings, and pseudo-labeling
BERT | Monolingual LLM | Contextual language understanding in English
XLM-R | Multilingual LLM | Cross-lingual language understanding
Sentence-BERT | Sentence Embedding Model | Generate semantically meaningful sentence vectors
MiniLM | Lightweight Multilingual Model | Efficient multilingual sentence encoding
KB Swedish SBERT | Swedish Sentence Embedding Model | Embed Swedish sentences for similarity tasks
spaCy | NLP Processing Toolkit | Tokenization, lemmatization, POS-tagging
KMeans Clustering | Clustering Algorithm | Group similar sentence embeddings

Table 2.1: Overview of models, tools, and algorithms used in the NLP classification pipeline.

MEGClass: MEGClass (Mutually Enhancing Granularities for Text Classification) is a state-of-the-art method for extremely weakly supervised text classification, requiring only the surface names of target classes to operate without any labeled data.
The model was designed to overcome limitations in prior approaches that typically treat word-, sentence-, and document-level information independently, which can lead to incorrect pseudo-labels when topic cues are ambiguous or inconsistent across granularity levels. MEGClass introduces a multi-granular approach, where word, sentence, and document representations are allowed to mutually enhance each other. This results in a more robust and context-aware estimation of class labels, even in challenging real-world texts. The core innovations of MEGClass include:

• Class-Oriented Sentence Representations: The model computes class-indicative sentence embeddings by aligning sentences with class name representations, emphasizing discriminative terms.
• Class Distribution Estimation: Rather than assigning each document a single label, MEGClass estimates a class probability distribution, allowing it to gauge classification confidence and reduce mislabelling.
• Contextualized Document Representations: Through a multi-head self-attention network, the model creates enriched document vectors that capture hierarchical context from sentence-level signals.
• Iterative Feedback Mechanism: Confidently classified documents are used to refine class representations iteratively. This improves class alignment across documents and reduces error propagation.
• Pseudo-Labelling and Classifier Fine-Tuning: A subset of the most confidently classified documents is used as pseudo-labelled data to fine-tune a downstream classifier, making the model applicable to unseen examples.

MEGClass has demonstrated superior performance across several benchmark datasets, particularly in scenarios with long documents and fine-grained classes.
Its effectiveness lies in its ability to leverage minimal supervision while still generating high-quality pseudo-training sets, outperforming earlier methods like Learning with Out-of-the-box Classifier for Text Classification (LOTClass) and Explainable Classifier for Weakly Supervised Text Classification (X-Class) [24].

K-Means Clustering: KMeans clustering is one of the most widely used unsupervised learning algorithms in machine learning. It operates on unlabeled data, partitioning it into k distinct, non-overlapping clusters, where each data point is assigned exclusively to the cluster with the nearest centroid. The method assumes no prior knowledge about the data labels and aims to uncover inherent structure based on similarity. The core idea behind KMeans is to minimize the intra-cluster variance, or more precisely, the sum of squared Euclidean distances between each data point and its assigned cluster centroid [25].

BERT and Sentence-BERT: One of the most significant advancements in NLP is the introduction of BERT, developed by Devlin et al. (2018). BERT is a transformer-based model pre-trained on large text corpora, designed to understand the context of words in a bidirectional manner. It has achieved strong results in a wide range of NLP tasks, such as question answering, sentence classification, and semantic textual similarity (STS). However, BERT is inherently designed as a cross-encoder, meaning that for sentence-pair tasks both sentences are jointly input into the model. While this joint attention mechanism increases performance for tasks requiring fine-grained comparisons, it introduces a significant computational bottleneck. For example, comparing 10,000 sentences to each other using BERT requires approximately 50 million inference computations, which can take over 65 hours on a high-performance GPU. This makes BERT unsuitable for large-scale tasks such as clustering, semantic search, or retrieval [26].
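To make this bottleneck concrete, the sketch below (illustrative only; the 4-dimensional vectors are hypothetical stand-ins for real sentence embeddings) computes the all-pairs inference count behind the ~50 million figure, and contrasts it with the bi-encoder alternative, where each sentence is encoded once and pairs are compared with inexpensive cosine similarity:

```python
import numpy as np

def cross_encoder_inferences(n: int) -> int:
    # A cross-encoder must run the full model once per sentence pair.
    return n * (n - 1) // 2

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Bi-encoder comparison: plain vector arithmetic on precomputed embeddings.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# 10,000 sentences -> roughly 50 million joint inferences, as noted above.
print(cross_encoder_inferences(10_000))  # 49995000

# Toy embeddings (hypothetical stand-ins for real sentence vectors).
water_leak_1 = np.array([0.9, 0.1, 0.0, 0.2])
water_leak_2 = np.array([0.8, 0.2, 0.1, 0.3])
paint_damage = np.array([0.0, 0.9, 0.8, 0.1])

# Semantically similar toy sentences score higher than dissimilar ones.
print(cosine_similarity(water_leak_1, water_leak_2) >
      cosine_similarity(water_leak_1, paint_damage))  # True
```

With precomputed embeddings, comparing all pairs costs only fast vector operations instead of one full transformer forward pass per pair, which is what makes clustering and semantic search tractable at scale.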
To address this limitation, Sentence-BERT (SBERT) was proposed by Reimers and Gurevych (2019). SBERT modifies the BERT architecture by applying a siamese or triplet network structure, allowing it to generate fixed-size sentence embeddings. Instead of comparing sentences within the model during inference, SBERT maps each sentence to a vector space such that semantically similar sentences are close together. These embeddings can then be efficiently compared using standard similarity metrics such as cosine similarity.

The key innovation lies in the training objective. SBERT is fine-tuned on Natural Language Inference (NLI) datasets, which teach the model to distinguish between similar, contradictory, and neutral sentence pairs. During inference, the model independently encodes each sentence, which enables tasks like clustering, semantic search, and zero-shot classification to be performed orders of magnitude faster than with BERT, while still maintaining high accuracy. Moreover, SBERT introduces a pooling strategy (typically mean pooling) over the output token embeddings to produce the final sentence vector. This differs from BERT's default use of the [CLS] token, which has been shown to produce suboptimal results for sentence-level representations [27].

Two SBERT models are used in the prototype to generate vector embeddings of Swedish texts. KB Swedish Sentence-BERT is a bilingual Swedish-English sentence embedding model developed by the National Library of Sweden (KBLab). It uses KB-BERT, a Swedish BERT model, as base encoder and all-mpnet-base-v2 as an English teacher model [28]. The other model, paraphrase-multilingual-MiniLM-L12-v2, is a lightweight multilingual SBERT model published by the Sentence-Transformers team. It is trained on paraphrase data from multiple languages, including Swedish, and provides sentence embeddings of 384 dimensions.
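Because SBERT produces one fixed-size vector per sentence, clustering becomes straightforward. The sketch below is a minimal plain-numpy version of the KMeans procedure described earlier (Lloyd's algorithm); the 2-dimensional points are hypothetical stand-ins for real sentence embeddings:

```python
import numpy as np

def kmeans(points: np.ndarray, k: int, n_iter: int = 50, seed: int = 0):
    """Lloyd's algorithm: minimizes the sum of squared Euclidean
    distances between each point and its assigned cluster centroid."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to the nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return labels, centroids

# Two well-separated toy "embedding" groups (hypothetical data).
group_a = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2]])
group_b = np.array([[5.0, 5.1], [5.2, 4.9], [4.9, 5.0]])
labels, _ = kmeans(np.vstack([group_a, group_b]), k=2)

# Points within the same group end up in the same cluster.
print(len(set(labels[:3])) == 1 and len(set(labels[3:])) == 1)  # True
```

In the prototype's setting, the inputs would instead be SBERT embeddings of deviation texts, and k would be chosen to match the desired number of candidate categories.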
Due to its small size and high speed, it is especially suited for resource-efficient applications [29].

XLM-R: Cross-lingual Language Model - RoBERTa (XLM-R) is a multilingual transformer model developed by Meta to improve cross-lingual natural language understanding. The model is trained using self-supervised learning techniques and addresses the challenge of transferring knowledge between languages without requiring additional task-specific data in the target language. It builds upon earlier models like XLM and multilingual BERT but overcomes key limitations by incorporating a substantially larger and more diverse training dataset, over two terabytes of filtered data, covering a broader range of languages, including many low-resource ones that previously lacked large-scale labelled or unlabelled corpora [30].

spaCy: spaCy is an open-source Python library providing fast and efficient NLP, offering features such as tokenization and POS tagging. Unlike BERT, spaCy predates transformer-based models and, though operating at higher speed, is less capable of capturing contextual information [31].

2.3.4 Data Criteria

In AI development, especially in natural language processing and weakly supervised learning, the quality of input data is critical. Several criteria are used to evaluate whether data is suitable for training in machine learning contexts. These criteria ensure that models are able to learn effectively, avoid systematic errors, and generalize to new inputs. The following criteria were used to evaluate the data in terms of its structure.

Completeness: Data completeness refers to whether a dataset includes all the information necessary to address its intended purpose. It evaluates whether the data sufficiently covers the full scope of a given question and whether there are any gaps or biases that could distort results.
Incomplete datasets can cause inaccurate analyses, including misreported metrics and biased decisions, which can undermine an organization's confidence in data-driven insights [32].

Data Volume: Data volume refers to the amount of information available to train AI models, and it is especially critical in natural language processing and weakly supervised learning. Large datasets enable models to detect complex patterns and structures. AI training requires large volumes of both structured and unstructured data to meet increasing model complexity. For weakly supervised approaches, unlabelled data at scale is essential to compensate for the lack of explicit annotations. Without sufficient volume, models risk underperformance and poor adaptability to real-world tasks [33].

Relevance: Relevance refers to the degree to which data contributes directly to the specific objective or task the model is designed to solve. Relevant data provides meaningful context that enhances the model's ability to identify important patterns, make accurate predictions, and avoid distraction from noise or unrelated variables. When irrelevant information is included, it can dilute learning signals and reduce overall model performance [34].

Consistency: Consistency ensures that data maintains a uniform format, structure, and meaning across sources and time. For AI, this is essential to enable accurate parsing and interpretation. Inconsistent data, such as varying labels or formats, can confuse models, lead to errors, and reduce the reliability of outputs [35].

Noise: Data noise refers to irrelevant or incorrect data that can confuse AI models and reduce their accuracy. It includes errors, outliers, and unnecessary information that do not help the model's task. Effective data validation techniques are needed to detect and remove noise, ensuring cleaner data and better AI performance [36].
Bias: Bias in AI occurs when models produce unfair results due to prejudices in training data or design choices. Biased data, such as underrepresentation of certain groups, can cause inaccuracies and unreliable predictions. To reduce bias, it is essential to use diverse data, transparent algorithms, and ongoing evaluation to promote fairness and equity [37].

Representativeness: Representativeness in datasets for AI models refers to how comprehensively the data captures the diversity and variability present in real-world scenarios. A representative dataset should reflect the full spectrum of possible inputs, covering varied contexts and distributions, to ensure robust and reliable generalization. Without representativeness, models risk developing biases or overfitting, resulting in poor performance when faced with new or diverse inputs. Hence, carefully ensuring representativeness is crucial to enhancing the accuracy, fairness, and effectiveness of AI systems [38].

3 Methods

This section outlines the research process, design, and methods used in the study to ensure transparency, repeatability, and a clear understanding. The chosen method is designed to systematically address the research questions and support the development of a solution-driven prototype.

In addition, this section presents the overarching research framework, the data collection strategies employed (including interviews, workshops, and document analysis), and the techniques used for data interpretation. A structured approach was adopted to ensure that the findings are robust, relevant, and reproducible.

Finally, the study includes a discussion of research quality considerations, including validity, reliability, and ethical aspects such as data privacy, informed consent, and responsible data handling. These aspects were carefully managed to ensure the integrity and credibility of the research.
3.1 Research Process

The process began by identifying and defining the overall research focus. This led to the formulation of RQ 1: "What is Skanska's current state of AI readiness, and what organisational strengths and challenges exist in terms of AI adoption?". Through an open-ended investigation, a specific area within the company was identified where AI could create tangible value, forming the basis of RQ 2: "How can an AI-prototype create value within the department of Quality Management?". Building on this, RQ 3: "How can a language model prototype be developed and used to efficiently extract and visualize insights from quality-related construction texts?" was formulated accordingly, and is discussed in detail in Section 4.

These activities were carried out in parallel: organizational insights were qualitatively collected and analysed using the AI Readiness Framework, while the prototype was iteratively developed to meet domain-specific needs. Given the complexity of AI adoption in the construction sector, an exploratory approach was essential to uncover relevant challenges, opportunities, and industry-specific conditions. The timeline and phases of the study are presented in Figure 3.1.

Figure 3.1: Gantt chart illustrating the research process timeline.

3.2 Research Design

The study initially adopts an inductive reasoning approach, where broader conclusions are derived from specific observations rather than tested against a predefined hypothesis. This method is particularly valuable in exploratory research, as it allows for the development of new insights and conceptual frameworks grounded in empirical data. At the same time, the study incorporates existing theoretical models, like the AI Readiness Framework, to guide the interpretation of qualitative findings from interviews and workshops.
Incorporating established theory adds a deductive dimension to the study, as parts of the data collection are guided by predefined theoretical constructs. Overall, the research follows an abductive reasoning process, characterized by iterative movement between empirical data and theoretical insights. This approach allows for ongoing refinement of both theoretical understanding and practical implementation, which makes it particularly suitable for studying complex issues like AI adoption in the construction industry [39].

3.3 Qualitative Methods Used

The study began with an exploratory phase that included an initial stakeholder workshop, followed by several semi-structured qualitative interviews. These qualitative insights helped to refine the research focus and informed both the structured development of the AI prototype and the analysis of AI readiness. The frame of reference was developed alongside the interviews, prototype, and workshops. It had two main goals: to support the study's methodology and to provide a theoretical foundation for the analysis and discussion.

Finally, additional workshops and structured interviews were conducted to gather feedback, revisit insights from the initial session, and triangulate findings.

3.3.1 Workshops

Two separate qualitative workshops were conducted to brainstorm ideas, gather feedback on the process, and collect information on AI in the organization.

1. Idea Generating Workshop

The initial workshop was inspired by the AIM workshop methodology, first introduced by Shoji Shiba in 1987, which is an approach to identifying and analyzing complicated or complex problems in a collaborative and structured way. The process typically begins with clarifying shared aspirations or challenges, followed by creative brainstorming to generate potential solutions.
This methodology is particularly valuable in exploratory projects where diverse perspectives are essential for defining needs, generating solutions, and building stakeholder commitment [40].

The workshop was conducted with five stakeholders from Skanska Hus Gothenburg, working in different stages of the construction process (byggprocessen). It followed an exploratory focus group approach, where the participants engaged in guided discussions to explore attitudes, ideas, and expectations related to AI in construction. Focus groups are useful for capturing diverse perspectives and encouraging interaction, allowing participants to reflect, react, and build on each other's insights [39]. This dynamic helped generate practical input for both the prototype development and the AI readiness analysis.

The purpose was to gain a deeper understanding of the employees' workflows, challenges, and needs, to guide the final identification of an AI use case. The first part of the workshop was structured around open-ended questions to identify inefficiencies, time-consuming tasks, and key pain points in the participants' daily work. The second part focused on data usage, examining the types of data used, stakeholders' data literacy, and their perspectives on data-driven decision-making. Participants were then given space to freely generate ideas and reflect on the overall purpose of the study. To ensure unbiased insights, no predefined concepts or solutions involving AI were introduced at this stage, allowing participants to express their views without any bias towards AI use. Only at the end of the workshop was the concept of AI introduced, to collect further ideas about use cases. AI was deliberately mentioned late so that participants would focus on problems to solve rather than on solutions.
Insights from the first workshop were further used to guide the interviews, guide the prototype development, and shape the frame of reference.

2. Feedback Workshop

A second workshop was conducted with the purpose of gathering feedback on the prototype, ensuring it aligned with stakeholder needs and addressed previously identified challenges. Participants were introduced to the prototype findings and encouraged to reflect on its usability, functionality, and integration into their existing workflows. The perceived value of the prototype was also assessed, with discussions centered around how the prototype could contribute to value creation within the organization and support ongoing quality management efforts.

As part of the workshop preparation, participants received a pre-workshop task which involved reviewing a file containing 10–20 clusters of deviations. These clusters had been grouped based on titles and descriptions. Participants were asked to reflect on potential patterns across the clusters, name each category, and identify the most cost-driving deviations based on their professional experience. They were also encouraged to think freely about alternative ways to categorize deviations, for instance, in terms of risk reduction, cost savings, or recurrence prevention. In addition, participants were asked to consider which descriptive elements (e.g., actions, damage types, locations) were most relevant for analysis. This preparatory activity served to engage participants ahead of the workshop and ensured that the feedback session was rooted in both practical experience and reflective input, ultimately enhancing the relevance and depth of the discussions.

The workshop insights were then prioritized to identify the most impactful modifications, forming the basis for final prototype adjustments. This iterative approach strengthened stakeholder engagement and ensured that the solution was validated before further development.
The final prototype was built to contribute value by addressing these identified priorities.

3.3.2 Interviews

The study applied a qualitative interview approach, where the initial interviews in the explorative phase were positioned between semi-structured and unstructured formats. An initial interview guide was used to loosely cover two main areas: (1) the organization's perceived value of AI and its readiness, and (2) the feasibility of developing a prototype in their context. However, the guide served more as a flexible support than a strict script. Interviewees were encouraged to speak freely, and the interviewer followed up on relevant points as they emerged. This conversational and open style allowed participants to highlight what they viewed as most important.

Following the first workshop, two participants were interviewed, and they recommended additional interviewees through snowball sampling, a method in which initial participants assist in identifying and recruiting additional relevant interview subjects [39]. This approach made it possible to identify additional relevant stakeholders within the Skanska organization, beyond the Skanska Hus Gothenburg department, and helped shape both the prototype and the analysis of AI readiness. In total, 13 individuals in the Swedish organization were interviewed during this phase. Respondents and their roles can be found in Appendix A. Some of them were interviewed more than once to gather further information. The interviews were conducted primarily in person; if the interviewee was not available on site, the interview was held digitally via Microsoft Teams. Interviews were manually transcribed to capture immediate insights and nuances. In some cases, transcription was supported by AI software, in which case permission was obtained from the interviewee.
The first phase of interviews was thus more exploratory, defining the scope of the study, highlighting contextual nuances, and identifying patterns related to AI readiness. These interviews also informed the design and development of the AI prototype, ensuring it addressed stakeholder needs and priorities.

In addition, a second round of six semi-structured interviews was conducted to validate and triangulate the initial interview and workshop findings. Triangulation involves using multiple data sources, methods, or researchers to increase the trustworthiness and credibility of research findings [39]. These interviews consisted of two professionals from the US organization at the company's New York office, one individual from the UK office, and three additional interviews with Swedish employees, two of whom had also participated in the initial round of interviews. While the study primarily focuses on the Swedish context, the international perspective, though beyond the core scope, contributes to triangulating the findings. The purpose of the international interviews was to understand how AI is applied in practice within an international context at Skanska, and to compare our own AI model against their workflows.

These additional interviews followed a structured interview guide, which can be found in Appendix B. The approach was grounded in the eight AI Readiness dimensions proposed by Tehrani et al. [3]. This strengthened the empirical foundation for answering Research Question 1 (RQ1) and enabled a critical assessment of the prototype's alignment with strategic objectives.

3.3.3 Literature Search

To construct the frame of reference, the scientific database Scopus was used to identify peer-reviewed, high-quality literature relevant to the research topic. Systematic and well-documented literature searches are essential for ensuring transparency, reliability, and replicability in academic research [39].
Therefore, a structured approach was applied when selecting keywords, which were derived from key concepts in the research questions and refined iteratively during the process. Common Boolean operators (e.g., AND, OR) were used to combine terms and narrow down the results.

To assess the relevance and quality of the sources, search results were filtered based on citation count and publication outlet, with a focus on highly cited articles published in reputable journals. In areas related to artificial intelligence, which is a fast-evolving field, particular emphasis was placed on recently published articles to ensure that the theoretical foundation reflects the latest developments and the current state of the field.

Overall, the literature search covered the construction industry and relevant business frameworks. The frame of reference also covers relevant aspects of AI, including specific models, to later reinforce their applicability in the study's context. This analysis of existing research, also known as secondary studies, refers to examining and interpreting data or findings originally collected by others [39].

3.3.4 Qualitative Data Analysis

The interview data were analyzed thematically, using the AI Readiness framework as a guiding structure. Thematic coding is a qualitative analysis method used to identify and organize patterns or themes within interview data [39]. Each transcript was reviewed, and segments were divided into themes according to the eight dimensions of AI readiness. Within each of these eight initial themes, approximately five key codes were identified, each capturing an essential aspect of the theme.
Following this first categorization, a second layer of analysis was performed by connecting these codes to four overarching second-layer themes: (1) Organizational Readiness Gaps and Ownership Challenges, (2) Employee Trust and Safety, (3) Technical and Data Foundation Gaps, and (4) Early Signs of Adoption and Competitive Opportunity. Finally, these second-layer themes lay the basis for the discussion chapter and capture frequently mentioned insights from the interviewees. Altogether, this analysis forms the basis for answering RQ1 and supports the formulation of managerial implications for Skanska.

3.4 Defining the Research Area

Following a series of workshops, interviews, and an initial exploration of both the construction industry and Skanska Hus's operational practices, potential areas of interest for AI application were identified. These areas were evaluated based on technical feasibility, data availability, and potential business value.

3.4.1 Quality Management in the Construction Process as the Chosen Area

The area, Quality Management in the Construction Process, was selected following a series of workshops and careful evaluation (partly in accordance with Section 2.3.4). It was identified as a domain characterized by a continuous and centralized inflow of data, rich in metadata and largely untapped textual content. This combination offered strong technical feasibility for NLP and high business value through the potential to extract deeper insights and conclusions from the data.

Figure 3.2: Overview of the current reporting structure, representing each reported issue and focusing on the construction process phases, e.g. inspection and production.

Employees across all stages of the construction process interact with Skanska's ACC system, where they log deviations (avvikelser). A deviation refers to any activity or outcome that fails to meet specified requirements, thereby affecting work quality,
the final product, or the surrounding environment. Managing these deviations is essential for organizational learning, continuous improvement, and compliance with both contractual obligations and external standards. Notably, deviations are most commonly reported during the production and aftermarket phases, rather than during inspection or product development. Historically, such issues were documented in the BIM360 system, whose historical data remains accessible. Today, deviations are registered according to a construction process hierarchy (see Figure 3.2). There is a recognized need for improved oversight and timely handling of deviations to prevent costly consequences in later project phases.

Figure 3.3: Overview of the desired reporting structure, representing each reported issue and focusing on the category and nature of the issue, e.g. construction part and action.

The ACC/BIM360 platforms collect and store quality-related deviations and issues using predefined categories and root cause classifications tailored to the different stages of the building process. This system design helps prevent users from improvising their own taxonomies and reinforces the understanding that documentation serves a broader analytical purpose. Users are also prompted to provide a title and a description summarizing the issue. As of May 2024, Skanska introduced a new set of predefined categories within ACC, enabling better tracking, sorting, and analysis of deviations. This update enhances consistency in documentation and supports systematic analysis to detect patterns, facilitate learning, and improve quality performance over time.

Given the volume and structure of the existing data, there is a significant opportunity to improve data utilization in order to support the Quality Department's proactive quality management. This study therefore aimed to demonstrate how AI, specifically through NLP, can extract insights and enhance the categorization of issues.
By restructuring issue reporting to emphasize what construction part was affected or what type of incident occurred, the organization can foster more proactive strategies and reduce both the time and cost associated with recurring deviations. This revised reporting logic is illustrated in Figure 3.3 and showcased in Section 4.

3.4.2 Other Areas of Interest

During the exploratory phase, multiple areas of possible interest were introduced. These, however, were not deemed suitable as the focus area, owing to their strategic and integrated business complexity as well as the data criteria found in Section 2.3.4. The following sections present the areas that were considered.

Risk Assessment in the Cost Estimation of Construction Projects

The second area of interest focuses on cost calculations, including an added risk factor, which must be made for each project request. This risk assessment is often based on underlying data and supplier prices, but primarily on experience-based knowledge and team discussions. Since each project is unique, there is no exact method for this process; it relies heavily on experience-based knowledge. Cost estimates vary from person to person and across different regions. Ultimately, many projects end up costing significantly less than initially estimated, meaning that costs are often overestimated due to higher-than-necessary risk additions. While there is a large amount of available data, there is no structured approach to conducting an analysis. There is a need for a tool to support this process, but it remains a complex area, as it involves many individuals and their experience-based assessments. Therefore, this area was not considered further.

The Process of Reviewing and Verifying Accuracy in Project Planning

The third area of interest relates to the validation process throughout the construction value chain, from initial architectural designs, through technical planning and engineering, to the execution phase on site.
A recurring issue identified is the lack of reliable, complete, and up-to-date project information being transferred between actors. This leads to time-consuming double-checking activities, such as manual cross-verification of drawings, bills of quantities, and technical specifications, to ensure no critical information has been omitted or misinterpreted. These inefficiencies stem largely from communication breakdowns and fragmented project documentation practices, which are common challenges in construction project management. Although addressing this area would offer significant value, it was deemed too complex for the current study due to the heterogeneous nature of the data (including both text-based and image-based documents) and the difficulty of standardizing such unstructured information streams.

Samläsning

The fourth area, Samläsning, is loosely linked to the previous area; it refers to the process of harmonizing terminologies and ensuring consistent communication across project participants. Differences in how design elements, construction materials, or processes are labelled and described create misunderstandings that complicate project execution and quality assurance efforts. This semantic misalignment often leads to errors, rework, and delays. Although this area is highly relevant and ties into broader issues of digitalization, it was ultimately excluded from this study. The decision was based on the complexity of the communication challenges involved and the broad organizational changes required to address them systematically.

3.5 Ensuring High-Quality Research

To maintain transparency and reflect on potential limitations, the following methodological considerations were acknowledged.

• Positive bias in participants: Most interviewees are highly open to change, innovation, and AI, which may skew results toward opt