Bridging AI Readiness and Application: Prototyping a Strategy-Aligned Language Model for Quality Insights at Skanska
A Comprehensive Study of Organizational AI Maturity, Applied NLP Development, and Scalable Implementation in Construction Quality Management
Master’s Thesis in Complex Adaptive Systems, and Quality and Operations Management
LISA LÖVGREN
OLIVIA TURUNEN
Department of Civil and Environmental Engineering
Chalmers University of Technology
Gothenburg, Sweden 2025
www.chalmers.se

Master’s Thesis 2025

© LISA LÖVGREN, OLIVIA TURUNEN, 2025.

Supervisor: Rasmus Rempling, Department of Civil and Environmental Engineering
Supervisor: Peter Samuelsson, Skanska AB
Examiner: Rasmus Rempling, Department of Civil and Environmental Engineering

Department of Civil and Environmental Engineering
Chalmers University of Technology
SE-412 96 Gothenburg
Telephone +46 31 772 1000

Cover: Image generated with ChatGPT 4o, April 2025, prompted: "Construction and AI", merged with resulting category clusters of the data subset.
Typeset in LaTeX, template by Kyriaki Antoniadou-Plytaria
Printed by Chalmers Reproservice
Gothenburg, Sweden 2025

Bridging AI Readiness and Application: Prototyping a Strategy-Aligned Language Model for Quality Insights at Skanska
A Comprehensive Study of Organizational AI Maturity, Applied NLP Development, and Scalable Implementation in Construction Quality Management
LISA LÖVGREN, OLIVIA TURUNEN
Department of Civil and Environmental Engineering, Chalmers University of Technology

Abstract

The construction industry is under increasing pressure to improve efficiency, reduce costs, and enhance sustainability. While other sectors have advanced in AI adoption, construction remains comparatively behind. This thesis explores how artificial intelligence (AI) can support decision-making in construction, with a focus on Quality Management at Skanska Sweden AB. First, organizational AI readiness was assessed through interviews and workshops using established organizational frameworks. This revealed both strategic interest and practical challenges in applying AI. Second, an operational use case was explored by developing an AI prototype that processes historical quality deviation texts. The prototype was developed with the purpose of creating value for the Quality Department by providing insights. Using natural language processing (NLP), the prototype explored a weakly supervised classification approach combining unsupervised clustering, pseudo-labelling via zero-shot learning, and a fine-tuned transformer classifier (XLM-R and SBERT). Two promising category types, incident type and affected building component, were identified and co-developed with domain experts to structure the data. The results show that while AI readiness is moderate, initiatives often remain siloed due to limited infrastructure, resources, and unclear ownership. Skanska shows a growing awareness and curiosity around AI, and there is potential to learn from international practices within the company.
However, although large volumes of data are available, barriers remain, particularly regarding the availability of structured and labelled data. There is also a need for further AI-specific expertise, and it remains challenging to integrate new tools into established workflows. The prototype demonstrates practical value by visualizing patterns in text data, enabling the Quality Department to adopt a more data-driven and preventive approach. While weak supervision proved challenging due to limited label quality and model sensitivity, the final classifier achieved approximately 67% accuracy through fine-tuning on a manually labelled dataset corresponding to 6‰ of the data. Despite this, the approach successfully enabled structured insights into issue frequency, duration, and distribution across projects. The prototype also serves as a scalable proof of concept, illustrating how tailored AI solutions can accelerate digital transformation in construction.

Keywords: Artificial Intelligence (AI), Natural Language Processing (NLP), Language Model Prototype, Text Classification, AI Readiness, Quality Management, Change Management, Construction Industry, Digital Transformation.

Acknowledgements

We would like to express our sincere gratitude to everyone who contributed to the completion of this thesis. First and foremost, we would like to thank our academic supervisor at Chalmers University of Technology, Rasmus Rempling, for their guidance and encouragement throughout the research process, and our company supervisor, Peter Samuelsson, for their engagement and reliability. We are also grateful to Skanska for providing the opportunity to conduct this thesis within the organization and for sharing valuable insights, supporting data access, and engaging in fruitful discussions. We also want to express our gratitude towards Brosamverkan for enabling our international exchange.
Finally, we want to thank all participants in the workshops and interviews for their time and expertise, which significantly enriched the outcomes of this work.

Lisa Lövgren, Olivia Turunen, Gothenburg, May 2025

List of Acronyms

Below are the acronyms used throughout this thesis, listed in alphabetical order:

AEC Architecture, Engineering and Construction
AI Artificial Intelligence
BERT Bidirectional Encoder Representations from Transformers
BIM Building Information Modeling
DL Deep Learning
GDPR General Data Protection Regulation
GPT Generative Pre-trained Transformer
IoT Internet of Things
KMeans K-Means Clustering
KPI Key Performance Indicator
LLM Large Language Model
LOTClass Learning with Out-of-the-box Classifier for Text Classification
LSTM Long Short-Term Memory
MEGClass Mixed Expert Guided Classification
ML Machine Learning
NLI Natural Language Inference
NLP Natural Language Processing
NN Neural Network
PCA Principal Component Analysis
POS Part-of-speech
RNN Recurrent Neural Network
RQ Research Question
SBERT Sentence-BERT (Bidirectional Encoder Representations from Transformers for Sentence Embeddings)
SQL Structured Query Language
STS Semantic Textual Similarity
t-SNE t-distributed Stochastic Neighbour Embedding
X-Class Explainable Classifier for Weakly Supervised Text Classification
XLM-R Cross-lingual Language Model - RoBERTa
ZSL Zero-Shot Learning

Contents

List of Acronyms ix
List of Figures xv
List of Tables xvii
1 Introduction 1
1.1 Background 1
1.1.1 Construction Industry 1
1.1.2 Artificial Intelligence 2
1.1.3 Case Company 2
1.2 Purpose 3
1.3 Research Questions 4
1.4 Limitations 4
2 Frame of Reference 7
2.1 Construction 7
2.1.1 Construction Industry Characteristics 7
2.1.2 Problem Area and Applications 9
2.2 Business Frameworks 12
2.2.1 Change Management 12
2.2.2 AI Readiness Framework 13
2.2.3 AI Adoption Strategies 17
2.3 Prototype 19
2.3.1 Artificial Intelligence 19
2.3.2 Language Models 21
2.3.3 Models in Selection 22
2.3.4 Data Criteria 25
3 Methods 27
3.1 Research Process 27
3.2 Research Design 28
3.3 Qualitative Methods Used 28
3.3.1 Workshops 29
3.3.2 Interviews 30
3.3.3 Literature Search 31
3.3.4 Qualitative Data Analysis 32
3.4 Defining the Research Area 32
3.4.1 Quality Management in the Construction Process as the Chosen Area 32
3.4.2 Other Areas of Interest 34
3.5 Ensuring High-Quality Research 35
3.5.1 Reliability 36
3.5.2 Replicability 37
3.5.3 Validity 37
3.5.4 Ethical Aspects 38
3.5.5 Use of AI Tools During the Thesis 38
4 Implementation of AI Prototype 39
4.1 Prototype 39
4.1.1 Data and Data Collection 40
4.1.2 Initial Clustering Model 41
4.1.3 Workshop-Based Label Design and Manual Annotation 42
4.1.4 Category Classification through Pseudo-Labelling and Classifier Implementation 43
4.1.5 Visualization of Results 47
4.2 Limitations in Prototype Implementation 47
4.3 Ensuring High-Quality Data 49
4.3.1 Reliability 49
4.3.2 Replicability 49
4.3.3 Validity 50
4.3.4 Ethical Aspects 50
5 Results 53
5.1 Skanska’s AI-readiness 53
5.2 The Value to the Quality Department 56
5.2.1 Insights from the Quality Department 57
5.2.2 Illustrating Text-Based Data Results for Quality Insights 58
5.3 Classification Performance and Output Examples 65
5.3.1 Classification Performance via Confusion Matrices 65
5.3.2 Model Interpretability and Semantic Visualization 70
5.3.3 Example of Word Importance for Classification 70
6 Discussion 75
6.1 Discussion on AI Readiness 75
6.1.1 Organizational Readiness Gaps and Ownership Challenges 75
6.1.2 Employee Trust and Safety 77
6.1.3 Technical and Data Foundation Gaps 78
6.1.4 Early Signs of Adoption and Competitive Opportunity 80
6.2 Discussion on Value Generating Results by AI-Prototype 81
6.2.1 Organizational Value and Strategic Implications for the Quality Department 81
6.2.2 Operational Value for the Quality Department 82
6.2.3 Future Opportunities within Quality 83
6.3 Prototype Performance and Designing Approach 83
6.3.1 Model Selection and Training Strategy 84
6.3.2 Model Performance and Operational Robustness 84
6.3.3 Language and Domain Adaptation through Transfer Learning 85
7 Conclusion 87
Bibliography 89
A List of Interviewees I
B Additional Interview Guide III
C Interview Results V

List of Figures

2.1 Tech investments in Construction [1]. 10
2.2 AI Adoption in Construction vs Other Industries [1]. 10
2.3 Overview of AI Use in the Construction Industry [2]. 12
2.4 Schematic model of AIR [3]. 14
2.5 Adoption Strategies [3]. 18
2.6 Overview of models implemented. 19
3.1 Gantt chart illustrating the research process timeline. 28
3.2 Overview of the current reporting structure, representing each reported issue and focusing on the construction process phases, e.g. inspection and production. 33
3.3 Overview of the desired reporting structure, representing each reported issue and focusing on the category and nature of the issue, e.g. construction part and action. 33
4.1 Overview of the 5 stages of implementation, including the 3 models: raw Swedish construction data, clustering via SBERT + KMeans, category definition through workshop, zero-shot pseudo-labels, and transformer-based XLM-R classifier. 39
4.2 Overview of the initial clustering method, using KMeans clustering. 41
4.3 Overview of the weakly supervised MEGClass-inspired model, utilizing pseudo-labels. Approach (a) is categorization only, while model (b) is a semi-supervised classifier based on transformers. 45
5.1 Confusion matrices showing the accuracies for different classification methods. 59
5.2 Distribution of stage of construction process per category. 60
5.3 Distribution of the categories reported as a deviation per construction parts and incidents. 61
5.4 Frequency over time of reported issues in Hus Göteborg. 62
5.5 The median duration of open issues over all regions. 63
5.6 The geographical locations of projects with reported issues, per category. 64
5.7 Confusion matrices showcasing the accuracy of the category predictions, based on a sample of 25 data points per predicted category. A darker colour represents a higher accuracy. 69
5.8 2D visualization of the categorized sentence embeddings using t-SNE for the entire dataset of over 100 000 data points. 71
5.9 The words in the sentence "[TITLE] Balkong [DESC] Fogrester invändigt och utvändigt generellt" and their effect on the model and categorization. 73
5.10 The words in the sentence "[TITLE] Miljöhus [DESC] Vilken tjoclek yttervägg ska det vara? Vad ska det vara för tak och vilken tjocklek?" and their effect on the model and categorization. 74

List of Tables

2.1 Overview of models, tools, and algorithms used in the NLP classification pipeline. 23
4.1 Overview of key data features available in the dataset and used in the prototype. 40
4.2 Overview of chosen labels through workshop; building elements and common issue categories. 43
4.3 Distribution of manually labelled quality issues across construction parts and incident types. 44
5.1 Distribution of AI-classified quality issues by construction part and incident type. 58
A.1 Participants’ roles and their organizational affiliations within the Skanska group and external experts. I

1 Introduction

As one of the world’s largest and most traditional sectors, the construction industry is facing a pivotal moment in its digital evolution. It is increasingly challenged by global conditions such as rising labor and material costs, supply chain disruptions, and growing sustainability demands. While these external pressures cannot be fully controlled, the industry can respond by transforming its internal processes and accelerating artificial intelligence (AI) adoption.
While artificial intelligence has already transformed industries such as finance and manufacturing, construction continues to lag behind, held back by fragmented processes and limited standardization. This thesis explores how AI can become a catalyst for change in construction, not by replacing human expertise, but by supporting it. Focusing on a case within Skanska Sverige AB, the study investigates both organizational readiness and practical application through the development of an AI prototype for quality and deviation management. The goal is to demonstrate how AI can unlock insights, enhance preventive work, and lay the foundation for scalable, long-term value creation with AI, achieved here by implementing an AI prototype for natural language processing (NLP) of large volumes of text.

1.1 Background

The background section introduces the broader characteristics and challenges of the construction industry, outlines the transformative potential of AI, and presents the case company and the organizational setting in which this research is conducted.

1.1.1 Construction Industry

The construction industry is characterized by complexity, fragmentation, and a high degree of project uniqueness. Standardization is difficult to achieve, as even buildings based on identical designs need to be adapted to varying geological and climatic conditions. Each project involves new owners, contractors, subcontractors, and suppliers working together. This setup makes it challenging to implement repeatable processes, which limits the potential for automation and the integration of intelligent technologies. [4]

In addition, construction projects follow non-linear and unstructured workflows, where tasks are interdependent and often executed in parallel by subcontractors with varying digital maturity. The physical environment is constantly changing, making coordination and real-time information sharing critical and difficult.
External factors such as noise, dust, and instability further complicate data collection and reduce the reliability of technical systems. These characteristics create uncertainty, inefficiencies, and barriers to innovation. [4]

Some of the challenges currently facing the construction industry are linked to broader societal issues and global conditions, for example, rising labour and material costs caused by worldwide shortages. While these external factors cannot be fully controlled, the industry can respond by transforming its internal processes and operations. By adopting new technologies, the construction sector can increase efficiency, improve resource management, and become more adaptable in a rapidly changing world. A shift in mindset is also essential to remain competitive and sustainable. [5]

1.1.2 Artificial Intelligence

In recent years, AI has shifted from a theoretical concept to a practical tool reshaping industries. Rather than referring to a single technology, AI includes a collection of methods, such as machine learning, computer vision, and natural language processing, that allow systems to interpret data, recognize patterns, and support decision-making. These capabilities are now used to optimize supply chains, predict equipment failures, and automate complex tasks across sectors.

Although construction has historically been one of the least digitized industries, this is starting to change. With a global value exceeding USD 12 trillion (at the time corresponding to SEK 126 trillion), the construction sector is under increasing pressure to modernize in response to labour shortages, rising costs, and sustainability goals [6]. AI offers a way forward: from enhancing safety through image recognition on job sites, to optimizing schedules using predictive algorithms, to integrating real-time data with Building Information Modeling (BIM) [2].
Investment trends reflect this growing interest: between 2020 and 2022, over USD 50 billion (at the time corresponding to SEK 550 billion) was invested globally in AEC (architecture, engineering, and construction) technologies, with a significant portion directed toward late-stage ventures [6]. Although still in the early stages of adoption compared to other industries, AI holds considerable promise in addressing long-standing challenges in construction. From improving schedule reliability and reducing safety risks to enhancing resource planning and data integration, AI-based solutions offer a pathway toward greater efficiency and control, if the industry can overcome barriers related to culture, fragmentation, and digital maturity.

1.1.3 Case Company

Skanska AB is one of the world’s leading construction and project development companies, founded in Sweden in 1887. With over 135 years of experience, the company has grown into a global player operating in selected markets in the Nordics, Europe, and the United States. Skanska Sweden employs approximately 6 700 people and had an operating income of SEK 1.2 billion in 2024 [7]. Their focus is on building a better society and being a leader in sustainable solutions, quality, safety, and ethics [8].

In Sweden, Skanska operates through several business units under Skanska Sweden AB. The core construction and civil engineering operations consist of four main divisions: Skanska Hus (Building Construction), Skanska Väg och Anläggning (Civil Infrastructure), Skanska Industrial Solutions, and Skanska Rental. Construction and civil engineering activities account for approximately 85% of Skanska Sweden’s total revenue [9].

Skanska Sweden’s operations are structured around three overarching business streams: Construction, Residential Development, and Commercial Property Development. The Construction business stream includes both Civil construction, i.e.
roads, tunnels, and bridges, and Building construction (Hus). This study focuses exclusively on Skanska Hus, where operations are divided into regional units. The study takes its starting point in the Skanska Hus Göteborg region, broadening its focus to the national quality function within Skanska Hus’s building operations in Sweden, specifically within this unit.

Furthermore, interviews have also been conducted with representatives from Skanska’s U.S. organization in New York City, and Skanska UK, to explore successful applications of artificial intelligence in other parts of the world. This provided a broader perspective on how AI readiness and implementation strategies may differ across geographical and organizational contexts.

1.2 Purpose

The overall purpose of this study is to explore how AI can enhance strategic decision-making and operational efficiency within the construction industry, focusing specifically on Skanska Hus, Skanska Sverige. The study takes its starting point in the recognition that, while many industries have already made significant progress in adopting AI technologies, the construction sector remains comparatively behind, largely due to its project-based nature, fragmented workflows, and varying levels of digital maturity. In this context, the study seeks to understand how the construction industry can begin to close this gap by identifying concrete opportunities, organizational prerequisites, and value-generating use cases for AI. Through an exploratory, inductive approach, the research is divided into three interconnected tracks, each addressing a distinct aspect of AI adoption and aligned with the study’s three research questions.

The first focus is to analyse organizational AI readiness at a high level, assessing the current state and maturity level within Skanska from an external perspective. By applying established frameworks and theory on digital and AI transformation, and learning from other industries, this part of the study identifies internal conditions, barriers, and enablers for AI implementation.

The second focus narrows in on the area of quality management, identified during the first exploratory phase as an area with both business value and technical feasibility. Through creating an AI prototype, this part of the study analyses how AI-generated insights can be integrated into decision-making and knowledge-sharing in a construction context. It also reflects on broader organizational implications for scaling AI solutions beyond isolated use cases.

The third focus concerns the practical implementation of AI and the technical performance and potential of the prototype itself. Here, an AI prototype is developed using NLP and ML to structure and analyse text-based quality data. Through iterative testing, evaluation, and visualization of the results produced by the prototype, the goal is to create an efficient and accurate model.

In conclusion, the study aims to contribute to a more nuanced understanding of how the construction industry, in this case Skanska AB, can begin to adopt, implement, and benefit from AI, specifically using text processing to interpret quality data.

1.3 Research Questions

Against the background presented in Chapter 1, the report addresses three research questions (RQs), each accompanied by related theory, implementation, results, and analysis. The RQs are the following:

• RQ 1: What is Skanska’s current state of AI readiness, and what organisational strengths and challenges exist in terms of AI adoption?
• RQ 2: How can an AI prototype create value within the department of Quality Management?
• RQ 3: How can a language model prototype be developed and used to efficiently extract and visualize insights from quality-related construction texts?

1.4 Limitations

This study is subject to several limitations that should be acknowledged.
First, the findings are primarily based on data and observations from a single Swedish construction company, which may limit the generalizability of the conclusions to the broader industry. Although the company is an important actor on the Swedish market, its internal processes, digital maturity, and AI-related initiatives may not fully reflect those of smaller firms or companies operating in other regions or contexts.

Second, the assessment is conducted from the perspective of an external party and does not include full access to internal strategic documentation, proprietary datasets, or project-level decision-making. This means that some conclusions, particularly those related to organizational readiness, internal barriers, or technology adoption trajectories, are based on secondary sources or publicly available information rather than first-hand implementation data.

Finally, while the analysis attempts to incorporate a broad view of AI technologies, it does not encompass all emerging fields or niche use cases. The emphasis is placed on applications that are currently most relevant to the construction sector, based on industry reports and peer-reviewed literature.

2 Frame of Reference

This Section outlines the theoretical foundation for the study, starting by covering the construction industry’s unique characteristics. To understand how AI can be successfully adopted, the Section further explores theoretical frameworks for managing organizational change and assessing AI readiness, along with strategic approaches to AI adoption. Finally, it also introduces key AI components relevant to the prototype, including data criteria, language models, and pre-trained architectures. Together, these perspectives offer a comprehensive reference for understanding how construction firms can approach and implement AI effectively.
2.1 Construction

The construction industry differs from many other industries due to its fragmented and project-based nature, where buildings are assembled through sequential, yet disconnected, processes. This discontinuity has contributed to the industry’s slow industrial and digital development. While technologies such as the Internet of Things (IoT), big data, cloud computing, and AI have driven digitalization across many industries, particularly manufacturing, the construction sector remains behind. In recent years, however, these technologies have begun to be applied in construction. Despite this progress, their adoption remains limited to isolated areas, highlighting the need for a more integrated and systemic implementation [4]. The following Section outlines key characteristics of the construction industry to provide a foundation for understanding the potential barriers to the application of AI.

2.1.1 Construction Industry Characteristics

Standardized construction plans are rare, as buildings based on identical designs still differ due to varying geological and climatic conditions. As a result, each construction project is unique, making standardization difficult. For example, generating detailed and reusable bills of materials for future projects is highly complex. This uniqueness also comes from the dynamic composition of project teams, where owners, contractors, subcontractors, and suppliers vary from project to project. The construction industry operates through project-unique production systems. The development of automated and integrated intelligent systems requires modular and repeatable components and processes, an approach that remains difficult to implement in this highly fragmented sector.

Furthermore, construction projects follow non-linear processes and are typically organized in an unstructured way. Rather than forming a sequential chain, tasks are interlinked through shared resources and parallel activities.
A significant part of the work is carried out by subcontractors, whose varying levels of digital maturity and project involvement affect the reliability of the information provided, both during execution and in the aftermarket. These fragmented information flows contribute to miscommunication among project participants and hinder effective documentation for future use.

Coordination between participants is crucial, as overlapping workspaces, task sequences, and movement paths often create conflicts. This becomes even more challenging because both the location and the environment change throughout the project. In addition, construction equipment, materials, and labor need to be continuously relocated as the work progresses.

Furthermore, construction projects are highly complex and uncertain. Although construction plans are often detailed, they are frequently modified during execution to adapt to dynamic environments, which can result in delays, rework, quality deficiencies, and claims. To manage this uncertainty, project managers tend to incorporate large margins and risk buffers into the planning. While this helps prevent issues, it can also lead to inefficient use of resources.

Unlike the clean and controlled environments found in other industries, construction sites are often harsh and, as mentioned, unpredictable. Factors like noise, dust, mud, and even risks of geological disasters pose significant challenges for data collection, network communication, and the reliability of intelligent systems. Moreover, workers exposed to such environments may lack real-time information, limiting their ability to respond to dangerous situations. This insecurity also reduces their willingness to engage with technical or automated equipment. [4]

Furthermore, the construction sector faces persistent challenges related to organizational inertia and data fragmentation, both of which limit the adoption and scalability of digital technologies, including AI.
Organizational inertia in construction refers to the sector's resistance to change, driven by old practices, fragmented project structures, and legacy systems that slow the uptake of new technologies. At the same time, construction data is often siloed across different platforms and stored in non-standardized formats, restricting opportunities for cross-project integration and advanced analytics [10]. While construction companies generate large volumes of data, this data is often fragmented and lacks standardized formats, which hinders cross-project learning, collaboration, and data-driven decision-making. Low digital maturity in construction firms compounds this challenge, as proactive leadership and clear data governance structures are often missing [11]. These combined insights underscore a dual challenge for construction firms seeking to leverage AI: they must overcome cultural and structural barriers while simultaneously ensuring that project data is consistent, accessible, and actionable [10, 11].

2.1.2 Problem Area and Applications

AI has emerged as a cornerstone technology in the Fourth Industrial Revolution, transforming the way industries operate through data-driven decision-making, automation, and predictive capabilities [2]. Despite these advancements, the construction industry, valued at over USD 12 trillion globally (at the time corresponding to SEK 126 trillion) [6], has traditionally lagged behind in terms of digitization and AI adoption. Characterized by fragmented stakeholders, manual workflows, and analogue tools, the construction sector has historically demonstrated resistance to technological change [2]. McKinsey research highlights that, as of 2018, construction ranked among the least digitized sectors when compared with twelve other industries [1].
However, growing demands for infrastructure, increasing labour shortages, and regulatory pressures for transparency have begun to catalyse a digital transformation across the Architecture, Engineering, and Construction (AEC) industries. This transformation is being accelerated by AI technologies that promise to address many of the sector's most persistent challenges, such as cost overruns, safety incidents, and inefficiencies in planning and resource allocation. For instance, AI-powered image recognition can identify unsafe worker behavior from site footage, while machine learning models can optimize scheduling by evaluating millions of potential timelines. Enhanced analytics platforms are being used to monitor sensor data in real time, improving both predictive maintenance and operational decision-making [1]. McKinsey estimates that from 2020 to 2022, global investment in AEC technology reached USD 50 billion (at the time corresponding to SEK 550 billion), an 85% increase compared to the previous three years, with 1,229 deals closed during that period [6]. This trend is illustrated in Figure 2.1, which shows both the sharp increase in funding and the growing number of deals in AEC technology over the past few years. These investments underscore a growing recognition that AI represents an opportunity for the future of construction. However, the construction industry shows both low current AI adoption and low future investment compared to other industries, positioning it closer to the "Falling behind" quadrant in Figure 2.2. This suggests a slow pace of digital transformation, which may limit the industry's ability to capitalize on AI. While enthusiasm around AI in construction is growing, the industry's structural and operational characteristics pose unique challenges to large-scale AI adoption.
Construction firms are often small, project-based entities with limited digital infrastructure and constrained IT budgets, typically spending only 1-2% of their revenue on IT, compared to 3-5% in other sectors [6].

AI applications in construction can be categorized across multiple domains: project planning and scheduling, site monitoring, resource and waste optimization, health and safety analytics, contract management, and supply chain logistics [2]. For example, AI enables more accurate cost estimation and scheduling by leveraging large datasets from previous projects and external factors such as weather or site conditions [2].

Figure 2.1: Tech investments in Construction [1].

Figure 2.2: AI Adoption in Construction vs Other Industries [1].

AI is also being used to optimize resource allocation and minimize material waste. By analyzing historical and real-time data, intelligent algorithms can forecast material demand, suggest optimal delivery timing, and reduce storage costs, contributing to both sustainability and profitability. Furthermore, AI can analyze data from sensors, drones, and connected machines to provide construction site analytics. These tools support real-time monitoring of productivity, safety risks, and performance bottlenecks, enabling more responsive site management [2]. Additionally, AI is used to read and analyze complex construction contracts. These AI tools can point out risky clauses, unclear parts, or mistakes, which helps make better purchasing decisions and lowers the risk of legal problems. AI-powered audit systems are also being implemented to ensure financial accuracy by cross-referencing billing data, flagging anomalies, and aligning invoices with real-world progress. This strengthens financial governance and supports more transparent reporting structures [2]. Despite these opportunities, several barriers remain, as seen in Figure 2.3.
Cultural resistance to change, high initial deployment costs, and a shortage of AI talent continue to hinder adoption. These challenges are illustrated in the lower left quadrant of the figure, which highlights issues such as ethics, governance, and limited internet connectivity. In addition, concerns around data ownership, transparency, and cybersecurity (shown in the top right quadrant) remain unresolved. Nevertheless, the trajectory is clear: the construction industry stands at a pivotal moment where AI can redefine its operational norms. Companies that act early and strategically to incorporate AI will gain competitive advantages in cost efficiency, project reliability, and overall value delivery [2].

Figure 2.3: Overview of AI Use in the Construction Industry [2].

2.2 Business Frameworks

This section establishes the theoretical foundation for understanding how businesses adopt and integrate new technologies. It begins with a broad overview of change management, then narrows down to established frameworks for technology adoption, ultimately focusing on AI readiness and AI adoption strategies. Together, these perspectives provide a basis for analysing how businesses can integrate AI effectively, ensuring alignment between technology, organizational structures, and industry conditions. The frame of reference presented here will later be used in the analysis to assess how construction companies like the case company, Skanska Sweden, can structure their approach to AI use.

2.2.1 Change Management

Change management focuses on how organizations and individuals adapt to organizational transitions and change. One of the earliest models, proposed by Lewin in 1947 [12], includes the stages of unfreezing, moving, and refreezing. Since then, several other models have been introduced in the literature.
While these models vary, they all emphasize the need for a structured approach to managing change and recommend appointing a change agent to lead the process. However, applying traditional change models directly to the construction industry can be challenging due to its unique characteristics; the industry's specific nature requires adaptations to these frameworks for them to be effective [12].

Successfully adopting AI within complex, project-oriented organizations in the construction industry requires not only technical capability but also a structured approach to change management. Kotter's Eight-Stage Model for Leading Change offers an approach for this transition. His model emphasizes the importance of establishing a sense of urgency, creating a guiding coalition, and developing a vision for change, all of which are critical for aligning AI initiatives with organizational priorities and engaging employees in the transformation process. Importantly, Kotter also highlights the need for short-term wins and continuous reinforcement, both of which can help ensure that AI is seen as a credible, valuable tool rather than an abstract or disruptive innovation. This model underscores the idea that even well-designed AI tools will struggle to gain traction if the organizational environment is not ready to support them [13].

Technology Acceptance Model

A widely recognized framework for understanding how individuals adopt and use new technologies or services is the Technology Acceptance Model, first introduced by Davis in 1989 [14]. It is based on the premise that a user's decision to accept and use a technology is primarily influenced by two factors: perceived usefulness and perceived ease of use. Perceived usefulness refers to the degree to which an individual believes that using a particular technology will improve their performance in a specific task or job.
Perceived ease of use refers to the degree to which an individual believes that using a particular system will be free of effort [14].

AI Readiness

The AI Readiness Framework, developed by Holmström in 2022, evaluates an organization's ability to implement and use AI in a way that adds value to the organization. It is structured around four key dimensions: technologies, activities, boundaries, and goals. The framework also helps organizations address key bottlenecks in AI adoption. This framework serves as the basis for subsequent research on AI readiness, which is further elaborated in Section 2.2.2 [15].

2.2.2 AI Readiness Framework

Building on Holmström's study, Tehrani et al. examined the organizational AI readiness of 52 multinational corporations by conducting 52 semi-structured in-depth interviews with decision makers across different organizations. The findings identify eight key dimensions that influence an organization's ability to successfully implement AI. The study primarily focuses on, but is not strictly limited to, the following technologies: natural language processing, computer vision, image recognition, and deep learning. The research proposes one model for organizational readiness and one for AI adoption strategies [3].

Organizational readiness refers to how prepared an organization is, in terms of cognitive, emotional, and behavioural preparedness toward a change, before the activity is started. Organizational readiness should be acquired before starting a change initiative, and therefore consists of a pre-assessment of organizational capabilities, to help identify what is needed and to control the risk factors of a change initiative.

AI readiness refers to how prepared an organization is to adopt and use AI. Many organizations expect AI to increase their productivity; however, many struggle with adopting AI due to a lack of important infrastructure and organizational readiness.
Many managers are also unsure of whether their organization is ready to implement AI, and if so, how to do it. Due to AI's complex nature, its readiness cannot rely solely on traditional readiness theories [3].

The AI Readiness Framework is an organizational framework that can be divided into eight categories, as shown in Figure 2.4: (1) Environmental Readiness, (2) Technological Readiness, (3) Informational Readiness, (4) Infrastructural Readiness, (5) Data Readiness, (6) Participants' Readiness, (7) Customers' Readiness, and (8) Process Readiness.

Figure 2.4: Schematic model of AIR [3].

(1) Environmental Readiness

This dimension refers to the organizational, technical, competitive, cultural, and regulatory environment within which an organization operates. This includes the macro-economic environment, organizational culture, and leadership. The regulatory environment affects AI implementation, as some countries have more supportive or restrictive regulations in areas such as personal data usage, cloud computing, and AI-driven automation. These regulations determine whether companies can freely adopt AI or must navigate legal constraints. An open organizational culture fosters collaboration, tech-friendliness, and knowledge-sharing, making AI adoption easier. Leaders who convince employees of AI's benefits and address their concerns increase willingness to adopt AI. Ensuring alignment among employees and management further simplifies implementation. In summary, companies need a supportive regulatory environment, an open culture, and strong leadership to successfully implement AI.

(2) Technological Readiness

This dimension focuses on an organization's level of technological maturity, which includes the availability of necessary resources, a strong track record of using advanced technologies, and sufficient IT support, as successful adoption often relies on a well-established technological foundation.
Emphasis is placed on the organization's historical experience and culture of working with technology, where AI solutions such as chatbots or virtual assistants are not viewed as standalone fixes, but rather as advanced tools that build upon an already mature digital environment. Moreover, the ability to process and manage data is essential. Organizations that are actively adopting AI typically already have these technical competencies. Lastly, sufficient and reliable IT support is crucial to avoid technical bottlenecks that could otherwise hinder the smooth implementation and operation of AI technologies.

(3) Informational Readiness

Distinct from Data Readiness, this dimension refers to "people's meaningful understanding of a specific issue." In this context, it concerns the decision maker's knowledge of the relevant AI use case, how AI is applied within the industry, the specific problem at hand, and how AI can be used to address that problem. The focus lies specifically on the decision maker, rather than the broader organization or team. To make informed and strategic decisions, decision makers must have a deep knowledge of AI's practical applications and potential. Furthermore, the decision to implement AI should be strategically significant, with the potential to substantially impact operations and increase organizational profitability. Since AI adoption is cost-intensive, it is essential that the decision maker has a clear understanding of the problem to be solved, as well as an awareness of the AI solutions available on the market, in order to identify the most suitable option.

(4) Infrastructural Readiness

The availability and suitability of foundational resources are crucial for successful AI implementation. This dimension includes three main categories: human resources, financial resources, and IT resources. Financial resources are considered one of the most critical and challenging components, as AI implementation is costly. It is not only about having access to sufficient funds, but also about ensuring that funding is continuous and flexible, enabling quick adaptation to changing market conditions. AI requires ongoing investment for updates and maintenance, given the rapid pace of technological advancement. In terms of human resources, organizations must ensure access to both internal and external talent with the necessary skills to support AI initiatives. This readiness can be developed by training existing employees and by recruiting experts who already have relevant knowledge. A key part of infrastructural readiness is also the ability to bridge the HR gap, not only by hiring individuals with strong technical skills, but also by seeking those with cross-industry domain knowledge who can contextualize AI applications within the organization's specific field. IT resources include the organization's technical infrastructure, such as computers, networks, and programming environments. Important capabilities here involve storage capacity, computing power, scalability, and security, all of which are necessary for AI to function effectively. The absence of the right IT infrastructure can significantly hinder a company's ability to adopt and benefit from AI technologies.

(5) Data Readiness

This dimension refers to the availability of the large amounts of high-quality, relevant data required to feed and support AI. It is important to distinguish data from information, as previously explained: data can be seen as raw, often meaningless symbols, while information is the result of organizing and interpreting data to create meaning, something that humans use to solve problems or make decisions. In the context of AI, data serves as the input that enables algorithms to function and learn, whereas information is primarily used and created by humans.
The volume and quality of data are crucial for the performance of AI models.

(6) Participants' Readiness

This dimension concerns the psychological and behavioural preparedness of individuals within and around the organization to adopt and work with AI. For employees, this readiness includes three key aspects: acceptance, trust, and knowledge and skills. In many organizations, staff are not yet sufficiently familiar with AI, making knowledge and training critical factors for successful adoption. A common barrier is the lack of trust, as some employees fear that AI may replace their jobs. This can create resistance and act as a bottleneck in the implementation process. Therefore, participants must not only be capable, but also willing to embrace change and see AI as a supportive tool rather than a threat. Managerial readiness is equally important. For AI to create value, it must be aligned with the organization's strategic goals, both in the short and long term. Managers play a crucial role in shaping attitudes toward AI and setting the direction for its integration. Finally, readiness among external stakeholders and partners is also essential, although it is more difficult to influence. Partners must be willing to work with AI-based systems and adapt their processes accordingly. Traditional mindsets and reluctance to change among partners can slow down or even block AI adoption, highlighting the need for alignment beyond the organization's boundaries.

(7) Customers' Readiness

This dimension refers to how well organizations are prepared to address customers' needs, privacy concerns, and acceptance of AI technologies. It is essential that companies have clear plans in place for managing potential issues related to AI use, in order to minimize risks, particularly in industries handling customer transactional data. In such cases, organizations must communicate transparently with customers to build trust and avoid misunderstandings.
While customers may not require information about AI used internally, their acceptance becomes more critical when AI is used in customer-facing touchpoints or involves the use of customer data.

(8) Process Readiness

The last dimension includes three key components: operational integration, feedback mechanisms, and integrated communication. To fully leverage the value of AI, operational integration must be in place, meaning that different teams and functions within the organization collaborate effectively. This cross-functional integration is essential for spreading the benefits of AI across departments and ensuring consistent outcomes. A strong feedback mechanism is also critical. Continuous feedback allows AI systems to improve their performance and better adapt to the specific needs of teams or the organization as a whole. Since these needs may evolve over time, a constant feedback loop helps maintain the relevance and effectiveness of AI solutions. Finally, integrated communication plays a vital role in preventing system failures caused by miscommunication. Clear and consistent communication across the organization supports smoother implementation and operation of AI technologies.

2.2.3 AI Adoption Strategies

To successfully adopt AI, organizations must define a clear strategy, as the absence of one is a major barrier to implementation. McKinsey research underscores that lacking a defined AI strategy is among the most significant challenges faced by managers [16]. Ultimately, the value of AI lies not in the technology itself, but in how effectively it is integrated into organizational processes. Building on this, Tehrani et al. [3] identified AI adoption strategies that organizations can apply individually or in combination, depending on their context and readiness profile, as shown in Figure 2.5.
Each strategy aligns with specific AI readiness dimensions, meaning that the strength or weakness of certain factors can shape which approach is most appropriate to use. In addition to internal capabilities, strategy selection also depends on external factors: (1) whether the organization's main goal is cost efficiency or differentiation, and (2) the perceived level of risk in AI adoption. Cost-driven firms might prioritize partnerships, while differentiation-focused firms may prefer crawling or guinea pig approaches to test new innovations. In this study, four of the five strategies will be presented.

Figure 2.5: Adoption Strategies [3].

Adoption Strategies

• The Low-Hanging Fruit Strategy: This is a practical starting point for firms with strong data and informational readiness but a reluctance to take on significant risk. It involves identifying straightforward use cases, such as reporting or marketing automation, that can be implemented quickly and deliver early wins, helping build momentum for further adoption.

• The Crawling Strategy: This strategy focuses on gradual, iterative AI adoption. Organizations begin with smaller-scale pilots and expand based on lessons learned. This approach is best suited to organizations with limited financial flexibility but a strong willingness to experiment and learn, requiring strong process and participant readiness.

• The Guinea Pig Approach: This approach suits larger firms that learn indirectly by observing or partnering with smaller, more agile actors who experiment with AI. This enables risk transfer while still gaining insights. The approach is especially relevant when internal readiness is moderate, but there is willingness to innovate.

• The Partnership Strategy: Lastly, this strategy focuses on engaging external AI consultants or technology partners to compensate for limited internal capabilities.
These partnerships provide access to both infrastructure and expertise, while supporting shared capability development. Environmental readiness and trust in external consultants are essential enablers.

2.3 Prototype

The following section covers relevant theory connected to the implementation of the prototype, with the purpose of explaining, motivating, and demonstrating the use of different methods and models. As the foundation of the prototype is AI and machine learning, the framework naturally consists of AI in general and text-processing models in particular. Hence, overall theory is described first, followed by an introduction of the models and algorithms used.

2.3.1 Artificial Intelligence

Artificial Intelligence (AI) is a broad concept, spanning multiple fields such as machine learning (ML), neural networks (NNs), and deep learning (DL), see Figure 2.6. The umbrella term refers to the area of computer science in which machines and computers are built to perform complex tasks, mimicking human intelligence and the ability to reason, predict, and analyse. AI is not a system on its own; rather, it is implemented in machines or systems, unlocking their potential and sense of intelligence [17].

Figure 2.6: Overview of models implemented.

Machine Learning: ML is a subset of AI that enables systems to automatically learn from data and improve their performance without being explicitly programmed. Instead of following hardcoded rules, ML algorithms identify patterns and relationships in data to make predictions or decisions. ML models can be trained under various learning paradigms depending on the availability and quality of labeled data. While supervised and unsupervised learning remain foundational approaches, recent developments in weakly supervised learning have introduced scalable solutions for real-world data constraints:

Supervised Learning: Supervised learning involves training a model using fully
labelled data, where both the input and the corresponding output (or ground truth) are known. This approach enables models to learn direct mappings from inputs to outputs, and is widely used in classification and regression tasks. However, supervised learning relies heavily on high-quality, annotated data, a resource that is often costly and labour-intensive to produce at scale. In this paradigm, common techniques include decision trees, support vector machines, and regression models such as linear or logistic regression.

Unsupervised Learning: In contrast, unsupervised learning is performed without labelled data. The model is presented only with input features and must uncover patterns or structures autonomously. Common applications include clustering, where data points are grouped based on similarity, e.g., using K-Means Clustering (KMeans), and association analysis, where relationships between variables are discovered without human supervision. This makes unsupervised learning well-suited for exploratory analysis, anomaly detection, and image segmentation, among other tasks.

Semi-supervised/Weakly Supervised Learning: Weakly supervised learning refers to a spectrum of learning strategies that leverage noisy, incomplete, or imprecise labelling to train predictive models. This paradigm addresses the challenge of limited or imperfect supervision, a common scenario in real-world applications where manual labelling is expensive or impractical. Three primary forms of weak supervision are identified:

• Incomplete Supervision: Training on datasets where only a portion of the data is labelled. Techniques like semi-supervised learning, transfer learning, and active learning fall under this category.

• Inexact Supervision: Using coarse-grained labels, such as class-level annotations instead of instance-level, which may lack detailed specificity but still provide learning signals.
• Inaccurate Supervision: Occurs when labels contain errors or noise, often due to human mistakes or automatic labelling heuristics. Models must then learn to tolerate or correct for mislabelled examples [18].

Zero-shot and Few-shot Learning: Zero-shot learning (ZSL) is a supervised machine learning paradigm in which a model is expected to correctly classify data from previously unseen classes, that is, categories that were not represented in the labelled training data. Unlike traditional supervised learning, which relies on annotated examples for each class, zero-shot learning requires the model to generalize to entirely new concepts without direct exposure during training. This is particularly important in real-world scenarios where obtaining labelled examples is impractical, expensive, or impossible. Examples include rare disease detection, emerging topics in text classification, or recognizing obscure object categories in vision tasks. For instance, while humans can distinguish tens of thousands of object categories, it is infeasible to provide labelled data for every possible class a model might encounter. To achieve generalization without labelled examples, ZSL methods typically rely on auxiliary information, such as textual descriptions of the target classes and semantic embeddings or class attributes [19].

Transfer Learning: Transfer learning, or domain adaptation, has the purpose of reusing an already trained model for a new task, instead of training a model from scratch. Common in ZSL, transfer learning is often used in methods that focus on semantic embeddings. An example would be to use Bidirectional Encoder Representations from Transformers (BERT), which is pre-trained on language data, to convert newly seen words into vector embeddings. With transfer learning, it is also possible, by recognizing one type of text or image, to simultaneously identify other, unseen ones [?].
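The embedding-based zero-shot idea described above can be sketched in a few lines: documents and class descriptions are mapped into a shared vector space, and each document is assigned to the class whose auxiliary description it is most similar to. The sketch below is a minimal, dependency-free illustration in which simple word-count vectors stand in for real embeddings; an actual pipeline would use a pre-trained sentence encoder such as SBERT, and the example texts and class names here are invented.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector.
    A real ZSL pipeline would use a pre-trained sentence encoder here."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def zero_shot_classify(doc: str, class_descriptions: dict) -> str:
    """Assign the class whose auxiliary description is most similar to the document."""
    doc_vec = embed(doc)
    return max(class_descriptions,
               key=lambda c: cosine(doc_vec, embed(class_descriptions[c])))

# Hypothetical classes described by short texts (the auxiliary information).
classes = {
    "water damage": "water leakage moisture damp pipe",
    "crack": "crack fracture concrete wall surface",
}

print(zero_shot_classify("moisture found near a leaking pipe", classes))  # water damage
```

Note that no labelled examples of either class are used; the only supervision comes from the class descriptions, which is exactly what makes the approach zero-shot.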
Neural Nets and Deep Learning: NNs are a type of ML algorithm inspired by the structure of the human brain, composed of layers of interconnected nodes (neurons). These networks are particularly effective at learning complex, non-linear relationships in data. DL is a specialized form of neural networks with many hidden layers. It excels in handling unstructured data such as text, audio, and images. DL is the foundation of many recent breakthroughs in AI, including speech recognition, image generation, and language models like BERT and the Generative Pre-trained Transformer (GPT) [20].

Embeddings as a Foundation for Language Understanding

Embeddings are a method of representing objects, such as words, sentences, images, or audio, as vectors in a continuous numerical space. These representations are designed so that semantically similar inputs are located close together in this space. In the context of NLP, embeddings allow machine learning models to capture contextual and semantic relationships between words and texts.

Unlike manual feature engineering, embeddings are learned directly from data using neural networks or other algorithms. This allows the model to identify complex patterns in language that are not easily defined by rules or discrete categories. Embeddings enable tasks such as text classification, clustering, and semantic search by converting raw text into a format that machine learning models can process effectively [21].

2.3.2 Language Models

Language models (LMs) are probabilistic models that assign likelihoods to sequences of words. Rather than assessing grammatical correctness, they measure how "natural" a word sequence is based on patterns learned from real-world text. This capability enables a wide range of natural language processing tasks, such as part-of-speech tagging, lemmatization, summarization, translation, and question answering. There are two main categories of language models:

Neural Network-based Models
Neural network-based models use word embeddings to represent words as vectors and capture semantic meaning. Early models like Word2Vec improved representation but struggled with deeper context. Recurrent neural networks (RNNs) introduced the ability to handle sequential data by maintaining memory over time, allowing them to process inputs like text and speech. However, RNNs are limited by their strictly sequential nature, which slows down training on longer sequences and makes parallelization difficult. Transformers addressed these limitations by enabling parallel processing and introducing attention mechanisms that weigh the importance of each word relative to others in a sentence. This architecture powers state-of-the-art models like BERT and GPT, which are pre-trained on large-scale text corpora and can perform a wide range of tasks by leveraging deep contextual understanding and large parameter capacities [22].

Natural Language Processing

NLP refers to the field of artificial intelligence focused on enabling machines to understand, interpret, and generate human language. Since its origins in the 1950s, NLP has evolved into a suite of algorithms and tools that support tasks such as part-of-speech (POS) tagging, sentiment analysis, named entity recognition, parsing, and machine translation. NLP powers a wide range of applications including speech recognition, text classification, customer service chatbots, and content recommendation systems.

At its core, NLP applies structured linguistic rules and statistical techniques to textual data. While effective in specific tasks, traditional NLP models often require domain-specific tuning and can struggle with language ambiguity, contextual subtleties, and low-resource languages.

Large Language Models

Large language models, such as OpenAI's GPT and Google's BERT, represent a major advancement in language understanding and generation.
Built on transformer architecture and trained on massive text corpora using deep learning, LLMs can generate coherent, contextually relevant text, answer questions, summarize content, and more, often with little or no task-specific fine-tuning. LLMs differ from traditional NLP systems in scale and generality. They are not limited to specific rule-based tasks but can perform a wide range of language tasks through learned contextual understanding. Core technologies behind LLMs include self-attention mechanisms, DNNs, and massive parallel training, which allow them to adapt flexibly to diverse domains [23].

2.3.3 Models in Selection

The AI prototype is built on algorithms and pre-trained models, which are covered in this section. An overview can be seen in Table 2.1.

Component | Type | Function
MEGClass | Weakly Supervised Classifier | Combines clustering, embeddings, and pseudo-labeling
BERT | Monolingual LLM | Contextual language understanding in English
XLM-R | Multilingual LLM | Cross-lingual language understanding
Sentence-BERT | Sentence Embedding Model | Generate semantically meaningful sentence vectors
MiniLM | Lightweight Multilingual Model | Efficient multilingual sentence encoding
KB Swedish SBERT | Swedish Sentence Embedding Model | Embed Swedish sentences for similarity tasks
spaCy | NLP Processing Toolkit | Tokenization, lemmatization, POS-tagging
KMeans Clustering | Clustering Algorithm | Group similar sentence embeddings

Table 2.1: Overview of models, tools, and algorithms used in the NLP classification pipeline.

MEGClass: MEGClass (Mutually Enhancing Granularities for Text Classification) is a state-of-the-art method for extremely weakly supervised text classification, requiring only the surface names of target classes to operate without any labeled data.
The model was designed to overcome limitations in prior approaches that typically treat word-, sentence-, and document-level information independently, which can lead to incorrect pseudo-labels when topic cues are ambiguous or inconsistent across granularity levels. MEGClass introduces a multi-granular approach, where word, sentence, and document representations are allowed to mutually enhance each other. This results in a more robust and context-aware estimation of class labels, even in challenging real-world texts. The core innovations of MEGClass include:

• Class-Oriented Sentence Representations: The model computes class-indicative sentence embeddings by aligning sentences with class name representations, emphasizing discriminative terms.
• Class Distribution Estimation: Rather than assigning each document a single label, MEGClass estimates a class probability distribution, allowing it to gauge classification confidence and reduce mislabelling.
• Contextualized Document Representations: Through a multi-head self-attention network, the model creates enriched document vectors that capture hierarchical context from sentence-level signals.
• Iterative Feedback Mechanism: Confidently classified documents are used to refine class representations iteratively. This improves class alignment across documents and reduces error propagation.
• Pseudo-Labelling and Classifier Fine-Tuning: A subset of the most confidently classified documents is used as pseudo-labelled data to fine-tune a downstream classifier, making the model applicable to unseen examples.

MEGClass has demonstrated superior performance across several benchmark datasets, particularly in scenarios with long documents and fine-grained classes.
Its effectiveness lies in its ability to leverage minimal supervision while still generating high-quality pseudo-training sets, outperforming earlier methods like Learning with Out-of-the-box Classifier for Text Classification (LOTClass) and Explainable Classifier for Weakly Supervised Text Classification (X-Class) [24].

K-Means Clustering: KMeans clustering is one of the most widely used unsupervised learning algorithms in machine learning. It operates on unlabeled data, partitioning it into k distinct, non-overlapping clusters, where each data point is assigned exclusively to the cluster with the nearest centroid. The method assumes no prior knowledge about the data labels and aims to uncover inherent structure based on similarity. The core idea behind KMeans is to minimize the intra-cluster variance, or more precisely, the sum of squared Euclidean distances between each data point and its assigned cluster centroid [25].

BERT and Sentence-BERT: One of the most significant advancements in NLP is the introduction of BERT, developed by Devlin et al. (2018). BERT is a transformer-based model pre-trained on large text corpora, designed to understand the context of words in a bidirectional manner. It has achieved strong results in a wide range of NLP tasks, such as question answering, sentence classification, and semantic textual similarity (STS). However, BERT is inherently designed as a cross-encoder, meaning that for sentence-pair tasks both sentences are jointly input into the model. While this joint attention mechanism increases performance for tasks requiring fine-grained comparisons, it introduces a significant computational bottleneck. For example, comparing 10,000 sentences to each other using BERT requires approximately 50 million inference computations, which can take over 65 hours on a high-performance GPU. This makes BERT unsuitable for large-scale tasks such as clustering, semantic search, or retrieval [26].
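To make this bottleneck concrete, the sketch below (illustrative only; the 4-dimensional vectors are hypothetical stand-ins for real sentence embeddings) computes the all-pairs inference count behind the ~50 million figure, and contrasts it with the bi-encoder alternative, where each sentence is encoded once and pairs are compared with inexpensive cosine similarity:

```python
import numpy as np

def cross_encoder_inferences(n: int) -> int:
    # A cross-encoder must run the full model once per sentence pair.
    return n * (n - 1) // 2

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Bi-encoder comparison: plain vector arithmetic on precomputed embeddings.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# 10,000 sentences -> roughly 50 million joint inferences, as noted above.
print(cross_encoder_inferences(10_000))  # 49995000

# Toy embeddings (hypothetical stand-ins for real sentence vectors).
water_leak_1 = np.array([0.9, 0.1, 0.0, 0.2])
water_leak_2 = np.array([0.8, 0.2, 0.1, 0.3])
paint_damage = np.array([0.0, 0.9, 0.8, 0.1])

# Semantically similar toy sentences score higher than dissimilar ones.
print(cosine_similarity(water_leak_1, water_leak_2) >
      cosine_similarity(water_leak_1, paint_damage))  # True
```

With precomputed embeddings, comparing all pairs costs only fast vector operations instead of one full transformer forward pass per pair, which is what makes clustering and semantic search tractable at scale.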
To address this limitation, Sentence-BERT (SBERT) was proposed by Reimers and Gurevych (2019). SBERT modifies the BERT architecture by applying a siamese or triplet network structure, allowing it to generate fixed-size sentence embeddings. Instead of comparing sentences within the model during inference, SBERT maps each sentence to a vector space such that semantically similar sentences are close together. These embeddings can then be efficiently compared using standard similarity metrics such as cosine similarity.

The key innovation lies in the training objective. SBERT is fine-tuned on Natural Language Inference (NLI) datasets, which teach the model to distinguish between similar, contradictory, and neutral sentence pairs. During inference, the model independently encodes each sentence, which enables tasks like clustering, semantic search, and zero-shot classification to be performed orders of magnitude faster than with BERT, while still maintaining high accuracy. Moreover, SBERT introduces a pooling strategy (typically mean pooling) over the output token embeddings to produce the final sentence vector. This differs from BERT's default use of the [CLS] token, which has been shown to produce suboptimal results for sentence-level representations [27].

Two SBERT models are used in the prototype to generate vector embeddings of Swedish texts. KB Swedish Sentence-BERT is a bilingual Swedish-English sentence embedding model developed by the National Library of Sweden (KBLab). It uses KB-BERT, a Swedish BERT model, as base encoder and all-mpnet-base-v2 as an English teacher model [28]. The other model, paraphrase-multilingual-MiniLM-L12-v2, is a lightweight multilingual SBERT model published by the Sentence-Transformers team. It is trained on paraphrase data from multiple languages, including Swedish, and provides sentence embeddings of 384 dimensions.
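Because SBERT produces one fixed-size vector per sentence, clustering becomes straightforward. The sketch below is a minimal plain-numpy version of the KMeans procedure described earlier (Lloyd's algorithm); the 2-dimensional points are hypothetical stand-ins for real sentence embeddings:

```python
import numpy as np

def kmeans(points: np.ndarray, k: int, n_iter: int = 50, seed: int = 0):
    """Lloyd's algorithm: minimizes the sum of squared Euclidean
    distances between each point and its assigned cluster centroid."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to the nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return labels, centroids

# Two well-separated toy "embedding" groups (hypothetical data).
group_a = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2]])
group_b = np.array([[5.0, 5.1], [5.2, 4.9], [4.9, 5.0]])
labels, _ = kmeans(np.vstack([group_a, group_b]), k=2)

# Points within the same group end up in the same cluster.
print(len(set(labels[:3])) == 1 and len(set(labels[3:])) == 1)  # True
```

In the prototype's setting, the inputs would instead be SBERT embeddings of deviation texts, and k would be chosen to match the desired number of candidate categories.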
Due to its small size and high speed, it is especially suited for resource-efficient applications [29].

XLM-R: Cross-lingual Language Model - RoBERTa (XLM-R) is a multilingual transformer model developed by Meta to improve cross-lingual natural language understanding. The model is trained using self-supervised learning techniques and addresses the challenge of transferring knowledge between languages without requiring additional task-specific data in the target language. It builds upon earlier models like XLM and multilingual BERT but overcomes key limitations by incorporating a substantially larger and more diverse training dataset, over two terabytes of filtered data, covering a broader range of languages, including many low-resource ones that previously lacked large-scale labelled or unlabelled corpora [30].

spaCy: spaCy is an open-source Python library providing fast and efficient NLP, offering features such as tokenization and POS tagging. Unlike BERT, spaCy predates transformer-based models and, though operating at higher speed, is less capable of capturing contextual information [31].

2.3.4 Data Criteria

In AI development, especially in natural language processing and weakly supervised learning, the quality of input data is critical. Several criteria are used to evaluate whether data is suitable for training in machine learning contexts. These criteria ensure that models are able to learn effectively, avoid systematic errors, and generalize to new inputs. The following criteria were used to evaluate the data in terms of its structure.

Completeness: Data completeness refers to whether a dataset includes all the information necessary to address its intended purpose. It evaluates whether the data sufficiently covers the full scope of a given question and whether there are any gaps or biases that could distort results.
Incomplete datasets can cause inaccurate analyses, including misreported metrics and biased decisions, which can undermine an organization's confidence in data-driven insights [32].

Data Volume: Data volume refers to the amount of information available to train AI models, and it is especially critical in natural language processing and weakly supervised learning. Large datasets enable models to detect complex patterns and structures. AI training requires large volumes of both structured and unstructured data to meet increasing model complexity. For weakly supervised approaches, unlabelled data at scale is essential to compensate for the lack of explicit annotations. Without sufficient volume, models risk underperformance and poor adaptability to real-world tasks [33].

Relevance: Relevance refers to the degree to which data contributes directly to the specific objective or task the model is designed to solve. Relevant data provides meaningful context that enhances the model's ability to identify important patterns, make accurate predictions, and avoid distraction from noise or unrelated variables. When irrelevant information is included, it can dilute learning signals and reduce overall model performance [34].

Consistency: Consistency ensures that data maintains a uniform format, structure, and meaning across sources and time. For AI, this is essential to enable accurate parsing and interpretation. Inconsistent data, such as varying labels or formats, can confuse models, lead to errors, and reduce the reliability of outputs [35].

Noise: Data noise refers to irrelevant or incorrect data that can confuse AI models and reduce their accuracy. It includes errors, outliers, and unnecessary information that do not help the model's task. Effective data validation techniques are needed to detect and remove noise, ensuring cleaner data and better AI performance [36].
Bias: Bias in AI occurs when models produce unfair results due to prejudices in training data or design choices. Biased data, such as underrepresentation of certain groups, can cause inaccuracies and unreliable predictions. To reduce bias, it is essential to use diverse data, transparent algorithms, and ongoing evaluation to promote fairness and equity [37].

Representativeness: Representativeness in datasets for AI models refers to how comprehensively the data captures the diversity and variability present in real-world scenarios. A representative dataset should reflect the full spectrum of possible inputs, covering varied contexts and distributions, to ensure robust and reliable generalization. Without representativeness, models risk developing biases or overfitting, resulting in poor performance when faced with new or diverse inputs. Hence, carefully ensuring representativeness is crucial to enhancing the accuracy, fairness, and effectiveness of AI systems [38].

3 Methods

This section outlines the research process, design, and methods used in the study to ensure transparency, repeatability, and a clear understanding. The chosen method is designed to systematically address the research questions and support the development of a solution-driven prototype.

In addition, this section presents the overarching research framework, the data collection strategies employed (including interviews, workshops, and document analysis), and the techniques used for data interpretation. A structured approach was adopted to ensure that the findings are robust, relevant, and reproducible.

Finally, the study includes a discussion of research quality considerations, including validity, reliability, and ethical aspects such as data privacy, informed consent, and responsible data handling. These aspects were carefully managed to ensure the integrity and credibility of the research.
3.1 Research Process

The process began by identifying and defining the overall research focus. This led to the formulation of RQ 1: "What is Skanska's current state of AI readiness, and what organisational strengths and challenges exist in terms of AI adoption?". Through an open-ended investigation, a specific area within the company was identified where AI could create tangible value, forming the basis of RQ 2: "How can an AI-prototype create value within the department of Quality Management?". Building on this, RQ 3: "How can a language model prototype be developed and used to efficiently extract and visualize insights from quality-related construction texts?" was formulated accordingly, and is discussed in detail in Section 4.

These activities were carried out in parallel: organizational insights were qualitatively collected and analysed using the AI Readiness Framework, while the prototype was iteratively developed to meet domain-specific needs. Given the complexity of AI adoption in the construction sector, an exploratory approach was essential to uncover relevant challenges, opportunities, and industry-specific conditions. The timeline and phases of the study are presented in Figure 3.1.

Figure 3.1: Gantt chart illustrating the research process timeline.

3.2 Research Design

The study initially adopts an inductive reasoning approach, where broader conclusions are derived from specific observations rather than tested against a predefined hypothesis. This method is particularly valuable in exploratory research, as it allows for the development of new insights and conceptual frameworks grounded in empirical data. At the same time, the study incorporates existing theoretical models, like the AI Readiness Framework, to guide the interpretation of qualitative findings from interviews and workshops.
Incorporating established theory adds a deductive dimension to the study, as parts of the data collection are guided by predefined theoretical constructs. Overall, the research follows an abductive reasoning process, characterized by iterative movement between empirical data and theoretical insights. This approach allows for ongoing refinement of both theoretical understanding and practical implementation, which makes it particularly suitable for studying complex issues like AI adoption in the construction industry [39].

3.3 Qualitative Methods Used

The study began with an exploratory phase that included an initial stakeholder workshop, followed by several semi-structured qualitative interviews. These qualitative insights helped to refine the research focus and informed both the structured development of the AI prototype and the analysis of AI readiness. The frame of reference was developed alongside the interviews, prototype, and workshops. It had two main goals: to support the study's methodology and to provide a theoretical foundation for the analysis and discussion.

Finally, additional workshops and structured interviews were conducted to gather feedback, revisit insights from the initial session, and triangulate findings.

3.3.1 Workshops

Two separate qualitative workshops were conducted to brainstorm ideas, gather feedback on the process, and collect information on AI in the organization.

1. Idea Generating Workshop

The initial workshop was inspired by the AIM workshop methodology, first introduced by Shoji Shiba in 1987, which is an approach to identifying and analyzing complicated or complex problems in a collaborative and structured way. The process typically begins with clarifying shared aspirations or challenges, followed by creative brainstorming to generate potential solutions.
This methodology is particularly valuable in exploratory projects where diverse perspectives are essential for defining needs, generating solutions, and building stakeholder commitment [40].

The workshop was conducted with five stakeholders from Skanska Hus Gothenburg, working in different stages of the construction process (byggprocessen). It followed an exploratory focus group approach, where the participants engaged in guided discussions to explore attitudes, ideas, and expectations related to AI in construction. Focus groups are useful for capturing diverse perspectives and encouraging interaction, allowing participants to reflect, react, and build on each other's insights [39]. This dynamic helped generate practical input for both the prototype development and the AI readiness analysis.

The purpose was to gain a deeper understanding of the employees' workflows, challenges, and needs, to guide the final identification of an AI use case. The first part of the workshop was structured around open-ended questions to identify inefficiencies, time-consuming tasks, and key pain points in the participants' daily work. The second part focused on data usage, examining the types of data used, stakeholders' data literacy, and their perspectives on data-driven decision-making. Participants were then given space to freely generate ideas and reflect on the overall purpose of the study. To ensure unbiased insights, no predefined concepts or solutions involving AI were introduced at this stage, allowing participants to express their views without any bias towards AI use. Only at the end of the workshop was the concept of AI introduced, to collect further ideas about use cases. AI was deliberately mentioned late so that participants would focus on problems to solve rather than on solutions.
Insights from the first workshop were further used to guide the interviews, guide the prototype development, and shape the frame of reference.

2. Feedback Workshop

A second workshop was conducted with the purpose of gathering feedback on the prototype, ensuring it aligned with stakeholder needs and addressed previously identified challenges. Participants were introduced to the prototype findings and encouraged to reflect on its usability, functionality, and integration into their existing workflows. The perceived value of the prototype was also assessed, with discussions centered around how the prototype could contribute to value creation within the organization and support ongoing quality management efforts.

As part of the workshop preparation, participants received a pre-workshop task which involved reviewing a file containing 10–20 clusters of deviations. These clusters had been grouped based on titles and descriptions. Participants were asked to reflect on potential patterns across the clusters, name each category, and identify the most cost-driving deviations based on their professional experience. They were also encouraged to think freely about alternative ways to categorize deviations, for instance, in terms of risk reduction, cost savings, or recurrence prevention. In addition, participants were asked to consider which descriptive elements (e.g., actions, damage types, locations) were most relevant for analysis. This preparatory activity served to engage participants ahead of the workshop and ensured that the feedback session was rooted in both practical experience and reflective input, ultimately enhancing the relevance and depth of the discussions.

The workshop insights were then prioritized to identify the most impactful modifications, forming the basis for final prototype adjustments. This iterative approach strengthened stakeholder engagement and ensured that the solution was validated before further development.
The final prototype was built to contribute value by addressing these identified priorities.

3.3.2 Interviews

The study applied a qualitative interview approach, where the initial interviews in the explorative phase were positioned between semi-structured and unstructured formats. An initial interview guide was used to loosely cover two main areas: (1) the organization's perceived value of AI and its readiness, and (2) the feasibility of developing a prototype in their context. However, the guide served more as a flexible support than a strict script. Interviewees were encouraged to speak freely, and the interviewer followed up on relevant points as they emerged. This conversational and open style allowed participants to highlight what they viewed as most important.

Following the first workshop, two participants were interviewed, and they recommended additional interviewees through snowball sampling, a method in which initial participants assist in identifying and recruiting additional relevant interview subjects [39]. This approach made it possible to identify additional relevant stakeholders within the Skanska organization, beyond the Skanska Hus Gothenburg department, and helped shape both the prototype and the analysis of AI readiness. In total, 13 individuals in the Swedish organization were interviewed during this phase. Respondents and their roles can be found in Appendix A. Some of them were interviewed more than once to gather further information. The interviews were conducted primarily in person; if the interviewee was not available on site, the interview was held digitally via Microsoft Teams. Interviews were manually transcribed to capture immediate insights and nuances. In some cases, transcription was supported by AI software, in which case permission was obtained from the interviewee.
The first phase of interviews was thus more exploratory, defining the scope of the study, highlighting contextual nuances, and identifying patterns related to AI readiness. These interviews also informed the design and development of the AI prototype, ensuring it addressed stakeholder needs and priorities.

In addition, a second round of six semi-structured interviews was conducted to validate and triangulate the initial interview and workshop findings. Triangulation involves using multiple data sources, methods, or researchers to increase the trustworthiness and credibility of research findings [39]. These interviews consisted of two professionals from the US organization at the company's New York office, one individual from the UK office, and three additional interviews with Swedish employees, two of whom had also participated in the initial round of interviews. While the study primarily focuses on the Swedish context, the international perspective, though beyond the core scope, contributes to triangulating the findings. The purpose of the international interviews was to understand how AI is applied in practice within an international context at Skanska, and to compare our own AI model against their workflows.

These additional interviews followed a structured interview guide, which can be found in Appendix B. The approach was grounded in the eight AI Readiness dimensions proposed by Tehrani et al. [3]. This strengthened the empirical foundation for answering Research Question 1 (RQ1) and enabled a critical assessment of the prototype's alignment with strategic objectives.

3.3.3 Literature Search

To construct the frame of reference, the scientific database Scopus was used to identify peer-reviewed, high-quality literature relevant to the research topic. Systematic and well-documented literature searches are essential for ensuring transparency, reliability, and replicability in academic research [39].
Therefore, a structured approach was applied when selecting keywords, which were derived from key concepts in the research questions and refined iteratively during the process. Common Boolean operators (e.g., AND, OR) were used to combine terms and narrow down the results.

To assess the relevance and quality of the sources, search results were filtered based on citation count and publication outlet, with a focus on highly cited articles published in reputable journals. In areas related to artificial intelligence, which is a fast-evolving field, particular emphasis was placed on recently published articles to ensure that the theoretical foundation reflects the latest developments and the current state of the field.

Overall, the literature search covered the construction industry and relevant business frameworks. The frame of reference also covers relevant aspects of AI, including specific models, to later reinforce their applicability in the study's context. This analysis of existing research, also known as secondary studies, refers to examining and interpreting data or findings originally collected by others [39].

3.3.4 Qualitative Data Analysis

The interview data were analyzed thematically, using the AI Readiness framework as a guiding structure. Thematic coding is a qualitative analysis method used to identify and organize patterns or themes within interview data [39]. Each transcript was reviewed, and segments were divided into themes according to the eight dimensions of AI readiness. Within each of these eight initial themes, approximately five key codes were identified, each capturing an essential aspect of the theme.
Following this first categorization, a second layer of analysis was performed by connecting these codes to four overarching second-layer themes: (1) Organizational Readiness Gaps and Ownership Challenges, (2) Employee Trust and Safety, (3) Technical and Data Foundation Gaps, and (4) Early Signs of Adoption and Competitive Opportunity. Finally, these second-layer themes lay the basis for the discussion chapter and capture frequently mentioned insights from the interviewees. Altogether, this analysis forms the basis for answering RQ1 and supports the formulation of managerial implications for Skanska.

3.4 Defining the Research Area

Following a series of workshops, interviews, and an initial exploration of both the construction industry and Skanska Hus's operational practices, potential areas of interest for AI application were identified. These areas were evaluated based on technical feasibility, data availability, and potential business value.

3.4.1 Quality Management in the Construction Process as the Chosen Area

The area, Quality Management in the Construction Process, was selected following a series of workshops and careful evaluation (partly in accordance with Section 2.3.4). It was identified as a domain characterized by a continuous and centralized inflow of data, rich in metadata and largely untapped textual content. This combination offered strong technical feasibility for NLP and high business value through the potential to extract deeper insights and conclusions from the data.

Figure 3.2: Overview of the current reporting structure, representing each reported issue and focusing on the construction process phases, e.g. inspection and production.

Employees across all stages of the construction process interact with Skanska's ACC system, where they log deviations (avvikelser). A deviation refers to any activity or outcome that fails to meet specified requirements, thereby affecting work quality,
the final product, or the surrounding environment. Managing these deviations is essential for organizational learning, continuous improvement, and compliance with both contractual obligations and external standards. Notably, deviations are most commonly reported during the production and aftermarket phases, rather than during inspection or product development. Historically, such issues were documented in the BIM360 system, whose historical data remains accessible. Today, deviations are registered according to a construction process hierarchy (see Figure 3.2). There is a recognized need for improved oversight and timely handling of deviations to prevent costly consequences in later project phases.

Figure 3.3: Overview of the desired reporting structure, representing each reported issue and focusing on the category and nature of the issue, e.g. construction part and action.

The ACC/BIM360 platforms collect and store quality-related deviations and issues using predefined categories and root cause classifications tailored to the different stages of the building process. This system design helps prevent users from improvising their own taxonomies and reinforces the understanding that documentation serves a broader analytical purpose. Users are also prompted to provide a title and a description summarizing the issue. As of May 2024, Skanska introduced a new set of predefined categories within ACC, enabling better tracking, sorting, and analysis of deviations. This update enhances consistency in documentation and supports systematic analysis to detect patterns, facilitate learning, and improve quality performance over time.

Given the volume and structure of the existing data, there is a significant opportunity to improve data utilization in order to support the Quality Department's proactive quality management. This study therefore aimed to demonstrate how AI, specifically through NLP, can extract insights and enhance the categorization of issues.
By restructuring issue reporting to emphasize what construction part was affected or what type of incident occurred, the organization can foster more proactive strategies and reduce both the time and cost associated with recurring deviations. This revised reporting logic is illustrated in Figure 3.3 and showcased in Section 4.

3.4.2 Other Areas of Interest

During the exploratory phase, multiple areas of possible interest were introduced. These, however, were not deemed suitable as the focus area, owing to their strategic and integrated business complexity as well as the data criteria found in Section 2.3.4. The following sections present the areas that were considered.

Risk Assessment in the Cost Estimation of Construction Projects

The second area of interest focuses on cost calculations, including an added risk factor, which must be made for each project request. This risk assessment is often based on underlying data and supplier prices, but primarily on experience-based knowledge and team discussions. Since each project is unique, there is no exact method for this process; it relies heavily on experience-based knowledge. Cost estimates vary from person to person and across different regions. Ultimately, many projects end up costing significantly less than initially estimated, meaning that costs are often overestimated due to higher-than-necessary risk additions. While there is a large amount of available data, there is no structured approach to conducting an analysis. There is a need for a tool to support this process, but it remains a complex area, as it involves many individuals and their experience-based assessments. Therefore, this area was not considered further.

The Process of Reviewing and Verifying Accuracy in Project Planning

The third area of interest relates to the validation process throughout the construction value chain, from initial architectural designs, through technical planning and engineering, to the execution phase on site.
A recurring issue identified is the lack of reliable, complete, and up-to-date project information being transferred between actors. This leads to time-consuming double-checking activities, such as manual cross-verification of drawings, bills of quantities, and technical specifications, to ensure no critical information has been omitted or misinterpreted. These inefficiencies stem largely from communication breakdowns and fragmented project documentation practices, which are common challenges in construction project management. Although addressing this area would offer significant value, it was deemed too complex for the current study due to the heterogeneous nature of the data (including both text-based and image-based documents) and the difficulty of standardizing such unstructured information streams.

Samläsning

The fourth area, Samläsning, is loosely linked to the previous area; it refers to the process of harmonizing terminologies and ensuring consistent communication across project participants. Differences in how design elements, construction materials, or processes are labelled and described create misunderstandings that complicate project execution and quality assurance efforts. This semantic misalignment often leads to errors, rework, and delays. Although this area is highly relevant and ties into broader issues of digitalization, it was ultimately excluded from this study. The decision was based on the complexity of the communication challenges involved and the broad organizational changes required to address them systematically.

3.5 Ensuring High-Quality Research

To maintain transparency and reflect on potential limitations, the following methodological considerations were acknowledged.

• Positive bias in participants: Most interviewees are highly open to change, innovation, and AI, which may skew results toward opt