Are we agile? Evaluating how a short survey can measure agile transformation
A field study on the survey and agile maturity model used by Volvo Cars during their transformation

Master’s thesis in Computer science and engineering

Johannes Gustavsson
Pontus Lindblom

Department of Computer Science and Engineering
CHALMERS UNIVERSITY OF TECHNOLOGY
UNIVERSITY OF GOTHENBURG
Gothenburg, Sweden 2022

© Johannes Gustavsson, Pontus Lindblom, 2022.

Supervisor: Lucas Gren, Department of Computer Science and Engineering
Advisor: Anna Sandberg, Volvo Cars
Examiner: Regina Hebig, Department of Computer Science and Engineering

Master’s Thesis 2022
Department of Computer Science and Engineering
Chalmers University of Technology and University of Gothenburg
SE-412 96 Gothenburg
Telephone +46 31 772 1000

Cover: A one-question survey.
Typeset in LaTeX
Gothenburg, Sweden 2022

Abstract
With more companies moving towards agile, there is a need for ways to measure the progress. There are several agile maturity models for this purpose, but they come with a large set of indicators to measure.
Volvo Cars created a model with only ten questions in their shift towards agile ways of working, creating an alternative, time-efficient measuring method. The purpose of this study is to evaluate to what degree one can use the 10-question survey to measure a large automotive company’s agile transformation. To investigate how much of agile the model covers, a literature review was conducted to collect different agile definitions, values, and principles, which were then mapped onto the questions. The model was validated with exploratory factor analysis and by plotting means with confidence intervals. Lastly, a thematic analysis uncovered the participants’ opinions on the model.

We found that the survey does not cover all aspects of agile. Instead, it is composed of a mix of agile aspects and important success factors for agile transformations. The statistical analysis did not support the proposed grouping of the questions in the model since there were too few measured variables. What we could see was that the participants matured according to the proposed staircase. Lastly, we discovered that 8.4% responded with their opinion about the survey. Of those, 13.5% were positive, 38.2% were neutral, and 47.3% were negative. People generally had many opinions about which questions to include or exclude in the survey.

In conclusion, the scope of what can be covered with ten questions is limited. All questions except two can be connected to important aspects of agile or agile transformation, but they did not map to all aspects included in this thesis. Since not all aspects of agile and the transformation can be covered, adapting such a survey and model to the targeted organization becomes essential. We see that grouping the questions when they are so few might hide important information. Not all aspects the employees wished for could be included with only ten questions.
There also seemed to be a dissonance between the type of questions included in the survey and what the employees expected, even though the survey’s purpose was communicated when it was sent out.

Keywords: Agile, Large-Scale Agile, Agile Transformation, Survey, Automotive, Agile Maturity Model, Scaled Agile Framework (SAFe).

Acknowledgements
First and foremost, we want to thank our academic supervisor Lucas Gren for his tremendous efforts in helping us during the whole process, from proposal to finished thesis. We would also like to thank Volvo Cars and our industrial advisor Anna Sandberg for giving us access to their survey data so that we could conduct our study, and for their support in our endeavor. Lastly, we would like to thank our examiner Regina Hebig for providing feedback and support at the different milestones.

Johannes Gustavsson and Pontus Lindblom, Gothenburg, June 2022

Contents
List of Figures
List of Tables
1 Introduction
  1.1 Statement of the problem
  1.2 Purpose and research questions
  1.3 Delimitations
2 Theory
  2.1 Waterfall
  2.2 V-model
  2.3 Agile
    2.3.1 Large-scale agile
    2.3.2 SAFe - Scaled Agile Framework
  2.4 Agile transformation
  2.5 Measuring agile transformation
3 Method
  3.1 Research strategy
  3.2 Data collection
    3.2.1 Literature review
    3.2.2 Survey
    3.2.3 Documents
  3.3 Data analysis
    3.3.1 RQ1: Linking the survey questions to agile
    3.3.2 RQ2: Model validation
    3.3.3 RQ3: Free-text analysis
  3.4 Research ethics
4 Results
  4.1 Results of the data collection
  4.2 Analysis
    4.2.1 RQ1: What parts of agility are covered in the 10-question survey?
    4.2.2 RQ2: Are the question pairs reflecting the maturity steps statistically?
    4.2.3 RQ3: What do the participants think of the 10-question agility survey?
5 Discussion
  5.1 Research questions
    5.1.1 RQ1: Challenges with covering agile with a 10-question survey
    5.1.2 RQ2: Challenges with statistically validating so few questions
    5.1.3 RQ3: Discussing the results from the free-text answers
  5.2 Contributions & significance of the study
  5.3 Threats to validity
    5.3.1 Construct validity
    5.3.2 External validity
6 Conclusion
  6.1 Future work
Bibliography
A Normality plots
B Structure matrices
C Mean with 95% interval

List of Figures
2.1 Example of a Waterfall model [1].
2.2 Example of a V-model [2]. Dashed lines have been added between the horizontal phases to show that there is a relationship between them. Directional arrows have been added along the sides to clarify which phases are included in the verification and validation phases.
2.3 The SAFe core competencies [3].
2.4 The Full SAFe configuration [3].
2.5 The SAFe implementation roadmap [3].
3.1 The data analysis workflow for RQ1.
3.2 The data analysis workflow for RQ2.
3.3 The data analysis workflow for RQ3.
4.1 Scree plots from the different measurement points paired with parallel analysis used to determine the number of statistically supported factors.
4.2 Interval plots on Q1 and Q2 across the four surveys.
4.3 Interval plots on Q3 and Q4 across the four surveys.
4.4 Interval plots on Q5 and Q6 across the four surveys.
4.5 Interval plots on Q7 and Q8 across the four surveys.
4.6 Interval plots on Q9 and Q10 across the four surveys.
4.7 Combined interval plot of the question groupings.
4.8 Zoomed-in version of the combined interval plot of the question groupings.
4.9 Combined interval plot of the survey questions.
4.10 Zoomed-in version of the combined interval plot of the survey questions.
4.11 Thematic map based on all the surveys.
4.12 Thematic map based on the first survey.
4.13 Thematic map based on the second survey.
4.14 Thematic map based on the third survey.
4.15 Thematic map based on the fourth survey.
C.1 Mean, lower bound, and upper bound of the 95% interval for the grouped questions.
C.2 Mean, lower bound, and upper bound of the 95% interval for the questions.

List of Tables
2.1 The four agile values [4].
2.2 The twelve agile principles [4].
2.3 The agile definitions chosen for this study.
2.4 The ten SAFe principles [5].
3.1 Example of thematic analysis codes from Saad et al. [6].
4.1 Survey questions used by Volvo Cars to measure agility.
4.2 The number of data points originally and after list-wise deletion.
4.3 The number of free-text answers originally, after list-wise deletion, and after duplicate deletion.
4.4 The keywords extracted from the survey questions.
4.5 Mapping of the agile values (Table 2.1) to the survey questions (Table 4.1).
4.6 Mapping of the agile principles (Table 2.2) to the survey questions (Table 4.1).
4.7 Mapping of the agile definitions (Table 2.3) to the survey questions (Table 4.1).
4.8 Mapping of the SAFe principles (Table 2.4) to the survey questions (Table 4.1).
4.9 The agile sources that did not map to any of the survey questions.
4.10 Factor correlations, measurement one.
4.11 Factor correlations, measurement two.
4.12 Factor correlations, measurement three.
4.13 Factor correlations, measurement four.
4.14 Factor loadings, measurement one.
4.15 Factor loadings, measurement two.
4.16 Factor loadings, measurement three.
4.17 Factor loadings, measurement four.
4.18 The distribution of answers between the four surveys.
4.19 The percentage of answers extracted from the total.
4.20 Details of all the sub-themes.
4.21 The theme distribution among the total answers.
4.22 The theme distribution among the extracted answers.
B.1 Structure matrix, measurement one.
B.2 Structure matrix, measurement two.
B.3 Structure matrix, measurement three.
B.4 Structure matrix, measurement four.

1 Introduction
The agile approach in software development has become increasingly widespread among large and small companies. Increased responsiveness to change, increased customer satisfaction, and reduced time to market are some of the benefits of agile [7].
Even though frameworks such as Scrum and Kanban exist, being agile is more than just following a framework with bundled practices [8]. It also includes a mindset and culture that need to be adopted within the company and its practices [9]. The process of changing to agile practices, mindset, and culture is called an agile transformation [10]. Having a guide through this process and measuring the progress could help make the transformation successful.

There are multiple ways of measuring agility in an organization. One way to measure agile transformation is to look at the organization’s agile maturity through an agile maturity model. These models guide the process with questionnaires and maturity levels in different categories of agile aspects, while letting the organization understand how far it has come in its agile transformation [11]. Even though there are multiple methods for measuring agile transformation, only a few of those tools have been statistically validated [12].

This study evaluates the possibility of measuring agile transformation through a 10-question survey. Access to the data and the model for this study was gained through Volvo Cars. The evaluation was done by first mapping the survey questions to different agile sources found in the literature. Secondly, exploratory factor analysis was applied to the answers to statistically validate the relationships between the questions in the survey and the proposed steps in the agile maturity model. To further validate the model, the mean and confidence interval of the data points were compared between the different points in time when the data was collected. This comparison was made to see if the organization’s maturity follows the steps proposed by Volvo Cars. Lastly, the free-text answers were analyzed to see the employees’ thoughts on using the 10-question survey to measure agile transformation.
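The comparison of means and confidence intervals between measurement points can be illustrated with a minimal sketch. It assumes answers on a 1-5 Likert scale, uses made-up data rather than the Volvo Cars survey data, and computes the interval with the normal approximation (z = 1.96); the function names are hypothetical, and a t-based interval would be somewhat wider for small samples.

```python
"""Minimal sketch: compare one question's mean and 95% confidence interval
between two measurement points.

Assumptions: 1-5 Likert answers, made-up data, normal approximation (z = 1.96).
"""
import math
from statistics import mean, stdev

Z_95 = 1.96  # two-sided 95% critical value of the standard normal distribution


def mean_ci95(answers):
    """Return (mean, lower bound, upper bound) of an approximate 95% CI of the mean."""
    if len(answers) < 2:
        raise ValueError("need at least two answers to estimate the spread")
    m = mean(answers)
    half_width = Z_95 * stdev(answers) / math.sqrt(len(answers))
    return m, m - half_width, m + half_width


def intervals_overlap(a, b):
    """Return True if two (mean, lower, upper) intervals overlap."""
    return a[1] <= b[2] and b[1] <= a[2]


# Made-up answers to one question at two measurement points.
survey_1 = [3, 4, 2, 3, 4, 3, 2, 4, 3, 3]
survey_2 = [4, 4, 5, 3, 4, 5, 4, 4, 5, 4]

ci_1 = mean_ci95(survey_1)
ci_2 = mean_ci95(survey_2)
print(f"survey 1: mean {ci_1[0]:.2f}, 95% CI [{ci_1[1]:.2f}, {ci_1[2]:.2f}]")
print(f"survey 2: mean {ci_2[0]:.2f}, 95% CI [{ci_2[1]:.2f}, {ci_2[2]:.2f}]")
print("overlap:", intervals_overlap(ci_1, ci_2))  # overlap: False
```

Non-overlapping 95% intervals, as in this made-up example, indicate a difference between the measurement points, whereas overlapping intervals do not by themselves rule one out; the visual comparison is therefore an informal check rather than a formal test.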
1.1 Statement of the problem
The industry wants quick ways to measure agility, but it is uncertain whether such an approach is feasible. The measurement design should ensure that it can measure agility in a meaningful way. At the same time, the questions need to be written so that people from different disciplines can answer them.

With cars becoming increasingly digital, Volvo Cars strives toward becoming a software company. Agile methods stem from software development, and most teams at Volvo Cars are now a mix of hardware and software developers since software is central to the car. Understanding how applicable these methods of measuring agile transformation are in such a large company is therefore important.

One challenge with measuring agile aspects is that there are many definitions of agile, and being agile according to one definition does not per se translate into being agile according to another. Even with an agreed-upon definition of agile, one still needs to understand how to measure whether a team or organization is agile. There are several strategies to choose from when deciding how to measure agility. One way is to measure whether common agile success factors, such as on-time delivery, have improved with the agile transformation [13]. Another way is to use a tool such as the Comprehensive Agility Measurement Tool (CAMT), which captures the ten most critical agility enablers and measures agility with Likert scales ranging from 1 to 5 [14]. These are just some of the available tools and frameworks, and both Gren et al. [12] and Läppenen [15] conclude in their papers that there is a need to validate these types of tools statistically.

1.2 Purpose and research questions
The purpose of this study is to evaluate to what degree one can use the 10-question survey to measure a large automotive company’s agile transformation.
Since the survey is short and there is no standard definition of agile, it is essential to verify that the definitions of agile used in the study can translate to an agile transformation. Being able to boil down a company’s agile state into ten questions that could be sent to employees would be useful.

The agile maturity model used by Volvo Cars was statistically validated with exploratory factor analysis to see to what extent the questions correlate according to the company’s intentions. The differences in the means of the questions and question pairs at the different measurement points were compared to see if the organization matures according to the staircase-type levels in the model. By doing this, potential problems with the model could be exposed. Understanding those problems could in turn improve the process of creating surveys and models to measure agility in the future.

Lastly, participants’ acceptance of the 10-question survey was explored to see how applicable it is for measuring agile transformation. A low acceptance level can indicate underlying issues with the survey that need further evaluation. Such information can provide valuable insights about the design of this survey, which can be of use when similar surveys are designed in the future.

Based on the purpose of the study, the following research questions were formulated:
• RQ1: What parts of agility are covered in the 10-question survey?
• RQ2: Are the question pairs reflecting the maturity steps statistically?
• RQ3: What do the participants think of the 10-question agility survey?

1.3 Delimitations
When answering what parts of agility are covered in the 10-question survey, the study focuses on all agile definitions that could be identified and verified within two working days. There are likely more definitions of agile available, but identifying all definitions would be outside the time scope of this study.
Analyzing the free-text answers is limited to covering only the participants’ views and opinions of the survey. There were thousands of free-text answers regarding many other aspects of the agile transformation, and analyzing them all would be an overwhelming task within the scope of this study.

2 Theory
This chapter presents the theory needed to follow and understand the study. More specifically, it introduces Waterfall, the V-model, agile, large-scale agile, SAFe, agile transformation, and how to measure that transformation.

2.1 Waterfall
The Waterfall model is the oldest process model among the traditional development approaches in software development [16]. The Waterfall model is a sequential process where one phase must finish before the next phase can begin. It was first introduced by Royce in 1970 [17], and over the years it has evolved to take many forms. There are many variations of Waterfall available, even if the core principles remain the same. Some of the most common Waterfall variations have between five and seven phases, and the names of the phases can differ. Figure 2.1 shows an example of a Waterfall model provided by Bassil [1].

Figure 2.1: Example of a Waterfall model [1]. (Phases shown: Analysis, Design, Implementation, Testing, Maintenance.)

This specific Waterfall model from Bassil has five phases: 1) analysis, 2) design, 3) implementation, 4) testing, and 5) maintenance. The phases are described below:

Analysis: This phase is about gathering a comprehensive description of the behavior of the system that is going to be developed [1]. The customers’ needs are identified and documented, and then the requirements are refined to be used in the design and implementation phases [18]. Creating clear requirements is vital as Waterfall is a sequential process, and once the design phase is reached, further changes in the requirements will not be considered [2].
The analysis phase is sometimes called the requirements phase.

Design: The requirements gathered in the previous phase are used to plan the design of a software solution [1]. The plan includes software architecture design, concept design, algorithm design, data structure design, graphical user interface design, and more.

Implementation: This is where the gathered requirements and created designs are converted to the production environment [1]. The implementation phase is sometimes called the coding or development phase.

Testing: This is where the software solution is tested against the original requirements and specifications to ensure it accomplishes its intended purpose [1]. This phase is also where system glitches are found, fixed, and refined.

Maintenance: The product needs to be maintained after it has been released to the customer so that it keeps working correctly and bugs found in the live environment are fixed [18]. Examples of maintenance tasks include error correction, performance improvements, and quality improvements.

The advantages and disadvantages of the Waterfall approach have been outlined in a paper by Balaji et al. [2]:

Advantages:
• Requirements are clear before development starts.
• Each phase is completed in a specified period before moving to the next phase.
• It is easy to implement as Waterfall is a linear model.
• Minimal resources are required to implement the model.
• Proper documentation is followed in each phase to ensure development quality.

Disadvantages:
• The problems in one phase are never entirely solved, and many issues arise after the phase is signed off, resulting in a poorly structured system.
• Requirements will not be changed in the current development process, even if the client requests it.

2.2 V-model
The V-model is a modification of the Waterfall model that was first proposed by Rook in 1987 [2, 19].
Like the Waterfall model, the V-model consists of several phases that are traversed from start to finish, and a phase can only start once the previous one is completed. One difference between the models is that the phases of the V-model first move downwards and then bend upwards after the coding phase, as shown in Figure 2.2 [20]. The figure visualizes the relationship between the development and testing phases on the same horizontal level. In the V-model, the developer and tester work in parallel instead of doing all the tests at the end of the software development life cycle. This relationship means that acceptance tests are created while the requirements are gathered. Such an approach ensures that testing is conducted from the very beginning of the project, making it possible to identify problems that could cause serious harm at a later phase [20].

Figure 2.2: Example of a V-model [2]. (Phases shown: Requirements, Specifications, High-Level Design, Low-Level Unit Design, and Build Code, followed by Unit Testing, Integration Testing, System Testing, and Acceptance Testing; the two sides are labeled Verification Phase and Validation Phase.) Dashed lines have been added between the horizontal phases to show that there is a relationship between them. Directional arrows have been added along the sides to clarify which phases are included in the verification and validation phases.

Balaji et al. outline the advantages and disadvantages of the V-model, as they did with the Waterfall model [2]:

Advantages:
• The same advantages as the Waterfall model.
• The tester role is involved in the requirements phase.
• Requirement changes are possible in any phase.

Disadvantages:
• The most significant disadvantage of the V-model is that it is very rigid and inflexible.
• If any changes happen midway, the requirements documents and the test documentation need to be updated.
• It is not suitable for short-term projects as it requires reviews at each stage.

The V-model was the model used at Volvo Cars before they decided to become agile.
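The pairing between development and testing phases can be sketched in a few lines. This is a hypothetical illustration, assuming the common V-model layout in which each phase on the left arm of Figure 2.2 is verified by the test level on the same horizontal level; the phase names come from the figure, while the constant and function names are made up for the example.

```python
"""Hypothetical sketch of the V-model pairing described in Section 2.2.

Assumption: each development phase is paired with the test level on the same
horizontal level of Figure 2.2, following the common V-model layout.
"""

# (development phase, test level that verifies it), from the top of the V down.
V_MODEL_LEVELS = [
    ("Requirements", "Acceptance Testing"),
    ("Specifications", "System Testing"),
    ("High-Level Design", "Integration Testing"),
    ("Low-Level Unit Design", "Unit Testing"),
]
BOTTOM_PHASE = "Build Code"


def phase_order():
    """Return all phases in execution order: down the left arm, the bottom, then up the right arm."""
    left_arm = [dev for dev, _ in V_MODEL_LEVELS]
    right_arm = [test for _, test in reversed(V_MODEL_LEVELS)]
    return left_arm + [BOTTOM_PHASE] + right_arm


def paired_test_level(phase):
    """Return the test level whose tests are designed while `phase` runs, or None."""
    return dict(V_MODEL_LEVELS).get(phase)


if __name__ == "__main__":
    print(" -> ".join(phase_order()))
    # Acceptance tests are designed while the requirements are gathered.
    print(paired_test_level("Requirements"))  # Acceptance Testing
```

Walking down the list and then back up in reverse is what gives the model its V shape: the phases still run strictly in sequence, but each test level is planned when its paired development phase is carried out.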
The Waterfall model and the V-model are rigid and not suitable for projects with uncertain requirements that risk changing during development [2]. Agile works very well in these situations, and one of the main reasons Volvo Cars decided to move away from the V-model to the agile way of working was that they were moving too slowly compared to the changes in the automotive world [21]. There are many agile frameworks to choose from when moving to agile. As Volvo Cars is a large and complex organization, they decided to use the large-scale agile framework SAFe (Scaled Agile Framework), which is described further in Section 2.3.2.

2.3 Agile
The agile approach in software development has become increasingly widespread among large and small companies [22]. Some of the benefits of agile are increased responsiveness to change, increased customer satisfaction, and reduced time to market [7]. The idea of agile software development was outlined in the agile manifesto in 2001, which consists of four values (see Table 2.1) and twelve principles (see Table 2.2) [4]:

The four agile values
We are uncovering better ways of developing software by doing it and helping others do it. Through this work we have come to value:
- Individuals and interactions over processes and tools.
- Working software over comprehensive documentation.
- Customer collaboration over contract negotiation.
- Responding to change over following a plan.
That is, while there is value in the items on the right, we value the items on the left more.
Table 2.1: The four agile values [4].

The twelve agile principles
- Our highest priority is to satisfy the customer through early and continuous delivery of valuable software.
- Welcome changing requirements, even late in development. Agile processes harness change for the customer’s competitive advantage.
- Deliver working software frequently, from a couple of weeks to a couple of months, with a preference for the shorter timescale.
- Business people and developers must work together daily throughout the project.
- Build projects around motivated individuals. Give them the environment and support they need, and trust them to get the job done.
- The most efficient and effective method of conveying information to and within a development team is face-to-face conversation.
- Working software is the primary measure of progress.
- Agile processes promote sustainable development. The sponsors, developers, and users should be able to maintain a constant pace indefinitely.
- Continuous attention to technical excellence and good design enhances agility.
- Simplicity–the art of maximizing the amount of work not done–is essential.
- The best architectures, requirements, and designs emerge from self-organizing teams.
- At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly.
Table 2.2: The twelve agile principles [4].

A few different agile frameworks have emerged, the two biggest being Scrum, which is the most used, and Extreme Programming (XP) [23]. There is also Kanban, a method that stems from the Lean movement created at Toyota [24]. Hybrid approaches are not uncommon according to the data collected by digital.ai [22]. The second most used method in that report, used by 7% of the respondents, is Scrumban, a mix of Scrum and Kanban [22].

There are many definitions of agile, and being agile according to one definition does not necessarily translate to being agile according to another. To prevent a team or an organization from pulling in different agile directions, it is essential that they work with the same definition. Laanti et al. [25] collected several definitions of agile created up until 2013, when their paper was published. They conclude that the different perceptions of agility make implementing agile methods complicated.
A list of agile definitions in chronological order has been gathered in Table 2.3. The table includes the definitions from Laanti et al. that we could verify, as well as other definitions we found through the literature review.

Cockburn 2001 [26]: Being effective and maneuverable. Use of light-but-sufficient rules of project behavior and the use of human and communication-oriented rules.
Highsmith 2002 [27]: Ability to both create and respond to change in order to profit in a turbulent business environment.
Anderson 2003 [28]: Ability to expedite.
Larman 2004 [29]: Rapid and flexible response to change.
Schuh 2004 [30]: Building software by empowering and trusting people. Acknowledging change as a norm, and promoting constant feedback. Producing more valuable functionality faster.
Lyytinen 2006 [31]: Discovery and adoption of multiple types of Information Systems Development innovations through garnering and utilizing agile sensing and response capabilities.
Subramaniam and Hunt 2006 [32]: Uses feedback to make constant adjustments in a highly collaborative environment.
Ambler 2007 [33]: Iterative and incremental (evolutionary) approach to software development which is performed in a highly collaborative manner by self-organizing teams with “just enough” ceremony that produces high-quality software in a cost-effective and timely manner which meets the changing needs of its stakeholders.
Kruchten 2013 [34]: The ability of an organization to react to changes in its environment faster than the rate of these changes.
Denning 2016 [35]: An umbrella term for a set of management practices–including Scrum, Kanban, and Lean–which enable offering requirements and solutions to evolve through collaboration between self-organizing, cross-functional teams. It promotes adaptive planning, evolutionary development, early delivery and continuous improvement, and it institutionalizes rapid and flexible response to customer input. Pioneered by software development teams, its ideals were first widely promulgated in the agile manifesto of 2001 [4]. Agile practices have now spread to management generally.
Gren et al. 2020 [36]: Responsiveness to change.
Table 2.3: The agile definitions chosen for this study.

Abrahamsson et al. created a definition of agile by asking, “What makes a development method an agile one?” [37]. This question creates a narrow focus on the agile connection to methods rather than the whole company, and the definition is presented as follows:
• Incremental (small software releases, with rapid cycles).
• Cooperative (customer and developers working constantly together with close communication).
• Straightforward (the method itself is easy to learn and to modify, well documented).
• Adaptive (able to make last moment changes).

It is important to stress that being agile takes more than implementing a framework. “Agile is more than a set of practices used by IT requiring wide ranging change to work patterns” [8], and a key success factor in being agile is the culture [38, 22].

A recent take on the definition of agile was created by Gren et al. Their definition intends to provide an unambiguous way to discuss and measure agility: agile is responsiveness to change [36]. This definition has several advantages; one of its most powerful strengths is that one can ask oneself, “Does this change make us more responsive to change?” to determine whether a change makes one more agile.

2.3.1 Large-scale agile
Based on the number of teams, large-scale agile development has been proposed to be defined as agile development efforts with two or more teams [39]. To successfully scale agile beyond a single team, the organization must support communication between the teams [38]. This adds another layer of complexity compared to working within a single agile team.
Large-scale agile frameworks such as SAFe and Scrum of Scrums emerged to manage this complexity [40]. Volvo Cars chose SAFe to scale agile across their organization.

2.3.2 SAFe - Scaled Agile Framework
The Scaled Agile Framework (SAFe) is a system for scaling agile across teams of teams, business units, and organizations. It implements Lean, Agile, and DevOps practices at scale. According to Scaled Agile Inc., using this approach typically results in 1) faster time-to-market, 2) improvements in quality, 3) increased productivity, and 4) increased employee engagement [5].

SAFe is constantly evolving. It is currently in its fifth iteration, SAFe 5, and the framework is continuously updated as industry expertise grows, more customer data becomes available, and leading-edge management practices improve. SAFe is also the most used framework for scaled agile: 37% of the respondents to the 15th Annual State of Agile Report stated that they use SAFe [22]. Its share has grown year over year, from 30% two years earlier to 35% the year before [41]. In comparison, the second most used framework for scaled agile, Scrum@Scale/Scrum of Scrums, was used by 9% of respondents, compared to 16% the year before [41].

SAFe is built around the seven core competencies of the Lean Enterprise, which enable organizations to respond to changes in the market in an agile way [3]. The core competencies are outlined in Figure 2.3.

Figure 2.3: The SAFe core competencies [3].

There are four configurations in SAFe: Essential SAFe, Large Solution SAFe, Portfolio SAFe, and Full SAFe [3]:
1. “Essential is the most basic configuration of the framework, and it provides the minimal elements necessary to be successful with SAFe.
2. Large Solution is for enterprises building large and complex solutions, which do not require the portfolio level’s constructs.
3. Portfolio provides Strategy and Investment funding, agile Portfolio Operations, and Lean Governance.
4. Full is the most comprehensive configuration. It supports building large, integrated solutions that typically require hundreds of people to develop and maintain.”

Figure 2.4 shows Full SAFe, which outlines all the configurations of SAFe.

Figure 2.4: The Full SAFe configuration [3].

SAFe provides a step-by-step implementation roadmap that is presented as an effective way to implement the framework. The roadmap, shown in Figure 2.5, consists of the twelve steps needed to complete the transformation.

Figure 2.5: The SAFe implementation roadmap [3].

As scaled agile is complex and all organizations are unique, there is no one-size-fits-all solution. An organization might find that some SAFe practices do not suit its way of working; in that case, SAFe provides ten principles to follow [5]. The principles are outlined in Table 2.4 and serve as guidance when implementing SAFe in any context.

| # | SAFe principle |
|---|---|
| 1 | Take an economic view. |
| 2 | Apply systems thinking. |
| 3 | Assume variability; preserve options. |
| 4 | Build incrementally with fast, integrated learning cycles. |
| 5 | Base milestones on objective evaluation of working systems. |
| 6 | Visualize and limit WIP, reduce batch sizes, and manage queue lengths. |
| 7 | Apply cadence, synchronize with cross-domain planning. |
| 8 | Unlock the intrinsic motivation of knowledge workers. |
| 9 | Decentralize decision-making. |
| 10 | Organize around value. |

Table 2.4: The ten SAFe principles [5].

2.4 Agile transformation
A report from the first international workshop on agile transformation [10] states that agile transformation means implementing and adopting agile practices in an organization.
The report also presents a few other definitions collected from managers, scientists, and others who attended the event:
• “a path from adopting agile practices to establishing agile culture”
• “shift towards practices that enable organizational responsiveness”
• “agile – iterative, incremental, collaborative, effects/results/outcomes-driven transformation – continuous improvement from where you are towards the agile values and principles”

Gandomani [42] states that the values of agile methods differ from traditional ones, and that an agile transformation therefore affects the whole organization. These transformations face multiple challenges, the main ones being organizational culture, management, people, and processes [42].

When moving to an agile development process, the organizational culture needs to change from the traditional process-centric way to a people-centric one [43, 9]. As Tolfo et al. [44] state, all companies, regardless of size, have their own organizational culture. By looking only at its visible aspects, a company can get a false image of its culture and miss potential obstacles in the planned transformation. The company needs to talk with, discuss with, and observe people to get a proper idea of its culture. Furthermore, the authors point out that the strategic context of the company, that is, the stakeholders with their values and principles, needs to be taken into account, as well as the tactical and operational context, when evaluating the culture [44].

A critical success factor for the transformation is management buy-in and support [45]. The management team must have a logical reason for the change, set clear goals for the transformation, and be committed to the cause. Another management challenge is that the move to agile changes the leadership style from a traditional commanding style to servant leadership [45].
Decision-making also becomes more decentralized.

Conboy et al. [46] describe numerous challenges for people during an agile transformation. One is dealing with the fear and feeling of inadequacy that developers can experience when their work becomes more transparent. Another is that agile developers are expected to have multiple skills beyond their own niche. If the organization only looks at individual performance, employees might feel at a disadvantage when competing for a promotion [46].

Introducing agile at scale comes with its own challenges. One is coordinating the large number of agile teams [38]. This coordination requires an organization that supports communication between the agile teams, which becomes even more important when the organization is distributed globally. Another challenge is achieving technical consistency across all teams [38].

An efficient way of measuring the agile transformation is needed to take the right actions and keep track of progress. Some models for this are presented in the next section.

2.5 Measuring agile transformation
One way to measure the agility of an organization is the quantitative metrics model proposed by Olszewska et al. [47]. This model was created to provide a quantitative approach to measuring agile impact in a software organization. It proposes four questions with eight metrics, two per question. The metrics were chosen to be generic so that they could be used in plan-driven, agile, and lean settings, although the authors note that some metrics would most likely need fine-tuning for specific settings [47].

Agile maturity models are another way to measure agile transformation. They organize the measurement into categories that vary from model to model.
Examples of categories used in existing models are agile principles [48], code quality metrics, and automated regression testing [49]. Within these categories, maturity levels are meant to work like a staircase: level one is reached first, then level two, and so on, gradually increasing the organization’s agile maturity [11]. Even though there are multiple suggestions for measuring an agile transformation, there is little statistical validation of these tools [12].

An example of a model that underwent a statistical validation study after its creation is the agile adoption model created by Sidky et al. [48]. They call this model the Sidky Agile Measurement Index (SAMI), and it has two components: an agile measurement index and a four-stage process. The agile measurement index has five agility levels and five agile principles, which summarize the 12 agile principles from the manifesto. The authors describe the four-stage process as the backbone of the framework. The four stages are grouped into two objectives. The first objective contains the first stage, called discontinuing factors, and aims to provide a go/no-go assessment of whether the organization is ready to begin the transformation towards agile. The second objective is meant to guide agile coaches towards the practices that fit the organization, and it contains the three remaining stages: project-level assessment, organizational readiness assessment, and reconciliation [48]. Gren et al. statistically validated parts of this model. They concluded that “Data did not support the division of a subset of items selected from the Agile Adoption Framework. However, the data gave new categorizations of the items in the Agile Adoption Framework.” [12]. They also claim that, together with their changes, this was the first partly validated maturity model to their knowledge at that time.

Another example of a maturity model, one specifically aimed at SAFe, is the one created by Stojanov et al. [50]. The authors argue that there was no structured roadmap for SAFe and therefore created the SAFe MM to fulfill that role. They took existing agile maturity models as a basis and then developed the model in close collaboration with industry experts through a Delphi study. The model is composed of five categories: Embrace change to deliver customer value, Plan and deliver software frequently, Human Centricity, Technical excellence, and Customer collaboration. Each category has five levels with varying numbers of items. For example, level one in technical excellence includes Acceptance testing, Task volunteering, Knowledge sharing, and Coding standards. To evaluate the maturity, they conducted a two-hour interview with one scrum master and one release train engineer, a role described as a “chief scrum master.” The industry experts found the model to have practical merit and to be easy to understand and use, but had reservations about its necessity when it was boiled down into a single variable, agile maturity. They worried that organizations would be too occupied chasing the maturity levels instead of enhancing team performance and cooperation, which led the researchers to keep a score for each practice.

3 Method
This chapter presents the research strategy used to answer the research questions of the study, how the data was collected, how the data analysis was performed, and how research ethics was taken into consideration.

3.1 Research strategy
According to Bell et al. [51], the two categories of research strategy are qualitative and quantitative. This study combines both: qualitative research was used to answer RQ1 and RQ3, and quantitative research was used to answer RQ2.
A literature review was performed to answer what parts of agility are covered in the 10-question survey (RQ1). The survey data points were analyzed quantitatively to answer whether the question pairs statistically reflect the maturity steps (RQ2). The free-text answers were analyzed manually to identify what the participants think of the 10-question agility survey (RQ3).

The study was conducted as a field study, which Stol and Fitzgerald [52] describe as follows: “Field study refers to any research conducted in a specific, real-world setting to study a specific software engineering phenomenon.” Field studies have low generalizability and a low level of obtrusiveness, which matches this study: a literature review and an analysis of already gathered data are unobtrusive, and the study is inherently less generalizable since the data originates from one company.

3.2 Data collection
To answer the research questions, data was collected through a literature review, the surveys from Volvo Cars and their results, and other internal documents from the company. This section describes the data collection methods in detail.

3.2.1 Literature review
The literature review primarily focused on Waterfall, V-model, Agile, Scaled Agile, Agile Transformation, and Agile Maturity Models. Knowledge in these areas is essential to understand why a company decides to undergo an agile transformation, how it can be done, and how to measure it.

Data was collected from the following four sources to answer what parts of agility the 10-question survey can cover: 1) the agile values, 2) the agile principles, 3) agile definitions, and 4) the SAFe principles. Agile as it is known today started with the agile manifesto in 2001. Since the agile values and principles are part of that manifesto, it was natural to include them as agile sources.
Agile has evolved since the release of the agile manifesto, and many new agile definitions have emerged [25]. Gathering some of these definitions is essential to capture how agile has evolved since the manifesto. Lastly, the SAFe principles are included because Volvo Cars made their agile transformation from the V-model to this large-scale agile framework. Including them provides insight into how well the survey questions map to the company’s agile framework.

The aim when collecting agile definitions was to gather enough of them to enable a meaningful mapping of the survey questions, without collecting more than was manageable within the time frame of this study. The source of every identified definition was verified before the definition was added to the collection; definitions whose source could not be verified were not added.

The literature was mainly collected through the Chalmers Library and Google Scholar databases. Backward snowballing, the process of going through a paper’s reference list to identify new papers and then identifying more papers from their reference lists [53], was applied to relevant literature to find other publications on the subject. The keywords and phrases used to find relevant literature were: Waterfall, V-model, Agile, Scaled Agile, Scaled Agile Framework (SAFe), Agile Transformation, Agile Definition, Survey Design, Maturity Models, Confirmatory Factor Analysis, and Exploratory Factor Analysis.

3.2.2 Survey
The data was provided by Volvo Cars, which conducted the survey four times over two years, from 2019 to 2020. The survey consisted of ten questions and a free-text field introduced with the text “Write your comment here.”

3.2.3 Documents
PowerPoint slides related to the surveys were accessed through Volvo Cars. These slides contained the purpose behind the surveys, a background to the survey questions, and an explanation of the survey structure.
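The backward snowballing used in the literature review (Section 3.2.1) amounts to a breadth-first traversal of reference lists. The sketch below is only an illustration under stated assumptions: the citation graph, the paper names "A" to "F", and the function `backward_snowball` are hypothetical, and the manual relevance screening is modeled as a simple function.

```python
from collections import deque


def backward_snowball(seeds, references, is_relevant):
    """Breadth-first backward snowballing over reference lists.

    seeds: papers the review starts from; references: paper -> its reference
    list; is_relevant: the (normally manual) screening decision, as a function.
    Returns the collected papers in the order they were reached.
    """
    collected = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        paper = queue.popleft()
        collected.append(paper)
        for ref in references.get(paper, []):
            if ref in seen:
                continue  # already screened via another reference list
            seen.add(ref)
            if is_relevant(ref):
                queue.append(ref)  # its own reference list is searched later
    return collected


# Hypothetical citation graph; paper "E" is screened out as not relevant.
refs = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "E": ["F"]}
print(backward_snowball(["A"], refs, lambda paper: paper != "E"))  # ['A', 'B', 'C', 'D']
```

Because "E" is rejected during screening, its reference list (containing "F") is never searched. This mirrors how snowballing only continues from papers judged relevant.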
3.3 Data analysis
This section presents the methodology used to analyze the collected data. First, the process of mapping the ten questions to different agile sources is described. Then, we explain how exploratory factor analysis (EFA) was implemented and applied to the Likert scale data. Lastly, the procedure for analyzing and classifying relevant results from the free-text answers is presented.

3.3.1 RQ1: Linking the survey questions to agile
The data analysis workflow for linking the survey questions to agile is shown in Figure 3.1.

[Workflow diagram: Find agile sources; Extract keywords; for each of Agile Values, Agile Principles, Agile Definitions, and SAFe Principles: R1 Mapping, R2 Mapping, Discussion]
Figure 3.1: The data analysis workflow for RQ1.

The first step in the analysis was to extract keywords from the survey questions. These keywords were then mapped against the agile values, principles, definitions, and SAFe principles, with the two authors doing their mappings in parallel. When both had mapped one area, they discussed any differences between their mappings. When deciding whether an agile source mapped to a survey question, the questions and Volvo Cars’ maturity model levels acted as a foundation for the decision. There were two types of connections: one where keywords were found, and one based on comparing the question’s underlying intent with the agile value, principle, definition, or SAFe principle. Significant connections are marked with an X in the tables, and connections that were vague, or that required the authors to interpret the question or the agile item, are marked with (X).

3.3.2 RQ2: Model validation
The data analysis workflow for the model validation is shown in Figure 3.2.
[Workflow diagram: Groom the data; Determine the number of factors; Create the model; Test the model; Calculate means and spread]
Figure 3.2: The data analysis workflow for RQ2.

For the second research question, two analyses together validate the maturity model proposed by Volvo Cars. The first statistically validates whether the survey questions correctly describe the steps of the agility maturity model. The second determines whether the organization matures as the model proposes. This section describes why exploratory factor analysis was used and how it was implemented. It also describes how calculating means and spread was used to verify that the organization matured as intended.

First, the data was groomed before conducting the factor analysis. None of the survey questions were mandatory, which resulted in some incomplete data points. As the dataset is large, list-wise deletion, where incomplete data points are removed, could be applied while still leaving enough data for the analysis. By splitting the dataset by survey and conducting the analysis for each survey individually, interdependency in the data was avoided.

Factor analysis is a “... set of statistical procedures designed to determine the number of distinct constructs needed to account for the pattern of correlations among a set of measures.” [54]. There are three main types of factor analysis: Structural Equation Modeling (SEM), Exploratory factor analysis (EFA), and Confirmatory factor analysis (CFA).

Exploratory factor analysis is used when there is no, or only a limited, idea of the underlying structure of the correlations [54, 55]. With this method, one first determines how many factors to extract; several ways of doing this are described further down. Once the number of factors is known, an EFA is run on the data with that number of factors.
The more strongly multiple items correlate with, or load on, a single factor, the more likely it is that they measure the same thing [54]. EFA does not tell what the factors are; that is left to the researchers to decide based on intuition and theory [54].

Confirmatory factor analysis (CFA), in contrast to EFA, is used when the researcher has a good idea, based on previous research and theory, of how the correlations are structured and what factors lie behind them [56]. Some of the use cases for CFA are psychometric evaluation of measures, construct validation, testing method effects, and testing measurement invariance [56].

Since the aim is to validate the existing model and not to find a new one, CFA would best fit this analysis based on the information presented above. The problem is that a prerequisite for factor analysis is that the model is overdetermined, meaning that each factor has multiple measures strongly loaded on it [56]. The model presented by Volvo Cars has only two questions (measures) for each area (factor), and as such, CFA cannot be used. Therefore, EFA is used to examine statistically how many factors the data supports and how the questions group together.

JASP was used for the factor analysis because it has an easy-to-use interface and is used in research. As described above, the analysis was run separately for each measurement. There are several methods to determine the number of factors, such as the Kaiser criterion, the scree plot, and parallel analysis. The Kaiser criterion looks at the eigenvalues of the correlation matrix and retains as many factors as there are eigenvalues above one [57]. A second way is to use the scree plot [58]. Like the Kaiser criterion, the scree plot uses the eigenvalues of the correlation matrix, but they are plotted from largest to smallest.
The number of factors is then chosen where the eigenvalues start to level off, forming a scree or elbow. The Kaiser criterion and the scree plot can be combined by drawing a horizontal line at eigenvalue one in the scree plot [58]. The third way is to use parallel analysis [59]. In this method, the eigenvalues from the correlation matrix are compared with eigenvalues generated from simulated random data. The number of factors to retain is the number of eigenvalues from the original correlation matrix that are larger than the corresponding eigenvalues from the simulated data [59]. The Kaiser criterion has been found to overestimate the number of factors, while the scree plot has been criticized for being too subjective [60, 61]. Parallel analysis, on the other hand, has been shown to give consistent results, and it is the method used in this analysis [60, 62].

The two common estimation methods are Maximum Likelihood (ML) and Principal Axis (PA) factoring. As long as the data is normally distributed, ML works well and has a more formal statistical foundation, but as nonnormality increases, it tends to overestimate [63, 64, 65]. If the data violates normality, Fabrigar et al. recommend a principal factors procedure, such as principal axis factoring [63]. Our data cannot with certainty be called normal, judging from the distribution and density plots included in Appendix A. Therefore, principal axis factoring was used as the estimation method.

A rotation is applied to make the estimates easier to interpret. There are two types of rotation: orthogonal and oblique. Orthogonal rotation forces the factors to be uncorrelated [63]. All questions in this survey concern agile and agile transformation, so the underlying factors cannot be expected to be independent of each other. The rotation chosen for this study was therefore oblique, using the Promax method.
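The factor-retention rules described above can be illustrated with a small, dependency-free sketch. It computes the eigenvalues of a correlation matrix with the cyclic Jacobi method, applies the Kaiser criterion, and runs a basic form of Horn's parallel analysis. This is not how JASP implements the analysis; the simulated two-factor data, the function names, the 20 iterations, and the choice of comparing principal-component eigenvalues against the mean (rather than a percentile) of the random eigenvalues are all assumptions made for illustration.

```python
import math
import random


def correlation_matrix(data):
    """Pearson correlation matrix of the columns of `data` (a list of rows)."""
    n, p = len(data), len(data[0])
    cols = [[row[j] for row in data] for j in range(p)]
    means = [sum(c) / n for c in cols]
    centered = [[x - m for x in c] for c, m in zip(cols, means)]
    norms = [math.sqrt(sum(x * x for x in c)) for c in centered]
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(p)] for i in range(p)]


def eigenvalues_symmetric(matrix, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix (cyclic Jacobi method), largest first."""
    a = [list(row) for row in matrix]
    p = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if abs(a[i][j]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (i, j) element becomes zero.
                theta = (a[j][j] - a[i][i]) / (2 * a[i][j])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(p):  # A <- A J (columns i and j)
                    aki, akj = a[k][i], a[k][j]
                    a[k][i], a[k][j] = c * aki - s * akj, s * aki + c * akj
                for k in range(p):  # A <- J^T A (rows i and j)
                    aik, ajk = a[i][k], a[j][k]
                    a[i][k], a[j][k] = c * aik - s * ajk, s * aik + c * ajk
    return sorted((a[i][i] for i in range(p)), reverse=True)


def kaiser_criterion(eigs):
    """Retain as many factors as there are eigenvalues above one."""
    return sum(1 for e in eigs if e > 1.0)


def parallel_analysis(data, iterations=20, seed=0):
    """Horn-style parallel analysis on correlation-matrix eigenvalues.

    Observed eigenvalues are compared with the mean eigenvalues of random
    normal data of the same size; factors are counted from the largest
    until an observed eigenvalue no longer exceeds its random counterpart.
    """
    rng = random.Random(seed)
    n, p = len(data), len(data[0])
    observed = eigenvalues_symmetric(correlation_matrix(data))
    random_mean = [0.0] * p
    for _ in range(iterations):
        noise = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
        for k, e in enumerate(eigenvalues_symmetric(correlation_matrix(noise))):
            random_mean[k] += e / iterations
    retained = 0
    for obs, rnd in zip(observed, random_mean):
        if obs <= rnd:
            break
        retained += 1
    return retained


def simulate_two_factor_data(n=300, loading=0.8, seed=1):
    """Hypothetical data: items 0-2 measure factor A, items 3-5 measure factor B."""
    rng = random.Random(seed)
    unique = math.sqrt(1 - loading ** 2)
    rows = []
    for _ in range(n):
        f_a, f_b = rng.gauss(0, 1), rng.gauss(0, 1)
        rows.append([loading * f_a + unique * rng.gauss(0, 1) for _ in range(3)]
                    + [loading * f_b + unique * rng.gauss(0, 1) for _ in range(3)])
    return rows


data = simulate_two_factor_data()
eigs = eigenvalues_symmetric(correlation_matrix(data))
print("Eigenvalues:", [round(e, 2) for e in eigs])
print("Kaiser:", kaiser_criterion(eigs), "| Parallel analysis:", parallel_analysis(data))
```

With two clearly separated item groups, both rules should point to two factors. With only two items per factor, as in the Volvo Cars model, the eigenvalues are much less distinct, which is part of why the factor structure is hard to recover.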
For the second part, three plots were created using R. First, the mean and spread were plotted for each survey, question by question, showing whether the organization improved between measurements. Second, the results for each question pair were plotted with a confidence interval for each measurement point, so that the maturity levels could be evaluated by checking that level one increased before level two, and so on. Lastly, a plot like the previous one was created, but with each question plotted separately, revealing differences within the question pairs.

3.3.3 RQ3: Free-text analysis
The data analysis workflow for the free-text analysis is shown in Figure 3.3.

[Workflow diagram: Groom the data; Familiarize with the data, Generate initial codes, and Searching for themes (R1 and R2); Review themes (review themes coherence, verify themes against data); Define and name themes (refine theme names, ensure themes do not overlap, explore potential sub-themes)]
Figure 3.3: The data analysis workflow for RQ3.

The free-text answers were analyzed to answer the third research question: what do the participants think of the 10-question agility survey? The answers from the different measurements were compared to see whether the employees’ attitudes changed. If attitudes changed, the free-text answers would be analyzed further to understand the reasons behind the change.

The free-text answers were systematically coded with thematic analysis. Braun and Clarke [66] define thematic analysis as “a method for identifying, analysing and reporting patterns (themes) within data”, and the first five of their six phases were followed during the analysis. The sixth phase, producing the report, was not included, as it merely contains guidelines on how to present the data. Duplicate and empty free-text answers were removed before the first phase to reduce the number of data points to be analyzed.
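The removal of empty and duplicate free-text answers described above can be sketched as follows. The function name and sample comments are hypothetical, and the sketch assumes that answers count as duplicates when they are identical after stripping surrounding whitespace; the thesis does not state whether any further normalization (for example, of letter case) was applied.

```python
def groom_free_text(answers):
    """Remove empty and duplicate free-text answers, keeping first occurrences.

    Answers are compared after stripping surrounding whitespace; any other
    normalization (such as ignoring case) is deliberately not applied here.
    """
    seen = set()
    kept = []
    for answer in answers:
        text = (answer or "").strip()
        if not text or text in seen:
            continue  # empty, missing, or already kept
        seen.add(text)
        kept.append(text)
    return kept


comments = ["Good survey", "", None, "  Good survey ", "Too many questions"]
print(groom_free_text(comments))  # ['Good survey', 'Too many questions']
```

Keeping the first occurrence preserves the original answer order, which makes it easy to trace a kept answer back to its survey.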
As described by Braun and Clarke [66], the first phase was to become familiar with the dataset, which meant actively reading through the free-text answers to gain an understanding and find patterns while taking notes. Since the dataset is large, the free-text answers related to the survey were marked so that they could be extracted to a separate dataset. This new dataset was then used for the remaining steps.

When a basic understanding of the data has been established and an initial list of interesting insights exists, phase two can begin [66]. Braun and Clarke further explain that in phase two, coding the essential features of each answer provides the basis for the analysis, and that codes are words or short sentences that summarize a piece of text [66]. See Table 3.1 for an example from Saad et al. [6] in their study of UX work in software startups. The coding can be done either with a theme in mind or based wholly on the data. In our case, the theme is the participants’ view of how Volvo Cars measured agility. Each answer was read thoroughly, and codes were noted down in a document.

| Original text | Codes |
|---|---|
| human-centered design is not fit for small companies because of lack of resources. | human-centered design is not fit for small companies, lack of resources |
| What discount usability and UX methods are promising for start-ups and Small to Medium Enterprises? | usability, UX methods |
| [about customer research] what to do if feedback is negative? How to get the information you need, inexpensively and quickly? How much validation is sufficient? | if feedback is negative, inexpensively and quickly, how much validation |

Table 3.1: Example of thematic analysis codes from Saad et al. [6].

After the data had been fully coded, phase three began. In this phase, the codes were grouped based on similarities that formed patterns [66].
Once a list of candidate themes had been established, the themes were refined, which Braun and Clarke [66] call phase four. They divide this phase into two levels of refinement. At level one, the codes inside a theme are inspected to determine whether they coherently support the theme. Level two is similar, but at a higher level of abstraction: the themes are compared to the dataset instead of the codes [66]. During this phase, if a theme had too few supporting data points or the data was too diverse, the theme was either removed altogether or merged with another to create a more high-level theme. Where the data supported it, some themes were broken down into smaller ones. During level two, the themes and their relation to the data were inspected by reading through the dataset. A few new codes can also be generated during this read-through if they were missed in the earlier stages [66]. The result of this phase was a more defined thematic map.

The last phase is to refine the theme names to capture the essence of what the data describes [66]. Going through the data for each theme and determining what data is interesting and why ensures that the themes align with the research questions and do not overlap too much [66]. The relations between the themes are therefore crucial to consider in this phase. During this phase, exploring potential sub-themes that emerged made complex themes easier to understand [66]. This phase resulted in a more defined thematic map and clearly defined themes.

3.4 Research ethics
The only data kept and analyzed were the data points, the free-text answers, and information on which survey the data belonged to. All other data was discarded to keep the participants as anonymous as possible. To further strengthen anonymity, no individual data point was analyzed in isolation or presented in the study, and no free-text answer was quoted.
The high number of data points, combined with the anonymization steps above, makes it hard, if not impossible, to identify the participants of the study.

4 Results
This chapter presents the results from our data collection and the analysis of the research questions. First, the surveys and internal documentation from Volvo Cars are presented. The RQ1 section provides tables showing how the questions relate to the literature. The RQ2 section uses plots and tables to show whether there is statistical support for the model. Lastly, the RQ3 section presents our findings from the thematic analysis.

4.1 Results of the data collection
The data obtained from Volvo Cars shows that the following questions were used in their 10-question surveys:

| # | Question |
|---|---|
| 1 | I understand why my team needs to move towards Agile ways of working. |
| 2 | I understand why Volvo Cars Product Creation is moving towards agile ways of working. |
| 3 | In my team, we regularly reflect and learn to improve. |
| 4 | Leaders around my team support our improvement efforts. |
| 5 | In my team, we understand how to move towards agile ways of working. |
| 6 | I have enough knowledge about agile ways of working to do my job. |
| 7 | In my team, we are transparent with our real-time progress and status. |
| 8 | I can find real-time information needed for my work. |
| 9 | In my team we pull from a prioritized backlog according to our capacity. |
| 10 | Leaders around my team avoid pushing work over our capacity. |

Table 4.1: Survey questions used by Volvo Cars to measure agility.

Between the first and second surveys, one question was refined and the order of the questions was changed. In Q2, the term “P&Q” (Product Creation and Quality) was replaced with “Volvo Cars Product Creation”, as P&Q was changed after a reorganization in the company. Q9 and Q10 were moved to Q3 and Q4, shifting the former Q3-Q8 two positions so that they became Q5-Q10. The questions were reordered so that the results from the first survey showed a staircase maturity model.
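Because the first survey used a different question order, its responses need to be renumbered before they can be compared with the later surveys. The minimal sketch below derives the renumbering from the move described above (Q9 and Q10 became Q3 and Q4, and Q3-Q8 became Q5-Q10). The dictionary `FIRST_TO_LATER` and the function `align_first_survey` are hypothetical names; the thesis does not say how the realignment was carried out.

```python
# Survey #1 question number -> number used from survey #2 onwards:
# Q9 and Q10 moved to Q3 and Q4, and Q3-Q8 shifted two steps to Q5-Q10.
FIRST_TO_LATER = {1: 1, 2: 2, 9: 3, 10: 4}
FIRST_TO_LATER.update({old: old + 2 for old in range(3, 9)})


def align_first_survey(response):
    """Renumber one survey #1 response {question: score} to the later order."""
    return {FIRST_TO_LATER[q]: score for q, score in response.items()}


print(align_first_survey({1: 5, 3: 4, 9: 6}))  # {1: 5, 5: 4, 3: 6}
```

Expressing the change as a single lookup table keeps the mapping a bijection over Q1-Q10, which is easy to check.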
Each question was answered on a Likert scale from 1 to 6, ranging from strongly disagree to strongly agree, with one additional alternative for when none of the levels applied. The response alternatives were:
1. Strongly disagree
2. Disagree
3. Somewhat disagree
4. Somewhat agree
5. Agree
6. Strongly agree
• Not applicable to me

The survey was designed according to Volvo Cars’ maturity model, where each pair of questions represents one level in a maturity staircase. This design resulted in five maturity levels, whose backgrounds are shown below:
• (Q1, Q2) Why go agile – To really embrace agile we need to understand why it is important for us and understand what it means in our team.
• (Q3, Q4) Continuous learning – We need to reflect and continuously improve, and we need management to enable such efforts.
• (Q5, Q6) Prerequisites to go agile – For us to be able to move to agile we need to have knowledge and know how to change.
• (Q7, Q8) Real-time transparency – Agile ways of working are dependent on having data available in real-time, and we also need to share our own data in the same way for others to see.
• (Q9, Q10) Leveraging agile – Obtaining empowered teams implies we pull tasks from a prioritized backlog and that no one else in the organization pushes us over our capacity.

The survey data consisted of 14,580 data points in total. Table 4.2 shows the number of data points in each survey and how many remained after list-wise deletion. Table 4.3 shows the number of free-text answers (including empty answers) and how many remained after list-wise deletion and duplicate deletion. After data grooming, far fewer free-text answers than data points remained: in total, 26.0% of the free-text answers remained, compared to 88.5% of the data points.
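The Likert coding, the five question pairs, and the list-wise deletion and confidence-interval plots from Section 3.3.2 can be tied together in a short sketch. It rests on several assumptions that the thesis does not state: "Not applicable to me" and blank answers are treated as missing, a level's score is each respondent's average over its two questions, and the interval is a normal-approximation 95% interval (z = 1.96) rather than whatever the R plots used. All names and the sample responses are hypothetical.

```python
import math
import statistics

# Response alternatives from the text; "Not applicable to me" (or a blank)
# is assumed here to count as a missing answer.
LIKERT = {
    "Strongly disagree": 1, "Disagree": 2, "Somewhat disagree": 3,
    "Somewhat agree": 4, "Agree": 5, "Strongly agree": 6,
}
# Maturity level -> the question pair that represents it.
LEVELS = {
    "Why go agile": (1, 2),
    "Continuous learning": (3, 4),
    "Prerequisites to go agile": (5, 6),
    "Real-time transparency": (7, 8),
    "Leveraging agile": (9, 10),
}


def encode(raw):
    """Map one respondent's {question: label} answers to scores (None = missing)."""
    return {q: LIKERT.get(raw.get(q)) for q in range(1, 11)}


def listwise_delete(rows):
    """Keep only respondents with a 1-6 score on every question."""
    return [r for r in rows if all(score is not None for score in r.values())]


def level_mean_ci(rows, pair, z=1.96):
    """Mean of the per-respondent pair average with a normal-approximation 95% CI."""
    scores = [(r[pair[0]] + r[pair[1]]) / 2 for r in rows]
    mean = statistics.mean(scores)
    half_width = z * statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, mean - half_width, mean + half_width


# Hypothetical responses; the second respondent is dropped by list-wise deletion.
raw_answers = [
    {q: "Agree" for q in range(1, 11)},
    {**{q: "Somewhat agree" for q in range(1, 11)}, 10: "Not applicable to me"},
    {q: "Strongly agree" for q in range(1, 11)},
    {q: "Disagree" for q in range(1, 11)},
]
rows = listwise_delete([encode(r) for r in raw_answers])
for level, pair in LEVELS.items():
    mean, low, high = level_mean_ci(rows, pair)
    print(f"{level}: {mean:.2f} [{low:.2f}, {high:.2f}]")
```

Computing one interval per level and measurement is what allows the staircase check: level one should rise before level two, and so on. With thousands of respondents per survey, the normal approximation is reasonable; with very small groups, a t-based interval would be the safer choice.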
| Survey | Data points originally | Data points after list-wise deletion | % of remaining data points |
|---|---|---|---|
| #1 | 3,107 | 2,711 | 87.3% |
| #2 | 3,767 | 3,261 | 86.6% |
| #3 | 4,046 | 3,671 | 90.7% |
| #4 | 3,660 | 3,261 | 89.1% |
| #1-4 | 14,580 | 12,905 | 88.5% |

Table 4.2: The number of data points originally and after list-wise deletion.

| Survey | Free-text answers originally | Free-text answers after list-wise deletion | Free-text answers after duplicate deletion | % of remaining free-text answers |
|---|---|---|---|---|
| #1 | 3,107 | 602 | 598 | 19.2% |
| #2 | 3,767 | 774 | 759 | 20.1% |
| #3 | 4,046 | 1,465 | 1,423 | 35.2% |
| #4 | 3,660 | 1,023 | 1,014 | 27.7% |
| #1-4 | 14,580 | 3,864 | 3,794 | 26.0% |

Table 4.3: The number of free-text answers originally, after list-wise deletion, and after duplicate deletion.

4.2 Analysis

In this section, each research question is analyzed separately, based on the results, in its own subsection.

4.2.1 RQ1: What parts of agility are covered in the 10-question survey?

The keywords extracted from the survey questions are shown in Table 4.4.

| Question | Keywords |
|---|---|
| I understand why my team needs to move towards agile ways of working. | understand why, my team, move towards agile |
| I understand why Volvo Cars Product Creation is moving towards agile ways of working. | understand why, Product Creation, moving towards agile |
| In my team, we regularly reflect and learn to improve. | regularly reflect, learn to improve |
| Leaders around my team support our improvement efforts. | Leaders, support, improvement efforts |
| In my team, we understand how to move towards agile ways of working. | understand how, move towards agile |
| I have enough knowledge about agile ways of working to do my job. | enough knowledge about agile |
| In my team, we are transparent with our real-time progress and status. | transparent, real-time progress and status |
| I can find real-time information needed for my work. | find real-time information |
| In my team we pull from a prioritized backlog according to our capacity. | pull prioritized backlog, according to our capacity |
| Leaders around my team avoid pushing work over our capacity. | Leaders, avoid pushing, over our capacity |

Table 4.4: The keywords extracted from the survey questions.

The rest of this section presents the results of the mapping process. Table 4.5 is empty, since no connections could be found between the survey questions and the agile values.

| Agile values | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 | Q8 | Q9 | Q10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Individuals and interactions over processes and tools. |  |  |  |  |  |  |  |  |  |  |
| Working software over comprehensive documentation. |  |  |  |  |  |  |  |  |  |  |
| Customer collaboration over contract negotiation. |  |  |  |  |  |  |  |  |  |  |
| Responding to change over following a plan. |  |  |  |  |  |  |  |  |  |  |

Table 4.5: Mapping of the agile values (Table 2.1) to the survey questions (Table 4.1).

In Table 4.6, by contrast, some connections can be seen. Two of them are clear: Q3 to principle 12 and Q4 to principle 5. Q4 also connects to principle 12, but only vaguely. Lastly, Q9 and Q10 map vaguely to principle 8.

| Agile principle | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 | Q8 | Q9 | Q10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Our highest priority is to satisfy the customer through early and continuous delivery of valuable software. |  |  |  |  |  |  |  |  |  |  |
| Welcome changing requirements, even late in development. Agile processes harness change for the customer’s competitive advantage. |  |  |  |  |  |  |  |  |  |  |
| Deliver working software frequently, from a couple of weeks to a couple of months, with a preference for the shorter timescale. |  |  |  |  |  |  |  |  |  |  |
| Business people and developers must work together daily throughout the project. |  |  |  |  |  |  |  |  |  |  |
| Build projects around motivated individuals. Give them the environment and support they need, and trust them to get the job done. |  |  |  | X |  |  |  |  |  |  |
| The most efficient and effective method of conveying information to and within a development team is face-to-face conversation. |  |  |  |  |  |  |  |  |  |  |
| Working software is the primary measure of progress. |  |  |  |  |  |  |  |  |  |  |
| Agile processes promote sustainable development. The sponsors, developers, and users should be able to maintain a constant pace indefinitely. |  |  |  |  |  |  |  |  | (X) | (X) |
| Continuous attention to technical excellence and good design enhances agility. |  |  |  |  |  |  |  |  |  |  |
| Simplicity–the art of maximizing the amount of work not done–is essential. |  |  |  |  |  |  |  |  |  |  |
| The best architectures, requirements, and designs emerge from self-organizing teams. |  |  |  |  |  |  |  |  |  |  |
| At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly. |  |  | X | (X) |  |  |  |  |  |  |

Table 4.6: Mapping of the agile principles (Table 2.2) to the survey questions (Table 4.1).

When mapping the questions to the agile definitions, no strong connections could be made, only a few vague ones, shown in Table 4.7. Both Q3 and Q4 map to the definition of Schuh et al. [30]. Q3 also maps to the definitions of Subramaniam and Hunt [32] and Denning et al. [35].

| Agile definition | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 | Q8 | Q9 | Q10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Cockburn 2001 [26] |  |  |  |  |  |  |  |  |  |  |
| Highsmith 2002 [27] |  |  |  |  |  |  |  |  |  |  |
| Anderson 2003 [28] |  |  |  |  |  |  |  |  |  |  |
| Larman 2004 [29] |  |  |  |  |  |  |  |  |  |  |
| Schuh 2004 [30] |  |  | (X) | (X) |  |  |  |  |  |  |
| Lyytinen 2006 [31] |  |  |  |  |  |  |  |  |  |  |
| Subramaniam and Hunt 2006 [32] |  |  | (X) |  |  |  |  |  |  |  |
| Ambler 2007 [33] |  |  |  |  |  |  |  |  |  |  |
| Kruchten 2013 [34] |  |  |  |  |  |  |  |  |  |  |
| Denning 2016 [35] |  |  | (X) |  |  |  |  |  |  |  |
| Gren et al. 2020 [36] |  |  |  |  |  |  |  |  |  |  |

Table 4.7: Mapping of the agile definitions (Table 2.3) to the survey questions (Table 4.1).

Lastly, four questions map to the SAFe principles [5]. Table 4.8 shows that Q3 maps strongly to principle 4, while Q7, Q9, and Q10 map more vaguely to principle 6.

| SAFe principle | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 | Q8 | Q9 | Q10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Take an economic view. |  |  |  |  |  |  |  |  |  |  |
| Apply systems thinking. |  |  |  |  |  |  |  |  |  |  |
| Assume variability; preserve options. |  |  |  |  |  |  |  |  |  |  |
| Build incrementally with fast, integrated learning cycles. |  |  | X |  |  |  |  |  |  |  |
| Base milestones on objective evaluation of working systems. |  |  |  |  |  |  |  |  |  |  |
| Visualize and limit WIP, reduce batch sizes, and manage queue lengths. |  |  |  |  |  |  | (X) |  | (X) | (X) |
| Apply cadence, synchronize with cross-domain planning. |  |  |  |  |  |  |  |  |  |  |
| Unlock the intrinsic motivation of knowledge workers. |  |  |  |  |  |  |  |  |  |  |
| Decentralize decision-making. |  |  |  |  |  |  |  |  |  |  |
| Organize around value. |  |  |  |  |  |  |  |  |  |  |
Table 4.8: Mapping of the SAFe principles (Table 2.4) to the survey questions (Table 4.1).

Table 4.9 summarizes which agile sources did not map to any of the survey questions.

| Agile source | Items that did not map to any survey question | Share that did not map |
|---|---|---|
| Agile values | 1-4 | 4 of 4 (100%) |
| Agile principles | 1-4, 6-7, 9-11 | 9 of 12 (75%) |
| Agile definitions | 1-4, 6, 8-9, 11 | 8 of 11 (72.7%) |
| SAFe principles | 1-3, 5, 7-10 | 8 of 10 (80%) |

Table 4.9: The agile sources that did not map to any of the survey questions.

4.2.2 RQ2: Are the question pairs reflecting the maturity steps statistically?

This section presents the results of RQ2, using graphs such as scree plots together with text describing the outcome. First, an exploratory factor analysis was conducted. The parallel analysis results in Figure 4.1 support four factors for measurement one and five factors for each of measurements two, three, and four.

[Figure 4.1 panels: (a) First, (b) Second, (c) Third, and (d) Fourth Measurement; x-axis: factor; y-axis: eigenvalue; series: data and simulated data from parallel analysis.]

Figure 4.1: Scree plots from the different measurement points paired with parallel analysis, used to determine the number of statistically supported factors.
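Parallel analysis compares the eigenvalues of the observed data with those of random data of the same size, and retains factors only while the observed eigenvalue is larger than the simulated one. The sketch below is a simplified, component-based variant of Horn's procedure written with NumPy. It is not necessarily the implementation behind Figure 4.1 (factor-analysis variants of the procedure work on a reduced correlation matrix instead), and the two-factor data in the usage example are synthetic.

```python
import numpy as np


def parallel_analysis(data, n_iter=100, seed=0):
    """Simplified Horn-style parallel analysis.

    data: array of shape (n_respondents, n_items).
    Returns the observed eigenvalues, the mean eigenvalues of random normal data
    of the same shape, and the number of leading factors whose observed
    eigenvalue exceeds the simulated one.
    """
    rng = np.random.default_rng(seed)
    n, p = data.shape
    # eigvalsh returns ascending eigenvalues; reverse to get largest first.
    observed = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    simulated = np.empty((n_iter, p))
    for i in range(n_iter):
        noise = rng.standard_normal((n, p))
        simulated[i] = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))[::-1]
    reference = simulated.mean(axis=0)
    # Count factors until the first observed eigenvalue that does not beat chance.
    n_factors = 0
    for obs, ref in zip(observed, reference):
        if obs <= ref:
            break
        n_factors += 1
    return observed, reference, n_factors


# Synthetic example: 10 items driven by two independent latent factors.
gen = np.random.default_rng(1)
latent = gen.standard_normal((2000, 2))
weights = np.zeros((2, 10))
weights[0, :5] = 0.8  # items 1-5 load on the first factor
weights[1, 5:] = 0.8  # items 6-10 load on the second factor
items = latent @ weights + 0.6 * gen.standard_normal((2000, 10))
observed, reference, k = parallel_analysis(items)
print(k)  # 2
```

Comparing against the 95th percentile of the simulated eigenvalues instead of their mean is an equally common, slightly more conservative convention.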
The factor correlation matrices in Tables 4.10, 4.11, 4.12, and 4.13 confirm our assumption that the factors are correlated, which statistically supports the decision to use an oblique rotation. The correlations range from 0.304 to 0.803, with most around 0.6.

| | Factor 1 | Factor 2 | Factor 3 | Factor 4 |
|---|---|---|---|---|
| Factor 1 | 1.000 | 0.381 | 0.687 | 0.684 |
| Factor 2 | 0.381 | 1.000 | 0.558 | 0.342 |
| Factor 3 | 0.687 | 0.558 | 1.000 | 0.495 |
| Factor 4 | 0.684 | 0.342 | 0.495 | 1.000 |

Table 4.10: Factor correlations measurement one.

| | Factor 1 | Factor 2 | Factor 3 | Factor 4 | Factor 5 |
|---|---|---|---|---|---|
| Factor 1 | 1.000 | 0.531 | 0.663 | 0.451 | 0.304 |
| Factor 2 | 0.531 | 1.000 | 0.796 | 0.683 | 0.618 |
| Factor 3 | 0.663 | 0.796 | 1.000 | 0.580 | 0.506 |
| Factor 4 | 0.451 | 0.683 | 0.580 | 1.000 | 0.540 |
| Factor 5 | 0.304 | 0.618 | 0.506 | 0.540 | 1.000 |

Table 4.11: Factor correlations measurement two.

| | Factor 1 | Factor 2 | Factor 3 | Factor 4 | Factor 5 |
|---|---|---|---|---|---|
| Factor 1 | 1.000 | 0.432 | 0.641 | 0.507 | 0.376 |
| Factor 2 | 0.432 | 1.000 | 0.759 | 0.774 | 0.672 |
| Factor 3 | 0.641 | 0.759 | 1.000 | 0.791 | 0.609 |
| Factor 4 | 0.507 | 0.774 | 0.791 | 1.000 | 0.679 |
| Factor 5 | 0.376 | 0.672 | 0.609 | 0.679 | 1.000 |

Table 4.12: Factor correlations measurement three.

| | Factor 1 | Factor 2 | Factor 3 | Factor 4 | Factor 5 |
|---|---|---|---|---|---|
| Factor 1 | 1.000 | 0.470 | 0.626 | 0.332 | 0.417 |
| Factor 2 | 0.470 | 1.000 | 0.803 | 0.683 | 0.681 |
| Factor 3 | 0.626 | 0.803 | 1.000 | 0.616 | 0.632 |
| Factor 4 | 0.332 | 0.683 | 0.616 | 1.000 | 0.609 |
| Factor 5 | 0.417 | 0.681 | 0.632 | 0.609 | 1.000 |

Table 4.13: Factor correlations measurement four.

When reading the factor loading tables, keep in mind that, because an oblique rotation was used, a loading of 0.9 does not mean that the item relates 90% to that factor. Rather, it should be read as the increase, in standardized units, of the measured variable for each standardized unit of increase in the common factor [54]. A cutoff of 0.4 was chosen for the rotated factor loadings to simplify the interpretation of the factor analysis tables. Sources recommend different cutoffs, from 0.3 to 0.7, but 0.4 appears to be the most common [67].
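Applying the 0.4 cutoff amounts to hiding every loading whose absolute value is below 0.4. This is why some items, such as Q9 in Table 4.14, end up with no visible loading at all. The small sketch below uses invented loadings. It assumes the cutoff is inclusive, so a loading of exactly 0.4 is kept, which the text does not specify.

```python
def apply_cutoff(loadings, cutoff=0.4):
    """Keep, per item, only the loadings whose absolute value reaches the cutoff.

    loadings: {item: {factor: loading}}. The cutoff is assumed to be inclusive.
    """
    return {item: {factor: value for factor, value in row.items()
                   if abs(value) >= cutoff}
            for item, row in loadings.items()}


def unassigned_items(loadings, cutoff=0.4):
    """Items with no loading at or above the cutoff on any factor."""
    return [item for item, row in apply_cutoff(loadings, cutoff).items() if not row]


# Invented pattern loadings for three items on two factors.
example = {
    "q1": {"F1": 0.05, "F2": 0.88},
    "q3": {"F1": 0.62, "F2": -0.12},
    "q9": {"F1": 0.31, "F2": 0.27},
}
print(apply_cutoff(example))      # {'q1': {'F2': 0.88}, 'q3': {'F1': 0.62}, 'q9': {}}
print(unassigned_items(example))  # ['q9']
```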
Table 4.14 presents the factor loadings for measurement one. This is the only measurement for which just four factors are supported, and it places Q3, Q4, Q7, and Q8 in the same factor. The table also shows that Q9 does not load sufficiently onto any of the factors. Q1 and Q2 are grouped in factor two, Q5 and Q6 in factor three, and Q10 is alone in factor four.

| | Factor 1 | Factor 2 | Factor 3 | Factor 4 | Uniqueness |
|---|---|---|---|---|---|
| q1 |  | 0.875 |  |  | 0.127 |
| q2 |  | 0.829 |  |  | 0.273 |
| q3 | 0.823 |  |  |  | 0.433 |
| q4 | 0.618 |  |  |  | 0.484 |
| q5 |  |  | 0.592 |  | 0.360 |
| q6 |  |  | 0.523 |  | 0.589 |
| q7 | 0.640 |  |  |  | 0.493 |
| q8 | 0.432 |  |  |  | 0.614 |
| q9 |  |  |  |  | 0.537 |
| q10 |  |  |  | 0.885 | 0.309 |

Table 4.14: Factor loadings measurement one.

In measurement two (Table 4.15), Q1 and Q2 load highly onto factor one. Q3, Q7, and Q8 load onto factor two, while Q5 alone loads onto factor three. Q6 does not load sufficiently onto any of the factors. Q9 and Q10 load well onto factor four, and lastly, Q4 loads onto factor five.

| | Factor 1 | Factor 2 | Factor 3 | Factor 4 | Factor 5 | Uniqueness |
|---|---|---|---|---|---|---|
| q1 | 0.937 |  |  |  |  | 0.169 |
| q2 | 0.988 |  |  |  |  | 0.178 |
| q3 |  | 0.541 |  |  |  | 0.402 |
| q4 |  |  |  |  | 0.775 | 0.176 |
| q5 |  |  | 1.108 |  |  | 0.028 |
| q6 |  |  |  |  |  | 0.591 |
| q7 |  | 0.995 |  |  |  | 0.361 |
| q8 |  | 0.493 |  |  |  | 0.545 |
| q9 |  |  |  | 0.422 |  | 0.430 |
| q10 |  |  |  | 0.810 |  | 0.388 |

Table 4.15: Factor loadings measurement two.

In the third measurement (Table 4.16), the loadings form a diagonal pattern, with Q1 and Q2 once again loading strongly onto their own factor, factor one. Q3 and Q4 load onto factor two, while Q5 once again has its own factor. Neither Q6 nor Q9 loads strongly enough onto any factor to be shown in the table. Q7 and Q8 pair together into factor two. Lastly, Q10 sits alone in factor five.

| | Loading (≥ 0.4) | Uniqueness |
|---|---|---|
| q1 | 0.904 | 0.185 |
| q2 | 0.990 | 0.136 |
| q3 | 1.066 | 0.126 |
| q4 | 0.475 | 0.405 |
| q5 | 1.091 | 0.106 |
| q6 |  | 0.570 |
| q7 | 0.528 | 0.425 |
| q8 | 1.020 | 0.290 |
| q9 |  | 0.468 |
| q10 | 0.990 | 0.188 |

Table 4.16: Factor loadings measurement three.

In the last measurement (Table 4.17), {Q1, Q2}, {Q3, Q4}, and {Q7, Q8} each form one factor.
Q6 and Q9 once again do not load strongly enough onto any factor to appear in the table, while Q5 and Q10 each load alone onto their own factor.

| | Loading (≥ 0.4) | Uniqueness |
|---|---|---|
| q1 | 0.870 | 0.218 |
| q2 | 0.962 | 0.202 |
| q3 | 0.411 | 0.387 |
| q4 | 0.800 | 0.230 |
| q5 | 0.991 | 0.171 |
| q6 |  | 0.657 |
| q7 | 1.013 | 0.308 |
| q8 | 0.528 | 0.558 |
| q9 |  | 0.474 |
| q10 | 0.887 | 0.300 |

Table 4.17: Factor loadings measurement four.

When an oblique rotation is used, a structure matrix is also provided. These matrices are generally not analyzed in research [54], so, due to time constraints, we did not examine them further. They are, however, included in Appendix B.

Figure 4.2, with interval plots of Q1 and Q2, shows that the results for surveys two and three differ significantly from those of the preceding survey.

[Figure 4.2 panels: (a) Interval plot on Q1; (b) Interval plot on Q2; x-axis: survey (1-4); y-axis: Q1 and Q2, respectively.]

Figure 4.2: Interval plots on Q1 and Q2 across the four surveys.

Figure 4.3, with interval plots of Q3 and Q4, shows that the results for surveys two and three differ significantly from those of the preceding survey. The pattern is quite similar across the questions, with the exception of Q3 in survey two.

[Figure 4.3 panels: (a) Interval plot on Q3; (b) Interval plot on Q4; x-axis: survey (1-4); y-axis: Q3 and Q4, respectively.]

Figure 4.3: Interval plots on Q3 and Q4 across the four surveys.

Figure 4.4, with interval plots of Q5 and Q6, shows that the results for survey three differ significantly from those of the preceding survey. Survey two differed significantly from survey one only for Q6, not for Q5.

[Figure 4.4 panels: (a) Interval plot on Q5; (b) Interval plot on Q6; x-axis: survey (1-4); y-axis: Q5 and Q6, respectively.]

Figure 4.4: Interval plots on Q5 and Q6 across the four surveys.
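The interval plots rest on a mean and a 95% confidence interval per question and survey. The minimal sketch below uses a normal approximation, which is reasonable for samples of several thousand respondents. This is our assumption, not a formula stated in the thesis. The sketch also includes the overlap check that we assume underlies reading two surveys as “significantly different”. Non-overlapping 95% intervals are a conservative criterion: overlapping intervals can still hide a significant difference. The answer lists are invented.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ci(values, level=0.95):
    """Mean with a normal-approximation confidence interval: (mean, low, high)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    m = mean(values)
    half_width = z * stdev(values) / sqrt(len(values))
    return m, m - half_width, m + half_width


def intervals_overlap(a, b):
    """True if two (mean, low, high) intervals share at least one point."""
    return a[1] <= b[2] and b[1] <= a[2]


survey_1 = [4, 5, 5, 3, 6, 4, 5, 4]  # invented answers on the 1-6 scale
survey_2 = [5, 6, 5, 5, 6, 5, 6, 5]
ci_1, ci_2 = mean_ci(survey_1), mean_ci(survey_2)
print(ci_1, ci_2, intervals_overlap(ci_1, ci_2))
```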
Figure 4.5, with interval plots of Q7 and Q8, shows that the results for surveys two and three differ significantly from those of the preceding survey. The fourth survey differed significantly from the third only for Q8, not for Q7.

[Figure 4.5 panels: (a) Interval plot on Q7; (b) Interval plot on Q8; x-axis: survey (1-4); y-axis: Q7 and Q8, respectively.]

Figure 4.5: Interval plots on Q7 and Q8 across the four surveys.

Figure 4.6, with interval plots of Q9 and Q10, shows that for Q9, surveys two, three, and four each differ significantly from the preceding survey. For Q10, only the third survey shows a significant difference.

[Figure 4.6 panels: (a) Interval plot on Q9; (b) Interval plot on Q10; x-axis: survey (1-4); y-axis: Q9 and Q10, respectively.]

Figure 4.6: Interval plots on Q9 and Q10 across the four surveys.

Grouping the questions as proposed in the model and calculating the mean and 95% confidence interval of each group reveals whether the organization matured as intended. Figure 4.7 and its zoomed-in version, Figure 4.8, show that the question groupings generally mature quite well according to the model. In survey one, there is no significant difference between Q1-Q2 and Q3-Q4, while in survey two, Q1-Q2 and Q3-Q4 are on the same level, as are Q5-Q6 and Q7-Q8. The numbers in Table C.1 (Appendix C) show that the differences between them are negligible.

[Figure 4.7: x-axis: question group (Q1-Q2, Q3-Q4, Q5-Q6, Q7-Q8, Q9-Q10); y-axis: mean; one series per survey (1-4).]

Figure 4.7: Combined interval plot of the question groupings.

[Figure 4.8: same axes and series as Figure 4.7, with the y-axis zoomed in.]

Figure 4.8: Zoomed-in version of the combined interval plot of the question groupings.

Figure 4.9 and its zoomed-in version, Figure 4.10, show how each individual question developed. Overall, Q8, Q9, and Q10 consistently have a lower maturity than the other questions.
Q6 and Q7 are usually significantly higher than Q4 and Q5. Q5 has a low mean compared to the surrounding questions throughout all the surveys. Another remark is that Q7 and Q8 show the largest difference within any question pair. From the first survey to the second, Q3 leaped to become the top-scoring question. From survey two to survey three, Q3 moves closer to its pair partner Q4.

[Figure 4.9: x-axis: question (Q1-Q10); y-axis: mean; one series per survey (1-4).]

Figure 4.9: Combined interval plot of the survey questions.

[Figure 4.10: same axes and series as Figure 4.9, with the y-axis zoomed in.]

Figure 4.10: Zoomed-in version of the combined interval plot of the survey questions.

4.2.3 RQ3: What do the participants think of the 10-question agility survey?

This section presents the results of RQ3. The themes and sub-themes are explained, and example sentences are included to clarify what types of answers were assigned to each sub-theme.

Table 4.18 shows the total number of answers (from Table 4.3), how many answers were extracted from them, and how both are distributed across the four surveys. Among the total answers, survey one has the fewest with 598 (15.8%), and survey three has the most with 1,423 (37.5%). Among the extracted answers, survey one has the fewest with 48 (15.0%), and survey two has the most with 152 (47.6%).

| Survey | Total answers | % of total answers | Extracted answers | % of extracted answers |
|---|---|---|---|---|
| #1 | 598 | 15.8% | 48 | 15.0% |
| #2 | 759 | 20.0% | 152 | 47.6% |
| #3 | 1,423 | 37.5% | 67 | 21.0% |
| #4 | 1,014 | 26.7% | 52 | 16.3% |
| #1-4 | 3,794 | 100% | 319 | 100% |

Table 4.18: The distribution of answers between the four surveys.

Table 4.19 shows the percentage of the total answers that were extracted.
The last row of the table combines all four surveys and provides a baseline against which each survey can be compared. The strongest deviations from the average of 8.4% extracted answers are found in survey two (20.0%) and survey three (4.7%).

| Survey | Total answers | Extracted answers | Extracted / total answers |
|---|---|---|---|
| #1 | 598 | 48 | 8.0% |
| #2 | 759 | 152 | 20.0% |
| #3 | 1,423 | 67 | 4.7% |
| #4 | 1,014 | 52 | 5.1% |
| #1-4 | 3,794 | 319 | 8.4% |

Table 4.19: The percentage of the total answers that were extracted.

Below are five thematic maps showing what the participants think of the 10-question agility survey. One thematic map is based on all the free-text answers, and the other four are based on the free-text answers from each survey.

The thematic map based on all free-text answers (Figure 4.11), with a total of 319 answers, has 43 positive, 125 neutral/unclear, and 151 negative answers. The largest sub-theme is wrong/missing questions, with 47.3% of the extracted answers relating to it.

[Figure 4.11 content. Positive (43): tracking progress (13), no reason given (12), good number of questions (3), good questions (11), possibility to share ideas (3). Negative (151): other agenda (4), not anonymous (4), don't see the purpose (8), bad design (24). Neutral/unclear (125): bad release timing (4), suggests improvements (5), don't understand (20), not applicable (28). Shared sub-themes: wrong/missing questions (neutral 51 / negative 100), hard to answer (neutral 15 / negative 6), misc. (positive 1 / neutral 4 / negative 5).]

Figure 4.11: Thematic map based on all the surveys.

The thematic map based on the first survey’s free-text answers (Figure 4.12), with a total of 48 answers, has three positive, 18 neutral/unclear, and 27 negative answers. The sub-themes with no related answers were good number of questions, possibility to share ideas, bad release timing, and not anonymous.

[Figure 4.12 content. Positive (3): tracking progress (1), no reason given (1), good number of questions (0), good questions (1), possibility to share ideas (0). Negative (27): other agenda (1), not anonymous (0), don't see the purpose (1), bad design (2). Neutral/unclear (18): bad release timing (0), suggests improvements (1), don't understand (4), not applicable (6). Shared sub-themes: wrong/missing questions (neutral 4 / negative 19), hard to answer (neutral 3 / negative 3), misc. (positive 0 / neutral 0 / negative 1).]

Figure 4.12: Thematic map based on the first survey.

The thematic map based on the second survey’s free-text answers (Figure 4.13), with a total of 152 answers, has 28 positive, 60 neutral/unclear, and 64 negative answers. The only sub-theme with no related answers was suggests improvements.

[Figure 4.13 content. Positive (28): tracking progress (8), no reason given (10), good number of questions (1), good questions (6), possibility to share ideas (2). Negative (64): other agenda (3), not anonymous (2), don't see the purpose (1), bad design (10). Neutral/unclear (60): bad release timing (1), suggests improvements (0), don't understand (12), not applicable (10). Shared sub-themes: wrong/missing questions (neutral 32 / negative 43), hard to answer (neutral 5 / negative 2), misc. (positive 1 / neutral 0 / negative 3).]

Figure 4.13: Thematic map based on the second survey.

The thematic map based on the third survey’s free-text answers (Figure 4.14), with a total of 67 answers, has eight positive, 25 neutral/unclear, and 34 negative answers. The sub-themes with no related answers were no reason given, possibility to share ideas, bad release timing, misc., not anonymous, and other agenda.

[Figure 4.14 content. Positive (8): tracking progress (3), no reason given (0), good number of questions (2), good questions (3), possibility to share ideas (0). Negative (34): other agenda (0), not anonymous (0), don't see the purpose (2), bad design (6). Neutral/unclear (25): bad release timing (0), suggests improvements (3), don't understand (2), not applicable (6). Shared sub-themes: wrong/missing questions (neutral 10 / negative 26), hard to answer (neutral 4 / negative 0), misc. (positive 0 / neutral 0 / negative 0).]

Figure 4.14: Thematic map based on the third survey.

The thematic map based on the fourth survey’s free-text answers (Figure 4.15), with a total of 52 answers, has four positive, 22 neutral/unclear, and 26 negative answers. The sub-themes with no related answers were good number of questions and other agenda.

[Figure 4.15 content. Positive (4): tracking progress (1), no reason given (1), good number of questions (0), good questions (1), possibility to share ideas (1). Negative (26): other agenda (0), not anonymous (2), don't see the purpose (4), bad design (6). Neutral/unclear (22): bad release timing (3), suggests improvements (1), don't understand (2), not applicable (6). Shared sub-themes: wrong/missing questions (neutral 5 / negative 12), hard to answer (neutral 3 / negative 1), misc. (positive 0 / neutral 2 / negative 1).]

Figure 4.15: Thematic map based on the fourth survey.

Table 4.20 lists the details of all sub-themes: the theme each sub-theme belongs to, how many extracted answers relate to it, and an example sentence illustrating the type of answers included. The example sentences are invented, because the actual free-text answers are covered by an NDA. No example sentences are given for the Misc. sub-themes, as these answers varied greatly and did not match any category.

| Theme | Sub-theme | Extracted answers | Example sentence |
|---|---|---|---|
| Positive | Tracking progress | 13 | It’s good to have a survey that tracks our agile progress. |
| Positive | No reason given | 12 | The survey is great. |
| Positive | Good questions | 11 | Good questions that are relevant. |
| Positive | Good number of questions | 3 | The survey is short and simple. |
| Positive | Possibility to share ideas | 3 | The survey is a good place to share our ideas. |
| Neutral/unclear | Wrong/missing questions | 51 | I would like to see questions about X. |
| Neutral/unclear | Not applicable | 28 | I’m part of team X, and the questions are not applicable to us. |
| Neutral/unclear | Don’t understand | 20 | I don’t understand question X. What does the word Y mean? |
| Neutral/unclear | Hard to answer | 15 | It’s hard to answer when it’s unclear what team I’m part of. |
| Neutral/unclear | Suggests improvements | 5 | The survey could be better if you did X. |
| Neutral/unclear | Bad release timing | 4 | Is this really the best time to release the survey? |
| Negative | Wrong/missing questions | 100 | The questions do not cover X. Why don’t you ask us about Y? |
| Negative | Bad design | 24 | I don’t see how this survey gathers any valuable information we can work with. |
| Negative | Don’t see the purpose | 8 | I don’t see the meaning of this survey. |
| Negative | Hard to answer | 6 | The questions are written so that I can’t answer them properly. |
| Negative | Not anonymous | 3 | It’s possible to identify me based on my answers. |
| Negative | Other agenda | 3 | The survey serves another purpose than measuring the agile transformation. |
| Positive | Misc. | 1 | - |
| Neutral/unclear | Misc. | 4 | - |
| Negative | Misc. | 5 | - |

Table 4.20: Details of all the sub-themes.

Tables 4.21 and 4.22 show the theme distribution among the total answers and among the extracted answers, respectively. Because there are far more total answers than extracted answers, the percentages relative to the total answers are considerably lower than those relative to the extracted answers. The theme distribution among the extracted answers shows that, of the four surveys, the first is the most negative and the second the most positive. The first survey has the lowest share of positive answers (6.2%) and the highest share of negative answers (56.2%). In contrast, the second survey has the highest share of positive answers (18.4%) and the lowest share of negative answers (42.1%). For comparison, the averages among the extracted answers are 13.5% positive and 47.3% negative.
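The percentages in Tables 4.19 and 4.22 are plain shares of counts. The small sketch below reproduces the combined rows from the counts reported in this section. It rounds to one decimal with Python's built-in `round`. The thesis does not state how its percentages were rounded.

```python
def shares(counts):
    """Percentage of each category in a {category: count} dict, one decimal."""
    total = sum(counts.values())
    return {category: round(100 * n / total, 1) for category, n in counts.items()}


def extraction_rate(extracted, total):
    """Share of all free-text answers that were extracted for the analysis."""
    return round(100 * extracted / total, 1)


# Combined counts for surveys #1-4 (Tables 4.22 and 4.19).
all_surveys = {"Positive": 43, "Neutral/unclear": 125, "Negative": 151}
print(shares(all_surveys))         # {'Positive': 13.5, 'Neutral/unclear': 39.2, 'Negative': 47.3}
print(extraction_rate(319, 3794))  # 8.4
```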
| Survey | Total answers | Theme: Positive | Theme: Neutral/unclear | Theme: Negative |
|---|---|---|---|---|
| #1 | 598 | 3 (0.5%) | 18 (3.0%) | 27 (4.5%) |
| #2 | 759 | 28 (3.7%) | 60 (7.9%) | 64 (8.4%) |
| #3 | 1,423 | 8 (0.6%) | 25 (1.8%) | 34 (2.4%) |
| #4 | 1,014 | 4 (0.4%) | 22 (2.2%) | 26 (2.6%) |
| #1-4 | 3,794 | 43 (1.2%) | 125 (3.3%) | 151 (4.0%) |

Table 4.21: The theme distribution among the total answers.

| Survey | Extracted answers | Theme: Positive | Theme: Neutral/unclear | Theme: Negative |
|---|---|---|---|---|
| #1 | 48 | 3 (6.2%) | 18 (37.5%) | 27 (56.2%) |
| #2 | 152 | 28 (18.4%) | 60 (39.5%) | 64 (42.1%) |
| #3 | 67 | 8 (11.9%) | 25 (37.3%) | 34 (50.7%) |
| #4 | 52 | 4 (7.7%) | 22 (42.3%) | 26 (50.0%) |
| #1-4 | 319 | 43 (13.5%) | 125 (39.2%) | 151 (47.3%) |

Table 4.22: The theme distribution among the extracted answers.

5 Discussion

This chapter discusses the research questions, the contributions this study offers practitioners of agile transformation, the significance of the study, and the identified threats to validity.

5.1 Research questions

The following sections discuss the results of the data analysis.

5.1.1 RQ1: Challenges with covering agile with a 10-question survey

In general, the mapping gave little support for connecting the questions to the literature. As shown in the results (Table 4.5), the questions did not map to the agile values. No keywords from the questions were found in the values, and making connections based on the essence of the questions and values was also troublesome and gave no result. The most significant obstacle was that the values always put one thing over another, for example, “Individuals and interactions over processes and tools.” [4]. The survey questions asked whether respondents were transparent, used a prioritized backlog, and worked according to their capacity, but never whether they prioritized any of these above something else.
It could be argued that the survey focuses on individuals, since it includes the questions “In my team, we regularly reflect and learn to improve.”, “Leaders around my team support our improvement efforts.”, and “Leaders around my team avoid pushing work over our capacity.” However, that connection is a stretch that rests on several assumptions. For the agile principles (AP), there were a few direct connections ({Q4, AP5}, {Q3, AP12}) and some that required interpretation ({Q9, Q10, AP8}, {Q4, AP12}). The first distinct connection was between Q4 and “Build projects around motivated individuals. Give them the environment and support they need, and trust them to get the job done.”, which revolves around support. The question asks whether “Leaders around my team support our improvement efforts.” Even though the question covers only a portion of the principle, the connection was clear. Q3, “In my team, we regularly reflect and learn to improve.”, maps strongly to principle 12, “At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly.” Both are about reflecting regularly and using the reflection to improve the team. One could read the question as being about improving the product, since it does not specify what to reflect on and improve, but we found this reading highly unlikely. The mapping of Q9 and Q10 to principle 8 is less straightforward. The two questions are about capacity: not committing to too much, and leaders not pushing work onto the teams. Connecting them with sustainable development and a maintainable pace does not work word for word, but it can be done by considering the underlying intent. The last connection found was between Q4 and principle 12. The phrase “...reflects on how to become more effective, then tunes and adjusts its behavior accordingly.” is interpreted as an improvement effort, and thus a connection could be made.
This connection is vague because the question explicitly concerns leader support for improvement efforts, while the principle guides what the team should do. The essence of the two therefore differs, and it is questionable whether the connection should be made. Explaining why the questions mapped somewhat poorly to the principles is more complex than for the values. Some principles are relevant and concrete enough to be measured, but within the constraint of ten questions, other aspects may have been deemed more important. Table 4.7 shows that Q3 weakly connects to three agile definitions and that Q4 weakly connects to one. Q4 connects to the fifth agile definition, as the word empowering in “Building software by empowering and trusting people” can be seen as a form of support: empowering someone requires actively helping the person acquire what they need. Q3 connects to the fifth and seventh agile definitions, as promoting constant feedback and using feedback to make constant adjustments can be seen as ways to learn to improve. Q3 is also connected to the tenth agile definition, as continuous improvement can be argued to relate to the keyword learn to improve. However, as Q3 concerns the team, this connection holds only if the continuous improvement concerns people or teams, not the software. The agile definitions vary considerably: some are concise and others broad, some are vague and others more precise. Despite this variety, three of the four connections involved the keyword learn to improve from Q3. This result could indicate that most questions do not capture the core of agile, since one would otherwise expect at least some connection to one of the available definitions. However, that does not necessarily mean that the agile definitions do an excellent job of capturing the core of agile.
Some of the agile definitions are so vague that it is hard to understand what they mean in practice, which makes them especially tricky to measure. For example, how would one measure the agile definition “Iterative and incremental (evolutionary) approach to software development which is performed in a highly collaborative manner by self-organizing teams with “just enough” ceremony that produces high-quality software in a cost-effective and timely manner which meets the changing needs of its stakeholders” [33]? It raises the question of whether one can be truly agile only if every item in this definition is satisfied. If all items are fulfilled flawlessly, except that there is “too much” or “too little” ceremony, is one then not agile? Notably, even though this definition covers many important agile aspects, none of the survey questions connected to it. Table 4.8 shows only one strong and three weak connections between the survey questions and the SAFe principles. Q3 connects strongly with the fourth SAFe principle, “Build incrementally with fast, integrated learning cycles.”, as integrated learning cycles can clearly be connected to the keyword learn to improve. One could also argue that learning cycles weakly connect to the keyword regularly reflect, as learning cycles are likely to include reflection, and the word cycles indicates that this is done regularly. Interestingly, the sixth SAFe principle, “Visualize and limit WIP, reduce batch sizes, and manage queue lengths,” weakly connects to three different questions. The number of connections indicates that the sixth SAFe principle is more connected with the survey questions than the fourth principle, even if the connections are weak. Q7 has a weak connection to the sixth principle, as visualizing work can be argued to make it more transparent.
Q9 and Q10 have a weak connection to the sixth principle, as limiting WIP can be seen as a way to take the team's capacity into account, even if the principle does not explicitly say so. It is, however, not surprising that the survey questions do not connect strongly to the SAFe principles. The survey questions measure the agile transformation, and SAFe