The Effects of AI Assisted Programming in Software Engineering
An Observation of GitHub Copilot in the Industry

Master's thesis in Computer Science and Engineering

Johan Gottlander
Theodor Khademi

Department of Computer Science and Engineering
Chalmers University of Technology and University of Gothenburg
Gothenburg, Sweden 2023

© Johan Gottlander, 2023.
© Theodor Khademi, 2023.

Supervisor: Robert Feldt, Department of Computer Science and Engineering
Examiner: Lucas Gren, Department of Computer Science and Engineering

Master's Thesis 2023
Department of Computer Science and Engineering
Chalmers University of Technology and University of Gothenburg
SE-412 96 Gothenburg
Telephone +46 31 772 1000

Cover: An illustration of a software engineer pair-programming with an AI. The image was generated using Tome (https://tome.app/), a generative AI tool that can create images, with the prompt "a developer writing code at a desk, with a robot sitting next to him giving instructions".

Typeset in LaTeX
Gothenburg, Sweden 2023

Abstract

The recent emergence of artificial intelligence (AI) learning algorithms has brought generative AI (GAI) tools to the market. In software engineering, an example of such a tool is GitHub Copilot (Copilot), which can generate code suggestions in real time and from natural language input. In contrast to contemporary studies, this report attempts to fill a knowledge gap by employing a qualitative study, gaining insights into professional software engineers' opinions regarding GAI use in natural settings. While Copilot was the primary reference point, the study acknowledges the emergence of other GAI tools, such as ChatGPT, that also fit within the scope of the thesis. The study was initially designed to let engineers use Copilot in their work for two weeks, followed by a semi-structured interview. However, hesitance from approached companies to use Copilot in their code, due to legal and privacy concerns, led to an alternative study design being used in tandem. Retaining the interview format and questions, participants were instead shown a demo showcasing Copilot's features. In total, 13 professionals participated in the study. Through thematic analysis, the findings revealed that utilizing Copilot can increase efficiency, specifically through auto-completion. Copilot's lack of conversational capabilities and its disruptive elements hinder development and code analysis. Furthermore, GAI tools allow engineers to focus on higher-level problems and offer inspiration, enhancing end-product creativity. Engineers also emphasized retaining base knowledge in order to critique GAI output. Finally, widespread GAI integration can lower the profession's entry barrier, and developer roles can shift to take advantage of the enhancements the tools provide.
It is evident, however, that many concerns currently stand in the way of trusted integration of the technology. Efforts should therefore be made to address these issues, which in turn can make studies in natural settings more viable.

Keywords: generative AI, software engineering, field experiment, semi-structured interviews, problem-solving, programmer efficiency, AI's long-term effects

Acknowledgements

We would like to thank our supervisor Robert Feldt for the continuous help and support during this project, both regarding the content of the thesis and the structure of the report. We also want to thank our examiner Lucas Gren for reviewing the thesis and giving valuable feedback. We would also like to thank our opponents Marcus Axelsson and Daniel Karlkvist for giving us feedback on the report. Finally, we want to thank all interviewees who participated, as this research could not have been done without your valuable reflections and perspectives.

Johan Gottlander, Gothenburg, June 2023
Theodor Khademi, Gothenburg, June 2023

Contents

List of Figures
List of Tables

1 Introduction
  1.1 Problem Statement
  1.2 Purpose
  1.3 Limitations

2 Background and Related Work
  2.1 Key Concepts of AI
  2.2 A Brief History of AI
  2.3 Generative AI and ChatGPT
  2.4 The Ethical Intricacies of AI
    2.4.1 Ethical Dilemmas of Copilot
    2.4.2 Addressing the Issues
  2.5 Related Work
    2.5.1 Empirical Investigations of Copilot
    2.5.2 Programmers' Experience using LLM-powered Tools
    2.5.3 Limits and Risk Factors Using Language Models
    2.5.4 Field Experiments in Software Engineering

3 Method
  3.1 Participants
  3.2 Research Designs
    3.2.1 SA1
    3.2.2 SA2
    3.2.3 Creating the Demo
    3.2.4 Demo
  3.3 Data Collection
  3.4 Data Analysis

4 Results
  4.1 Direct Effect on Daily Work
    4.1.1 Features Increasing Efficiency
    4.1.2 Features Being a Hindrance
    4.1.3 New or Improved Features
  4.2 AI in the Problem-Solving Process
    4.2.1 Positive Effects
    4.2.2 Negative Effects
    4.2.3 Critical Thinking
  4.3 Prospects using AI in SE
    4.3.1 AI in Education
    4.3.2 Career Prospects as a Software Engineer
    4.3.3 Worries and Questionings

5 Discussion
  5.1 Hesitance of Participation
  5.2 Programming with Copilot
    5.2.1 Effect on Programmer's Efficiency
    5.2.2 Drawing Inspiration from Conversational AI
  5.3 An Optimized Problem-Solving Process
    5.3.1 Higher-Level Problem Solving and the Effect on Creativity
    5.3.2 Retain Base Knowledge and Critical Thinking
  5.4 Prospects of SE
    5.4.1 A Saturated Industry
    5.4.2 Metamorphosis of the Software Engineer
    5.4.3 Integration of Generative AI
  5.5 Differences in Responses of SA1 and SA2
  5.6 Implications for Research with GAI
  5.7 Implications for Practitioners
  5.8 Threats to Validity
  5.9 Future Research

6 Conclusion

Bibliography

A Appendix
  A.1 Survey questions
  A.2 Interview questions

List of Figures

1.1 Example of the auto-completion feature in Copilot. The text in gray is what Copilot is suggesting.
1.2 Example of Copilot's commenting feature. The user writes a comment in natural language and presses enter to come to a new line. Copilot then gives suggestions row by row or entire functions all at once.
2.1 Simple example of code generation from ChatGPT.
3.1 Original version of the demo.
3.2 An example of a semantic differential scale question from the survey.
3.3 A section of the most used codes in the code distribution report provided by Atlas.ti.

List of Tables

3.1 Demographic data of the participants.
4.1 Showing an overview of the theme "Direct effect on daily work" and its categories with example quotations.
4.2 Showing an overview of the theme "AI in the problem-solving process" and its categories with example quotations.
4.3 Showing an overview of the theme "Prospects using AI in SE" and its categories with example quotations.

1 Introduction

Artificial intelligence (AI) has had a profound impact on humanity for nearly a century [1], and due to recent advances in data collection and computing power [2], AI implementation and research have seen a rapid increase in different industries on a global scale, including healthcare, human resources, finance, and many more [3].
While the main purpose of integrating AI in these industries is to optimize the efficiency of more laborious, computationally intensive tasks such as face recognition and natural language processing [1], the emergence of AI in the world of creativity and the arts has in recent years also been seen through the utilization of deep and reinforcement learning in large neural networks [4]. In the music world, programs such as Amper, AIVA, and FlowComposer can generate completely new compositions of music and help composers in their work through incremental suggestions [5]. Dall-E 2 is another example of an AI in the art world, generating new unique art pieces or expanding upon existing ones. According to the AI research laboratory OpenAI, who created the tool, Dall-E 2 can "create realistic images and art from a description in natural language" [6].

The world of software development has received a tool similar to FlowComposer and Dall-E 2 in GitHub Copilot, hereinafter referred to as "Copilot". Copilot is an AI tool that the version-control hosting service GitHub announced in June 2021 as an extension for the integrated development environment (IDE) Visual Studio Code (VSCode) [7]. Copilot is powered by Codex, an AI model based on deep learning that has been trained on existing source code on GitHub to translate natural language to code [8]. Also developed by OpenAI, Codex is a specialized model tailored to computer code generation and is a descendant of the model Dall-E 2 employs. Copilot makes suggestions on code snippets, entire functions, and test cases in real time while the user is writing code [9] (see Figure 1.1). The user can also ask Copilot to produce code suggestions by writing a comment in natural language inside the codebase (see Figure 1.2).

Figure 1.1: Example of the auto-completion feature in Copilot. The text in gray is what Copilot is suggesting.

Figure 1.2: Example of Copilot's commenting feature. The user writes a comment in natural language and presses enter to come to a new line. Copilot then gives suggestions row by row or entire functions all at once.
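To make the two interaction styles concrete, the sketch below illustrates the kind of exchange the figures depict. The comment, function name, and suggested body are hypothetical examples of the pattern, not verbatim Copilot output; actual suggestions vary with the surrounding code and model version.

```javascript
// Commenting feature (Figure 1.2): the user writes a natural-language
// comment and presses enter; Copilot may then propose an implementation
// such as the function below, shown in gray until the user accepts it.

// find the longest word in a sentence
function longestWord(sentence) {
  // Split on whitespace and keep the longest token seen so far.
  return sentence
    .split(/\s+/)
    .reduce((longest, word) => (word.length > longest.length ? word : longest), "");
}

// Auto-completion (Figure 1.1) works mid-statement instead: after typing
// `const first = sentence.`, a completion like `split(" ")[0]` may appear.
console.log(longestWord("Copilot suggests code in real time")); // -> "suggests"
```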
In the relatively short time it has been available to the public, research on Copilot has predominantly focused on analyzing quantitative data in empirical studies (see Section 2.5). The aim of these studies has been to analyze the tool itself: for example, with what efficiency and quality it helps solve tasks, how understandable the suggestions are to humans, and whether the suggestions produce any bugs. In contrast to these studies, this thesis looks at Copilot from a different perspective, namely from the standpoint of the programmer. By observing the cognitive processes of software engineers using this tool during normal work, the goal is to analyze the effect Copilot has as a pair programmer on the profession itself. This includes investigating the advantages and hindrances of using the tool in their work; whether the tool can alter how a programmer thinks about solving tasks; and how it can affect the prospects of the software engineering (SE) profession.

1.1 Problem Statement

As mentioned earlier, contemporary research on Copilot has primarily focused on evaluating Copilot itself and analyzing quantitative data in empirical studies. For example, one study evaluates the accuracy and quality of Copilot's code suggestions [10]. Another study analyzes the security risks that the code suggestions present [11]. Finally, one study does analyze the effectiveness of having Copilot as a pair programmer versus a human pair programmer [12], but the focus is again on the effectiveness of the tool and not on the actual experience of using it or its implications for the profession as a whole.

The trending paradigm is that AI will become more and more integrated into human life through assistive tools in our work [3]. As seen in the aforementioned studies, various research has already been conducted to analyze Copilot as an assistive tool when writing code. However, one aspect not addressed in this research is the cognitive effects that programmers experience while using it in their day-to-day work. The general problem with this knowledge gap is that it could be hard to determine the potential long-term effects humans experience through the continuous use of AI tools such as Copilot, and the potential risks for the SE industry when integrating such tools. Could the use of AI limit creativity in a programmer's problem-solving process? Could programmers start to lack a sense of purpose as AI becomes more and more sophisticated at doing their tasks for them?

1.2 Purpose

This study aims to observe and explore professional software engineers' experience and thoughts regarding Copilot. However, while Copilot is the primary focus and starting point of discussion, it is natural that other tools will be discussed as well, as the study acknowledges the recent emergence of generative AI (GAI) tools. This will be done in two ways: (1) professionals will use the tool in their day-to-day work in a real-world setting, and (2) professionals will be shown a demonstration of the tool in a contrived setting to familiarize themselves with its features. In both cases, interviews will be held with the purpose of investigating the professionals' experience.

In contrast to other contemporary studies on Copilot, this study aims to gather qualitative data with the programmer in focus and highlight potential opportunities, challenges, and risks of using the tool. This study intends to benefit both practitioners and other researchers. It can provide practitioners insight into how the tool can affect their day-to-day work, rather than only in a controlled laboratory setting. Furthermore, it can clarify the opportunities and risks of using the tool and other AI tools for programming. For researchers, it can help construct a stronger knowledge base on the issue of GAI tools utilized in natural settings and bring to light potential issues and/or elements of strength the tool has for the industry, so that they can be researched further.

Three research questions (RQs) will be used in this study, mainly to investigate Copilot's impact. As mentioned, however, because of the recent developments in GAI, it is natural that the findings will reflect on other tools as well. This is especially true for the third RQ, which focuses on the broader/macro level; the first two focus on the individual/micro level.

RQ 1: What features of the tool help programmers be more efficient in their work and, conversely, what features hinder them? We want to investigate how much help the tool can offer programmers and where it potentially could be improved.

RQ 2: To what extent can the tool affect programmers' problem-solving process?
We want to see if the tool in any way shapes a different cognitive process for solving the problems programmers face in their work compared to what they previously had.

RQ 3: Can the potential possibilities and/or hurdles introduced by the tool change the prospects of software engineering as a profession? Here we want to evaluate the results elicited from the previous research questions and investigate the tool's effect on the profession.

1.3 Limitations

The intention of this study is to investigate the effects of Copilot on professional software engineers. Students were also considered as candidates for participation but were ultimately not chosen, as the study's findings aim to answer questions more related to the effect on the industry. The tool to be studied was narrowed down to Copilot, even though other AI tools for programming assistance exist, for example ChatGPT [13]. While ChatGPT can offer code suggestions, it is not solely meant for programming and is more of a general-purpose tool focused on natural language communication.

2 Background and Related Work

This chapter provides context and background information for the thesis by exploring different aspects of AI technology. First, a brief overview of key concepts of AI will be presented. Next, the history of AI development, its ethical implications, and the recent rise of other GAI will be explored. Finally, other work related to this thesis will be presented to review relevant literature.

2.1 Key Concepts of AI

A brief overview of key concepts used in this thesis pertaining to AI technology will be given to clarify their significance and give readers a more comprehensive knowledge of the research topic.

Deep learning is a subset of machine learning and involves an artificial neural network that tries to simulate human intelligence by "learning" from large sets of data. Typically, these networks comprise three or more layers. Due to its capacity for learning from significant amounts of input, it is especially beneficial for tasks like speech and image recognition [14]. In contrast to more traditional machine learning, deep learning can learn from unlabeled data.

Reinforcement learning concerns a learning paradigm of AI where the goal is to optimize sequential decisions. Broadly, by applying a reinforcement learning algorithm, AI can learn similarly to how humans learn, by continuously receiving rewards for finding the best strategy to navigate an environment, otherwise known as a policy [15].

Natural language processing (NLP) is another branch of machine learning. NLP studies how to use natural language to connect computers and people by understanding more complex aspects such as tone and context. NLP uses methods such as deep learning to give computers the ability to comprehend, decipher, and create human language. Sentiment analysis, chatbots, machine translation, and text summarization are just a few of its many useful applications [16].

Large language models (LLMs) are AI algorithms with large sets of parameters trained on huge datasets of text to generate text akin to humans, respond to prompts, and accomplish other language-related activities with precision [17]. An example of an LLM is GPT-3, which has 175 billion parameters and is trained on 570 gigabytes of text [18].

Codex is an OpenAI-developed AI system that can produce code from inputs in natural language. It uses a deep learning model to comprehend the intended purpose behind a prompt and then generates the code that best satisfies the provided requirements. Codex might accelerate the software development process considerably, which would boost developer productivity [8].
2.2 A Brief History of AI

AI as a concept was already being discussed in ancient Greek mythology, where the poet Homer tells of the god Hephaestus, who forged automated assistants made of gold that appeared as young women [19]. They were imbued with intelligence and strength by the gods, making this one of the first mentions of artificially made intelligence. Homer also wrote of mechanical assistants called "tripods", waiting on the gods at their dinner tables [20]. While AI has been a topic of fiction since then, the first real "machine intelligence" was created by the English mathematician Alan Turing during the Second World War [21]. Turing created a machine that could break the Enigma code used by the German army. This led to Turing publishing an article named "Computing Machinery and Intelligence", in which he described both how to create intelligent machines and, most importantly, how to test them. This test, which Turing initially called the "Imitation Game" but which is now called the "Turing Test", is meant to analyze a machine's capacity to demonstrate intelligent behavior comparable to, or indistinguishable from, that of a person [22]. Essentially, the test consists of a human evaluator having two conversations, one with another human and one with a machine. The evaluator is not able to see who they are having the conversation with, and the conversations are limited to a text-only channel. The machine passes the test if the evaluator cannot distinguish the conversation with the computer from the one with the human [22]. The test is still used today to determine the intelligence of an artificial system [21].

There were steady improvements to AI technology after the publication of Turing's article. One example is in 1956, when Dartmouth College in New Hampshire, USA, hosted a research project on AI [21]. This project was highly significant, as participants included Nathaniel Rochester, who designed the IBM 701, the world's first scientific computer, and Claude Shannon, who created information theory. Research on AI continued for almost two decades but saw some decline in funding in the mid-70s. It was not until the early 80s that funding started increasing again [21].

While AI continued to be researched and improved, the first major breakthrough in modern AI came in 2015, when Google presented AlphaGo, a computer program that was able to beat the world champion in the board game Go [21]. Notably, Go is far more complicated than chess: at the start of a game there are 361 potential moves, compared to 20 in chess. It was long thought that computers would never be able to outperform humans in this game. Google, however, accomplished this by using deep learning for its AI. Since then, the majority of AI has been developed using this technique, forming the basis for algorithms pertaining to image and speech recognition, NLP, etc. [21].

2.3 Generative AI and ChatGPT

In short, GAI refers to machine learning technology that, through human input, can automate the generation of large quantities of new content/media that would be plausible for a human to have created themselves.
This content can take many different forms, as presented earlier in the introduction, such as audio, images, and text [23]. Due to increases in the data volumes available for training the models and improvements in more advanced algorithms, computer processing power, and data storage [4], these GAIs have in recent years garnered popularity not only in the academic and research world but also on the commercial side, owing to their more sophisticated human-like behavior [24]. GAI tools also offer public, more accessible interfaces, which in turn leads to larger user bases. One example is ChatGPT, which reached one million users within five days of its release in November 2022 [25].

ChatGPT was created by the same company (OpenAI) that powers the model for Copilot and is an AI chatbot focused on providing human-like conversation to its users via text. It is capable of handling a large range of queries, including code generation [13] (see Figure 2.1). As the name suggests, the AI model is built using a natural language processing deep learning framework called Generative Pre-trained Transformer (GPT), which takes in a substantial amount of data in both supervised and unsupervised training to comprehend and output human-like conversation [26]. More specifically, ChatGPT uses a so-called GPT-3.5 model, which was created by OpenAI to enhance task completion and the ability to follow instructions compared to earlier models such as GPT-3, which powers the previously mentioned Dall-E 2 [13]. As of March 2023, OpenAI has also added a feature for premium users of ChatGPT to enable another model called GPT-4, which is said to further improve on GPT-3.5 by enhancing creativity, having a greater ability to handle longer text inputs, and even offering visual input [27].

Figure 2.1: Simple example of code generation from ChatGPT.
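Figure 2.1 itself is not reproduced here; as a stand-in, the sketch below shows the general shape of such an exchange. The prompt and the reply are hypothetical illustrations of the conversational format, not captured ChatGPT output.

```javascript
// Prompt: "Write a JavaScript function that checks whether a string is a
// palindrome, and explain it briefly."
//
// A ChatGPT-style reply typically pairs runnable code with a prose
// explanation, which distinguishes it from Copilot's inline suggestions.

function isPalindrome(text) {
  // Normalize: lower-case the input and strip non-alphanumeric characters.
  const cleaned = text.toLowerCase().replace(/[^a-z0-9]/g, "");
  // A palindrome reads the same forwards and backwards.
  return cleaned === [...cleaned].reverse().join("");
}

console.log(isPalindrome("Was it a car or a cat I saw?")); // -> true

// Accompanying explanation (paraphrased): "The function removes case and
// punctuation differences, then compares the string with its reversal."
```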
While GAI tools have existed for only a relatively short amount of time, their impact can already be seen on many different levels of society, especially in education. Dwivedi et al. provide accounts from experts in different fields discussing the rise of ChatGPT and its implications [28]. The experts discuss the disruption ChatGPT has caused due to its effectiveness in producing large amounts of text of at least perceived high quality, which can render homework obsolete. While tools have been created to detect texts written by AI, the experts do not agree that a ban on these tools is useful, as students will live in tandem with these technologies in the future. They compare the introduction of ChatGPT to the introduction of the calculator: while controversial at first, it ultimately had to be used in educational settings because the technology was being used everywhere else.

2.4 The Ethical Intricacies of AI

AI can present a plethora of opportunities. Maedche et al. discuss the opportunities that AI-based digital assistants could bring, such as significantly lowering the resources needed for routine tasks, which in turn leaves more time for more demanding tasks [29]. One example that Maedche et al. present is IBM, which claims that AI-powered chatbots can reduce the cost of customer service by 30%. This can also be seen in the investigations that Kumar [3] led in the human resources departments of different industries, where many were positive towards the introduction of AI assistants in their everyday work to eliminate more mundane tasks. Siau and Wang add to this sentiment by discussing other positive impacts of AI, such as economic growth, social development, human well-being, and safety improvement [30].

On the other hand, as AI continues its evolution to become more and more autonomous and integrated into human life, various complex ethical questions and issues are increasingly raised [31] [29]. Gratch and Fast identify that the degree to which software consumers can create unethical behavior in AI agents is a more challenging conundrum than the commonly researched issue of how the development of AI can introduce human biases [32]. As seen in the rise of GAI, consumers are given an increased amount of interaction with the agent itself, giving them the ability to customize, or "program" as Gratch and Fast call it, the AI's behavior, goals, and values directly. Gratch and Fast tested this claim through three different studies. The first allowed a human to personalize their own AI agent to handle legal negotiations. In the second, they let humans personalize AI assistants as tutors or coaches. In the last, participants were placed in a managerial role and then asked to choose whether to conduct employee check-ins themselves or through an AI avatar. The results of these studies found, respectively, that humans were more likely to deceive through AI, that they received less blame for the unethical behavior their AI caused, and that an AI avatar was the preferred choice because humans anticipated less criticism directed at them by the AI.

Flick and Worrall have also identified other ethical issues specifically within GAI, one being copyright infringement [33]. AI needs to be trained on existing material produced by humans, and because of this, the output of the AI is legally ambiguous. It is also hard to determine who is to blame for a potential infringement, i.e., the user of the AI tool or its creator. Another issue identified by Flick and Worrall that is often brought up is the possibility of author/artist replacement. With an interface that is easier to use, people outside the artistic/creative field can produce work themselves without much hindrance, potentially rendering a subset of professional capabilities seemingly superfluous.

2.4.1 Ethical Dilemmas of Copilot

Copilot also falls under the umbrella of GAI, as it authors code with AI technology based on human input. Because of this, issues have been raised about the tool in both the ethical and legal realms. Caballar presents and discusses a class-action lawsuit that has been filed against GitHub, Microsoft (GitHub's parent company), and OpenAI [34]. According to the lawsuit, the code created by Copilot does not include any acknowledgment of the original author of the code, copyright notices, or a copy of the license, all of which are required by most open-source agreements. This lawsuit is unprecedented in that it is the first to challenge GAI, and it will set a benchmark for other jurisdictions on the subject. The defending entities of this lawsuit have moved to dismiss the plaintiffs' claims, stating that the complaints fail by "lack of injury", "lack of an otherwise viable claim", and a lack of real scenarios in which someone was personally harmed by the tool. As of the writing of this report, the claim has yet to be resolved by the court of California, as the hearing of the dismissal will take place in May 2023 [35].
While the lawsuit focuses on the licensing issues Copilot has, there are other risks and ethical shortcomings that large language models such as Codex can face. Two examples of such risks concern security and privacy. As these models are trained on massive amounts of code that is not manually curated, there is a chance that vulnerable and insecure code can be suggested by them, and Copilot is no exception. In a study by Pearce et al., it was observed that of 1689 programs completed with the help of Copilot, 40% had some vulnerability introduced into their code [11]. It should be noted that the code generated by Copilot in Pearce et al.'s study is not directly reproducible, and other studies could arrive at different vulnerability percentages. The study was also controlled in the sense that artificial scenarios were created to produce these programs, limiting the extent to which its insights transfer to real-world coding scenarios. Pearce et al. do, however, conclude that developers should not blindly accept the suggestions that Copilot generates and should be aware that it can introduce vulnerabilities. This is something that GitHub also discusses in their Copilot FAQ [9], stating: "Public code may contain insecure coding patterns, bugs, or references to outdated APIs or idioms. When GitHub Copilot synthesizes code suggestions based on this data, it can also synthesize code that contains these undesirable patterns. ... Of course, you should always use GitHub Copilot together with good testing and code review practices and security tools, as well as your own judgment.".

On the topic of privacy, since code can contain sensitive information like credentials, personal information, or even in-code discussions by developers, large language models could be trained on this data. There is therefore a chance that sensitive information can be extracted from the model [36]. When asked if Copilot can output personal data, GitHub responded in their FAQ: "Because Codex, the model powering GitHub Copilot, was trained on publicly available code, its training set included public personal data that was included in that code. From our internal testing, we found it to be very rare that Copilot's suggestions included personal data verbatim from the training set. In some cases, the model will suggest what appears to be personal data – email addresses, phone numbers, etc. – but those suggestions are actually fictitious information synthesized from patterns in training data and therefore do not relate to any particular individual". GitHub also offers two settings for how Copilot handles data, both in terms of input and output: one setting allows a user of the tool to block Copilot from showing suggestions matching public code, and another blocks GitHub from using one's code snippets for research purposes and product improvements [9].

2.4.2 Addressing the Issues

As ethics in AI is regarded to be in its inception stage, one crucial concern that is yet to be fully mapped out is how to address the issues being raised [30]. Danaher attempts to analyze and evaluate some of these concerns to form a framework for thinking about AI in a more harmonious, ecological way on an individual level [37]. Danaher presents a comprehensive review of several issues raised by other authors in which AI assistants potentially inflict harm on mankind.
One such issue is the degeneration argument: "If anything that forces us to use our own internal cognitive resources enhances our memory and understanding, then anything that takes away the need to exert those internal resources will reduce our memory and understanding". Essentially, one can lose cognitive ability when using AI assistants to perform tasks over an extended period, as the brain is not stimulated enough. Danaher concludes that one should be aware of the risk of this happening, depending on the primary value of the task and the risk of decoupling from the AI. If the primary value of the task is intrinsic to oneself and/or the risk of decoupling from the AI is high, one should try to use one's own cognitive ability to perform the task. Another issue is autonomy. Autonomy is cherished in modern society, and AI assistants have the potential to reduce it, both by removing the channels between our choices and their results and by homogenizing the way we construct decisions. Danaher argues, however, that forces that reduce autonomy already exist in other forms, such as in one's culture and general environment; the approach to not letting those factors manipulate one's decision-making should be the same with AI.

Dhirani et al. identify that, compared to creative technological developments, there has been less growth in the ethical arena regarding AI. In their article, they give an overview of the ethical standards for AI technology currently being developed by regulatory bodies [38]. The authors chose to focus on the European Union (EU), which has already established regulations regarding technology in the General Data Protection Regulation (GDPR). In more recent history, the EU passed several acts in 2022 in an attempt to further form a legal framework around technology: the EU Artificial Intelligence Act, the EU Digital Services Act, the Digital Markets Act, and the EU Cyber Resilience Act. Focusing on the EU AI Act, its goal is to form a framework around AI to enhance trust and limit possible harm that the technology may create. According to the EU, it is the first law on AI by a major regulator anywhere [39]. While Dhirani et al. focus on the EU and its strides in ethical AI, they also bring to light the issue of interoperability between regulatory bodies. Large industries usually have multiple production regions/setups all over the world (e.g., Europe, the United States, and China) and are subject to different jurisdictions, rules regarding data, and compliance requirements. There is much work to do when it comes to regulating AI, and as the acts mentioned above are very new, it is hard to measure their impact. Moreover, it is challenging to determine the threats of emerging technologies such as AI until they have been in use for several years. Dhirani et al. also mention that the AI Act should comply more with the GDPR to create a more cohesive regulation. Despite these issues, the acts are a starting point and have the potential to set up a guideline for other regulatory bodies to follow [38].

While having regulations and guidelines on how to build ethical AI is important at the government level, it is also critical that businesses in the AI industry take their own measures and adopt governance practices to reduce the risks associated with AI [40].
Eitel-Porter provides a multi-step framework for businesses to ensure that the AI being developed is ethical and free of risk, not only during the model-building and development stages but also during its deployment and scaling. The framework is mainly based on the five common pillars of ethical AI – Fairness, Accountability, Transparency, Explainability, and Privacy [40] – and includes, for example, creating an ethics board in the business and establishing metrics to ensure that AI principles are followed. The framework would help safeguard businesses from danger and build customer trust in their services, which in turn can boost usage even further, both in private and enterprise settings.

2.5 Related Work

A review of related studies will be conducted with the intention of offering a more thorough understanding of the topics of Copilot, the use of large language models, and experiments conducted in natural settings in the field of SE. This will mainly be done by performing in-depth evaluations of relevant past studies, highlighting their contributions and limits to understand the present state of research within these topics. The review also contextualizes the research undertaken in this thesis and demonstrates the gap in the literature that this thesis attempts to address.

2.5.1 Empirical Investigations of Copilot

In an empirical, quantitative study on Copilot by Imai [12], the results indicate that using Copilot produces more lines of code but with lower quality compared to pair programming with a human. 21 persons with at least some programming skill took part in the experiment, in which the number of lines of code added as well as deleted was recorded. When using Copilot, more lines were deleted compared to when pairing with a real human, hence the lower quality. One explanation for this could be what Dakhel et al. discuss in their study [41]: they discovered that Copilot had trouble understanding details in the description of what to code. For example, telling it to return a list in an order where "...older people are at the front..." was much harder for it to understand than telling it to return the list in "...descending order...". They also noticed that Copilot could have difficulty understanding long descriptions of a problem to solve and, as a result, could misunderstand an entire problem. However, it was also concluded that even though Copilot could not always satisfy the description completely, the developer could incorporate the code generated by Copilot with little to moderate change. Copilot's difficulty in understanding longer descriptions was explained further by Chen et al., who showed that Codex, with 12 billion parameters, performs exponentially worse as the number of chained building blocks in a prompt increases [42]. An example of a building block could be "convert the string to lower case", and a chain of building blocks could be the prompt "Convert the string to lower case, then remove all instances of the letter e from the string, and finally add the word apples after every word in the string". Prompts like these could be hard for Copilot to complete, but not so hard for a professional developer.
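To illustrate why such chains are trivial for a developer yet hard for the model, the sketch below implements the exact three-block prompt quoted above. The code is our own illustrative reference solution, not output generated by Copilot or Codex.

```javascript
// Prompt: "Convert the string to lower case, then remove all instances of
// the letter e from the string, and finally add the word apples after
// every word in the string."
function applyBuildingBlocks(input) {
  return input
    .toLowerCase()                        // block 1: lower-case
    .replace(/e/g, "")                    // block 2: drop every "e"
    .split(/\s+/)                         // block 3: append "apples"
    .flatMap((word) => [word, "apples"])  //          after every word
    .join(" ");
}

console.log(applyBuildingBlocks("ChainEd Blocks arE Easy for humans"));
// -> "chaind apples blocks apples ar apples asy apples for apples humans apples"
```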
2.5.2 Programmers' Experience using LLM-powered Tools

A few studies investigate Copilot and similar tools from a qualitative perspective, such as the usability of the tools and programmers' general experience. In a within-subjects user study [43], Vaithilingam et al. found that the majority of the participants preferred to use Copilot over VSCode's default code completion tool, Intellisense, when solving programming tasks. The difference in task completion time when using Copilot was not statistically significant, and the participants using Copilot failed three more tasks compared to the ones using Intellisense. However, the qualitative results of the study indicate that Copilot offers features that affect the programmer's experience to such an extent that 23 of the 24 participants stated that Copilot was more helpful than Intellisense. The study focused on how the user perceived Copilot and how the user interacted with its code suggestions, and one reason why participants favored Copilot was that it helped them get started on a solution, even if the suggestions were not always completely correct. On the other hand, some participants explained that they found it hard to understand and correct incorrectly generated code, while some over-relied on the tool and trusted the suggested code as it was. As a result, the authors concluded that there is room for improvements, such as adding support that makes it easier for the programmer to understand and validate the generated code. It should be noted that only one of the 24 participants was a software engineer; the rest were undergraduate, master's, or Ph.D. students, but all except one had more than 2 years of programming experience.

Ross et al. have also conducted research on users' interaction with an LLM-powered developer tool which they developed and called the Programmer's Assistant [44]. The tool combines a code editor with a chat interface powered by the Codex model through its web API. Compared to Copilot, this assistant does not give real-time suggestions but only responds to the programmer's requests, and the suggestions must be transferred from the chat to the editor via copy/paste. The study did not track metrics such as code quality or time to complete a task but focused on the participants' (mainly software engineers) attitudes toward the tool. The results were divided into three sections: (1) expectations and experience, (2) utility of conversational assistance, and (3) patterns of interaction and mental models. Regarding (1), the participants' overall experience exceeded their expectations. As in previous studies [43], the generated code was not always correct (correct 80.2% of the time), but most participants still found it very helpful, as minor tweaks often solved the issues. Regarding (2), they found a conversational assistant valuable, especially when solving smaller and simpler tasks; since the participants could have a conversation with the assistant, they also found it helpful to ask it to explain and document code. Regarding (3), two patterns were discovered: in one, participants asked the assistant to solve complete programming challenges; in the other, participants broke down the challenge and asked the assistant for help with each sub-challenge. Overall, the participants felt that their workflow was impacted for the better, since the assistant could help with lower-level details, allowing the programmer to focus on development at a higher level and speeding up work.

2.5.3 Limits and Risk Factors Using Language Models

Codex is a descendant of the language model GPT-3 created by OpenAI [42]. GPT-3 has about 175 billion parameters, compared to the previously biggest language model, Microsoft's Turing NLG, with 17 billion parameters.
However, generality comes at the cost of performance, which is why GPT-3 solves 0% of a set of coding problems, whereas Codex – trained specifically on code and with around 12 billion parameters – can solve 28.8% of the same problems with just a single sample [42]. A more fine-tuned version of Codex named Codex-S can solve 77.5% of the problems given 100 samples. In the same paper, Chen et al. identify several risk factors which help to better understand the hazards of using language models like Codex. One risk factor is "misalignment": even though Codex is capable of performing task X, it "chooses" not to, i.e., it is not incapable of performing the task but fails to do so due to various mistakes. This is concerning, as it is a problem that could become worse when scaling up data, parameters, and training. Another risk discussed by Chen et al. is "over-reliance", meaning that Codex can suggest solutions that appear correct but do not perform the requested task, such as producing insecure code.

This can be related to the risks identified by Challen et al., who discuss quality and safety issues in using AI and divide them into short-, medium-, and long-term risks [45]. The research was done to identify issues in using AI within medicine, but the issues can be applied to SE to some extent as well. The long-term risks are taken from the framework by Amodei et al., which discusses AI's potential risks [46]. One of the long-term risks of using AI is negative side effects, and it is discussed how to prevent AI from disturbing its environment while pursuing its goals. This relates to "over-reliance", as Codex can suggest insecure code while trying to complete a task, which is an example of disturbing the environment. One of the short-term risks discussed by Challen et al. is "distributional shift", which is explained as a mismatch between the environment the data has been trained in and the environment it is used in. This relates to "misalignment", since Codex suggests code that is as similar as possible to its training distribution.

2.5.4 Field Experiments in Software Engineering

Given that field experiments will be part of the data collection in this thesis, looking back and reflecting on other studies conducted with a similar technique in the same field can be pertinent to improve the study design and give the thesis a reference point. One such study was made by Grimstad and Jorgensen, who explored the effects of manipulating variables in a natural setting in which software companies estimated the effort (time) it would take to complete the same project [47]. What Grimstad and Jorgensen recognized through previous studies was that software developers' judgmental processes were often subconsciously affected, both by relevant and irrelevant information, such as the client's expectations, variations in the wording of the requirements, and wishful thinking [47]. However, these previous studies were conducted in controlled laboratory environments, which could lead to certain threats to their validity. For example, the high time pressure in a laboratory setting could affect the expended effort more than what is seen in a natural setting. Focusing all efforts on estimating could also introduce more causal variables [47]. Therefore, Grimstad and Jorgensen recognized a need to study the estimation of software projects in a field setting as well.
A similar lack of knowledge regarding more natural settings was identified in the context of this thesis as well. As mentioned in Section 1, many of the studies on Copilot were done in a controlled fashion, giving precedent to fill the knowledge gap by employing a similar study design. Looking at the results, Grimstad and Jorgensen documented a decrease in effect sizes when comparing their earlier laboratory studies with the experiment they conducted, indicating that irrelevant information taken into the estimation of a project is less likely to affect the judgment process than laboratory results suggest [47]. One cannot always rely on controlled laboratory experiments to generalize findings, as they might differ from what happens in reality, hence the significance of conducting a study such as the one this thesis presents.

Another study using the field experiment design was done by Anda et al. [48]. They conduct "a longitudinal multiple-case study of variations and reproducibility in software development, from bidding to deployment, on the basis of the same requirement specification". This was done by giving four software companies the same project to complete – that being the manipulated variable – and observing the differences in reproducibility of aspects such as code quality, schedule overrun, and actual lead time. While the article by Anda et al. offers little in terms of direct comparison to this thesis's purpose, one can still find interesting details and general lessons that can be applied and taken into account in this thesis. When discussing external validity, the authors reflect upon how much they actually can generalize their findings and realize that it is a difficult task. Counteracting effects such as the size of the companies could partly affect variability, as could the degree to which a company can be flexible. Therefore, the authors restrict their generalizations to smaller Norwegian companies developing smaller Java systems. Generalization restrictions should therefore be acknowledged for the findings in this thesis as well.

3 Method

This chapter presents the research designs and methodologies used to investigate the impact of integrating Copilot into software engineers' work. First, a description of the recruitment process will be presented, during which an alternative study design had to be created to accommodate an unforeseen hesitance to participate in the study's original design. Two different strategies were therefore employed, named Study Alternative 1 (SA1) and Study Alternative 2 (SA2); these will be presented in detail in Sections 3.2.1 and 3.2.2, respectively. Furthermore, a description of the data collection and data analysis used to generate the findings of this thesis will be provided, including arguments to support the choices made for the different techniques.

3.1 Participants

The goal of recruiting participants was to get professional software engineers working in different fields, achieving as good a spread as possible and allowing better generalization of the gathered data to the SE profession. The methods used to recruit participants varied, such as posting an advertisement on LinkedIn, but in the end all participants were recruited through personal contacts. Originally, the study had only one design, in which the participant was supposed to use Copilot for two work weeks, followed up with an interview.
However, very early in the recruitment process, it was recognized from the first six larger companies contacted that there was a hesitance to join the study due to the legal problems Copilot has faced. Worry was also expressed about security and privacy issues that could be introduced when integrating Copilot into their code. There was, however, a large interest in the topic, and people in the approached organizations were eager to discuss GAI in programming. Because of this, a new alternative for the study was created to remove any security, privacy, or legal concerns and recruit more participants, so as to reach a sufficient sample size. After creating the new alternative and offering it in tandem with SA1 in the recruitment process, more participants were recruited for SA2. In total, 13 participants were recruited, with three participating in SA1 and 10 participating in SA2. The participants were guaranteed anonymity and are therefore given a participant number rather than a name; their demographic data can be seen in Table 3.1 below.

Participant  Main area of profession           Age  Study alternative
1            Web developer (consultant)        27   1
2            Fullstack developer               29   1
3            Frontend developer                24   1
4            Solution architect                37   2
5            Fullstack developer               27   2
6            Fullstack developer               30   2
7            Fullstack developer (consultant)  29   2
8            Fullstack developer (consultant)  26   2
9            Software developer                26   2
10           Backend developer                 34   2
11           Backend developer                 27   2
12           Fullstack developer               26   2
13           Fullstack developer (consultant)  28   2

Table 3.1: Demographic data of the participants.

3.2 Research Designs

This section describes the different approaches and execution strategies used for SA1 and SA2. While they differ in many aspects, both research approaches sought qualitative data and insights into how AI technologies such as Copilot may influence the SE profession.

3.2.1 SA1

There are several research strategies that can be used in SE. In a framework by Stol and Fitzgerald [49], eight different strategies are described that can be used depending on the level of generalizability or obtrusiveness one wants in a study. For SA1, the study was designed to fit the field experiment model that Stol and Fitzgerald present. Field experiments, together with field studies, are conducted in natural settings to capture realism, gained at the price of low generalizability. The main difference between the two is that in a field experiment the researcher can manipulate properties or variables to observe an effect, whereas in a field study the researcher does not manipulate anything. SA1 was conducted in a natural setting with manipulation of a variable to study its effect [49]: the setting was the participants' workplace and the variable was Copilot, allowing evaluation of how the tool affected them in their work.

For this alternative, participants were asked to set up Copilot on their own device and use it during their day-to-day work for two weeks, i.e., ten work days. An initial meeting with each participant was held to present the purpose of the study, give them instructions on enabling Copilot in their IDE, and inform them of the legal and ethical dilemmas that Copilot has faced. A consent form was also sent out to the participant for them to sign electronically. When this was signed and sent back, the two weeks were considered to have begun.
During these two weeks, the participants were asked to answer a set of survey questions at the end of each day about their experience and how helpful they found the tool. A semi-structured interview was held after the two weeks of using Copilot, allowing for a deeper dialog and qualitative data to be gathered.

3.2.2 SA2

Getting people to participate in SA1 was harder than expected, and the main explanation was that companies had to turn down the offer to participate due to legal and privacy concerns about using Copilot in their work. However, the most important goal of this study was to interview professional software engineers to get their perspectives on how Copilot and similar AI tools can affect the profession, and it was therefore not strictly necessary for them to use Copilot in their day-to-day work to accomplish this. Therefore, SA2 was created and subsequently offered in the recruitment process as an alternative way of participating in the study.

From the discourse with larger organizations during the recruitment process, it was noticed that many people working in the industry have a large interest in tools like Copilot and ChatGPT. Most have used them in one way or another and were eager to share their opinions of and experience with them, which coincides with the aim of the thesis. To retain the qualitative study design and gather data similar to what SA1 could collect, SA2 was designed to include the same interview format and questions. In contrast to SA1, where the participants would have guaranteed experience with using Copilot, this could not be assured for the participants in SA2, as this thesis does not limit participation to those with prior experience with the tool. To retain the same interview format and questions, a demonstration of the tool's capabilities was chosen as an alternative method to provide the participants with fundamental knowledge about Copilot and its features before conducting the interview. The demo was showcased on the interviewers' own systems to remove the potential legal, privacy, or security risks for the professionals. Before the demo and interview session, a consent form was sent out to the participant for them to sign electronically.

In reference to the framework provided by Stol and Fitzgerald [49], the design of SA2 does not align directly with any of the eight research strategies. It does, however, incorporate elements of different research strategies discussed by the authors, such as gathering data from experts discussing a given topic of interest (judgment study) and setting up an environment that would not exist without the study (experimental simulation). As the demo was shown in a contrived setting with professional software engineers (experts) discussing Copilot as the topic of interest, SA2 can be viewed as a separate, artifact-driven research strategy inspired by the aforementioned designs.

3.2.3 Creating the Demo

All demo sessions for SA2 aimed at creating the same small-scale full-stack application using the so-called "FERN" stack, a collection of four technologies that can be used to build a web application: Firebase, Express, React, and Node, hence the name "FERN". Firebase is a platform developed by Google that offers services such as a real-time database, authentication, storage, and hosting. Express is a web application framework used on top of Node that, among other features, helps build web APIs quickly and easily.
React is a JavaScript library used to build a web application's client side, i.e., the user interface. Finally, Node is a runtime environment used to run JavaScript code, allowing web applications to run outside the client's browser.

A separate, finished version of the demo application was created first with the help of Copilot. This provided a finished product to show the participants before the demo, so they could see what the interviewers were going to work towards. It also served as a backup for the interviewers if they got stuck while implementing the demo. To uphold the "FERN" stack, a Firebase/Firestore database was created that could be used both by the finished version and by any subsequent demo attempts. A simple Node server was then created for the backend of the application, and an Express server was instantiated within it. A React application was created for the client side (frontend), and the Firestore database was then configured in the backend application.

The idea of the application was to fetch a random quote from the dummyJSON API (https://dummyjson.com/docs/quotes) and fetch data (name and timestamp) from the database to display on the page (see Figure 3.1). Users were then supposed to insert that quote and their name into the respective inputs and submit it. Their name, along with a timestamp, was then inserted into the database. The backend handled adding and fetching data, and API routes were created which the frontend could access.

Figure 3.1: Original version of the demo.

Copilot was used in tandem when developing this original application to produce various pieces of code, for example configuring Express on the backend and writing the routes for fetching and adding data to the database. On the frontend, it helped structure HTML, styling, and testing. Using Copilot while implementing the original application gave an idea of where its features could be shown during the other demos.

For the sessions with participants, a separate project was created for the showcase and uploaded to GitHub via the version control tool Git. To save time, certain parts of the demo were prepared before the sessions. On the backend, the configuration of the database, the API route for fetching data, and all necessary imports of libraries and configurations were implemented. On the frontend, the necessary imports were created as well, along with the request to the backend route for fetching data and an HTML skeleton.

3.2.4 Demo

Before every session, a new Git branch of the demo was created from the main branch version that contained only the preparations. This removed the need for multiple projects and required only one. During the sessions, the participants were first shown the original, finished version of the application that the interviewers were going to work towards implementing. The technologies used for the application were then explained to them. The participants were also told that, while the interviewers were doing the coding, they were allowed to interrupt to ask questions or suggest something they wanted to test Copilot on. The demo aimed at a maximum length of 30 minutes. After this time, the participants were briefly shown the code of the original version of the application to give them more examples of where Copilot could be used.
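To make the demo architecture more concrete, the sketch below illustrates the kind of backend described above: an Express server running on Node, connected to Firestore through the firebase-admin library, with one API route for fetching the stored entries and one for adding a name together with a server-side timestamp. This is a minimal reconstruction under stated assumptions rather than the demo's actual code; the collection name "entries", the route paths, the port number, and the credential setup are illustrative and not specified in the thesis.

// backend/index.js -- minimal sketch of an Express/Firestore backend
// like the one the demo describes. Assumes Firebase credentials are
// available via the standard GOOGLE_APPLICATION_CREDENTIALS variable.
const express = require("express");
const admin = require("firebase-admin");

admin.initializeApp({ credential: admin.credential.applicationDefault() });
const db = admin.firestore();

const app = express();
app.use(express.json()); // parse JSON request bodies

// Route the frontend calls to fetch previously submitted names and timestamps.
app.get("/api/entries", async (req, res) => {
  const snapshot = await db.collection("entries").get();
  res.json(snapshot.docs.map((doc) => doc.data()));
});

// Route for submitting a name; the timestamp is added server-side.
app.post("/api/entries", async (req, res) => {
  await db.collection("entries").add({
    name: req.body.name,
    timestamp: admin.firestore.FieldValue.serverTimestamp(),
  });
  res.sendStatus(201);
});

app.listen(3001, () => console.log("Demo backend listening on port 3001"));

In this arrangement, the random quote itself would plausibly be fetched client-side by the React frontend from the dummyJSON API, although the thesis does not specify which side performs that request.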
3.3 Data Collection

Hearkening back to the purpose of this study, the aim is ultimately to assess the impact of Copilot on software developers' work processes and their opinions on the future of their profession from a qualitative perspective. The data collection methods should consequently align with the goals of this thesis. Based on a taxonomy of the degree to which interaction with software engineers is necessary, Singer et al. present a comprehensive list of data collection methods for different contexts [50]. According to the list, one can form an evaluation by collecting general information and opinions about a process, a product, and even personal knowledge through interviews and questionnaires.

Google Forms was used to distribute the surveys for SA1 that the participants filled out every workday. Questionnaires are mainly applicable when collecting large amounts of data, as they are very fast to create, send out, and analyze [51]. They are also a good complement to other data-gathering methods, as they provide a broader knowledge base [51]. They were therefore still suitable for the goals of this thesis, as they could support the subsequent interviews. The questions of the survey were tailored to address the research questions and can be seen in Appendix A.1, with a majority of them using the semantic differential scale. This technique allows the respondent to select a degree of opinion when answering a question using a bipolar adjective scale, usually from one to seven [51] (see Figure 3.2). Each end of the scale hosts one of a pair of antonymous adjectives (e.g., fun - boring), and the respondent selects the number on the scale that best represents their answer. The technique also offers an advantage over techniques such as the Likert scale in that it can be analyzed using qualitative methods: one can analyze ratings to identify patterns or themes, which in turn can provide a deeper understanding of how a concept or object is perceived and evaluated [51]. On top of this, some open-ended questions were included to allow the participants to provide more detailed accounts, insights, and other comments.

Figure 3.2: An example of a semantic differential scale question from the survey.

The interviews for the two alternatives did not differ in structure or technique. Each interview followed a semi-structured format and lasted around 40 minutes. A focused set of questions was created for each individual RQ to ensure that the respondent discussed topics within the scope of the study; the interview questions can be seen in Appendix A.2. However, because of the freer, conversation-like setting that semi-structured interviews provide [50], questions did not have to be discussed in the order written in the script, and there was the option to ask follow-up questions to gather more data on interesting topics brought up by the interviewee. One difference in the interviews conducted with the SA1 participants was that the survey data could be used to probe more deeply into why they answered the way they did. The interviews were conducted either online using various digital meeting tools or on-site at the participant's workplace. Before starting the interview, the participant was asked for consent to record the interview for later transcription. The interviews were recorded with the standard recording app of an iPhone 13.
After each interview, the recording was uploaded to the Google Chrome extension Transkriptor, a transcription software that automatically produces a transcript from the uploaded recording. For every interview, the recording and the transcription were cross-checked and polished to remove any errors introduced by the software.

3.4 Data Analysis

Because of the limited sample size for SA1 specifically, the survey data were deemed insufficient for meaningful analysis and were not used as significant data in the results. Instead, the survey data served as a form of diary during the interviews with the SA1 participants, which the interviewers could reference to ask about data points deemed particularly noteworthy.

An iterative, inductive approach was used to perform thematic analysis on the transcriptions to identify, analyze, and report patterns, or themes, in the data [52]. The iterative process followed the phases Braun and Clarke identify [53]:

1. Familiarising yourself with the data: simply reading and re-reading the data, making notes of ideas that spring to mind.
2. Generating initial codes: coding the entire dataset systematically and collating data that is relevant to each code. They define codes as labels that "identify a feature of the data (semantic content or latent) that appears interesting to the analyst".
3. Searching for themes: gathering codes (and related data) into candidate themes for further analysis.
4. Reviewing themes: checking whether the themes work with the data and creating a thematic map of the analysis.
5. Defining and naming themes: refining the themes and the overall narrative iteratively.
6. Producing the report: which will, in turn, require further reflection on the themes, the narrative, and the examples used to illustrate themes.

The analysis was done with the help of the software Atlas.ti, which lets the user create a database of documents in which to highlight and analyze their transcriptions. First, familiarization with the data was achieved by reading each transcript thoroughly. Quotes were then highlighted and marked with an initial code relevant to the study's purpose, for example "Efficiency" or "Hindrance". Candidate themes were then created, for example "Features increasing efficiency", and served as a basis for a reviewing process. The review was done by producing a code distribution report, provided by Atlas.ti, on the entire project (see Figure 3.3).

Figure 3.3: A section of the most used codes in the code distribution report provided by Atlas.ti.

The quotes could then more easily be mapped out using the candidate themes and were iteratively analyzed to see whether adjustments needed to be made by adding or removing codes from the quotes. Once no new codes could be formed, the code distribution over the quotes was analyzed once more to form overarching themes based on the candidate themes and the RQs. Some of the candidate themes were turned into categories of the more overarching themes, and a final report could then be produced.

4 Results

This chapter presents the results elicited from the thematic analysis, in which three themes were created. Each theme covers one RQ and is presented in its own section, while Tables 4.1, 4.2, and 4.3 below give an overview of the findings.
Each theme also consists of multiple categories into which quotations from the interviewees could be divided. Do note that various quotes presented in these findings have been translated from Swedish to English. Furthermore, specific participant identifiers such as participant numbers have been omitted when presenting the quotes to further ensure the anonymity of those who participated. Lastly, while 261 quotes were assigned codes in the thematic analysis, only a subset is showcased in the findings: the quotes that best summarized the overall sentiments of multiple participants' responses, along with quotes deemed particularly compelling.

Table 4.1: Overview of the theme "Direct effect on daily work" and its categories with example quotations.

Theme: Direct effect on daily work

Features increasing efficiency: "Yes, but this with auto-completion, it goes much faster than if you had written everything yourself. Then as a programmer it is important to understand whether it is right or wrong, what it auto-completes. But yes, it just goes much faster simply."

Features being a hindrance: "Sometimes you've accidentally tabbed something you don't want, so you have to sit and delete it instead. A small little thing. But when you have used it more, I assume that you also get to know the way of working better. But for example at the beginning also like, you have to wait a little while for the Copilot to think, then you have to sit and (inaudible) for 2 seconds and then it will figure out what to do."

New or improved features: "...a little console that you can ask questions or something like that. Maybe it would be really nice if Copilot could not only write the code but explain it as well. I know that it can generate the comments, then you're generating the code as well, but it would be really nice to have like a more in-depth explanation of what it's doing, like ChatGPT can actually do it at the moment. So that would be really useful."

Table 4.2: Overview of the theme "AI in the problem-solving process" and its categories with example quotations.

Theme: AI in the problem-solving process

Positive effects: "I Google a lot. I kind of have the editor and Google, and so I just copy back and forth. So often a lot is already copied from something that someone else has done. The difference here was just that I didn't have to google. But it was just there [the suggestion] and then I changed in it."

Negative effects: "But I also think that maybe it will be a bit that you become unsure of yourself, like you have a plan of where you are going with your code and then it gives a completely different suggestion, that it becomes a bit like an interruption that you start thinking but is this what I should do now or should I, like, what should I do?"

Critical thinking: "Especially with security issues you really have to be aware that it is able to handle the security. Because if that goes wrong, it's less good than if it's just a regular bug. Then you have to know yourself what security routines are needed to secure, that is, to write this system."

Table 4.3: Overview of the theme "Prospects using AI in SE" and its categories with example quotations.
Theme: Prospects using AI in SE

AI tools in education: "It's the same as this about it being so problematic to write articles and essays with ChatGPT. What I think is that in the future, everyone will write bulk works via that type of AI. So it consists of learning how to use the tools that you will use in professional life, for example in school. Same way as the calculator and all the other tools that exists."

Worries and questionings: "With Copilot, it really becomes more accessible than it is if you have to Google it too. So is very easy to just take a solution and be a bit lazy. All developers are lazy. We kind of like abstracting things, not having to do things twice. So I think wrong solutions or non-conventional things can stick."

Career prospects as a software engineer: "Yes, I think it's the same there that you might focus on what you need to move forward. If there is a tool that solves more low-level things for you, then there is less reason to be really good at that. So perhaps for efficiency reasons, I absolutely believe that it can affect the level of competence."

4.1 Direct Effect on Daily Work

This theme was created to represent the findings regarding RQ1, and its three categories capture different aspects of the participants' opinions on Copilot's features. RQ1 sought to find out which features might make a developer more efficient and/or hinder them in their daily work, but also which features could be improved and which ideas for completely new features emerged, hence the naming of the categories.

4.1.1 Features Increasing Efficiency

All participants emphasized in some way that Copilot could enhance their efficiency in their day-to-day work. Among the specific features that Copilot offers, the efficiency gains were mostly attributed to the auto-completion of code: when writing a piece of code, Copilot could recognize what one was writing and suggest what would come next based on what had been written before. The user can press the tab key to accept the suggestion.

One participant in SA1 mentioned how Copilot was especially helpful with smaller parts of a method.

"What's been helpful is primarily when Copilot has determined smaller things for me, i.e. auto-complete. When it sees that I am attempting to write a state variable for example, or a branch in a switch-statement, a branch in an if-statement, those smaller pieces. Then it has been nice not to have to write it, I just tab."

Five participants voiced similar sentiments: auto-completion specifically could make one more efficient when writing more repetitive code. The participants mentioned, for example, boilerplate code when building web components from scratch in React, or initial configurations for databases. Essentially, Copilot's auto-completion suggestions were seen as most effective when completing tasks that were viewed as chores and did not contain much business logic.

"I think what really can be helpful is features which help to write code, like repetitive code. You maybe don't write it very often, like: OK make an HTTP request, that you always need to Google. Even if you know that yeah, this problem would do the trick, you would still Google just to make sure it's the correct one. So probably for such cases and maybe not where business logic is involved. But, like, from web development side it could be: OK how to center this block? Every time you would Google it.
If the Copilot can generate the code yeah that would be really really helpful."

Eight participants identified new development as the main area of use when examining the specific domains of software development in which the tool could be most useful for efficiency. These participants saw that Copilot could be utilized most effectively when building a program from scratch, and that this was also where they could be most efficient with the tool. Of the three participants in SA1, two said that the tool was most helpful in new development.

"It didn't feel like it could improve existing code that much, but it's more giving new suggestions as you are writing. So it's only creating boilerplate code right now anyway. It feels like ChatGPT is more like: how can I improve this function to be more efficient?"

Conversely, four participants instead saw the benefits of Copilot in creating tests for the code more efficiently. These participants felt that testing was something mundane that could be automated to a certain degree. One participant specifically mentioned how the commenting feature of Copilot, where the tool suggests a block of code based on an earlier comment by the user, could potentially make them more efficient in testing.

"Something I think is very boring is testing, unit testing, and what I have mainly thought about with these, specifically with these comments and being able to write a test through comments. I think that could really reduce the boring work maybe and receive a bit more help on the way even if [Copilot] doesn't solve all tests. ... It takes a while to set them up and being able to get some rows for free would be nice."

4.1.2 Features Being a Hindrance

While participants found Copilot to be very effective in helping them complete tasks faster, some also found certain aspects of the tool to be more of a hindrance in their day-to-day work. Five participants expressed that giving instructions to Copilot explicitly by commenting directly in code was an inefficient way to work, for two main reasons: (1) Copilot was slow to give suggestions, or gave none at all, after one had commented, and (2) the process of commenting felt unnecessary and required too much extra work.

"The bulk of the time when I wrote the comment and then tabbed down I got nothing, so I don't know if you have to wait or if [Copilot] was sort of thinking, but yeah. So it was very difficult to determine if I had done something wrong or if it was [Copilot] who had done something wrong, or just wait."

"I think that writing these comments the way that you do, it's nice sometimes, but sometimes when you do it feels like it takes a longer time to come up with how to write them than to just continue writing."

While the auto-completion suggestions were mostly seen as making one more efficient, six participants also voiced concerns about how they could hinder them. It was expressed that having a pair programmer by one's side at all times during development can be an efficient way to move forward if one gets stuck or wants a second opinion, but that at times it can become a disturbance that disrupts one's train of thought when input arrives at inopportune times. It was also mentioned that it could make one insecure about one's work. One participant talked about how Copilot made them insecure by constantly giving suggestions when they already had a plan for what to do next. When the tool gave suggestions it disrupted
them and made them think twice about what should be done, and ultimately it took longer to finish the task. Another participant, in SA2, who had used Copilot in their work before, had chosen to remove the tool because they felt it brought more hindrance than benefit to their work.

"Initially I thought this could be good, but then somewhere along the way it became a bit too overused and a bit annoying, so I chose to turn it off myself. It was in the way. ... I knew the complete scope in my head and everything and I didn't have to auto-complete the parts that I sat with because all the times I tested [Copilot] I refactored myself anyway. So I could have written correctly from the beginning instead"

4.1.3 New or Improved Features

Despite the perceived hindrances of the tool, many participants offered interesting suggestions on how it could be improved, either by adding new features or by improving existing ones to reduce these hindrances. While this study is not meant to be an analysis of how Copilot should be improved as such, it is interesting to see how AI tools in programming can be better adapted for a better developer experience, and how that would affect developers' work.

Eight participants spoke about how they would like Copilot to be an AI assistant they could communicate with more directly. Many mentioned ChatGPT as an inspiration for how Copilot could be changed, in that ChatGPT opens up for more discussion and analysis of existing code rather than just providing new development. According to the participants, this would also improve areas such as maintenance and bug-fixing.

"I was playing around with ChatGPT a bit and I really liked that you can really interact with it, like more. It would be nice to see if Copilot could do something like that. Like instead of only suggesting the code that you can write there would be, I don't know, a little console so that you can ask questions or something like that. Maybe, it would be really nice if Copilot could not only write the code but explain it as well. I know that it can generate the comments, when you're generating the code as well, but it would be really nice to have like more in-depth explanation of what it's doing like, ChatGPT can actually do it at the moment, so that would be really useful."

The idea of a smaller interface outside of the code that not only provides new code suggestions but also discusses the code one has already written was shared by four other participants as well. One participant also predicted that this change could eliminate the hindrance of Copilot disturbing one's thought process, since developers would be given a choice of when to interact with the AI rather than receiving suggestions at all times.

"I would like to have more of a discussion platform like ChatGPT and also analysis tools so you can ask it: Why do I get this error message when running? So it can sort of figure it out. Such things would have been really mind-blowing."

4.2 AI in the Problem-Solving Process

RQ2 asks how developers' problem-solving process could be affected by using AI tools in their work, and the participants reasoned about the topic in different ways. The majority believed that these tools would help them in their process by allowing them to be more creative and efficient, but many also pointed out ways that AI tools could affect their process for the worse.
Another key aspect of this theme was the participants' emphasis on being critical when using code suggestions or other outputs from an AI.

4.2.1 Positive Effects

One aspect that participants pointed out was that AI tools can replace the steps in their problem-solving process in which they search for information, such as how to write a piece of code. Six participants pointed out that Google is open next to their code editor much of the time, but that with these AI tools they could find solutions to tasks more efficiently.

"I Google a lot. I kind of have the editor and Google, and so I just copy back and forth. So often a lot is already copied from something that someone else has done. The difference here was just that I didn't have to google. But it was just there [the suggestion] and then I changed in it."

Similar to the findings regarding increased efficiency, participants described that AI tools could add value when they need to implement functionality that already exists, or solutions to lower-level problems that the AI can solve itself. This could let software engineers solve problems in a different way, focusing more on the end product and less on the implementation details.

"It's easy as a developer to want to just write something fancy, but the important thing is how you deliver value, and there I think that the less time I need to spend on the implementation details, the more room there is for me to be creative in the final product."

Two participants even referred to programming as boring, and said that with AI tools they could focus more on solving problems at a higher level, which they perceived to be more enjoyable.

"You don't have to do the boring bits, you get to be part of the fun and build up the system. And then when you actually know that: now I want a function like this, so give me a function like this."

Furthermore, five participants highlighted that AI tools could increase creativity. One participant, for example, compared AI tools to programming libraries in the sense that they make programming more abstract rather than less creative.

"I think that ChatGPT is rather a sounding board on like we have this problem what would you do, and then you get some ideas and then maybe you just: Ah, that's a way you can do it! Then you test it and you will think of other ideas and you spin on further. So I rather think that it kind of increases creativity because you maybe start from zero and have no idea what to do but you receive some ideas."

Six participants also mentioned that it is a great starting point for solving problems as it is a source of inspiration, and even if the suggestions do not solve an issue fully, they can be tweaked to your liking.

"Because in my particular example, I guess it would really enhance my creativity because I learn from seeing the example being done and since GitHub Copilot can actually implement something for you step-by-step or line-by-line, at least for me, that would be really beneficial because I would see a good example and maybe it could spark something in my, I don't know, subconscious and maybe I can remember and learn a few things that I maybe haven't seen before."

4.2.2 Negative Effects

Although many participants emphasized the benefit of not having to use Google and forums such as Stack Overflow as much in their problem-solving process when
using AI tools, one participant felt that not reading about and comparing different techniques and solutions online could be a disadvantage. Two other participants reasoned in a similar way and saw a risk of becoming lazy in the problem-solving process, i.e., trusting the AIs' solutions blindly and not considering alternatives.

"... when you start relying on ChatGPT's capabilities and what it can do or Copilot and what it can do, I think it's easy to ignore looking for other alternative solutions to any suggestions. And then whether that limits creativity, I don't know, but I think it's generally tricky where you start to trust something far too much that it becomes easy to have a bit of tunnel vision and you might not even think about what you're doing, then hope that it sort of works out".

4.2.3 Critical Thinking

Even though AI tools can help software engineers in their work, it became clear that critical thinking and base knowledge are required when using these tools, both to know how to ask the right questions to get what you are looking for and to validate the outputs. All participants agreed on this. Two participants also mentioned that one should be extra aware that Copilot is trained on public repositories. They reasoned that just because Copilot suggests the solution that is most common in its training data, it is not necessarily the best or correct way to solve the problem.

"You can always jump into this GitHub Copilot and start making the application without any knowledge. Like you're just writing comments, GitHub is generating code, and then you pray that everything runs fine the first time. But that's not really effective. I guess, yeah, in order to use it effectively, you will still have to understand what it is outputting."

Another aspect that two participants felt needed to be taken into consideration when using AI in the problem-solving process is awareness of the security risks the outputs can entail. One needs fundamental knowledge of how secure code is written to prevent the blind acceptance of insecure suggestions.

"Especially with security issues you really have to be aware that it is able to handle the security. Because if that goes wrong, it's less good than if it's just a regular bug. Then you have to know yourself what security routines are needed to secure, that is, to write this system."

4.3 Prospects using AI in SE

The final theme includes the participants' thoughts and opinions on the prospects of using AI in SE. It connects to RQ3, which will be answered by drawing on the results of the two previous themes, but also on the interview questions about how AI tools could change how programming is taught, how the prospects of a software engineer could change in terms of a shift in the developer role, and the effect on expertise and newly graduated developers. Finally, worries and questionings raised by the participants regarding the integration of AI will be presented.

4.3.1 AI in Education

While this study does not have students as its subjects, if or how the teaching of programming in schools will change with the introduction of GAI is relevant for the future of the industry. Twelve participants were positive about the introduction of GAI into SE education.

"I don't think ban. I absolutely don't think so. You will always have [AI-tools] in your working life later.
It is more about learning how to work with them"

One participant interestingly brought up the introduction of the calculator as an example and used it to argue that AI will impact education in a similar fashion.

"Yes, it certainly will. I think it will have a big impact. I think it will be a bit like the calculator, you won't have to be able to calculate everything yourself and you just can figure out, like overall how it should work and then you let calculators do the heavy lifting."

The efficiency gains these tools offer could play a role in school settings as well. Five participants foresaw scenarios where teachers could move on more quickly from the more technical concepts such as syntax and computer architecture, as these will be handled by the AI. Teachers could instead focus on discussing higher levels of abstraction, such as the infrastructure of building programs and best practices in SE.

"... where I am from, we have lots of people now changing their careers from a totally different area to IT and I've noticed that in those courses they are taught like, okay, how to develop in C# and JavaScript, etc, but maybe lack, like, architecture best practices or like what's best practices of software development in general? What are restful APIs? Something like that. So yeah, using this Copilot or other tools would give more time for other topics, so then it would be really really nice. So I think yeah definitely there could be a bit of change in the academic setting, how we teach those things"

In general, it was hard for participants to predict what would have to happen to testing and assessment if AI assistants are incorporated into a school setting. The only real conclusions participants could offer were: (1) having tests where these tools are not allowed, or (2) having assignments and tests where it is known that these tools can be used, and formulating the questions around that.

Outside of more formal education, four participants saw potential in GAI tools for enhancing one's personal learning. Emphasis was put on the value of practical experience for obtaining competency in all aspects of SE, such as lower-level programming and software architecture. It was thereby, according to the participants, important to view the implementation of theoretical concepts, the experimentation with new languages and frameworks, as well as the handling of bugs and errors, as individual learning opportunities. These opportunities could then be taken advantage of even faster when integrating GAI tools into one's workflow.

"You don't always have Google giving all of the best examples or answers, and I think that would be, I don't know, quite a nice tool to learn some of this stuff. Especially if you're, I don't know, learning a new programming language or a new concept. As I've said, again, it would be a nice example like teaching the concrete, I don't know, piece of codes, so I guess that would be useful."

4.3.2 Career Prospects as a Software Engineer

Inquiries were also made into how AI tools such as Copilot and ChatGPT could potentially alter the careers of software engineers: specifically, whether it will become easier or harder to enter the profession, whether the skill requirements of a software engineer will change, and whether there would be any effect on the level of expertise within the industry. There were mixed opinions among the participants regarding entering the profession.
Nine participants felt that it would become easier, as people without any prior knowledge would have more tools at their disposal from which to learn programming. In general, participants felt that it has become easier and easier over the years to become a programmer, as so much has been abstracted away and so many free learning tools exist online. The introduction of AI tools was considered the next level of abstraction, one that could lower the entry barrier into the programming world even further.

"Yes, it can probably lower the requirements [to enter the profession] in the future because it will be easier to train people and it is going to be easier for people to learn things because you receive help at all times with the more basic things."

One point raised by a participant that was deemed important to highlight was that one should be transparent about one's use of AI towards recruiting companies.

"I think that might be easier, but I think you would have to be transparent about that. For example, when you're learning new stuff there's like a few ways to do that. You can, I don't know, read books, take a look at YouTube, and I believe Copilot is, would be one of the learning paths. For example, if you're applying to a company that requires you to do a homework task. For example, make a small application, I guess you would have to be transparent about it that you have used Copilot if you use it. Because I guess it really shows if you have knowledge about your code or if you have actually, I don't know, written everything with Copilot prompts and then you couldn't explain what's in your program."

Four participants said that it would become more difficult to enter the profession. While these individuals largely agreed that learning programming would become simpler, they followed their reasoning in a different direction and considered the repercussions of programming becoming an easier discipline. One school of thought was that fewer programmers will be needed, since individuals equipped with these new AI tools can do the same amount of work that previously required larger teams, reducing the number of jobs and making it more difficult to find work.

"Like from a learning perspective like if a new person is learning something and gets stuck, yeah, maybe he or she can ask Copilot to create things. So from a learning perspective, yeah you might learn things even faster, understand them faster. But if like there's a lot of such developers who learn it this way, so there might like be more tricky to get into the market"

Eight participants felt that the requirements would shift in the future and that the title "Software Engineer" would change meaning in terms of what is needed to perform the profession. This sentiment mostly centered on how AI tools can perform low-level tasks more efficiently, which in turn can shift the focus to solving problems at a higher and more abstract level. It was also mentioned that knowing how to control one's AI tool and writing prompts effectively will become a required skill in the future. Two participants even imagined there could be a specific title for a profession as an "AI prompt engineer".

"I really think this is just the next step in programming. Since I started, I have always raised the level of abstraction, that a line I write today, how many computer instructions isn't that? I have no idea.
So, and there are certainly people who say that you don't even know what your code does, because you don't understand which ones and zeroes this really becomes, this is the same thing. I'm super abstract now. The language solves this for me, the details. And somehow, that's how I see my career developing. That you become more and more abstract where this is just a giant shortcut to it, that we can all become more tech leads, than that you have 50% of the workforce who are junior people who sit on Stackoverflow and try to google implementation details in the language they working with. Maybe this tool can make us realize, the same way we've realized with our programming languages, it's okay to abstract and we can all level up and have greater impact that way"

"Then, I can probably imagine it in the future. Certainly not now, but in the future someone will surely write: you must know... you must be good at asking an AI. Now I don't know, there's probably nothing there right now, but there will probably be courses and workshops on how to ask the question to AI, and that will be what determines how well an AI can help you in the end."

When it came to the level of expertise in the industry, the participants showed a clearer hesitance in predicting how or where AI tools could affect it, at least compared to their answers about knowledge requirements and entering the profession. Many commented that the question was complex and had a hard time formulating a concrete answer. One participant remarked that they could see themselves answering that question in 10 years, when these tools have had a longer-term effect on the industry. More concretely, five participants believed that, for the time being, the level of expertise would not be affected; however, newer developers would learn at a faster rate. In other words, the knowledge floor would be raised, but not the ceiling.

"More people could, yeah, actually get up to a certain level faster. And the people that already know stuff, it wouldn't really have that big of an effect. Like it still would be like a nice thing to have, but