Causal Models Applied to Studies within the Mining Software Repository Domain

Type
Master's Thesis
Program
Software engineering and technology (MPSOF), MSc
Published
2024
Authors
LEVINSSON, AMANDA
FRANSSON, LINNÉA
Abstract
Context: Research in the mining software repository domain commonly uses observational data, since software repositories are a rich source of such data. At the same time, causality is rarely incorporated in Software Engineering (SE) research, even though statistical analyses are often conducted. Objective: To analyse the practical implications of applying causal models to studies from the Mining Software Repository (MSR) conference. Specifically, the aim is to examine whether researchers have accidentally included variables (colliders) in their analyses that biased their results. Method: Computer simulation was used as the research methodology. This included (1) identifying a paper with colliders by sampling from the MSR conference and constructing Directed Acyclic Graphs (DAGs), (2) a theoretical computer simulation of an SE scenario to demonstrate collider effects, and (3) computer simulations using synthetic data generated on the basis of the identified paper. In addition, an analysis was conducted using the original data from the chosen paper. Results: A lack of transparency was identified in the research investigated: variable selection processes and underlying assumptions were not fully clear. Three papers were investigated in the first step of constructing DAGs, and colliders were subsequently identified in the paper by Nagy and Abdalkareem [46]. The simulations showed that excluding the collider variables improved the estimates of the sought-after effect sizes. However, no practical implications could be determined. A replication package is available. Conclusion: A lack of transparency hindered the construction of DAGs and indicates a threat to research progress, since the authors' assumptions had to be interpreted. Incorporating causality and DAGs could, through the increased transparency this brings, result in more robust research advancements in the long run. Additionally, DAGs are recommended as tools to mitigate the risk of accidentally conditioning on colliders.
Subject/keywords
Empirical Software Engineering, Colliders, Directed Acyclic Graphs, DAGs, Mining Software Repository Research, Causal Inference, Bayesian Statistics