Causal Models Applied to Studies within the Mining Software Repository Domain

Type

Master's Thesis

Abstract

Context: Research in the mining software repository domain commonly relies on observational data, since software repositories are a rich source of such data. At the same time, causality is rarely incorporated in Software Engineering (SE) research, even though statistical analyses are often conducted.

Objective: To analyse the practical implications of applying causal models to studies from the Mining Software Repositories (MSR) conference. Specifically, the aim is to examine whether researchers have accidentally included variables (colliders) in their analyses that biased their results.

Method: Computer simulation was used as the research methodology. It comprised three steps: (1) identifying a paper with colliders by sampling from the MSR conference and constructing Directed Acyclic Graphs (DAGs), (2) a theoretical computer simulation of an SE scenario to demonstrate collider effects, and (3) computer simulations using synthetic data generated on the basis of the identified paper. In addition, an analysis was conducted using the original data from the chosen paper.

Results: The investigated research lacked transparency: variable selection processes and underlying assumptions were not completely clear. Three papers were investigated in the first step of constructing DAGs, and colliders were identified in the paper of Nagy and Abdalkareem [46]. Simulations revealed that excluding collider variables improved the sought-after effect sizes. However, it was not possible to determine any practical implications. A replication package is available.

Conclusion: The lack of transparency hindered the construction of DAGs and indicates a threat to advancements in research, because the authors' assumptions had to be interpreted. Incorporating causality and DAGs could, through the increased transparency it would bring, lead to more robust advancements in research in the long run. Additionally, DAGs are recommended as tools to mitigate the risk of accidentally conditioning on colliders.
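To illustrate what conditioning on a collider does, the following is a minimal sketch and not the thesis's own simulation. It assumes a hypothetical three-variable DAG, x -> c <- y, in which x and y are independent, and it uses ordinary least squares via NumPy as a stand-in for whatever statistical model the thesis applied. The function name `simulate_collider`, the variable names, and the unit coefficients are all illustrative assumptions.

```python
import numpy as np


def simulate_collider(n=10_000, seed=0):
    """Estimate the effect of x on y with and without adjusting for a collider.

    Hypothetical DAG (an assumption for illustration): x -> c <- y.
    x and y are generated independently, so the true effect of x on y is 0.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)            # exposure
    y = rng.normal(size=n)            # outcome, independent of x
    c = x + y + rng.normal(size=n)    # collider: caused by both x and y

    # Model 1: regress y on x only (collider excluded).
    design_without = np.column_stack([np.ones(n), x])
    slope_without = np.linalg.lstsq(design_without, y, rcond=None)[0][1]

    # Model 2: regress y on x and c (collider included as a covariate).
    design_with = np.column_stack([np.ones(n), x, c])
    slope_with = np.linalg.lstsq(design_with, y, rcond=None)[0][1]

    return slope_without, slope_with


if __name__ == "__main__":
    without_c, with_c = simulate_collider()
    print(f"slope of x, collider excluded: {without_c:.3f}")
    print(f"slope of x, collider included: {with_c:.3f}")
```

Under these assumptions, the estimate without the collider should be close to the true value of zero. The estimate with the collider included should be clearly negative (about -0.5 in the population for these particular coefficients), a spurious association created purely by the adjustment.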

Subject/keywords

Empirical Software Engineering, Colliders, Directed Acyclic Graphs, DAGs, Mining Software Repository Research, Causal Inference, Bayesian Statistics
