Applying software engineering principles to develop parcel delay forecasting models using tracking data: A study of the models ARIMA, BSTS, and GAM

dc.contributor.author: Tang, Alex
dc.contributor.author: Halmkrona Lahtinen, Joonas
dc.contributor.department: Chalmers tekniska högskola / Institutionen för data och informationsteknik
dc.contributor.examiner: Feldt, Robert
dc.contributor.supervisor: Torkar, Richard
dc.date.accessioned: 2022-07-07T13:09:48Z
dc.date.available: 2022-07-07T13:09:48Z
dc.date.issued: 2022
dc.date.submitted: 2020
dc.description.abstract: The fast growth of data in the supply chain industry, combined with the development of more sophisticated data science tools, has increased the opportunities for actors in this industry to leverage their data. At the same time, industries need to apply best practices for data science programming to ensure software quality. In line with the digitalization of the supply chain industry, this thesis explores the possibilities of using tracking event data in parcel delay forecasting, that is, forecasting the share of late packages in various geographical regions. Three models have been applied for time series analysis, and both the predictive ability of the models and the significance of the tracking data are studied. The three models are auto-regressive integrated moving average models (ARIMA), Bayesian structural time series (BSTS), and generalized additive models (GAM). During the development of the models, various tools and techniques were applied to follow software engineering principles. These tools and techniques were compared with those currently used at Company X. The research was conducted in collaboration with Company X as a field study that aims both to elaborate on the studied models and to compare practices performed in industry with practices suggested by formal theory research on software engineering principles in data science programming. Results show that, of the three models, BSTS performs best and GAM second best. This is expected, since the ARIMA model is, in essence, a sub-model of the two other classes of models. Although the models have different forecasting capabilities, the quality of the predictions depends on each region's inherent variance in the target variable.
The results also show that there is no clear correlation between the features created from the tracking-event data and the target variable, although some features stand out more than others. The mixed effects between variables also seem to have greater predictive ability than the independent contributions of the individual variables. Furthermore, the results show that Company X makes a distinction between the phases of data exploration and software delivery in data science programming. This distinction was not found in the formal theory research. Although Company X follows the practices found in the formal theory research, distinguishing between the two phases allows different practices to be followed at different stages of a data science project. To ensure that software engineering principles are followed during data science programming, Company X needs to separate data exploration from software delivery and create a clear framework specifying which tools and techniques should be used at which stage.
dc.identifier.coursecode: DATX05
dc.identifier.uri: https://hdl.handle.net/20.500.12380/305139
dc.language.iso: eng
dc.setspec.uppsok: Technology
dc.subject: Time series
dc.subject: ARIMA
dc.subject: BSTS
dc.subject: GAM
dc.subject: software engineering
dc.subject: software engineering principles
dc.subject: supply chain
dc.title: Applying software engineering principles to develop parcel delay forecasting models using tracking data: A study of the models ARIMA, BSTS, and GAM
dc.type.degree: Master's thesis (Examensarbete för masterexamen)
dc.type.uppsok: H

Original bundle

Name: CSE 22-82 Tang Halmkrona.pdf
Size: 2.37 MB
Format: Adobe Portable Document Format
Description: Master’s thesis in Computer science and engineering
