Applying software engineering principles to develop parcel delay forecasting models using tracking data: A study of the models ARIMA, BSTS, and GAM
Type
Master's thesis
Abstract
The fast growth of data in the supply chain industry combined with the development
of more sophisticated data science tools has amplified the possibilities of actors in
this industry to leverage their data. At the same time, industries need to apply best
practices for data science programming to ensure software quality. In line
with the digitalization of the supply chain industry, this thesis aims to explore the
possibilities of utilizing tracking event data in parcel delay forecasting, that is,
forecasting the share of late packages in various geographical regions. Three different
models have been applied for time series analysis, and both the predictive ability
of the models and the significance of the tracking data are studied. The three models
studied in this thesis are autoregressive integrated moving average (ARIMA) models,
Bayesian structural time series (BSTS), and generalized additive models (GAM).
During the development of the models, various tools and techniques were applied to
follow software engineering principles. The tools and techniques used were compared
with those currently used at Company X.
The research was conducted in collaboration with Company X in a field study that
aims both to elaborate on the studied models and to compare practices performed
in industry with practices suggested by formal theory research regarding software
engineering principles in data science programming. Results show that out of the
three models, the best performing model is BSTS and the second best performing
model is the GAM. This is expected, since the ARIMA model is, in essence, a
sub-model of the two other classes of models. Although the different models have
different forecasting capabilities, the quality of the predictions depends on each region's
inherent variance in the target variable.
The results also show that there is not a very clear correlation between the features
created from the tracking-event data and the target variable, although some features
stand out more than others. The mixed effects between variables also seem to have
greater predictive ability than the independent contributions of the individual variables.
Furthermore, the results show that Company X makes a distinction between the
phases of data exploration and software delivery in data science programming. This
distinction was not found in the formal theory research. Although the practices
found in the formal theory research are followed by Company X, creating a distinction
between the two phases allows different practices to be followed at different stages of
a data science project.
To ensure that software engineering principles are followed during data science programming,
Company X needs to separate data exploration from software delivery and create a clear
framework specifying which tools and techniques should be used at which stage.
Subject/keywords
Time series, ARIMA, BSTS, GAM, software engineering, software engineering principles, supply chain