- Causal effect of carbon footprint calculators (2022). This master's thesis aims to answer whether causality theory and multivariate time series analysis are relevant tools for questions that might arise in the context of different tracking apps. The context is the mobile application Svalna, a research-based carbon calculator designed to help people track and reduce their emissions. It has been shown that information provision can impact behavior, so the central question is whether using the Svalna application impacts the users' consumption. I introduce a statistical approach to analyse multivariate time series like those gathered through Svalna. I create a data generation model to test the suggested statistical model. As an intermediate check, the model is used to evaluate a data set from Svalna's users. I conclude that the mechanisms of the developed models function on well-behaved data and that the model should be seen as an intermediate step towards a model for analysing real data from Svalna. I think it is a useful approach that can contribute to understanding behavioural change and to better app design.
- Spatial Modeling of Formation of Gel (2022). Understanding and predicting colloidal interaction is important in a variety of applications. In this study, we investigate the aggregation dynamics of colloidal silica by generating simulated structures and comparing them to experimental data gathered through scanning transmission electron microscopy (STEM). More specifically, diffusion-limited cluster aggregation (DLCA) and reaction-limited cluster aggregation (RLCA) models were used with different functions for the probability of particles sticking upon contact. Aside from a constant sticking probability, the sticking probability was allowed to depend on the masses of the colliding clusters and on the number of particles close to the collision. It was found that, in comparison to a constant sticking probability, both the mass-dependent and the neighbor-dependent sticking probability improved the goodness-of-fit of spatial summary statistics when the simulated data were compared to the experimental data. The models were also compared based on fractal dimensions. Both in terms of goodness-of-fit for the summary statistics and in terms of the fractal dimension, the structures generated with a neighbor-dependent sticking probability were the most similar to the experimental data. This model was further analyzed by conducting global envelope tests based on the spatial summary statistics. The tests showed that although the summary statistics are similar for the simulated and experimental structures, there are also systematic deviations. Structures generated with the same model were also compared with the STEM data by simulating flow and diffusion. From this analysis, it was seen that the permeability and the geometry factor of the simulated and experimental structures were relatively similar.
- Using language models to improve a speech recognition based maritime emergency call detection system (2022). Novel applications of the transformer architecture, as well as the availability of pre-trained models, have drastically reduced the amount of data required to train successful speech-to-text (STT) models. By using the Connectionist Temporal Classification (CTC) algorithm, the process is further simplified, as the training data does not have to be pre-segmented. This work aims to improve the performance of such a model, developed to detect maritime VHF radio emergency calls, by adding a language model to the CTC decoding. We experiment with language models trained on several different text corpora and apply language models both in the decoding and on the resulting transcripts. The results indicate the importance of large amounts of domain-specific text. The results also show that a reduced Word Error Rate (WER) does not necessarily lead to an improvement in contextual comprehension. Finally, it is shown that relatively large improvements are obtained by fine-tuning various pre-trained STT models on a curated dataset.
- Evaluation of a bidirectional GAN on high dimensional data (2022). In statistics and machine learning it is well known that as the dimensionality of a space increases, an exponentially greater amount of data is necessary to accurately analyze it. This is a problem currently faced by Svenska Handelsbanken AB. As they aim to simulate future markets, they require methods for estimating densities of historical markets in order to generate new data points on which to base the simulations. This thesis investigated the ability of a novel machine learning algorithm to generate data that captures tail dependencies which common statistical models fail to capture. The performance was first measured on a simulated data set where the means and variances were already known, and then on real market data. The results on the market data made it clear that the algorithm was not capable of capturing tail dependencies as desired, as it generally generated points of much smaller variance than the original data. However, the results on the simulated data implied that on a data set of roughly only ten times the size, which in machine learning is not extremely large, the algorithm would likely generate data according to the original distribution much more consistently.
- Adaptive Driver Modelling for Forward Collision Warning Systems (2021). In this work, driving behaviour is analysed with the purpose of finding connections between a driver's routine driving and their behaviour in collision and near-collision situations. The ambition is to improve the Forward Collision Warning (FCW) system on Volvo cars by taking information from previous driving situations of the current driver into account when determining the best timing for issuing a collision warning. The analysis is performed by means of feature extraction on multivariate time series data containing measurements from various sensors. Principal component analysis (PCA) and clustering methods such as k-means and DBSCAN were applied, but no connections relevant to the formulated aim could be found in the investigation. The conclusion drawn is that a more thorough evaluation of the available data is required. Removing parts of drive sequences that are not of interest, or categorising the sequences into different scenarios, could make the information more comparable and hence yield a better result. A more careful cleaning of the available time series could also lead to an improvement.