Co-clustering of Tensor Data Using Sparse Tensor Factorisation

dc.contributor.author: Tabakovic, Selma
dc.contributor.department: Chalmers tekniska högskola / Institutionen för matematiska vetenskaper
dc.contributor.examiner: Axelson-Fisk, Marina
dc.contributor.supervisor: Held, Felix
dc.date.accessioned: 2020-08-10T11:19:17Z
dc.date.available: 2020-08-10T11:19:17Z
dc.date.issued: 2020
dc.date.submitted: 2020
dc.description.abstract: With ever-increasing amounts of data generated from new sources and scientific methods, e.g. high-throughput genome sequencing in bioinformatics, powerful tools for exploratory data analysis are required. One such tool is clustering, i.e. grouping together coherent observations, which is important for organising vast numbers of observations into a more manageable format for further analysis. This task faces new challenges, however, as tensor data, i.e. multidimensional data, has become common in many applications. For tensor data, an approach called co-clustering has recently attracted research attention. Co-clustering performs the clustering on all tensor dimensions simultaneously, which enables the detection of joint data expressions that only occur under special circumstances. In this thesis, two methods for co-clustering of tensor data using sparse CP decompositions are proposed. The motivation for using a tensor factorisation with enforced sparsity is that it can extract the most relevant data from the tensor whilst reducing noise. The first method, called sCP-S, uses the sign pattern of the vectors obtained from a sparse CP decomposition to determine the clustering. The second method, named sCP-HC, instead applies hierarchical clustering to the sparse CP decomposition vectors. The two methods were compared on simulated data, and the more flexible sCP-HC was tested thoroughly on more advanced simulated data sets. The types of predefined co-clusters that can be detected, and the stability of co-cluster detection under perturbations of the input data, were both investigated before sCP-HC was applied to real data. These evaluations were performed through computer simulations on simulated data sets, along with an application to a real genomic tensor data set.
The simulation results show that sCP-HC has the potential to detect several types of additive coherent co-clusters. Additionally, the stability simulations show that sCP-HC is quite consistent in its co-clustering, even in the presence of considerable noise. Applying sCP-HC to real genomic data yielded several interesting co-clusters, which can be used for further analysis. As such, this work concludes that sCP-HC is a useful tool for detecting coherent co-clusters in tensor data, and for exploratory data analysis.
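The sign-pattern idea described in the abstract can be illustrated with a small sketch. This is not the thesis's implementation: the abstract does not specify the sparse CP algorithm, so the code below assumes a single rank-1 component fitted by alternating updates with soft-thresholding, and it assumes that indices sharing a sign in a factor vector form one cluster while zero entries belong to none. All function names (`soft_threshold`, `normalise`, `sparse_rank1`, `sign_groups`), the penalty `lam` and the toy tensor are hypothetical choices made for illustration. The hierarchical-clustering variant (sCP-HC) is not shown.

```python
import math

def soft_threshold(v, lam):
    """Shrink every entry towards zero by lam; entries with |x| <= lam become 0."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in v]

def normalise(v):
    """Scale v to unit Euclidean length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def sparse_rank1(X, lam, n_iter=10):
    """Alternating rank-1 updates with soft-thresholding (an assumed stand-in
    for a sparse CP decomposition, not the method used in the thesis)."""
    I, J, K = len(X), len(X[0]), len(X[0][0])
    b = normalise([1.0] * J)
    c = normalise([1.0] * K)
    for _ in range(n_iter):
        # Each factor is the tensor contracted with the other two factors,
        # then sparsified and rescaled to unit length.
        a = normalise(soft_threshold(
            [sum(X[i][j][k] * b[j] * c[k] for j in range(J) for k in range(K))
             for i in range(I)], lam))
        b = normalise(soft_threshold(
            [sum(X[i][j][k] * a[i] * c[k] for i in range(I) for k in range(K))
             for j in range(J)], lam))
        c = normalise(soft_threshold(
            [sum(X[i][j][k] * a[i] * b[j] for i in range(I) for j in range(J))
             for k in range(K)], lam))
    return a, b, c

def sign_groups(v):
    """Split indices by sign; zero entries are assigned to no cluster."""
    return {"+": [i for i, x in enumerate(v) if x > 0],
            "-": [i for i, x in enumerate(v) if x < 0]}

# Hypothetical toy tensor (6 x 5 x 4) with two planted blocks of opposite sign:
# rows 0-2 carry +5 and rows 3-4 carry -5, both on columns 0-1 and slices 0-1.
X = [[[0.0] * 4 for _ in range(5)] for _ in range(6)]
for i in range(5):
    for j in range(2):
        for k in range(2):
            X[i][j][k] = 5.0 if i < 3 else -5.0

a, b, c = sparse_rank1(X, lam=0.5)
print(sign_groups(a))  # {'+': [0, 1, 2], '-': [3, 4]}
print(sign_groups(b))  # {'+': [0, 1], '-': []}
print(sign_groups(c))  # {'+': [0, 1], '-': []}
```

In this toy case the first mode splits into a positive and a negative group, and combining each group with the non-zero indices of the other two modes recovers the two planted blocks. Plain Python lists are used only to keep the sketch dependency-free; an array library would be the more usual choice for real tensors.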
dc.identifier.coursecode: MVEX03
dc.identifier.uri: https://hdl.handle.net/20.500.12380/301442
dc.language.iso: eng
dc.setspec.uppsok: PhysicsChemistryMaths
dc.subject: Co-clustering, Tensor decomposition, CANDECOMP/PARAFAC decomposition, Sparsity, Agglomerative hierarchical clustering
dc.title: Co-clustering of Tensor Data Using Sparse Tensor Factorisation
dc.type.degree: Master's thesis
dc.type.uppsok: H
local.programme: Engineering mathematics and computational science (MPENM), MSc
Download
Original bundle
Name: Selma Tabakovic Master Thesis.pdf
Size: 16.74 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.14 KB
Format: Item-specific license agreed upon to submission