Deep learning for particle tracking

Bachelor's thesis at the Department of Physics
Adrian Lundell, David Tonderski, Fredrik Meisingseth, Dennis Kristiansson
Department of Physics, Chalmers University of Technology, Gothenburg, Sweden 2020

© ADRIAN LUNDELL, DAVID TONDERSKI, FREDRIK MEISINGSETH, DENNIS KRISTIANSSON, 2020.

Supervisors: Daniel Midtvedt, Giovanni Volpe
Examiner: Lena Falk
Bachelor's Thesis TIFX04-20-17
Department of Physics, Chalmers University of Technology

Cover: Visualization of tracked particles (circled) in an image from microscopy.

Acknowledgements

First of all, thank you, Daniel, for all the help you have given us during the course of this project, for your patience and for your guidance. We also want to thank you, Giovanni, for your great input at the start of the project, as well as the rest of the group that developed DeepTrack for the base that our research has been built upon. Finally, we want to thank you, Erik, for your help with our testing of U-Track on real data.

Adrian, David, Fredrik and Dennis

May 2020

Abstract

The use of machine learning for classification has in recent years spread into a wide range of disciplines, among them the detection of particles for particle tracking on microscopy data. We modified the Python package DeepTrack, which makes use of deep learning to detect particles, creating a package called U-Track. By using a new network architecture based on a U-Net, better performance and higher computational efficiency than DeepTrack were achieved on images with multiple particles. Furthermore, functionality to track detected particles over series of frames was developed.
The application of U-Track to experimental data from two-dimensional flow nanometry produced tracks consistent with theory, and tracked larger quantities of particles over longer periods of time than a digital-filter-based benchmark algorithm.

Sammandrag

De senaste åren har användningen av maskininlärning för lösning av klassificeringsproblem spridits till en mängd discipliner, däribland partikeldetektion för partikelspårning på mikroskopidata. Vi modifierade Pythonpaketet DeepTrack, som använder djupinlärning för att detektera partiklar, och skapade paketet U-Track. Med en ny nätverksarkitektur baserad på ett U-Net uppnåddes bättre resultat och högre beräkningseffektivitet än DeepTrack på bilder med flera partiklar. Vidare lades funktionalitet för att spåra partiklar över en sekvens av bilder till. Användningen av U-Track på experimentell data från tvådimensionell flödesnanometri producerade resultat som överensstämmer med teorin, samt med fler partiklar spårade över längre tidsperioder jämfört med en algoritm baserad på digital filtrering.

Contents

1 Introduction
  1.1 Purpose
  1.2 Scope
  1.3 Report structure
2 Theory
  2.1 Network architecture
  2.2 Measures of network's detection performance
  2.3 Two dimensional flow nanometry
    2.3.1 Determination of particle radius
    2.3.2 Covariance based diffusion estimators
    2.3.3 Testing of the diffusion hypothesis
3 Method
  3.1 Network architecture
  3.2 Training and loss function
  3.3 Post-processing
    3.3.1 Extracting particle information from network output
    3.3.2 The particle tracking algorithm
    3.3.3 Calculation of the cost matrix
  3.4 Evaluation of network performance
  3.5 Benchmarking against DeepTrack
    3.5.1 Training
    3.5.2 Optimal parameters
  3.6 Evaluation of particle tracks on experimental data
    3.6.1 Comparison with a digital filter based approach
    3.6.2 Bias of intensity and pixel size
    3.6.3 Consistency with the diffusion hypothesis
  3.7 Image generation
4 Results
  4.1 Evaluation of network performance
  4.2 Benchmarking against DeepTrack
    4.2.1 Single particle
    4.2.2 Multiple particles
  4.3 Tracking on experimental data
    4.3.1 Tracking results
    4.3.2 Track length
    4.3.3 Position accuracy
  4.4 Image generation routines
5 Discussion
  5.1 Evaluation of network performance
  5.2 Benchmarking against DeepTrack
  5.3 Tracking on experimental data
    5.3.1 Plausibility of results
    5.3.2 The cost matrix
    5.3.3 Track length and quality
  5.4 Image generation routines
6 Conclusion and outlooks
References

1 Introduction

The usage of machine learning algorithms in image analysis has been an increasing trend in the last decade, as technological improvements have made the required data handling feasible. Advantages of neural networks include robustness against disturbances in the data, generality across different types of data, and a potential for automation, reducing both time-consuming workloads and subjective decision making in scientific image analysis.

In a 2019 paper, the research group of Giovanni Volpe showcased how their machine learning software named DeepTrack captured these advantages and outperformed several manual algorithms in the field of microscopy particle tracking [1].
The result has applications in a variety of fields, including calibrating optical tweezers, measuring biomolecular forces and monitoring the growth of crystals. However, the network used in DeepTrack only detects one particle per image, handling images containing multiple particles by partitioning them into parts and analysing each part separately. For large images or images with a high particle density, this method results in long computation times. An approach to particle tracking in two-dimensional microscopy adapted for tracking a larger number of particles in one image is therefore of interest. Hence, one of the purposes of this project is to change the network architecture to a U-Net. This type of network was first proposed by Ronneberger, Fischer, and Brox in 2015 [2] and has since become one of the most commonly used network architectures in image segmentation, especially within medical analysis [3].

1.1 Purpose

The purpose of this project is to develop a framework for extracting particle tracks from microscopy data. A first step is a rework of DeepTrack based on a U-Net neural network structure, which is better suited for images containing multiple particles. With this new approach, the aim is to perform at least as well as the preexisting software with regard to classification performance and computational efficiency, while providing improved performance on large images and images of high particle density. Secondly, a pipeline for post-processing the output from the network is developed, turning image segmentation data on a sequence of frames into tracked particles upon which statistical analysis may be performed. Post-processing further allows particles to be evaluated using situational context between frames that is not considered by the network, and may thus help in filtering out falsely predicted particles.
Contextual information is derived from the expected physical behaviour of the input data, and so this part was developed for use on particular experimental data supplied from a PhD thesis project [4]. While the network and post-processing methods are general, a particular use case helps exemplify usage considerations and showcases the usability of the final package. This final package is from here on called U-Track [5].

1.2 Scope

Development is confined to the use of a U-Net architecture in tandem with a particle tracking algorithm based on linear sum assignment. These methods are specifically chosen for the problem of multi-frame tracking of many particles, and evaluation is therefore focused on showing improved results on this problem.

1.3 Report structure

This project can be thought of as consisting of three subprojects, to some extent carried out in parallel throughout all parts of the report:

1. Implementing a U-Net architecture adapted for particle tracking.
2. Improving the image generation routine used in network training.
3. Developing a post-processing pipeline to extract particle tracks from the network's output.

In chapter 2, the basic theory behind the neural network, particle tracking and some relevant metrics is given. Chapter 3 treats the developed methods and gives some general insight into the considerations behind them, as well as the methods used for evaluation. This is followed by a presentation of results in chapter 4, where each project part is evaluated or benchmarked. In chapters 5 and 6, the methods and results are discussed, with a finishing conclusion of the overall outcome of the project.

2 Theory

The theory chapter starts off with a subchapter on the architecture of the network, explaining the roles of its most important building blocks and how they interconnect, as well as discussing the meaning of the network's output.
Following this is a subchapter presenting the metrics and terminology used to quantitatively measure the network's performance. In the last subchapter, the groundwork for post-processing of the network's output in connection to a real-world use case is laid out, specifically introducing the concept of particle tracking in the context of two-dimensional flow nanometry. This includes a description of the input data, theory about which information can be extracted, and methods for statistically analysing this information.

2.1 Network architecture

A neural network transforms input data into a set of features containing information about the data. The term network architecture refers to the arrangement of layers in the neural network. A layer is a collection of artificial neurons, which are modelled as mathematical operations within a neural network. In this project, three types of layers are mainly used (see figure 1):

• Convolutional layers, which in our case utilise convolution to detect local features in an input image. In practice, this means that a set of kernels (or matrices) is slid over the input in steps. For each step and for each kernel, the dot product between the kernel and the corresponding part of the image is inserted into the output image for that kernel, resulting in one output image per kernel. This dot product can be understood as a weighted sum over that part of the input. The number of output images (or feature maps) is called the number of feature channels of the layer. Before applying this layer, padding is often added to the input to preserve the image resolution.

• Max pooling layers, which split the input image into smaller rectangles, choosing the largest value in each such rectangle and inserting it into the output image. This reduces the resolution of the input image, allowing the network to detect large-scale features in the following convolutional layers. The size of these rectangles is called the pool size.
• Upsampling layers, which for each input pixel create a small rectangle of pixels and insert it into the output image. All the pixels in the output rectangle are equal to the input pixel. The size of these rectangles is called the upsampling factor. Upsampling layers are used to increase the resolution of the input image.

Figure 1: Visualisation of the three main types of layers that make up the neural network used in this project. First, a convolutional layer (blue) is applied to the input image with padding added. Then, a max pooling layer is applied (purple), followed by an upsampling layer (green). The upsampling factor is identical to the pool size, which means that the final resolution is the same as the input image resolution.

After each of these layers, an activation function is usually applied to the output before passing it on to the next layer. In this project, the ReLU activation function visualised in figure 2 is mainly used. It maps all negative values of the data to 0, while leaving other values unchanged.

Figure 2: Visualisation of the ReLU activation function, which is used for most layers in both the DeepTrack and the U-Track networks.

The original DeepTrack architecture consists of a convolutional base followed by a dense top. This dense top connects all outputs from the convolutional base to each one of the network's output features. The network's features are the x and y position of the center of the particle, as well as its distance from the center of the image. This approach is suitable for images containing a single particle, but the generalization to images with multiple particles is often inefficient, as shown in this project.

A more common approach to this problem is image segmentation, which assigns information locally to an image. In our case, this means that U-Track's features consist of images of the same size as the input data, outputting information on a pixel-to-pixel basis.
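As an illustration of the two resolution-changing layers described above, a minimal pure-Python sketch (operating on list-of-lists images; the function names are ours, not from DeepTrack or U-Track) could look like:

```python
def max_pool(img, size=2):
    """Max pooling: keep the largest value in each size-by-size rectangle."""
    h, w = len(img), len(img[0])
    return [[max(img[i * size + di][j * size + dj]
                 for di in range(size) for dj in range(size))
             for j in range(w // size)]
            for i in range(h // size)]


def upsample(img, factor=2):
    """Nearest-neighbour upsampling: each pixel becomes a factor-by-factor block."""
    return [[img[i // factor][j // factor]
             for j in range(len(img[0]) * factor)]
            for i in range(len(img) * factor)]


pooled = max_pool([[0, 1], [2, 3]], size=2)    # [[3]]
restored = upsample(pooled, factor=2)          # [[3, 3], [3, 3]]
```

With an upsampling factor equal to the pool size, the restored image has the input resolution again, as in figure 1, although the fine detail discarded by the pooling is not recovered.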
This is accomplished by using an architecture based on a U-Net, which can be understood as a combination of two parts:

1. A contracting path, sometimes called the encoder because of its ability to detect large-scale patterns [6]. It can be seen as a series of blocks consisting of convolutional layers followed by a max pooling layer. Each such block decreases the resolution of the feature maps while increasing the number of feature channels to retain information. This process is repeated until the minimum desired feature map resolution is reached.

2. An expanding path or decoder, which is applied after the contracting path. Here, the blocks consist of convolutional layers followed by an upsampling layer. Therefore, each block increases the resolution of the feature maps while decreasing the number of feature channels, inversely to the blocks in the contracting path. To retain the more detailed information from the contracting path, the output of each block is also concatenated with the corresponding feature map from the contracting path before being fed into the next block. After the final block, convolutional layers are applied to reach the desired output shape.

An important feature of the U-Net architecture is that it is fully convolutional, so the network can be used on a variety of image sizes without retraining. Additionally, its prediction quality is not affected by the image size. An example of the original U-Net architecture is shown in figure 3.

Figure 3: Visualisation of the original U-Net architecture, using an example with an input size of 572x572 pixels and 32x32 pixels at the lowest resolution. The blue boxes represent feature maps, while the arrows denote operations (or layers). Note that padding isn't used for the convolutional layers, thus resulting in a decreasing resolution. Source: adapted from [2].
2.2 Measures of network's detection performance

One purpose of our network is to classify each pixel in an image as either being part of a particle or not. This can also be done using a scanning-box method, where the idea is to classify whether such a scanning box contains a particle or not. The terminology presented in this subsection is formulated for pixels, but translates analogously to scanning boxes. It is presented for the use case of particle detection, but it generalizes to any binary classification problem.

To evaluate the binary map resulting from the predictions for each pixel, one needs an answer key, called a label. Each pixel of the label corresponding to the presence of a particle is called a positive (P), while all others are negatives (N). A correct prediction that a pixel is a positive is called a true positive (TP), whilst an incorrect prediction that a pixel is a positive is called a false positive (FP). True negatives (TN) and false negatives (FN) are defined in the corresponding way.

The true positive rate (TPR), also known as recall, is calculated as TP/(TP+FN), and the false positive rate (FPR) is calculated as FP/(FP+TN). The precision is calculated as TP/(TP+FP). Precision describes the share of positive guesses that are correct, while recall describes how many of the positives are correctly classified. There are several other accepted measures, each describing a certain aspect of the classifier's performance. Summarising different measures into one value may also be done in different ways; the one primarily used when evaluating U-Track is calculated from a so-called ROC curve. A ROC (Receiver Operating Characteristics) curve is constructed by plotting TPR against FPR for different values of some parameter.
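As a minimal sketch (the function name is ours, not from U-Track), the three rates follow directly from the four pixel counts:

```python
def detection_rates(tp, fp, tn, fn):
    """Recall (TPR), false positive rate and precision from pixel counts."""
    tpr = tp / (tp + fn)        # share of true particle pixels that were found
    fpr = fp / (fp + tn)        # share of background pixels wrongly flagged
    precision = tp / (tp + fp)  # share of flagged pixels that are correct
    return tpr, fpr, precision


# e.g. 8 of 10 particle pixels found, 2 false alarms among 90 background pixels
tpr, fpr, precision = detection_rates(tp=8, fp=2, tn=88, fn=2)
```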
The best possible prediction would yield a point on the ROC curve in the upper left corner (0,1), meaning TPR = 100% and FPR = 0%, whilst worse predictions would yield points further away from (0,1). Therefore, the distance to the upper left corner (DULROC)¹ may be used as a measure of the performance of the classifier [7], which is what we use in this project.

¹ There is no widely accepted abbreviation for this measure, so for the simplicity of this report, we chose this one.

2.3 Two dimensional flow nanometry

Two dimensional flow nanometry (2DFN) is a subclass of the larger problem of tracking nanoparticles by illuminating them and recording their fluorescence intensity [4]. The tracking involves detecting particles in each frame and linking corresponding particles over a series of frames, forming particle tracks (x_n)_{n=0}^{N} for a particle tracked over N frames. These tracks are then used for determining the properties of the particles².

2DFN is characterized by tying the particles to a surface through one or more DNA tethers per particle, constraining the movement to two dimensions, and subjecting them to a laminar flow, causing them to move in a deterministic fashion. This setup simplifies tracking and greatly improves intensity determination compared to usual nano-tracking analysis in three dimensions. Even under these conditions, recorded intensities may vary considerably within tracks, but a reasonable range is expected [4].

² For convenience, particles are identified with their positions throughout this report.

2.3.1 Determination of particle radius

Typically, scientific interest lies in statistically determining the radius of the particles. This radius is smaller than the wavelength of the reflected light, and thus the radius seen in the video differs from the actual particle radius. For this purpose, current state-of-the-art techniques make use of the diffusion of the particle [4].

Due to the laminar flow of the surrounding medium, particles will be influenced by flows with different velocities, since the flow velocity of the medium increases with the distance from the surface. Each particle experiences an average velocity v_medium approximately dependent on the radius r of the particle and two constants describing the laminar flow, u_0 and λ. The speed v_medium = |v_medium| is described by

    v_medium = u_0 (r + λ).    (1)

The resulting particle velocity v_f is similarly in the direction of the laminar flow but of a different speed. Perpendicular to this motion, in the direction e_D, it can be shown that the particle only undergoes diffusion, a behaviour showcased in figure 4. The particle speed v_f and the characterizing diffusion coefficient D are estimated for each tracked particle. The particle radius may then be calculated through the diffusion relations of Stokes and Einstein, assuming the friction of the linking DNA to be negligible [4]. Other needed constants are Boltzmann's constant k_B, the temperature T, the viscosity η, and A, representing inhomogeneities in the flow, where the non-physical constants A, u_0 and λ are determined by calibration on known particles. The resulting equation is

    k_B T v_f / D = A η r u_0 (r + λ),    (2)

which can be rearranged by letting a = k_B T v_f / D and b = A η u_0 and picking the physically feasible root, yielding the particle radius

    r = (√(λ² + 4ab) − λ) / (2b).    (3)

If v_medium is approximated as equal for all particles, neglecting the influence of the radii, the particle radius r is given by

    r ≈ (v_f / D) · const,    (4)

where const = k_B T / (A η v_medium). The intensities of the particles are also of interest, since intensity is expected to be correlated with the particle radius, I_rel ∼ r².

(a) Distance difference in the direction of v_f. (b) Distance difference in the direction of e_D.

Figure 4: Distance difference from the first frame for a track of 165 frames.
The particle moves 60 µm in the direction of v_f while staying around a displacement of zero in the direction of e_D, showing the differing movement behaviours.

2.3.2 Covariance based diffusion estimators

The statistical estimations used throughout the rest of this section are based on a paper by Vestergaard, Blainey, and Flyvbjerg, which provides a method for determining the diffusion coefficient D for individual particle tracks through the covariance matrix of position displacements of a track [8]. Furthermore, it provides a method for quality assessment of the tracking through the variance of localisation errors σ² and a test of the assumption that the particles undergo diffusion, on which the estimates are grounded. As opposed to other methods, this method is proven to be unbiased and practically optimal for relatively short time series, making it suitable for this project [8].

Knowing the direction of flow for all particles allows for computation of the perpendicular unit vector e_D in the direction of diffusion. Diffusion displacements (∆x_n)_{n=1}^{N} may then be computed as projections of position differences onto this vector,

    ∆x_n = (x_n − x_{n−1}) · e_D,    (5)

and their covariances as

    ⟨∆x_n ∆x_{n+j}⟩ = 1/(N − j) · Σ_{n=1}^{N−j} ∆x_n ∆x_{n+j}.    (6)

Through a theoretical model of the camera's capture of the particle motion, it is then possible to find closed-form expressions for the covariance matrix of the displacements as a function of D, σ², the time step between two consecutive frames ∆t, and a constant R related to the camera shutter behaviour. Solving for σ² and D yields equations 7 and 8,

    σ̂² = R⟨(∆x_n)²⟩ + (2R − 1)⟨∆x_n ∆x_{n+1}⟩,    (7)

    D̂ = (⟨(∆x_n)²⟩ − 2σ²) / (2(1 − 2R)∆t),    (8)

where an average σ² from all tracks is used in equation 8. Assuming approximately equal tracking errors, this is a more stable estimator for D than using an individually computed σ̂² for each track.
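Equations (3), (7) and (8) can be sketched in a few lines of Python. This is an illustration only: the function names are ours, and the default R = 1/6 (corresponding to full-frame exposure) is an assumption for the sketch, not a value stated above.

```python
import math


def diffusion_estimates(dx, dt, R=1 / 6):
    """Covariance-based estimators (eqs. 7-8) from diffusion displacements dx.
    R = 1/6 assumes full-frame exposure (an assumption for this sketch)."""
    n = len(dx)
    msd = sum(d * d for d in dx) / n                                # <(dx_n)^2>
    cov1 = sum(dx[i] * dx[i + 1] for i in range(n - 1)) / (n - 1)   # <dx_n dx_{n+1}>
    sigma2 = R * msd + (2 * R - 1) * cov1                           # eq. (7)
    D = (msd - 2 * sigma2) / (2 * (1 - 2 * R) * dt)                 # eq. (8)
    return D, sigma2


def particle_radius(vf, D, kB, T, A, eta, u0, lam):
    """Physically feasible root of eq. (2), i.e. eq. (3)."""
    a = kB * T * vf / D
    b = A * eta * u0
    return (math.sqrt(lam ** 2 + 4 * a * b) - lam) / (2 * b)
```

In practice σ² would be averaged over all tracks before being inserted into equation 8, as described above; the sketch computes both per track for brevity.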
This value is expected to be around 2 µm²/s for particles with one DNA link between the particle and the surface, decreasing with more links [4].

2.3.3 Testing of the diffusion hypothesis

Testing whether a track is consistent with free diffusion is also based on the covariances of particle displacements, specifically on the assumption that displacements more than one time step apart should not covary [8]. More precisely, the covariances for large N should be normally distributed around zero, with a variance given by equation 10:

    ⟨∆x_n ∆x_{n+j}⟩ ≈ 0, if j > 1,    (9)

    var(∆x_n ∆x_{n+j}) = (α² + 4αβ + 6β²)/(N − j) − 2β²/(N − j)², if j > 1.    (10)

A more robust test transforms the displacements (∆x_n)_{n=1}^{N} with a discrete sine transform and instead examines the periodogram values P̌_k, as this transformation can be proven to remove statistical dependencies within the type of particle tracks treated here. Statistically independent data ensures that results from several shorter tracks (N ≈ 100) may be averaged over for a certainty equal to that of one long track, assuming equal values of D and σ²:

    ∆x̌_k = ∆t Σ_{n=1}^{N} sin(πkn/(N + 1)) ∆x_n,    (11)

    P̌_k = 2(∆x̌_k)² / [(N + 1)∆t].    (12)

These results may then be compared to the expected periodogram for free diffusion,

    ⟨P̌_k⟩ = 2D(∆t)² + 2[σ²∆t − 2DR(∆t)²](1 − cos(πk/(N + 1))),    (13)

by testing that P̌_k / ⟨P̌_k⟩ should follow a gamma distribution with shape parameter 1/2 and scale parameter 2.

3 Method

The method chapter begins with a subchapter detailing the inner workings of the U-Net, including the attributes of its layers, the loss function and the training routine. Then follows a subchapter on post-processing, elaborating on the extraction of useful information from the output of the U-Net, as well as the details of how information about particles in separate images can be connected into tracks of said particles over a sequence of images. Then, there are three subchapters on the evaluation of different aspects of the package's functionality.
The final subchapter covers the need for a new routine for image generation and the different approaches that were considered.

3.1 Network architecture

The network architecture used is a U-Net, as described in section 2.1. The contracting path contains 4 blocks. Each such block contains 3 convolutional layers with padding, 3 by 3 kernels and ReLU activation functions. These are then followed by a max pooling layer. The initial number of feature channels used in the convolutional layers is 8, and the number is doubled after each max pooling layer. These max pooling layers have a pool size of 2 by 2, which corresponds to halving the image resolution in each block.

The expanding path similarly consists of 4 blocks. The convolutional layers are identical to the ones used in the contracting path. The upsampling layers have size 2 by 2, which corresponds to a doubling of the image resolution. The number of feature channels is halved after each upsampling layer. In the concatenations, half the feature map from the contracting path is dropped to prevent overfitting.

Overfitting is what happens when a model is too specialized on the training data and does not generalize well to data it has never seen before. In this project, the model never trains on the same image twice, so it will not overfit to a certain image. However, the simulated data is often only an approximation of the experimental data. In this case, the network might overfit to the simulated data type. Then, if the experimental data is of a slightly different type, the network's performance might decrease. This effect should mainly be counteracted by simulating the data as realistically as possible, but it can also be reduced by the dropout layer.

After the four blocks in the expanding path, two more convolutional layers of the same type are applied. Finally, a convolutional layer with padding, a 1 by 1 kernel, no activation function and 5 feature channels is applied.
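The resolution and channel bookkeeping of this architecture can be sketched as follows. This is an illustration of the block structure only, not the actual Keras model, and the input size of 256 pixels is an arbitrary example:

```python
def unet_shapes(size=256, channels=8, depth=4):
    """(resolution, feature channels) at each block output: a pool size of 2
    halves the resolution and the channel count doubles in the contracting
    path; the expanding path inverts both."""
    down, up = [], []
    res, ch = size, channels
    for _ in range(depth):
        down.append((res, ch))
        res, ch = res // 2, ch * 2   # max pooling + channel doubling
    for _ in range(depth):
        res, ch = res * 2, ch // 2   # upsampling + channel halving
        up.append((res, ch))
    return down, up
```

With the default arguments this gives a contracting path of 256, 128, 64 and 32 pixels at 8, 16, 32 and 64 channels, and an expanding path mirroring it back to 256 pixels at 8 channels.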
The output from this last layer is the output of the network. The five features of the output are:

1. A measure of the network's certainty that the pixel belongs to a particle. A sigmoid can be applied to this feature, mapping it to the interval (0, 1), which allows it to be interpreted as a probability. Therefore, the first feature of the prediction after the sigmoid is in this report called the probability prediction. The other features are only meaningful for pixels that are part of a particle; the network is therefore designed to only train on those features for pixels that are part of a particle.

2. The pixel's x-distance to the center of the particle.

3. The pixel's y-distance to the center of the particle.

4. The particle's radius.

5. The particle's relative intensity. Simulated particles are generated using a Bessel function, and the relative intensity is the constant with which this Bessel function is multiplied. For Bessel order 1, this is the highest point of the curve.

This approach is proposed with the aim of achieving higher accuracy in particle localization and characterisation.

3.2 Training and loss function

The term training denotes the process of minimizing the loss function for the training data. The loss function is a measure of the discrepancy between label entries y_ij and predictions p_ij, where i is the feature index and j is the pixel index. A common loss function used for image segmentation problems is binary cross-entropy, which measures the performance of classification problems with outputs between 0 and 1 [9]. In particle tracking, data sets are often highly imbalanced, with many more background pixels than pixels belonging to particles. This can lead to the training getting stuck in local minima where the network detects no particles.
To avoid this, true positive detections are valued over true negatives through the weighted binary cross-entropy loss function

    L_1 = − Σ_{j=0}^{N} [ β · y_1j log(p̃_1j) + (1 − y_1j) log(1 − p̃_1j) ],    (14)

where y_1j is the first feature of the label (y_1j = 0 or y_1j = 1), p̃_1j is the first feature of the prediction after the sigmoid, which maps its values to the interval (0,1), β is the weighting parameter, and N is the number of pixels. During development, it was found that a value β = 30 worked well for most data sets. This loss function grows toward infinity when p̃_1j tends to 1 while y_1j = 0, or the other way around, which makes it suitable for binary image segmentation. For the other features, the absolute error is used:

    L_i = Σ_{j=0}^{N} |p_ij − y_ij|, i ∈ {2, 3, 4, 5}.    (15)

Crucially, the loss functions for the last four features are only calculated for pixels that belong to particles according to the label, not the prediction. The final loss function is then calculated as a weighted sum of the loss functions for each feature:

    L = Σ_{i=1}^{5} w_i · L_i.    (16)

The weights are chosen both to prioritize certain features and to normalise the scales of the different parts of the loss function³.

After the loss function is formulated, the exact training method must be chosen. In the training examples of the original DeepTrack package, the training starts with many iterations on small batches, a batch being the set of images used in one iteration of the network's training. For each iteration, a new batch is generated. As the training progresses, the batch size is gradually increased, while the number of iterations for each batch size is decreased. To simplify testing and analysis, U-Track is instead trained with a fixed batch size. It is generally accepted that larger batches lead to more stable training [10]. Therefore, the largest batch size that fits into the GPU's memory is chosen, which in our case is 64.
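A plain-Python sketch of the combined loss (equations 14–16) for one image may clarify how the pieces fit together. The names are ours, and U-Track's actual implementation operates on tensors rather than lists:

```python
import math


def combined_loss(labels, preds, beta=30.0, weights=(1, 1, 1, 1, 1)):
    """labels/preds: five feature lists over pixels; preds[0] is pre-sigmoid."""
    sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))
    L = [0.0] * 5
    # Feature 1: weighted binary cross-entropy (eq. 14)
    for y, p in zip(labels[0], preds[0]):
        pt = sigmoid(p)
        L[0] -= beta * y * math.log(pt) + (1 - y) * math.log(1 - pt)
    # Features 2-5: absolute error on label-positive pixels only (eq. 15)
    for i in range(1, 5):
        L[i] = sum(abs(p - y)
                   for y, p, mask in zip(labels[i], preds[i], labels[0]) if mask)
    # Weighted sum over the five features (eq. 16)
    return sum(w * l for w, l in zip(weights, L))
```

Note the mask in the second loop: it implements the rule that the last four features only contribute loss on pixels the label marks as particle.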
It is then trained for about 8000 iterations, with a new batch being generated for each iteration. The training takes about five hours⁴, compared to around three hours for DeepTrack.

The hardest task for the network is prediction on images with low SNR. A simple solution would be to train only on images with low SNR, but this could lead to overfitting and decreased performance on high-SNR images. To train mainly on images with low SNR, while still including some images with high SNR, the SNRs of the images generated for each training batch are drawn from an exponential distribution.

3.3 Post-processing

Performing meaningful analysis on the network's output requires post-processing in several steps, from identifying particles to sorting them into particle tracks and lastly performing statistical analysis on the resulting data. This process is illustrated by the diagram in figure 5. In addition to producing tractable research data, the post-processing step allows false positives to be sorted out of the network's output.

Figure 5: Data processing chain from input images to statistics, with blue squares representing forms of data and red squares representing processing steps. The post-processing begins with the extraction of particle detection information for each image. Particle detections are then linked into tracks and statistically analysed.

³For example, if the intensity can take values (0.1, 0.3) while the other features take values (1, 3), w₅ ≈ 10 might be chosen. However, if the quality of the intensity prediction is not of interest to the user, w₅ might be lower.
⁴Computer specifications: Processor: Intel Core i7-8750H 2.20 GHz, RAM: 16 GB DDR4 2400 MHz, GPU: NVIDIA GeForce GTX 1060 6 GB.

3.3.1 Extracting particle information from network output

The condensation of the pixel-by-pixel prediction down to information about a set of particles is done through the following process:

1.
The first feature of the prediction is transformed into a probability prediction through application of a sigmoid.
2. The probability prediction is transformed into a binary map, with 1s representing pixels where a particle is predicted and 0s representing absence of a particle. This is done using a threshold called a cutoff: all pixels with values larger than the cutoff are set to 1, and all others to 0.
3. A union-find algorithm groups clusters of 1s in the binary map. Each such group is defined as a particle.
4. The particle center positions, radii and intensities are computed as averages of the remaining four features over the pixels belonging to the particle.

Choosing the cutoff value is of vital importance for the rest of the processing, as it determines the balance between including true particles and letting false positives through. Optimization is used to find the cutoff that maximizes detection⁵, but in practice this choice depends on the attributes of the input data and on the intended use of the output. For example, one might accept more noise if the post-processing is known to be robust against it.

The approach of averaging values over clusters is a somewhat naive method, as the uncertainty of the particle attribute values then depends on the number of pixels per particle. It also does not handle overlapping particles, treating such occurrences as one large particle. Still, overlapping particles were deemed rare in the data used in this project, and should thus have little impact. It is, however, an important consideration for use cases with higher particle density.

3.3.2 The particle tracking algorithm

Tracking particles over time is the next step in the post-processing pipeline.
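Before turning to the tracking itself, the per-image extraction of section 3.3.1 can be sketched as follows. Here scipy.ndimage.label stands in for the union-find grouping, and the function name as well as the sign convention for the predicted center offsets are our own assumptions, not U-Track's API.

```python
import numpy as np
from scipy.ndimage import label

def extract_particles(pred, cutoff=0.5):
    """pred: array of shape (5, H, W) -- the network output for one image.
    Returns a list of (y, x, radius, intensity) tuples, one per detection."""
    prob = 1.0 / (1.0 + np.exp(-pred[0]))   # step 1: sigmoid -> probability map
    binary = prob > cutoff                  # step 2: cutoff -> binary map
    labels, n = label(binary)               # step 3: group connected clusters
    ys, xs = np.mgrid[0:pred.shape[1], 0:pred.shape[2]]
    particles = []
    for k in range(1, n + 1):               # step 4: average over each cluster
        m = labels == k
        # Features 2 and 3 are the pixel's distance to the particle center,
        # so pixel coordinate + predicted offset estimates the center
        # (the sign of the offset is an assumption here).
        cy = np.mean(ys[m] + pred[2, m])
        cx = np.mean(xs[m] + pred[1, m])
        particles.append((cy, cx, np.mean(pred[3, m]), np.mean(pred[4, m])))
    return particles
```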
With little intrinsic information carried over between frames, which are predicted individually and under varying lighting conditions, the tracking algorithm is mostly based on the assumption that particle positions are sparse compared to particle movement between frames. This is usually a good assumption and has been used previously in problems of a similar type [11]. Additionally, if more is known about the expected behaviour of the particles, the algorithm can be further modified to take this into account.

Connections between detected particles identified as the same physical particle in subsequent frames n and n+1 are formed inductively, with the particle detections (x_{n,j})_{j=1}^{J} already part of tracks and the new particle detections (x_{n+1,i})_{i=1}^{I} yet to be assigned to tracks. Note that the number of particles differs from frame to frame and that not all particles should be linked, owing to false positives as well as particles entering and exiting the frame. A cost C_{ij} is assigned to each pair of particles from the two collections based on how well their connection follows the behaviour assumptions, resulting in a cost matrix C. The problem can now be formulated as a linear sum assignment: for each row i, find the corresponding column j such that the total cost \sum c_{ij} of the chosen pairs ij is minimised. This is solvable with the Hungarian algorithm [12].

⁵See section 3.5.2 for more details.

The next step is handling track ends and beginnings, which stem from the varying number of particle detections between frames as well as from unfeasible connections. The Hungarian algorithm is able to handle non-square matrices by leaving some rows and columns without pairings. Beyond that, a stronger condition is applied, where pairings with cost c_{ij} ≥ c_max are discarded as well. Particles in (x_{n,j})_{j=1}^{J} without a good enough connection are then considered ends of tracks. Similarly, unconnected particles in (x_{n+1,i})_{i=1}^{I} are considered beginnings of tracks.
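The linking step described above maps directly onto scipy.optimize.linear_sum_assignment, which solves this assignment problem (for rectangular matrices too). The following is a sketch under our own naming, with the c_max gating described above; the construction of the cost matrix itself is the topic of section 3.3.3.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def link_frames(cost, c_max):
    """cost: (J, I) cost matrix between tracked particles (rows) and new
    detections (columns). Returns (links, track_ends, track_starts)."""
    rows, cols = linear_sum_assignment(cost)      # minimise the total cost
    # Discard pairings that exceed the maximum accepted cost c_max.
    good = cost[rows, cols] < c_max
    links = list(zip(rows[good], cols[good]))
    # Unmatched rows end their tracks; unmatched columns start new tracks.
    track_ends = set(range(cost.shape[0])) - {r for r, _ in links}
    track_starts = set(range(cost.shape[1])) - {c for _, c in links}
    return links, track_ends, track_starts
```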
After this step, very short tracks are filtered out as noise, since false positives typically appear in only a few frames. A choice then has to be made of the minimal number of frames required for a track to be regarded as a tracked particle.

3.3.3 Calculation of the cost matrix

The cost matrix C has values in the range [0, c_max], summed from terms C_k representing assumptions about particle behaviour that can be evaluated by comparing particles x_{t,j} and x_{t+1,i}. These terms should be normalized to the same range as the cost matrix so that they act on a comparable scale. In this project, a term for the distance between particles, C_D, and a term for fluctuations in intensity, C_I, are used, but other factors are conceivable as well. This results in

c_{ij} = \min(c_{D,ij} + c_{I,ij},\; c_{\max}),    (17)

where the minimum ensures that the maximum cost in C is c_max.

As known from theory, the particles are subjected to a laminar flow and are expected to move with a certain velocity v_{f,j}. The distance cost is therefore made more precise by estimating this velocity, predicting the next position x_{t,j} + Δt · v_{f,j} and comparing it to the position x_{t+Δt,i} instead.

A maximum accepted value M for this distance is determined by an initial tracking run with M much greater than the actual maximum. The mean squared displacement (MSD) is then calculated for each track:

\mathrm{MSD} = \frac{1}{\tilde{N} - 1} \sum_{t=1}^{\tilde{N}-1} (x_t - x_{t+1})^2,    (18)

where Ñ is the length of the track, i.e. the number of frames in it. log(N) is then plotted as a function of the MSD, as seen in figure 6, where N is the number of tracks with a given MSD.

Figure 6: Plot of log(N) as a function of MSD, where N is the number of particles. Two curve fits are plotted, one for the tracks with low MSD (particles) and one for the tracks with high MSD (noise). They intersect at MSD = 1.2245 µm², which gives a max distance M = 1.1066 µm.
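Per track, equation (18) amounts to the following NumPy sketch (the (Ñ, 2) array layout for positions is our own assumption):

```python
import numpy as np

def mean_squared_displacement(positions):
    """Mean squared frame-to-frame displacement of one track, eq. (18).
    positions: array of shape (N_tilde, 2) with an (x, y) position per frame."""
    steps = np.diff(positions, axis=0)   # x_{t+1} - x_t for each pair of frames
    return np.sum(steps ** 2) / (len(positions) - 1)
```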
By comparing a linear fit of the values with lower MSD (tracked particles) to a linear fit of the values with higher MSD (noise), the cutoff distance M can be determined by calculating the MSD value at the intersection of the two fitted curves and taking its square root [4]. For the video tracked in figure 6, this value is 1.1066 µm. Normalising the distance difference by M² results in the final distance cost

c_{D,ij} = \begin{cases} \dfrac{c_{\max}}{M^2}\,(x_{t+1,i} - x_{t,j} - \Delta t\, v_{f,j})^2 & \text{if } (x_{t+1,i} - x_{t,j} - \Delta t\, v_{f,j})^2 < M^2, \\ c_{\max} & \text{otherwise.} \end{cases}    (19)

The intensity term represents another way of using C, functioning as a hard filter. Because of their fluctuations, intensities are unsuitable as a measure of likeness, but they can instead be used as a sharp bound. As a rule of thumb, fluctuations of more than 15% of the mean intensity can immediately be filtered out as tracking errors [4]; figure 7 confirms such occurrences in the tracks. Letting I(x) denote the intensity of a particle, the intensity term is

c_{I,ij} = \begin{cases} 0 & \text{if } \left| \dfrac{I(x_{t,j}) - I(x_{t+1,i})}{\sum_{\tilde{t}=0}^{t} I(x_{\tilde{t},j})/N} \right| < 0.15, \\ c_{\max} & \text{otherwise.} \end{cases}    (20)

Figure 7: Histograms of particle intensity deviations from the mean, relative to the mean, before (a) and after (b) the introduction of C_I in the cost matrix, on a logarithmic scale. The few entries to the right in (a) are considered falsely linked outliers, which are removed in (b).

3.4 Evaluation of network performance

The evaluation of U-Track's performance is done on a pixel-by-pixel basis, since this is the most detailed scale at which predictions and labels are available. The following metrics were chosen⁶:

1. To evaluate the network's ability to detect particles (the first feature):
• DULROC.
2. To evaluate U-Track's ability to predict particle attributes (the other features)⁷:
• Absolute error of the particle center position.
• Absolute error of the radius prediction.
• Absolute error of the intensity prediction.
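Returning briefly to section 3.3.3, one way to assemble equations (17), (19) and (20) into a cost matrix is sketched below. All names and the array layout are our own assumptions; the actual implementation may differ.

```python
import numpy as np

def cost_matrix(x_prev, v_prev, I_prev_mean, x_new, I_new, M, c_max, dt=1.0):
    """Sketch of eqs. (17), (19), (20): distance term plus intensity filter.
    x_prev: (J, 2) last positions of existing tracks; v_prev: (J, 2) estimated
    flow velocities; I_prev_mean: (J,) mean track intensities;
    x_new: (I, 2) new detections; I_new: (I,) their intensities."""
    pred = x_prev + dt * v_prev                        # predicted next positions
    d2 = np.sum((x_new[None, :, :] - pred[:, None, :]) ** 2, axis=2)
    c_D = np.where(d2 < M ** 2, c_max * d2 / M ** 2, c_max)     # eq. (19)
    rel = np.abs(I_prev_mean[:, None] - I_new[None, :]) / I_prev_mean[:, None]
    c_I = np.where(rel < 0.15, 0.0, c_max)             # eq. (20), hard filter
    return np.minimum(c_D + c_I, c_max)                # eq. (17)
```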
3.5 Benchmarking against DeepTrack

Evaluating U-Track on a pixel-by-pixel basis leads to complications when comparing against DeepTrack, since DeepTrack does not produce a prediction for each pixel. Therefore, a particle-by-particle approach is used when benchmarking. The translation from pixel-by-pixel predictions to particles is outlined in section 3.3.1. For the purpose of evaluating the ability to predict particle attributes, a predicted particle is deemed a true positive if its center lies within the radius of a true particle, and a false positive otherwise.

For the purpose of evaluating U-Track's ability to detect particles, a scanning-box approach is used to calculate operating characteristics. Each image is broken down into overlapping smaller boxes, roughly the size of a large particle, placed a certain step size apart. Smaller step sizes give more accurate results but longer computation times. Each scanning box then yields operating characteristics based on a comparison between the number of true and the number of predicted particles within the box. Here, it is not required that the predicted particle centers fall within the radius of a true particle; rather, a predicted particle is assumed to be a true positive if there is a true particle within the same scanning box. It is also assumed that the grid of scanning boxes is fine enough to cancel the inaccuracies arising from the first assumption.

For single-particle images, however, the use of operating characteristics is assumed to be ineffective.

⁶All measures are calculated as the mean of each metric over a batch of images, as a function of SNR.
⁷These metrics are only calculated for pixels belonging to particles, as calculating them for the background would be nonsensical.
The reasoning is that since there is only one particle, the number of data points is too small for the operating characteristics to fairly describe the overall performance of the prediction. Therefore, precision is used for single-particle images, with the hypothesis that precision should converge to 1 as the SNR grows. Precision is calculated according to the previous paragraph.

To benchmark U-Track against DeepTrack, the following metrics were chosen⁸:
• Precision (for single-particle images).
• DULROC (for images with multiple particles).
• Absolute error of particle center positions.
• Calculation time per prediction.

3.5.1 Training

A new method for training the DeepTrack network was needed, as we found no ready way to train a network that could handle both multiple-particle images and images containing no particles. To simulate this without changing the DeepTrack package, the particle position is chosen randomly from an interval larger than the image size. To the best of our knowledge, this was also the approach of the original DeepTrack paper. The network is then trained according to the default training settings found in the DeepTrack repository. If the network has not converged by the end of the training, it is trained further on the largest batch size. All in all, the training takes about three hours, which is slightly shorter than the training of U-Track.

3.5.2 Optimal parameters

Since the performance of both U-Track and DeepTrack depends on the parameters fed to their post-processing methods, it is essential to optimize those parameters to obtain an accurate measure of the performance of the packages. The optimization is implemented mainly using the Nelder–Mead algorithm [13], with DULROC chosen as the function to be minimized⁹.
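Such a cutoff search might look as follows, with a toy quadratic standing in for the expensive DULROC evaluation (which in reality runs the detection pipeline on a batch of images), and with the Nelder–Mead guess followed by a bounded L-BFGS-B refinement as described in footnote 9. All names here are ours.

```python
import numpy as np
from scipy.optimize import minimize

def dulroc_proxy(cutoff):
    """Hypothetical stand-in for evaluating DULROC at a given cutoff."""
    return (cutoff - 0.35) ** 2 + 0.1    # toy objective with minimum at 0.35

# Nelder-Mead provides an initial guess ...
guess = minimize(dulroc_proxy, x0=[0.5], method="Nelder-Mead").x
# ... which is replaced by a preset cutoff if it leaves the allowed interval ...
if not 0.0 < guess[0] < 1.0:
    guess = np.array([0.5])
# ... and then refined by a bounded L-BFGS-B optimization.
best = minimize(dulroc_proxy, x0=guess, method="L-BFGS-B",
                bounds=[(0.0, 1.0)]).x[0]
```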
The methods for calculating optimal parameters are quite slow, taking roughly one hour both for optimizing U-Track's cutoff using 400 images and for optimizing DeepTrack's parameters using 40 images on a mid-range laptop¹⁰. These computation times make it unfeasible to use a larger number of images per optimization run. This carries a risk: with a small batch size, the optimization might simply not find the true optimum.

⁸All measures are calculated as the mean of each metric over a batch of images, as a function of SNR.
⁹For optimizing the cutoff used in U-Track, Nelder–Mead is used to generate an initial guess. A check is then made that the guess lies within the allowed interval; if it does not, the initial guess is set to a predetermined cutoff. In both cases, the initial guess is fed to a bounded optimization using the L-BFGS-B algorithm.
¹⁰Computer specifications: Processor: Intel Core i7-9750H 2.60 GHz, RAM: 8 GB DDR4 2400 MHz, GPU: NVIDIA GeForce GTX 1050 3 GB.

The optimization methods are evaluated by plotting the DULROC for parameter values around the ones suggested by the optimization. For U-Track, only the cutoff is optimized, so these graphs give a relatively detailed picture of the optimality of the calculated cutoff. For DeepTrack, however, two parameters are optimized, so the analysis is more complicated. The method used here to investigate optimality is to vary one parameter at a time, keeping the other constant. This amounts to varying the parameters in only two distinct directions of the plane their possible values span, which is not enough to prove optimality, but serves as a rough check. Using these "tests" of convergence, the optimization methods were deemed functional when using the numbers of images mentioned above.
It should be kept in mind, though, that the values suggested by the methods are not proven to be optimal.

3.6 Evaluation of particle tracks on experimental data

The network and post-processing are also tested by tracking on experimental data. The evaluation of this is made difficult by the absence of ground-truth information. This chapter explains how this is handled, through qualitative comparison with a previously used tracking algorithm and through reasoning about consistency with theoretical results.

3.6.1 Comparison with a digital filter based approach

An alternative tracking script, not based on machine learning, was generously provided by the authors of [11] for a rough comparison of tracking abilities. The script differs in most steps of the process:
• Input data is processed through digital filters to extract bright particles on a black background, as exemplified in figure 8.

Figure 8: An example image processed through the benchmarking script, visualising the effect of the digital filtering.

• Particle centers are computed by a cutting gradient algorithm [14].
• Tracking is done with an algorithm similar to the one developed in section 3.3.2, although links may be formed across several intermediate frames, and short tracks are discarded.
• Intensity is calculated as the sum of pixel values in the image, which differs from the intensity value predicted by U-Track.

The script is mostly suited to data with high SNR and should not be interpreted as the best possible alternative, especially as its results might be improved through parameter tweaking. To compare track lengths fairly, they are compared both before and after splitting benchmark tracks that contain gaps.

3.6.2 Bias of intensity and pixel size

One desirable property of the tracking is equal detection performance regardless of particle intensity and size. The absence of this property may skew particle distributions toward particles of high intensity and large size.
The method chosen for evaluating potential bias is to study how the track averages of particle intensity and size vary with the number of frames in the track. A mean increasing with track length would imply that particles of higher intensity and larger size are more easily tracked. This conclusion rests on the assumption that, in reality, most particle tracks should be of roughly the same length. The true distribution of intensity and size is unknown, ruling out a direct comparison between the detected particles and the expected distribution.

3.6.3 Consistency with the diffusion hypothesis

The diffusion hypothesis is used for evaluation in two different ways. Firstly, coming from a real use case, it must be confirmed whether the particles really are subject to diffusion. It is then important that U-Track is able to produce data that converges to the results expected from the periodogram theory developed in section 2.3.3. Comparison between the expected and measured periodograms is done quantitatively through Pearson's chi-squared test, by binning the computed values of P̌_k, counting the quantity O_i in each bin and comparing it to the expected number E_i by calculating

\chi^2 = \sum_{i=1}^{N} \frac{(O_i - E_i)^2}{E_i}.    (21)

The null hypothesis that the two distributions agree is rejected at significance α if this value exceeds the α quantile of the χ² distribution with N − 3 degrees of freedom; otherwise nothing can be concluded from a single test [8].

The next step is to use the test of diffusion as a metric of individual track quality, that is, of the closeness to a true particle track. The idea is that even though short individual tracks do not provide enough data to confirm diffusion, true tracks should at least not break the expected behaviour. Covariances with large deviations from zero then imply a track that does not behave as diffusion, indicating that the track is faulty.
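The goodness-of-fit test of equation (21), with its N − 3 degrees of freedom, might be sketched as follows (the function name is ours; scipy.stats.chi2 provides the reference distribution):

```python
import numpy as np
from scipy.stats import chi2

def diffusion_chi2_test(observed, expected, alpha=0.05):
    """Pearson's chi-squared test of eq. (21) with N - 3 degrees of freedom.
    observed, expected: bin counts O_i and E_i. Returns (statistic, reject)."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    stat = np.sum((observed - expected) ** 2 / expected)
    # Reject the null hypothesis at significance alpha if the statistic
    # exceeds the corresponding quantile of the chi-squared distribution.
    critical = chi2.ppf(1.0 - alpha, df=len(observed) - 3)
    return stat, stat > critical
```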
Similarly to the first method, the likeness of the distribution of covariances to a normal distribution around zero, with variance given by theory, is quantified through a statistical test, this time Pearson's test of variance,

\chi^2 = \sum_{i=1}^{n} \frac{(x_i - E[x_i])^2}{\mathrm{var}(x_i)}.    (22)

3.7 Image generation

Training the network requires a large quantity of simulated images, and for DeepTrack this generation of images is often the single most computation-heavy part of training the network.

DeepTrack's image generation routine consists of looping over the parameters for each particle and evaluating an adaptation of the Bessel function for each element of the matrix representing the image to be generated. The Bessel function can be of different orders, in this report called Bessel orders, but here order 1 is primarily used, which simulates particles with maximum intensity at the particle center and an absolute intensity converging to 0 with distance from the center. Order 2 of the Bessel function could, however, also be useful, as it simulates a dark particle with an intense outer ring.

To speed up the image generation, three different approaches were considered:
• Near the center of the particle, the adaptation of the Bessel function of order 1 has a shape similar to a Gaussian function, as visualized in figure 9, and since the Gaussian has a less complicated formula, it might be cheaper to evaluate. Therefore, since Bessel order 1 is the most relevant, it could be worth removing the ability to use other Bessel orders in exchange for the (potentially) cheaper Gaussian.

Figure 9: Comparison between a Gaussian function and an adaptation of the Bessel function of order 1.
• Because the values of the adaptation of the Bessel function converge to zero relatively quickly with distance from the particle center, the impact on image quality of only evaluating the Bessel function for elements relatively close to the particle center might be negligible. Calculation time might therefore be cut by limiting the number of evaluations in this way.
• The values of the adaptation of the Bessel function are calculated for each element of the image matrix. Since these calculations are many but individually cheap, the process could be sped up by using the GPU instead of the CPU, as GPUs are generally faster at performing large quantities of small calculations.

4 Results

The structure of the results chapter corresponds to the last four subchapters of the method chapter. The subchapters generally consist of a short summary of the results, followed by graphs visualizing them with short descriptions.

4.1 Evaluation of network performance

The evaluation of U-Track's performance starts with three measures, shown in figures 11 and 12, that showcase the network's ability to detect particles. All three measures tend toward satisfactory values. The evaluation goes on to show the network's ability to predict particle attributes. The network tends to yield predictions with small but significant mean errors, as shown in figures 13 and 14. For each SNR, the network is evaluated on batches of 1000 images with the following parameters:

Parameter               Value
Image size              256 by 256 px
Particle number         10-20
Particle radii          2-8 px
Particle intensities    0.1-0.3
Bessel orders           1 or 2
Gradient intensities    0.1-0.3
Background intensities  0.0-0.4
Noise                   Poisson

Table 1: The image parameters and their respective values used when evaluating U-Track's performance.

The weights of the loss function are mainly chosen through testing.
To begin with, w_1 = 30 is chosen, as lower values often lead to the network not predicting any particles, while higher values sometimes cause too many predictions. The other weights are chosen so that their respective losses typically take values in the same interval, giving w_2 = 1, w_3 = 1, w_4 = 1, w_5 = 5. Intuitively, w_5 = 10 might have been chosen instead, but at the time of the choice the intensity prediction was deemed of lower priority than the predictions of positions and radii.

Example images at chosen SNRs are shown in figure 10 below. Thereafter, the results of the network evaluation outlined in section 3.4 are shown.

Figure 10: Examples of simulated images on which the network is trained, evaluated and benchmarked against DeepTrack, at SNRs (a) 1, (b) 2, (c) 5 and (d) 10. The red circles denote particles in the image; the radius of each red circle equals the radius of the particle.

Figure 11: Pixel-by-pixel DULROC for U-Track as a function of SNR. The dotted line is the minimal DULROC expected from random predictions, and 1 is the maximal DULROC expected from random predictions¹¹.

Figure 12: (a) Precision of U-Track as a function of SNR. (b) Recall of U-Track as a function of SNR. Both precision and recall increase strongly until they reach a maximum at around SNR = 10, after which they stay roughly constant.

Figure 13: MAE of particle center predictions for U-Track as a function of SNR, calculated on a pixel-by-pixel basis for pixels that belong to a particle in the label. A pattern similar to the one in figure 12 is seen.

¹¹Random predictions are expected to fall on the diagonal (0, 0) → (1, 1) of the ROC curve.

Figure 14: (a) MAE of radii for U-Track as a function of SNR. (b) MAE of intensities for U-Track as a function of SNR. A downward trend is seen until around SNR = 10.
Interestingly, a slight upward trend seems to exist at higher SNRs in the prediction of the radii.

Figures 11 and 12 show that, as expected, U-Track's performance is poor at SNR = 1 (although still significantly better than a random guess). It then improves greatly up until around SNR = 10, after which it stays roughly constant. The fluctuations in performance at larger SNRs are thought to be a consequence of suboptimal parameters, as elaborated in section 5.1.

The same trend can be seen in figures 13 and 14, with performance improving rapidly until SNR = 10 and then staying roughly constant. Interestingly, while the fluctuations are small, U-Track's performance seems to decrease slightly for SNRs over 12, which is especially visible in the prediction of radii. This could be a sign of overfitting, as the network is trained on images with a mean SNR of 10.

4.2 Benchmarking against DeepTrack

The benchmarking against DeepTrack consists of two main parts: one using only images containing a single particle and one using images containing multiple particles.

For single-particle images, U-Track consistently achieves a lower MAE for predicting the position of particles, as shown in figure 15. The results for precision and calculation time, shown in figure 16, are more varied, with the difference between the packages' performance growing from small at low SNRs to considerably in U-Track's favor at large SNRs.

For multi-particle images, the results regarding detection and prediction of position are similar to those for single-particle images, as seen in figures 17 and 18. As for single-particle images, U-Track is much faster for large images, whereas DeepTrack is faster for small images, as seen in figure 19.

4.2.1 Single particle

Just as in section 4.1, for each SNR the networks were tested on batches of 1000 images. The image parameters used are shown in table 2.
The results of the benchmarking are shown in figures 15 and 16.

Parameter               Value
Image size              51 by 51 px
Particle number         1
Particle radii          2-8 px
Particle intensities    0.1-0.3
Bessel orders           1 or 2
Gradient intensities    0.1-0.3
Background intensities  0.0-0.4
Noise                   Poisson

Table 2: The image parameters and their respective values used when benchmarking U-Track against DeepTrack on single-particle images.

Figure 15: MAE of particle center predictions for U-Track and DeepTrack as a function of SNR for single-particle images. The dotted line is the expected MAE for random predictions, as only predictions within a particle are counted.

Figure 16: (a) Precision as a function of SNR and (b) calculation time per image as a function of image size, for U-Track and DeepTrack. The precision converges to almost 1 for DeepTrack, but not for U-Track. U-Track is faster for small images, but not for large ones.

As seen in figure 15, U-Track outperforms DeepTrack at all SNRs and converges to a level roughly one fifth of that to which DeepTrack converges. For the precision measured in figure 16, U-Track performs better than DeepTrack at very low SNR, but then converges to a level roughly 20% worse than DeepTrack's. This is a symptom of U-Track being more prone to predicting too many particles, which DeepTrack cannot do. Figure 16 also shows that DeepTrack is slightly faster for smaller images and becomes much faster for larger images.

4.2.2 Multiple particles

Unlike for single-particle images, the batch sizes used for multi-particle images are only 100. This decrease is made to lower the overall computation time.
The image parameters used were:

Parameter               Value
Image size              256 by 256 px
Particle number         10-20
Particle radii          2-8 px
Particle intensities    0.1-0.3
Bessel orders           1 or 2
Gradient intensities    0.1-0.3
Background intensities  0.0-0.4
Noise                   Poisson

Table 3: The image parameters and their respective values used when benchmarking U-Track against DeepTrack on multiple-particle images.

The results of the benchmarking are shown in the following figures.

Figure 17: (a) DULROC of U-Track and DeepTrack as functions of SNR for multiple-particle images. The black dotted line is the minimal DULROC expected from random predictions, see figure 11. (b) Particle center position MAE for U-Track and DeepTrack as functions of SNR for multi-particle images.

Figure 18: (a) Precision and (b) recall for U-Track and DeepTrack as functions of SNR for multiple-particle images.

Figure 19: (a) Calculation time per image for U-Track and DeepTrack as functions of image size for multiple-particle images. (b) shows the logarithm of the results in (a), giving a more nuanced picture of the packages' efficiency relative to each other rather than in absolute terms.

Figure 17 shows that for both mean DULROC and particle center position MAE, U-Track performs better than DeepTrack at practically all SNRs. In particular, the DULROC of U-Track's predictions converges to roughly a sixth of DeepTrack's, and the positional error of U-Track's predictions converges to roughly a fourth of that obtained by DeepTrack.

As a more nuanced complement to the DULROC, the graphs of precision and recall in figure 18 show that U-Track generally outperforms DeepTrack not only in the fraction of predictions that are correct but also in the number of particles that are detected.

In figure 19, the calculation time for DeepTrack exceeds U-Track's for images larger than around 80 by 80 px.
The difference for small images is small per image, but when running predictions on a vast number of small images it might be non-negligible.

4.3 Tracking on experimental data

The analysis of the tracking on experimental data is split into three parts. In the first section, the results of calculations of scientific interest are examined and compared to expected results. This is followed by the distribution of track lengths, together with an overview of possible biases toward tracking particles of certain intensities, and in the last section the quality of the tracks is measured through their agreement with the diffusion hypothesis. Where relevant, results are compared to a digital filtering based algorithm as a reference.

4.3.1 Tracking results

For the results presented in this section, a total of 10 000 frames were tracked, and only tracks containing 20 or more frames were included in the analysis. The diffusion of the tracked particles is estimated with equation 8 and plotted in figure 20a. The median and mean diffusion are 1.61 µm²/s and 1.64 µm²/s, respectively. The particle radius is calculated approximately with equation 4 and plotted in figure 21a. The relative intensity distribution is shown in figure 21b.

The tracked particles are expected to follow a constant flow direction. This direction is calculated from the tracked particles, and each particle velocity is compared to this flow, as shown in figure 20b. The median particle direction differs by 10.2° from the laminar flow direction.

Figure 20: (a) Histogram of the estimated diffusion for tracked particles of length at least 20 frames; the median value is 1.61 µm²/s. (b) The directions of particle velocities compared to the total flow direction of the tracked video; the angular difference is plotted, with median value 10.2°.

Statistical tests of the diffusion hypothesis can be seen in figure 22, performed on the longest continuous track available.
Some similarity to the theoretical values is seen, but ultimately not enough to say anything for certain, with a p-value of 0.5 from Pearson's goodness-of-fit test. The number of data points was also much smaller than the 1000 recommended in the original article, as no other tracks with the same diffusion coefficient and σ were found to create an average from.

Figure 21: (a) shows the estimated radii of tracked particles. The radius is calculated approximately with equation 4 and is given up to a constant. In figure (b), the relative intensity of tracked particles is plotted.

Figure 22: Two visualisations of transformed displacements for a track of length 356 compared to expected results for diffusion (in black): (a) histogram of P̌k relative to the expected Pk, (b) sampled periodogram of P̌k. The shape of the data makes it difficult to draw definite conclusions on whether the track describes diffusion.

4.3.2 Track length

In absolute numbers, U-Track finds 1029 tracks above length 20, compared to the 405 found by the benchmark script. When tracks with gaps from the benchmark are split, the number of such tracks falls to 306. The distributions of track lengths can be seen in figure 23, from which it is clear that U-Track also manages to find longer tracks.

Figure 23: Track length distributions comparing U-Track to the benchmark script, for (a) tracks without gaps and (b) tracks with gaps. Note that U-Track finds more tracks at all lengths.

From figure 24, there are several interesting observations to be made. Most striking is the clear resemblance between the graphs of average intensity and average number of pixels per particle. Calculating the correlation between the two values for individual tracks yields a coefficient of 0.67, generally considered a strong correlation, which confirms this likeness.
Both graphs also follow a clear pattern of two different behaviours: a sharply increasing mean up to a breaking point around twenty frames in, beyond which only a slight positive slope may be distinguished. This slope does not indicate any significant bias. It should finally be pointed out that tracks below twenty frames have significantly more outliers, including ones of high intensity and pixel size, which is not shown in the graph.

Figure 24: The filled area indicates a 95% confidence interval, and the darker line indicates the mean value. In (a) the average intensity is shown for different track lengths, where short tracks tend to have smaller intensities. In (b), the y-axis instead shows the pixel size of the first feature. The plots are close to identical, which is surprising.

4.3.3 Position accuracy

The estimated position error σ is mostly distributed around 0.1 µm, decreasing noticeably between track lengths of 0 to 100 frames, as shown in figure 25. This result is comparable to, or slightly better than, the benchmark script, which does not have as many long tracks to analyse. Pearson's test of variance at a 99% confidence level keeps the null hypothesis for 98.9% of all tracks, which is very close to the expected result of 99% if the values were drawn from an actual normal distribution. For the benchmark script, the corresponding value was 99.4%, and as a comparison, a test of diffusion in the non-diffusing direction vf rejects the null hypothesis in 50% of cases.

Figure 25: The plots visualize the estimated localization error σ for tracks. In (a) the distribution of σ is shown, where count is the number of tracks in each interval, with mean 0.1 µm. (b) shows the correlation between track length and average σ, where the filled area is a 95% confidence interval. Longer tracks tend to have a smaller position error σ.
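The test of variance above is, in essence, a chi-square test of whether the sample variance of the displacements matches the value expected under the diffusion hypothesis. A sketch under that assumption follows; the exact statistic used in the thesis may differ, and the names are illustrative:

```python
import numpy as np
from scipy import stats

def variance_chi2_test(dx, sigma0_sq):
    """Two-sided chi-square test of H0: Var(dx) = sigma0_sq.
    Under H0 with normally distributed dx, the statistic
    (n - 1) s^2 / sigma0_sq follows a chi2(n - 1) distribution."""
    dx = np.asarray(dx, dtype=float)
    n = dx.size
    stat = (n - 1) * np.var(dx, ddof=1) / sigma0_sq
    p_low = stats.chi2.cdf(stat, df=n - 1)
    return stat, 2 * min(p_low, 1 - p_low)
```

At a 99% confidence level the null hypothesis is kept when the p-value exceeds 0.01; applied to many tracks, about 99% should be kept if the displacements really are normal with the expected variance.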
4.4 Image generation routines

Three improved algorithms for image generation were developed:

• The original DeepTrack algorithm, but with the Bessel function only being calculated for elements near particle centers, defined as elements within a distance of three particle radii in either the x- or y-direction. This distance threshold from the particle center is called a cutoff, not to be confused with the cutoff used when extracting particle information, introduced in chapter 3.3.1.

• The same as above, but calculating the Gauss function instead of the Bessel function.

• Calculating the Gauss function using the GPU. This is done without a cutoff, since the bottleneck of this approach is the amount of data being sent to the GPU, and using a cutoff would mean either sending a larger number of matrices to the GPU or a larger number of array accesses.

The image parameters used were:

Parameter               Value
Image size              128 by 128 px (when not varied)
Particle number         1-6 (when not varied)
Particle radii          1-3 px
Particle intensities    0.5-0.7
Bessel orders           1 (if used)
Gradient intensities    0.25-0.75
Background intensities  0.3-0.5
Noise                   Poisson, SNR 2-10

Table 4: The image parameters and their respective values used when measuring the speed of the different image generation routines. These values were chosen because they more closely resemble the ones used by DeepTrack than the ones used in the rest of this report.

The calculation times per image for the different algorithms were measured as below12:

Figure 26: Calculation times as a function of image size. (a) All the new image generation routines are included, and the graph displays that the old routine is the slowest. (b) The same figure as (a) but with the plot of the original routine excluded. The graphs suggest that the routines implementing a cutoff are the most time efficient.

Figure 27: Calculation times as a function of the number of particles. (a) All the new image generation routines are included, and the graph displays that the old routine is the slowest. (b) The same figure as (a) but with the plot of the original routine excluded. The graphs suggest that the routines implementing a cutoff are the most time efficient.

12Computer specifications: Processor: Intel Core i7-9750H 2.60 GHz, RAM: 8 GB DDR4 2400 MHz, GPU: NVIDIA GeForce GTX 1050 3 GB.

It is clear that all new routines outperform the original one and that the routines implementing a cutoff on the distance from the particle center are the most efficient.

5 Discussion

The discussion chapter contains the same subchapters as the results chapter and the last part of the method chapter. This is because the considerations elaborated on here generally arise from the question "If a different choice had been made during the construction of the method, how would the results be affected?". The matters handled in this chapter are thus often based on details from the method or theory chapters, but are approached from a point of view determined by the results chapter.

5.1 Evaluation of network performance

The choice of evaluating the performance of position, radius and intensity prediction without taking into account whether U-Track actually predicts that the pixel belongs to a particle can lead to misleading results. For example, assume that the probability of a pixel being a true positive decreases with the distance from the center of the particle while the radius prediction performance increases. Then U-Track's radius prediction might perform significantly worse when applied to real data compared to our evaluation.

Another cause for concern is our definition of the SNR, which we inherited directly from DeepTrack. The SNR is defined as

    SNR = √µ / σ,    (23)

where µ is the mean of the pixel intensity and σ is its standard deviation. In image processing, the SNR is usually defined as µ/σ [15]. This makes the evaluation difficult to compare to other projects.
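Since equation 23 uses √µ in the numerator while the conventional definition uses µ, the two scales differ by a factor √µ. A quick numerical check (µ ≈ 0.2 is the mean intensity of our evaluation images; the function name is illustrative):

```python
import math

def snr_conventional(snr_thesis, mu):
    """Convert the thesis SNR (eq. 23: sqrt(mu)/sigma) to the conventional
    image-processing SNR (mu/sigma): mu/sigma = (sqrt(mu)/sigma) * sqrt(mu)."""
    return snr_thesis * math.sqrt(mu)

# The "breaking point" SNR = 10 at mu = 0.2 corresponds to about 4.5
# on the conventional scale.
```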
Example images for chosen SNRs are shown in figure 10, but this only serves as a way to get a feeling for the network's performance and cannot be used for a quantitative analysis. A simple conversion would be to multiply our SNR by √µ. However, because of the gradient and particle intensities, this is a highly approximate method. Our evaluation has a mean µ ≈ 0.2, so our "breaking point" around SNR = 10 would then be converted to SNRconverted ≈ 5.

As mentioned in chapter 3.5.2, the performance of both U-Track and DeepTrack depends greatly on the choice of their parameters, and the optimization methods used for said parameters are not proven to always reach an optimum. Both the dependence and the suboptimality likely manifest themselves in the results of the evaluation of network performance as well as in the results of the benchmarking against DeepTrack. More precisely, the fluctuations in figures 11 and 12 could be the result of the optimization method failing to converge towards the optimum. An argument for this hypothesis is that the data points in general follow a relatively clear pattern, while these fluctuations appear to be somewhat erratic and inconsistent.

5.2 Benchmarking against DeepTrack

There were several choices made when designing the benchmarking process. To eliminate subjectivity in the decision making process, some general guidelines were formulated:

• Center the benchmarking around images containing multiple particles. This is the image type that the project focuses on, but it is not the primary focus of DeepTrack.

• Do not modify the DeepTrack package in any way when using it, as it is highly subjective where the limit of reasonable modification lies.

• Use the methods as described in the original DeepTrack article or the examples provided in the DeepTrack git repository. This is both to make our tests as optimal as possible, and to save effort on designing the testing methods for DeepTrack.
• When possible, design the tests around the way DeepTrack outputs data.

These guidelines should be kept in mind when analyzing the results presented in this report, and especially when comparing them to the ones presented in the original DeepTrack paper. At first glance, it might seem that U-Track performs better in all cases of particle tracking, even on single particle images. This is not the correct conclusion, as the single particle images used in this project are different from the ones used in the original DeepTrack paper. In the original paper the particles are always positioned near the center of the images, while in our project the particle positions are uniformly distributed over the images. This difference naturally results in a worse performance for DeepTrack, which is not primarily designed for these types of images. It should be noted that DeepTrack might perform better than U-Track (and be faster) on the image type that it was originally intended for.

As stated earlier, the primary goal of this project is tracking on multiple particle images, so it is not obvious that benchmarking on single particle images is needed at all. However, this part of the testing allows a more direct comparison between the networks, as it does not involve DeepTrack's parameters. In the benchmarking on multiple particles, those parameters cannot be eliminated. The tests were made as objective as possible by algorithmically optimizing DeepTrack's parameters. However, none of these parameters are proven to be optimal, so it is possible that DeepTrack's performance could be improved by further tweaking. Furthermore, a slight modification13 of the DeepTrack package was tried, and this showed a clear improvement in performance.

13The current training routine was slightly modified: instead of simulating images without particles by placing the particles outside of the image's borders, we simply did not place any particles at all. The network was then trained to predict a large r and (x, y) = (0, 0) on empty images. This led to significantly better results than the ones presented here, but still worse than U-Track's results.

However, significant changes to the DeepTrack package would most likely have to be made to improve the performance to the point where it would be comparable to U-Track's on multiple particle images. The parameters of DeepTrack discussed up until this point are only the ones used in the post-processing. Before that, there are a number of other values that have to be decided. They include the scanning box size and step, as well as the number of layers in the network and their dimensions. This means that an objective comparison between packages is hard to make, as tweaking of a parameter might lead to better results. However, the selection of parameters was made according to the examples provided in the git repository, which suggests that they should be approximately optimal.

As the two packages do not have the same direct output, the methods for translating their respective outputs to a common form might be prone to bias, as such methods might ignore nuances in the original output or infer new nuances not previously present. The methods used in this project, perhaps most significantly the routine outlined in section 3.3.1 and the scanning box method presented in section 3.5, aim to angle said bias in favour of DeepTrack by mostly being applied to the output from U-Track (in the case of clustering pixels) and by being structured in a way similar to the output from DeepTrack (in the case of the scanning boxes from section 3.5).

It should also be noted that an important goal for this project was improving the network's computational efficiency for multiple particle images. Regardless of parameter choice or slight package modifications this has been achieved, as showcased in figure 19, with U-Track being around 100 times faster for images larger than 100 by 100 px.

5.3 Tracking on experimental data

The discussion of tracking first relates to the methods and results on diffusion, intensity, track lengths and bias, followed by details on specific parts of the pipeline and possible improvements.

5.3.1 Plausibility of results

The measured median diffusion value of 1.61 µm2/s is within the interval expected from the experiment, since some particles can be tethered by more than one DNA, see section 2.3.2. Thus the spread in measured diffusion values shown in figure 20a is due to different numbers of DNA tethers per particle, but is also influenced by statistical fluctuations of shorter tracks and position errors in the network output.

Diffusion was estimated in the direction perpendicular to the direction of the total flow vmedium, as opposed to the individual flow vf for each particle. This decision was made in line with the theory, where vD should be in the same direction for all particles. This method is also more stable against outliers, since it is less dependent on the randomness of the motion of individual particles. However, as seen in figure 20b, this is not the case for all tracks. Some trajectories differ significantly from the flow direction, which may suggest either tracking errors or errors in the input data from sources such as an uneven surface during the microscopy, on which particles may get stuck.

The radii have been estimated by the approximation seen in equation 4. If the parameters in equation 3 were known, the radius could have been estimated more accurately. The distribution seen in the histogram in figure 21a suggests that some radii of tracked particles differ by a factor of 2 or greater. Since Irel ∼ r2, this would imply that a difference by a factor of 4 or greater should be seen in the relative intensity distribution, figure 21b. This was not observed: with the exception of one outlier, all tracked particles had relative intensities differing by less than a factor of 1.2. This leads to the suspicion that the network's predictions of intensities are not accurate. The most likely reason is that the weight corresponding to intensity, w5 = 5, chosen in the training is too low. An analysis of the intervals within which the attributes of the particles in the experimental data fall reinforces this suspicion, as the distances and radii are usually smaller than 3 pixels, while the particle intensities are usually smaller than 0.15. Therefore, a weight of about w5 = 20 should have been chosen in the training. This would likely have led to better results. Since the quality of the intensity prediction was deemed inadequate, the correlation between the radius and intensity of tracked particles was not investigated.

The result of the plots of bias, figure 24, can be interpreted in two different ways. Either false positive detections are more likely to have few predicted pixels of low intensity and are correctly filtered out, or real particles of this type are less likely to be tracked over long times, which would constitute a bias. A clear statistical distinction between the two interpretations is difficult to draw from the data without looking at tracks individually, especially when the predicted intensity might not be correct. What can be said is that the bias does at least not render the network unusable, but as the network is bound to perform better at higher SNRs, some performance increase for particles with stronger signals will always be present. The sharp change in trend around twenty frames does support the choice of filtering out tracks below this length, as tracks behave differently beyond it.

5.3.2 The cost matrix

Designing the cost matrix has a large impact on the final tracking results, while also being one of the more subjective parts of the processing.
Two guiding principles were therefore formed during the development to provide a structure for approaching this problem.

Firstly, cost terms should describe consistent physical attributes of the tracks rather than sought-after results. Particles were known to move persistently in one direction, be relatively sparse and vary in intensity by a certain amount, which was modeled through CD and CI. Going into more detail, the maximum allowed distance M used in the term CD was decided through an unbiased statistical method based on MSD calculation, finding the threshold that includes feasible particle displacements ∆x while rejecting noise. As a counterexample, it was found that tracks tended to include particles with a similar number of pixels. This information was not included in the cost matrix, as that would reinforce the tendency further even though its existence cannot be proven in the initial data. In reality the separation between these opposites is not as clear cut. For example, there is likely an overlap between the distances to noise and the distances to true track continuations, which decreases the precision of the unbiased MSD method. A certain level of fine tuning after preliminary results might therefore be inescapable to get the best possible final results.

Secondly, weighing several continuous cost terms against each other was decided against, as the results quickly become complex and unpredictable. This is the main reason why the intensity difference was used for filtering extreme cases rather than for assessing the likeness of particles. Such terms are easier to combine but should still be used with caution, as extreme behaviour might be overlooked with severe filtering.

A final important detail of the cost matrix is the capping of costs at cmax, seen in equation 17. This is enforced to make all unfeasible connections equally costly for the algorithm, guaranteeing that the least costly solution contains the smallest number of unfeasible connections.
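One linking step with capped costs can be sketched as follows, using SciPy's Hungarian solver. The quadratic distance cost and the names (`max_dist`, `c_max`) are illustrative, not U-Track's exact cost terms:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def link_frames(prev, curr, max_dist, c_max=1e6):
    """Assign detections in frame n+1 (`curr`) to detections in frame n
    (`prev`) by minimizing total cost. Costs beyond the feasibility
    threshold are capped at c_max, so all unfeasible connections are
    equally costly and the optimal solution contains as few as possible."""
    prev, curr = np.asarray(prev, float), np.asarray(curr, float)
    dist = np.linalg.norm(prev[:, None, :] - curr[None, :, :], axis=2)
    cost = np.where(dist <= max_dist, dist ** 2, c_max)  # illustrative C_D term
    rows, cols = linear_sum_assignment(cost)
    # Capped (unfeasible) assignments are discarded afterwards.
    return [(int(r), int(c)) for r, c in zip(rows, cols) if cost[r, c] < c_max]
```

Detections left unlinked, either unassigned or linked only at the capped cost, would then seed new tracks in the next step.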
If costs were not capped in this way, plausible connections could be sacrificed in favour of several less costly unfeasible connections, which is not the intended behaviour.

5.3.3 Track length and quality

Neither of the proposed quantifications of track quality suggested that the inclusion of false predictions was a problem, with a consistently small σ and displacement covariances scattered around zero. The quantifications also improved from tracks regarded as noise to long tracks, which reinforces them as suitable measurements. Critically, both methods are based on small displacements, which are to some extent guaranteed by the cost matrix mentioned above.

Set against the track length distributions, a reasonable conclusion is that the current configuration prioritizes fewer errors over longer tracks. A rough estimate of the average track duration, with a mean velocity of 6.17 µm/s over the frame width of 81.92 µm, is 13.3 seconds or 318 frames, while the current average track length is 44 frames. It is easy to see how the two interests of long tracks and tracks of high quality conflict with each other, as one straightforward way to increase lengths is to loosen the criteria for making a connection between two positions. A more subtle way of improving the algorithm is to imagine situations where a track would be cut off by the current algorithm and then try to counteract them. Such cases can be:

• The particle making an unexpectedly large move between two frames.
• The particle not being detected by the network for one or more frames.
• The particle making a faulty connection in one frame, throwing it off track.

Allowing for gaps in tracks, similarly to the benchmarking script, has the potential to overcome the first two situations without relaxing the conditions for other particles.
One implementation would be extending the set of particles to be assigned to tracks, (x_{n+1,i})_{i=0}^{I_1}, with (x_{n+2,i})_{i=0}^{I_2}, (x_{n+3,i})_{i=0}^{I_3} and so on, depending on the desired size of the gaps, and keeping unlinked particles more than one frame away for the next step of the tracking. Doing so with U-Track, however, would effectively increase the amount of noise linearly with the number of frames added, because of the false positives, which are less present in the benchmark script. A suggested improvement would therefore be to run one original tracking and then connect track ends with track beginnings that may be some frames apart, using the same algorithm. As many of the original particles are not considered, and short tracks can be filtered away as noise between the two trackings, the second tracking could use a more generous cost matrix without risking the inclusion of more noise.

One problem which was encountered when analysing tracks with gaps for the benchmarking script is how to accurately extend the statistics of continuous tracks. For the calculations of D and σ2, values were calculated separately for each subtrack and weighted together based on their respective lengths. But since averaging of displacements was advised against in the original article because of their statistical dependencies, this strategy is questionable in cases where the exact result is of greater importance. Other plausible methods include ignoring gaps or substituting gaps with extrapolated data, which could be a negligible error source for long tracks.

5.4 Image generation routines

As shown in chapter 4.4, all three new image generation routines outperform the original method when measured on calculation time per image. We chose to implement the method calculating the Bessel function only near the particle centers. This is partly because it seems to be the fastest and partly, perhaps most significantly, because it retains the functionality of enabling use of Bessel orders other than 1.
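The cutoff idea can be sketched as below for the Gaussian variant: the particle profile is evaluated only in a window within a few particle radii of each center instead of over the full image. Function and parameter names are illustrative, not U-Track's actual API:

```python
import numpy as np

def render_particles(size, particles, cutoff_radii=3.0):
    """Render Gaussian particles on a size x size image, evaluating the
    profile only within `cutoff_radii` particle radii of each center
    (the cutoff of section 4.4). `particles` is an iterable of
    (x, y, radius, intensity) tuples."""
    img = np.zeros((size, size))
    for x, y, r, intensity in particles:
        c = cutoff_radii * r
        x0, x1 = max(0, int(x - c)), min(size, int(x + c) + 1)
        y0, y1 = max(0, int(y - c)), min(size, int(y + c) + 1)
        ys, xs = np.mgrid[y0:y1, x0:x1]
        d2 = (xs - x) ** 2 + (ys - y) ** 2
        img[y0:y1, x0:x1] += intensity * np.exp(-d2 / (2 * r ** 2))
    return img
```

With radii of a few pixels, each particle only touches a small window, so the per-particle cost becomes essentially independent of the image size, consistent with the cutoff routines being the fastest in figures 26 and 27.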
In development, there were indications that neither the method chosen above nor the one using the GPU was notably affected by increased image size. However, the chosen method was more affected by the number of particles than the one using the GPU, suggesting that there might be use cases with large and very dense images where the GPU method is preferable to the chosen one. It should be noted that DeepTrack's structure is such that the training is primarily done on small images containing only one particle. Thus the absolute time gained from implementing a cutoff is smaller for DeepTrack than it is for U-Track.

6 Conclusion and outlooks

The aim that U-Track should perform at least as well as DeepTrack in the aspects of image classification and calculation efficiency for multiple particle images is to be regarded as fulfilled. For such images, U-Track consistently scores higher in precision and recall and lower in DULROC and MAE. It also requires less calculation time both for predictions (in most cases) and for image generation. It should be noted that handling this type of image is not the primary focus of DeepTrack, and this is likely part of the reason for the improvement demonstrated by our method.

The development of a framework for post-processing of network predictions into particle tracks further improves the usability of the library. This is exemplified by its application on experimental data with useful results. Several considerations for creating and analysing the tracks have been discussed, including the prediction cutoff threshold, the cost matrix calculation, and the minimum track length filtering.

The next step in the development of U-Track could be to change the architecture to a recurrent neural network, which retains information from the previous frame for the prediction of the next one. This would most likely lead to better performance than predicting each frame separately.
The biggest challenge with this approach is the simulation of data, as realistically simulated movement and flickering of particles is crucial for the approach to work.

References

[1] S. Helgadottir, A. Argun, and G. Volpe, "Digital video microscopy enhanced by deep learning", Optica, vol. 6, no. 4, pp. 506–513, Apr. 2019. doi: 10.1364/OPTICA.6.000506.
[2] O. Ronneberger, P. Fischer, and T. Brox, "U-net: Convolutional networks for biomedical image segmentation", in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, N. Navab et al., Eds. Cham: Springer International Publishing, 2015, pp. 234–241. isbn: 978-3-319-24574-4. doi: 10.1007/978-3-319-24574-4_28.
[3] G. Litjens et al., "A survey on deep learning in medical image analysis", Medical Image Analysis, vol. 42, pp. 60–88, 2017. doi: 10.1016/j.media.2017.07.005.
[4] S. Jõemetsa, "Supported lipid membranes and their use for the characterization of biological nanoparticles", PhD thesis, Chalmers University of Technology, 2020.
[5] U-Track repository, 2020. [Online]. Available: https://github.com/FredrikMeisingseth/U-Track.git (visited on 05/13/2020).
[6] T. Falk et al., "U-net: Deep learning for cell counting, detection, and morphometry", Nature Methods, vol. 16, Jan. 2019. doi: 10.1038/s41592-018-0261-2.
[7] D. M. Powers, "Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation", Journal of Machine Learning Technologies, pp. 37–63, 2011. [Online]. Available: http://hdl.handle.net/2328/27165.
[8] C. Vestergaard, P. Blainey, and H. Flyvbjerg, "Optimal estimation of diffusion coefficients from single-particle trajectories", Physical Review E, vol. 89, p. 022726, Feb. 2014. doi: 10.1103/PhysRevE.89.022726.
[9] Y. Yuan, M. Chao, and Y.-C. Lo, "Automatic skin lesion segmentation using deep fully convolutional networks with Jaccard distance", IEEE Transactions on Medical Imaging, vol. 36, no. 9, pp. 1876–1886, 2017. doi: 10.1109/TMI.2017.2695227.
[10] R. Byrd et al., "Sample size selection in optimization methods for machine learning", Mathematical Programming, vol. 134, no. 1, pp. 127–155, 2012. doi: 10.1007/s10107-012-0572-5.
[11] D. Midtvedt et al., "Size and refractive index determination of sub-wavelength particles and air bubbles by holographic nanoparticle tracking analysis", Analytical Chemistry, 2019.
[12] H. W. Kuhn, "The Hungarian method for the assignment problem", Naval Research Logistics Quarterly, vol. 2, no. 1–2, pp. 83–97, 1955. doi: 10.1002/nav.3800020109.
[13] J. A. Nelder and R. Mead, "A simplex method for function minimization", The Computer Journal, vol. 7, no. 4, pp. 308–313, 1965. doi: 10.1093/comjnl/7.4.308.
[14] R. Parthasarathy, "Rapid, accurate particle tracking by calculation of radial symmetry centers", Nature Methods, vol. 9, no. 7, 2012. doi: 10.1038/nmeth.2071.
[15] R. C. Gonzalez and R. E. Woods, Digital Image Processing (3rd Edition). USA: Prentice-Hall, Inc., 2006. isbn: 013168728X.
[16] Keras documentation, 2020. [Online]. Available: https://keras.io/ (visited on 02/09/2020).
[17] X. Wu et al., "FaultSeg3D: Using synthetic data sets to train an end-to-end convolutional neural network for 3D seismic fault segmentation", Geophysics, vol. 84, no. 3, pp. IM35–IM45, 2019. doi: 10.1190/geo2018-0646.1.
[18] C. Fiorio and J. Gustedt, "Two linear time union-find strategies for image processing", Theoretical Computer Science, vol. 154, no. 2, pp. 165–181, 1996. doi: 10.1016/0304-3975(94)00262-2.