Real-Time Change Detection in SAR Images Master’s thesis in Systems, Control and Mechatronics MAX NORDIN Department of Electrical Engineering CHALMERS UNIVERSITY OF TECHNOLOGY Gothenburg, Sweden 2017 Master’s thesis EX048/2017 Real-Time Change Detection in SAR Images MAX NORDIN Department of Electrical Engineering Chalmers University of Technology Gothenburg, Sweden 2017 Real-Time Change Detection in SAR Images MAX NORDIN © MAX NORDIN, 2017. Supervisor: ANDERS ÅHLANDER, Saab AB Examiner: TOMAS MCKELVEY, Department of Electrical Engineering Master’s Thesis EX048/2017 Department of Electrical Engineering Chalmers University of Technology SE-412 96 Gothenburg Telephone +46 31 772 1000 Cover: Two SAR images of the same area taken at two different occasions. Two targets have shown up in the right image and can be seen as two white spots in the middle of if. Typeset in LATEX Gothenburg, Sweden 2017 iv Real-Time Change Detection in SAR Images MAX NORDIN Department of Electrical Engineering Chalmers University of Technology Abstract Carabas is a surveillance system that uses a synthetic aperture radar (SAR) to take images of the ground from the air. Images taken at separate occasions can be compared with each other to find so called targets, such as vehicles hidden in the forest. This process is known as change detection. A new change detection method based on Bayesian statistics has been implemented in form of a C++ program. A few additions has been made to the change detection method to automate the process without human interaction. A new median filtering method has also been developed that was proved to be faster than using selection algorithms. The program utilizes a multi-core processor architecture that is already used for existing Carabas signal processing. To fully utilize the hardware, a pipeline structure was created that allows for different images to be processed in parallel and to divide the change detection of one image into parallel parts. From the C++ program it was found that change detection can be run in real-time with Carabas. The time it takes to do change detection is roughly one tenth of the time it takes to produce a SAR image. Using the pipeline structure it is possible to integrate change detection with the rest of the Carabas signal processing and reduce the computational time. Finally, a test environment were change detection can be tested has also been implemented. Keywords: change detection, Bayesian statistics, synthetic aperture radar, parallel processing, POSIX threads, OpenMP, median filtering. v Acknowledgements I would like to thank the people at Saab AB for giving me the opportunity to carry out my master’s thesis work with them. First and foremost, I would like to thank Anders Åhlander for supervising me and providing me with valuable feedback. I would also like to thank Jonas Lindgren for his supervision and insights. Finally, I would like to thank Hans Hellsten for his help with tackling the change detection algorithm. Max Nordin, Gothenburg, June 2017 vii Contents List of Figures xi List of Tables xiii 1 Introduction 1 1.1 Purpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.3 Ethical aspects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.4 Sustainability aspects . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.5 Thesis outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 2 SAR and change detection 5 2.1 Synthetic aperture radar . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.1.1 Carabas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.1.2 Change detection . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.2 The change detection algorithm . . . . . . . . . . . . . . . . . . . . . 7 2.2.1 Basic principle . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.2.2 Target statistics . . . . . . . . . . . . . . . . . . . . . . . . . . 9 2.2.3 Clutter statistics . . . . . . . . . . . . . . . . . . . . . . . . . 10 2.2.4 Median filtering . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.3 Helpful algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.3.1 Welford’s method . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.3.2 Sorting algorithms . . . . . . . . . . . . . . . . . . . . . . . . 12 2.3.2.1 Bucket sort . . . . . . . . . . . . . . . . . . . . . . . 13 2.3.2.2 Quicksort . . . . . . . . . . . . . . . . . . . . . . . . 13 2.3.2.3 Merge sort . . . . . . . . . . . . . . . . . . . . . . . 13 3 Parallel processing 15 3.1 Parallel hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 3.2 Parallel software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 3.3 Processes and threads . . . . . . . . . . . . . . . . . . . . . . . . . . 16 3.4 Data races . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 3.5 Synchronization primitives . . . . . . . . . . . . . . . . . . . . . . . . 17 3.5.1 Mutex . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 3.5.2 Condition variable . . . . . . . . . . . . . . . . . . . . . . . . 18 3.5.3 Other primitives . . . . . . . . . . . . . . . . . . . . . . . . . 18 3.6 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 ix Contents 3.6.1 Amdahl’s law . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 4 Methods 21 4.1 Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 4.2 Implementation of the change detection algorithm . . . . . . . . . . . 21 4.2.1 Target statistics . . . . . . . . . . . . . . . . . . . . . . . . . . 22 4.2.2 Pre-processing . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 4.2.3 Histogram creation and maintaining . . . . . . . . . . . . . . . 24 4.2.4 Clutter statistics . . . . . . . . . . . . . . . . . . . . . . . . . 25 4.2.5 Target locating . . . . . . . . . . . . . . . . . . . . . . . . . . 26 4.2.6 Median filtering . . . . . . . . . . . . . . . . . . . . . . . . . . 26 4.2.6.1 Filtering the first row . . . . . . . . . . . . . . . . . 27 4.2.6.2 Filtering the second and subsequent rows . . . . . . 28 4.2.7 Target evaluation . . . . . . . . . . . . . . . . . . . . . . . . . 30 4.3 Parallelization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 4.3.1 The master-worker model . . . . . . . . . . . . . . . . . . . . 32 4.3.2 Parallel filters . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 4.3.2.1 Parallel change detections . . . . . . . . . . . . . . . 34 4.4 Verification and testing of the implemented C++ program . . . . . . 36 4.4.1 Time measurements . . . . . . . . . . . . . . . . . . . . . . . . 36 4.4.2 Artificial targets . . . . . . . . . . . . . . . . . . . . . . . . . 37 5 Results 39 5.1 Evaluation and comparisons . . . . . . . . . . . . . . . . . . . . . . . 39 5.1.1 Comparison with MATLAB scripts . . . . . . . . . . . . . . . 39 5.1.2 Detection of false targets . . . . . . . . . . . . . . . . . . . . . 42 5.2 Execution time and parameters . . . . . . . . . . . . . . . . . . . . . 42 5.2.1 Automatic stop . . . . . . . . . . . . . . . . . . . . . . . . . . 44 5.2.2 Type casting . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 5.2.3 Median filter . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 5.2.4 Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 5.3 Parallelization with OpenMP and pthreads . . . . . . . . . . . . . . . 47 5.3.1 Streaming data . . . . . . . . . . . . . . . . . . . . . . . . . . 48 5.4 Hardware and Limitations . . . . . . . . . . . . . . . . . . . . . . . . 49 6 Conclusion 51 Bibliography 53 x List of Figures 2.1 A SAR image obtained using Carabas. It is also possible to obtain images looking more like a photograph by using higher frequencies than the range 140-360 MHz used for this image. . . . . . . . . . . . . 6 2.2 Data points from a real SAR image inserted into two different his- tograms. Red color indicates a high density of data points and blue color a low density. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 2.3 Median filtering with the unfiltered image on the left and the filtered image to the right. The red square on the left illustrates the filtering window. The median of the values within the window is the result in the filtered image in the red square to the right. . . . . . . . . . . . . 11 4.1 Overall structure of the C++ function for change detection where each block is a sub function. . . . . . . . . . . . . . . . . . . . . . . . 22 4.2 The information from the window in Figure 2.3 sorted. . . . . . . . . 27 4.3 The second pixel the filtered value is computed for is marked red on the right and the corresponding filtering window in red on the left. The previous window in blue has been sorted in an array A1 and the elements in the yellow area in an array B1. . . . . . . . . . . . . . . . 28 4.4 Illustration of the merging. A1 (blue) is merged with B1 (yellow) into C1 (red). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 4.5 The filtering process for all pixels not in the first row or column (in the filtered image). Here the previous filtering window (blue) has a sorted array A2. There is also a sorted array B1 saved from before (green). There is also one element (yellow) not in A2 or B1 that is included in the current window. . . . . . . . . . . . . . . . . . . . . . 30 4.6 The master-worker model. A master thread puts items in a buffer and the worker threads removes them and do some work associated with them. For change detection, the master divides an image into several sub-images and the workers do the change detection on the sub-images. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 4.7 The structure of a filter which was implemented to be used for change detection and other types of signal-/image processing. . . . . . . . . . 35 5.1 Illustration of two targets appearing in the reference image. They can be seen as two small dots in the center of the white circle in the update image. The circles are not a part of the original SAR images. 40 5.2 The full 36 Mpix reference image containing two real targets. . . . . . 41 xi List of Figures 5.3 Data that is corrupted which can be seen as diagonal lines within the images. Due to these lines differing, targets were detected despite there being no real deployed targets. . . . . . . . . . . . . . . . . . . 41 5.4 The full 16 Mpix reference image where a false targets were detected. 43 5.5 False targets were detected by the bright spot in the middle of the circle. No big difference is seen between the two images. The circles are not a part of the original SAR images. . . . . . . . . . . . . . . . 43 5.6 Computation time in seconds of median filtering using different win- dow sizes m. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 xii List of Tables 4.1 Hardware and operating system specification. . . . . . . . . . . . . . 21 4.2 All parameters that have to be set for the implemented change de- tection function. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 5.1 Computation time in seconds of median filtering using different win- dow sizes m on a 5000 × 5000 matrix containing double precision floating point values. . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 5.2 Execution time and speedup for change detection on one SAR-image divided into 36 sub-images. . . . . . . . . . . . . . . . . . . . . . . . . 48 5.3 Average execution time and speedup from streaming SAR-images. . . 48 xiii List of Tables xiv 1 Introduction Using a Synthetic Aperture Radar (SAR), the Carabas system developed at Saab, is a surveillance system used to take high resolution images of the ground from an aircraft. By making measurements while moving and processing the data in a certain way, SAR allows for higher resolutions that otherwise would require a much longer antenna. The frequencies at around 20 − 360 MHz used with Carabas enables the electromagnetic waves to penetrate foliage, buildings and even the ground. Unlike other radars operating at higher frequencies, Carabas can be used to detect concealed objects such as a vehicle hidden under a tree. To find the vehicle in the image, Carabas uses Change Detection (CD) which finds objects referred to as targets that have appeared/disappeared when comparing to older images stored in a database. Change detection with Carabas has been an ongoing research area and other CD methods have been investigated for real-time use [1]. The new CD-algorithm devel- oped at Saab is based on Bayesian statistics and uses Bayesian methods to compute target probabilities. This method finds targets fast with similar low false alarm probabilities as more time consuming methods. It has been tested using Matlab with positive results and is now waiting to be implemented and integrated with Carabas. The CD-algorithm is presented by Hellsten [2] who describes how it works and is derived. Hellsten also provides some information about the implementation in Matlab [3]. 1.1 Purpose The purpose of this thesis work is to assist Saab with the development of Carabas by contributing to the integration of the change detection algorithm. The result should be a realization in C++ that takes two SAR-images and outputs possible targets alongside their probabilities. It should be well optimized and utilize parallel processor architecture. The execution time should not be slower than the time it takes to create SAR-images (around 40 seconds). 1 1. Introduction 1.2 Scope The realization is based on the novel CD method developed by Saab with the purpose to be integrated with Carabas. Therefore, any other potential methods which to do change detection with is not looked into. At firsthand the realization has been made to work with the multi-core processor hardware currently used with Carabas. Thus, the thesis work does not go in depth with different kinds of hardware, for instance GPUs or manycore processors. 1.3 Ethical aspects Carabas, which the thesis work will be a part of, is a surveillance system partially used for military operations. One can argue that the use or development of these types of systems are wrong due to the damage caused by warfare throughout history. On the other hand they can also be used as a defence to prevent damage. By researching these areas one can contribute to that these systems are made to prevent damage rather than cause damage. One can also make people feel safe and secure knowing that these defence systems will protect them. 1.4 Sustainability aspects While the main purpose of Carabas is for surveillance in military operations it has other uses as well. Examples are terrain topography mapping, biomass estimation and disaster monitoring. For instance, biomass estimation can help measuring de- forestation which causes global warming [4] and a decline in biodiversity [5]. Thus, this thesis has applications that plays a part in a sustainable future. 1.5 Thesis outline The thesis is divided into five other chapters. The content for each chapter is as follows: • Chapter 2 gives a brief explanation of the SAR technology. After that the new change detection algorithm is explained in detail and some additional useful algorithms are introduced. • Chapter 3 introduces parallel processing from both a hardware and software perspective. It continues by going more in depth on the software side. 2 1. Introduction • Chapter 4 describes how change detection was realized as a C++ program. First, how the algorithm was made to run efficient as code is explained which is followed by how it was parallelized. • Chapter 5 presents the end results. The correctness of the implementation as well as the execution time is considered and discussed. • Chapter 6 is an conclusion of the thesis and discusses future work. 3 1. Introduction 4 2 SAR and change detection To be able to follow the whole thesis, some basic concepts are explained in this chapter. This includes an explanation of SAR imaging and the new change detection algorithm. Some theory which is useful when working with the algorithm is also introduced. 2.1 Synthetic aperture radar A synthetic aperture radar, abbreviated SAR, is a type of radar that is used to create images, often of terrain taken from air crafts or satellites. The basics of SAR are the same as with any radar. It uses an antenna to transmit electromagnetic waves out in the environment. Some of these waves are reflected back at the radar and received by an antenna, which could be and often is the same antenna that transmitted the signal. The distance to different objects are obtained by measuring the time it takes for the signal to come back. What is special about SAR is that it uses a moving antenna. The moving antenna makes it so that the signals are transmitted and received at different positions, creating a so called synthetic aperture. This gives the radar a higher resolution which allows the creation of high resolution images of the terrain. SAR has several advantages over cameras, one being that the radio waves used are mostly unaffected by weather unlike visible light. It is also not dependent on an external source such as the sun which makes it work independent of the time of the day. Unlike with cameras, the data received with SAR does not resemble an image until after a few signal processing steps. By filtering the data both 2D and 3D images can be obtained where the 2D images are the focus for change detection. Each pixel in the 2D images has a complex value representing the amplitude and phase of the received signal. An example of an image taken with Carabas where the amplitude is plotted can be seen in Figure 2.1. A detailed explanation of the SAR technology along with the signal processing has been made by Moreira et. al [6]. 5 2. SAR and change detection Figure 2.1: A SAR image obtained using Carabas. It is also possible to obtain images looking more like a photograph by using higher frequencies than the range 140-360 MHz used for this image. 6 2. SAR and change detection 2.1.1 Carabas The Carabas system [7] developed at Saab stands out by using unusually low frequen- cies for SAR imaging. Two frequency bands are used at 20-90 MHz and 140-360 MHz respectively (in the VHF/UHF range). The electromagnetic waves at these frequen- cies, especially the lower band, have the ability to penetrate foliage, certain types of camouflage and can even go through buildings and a distance underground. While the images created might not be as nice to look at, it serves a purpose which is to be able to detect hidden objects known as targets. These are practically invisible to other types of radar or sensors. A typical target is a military vehicle hidden under a tree. Everything detected that is not a target will henceforth be referred to as clutter. Carabas is used to surveil large areas and multiple images are created one at a time, resulting in a stream of images to perform change detection on. 2.1.2 Change detection To find a target in an image a technique called change detection is utilized. As can be seen in Figure 2.1 trees appear as dots in the image, which is also true for vehicles or other similar sized objects that reflect radio waves. Change detection is based on comparing the image with an older reference image of the same area, to find targets that looks similar to and could occupy the same space as other objects. Due to different flight paths, calibrations, etc. the clutter will change between the two images. The problem is thus to find changes in the images that are due to targets appearing and not something else. A change detection algorithm should be able to detect targets reliably while keeping the false alarm rate low. 2.2 The change detection algorithm The change detection algorithm used in this project was recently invented by Hell- sten [2, 3] and has not been tested thoroughly. It has however been implemented in MATLAB and has given successful results on several occasions. Implanted targets have been found while other uninteresting changes from buildings, fences and power lines have been neglected. This result is better than those for other alternatives of change detection algorithms. 2.2.1 Basic principle Given a reference image R and an updated image U , the algorithm tries to find targets that are in the updated image but not in the reference image. This is done by computing the conditional probability p(x = xT |aU(x), aR(x)) that there is a 7 2. SAR and change detection target at pixel x given the measured real-valued amplitudes of the images at x, aR(x) and aU(x). Here, x should be the same geographical location for the two images. This conditional probability can with the help of Bayes’ theorem and the law of total probability be expressed as p(xT |aU , aR) = 1 1 + (1−p(xT |aR))p(aU |xC ,aR) p(xT |aR)p(aU |xT ,aR) (2.1) where xT denotes there is a target at location x and xC denotes there is no target i.e. only clutter. The likelihood function p(aU |xT , aR) is obtained by proposing a probability density function pT (a) for the amplitude of targets which is described in Section 2.2.2. The clutter counterpart p(aU |xC , aR) is estimated using data from the two images and the method used is described in Section 2.2.3. Finally assuming each image has N pixels, there are k targets each of size M pixels and each pixel is equally likely to contain a target, then p(xT |aR) = p(xT ) = Mk N . (2.2) Inserting this into equation (2.1) with 1− p(xT |aR) ≈ 1 one obtains pk(xT |aU , aR) = 1 1 + N Mk 1 η(x) (2.3) where η(x) = p(aU |xT , aR)/p(aU |xC , aR). The pixels which are most likely to contain targets are the ones with the highest values for pk(xT |aU , aR) and also η(x) since pk(xT |aU , aR) is monotonically increasing with respect to η(x). Because of clutter fluctuations, there is a chance for a high η(x) even if there is no target. Thus, the target nominees are the ones that instead maximizes pk(xT |ΛT (x)) which is the median of pk(xT |aU , aR) evaluated in a square ΛT (x) with an area of size M and with x in its center. This filters out large pixel-to-pixel changes but not targets. Multiple targets can be found in an image through an iterative process. This pro- cess starts by assuming there are k = 1 targets in the updated image. Because the number of targets and their locations are unknown, the data used to estimate p(aU |xC , aR) (i.e. the likelihood of aU given aR when there are no targets) includes every pixel in the image pair. This data set, containing every pixel, is denoted S0 and the corresponding estimated likelihood is denoted p0(aU |xC , aR). If there is a target in S0 there will be an error in p0(aU |xC , aR) which can later be removed and is why this iterative process takes place. The first target nominee, denoted x1,0, is found by maximizing p̂1(xT |ΛT (x)) which in turn is obtained from p0(aU |xC , aR) us- ing (2.3). Thereafter, p(aU |xC , aR) is estimated once again, now excluding an area Λ(x1,0) of size greater than M around x1,0. This new data set S1 = S0 \ Λ(x1,0) yields p1(aU |xC , aR) as well as p1(xT |ΛT (x)). If x1,0 is a target, the error from p0(aU |xC , aR) is not in p1(aU |xC , aR). At last, p1(xT |ΛT (x)) is maximized with the final target nominee x1,1. This can be repeated multiple times with the next iteration assuming k = 2 targets. The whole process is summarized below: 1. Set S0 to be all pixels in the image pair and compute p0(aU |xC , aR) from S0. 8 2. SAR and change detection 2. Starting with k = 1, use the already computed pk−1(aU |xC , aR) to estimate the probability of a target p̂k(xT |ΛT (x)) at all pixels in Sk−1. 3. Find the pixels {x1,k−1, x2,k−1, . . . , xk,k−1} that maximizes p̂k(xT |ΛT (x)) and form Sk = S0 \ Λ(x1,k−1) \ Λ(x2,k−1) \ · · · \ Λ(xk,k−1). 4. Compute pk(aU |xC , aR) from Sk. 5. Increment k by 1 and repeat from step 2 or end. The algorithm can run a set number of iterations or it can run until no more target nominees with high probability are found. 2.2.2 Target statistics To obtain p(aU |xT , aR), it is first assumed that there is no coupling between target and background. Thus the complex image amplitude for the updated image is aUe iϕU = aT e iϕT +aCeiϕC where T and C denotes target and clutter respectively. The variables aU , aT and aC are real and non-negative. To account for wave interference p(aU |xT , aR) is calculated using complex distributions according to ℘(aU |xT , aR) = ∮ ∫ ∞ 0 ℘T (aU − aCeiϕ)℘(aC |xC , aR)aCdaCdϕ (2.4) where ℘ represents a complex density function. These are independent on phase and therefore p(a) = 2πa℘(a). The density ℘T (aeiϕ) is the probability density of a target having the complex amplitude aeiϕ. This should be set to characterize targets and has in this context been set to a constant complex probability ℘T (aeiϕ) =  1 π(a2 max−a2 min) amin ≤ a ≤ amax 0 otherwise (2.5) where amin and amax are the minimum and maximum amplitude that can be expected from a target. The likelihood ℘(aC |xC , aR) is recognized as a complex version of p(aU |xC , aR). Due to a second order influence it can in this case be approximated as a Dirac delta function ℘(aC |xC , aR) = δ(aC − aR) 2πaR . (2.6) Evaluating (2.4) using (2.5) and (2.6) results in p(aU |xT , aR) = 2πaU℘(aU |xT , aR) = 2aU(∆ϕmax(aU , aR)−∆ϕmin(aU , aR)) π(a2 max − a2 min) (2.7) where ∆ϕmax(aU , aR) = arctan a2 max−a2 U−a 2 R Re √ a2 max−(aU−aR)2 Re √ (aU +aR)2−a2 max (2.8) and ∆ϕmin(aU , aR) = arctan a2 min−a2 U−a 2 R Re √ a2 min−(aU−aR)2 Re √ (aU +aR)2−a2 min . (2.9) 9 2. SAR and change detection (a) aUaR-space (b) Db, bR-space Figure 2.2: Data points from a real SAR image inserted into two different his- tograms. Red color indicates a high density of data points and blue color a low density. 2.2.3 Clutter statistics The last part is to determine p(aU |xC , aR). Due to this likelihood function changing between image pairs, it is estimated every time there is a new pair of images. The suggested method is using histograms to obtain the cumulative distribution at a set number of points by summing up the histogram bin counts. Values for additional points are obtained with an interpolation method. The likelihood p(aU |xC , aR) is then found as the derivative to the cumulative distribution. As can be seen in Figure 2.2a, simply dividing the aUaR-space into histogram bins would not yield a good result since most of the bins would be empty and a few number of bins would hold the majority of the pixel pairs. Instead it is noticed that p(aU |xC , aR) is spread along a line aR = kaU where k is the gradient of the line. Denote v = [ v1 v2 ]T the eigenvector corresponding to the largest eigenvalue to the covariance matrix Q = [ Var(aU) Cov(aU , aR) Cov(aU , aR) Var(aR) ] = [ σU σUR σUR σR ] (2.10) with which the gradient is given by k = v2/v1. Instead of the aUaR-plane, look at the Da, aR-plane where Da , kaU − aR (2.11) and p(Da|xC , aR) will be spread along the Da-axis instead. The letter D represents the difference. Since targets appearing in an image practically always increases the amplitude it is safe to assume that p(Da|xC , aR) = 0 for Da < 0. Thus only Da > 0 has to be included in the histogram. Finally, let Da and aR relate to Db and bR as a relates to b in a = Ceρb+D where ρ, C and D are constants that can be determined by testing different combinations of values. By using the Db, bR-plane, samples from p(aU |xC , aR) are much more evenly spread across the bins, see Figure 2.2b. 10 2. SAR and change detection 0 5 2 8 3 1 5 1 9 6 2 3 0 3 8 9 8 1 3 5 9 7 6 0 4 0 3 2 7 5 6 7 1 0 2 4 3 6 8 3 5 7 8 6 3 5 7 6 4 3 3 4 Figure 2.3: Median filtering with the unfiltered image on the left and the filtered image to the right. The red square on the left illustrates the filtering window. The median of the values within the window is the result in the filtered image in the red square to the right. 2.2.4 Median filtering Median filtering on 2D images functions like any other image filter. Each pixel in the filtered image will have a value that depends on the same pixel in the unfiltered image and an area around that pixel. The area is referred to as a window or kernel. For a linear filter the filtered value is a linear combination of all values within the window and for a median filter it is the median of the values in the window. An example of a median filtered image is seen in Figure 2.3. Here the window is a 3-by-3 square. The edges are not filtered in this case resulting in an image that is smaller. Further information about image processing can be found in Computer Vision [8]. 2.3 Helpful algorithms For an implementation of the change detection algorithm there are certain algo- rithms that can be used or are useful to know about. These are described below. 2.3.1 Welford’s method When using computers to solve mathematical problems there are numerical errors that occur due to a computers inability to represent all real numbers. When com- puting variance, several digits of precision can be lost or an entirely wrong answer 11 2. SAR and change detection can be obtained if the problem is approached in the wrong way. The problem with covariance is that involves summing up squared numbers. Computing this using the wrong approach in a computer program can not only cause loss of precision, but also cause overflow problems when working with huge data sets or big numbers. Welford’s method [9] was developed to prevent these problems when computing for instance variance. The method is an iterative process working on a data set {x1, x2, . . . , xn}. Let mk be the mean of x1, x2, . . . , xk where 1 ≤ k ≤ n and let sk = k∑ i=1 (xi −mk)2. (2.12) Using m0 = s0 = 0, sn is then the result from iterating through the following equations n times: mk = mk−1 + 1 k (xk −mk−1) (2.13) sk = (xk −mk)(xk −mk−1) (2.14) 2.3.2 Sorting algorithms Hellsten’s change detection algorithm makes use of median filtering. The median for a set of numbers is the middle element when all numbers have been ordered nu- merically. To obtain the median some type of selection or sorting is thus required. Sorting algorithms involves sorting an entire array of numbers while selection algo- rithms involves finding a number at a certain position. Because the whole array does not have to be sorted, selection algorithms are generally the fastest way to obtain the median [10]. This is however not the case when working with two-dimensional median filters where there are fast methods based on sorting [11, 12]. There is no best sorting algorithm for all purposes as different algorithms have their advantages and disadvantages [13]. They are often compared in terms of computational complexity, meaning how much the computation time scales with the number of elements in the data set. Different data also affects the computation time creating a best-case, worst-case and average scenario. For instance, the bubble sort algorithm has a best case performance of O(n) meaning it scales linearly with the number of elements n. This occurs whenever the set is already sorted. The average performance is however O(n2), which is worse than the more efficient algorithms with an average performance of O(n log n). Another important aspect is the memory usage of the algorithms. Some algorithms requires close to no extra memory at all while some requires extra memory for larger arrays. A few different algorithms are explained below. These are useful to know about when considering two-dimensional median filtering. For further reading about sorting and sorting algorithms one can look into The Art of Computer Programming: Sorting and Searching [14]. 12 2. SAR and change detection 2.3.2.1 Bucket sort Bucket sort can be a very efficient sorting algorithm making use of so called buckets or bins. An example could be that integers between 0 and 99 are to be sorted. Ten buckets are created where values 0 to 9 will be sorted into the first bucket, values 10 to 19 into the second bucket and so on. Deciding which bin to put a value in this case is done by looking at the most significant digit. For instance number 52 gets sorted into the bin indexed 5 i.e. the sixth bin. The elements in each bin can then be sorted with each other using bucket sort or any other sorting algorithm. The strength of bucket sort comes from that it does not have to compare different elements with each other, making it sometimes superior to other comparison based methods such as quicksort and merge sort. 2.3.2.2 Quicksort A very popular sorting algorithm is quicksort because of its all-round good per- formance. The algorithm starts by finding a so called pivot element which can be chosen randomly or with the help of another algorithm. The array is then parti- tioned meaning all elements smaller than the pivot elements gets placed on one side of it, while those larger than the pivot element gets placed on the other side. The process of choosing a new pivot element and partitioning is repeated with the two parts of the array divided by the pivot element. The algorithm continues recursively until the whole array is sorted. A variation of quicksort known as quickselect is a selection algorithm that can be used to find the median. With quickselect, only the side of the pivot element that is containing the median value is used for the next recursion. This is thus a fast way of finding the median without having to sort the entire array. 2.3.2.3 Merge sort Merge sort is another popular sorting algorithm. First the array is split into n sub arrays, each of length one. Shorter arrays are then merged together into longer arrays until only one array is remaining. The merging process also keeps the arrays sorted. Two arrays are merged together by comparing the first element in each of the arrays. The smallest element (or greatest depending on how you sort) of the two are removed from the array and gets placed in a new array. This continues until all elements have been added in numerical order to the new array. 13 2. SAR and change detection 14 3 Parallel processing During the last decade, parallel hardware has become widely popular and are used frequently in computers and embedded systems [15]. Modern CPUs are typically made up of several cores acting like separate processing units, allowing multiple in- structions to be carried out at the same time. For a program to utilize the hardware to its full potential it has to be tailored to do so. With this comes different chal- lenges concerning both hardware and software. Parallel programming is similar to asynchronous programming in a sense that programs are made with non-dependant tasks that can run in any order. The difference being that in the parallel case differ- ent tasks are processed at the same time, whereas in the asynchronous case only one task is processed at any given time. The basics to parallel programming is described in An Introduction to Parallel Programming [16]. 3.1 Parallel hardware There are different kinds of hardware specialized in different things that can be classified using Flynn’s taxonomy. The classification is based on whether the system architecture is made to process one or multiple instruction streams, as well as one or multiple data streams. The instruction stream tells the processor what to do, such as multiply two numbers, and the data stream provides the data i.e. the numbers which to multiply. The two important classifications for this thesis’ purpose are MIMD (Multiple In- structions Multiple Data) and SIMD (Single Instruction Multiple Data). MIMD includes CPUs with more than one core, also known as multi-core processors, where each processor core works with its own set of instructions and its own data. A MIMD system can be either a shared-memory system or a distributed memory sys- tem. The difference being that in a shared-memory system the different processors or cores share memory, while in a distributed-memory system each processor has its own private memory. A single PC is in most cases a shared-memory system but multiple computers connected together can create distributed memory systems such as supercomputers. Examples of the SIMD classification are GPUs, made to repeat the same instructions a multitude of times on different data which is useful 15 3. Parallel processing in computer graphics. Some GPUs could be classified as MIMD and are able to run multiple instruction streams at the same time and general purpose computing on GPUs (GPGPU) has made it possible for GPUs to perform computations normally handled by a CPU. 3.2 Parallel software The way software is created differs depending on the hardware. For shared-memory MIMD/SIMD and C/C++ there are different APIs (application programming inter- faces) for parallel programming. Two common ones are OpenMP and POSIX threads (pthreads). The goal of OpenMP and pthreads is essentially the same thing, split parts of a program that does not depend on each other and have these tasks be processed in parallel. The implementation differs slightly giving the two APIs some advantages and disadvantages over each other. OpenMP is a more straightforward approach allowing parallelization with as little effort as possible. Pthreads on the other hand is a little bit more in depth allowing better control over the program structure. Pthreads generally takes more time and code to implement but has a greater potential in terms of performance. 3.3 Processes and threads An instance of a computer program is called a process. In a normal computer there are always several processes running at once. Since a processor core only can work with one process at a time someone or something has to decide which process is allowed to run. This is known as scheduling and is handled by the operating system. The scheduling can also be influenced by the programmer. For instance, threads can be set to be prioritized over other threads. Processes work mostly unknowingly of each other with their own resources needed to maintain the process. A process contains one or more threads which in turn can share resources with other threads within the process. Because of this, creat- ing, communicating between, and sharing data between threads are faster than for processes. A thread is processed in a sequential order making it unable to be used by more than one processor or processor core at once. To use more than one core, multiple threads have to be used, either in the form of multiple processes with one thread, one process with multiple threads or a combination of the two. Because of the speed advantage, OpenMP and pthreads are used to creating and maintaining threads within the same process. 16 3. Parallel processing 3.4 Data races When working with multiple threads there are problems that can occur when more than one thread is trying to access a shared resource. These problems are called data races or race conditions and occur when multiple threads are trying to access the same variable and at least one of the threads is trying to write to the variable. The problem lies in that the outcomes may differ depending on the order the instructions of the different threads are executed in. In Pthreads programming [17] there is a good example of a data race where a man and a woman are withdrawing money from the same bank account using two different ATMs. The ATMs operate in parallel i.e. on different threads while sharing the bank account balance. The man makes a withdrawal of $50. The ATM reads the balance which is $125, subtracts $50 from it and writes the result $75 to the balance before giving the man his money. At the same time, the woman also makes a withdrawal of $50. The man has not yet finished his withdrawal so this ATM also reads the balance as $125. However, while in the process of subtracting, the man finishes his withdrawal which changes the bank account balance. The second ATM which also gets a result of $75 overwrites the balance (which already is $75) and gives $50 to the woman. Thus the man and woman did end up withdrawing a total of $100 while only subtracting $50 from the bank account balance. 3.5 Synchronization primitives To avoid data races and to aid interaction between threads there are so called syn- chronization primitives. These primitives come with the API’s but are implemented differently whether OpenMP or pthreads is used. Since pthreads has been the main focus during the thesis, this section describes how synchronization primitives are used with pthreads. 3.5.1 Mutex A common primitive is a lock often called mutex. When a thread locks a mutex it will be the owner of that mutex and is the only thread that can unlock it. If another thread requests to lock the mutex it immediately has to stop and wait for the mutex to be unlocked by its owner before proceeding. A mutex thus makes it impossible for more than one thread to enter a critical section with shared resources. A problem that comes with mutexes are deadlocks where one or more threads are permanently blocked and will never proceed. This can happen if for example a thread forgets to unlock a mutex or if two threads each are waiting for a mutex owned by the other thread. 17 3. Parallel processing 3.5.2 Condition variable Condition variables are another type of primitives used by threads to signal each other that a certain event has occurred. An event is something required for certain threads to proceed and could for instance be a set of data that is finished with being processed. If a thread locks a mutex, enters a critical section and sees that certain conditions are not satisfied for it to continue, it can use a condition variable to wait for those conditions to be met. While waiting it will unlock the mutex allowing other threads to enter the critical section. One of these threads will (hopefully) modify the shared resources so that the conditions are met i.e. the desired event occurs, and will use the condition variable to signal the waiting thread. Before resuming, the waiting thread must once again become the owner of the mutex. There are two types of signaling that can be done with a condition variable. A normal signal will wake one thread that is currently waiting. There is also broadcasting which acts like a signal on all threads that are waiting. 3.5.3 Other primitives Apart from mutexes and condition variables there are various other synchronization primitives. Semaphores are one of those. They act similar to mutexes in a sense that they protect shared resources. Semaphores are useful when the order in which the threads have to access shared resources is important or when more than one thread can be allowed into a critical section. A barrier is another type of synchronization primitive. When a thread reaches a barrier it has to wait for all other threads participating in the barrier to reach it before said thread can continue. 3.6 Performance For a parallelization to be justified it has to yield a performance gain in some way. Speedup S is a way of measuring how much faster the execution time of a parallel program Tp is compared to the execution time of the sequential program Ts doing the same computations. The speedup is defined as S = Ts Tp . (3.1) A linear speedup means that the speedup equals the number of cores. In this case, all computations have been equally split between the cores and nothing extra that takes time has been added. This is seen as a best case scenario which is unlikely to be obtained in practice. Another way of measuring performance is using efficiency E defined as E = S p = Ts Tpp (3.2) 18 3. Parallel processing where p is the number of processor cores. An efficiency of 100% is the same as a linear speedup. 3.6.1 Amdahl’s law Even though parallelization is useful to gain performance and reduce execution time, there are still limitations. For instance if 90% of computations in a sequential program is parallelizable, the total execution time can never be faster than that last 10%. Assume a fraction f of a program, 0 ≤ f ≤ 1, is not parallelizable but that the rest is parallelizable with a linear speedup. Then the execution time is Tp = (f + 1−f p )Ts (3.3) for the parallel program. From this an upper bound to the speedup is obtained according to S = 1 f + 1−f p ≤ 1 f . (3.4) which is known as Amdahl’s law. 19 3. Parallel processing 20 4 Methods A C++ function was created that performs change detection according to the al- gorithm in Section 2.2. Since there was a desire to speed up other computations, a C++ class was created that not only allows for parallelization of change detection, but also for other computations. A program that allows for testing of how well the change detection function works and how fast the execution time is was created alongside the change detection function. 4.1 Hardware The C++ implementation and testing of the change detection was carried out on the same hardware that have been used for the rest of the signal processing in Carabas. This was a Linux server with specifications that can be seen in Table 4.1. With a total of 12 processor cores split between two processors, parallelization was an important part of reducing the execution time. 4.2 Implementation of the change detection algo- rithm The change detection algorithm introduced in Section 2.2 was implemented as a sequential C++ function. It was divided into several sub functions with a structure illustrated in Figure 4.1 where the different sub functions are explained in more detail below. The inputs to the created function were two 2D complex valued SAR images, OS distribution SUSE Linux Enterprise Server 11 Processor(s) 2× Intel Xeon CPU X5675 @ 3.07 Ghz Number of CPU cores 2× 6 RAM 100 GB Table 4.1: Hardware and operating system specification. 21 4. Methods Pre-processing Reference and updated image Target statistics Histogram creation Clutter statistics Target locating Target evaluation Found targets Histogram maintaining Median filtering Figure 4.1: Overall structure of the C++ function for change detection where each block is a sub function. as well as some different parameters, see Table 4.2. Its output is a list of targets containing the location of the targets and the computed value for p(xT |aU , aR) from equation 2.1. The basic approach was to first compute everything not dependant on results from the iterative process, save it to memory so that it does not have to be computed more than once and then start iterating in a loop. Because at least two iterations are needed to find only one target, it saved time in even the simplest of cases. An extra part was also added to the algorithm to save additional computation time. This part, labeled Target Evaluation in Figure 4.1, was given the task to decide the number of iterations. For example, if the algorithm should be able to find ten targets, it has to go through at least eleven iterations. In most cases in practice, there are no targets to find within an image pair (targets are rare) and doing eleven iterations each time takes up resources that can be used for other things such as parallel change detections, see Section 4.3. In these cases the target evaluation would see that there were no high probability target nominees within the first few iterations and end the change detection. Below follows descriptions of this and all the other the different sub functions and how they were implemented. 4.2.1 Target statistics Everything necessary to compute p(aU |xT , aR) in equation (2.7) was placed within its own function. The target hypothesis did not change between iterations and thus the computations could be done before the loop. In fact, since the targets upper and lower amplitudes (amax and amin) were implemented as inputs to be set manually, it also did not depend on the images and could have been done before 22 4. Methods Parameter Description amax Maximum amplitude a target can have amin Minimum amplitude a target can have ∆aR Step size for the grid the different probabilities are evaluated at ∆Da Step size for the grid the different probabilities are evaluated at bn,a Number of histogram bins in the aR-direction bn,Da Number of histogram bins in the Da-direction ρa Each consecutive bin is eρa times larger than the previous one ρDa Each consecutive bin is eρDa times larger than the previous one m Target size in pixels, M = m2 dmin Minimum distance between two target nominees, dmin ≥ m kmax Maximum number of iterations pt Minimum probability required for a target Auto stop If the algorithm can stop itself before kmax has been reached ∆p Probability change required from nominees to not auto stop k∆p Number of iterations without a change ∆p before auto stop Table 4.2: All parameters that have to be set for the implemented change detection function. starting the change detection. However, since the computations were relatively fast it gave no advantage to do so. The proposed constant complex probability was implemented with some slight modifications to equations (2.8) and (2.9) to avoid working with complex numbers and dividing by zero. When (aU + aR)2 > a2 max and (aU − aR)2 < a2 max everything under the square roots are positive which makes the square roots real. When (aU + aR)2 ≤ a2 max or (aU − aR)2 ≥ a2 max the denominator is zero and the numerator is positive or negative respectively. Equation (2.8) could thus be modified to the following: ∆ϕmax(aU , aR) =  −π 2 |aU − aR| ≥ amax +π 2 aU + aR ≤ amax ∆ϕ̃max(aU , aR) otherwise (4.1) ∆ϕ̃max(aU , aR) = arctan a2 max − a2 U − a2 R√ a2 max − (aU − aR)2 √ (aU + aR)2 − a2 max (4.2) Analogously, ∆ϕ̃min(aU , aR) was obtained from ∆ϕmin(aU , aR) and amin. With these new expressions p(aU |xT , aR) was evaluated at a grid in the Da, aR-space. The grid- points were separated by the distances ∆Da and ∆aR in the range 0 to 1 at points Da = 1 2∆Da, 3 2∆Da, . . . , 1 − 1 2∆Da and aR = 1 2∆aR, 3 2∆aR, . . . , 1 − 1 2∆aR. This grid was used later to obtain η(x). The cmath library (math.h in C) was used to compute the arctangent, the square root and later also the logarithm. 23 4. Methods 4.2.2 Pre-processing The pre-processing of the implementation took care of the computations of the difference image, that is the image with values Da, using the two complex input images. Since these computations were the same between iterations they were placed before the loop, hence the name pre-processing. The first and most straightforward part of the pre-processing was to compute the amplitudes aR and aU by taking the absolute value of the complex numbers in the input images. Next up was the difference image which required the covariance matrix Q to be computed. The three different elements in the 2-by-2 matrix were computed in conjunction with each other using the Welford’s method, which was chosen because of its robustness. For a 2-by-2 matrix there is an expression that was used for the largest eigenvalue λ = Tr(Q) 2 + √ Tr(Q)2 4 − |Q| (4.3) where and Tr(Q) denotes the trace of Q and |Q| the determinant. The corresponding eigenvector is v = [ λ− σR σUR ] . (4.4) This assumes σUR is nonzero which is true if the values aR and aU indeed are scattered along a line. These expressions were used to obtain the difference image according to equation (2.11). Lastly the resulting images were normalized to con- tain values between 0 and 1. This was done to be consistent with the MATLAB implementation and to be able to use similar parameter values (mainly amax and amin). 4.2.3 Histogram creation and maintaining Two similar functions were implemented to move some of the computations used for p(aU |xC , aR) outside of the loop. First the histogram that was used to estimate p(aU |xC , aR) was created outside the loop and later it was also updated each iteration to exclude target nominees. To decide which bin in the histogram an amplitude a belonged to Hellsten used the equation a = Ceρb ⇒ b = 1 ρ log ( a C ) (4.5) where ρ = 0.5, C = 0.001 and a can have values ranging from 0 to 1.0967. This created 15 bins where pixels with b ∈ [−∞, 0) were put in the first bin, b ∈ [0, 1) in the second bin, b ∈ [1, 2) in the third bin and so on. The equation was modified to a = eρb − e−ρ eρbn − e−ρ ⇒ b = 1 ρ log ( a(eρbn − 1) + 1 ) (4.6) where once again ρ = 0.5, bn are the total number of bins and a are normalized to contain values from 0 to 1. This equation allowed for an easier change in the 24 4. Methods number of bins since it could just be inserted in the equation. It also gained a small performance boost in terms of execution time. The pixels with b ∈ [0, 1) were put in the first bin, b ∈ [1, 2) in the second and so on. By indexing the bins 0, 1, . . . , bn the index to use was obtained by taking the floor of b. In practice this was done by type casting the floating point number to an integer and use the integer as the index to an array. This was found to be faster than looping through the different bins and use comparisons to decide which bin an amplitude belongs to which had been used in the MATLAB implementation. The first equation could have been used with type casting in the same way. However, there first would have had to be a check whether b is non-positive to see if it should have been put in the first bin. The faster execution time can relate to how bucket sort is sometimes faster than comparison based sorting. Before the loop a two dimensional histogram was created (bR values in one dimension and Db values in the second dimension) using all pixels. In each iteration this histogram was then copied and the pixels in the vicinity of a target nominee removed from the histogram. This was faster than creating a new histogram every iteration because the number of pixels removed were much smaller than the total number of pixels. The area removed from the histogram due to targets had its center at the nominees found the previous iteration and was square shaped with a side of length 6m + 1 (same as for the MATLAB script). An extra array with the same size as the two images was used to keep track of which pixels had been removed from the histogram. This was necessary to ensure that the same pixel was not removed twice if two target nominees were close to each other. The resulting histogram was used for the clutter statistics. 4.2.4 Clutter statistics Here the final computations were made that resulted in p(aU |xC , aR). They all relied on the histogram so everything had to be placed inside the loop. The cumulative distribution P (Db|xC , bR) was estimated at every point Db = 1, 2, . . . , bn for each row corresponding to different bR values. This was achieved by summing up the total number of pixels with a lower amplitude, i.e. summing up the count with each bin prior, and dividing with the total count for all bins with the current bR value. The result was a grid with P (Db|xc, bR) estimated at values Db = 0, 1, . . . , bn and bR = 1 2 , 3 2 , . . . , bn − 1 2 . For example, the grid point bR = 1 2 corresponds to the bins containing bR values 0 to 1. The values for additional points were obtained by inter- polation. Bilinear interpolation was used because it had been proven to work in the MATLAB implementation and it prevented the cumulative distribution ending up with a negative derivative. The interpolation points were chosen in the Da, aR-plane as aR = 1 2∆aR, 3 2∆aR . . . , 1− 1 2∆aR and Da = 0,∆Da, 2∆Da . . . , 1. The probability density p(aU |xC , aR) was then obtained at points Da = 1 2∆Da, 3 2∆Da . . . , 1− 1 2∆Da as the derivative using p(n2 ∆Da|xC , aR) = P (n+1 2 ∆Da|xC , aR)− P (n−1 2 ∆Da|xC , aR) ∆Da (4.7) 25 4. Methods where n is an odd integer. Finally the ratio η(x) at these grid points was computed with the result from Section 4.2.1. 4.2.5 Target locating Left to do was to find the pixels that maximizes η(x). The values of η(x) was however not yet known for each pixel, only on a grid in the Da, aR-plane. Each pixel was given the same value as the closest grid point functioning like a piecewise constant interpolation. Once again this was the same approach as for the MAT- LAB scripts. By indexing the grid points 0, 1, . . . in the aR-direction, the index to choose for a pixel x0 was baR(x0) ∆aR c. The same applied in the Da-direction. The final probability was not necessary to compute for every pixel since finding the maximum of p(xT |Da, aR) equals finding the maximum of η(x). Before finding the maximum however, the η(x)-image had to be median filtered, see Section 4.2.6. As with the maximum, the median could also be calculated for η(x). After filtering, the pixel with the highest value of η(x) was found by searching through all pixels in the image and compare each one with the so far highest value. The k potential targets were found one by one by finding the maximum and after each target had been found the pixels within a distance dmin had their values for η(x) set to zero to not find the same target twice. 4.2.6 Median filtering One way median filtering can be done, is to for every pixel in the image, independent of each other, sort all pixels or elements within the filter window and take the middle value. Yet, not all elements in the window has to be sorted to find the median and there are fast selection algorithms for doing this [10]. This is however in many cases not the most effective way of filtering a whole image, especially when the filtering window is large and it takes more time to obtain the median. Since two filtering windows next to each other contains a lot of the same pixels some of the information used to obtain the median in one window can be carried over to the next. There have been several studies on optimization of median filtering [11, 12]. Most applications have however been on images containing pixel values ranging from 0 to 255. These make up the normal digital representation of images or pictures made to look at. Often the median filtering of these images have been making use of bin sorting by putting the pixel values in a histogram with 256 bins from which the median is obtained. The window is moved to the adjacent pixel and the histogram is updated by removing/adding the pixels that left/entered the window. The result is a combination of fast sorting and only have to partially update the histogram at every iteration. The sorting method using a histogram works very similar to the bucket sort algorithm. 26 4. Methods 0 52 51 90 3 8 (0,0) (2,0) (1,1) (0,2) (2,1) (0,1) (1,0) (2,2) (1,2) Figure 4.2: The information from the window in Figure 2.3 sorted. Back to this median filtering problem, the p(xT |Da, aR)- or η-”images” could in theory have an infinite amount of different values and were only limited by the precision of the data representation. So a few pixels with a big range of values had to be sorted. This made it hard to make use the same type of sorting in an effective way and thus, a different way of doing 2-dimensional median filtering was developed making use of comparison based sorting methods. For the median filter the window size was m × m where m was assumed to be an odd number. The filtering was done row by row where the window was fully sorted for every pixel. The sorting utilizes information from the previous pixel on the same row and from the pixel in the previous row but on the same column. The filtering for the first row and column was therefore slightly different. The edges were not filtered resulting in an image where m−1 2 pixels had been cut of from each side. 4.2.6.1 Filtering the first row For the first pixel in the first row all pixels in the window were sorted using quicksort. The result was an array containing elements in this case sorted by their η-values from smallest to largest. The elements were represented by a structure in C++ which apart from η also held the information about which row and column the pixel had in the non-filtered image. Only a pointers are sorted to avoid copying extra amounts of data. As an example consider Figure 2.3 where m = 3. For this image the sorted array which is denoted A1 is seen in Figure 4.2. The median is the middle value (3 for the example). The choice of sorting method was not very important since this sorting of m2 elements was only for one pixel in an image that could contain millions of pixels. For the next pixel one step to the right on the first row, the median was obtained by first considering the previous window. Looking at the example again, there are m elements not in A1 that has to be added and m elements in A1 that has to be removed, in order to obtain the new window. Figure 4.3 illustrates how the two different windows are related. A merging method based on merge sort was created that merges the new elements with the old window ,while also removing the elements no longer in the window. To do this, the new elements first had to be sorted and quicksort was used one again. For a large image and small window this sorting of m elements was done approximately once for every pixel in the first row and later 27 4. Methods 0 5 2 8 3 1 5 1 9 6 2 3 0 3 8 9 8 1 3 5 9 7 6 0 4 0 3 2 7 5 6 7 1 0 2 4 3 6 8 3 5 7 8 6 3 5 7 6 4 3 3 4 Figure 4.3: The second pixel the filtered value is computed for is marked red on the right and the corresponding filtering window in red on the left. The previous window in blue has been sorted in an array A1 and the elements in the yellow area in an array B1. also once for every pixel in the first column. This was still a small fraction of the total number of pixels in an image and any established sorting method would have been sufficient. Going back to the example, A1 and the sorted array of new elements B1 are merged together into an array C1 of the same length as A1 using a merging technique similar to what is used in merge sort. Before the first elements in A1 and B1 are compared with each other however, it is checked whether the element from A1 is still in the window. Hence, the need of a structure containing both value and position. Figure 4.4 illustrates how a merging works. The first element in array A1 is looked at first. If it originated from in the first column it is removed from array A1. Otherwise it is compared with the first element in B1. The smallest of them is in that case removed from its array and added to C1. This is repeated until C1 is full after which the median is obtained as the middle value. The merging was used repeatedly until the whole first row had been filtered. 4.2.6.2 Filtering the second and subsequent rows The second and following rows made use information obtained from the prior row. For the first row, the results from quicksort were therefore saved to memory. This included the initially sorted window array of size r2 as well as the smaller sorted arrays of size r. At the same time as the filtered values for the second row were computed, the same information was stored to be used for row three and so on. The filtering of the second and subsequent rows used the same general principle 28 4. Methods 521 93 8 (1,1) (0,2) (2,1) (0,1) (2,2) (1,2) 6 98 (0,3) (1,3) (2,3) 0 52 51 90 3 8 (0,0) (2,0) (1,1) (0,2) (2,1) (0,1) (1,0) (2,2) (1,2) 6 98 (0,3) (1,3) (2,3) 0 0 (0,0) (2,0) 5 (1,0) Figure 4.4: Illustration of the merging. A1 (blue) is merged with B1 (yellow) into C1 (red). 29 4. Methods 0 5 2 8 3 1 5 1 9 6 2 3 0 3 8 9 8 1 3 5 9 7 6 0 4 0 3 2 7 5 6 7 1 0 2 4 3 6 8 3 5 7 8 6 3 5 7 6 4 3 3 4 Figure 4.5: The filtering process for all pixels not in the first row or column (in the filtered image). Here the previous filtering window (blue) has a sorted array A2. There is also a sorted array B1 saved from before (green). There is also one element (yellow) not in A2 or B1 that is included in the current window. as for row one with a few key differences. The first difference was that there was no need to initially sort m ×m values for each row. Instead, the first column was processed in exactly the same way as the first row, so only m elements are sorted and then merged with the sorted window from the previous row. For the rest of the columns the quicksort was replaced with a merge. An illustration can be seen in Figure 4.5. This time, there is a new sorted array A2 and a sorted array saved from the previous row B1. The desired sorted array for the window C2 requires a merge between sorted arrays A2 and B2, just as for row one. Instead of sorting B2 starting from scratch it is noticed that B1 and B2 only differs by one element. Therefore B1 can with the same merging technique as before merged with the new element to create B2. 4.2.7 Target evaluation The final part of the change detection was not implemented by Hellsten but was added to fully automate the process and to be precise, make two decisions. First there had to be a decision on when to stop the change detection i.e. do no more iterations. Then there also had to be a decision on which of the target nominees that should be considered targets and given as the output. For the first decision a user was given two options. Change detection could either be set to run a fixed number of iterations kmax or it could be set to try and stop automatically. The automatic stop made use of the observation that targets within an image were found to end up as the first target nominees. Looking at the properties of the estimation to p(aU |xC , aR), when targets are removed the estimation changes since a target is different from the 30 4. Methods clutter. Removing only clutter however, does not change the estimation. Thus, if probabilities for all nominees (which are affected by the estimation of p(aU |xC , aR)) had seem to converged and the next found nominee had a low probability, it could be assumed that all targets had been found. The automatic stop was implemented by taking two parameters ∆p and k∆p as input. A target nominee can be said to have converged if for k∆p iterations the probability of it being a target have not increased by more than ∆p, where k∆p started counting from the last time time the probability had an increase by more than ∆p. If all target nominees that had been around for k∆p iterations had converged and the rest had a probability lower than ∆p, the algorithm would stop. Since the number of assumed targets affected the probability it was for this decision always assumed k = 1 targets for the first target nominee (with the highest probability), k = 2 for the second and so on. Before the change detection was finished the target evaluation would also decide which of the final nominees that were to be considered targets. This was done by comparing the probabilities with a threshold pt where targets have a probability greater than pt. From equation (2.3) it was noticed that the number of assumed targets k affects the probability. Therefore, if the number of nominees decreased after the threshold check, the probability was computed again with the new k. Since the probabilities changed, there had to be another threshold check and this was repeated until the number of targets did not decrease anymore. Thus, the value for k that had been used for the probabilities was equal to the number of outputted targets, while keeping all probabilities greater than pt. 4.3 Parallelization When parallelizing the change detection algorithm, there were two approaches that were considered. The first was to try and look into how to parallelize the different components of the change detection. The second was to keep the algorithm sequen- tial but have several change detections running at the same time. The latter was chosen due to the nature of the algorithm. SAR images are often taken over a large area around 6 km2. The scenery may differ a lot in different parts of the image causing the clutter distribution to also be different. Therefore it was natural to split a SAR image into smaller parts and doing change detection on each part separately. These smaller part will for future reference be referred to as sub images. Having different sub images processed in parallel is from the above statement appro- priate. It is also much easier than having to go through and parallelize all compo- nents of the change detection. Also, if there were parts of the change detection that could not have been made parallel with the former approach, causing limitations given by Amdahl’s law, it did not matter with the latter. A big disadvantage with the chosen approach was however that the processor cores stops working at different 31 4. Methods Worker Master Worker Worker Buffer Figure 4.6: The master-worker model. A master thread puts items in a buffer and the worker threads removes them and do some work associated with them. For change detection, the master divides an image into several sub-images and the workers do the change detection on the sub-images. times. For example if there are 16 sub-images to do change detection on but only twelve cores, each core would work on one sub-image leaving behind four. Then, if the first twelve are finished in an equal amount of there are only four sub-images left leaving eight of the cores to do nothing. These cores could however do something else which why a pipeline structure was implemented, allowing the other cores to work on something else such as the creation of SAR images. 4.3.1 The master-worker model The change detection computations were split between the different processor cores using something similar to a so called master-worker, master-slave or boss-worker model [17]. The model consists of one master thread and one or usually more worker threads. As different threads they could do computations concurrently and by creating a number of threads at least equal to the number of cores, they were able to run in parallel to take advantage of all cores. The job of the master thread was to simply to distribute the sub images to the workers. Since the change detection had to handle a constant stream of data, a buffer system was implemented were the master thread put work in a buffer. The workers would pick an item from the buffer which contained data (a void pointer) and instructions on how to process the data (reference to a function with a void pointer as input parameter). After that, the workers would do the work by processing the data (executing the function passed) and then proceed to pick another item from the buffer. The master-worker model is illustrated in Figure 4.6 and was implemented as a class using pthreads and was used to parallelize change detection, see Section 4.3.2. 32 4. Methods Since both the master thread and the worker threads wanted to access the buffer at some point it was possible for race conditions to occur which had to be prevented. Therefore, mutexes were used to allow only one thread to modify the buffer at a time. Condition variables were also used to put worker threads to sleep while the buffer was empty or to put the master thread to sleep if the buffer was full. In pseudocode the master thread would access the buffer according to Algorithm 1. The workers would access the buffer according to Algorithm 2. Algorithm 1 Access the buffer with the master thread. lock(m) . m is a mutex while buffer is full do wait(c, m) . c is a condition variable end while Write to buffer if buffer has exactly one item in it then broadcast(d) . d is a condition variable end if unlock(m) Algorithm 2 Access the buffer with a worker thread. lock(m) . m is a mutex while buffer is empty do wait(d, m) . d is a condition variable end while Read from buffer if buffer has exactly one empty space then signal(c) . c is a condition variable end if unlock(m) Algorithm 1 is explained as follows. When the master thread wants to put something in the buffer it starts by locking the mutex. If the mutex is already locked the thread has to wait and will be put to sleep until the mutex is unlocked. If the buffer is full, the thread has to wait for one or more items to be removed. It will therefore unlock the mutex and wait for a signal from one of the worker threads. When the buffer has space the master thread will lock the mutex again and can finally put an item in the buffer. If an added item is the only one in the buffer there is a possibility of there being worker threads waiting for items similarly how the master thread will wait when the buffer is full. The master will therefore broadcast to signal all worker threads sleeping. After finishing the broadcast, or if there are more than one item in the buffer, the mutex will be unlocked. 33 4. Methods If broadcasting is not supported, a similar behaviour can be achieved by signaling one sleeping worker thread every time an item is added to the buffer or by introducing an additional mutex. The worker threads goes through a loop in which they first access the buffer and afterwards proceeds to process the data. The way a worker would access the buffer is very similar to that of the master thread. 4.3.2 Parallel filters The master-worker class was utilized in another implemented class. This filter class was made with the purpose of receiving SAR-images and performing change detec- tion on them. It was however generalized so that any signal processing step could use its thread and buffer structure and was implemented as an abstract class where different filters such as change detection are classes derived from it. Multiple of these filters can create a pipeline structure [17] where every step in the pipeline is a filter with its own set of workers. The implemented filter structure made it possible to run separate filter steps at the same time so that a new image could start being processed at the same time as change detection was performed on the previous image. It also made it possible for several change detections on different images to run at the same time by keeping track of multiple images. Two new threads were introduced, the producer thread and the consumer thread, where the producer can be seen as the master thread. The producer was given the job to remove data from an input buffer and distribute work to the worker threads. Once all work associated to that data has been finished the consumer generates the output and place it in the input buffer to another filter. This creates a chain of filters which order can be modified for the users need and is suitable for the Carabas signal processing chain. An illustration of the filter structure is seen in Figure 4.7. Synchronization primitives were also used here in a similar manner to prevent two threads from accessing shared resources at the same time and to put threads to sleep when buffers are full or empty. The reason the master thread was replaced by two different threads was that it was easy to give flexibility to the program and prevent deadlocks. For instance, the consumer in one filter would sometimes have to wait and be put to sleep when outputting the result. With a producer thread the rest of the filter could however continue to receive and process data. 4.3.2.1 Parallel change detections The parallelization of change detection was achieved with a class derived from the filter class. This CD-class has instructions specific for change detection but could be modified for other types of computations. A full filtering process starts with an item being added in the input buffer. For change detection, this is a structure that 34 4. Methods Worker Worker Worker Producer Consumer Input buffer Worker buffer Work data Figure 4.7: The structure of a filter which was implemented to be used for change detection and other types of signal-/image processing. contains • A newer SAR-image (update) • An older SAR-image (reference) • Change detection parameters from table 4.2 • Parameters defining a grid of sub images • Parameters on artificial generation of targets • An array of found targets (empty before change detection) • An array of artificially generated targets (empty before change detection) where generation of targets are discussed later in Section 4.4.2. The item is picked up by the producer, which in turn puts it in a work array for data that is being processed. It then distributes work to the worker threads in a way specified by the derived class. For change detection the producer acts like a master thread from Section 4.3.1 and adds items to the worker buffer equal to the amount of sub images defined. It also creates a list were found targets are stored which with the two images, definition of sub area and CD-parameters are passed as data to the worker buffer. The producer then proceeds by going in a loop and starts by looking at the input buffer again. The worker function used by the workers is also defined by the derived class. In the case of the CD-class, it first performs change detection and then write the result to the list. When all sub images are finished, the consumer thread takes the item 35 4. Methods in the work area and puts it in an output buffer, which can be the input buffer to another filter. The SAR images are not changed in the change detection case, so the consumer has an additional task to add all found targets to an array in the item. For the consumer to know when a set of data has been fully processed a counter associated to that data set is used. The counter starts at zero and after the producer is finished with adding items to the worker buffer it increments the counter by an amount equal to the number of items it added to the buffer. When a worker is finished with processing its subset, the worker decrements the counter by one. When a thread changes the counter it also checks whether the counter is zero afterwards. If it is, all workers and the producer must be finished and the consumer that is most likely sleeping is signaled. More than one set of data can be processed at a time (such as change detection on two different image pairs). Hence, two list, ”ready” and ”done”, containing indices are used to keep track of the different data sets. The producer checks the first item in the ready list, removes it and adds the data to the corresponding place in the work data array. When the data is processed and the consumer is signaled, the index to that data in the work data array also gets added to the end of the done list. The consumer keeps removing indices that are in the first place in the done list and adds them to the end of the ready list. Thus, the ready list contains the empty places in the work data array and the done list contains the data that has finished processing. 4.4 Verification and testing of the implemented C++ program To verify that the change detection method had been implemented correctly, it was compared with the existing MATLAB implementation. Using the same image pair as input, the outputs were compared with each other. This was done with three separate image pairs. Certain individual parts of the change detection method were also tested with extra data. For instance, the median filter was tested with some test matrices to ensure that the output was correct. The performance and performance gain due to parallelization was also observed by measuring the execution time. 4.4.1 Time measurements The execution time was measured by looking at the wall time at start and end and taking the difference. A downside is that if there are other processes they may delay the one being measured. Since other processes did not consume a lot of resources during the measurements wall time was a valid choice. Measurements in C++ were made using the sys/time.h library (UNIX) and the function gettimeofday(). When 36 4. Methods measuring time in MATLAB, the tic/toc stopwatch was used. 4.4.2 Artificial targets To aid future testing of the algorithm, a way to create artificial targets was added. Creating new SAR-images that also have targets in them are expensive and time consuming. For instanse, to see if the algorithm has a low enough false alarm rate, these kind of artificially created images might be required. Making use of the filter class, a new derived class was made that adds a template target to the updated image. The following parameters are considered and can be set or randomized with a homogeneous distribution: • Number of targets • Amplitude of targets • Position of targets An array of added targets is also saved to the output. 37 4. Methods 38 5 Results The C++ realization of the change detection algorithm was in the end very success- ful. It produced exactly the same targets as with MATLAB and the execution time was much faster than what was expected from the beginning. This chapter presents a more detailed description of the different results. 5.1 Evaluation and comparisons The output from the C++ program that is of interest for comparison is a list of tar- gets found by the change detection algorithm. Each target have a position, which is the coordinate in the image, and a probability p(xT |aU , aR). Because of several dif- ferences with the MATLAB scripts the outputs are not expected to have exactly the same numerical values. Firstly, the MATLAB scripts included some extra process- ing parts that slightly altered the input data. Secondly, as previously discussed, the change detection function was implemented slightly different. Nevertheless, the two different programs are expected to have the same behaviour and the C++ program was, for the comparisons set, to mimic the behaviour of the scripts. 5.1.1 Comparison with MATLAB scripts A part of an image pair that was used is seen in Figure 5.1. Here there are two real targets about 10 m from each other. The C++ program was able find and locate these as two separate targets with probabilities of being targets greater than 99%. Also, no other target nominees with high probabilities were found within that area. This matches very well with what also was observed using the MATLAB scripts. The images were of size 36 Mpix but the change detection was done on smaller areas at around 1-2 Mpix. With the time consuming MATLAB scripts, only the area around the known targets was studied. The much faster C++ program could however cover the whole image pair. What should be noticed is that large parts of the two images were corrupted as can be seen from a full 36 Mpix image in Figure 5.2. This is due to the so called 39 5. Results (a) Reference image. (b) Update image. Figure 5.1: Illustration of two targets appearing in the reference image. They can be seen as two small dots in the center of the white circle in the update image. The circles are not a part of the original SAR images. nadir echo reflected from the ground below the flight path. Normally this is avoided by not taking images directly below the aircraft. In the image, everything right of the nadir track is just a mirror image of what is left of it. For a run with change detection on a grid with 36 squares of size 1 Mpix, 13 pre- sumably false targets were detected. A first group of of targets were found to the right of the flight path where there is no useful data. There were also targets found in the bottom left corner. Due to the nadir echo the two images differs to much creating unreliable clutter statistics. From Figure 5.3 this is apparent from what can be seen as diagonal lines and brighter spots in the images. Lastly, targets were also found around the dark areas in the bottom of the image pair. Here there is not enough data collected to accurately depict the area. For an area to have high resolution it has to be seen from a large range of angles looking from the aircraft. The flight path for both images in this pair was however rather short, resulting in worse resolution and image quality. The example above shows that with unreliable data the method produced false targets. The intended use for change detection is nevertheless for streaming data with full resolution and without artifacts caused by the nadir echo. The method however worked as intended in the non-corrupted areas where the two real targets were found and no additional ones. The targets placements in the sub images did also not seem to matter. In fact, if a target was located on the border between two sub images it was detected twice, once in each sub image. An additional method might thus be required to prevent double detection of the same target. For a second pair of images containing one target, the C++ program was able to find the target just like the MATLAB scripts. 40 5. Results Figure 5.2: The full 36 Mpix reference image containing two real targets. (a) Reference image. (b) Update image. Figure 5.3: Data that is corrupted which can be seen as diagonal lines within the images. Due to these lines differing, targets were detected despite there being no real deployed targets. 41 5. Results 5.1.2 Detection of false targets For testing purposes, one additional image pair from which the reference image is seen in Figure 5.4, was put through the change detection algorithm. Here there is as before, a nadir echo on the right hand side of both images. Therefore, the right side was neglected and only the left hand side was studied. No targets had been deployed between the images so in the best case scenario no targets should be detected. One object however caused the algorithm to detect targets. This object is seen in Figure 5.5 and is strongly reflecting. Comparing the SAR-image with an aerial photograph the reflection is likely due to a building with a metal roof. Some variations between the two images are (even if they should not be) detected as targets. As these variations are not typical for the sub image, p(aU |xC , aR) becomes low and thus η(x) high unless p(aU |xT , aR) is also low. The targets mentioned above may also be due to how p(aU |xT , aR) depends on the way targets are modeled using pT (a). In the current setup, the upper and lower bounds for amplitudes, amax and amin, are set to values between 0 and 1. The images are also scaled between 0 and 1 where 1 is the value for the pixel with the highest amplitude within the two images. With something highly reflective and with a large amplitude, amax and amin, which are chosen relative to this point instead of the forest around it, may become incorrect. Thus variations that are of greater amplitude than intended are seen as targets. 5.2 Execution time and parameters The main goal of the thesis was to make to make the change detection run automat- ically in real time. The real time limit was measured by comparing the execution time of the C++ change detection implementation to the creation of SAR images with Carabas. The creation of a 25 Mpix SAR image with the current system takes about 40 seconds, which sets a requirement on the maximum execution time change detection can have to still keep up with a constant stream of images. The execution time was found to vary between different data, settings and runs. For a 25 Mpix image with parallelization the time can be expected to be around 5 seconds. For a single sub image of 1 Mpix using typical parameters (see Section 5.2.4), one iteration takes around 0.43 s and the computations before the loop around 0.16 s. Since the goal of 40 seconds was achieved rather easily there was not as much time spent on trying to optimize it further. Instead more focus was put on parallelization for a potential integration of other SAR signal processing steps. The time aspect was nevertheless a big part of the project and below follows some observations regarding execution time. 42 5. Results Figure 5.4: The full 16 Mpix reference image where a false targets were detected. (a) Reference image. (b) Update image. Figure 5.5: False targets were detected by the bright spot in the middle of the circle. No big difference is seen between the two images. The circles are not a part of the original SAR images. 43 5. Results 5.2.1 Automatic stop One of the most potential aspects in terms of saving execution time is the automatic stop in the target evaluation. This functionallity makes it possible, without changing any parameters, to find several targets in one sub image while still not using that many of iterations for all sub images. The time saved will be highly dependent on the data and parameters. A test was carried out with the image pair in Figure 5.1, where kmax = 3 iterations are required to find both targets. Using the parameters ∆p = 0.2 and k∆p = 2 both targets were found and if the sub image was moved so that the targets were outside of it, the algorithm stopped after two iterations. Since targets are not expected most of the time, this potentially saves 33% of the computation time by not doing unnecessary computations. The percentage could potentially be even bigger if there is a desire to find more targets. Suitable values for ∆p and k∆p would therefore have to be found that saves as much time as possible while not missing any targets. 5.2.2 Type casting The most time consuming parts of the MATLAB script is when η(x) is computed for every pixel in the image by interpolating from a grid. The grid point used for a pixel is obtained by comparing the image amplitudes for that pixel with the the grid points using the < and > operators. During the interpolation, each pixel’s reference and update amplitude is effectively compared twice with each amplitude in the grid. So for 1 Mpix images and a 100×100 grid there is a total of 400 million comparisons (or 400 per pixel) which are also followed by some extra computations. In the C++ implementation the same was achieved with type casts instead. The image amplitudes which are floating point values, are type casted to integer values. These integer values are used as indices to obtain the grid points. A test of the com- putation time for type casting and a more optimized comparison based interpolation was made. Instead of always comparing with all grid points, the comparison based method compared with them one by one until the correct interval was found. So the image amplitudes are checked whether they are smaller than 0.01, then if they are smaller than 0.02 and so on until the correct intervals are found. Random ampli- tudes were generated assuming a logarithmic distribution from equation 4.6. This averaged out to about 26 comparisons per pixel for the comparison based method and took for 1 Mpix 0.092 seconds. Type casting did the same in 0.014 seconds with two type casts per pixel. If the computation time scales linearly with the number of comparisons a direct translation from the MATLAB scripts would have taken 1.42 seconds. In that case 99% of the computation time was cut of by using type casting. 44 5. Results Figure 5.6: Computation time in seconds of median filtering using different window sizes m. 5.2.3 Median filter In the final C++ implementation, the median filter uses around 90% of the com- putation time for an iteration in the change detection loop. The computation time of the median filter using different target or window sizes m on a 25 Mpix image is seen in Table 5.1 and in Figure 5.6. The new filter is compared with the MATLAB’s medfilt2 as well as an implementation of the quickselect algorithm [18], which is a fast method to determine the median for a variable amount of elements [10]. Looking at the results, the way of retaining information from one window to the next is faster than computing the median from scratch with quickselect. Comparing with medfilt2 which also uses some kind of technique to retain information from one window to the next, the new method is slower for the smaller m but catches up whenm grows larger. The size of a targetm was typically 5 (which equals 5×5 = 25 pixels). It was however desirable to be able to change to a larger target size. Since of the median filter consumes most of the computations, for faster change detections, faster median filtering is also required. For smaller m there are median filters faster than quickselect [10]. The downside being that they only work for one target size. A similar approach for sorting in conjunction with the new median filter could potentially speed up the process for certain values ofm. It could also be tested 45 5. Results m = 3 m = 5 m = 7 m = 11 m = 15 MATLAB’s medfilt2 3.66 10.4 19.7 48.0 87.8 Quickselect 6.68 18.3 34.2 80.7 148 New method 5.35 10.6 17.1 34.5 57.1 Table 5.1: Computation time in seconds of median filtering using different window sizes m on a 5000× 5000 matrix containing double precision floating point values. how necessary the median filtering is and if the window size has to be the same as the target size. If the median filtering is reduced but it does not affect how well targets are found, then the execution time will be faster. Finally, a faster execution time might not be the first priority. The current SAR signal processing, which is now slower than the change detection, could instead be investigated for potential improvements. 5.2.4 Parameters What causes different execution time for different data sets is primarily the target evaluation. For instance, the change detection of the image pair where one image is seen in Figure 5.2, resulted in several target detections. Since there are multiple high probability targets (true and false ones) it takes a longer time for all target nominees to converge which makes the algorithm run for more iterations. In the end, the execution time is longer than if there are no targets. Changing some of the other change detection parameters also did also impact the execution time. The parameters that were found to affect computation time are the following: • Size of a change detection sub image. This is the part of a SAR image used for change detection. A larger sub image includes more pixels and will thus require more computations. An optimal size should not have anything to do with the execution times but rather the clutter statistics. The sub image should have enough data represent the clutter while still accurately represent every part of the image and not be an over-generalization. Here sizes of around 1-2 Mpix had been used for the successful change detections. • Number of sub images from one (or multiple) SAR images. Since different sub images uses different processor cores, which there are twelve of, there can be up to twelve sub images without impacting the computation time that much. The sizes and number of sub images goes hand in hand. If change detection is to be carried out on a full SAR image, increasing one results in a decrease of the other, thus in the end not impacting the execution time. • Parameters deciding the number of iterations: kmax, ∆p and k∆p. Se Section 5.2.1. 46 5. Results • Size of the interpolation grid. Controlled with the parameters ∆aR and ∆Da. Making this grid contain more points could slow down the program. The number of bilinear interpolations and differentiations are in that case increased. A higher number of grid points than the used 100× 100 = 10000 have not yet shown to yield any advantages in change detection capabilities. An increased execution time will likely also be the case if the number of histogram bins are increased to be much larger. This is however not practical because there would not be enough data points to fill all bins. • Size of a target for median filtering. So far, the filtering window has been the size of a target. A bigger target would mean more numbers to sort for each median and thus more computations. An instance where a larger target would be used is if the images had a higher resolution. This is the case if the higher frequency band in Carabas is used. Because of the median filter represents around 90% of the computations, changing the target size will severely impact the computation time. • Choice of interpolation methods. There is currently no way of changing in- terpolation method within the C++ program without editing the code. If however the interpolation method is changed it should be noted that a com- putationally heavy interpolation method will impact the performance. 5.3 Parallelization with OpenMP and pthreads With parallelization, it was possible to utilize the hardware fully. The API pthreads was used for the final implementation but a master-worker model was utilized with OpenMP as well. A comparison between the two is seen in Table 5.2. The speedup and efficiency is however dependant on the input images, more precisely the number of sub images and the difference in execution time between the sub images. For example if there is only one sub image, only one core will be used and the execution time will not be any faster. This test was done with 36 sub images from Figure 5.2 where there is one sub image for each processor core. This could be seen as a best case scenario where all cores do the same amount of work. However, the two targets, the additional false targets and other differences caused the sub images to take different amount of time still causing some of the cores to finish before others. While this is only one example and do not tell how well the parallelization works in general or on average, it still works as a comparison between pthreads and OpenMP. As can be seen from the table, pthreads and OpenMP performs almost identically. The OpenMP solution was easier to implement as it only required to add an extra line in front of the for-loop that was already used for the sequential test. Pthreads on the other hand required more commitment but allowed for more flexibility which ended in the filter class. 47 5. Results Method Execution time (s) Speedup Efficiency Sequential 45.6 1 0.0833 OpenMP 4.73 9.64 0.803 Pthreads 4.72 9.66 0.805 Table 5.2: Execution time and speedup for change detection on one SAR-image divided into 36 sub-images. Method Execution time (s) Speedup Efficiency Sequential 57.3 1 0.0833 Pthreads 5.01 11.4 0.953 Table 5.3: Average execution time and speedup from streaming SAR-images. 5.3.1 Streaming data Results for how the final filter class performs with streaming data is seen in Table 5.3. The change detection was done 100 times in a row so that when there are no sub images left in the first image, the leftover cores immediately start working on the next image. This mimics how SAR images would be created in a real-time environment. The efficiency achieved at 95% is very close to the theoretical limit at 100% and not much more could be asked for. Images were added as soon as the input buffer had empty spaces but it could also work in a setup were new images are generated and added over time. In practice, if a new SAR image is created every 40 seconds and change detection takes around 5 seconds, it is unlikely to be more than one image processed in change detection at once. However it is possible to do change detection more than once on using same image but with different reference images or parameters. With the high efficiency there is also a potential in including existing signal process- ing that is not as efficient as this. If multiple C++ filters like the one implemented for change detection are added together, the same efficiency is not guaranteed. There were a total of 15 threads used here. One main thread creating the others and adding images to the input buffer and 14 tied to the filter where 12 are worker threads. As the workers are responsible for the majority of