Use of deep neural networks for classification of micro-Doppler signature from radar data

Master’s thesis in MPCOM, MPSYS

Daoyuan Yang, 940405-5791
Liting Zhou, 940622-2829

Department of Electrical Engineering
CHALMERS UNIVERSITY OF TECHNOLOGY
Gothenburg, Sweden 2018

© Daoyuan Yang, Liting Zhou 2018.

Supervisor: Kasra Haghighi, UniqueSec AB
Examiner: Thomas Rylander, Chalmers University of Technology

Master’s Thesis 2018
Department of Electrical Engineering
Chalmers University of Technology
SE-412 96 Gothenburg
Telephone +46 723750147

Abstract

This thesis explores the usage of deep neural networks for the classification of micro-Doppler signatures collected by means of radar. First, we conduct the simulation of a micro-Doppler model based on a Frequency-Modulated Continuous-Wave (FMCW) radar in combination with a point target, a bike wheel and a walking human, where the model is also validated with measurements. Second, we explore the use of multiple deep-learning algorithms for micro-Doppler signature classification. A training set of 12 classes of human activities is measured and it is used to train Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN). Although the overall test-accuracy is around 90%, the neural networks tend to mislabel some classes that have similar probabilistic distribution.
Keywords: FMCW radar, micro-Doppler, CNN, RNN, classification, deep learning

Acknowledgements

We would like to thank Kasra Haghighi from UniqueSec AB, the thesis supervisor, for providing us with the opportunity to work on the project and for his continuous support during the past half year. We also thank the examiner, Professor Thomas Rylander at Chalmers University of Technology, for helping us with the thesis writing and for the permission to use the computing cluster.

Contents

List of Figures
List of Tables
1 Introduction
1.1 Background
1.2 Problem formulation
1.3 Contribution
1.4 Outline
2 Theory
2.1 Modeling micro-Doppler signature for FMCW radar
2.1.1 Stationary target
2.1.2 Moving target with approximately constant velocity
2.1.3 Moving target with rapidly changing velocity
2.2 Micro-Doppler model of complex non-rigid bodies
2.3 Classification of micro-Doppler signature by machine learning
2.3.1 Traditional machine learning (ML) algorithms
2.3.2 Feature extraction algorithms
2.3.2.1 Principal component analysis (PCA)
2.3.2.2 Linear discriminant analysis (LDA)
2.3.3 Classifiers
2.3.3.1 K-nearest neighbors (k-NN)
2.3.3.2 Support vector machine (SVM)
2.3.4 Deep learning algorithms
2.3.4.1 Logistic regression
2.3.4.2 Feedforward neural networks
2.3.4.3 Convolutional neural networks
2.3.4.4 Recurrent neural networks
2.3.4.5 Optimization and regularization of neural networks
3 Hardware Description
3.1 FMCW radar evaluation kit introduction
3.1.1 Hardware connection
3.1.2 Antenna
3.1.3 Transceiver
3.1.4 Power and controller board
3.2 Hardware limitations
4 Modelling and Simulation
4.1 Introduction
4.2 Simulation and measurement settings
4.3 Point target
4.3.1 Simulation
4.4 Bicycle wheel
4.4.1 Simulation
4.4.2 Experimental measurement
4.5 Walking human
4.5.1 Simulation
4.5.2 Experimental measurement
5 Classification With Neural Networks
5.1 Data measurement campaign
5.2 Data preprocessing
5.3 Neural networks structure and classification results
5.3.1 CNN based models
5.3.1.1 Neural network structure
5.3.1.2 Performance
5.3.1.3 Error analysis
5.3.1.4 The influence of DC component
5.3.2 RNN based neural networks
5.3.2.1 Performance
5.3.2.2 Error analysis
5.4 Predict only the activity (4-class problem)
5.5 Tuning parameters of neural networks
6 Discussion
6.1 Limitations
6.1.1 Overfitting
6.1.2 Not enough data
6.1.3 Changes of the environment
6.2 Future work
6.2.1 Changing radar setup
6.2.2 Using practical activities
6.2.3 Exploring other algorithms
6.2.4 Using simulation data to train
7 Conclusion
Bibliography
A Appendix 1 Spectrogram examples
B Appendix 2 Neural Network model summary
B.1 Software used in this thesis
B.2 Model summary of Double input CNN
B.3 Model summary of RNN based neural networks

List of Figures

2.1 A plot of two successive sawtooth radar chirps’ frequency.
2.2 The block diagram of an FMCW radar.
2.3 A long chirp can be divided into multiple successive short chirps to meet the constant velocity assumption.
2.4 A feedforward neural network with one hidden layer.
2.5 An example of a convolutional layer and a pooling layer.
2.6 The computational graph of a simple RNN with one hidden layer and a feedback connection of the hidden layer to itself (biases not shown).
3.1 The radar kit includes three parts: a transceiver board (RS2400K), a power and controller board (CO1000A), and a horn antenna with a gain of 20 dB.
3.2 Noise power spectral density (PSD) measurement of the radar kit.
3.3 The actual signal transmission model of the radar kit. Two sawtooth chirps (approximated by a staircase waveform) are plotted. The dashed line in each chirp represents the omitted steps, and a one-second gap lies between the two chirps due to serial communication.
4.1 Point target moves straight towards and away from the radar.
4.2 Point target oscillates tangentially to the radar.
4.3 The positions of the radar and the bike wheel in two situations. Due to the axis settings, the size of the wheel looks different in both figures.
4.4 Spectrograms of the wheel when the number of spokes is (a) one, (b) two, or (c) eight.
4.5 Spectrogram of the wheel when the radar is placed at the z-axis. The wheel has one spoke.
4.6 Bike wheel measurement environment and spectrograms.
4.7 The joints of the human model.
4.8 The dimension of human parts.
4.9 Human ellipsoid model.
4.10 Trajectories of human joints during a gait cycle.
4.11 The spectrogram generated by four body parts in high resolution FMCW radar.
4.12 The spectrogram generated by four body parts in low resolution FMCW radar.
4.13 The spectrogram of a walking human with high resolution radar settings S1.
4.14 The spectrogram of a human walking with low resolution radar settings S2.
4.15 Micro-Doppler measurement for human walking towards the radar with swinging arms.
4.16 Micro-Doppler measurement for human walking towards the radar without swinging arms.
5.1 Radar setup model for data measurement campaign.
5.2 Radar kit assembly with a 3D printed box.
5.3 Data measurement campaign environment.
5.4 The structure of the small CNN unit.
5.5 The structure of the Double input CNN which is built by two small CNN units.
5.6 The plot of training history of Double input CNN.
5.7 Confusion matrix of Double input CNN (12-class problem).
5.8 The example of a mislabeled sample.
5.9 RNN based neural networks’ structure. Input: spectrogram of window length 64 or 128; conv1d: 24 one dimensional convolutions with filter size 3; RNN unit: either GRU or LSTM unit with output size 24; avg: the forward output and backward output of bidirectional RNN units are averaged; Dense: two fully connected layers. Dashed lines represent recursive connection.
5.10 Confusion matrix of bidirectional GRU model.
5.11 Confusion matrix of bidirectional LSTM model.
5.12 Confusion matrix of Double input CNN (4-class problem).
5.13 Training history of an overfitting model based on Double input CNN when the data is divided into four different classes.
A.1 Spectrogram samples of the data set, calculated by STFT with window length of 64 and an overlap of 16. Each row represents one posture. The order of the postures is: walking with swinging arms (walk_wa), walking without swinging arms (walk_woa), boxing while standing still (boxing) and standing still (standing).
A.2 The spectrograms of the same samples as in figure A.1, but calculated by STFT with window length of 128 and an overlap of 32. Each row represents one posture. The order of the postures is: walking with swinging arms (walk_wa), walking without swinging arms (walk_woa), boxing while standing still (boxing) and standing still (standing).

List of Tables

3.1 RS3400K/00 24 GHz FMCW Transceiver Evaluation Kit Specification.
3.2 Command categories for radar controller.
4.1 Radar parameters for simulation and measurement.
5.1 Class description and the number of samples.
5.2 Performance comparison between small CNN unit and Double input CNN (12-class problem).
5.3 Some common mistakes made by Double input CNN.
5.4 RNN based neural networks performance.
5.5 Performance comparison between small CNN unit and Double input CNN (4-class problem).
5.6 Examples of the status of a machine learning model.
B.1 Software used in this thesis
1 Introduction

1.1 Background

Radars were originally used for range and speed measurement. Another important radar application is target classification, which has gained a lot of interest due to demand in areas such as security, autonomous driving and health care [1]. One important feature that enables the classification of a target is the micro-Doppler signature that the target generates [2]. The micro-Doppler phenomenon results from the individual motions of the target’s different parts, such as the arms and legs of a walking human. Thus, targets can be classified if they have different micro-Doppler signatures.

Traditional methods for micro-Doppler signature classification require significant domain knowledge and computation to devise discriminative features. Recently, deep learning algorithms such as deep Convolutional Neural Networks (CNN) and deep Recurrent Neural Networks (RNN) have attracted considerable research interest, since they can solve some complex problems without an explicit model. Therefore, the classification may be done without sophisticated processing and feature extraction of the signal itself, whereas classical machine learning algorithms would require pre-processing that can successfully separate the classes by, e.g., hyperplanes.

1.2 Problem formulation

The goal of the project is to classify human activities measured by a Frequency-Modulated Continuous-Wave (FMCW) radar. Five research questions are addressed in the thesis:

1. What are the working principles of FMCW radars?
2. What is the performance of a commercially available, low-priced FMCW radar?
3. How should the micro-Doppler signatures be modeled for such FMCW radars?
4. What preprocessing of the raw radar signal is needed to form a data set that can be used to train machine learning models such as neural networks?
5. For vulnerable road users, what classification performance can be achieved by different neural networks?

To answer these questions, the project has two main parts: (i) the modelling, simulation and experimental evaluation of the micro-Doppler signatures measured by an FMCW radar; and (ii) the classification of different human activities by CNN and RNN.

1.3 Contribution

The main contribution of this project is to explore the capability of deep learning algorithms to classify the micro-Doppler signatures produced by different human activities. Despite some limitations in range resolution and noise floor of the off-the-shelf radar hardware, we demonstrate that neural network models are able to classify micro-Doppler signatures even when the model size is kept small. These techniques can be implemented in low-cost embedded systems and, thus, contribute to applications in areas such as surveillance systems.

1.4 Outline

The rest of the thesis is structured as follows:

• Chapter 2 presents a theoretical model of FMCW radars for some different scenarios and related work in micro-Doppler signature classification research. A brief introduction to machine learning is also given in this chapter.
• Chapter 3 describes the detailed specification and usage of the FMCW radar developer kit, which is used for the rest of the project. This is followed by a discussion of some limitations of the radar hardware.
• Chapter 4 demonstrates the simulation of the FMCW radar with three kinds of moving targets and a comparison with measurements.
• Chapter 5 describes the data measurement campaign, the pre-processing and the classification results of some neural networks (both CNN based and RNN based). For each neural network, the network structure is shown, followed by the classification result and an error analysis.
• Chapter 6 discusses the limitations of the methods and some future work.
• Chapter 7 concludes the thesis.
2 Theory

2.1 Modeling micro-Doppler signature for FMCW radar

Micro-Doppler signatures can be observed by many kinds of radars, including pulse radars, Continuous-Wave (CW) radars and FMCW radars. For example, Chen [2] shows by means of both simulation and measurement that the micro-Doppler signatures of a helicopter and a human can be measured by a pulse radar. It is also possible to observe micro-Doppler signatures with a CW radar where the baseband signal is a sinusoidal wave [3]. However, pulse radars and CW radars have their drawbacks: pulse radars require a very accurate clock and high peak power, and CW radars are not able to estimate the range of the target. To avoid these drawbacks, this thesis focuses on the FMCW radar.

The FMCW radar is a special type of CW radar that modulates the frequency of the baseband signal with a waveform such as a sawtooth, a triangle or a sinusoid. A commonly used waveform is the sawtooth, for which the radar signal’s frequency sweeps linearly in time from $f_c$ (the carrier frequency) to $f_c + B$, where $B$ is the bandwidth. The sweep period is $t_p$ and the sweep slope is $k = B/t_p$. The transmitted signal with amplitude $A_t$ is given by:

\[ s_{tx}(t) = A_t \cos\left[ 2\pi \left( f_c t + \frac{k}{2} t^2 \right) \right], \quad (0 \le t \le t_p) \tag{2.1} \]

Figure 2.1: A plot of two successive sawtooth radar chirps’ frequency.

Figure 2.1 shows the frequency of two transmitted sawtooth waveforms; such a linear variation of the frequency is also referred to as a chirp. The block diagram shown in figure 2.2 illustrates how the transmitted signal is generated. The baseband signal is generated by a sawtooth signal generator and a Voltage-Controlled Oscillator (VCO). An ideal VCO generates a sinusoidal signal whose frequency is proportional to the input voltage. Then, the baseband signal is mixed with the carrier signal and divided into two paths by the power splitter. One path is fed to the antenna and the other one is fed to the receiver.
The power splitter requires high isolation between the two output ports to prevent the transmitted signal from entering the receiver directly.

Figure 2.2: The block diagram of an FMCW radar.

In the following discussion, an FMCW model in combination with a point target is discussed for three cases. Here, we assume that the radar is monostatic (transmitter and receiver are collocated) and that it has an isotropic antenna. The radar is stationary and located at $\mathbf{R}_r$ in a global coordinate system. The target is located at $\mathbf{R}_t(t)$ and moves with the velocity $\mathbf{V}_t(t)$. For this situation, we consider three cases:

• The target is stationary.
• The target moves at approximately constant velocity.
• The target moves at rapidly varying velocity.

2.1.1 Stationary target

First, we consider the case when the target is stationary and located at a distance $d = |\mathbf{R}_t(0) - \mathbf{R}_r|$ from the radar. The target is the only object illuminated by the radar. Because of the propagation time of the electromagnetic wave, the received signal is delayed by $\tau = 2d/c$ with respect to the transmitted signal, where $c$ is the wave propagation speed. With $A_r$ denoting the received amplitude determined by the radar equation [4], the received signal is expressed as:

\[ s_{rx}(t) = A_r \cdot s_{tx}\left( t - \frac{2d}{c} \right) \tag{2.2} \]

Figure 2.2 shows how the received signal is demodulated. The received signal $s_{rx}(t)$ is mixed separately with the transmitted signal and with a 90-degree shifted version of the transmitted signal. The mixed signals pass through low-pass filters, which yields an in-phase signal $r_I(t)$ and a quadrature-phase signal $r_Q(t)$ in baseband. A complex baseband signal $r(t)$ is constructed from $r_I(t)$ and $r_Q(t)$ as follows:

\[ r(t) = r_I(t) + j r_Q(t) = A e^{j\Phi(t)}, \qquad \Phi(t) = 2\pi k t \tau - \pi k \tau^2 + 2\pi f_c \tau \tag{2.3} \]

where $A$ is the amplitude of the baseband signal, which depends on the miscellaneous gains of the whole process.
The baseband signal has the frequency:

\[ f_b = \frac{1}{2\pi} \frac{\partial \Phi}{\partial t} = k\tau \tag{2.4} \]

where $f_b$ is called the range beat frequency. The distance between the radar and the object can be calculated by substituting $\tau = 2d/c$ into equation (2.4), which gives $d = c f_b / (2k)$.

2.1.2 Moving target with approximately constant velocity

Second, we assume that the target is moving. Its position, velocity and acceleration are denoted as $\mathbf{R}_t(t)$, $\mathbf{V}_t(t)$ and $\mathbf{A}_t(t)$ respectively. The velocity can be assumed to be approximately constant when the chirp duration is short enough. A short duration means that the target’s velocity does not change significantly during the chirp, i.e. $\int_t^{t+t_p} \mathbf{A}_t(\tau)\, d\tau \approx 0$. When the target moves at a constant velocity, the received signal has an additional frequency component induced by the radial speed; this component is referred to as the Doppler frequency. Denote the radial speed of the target as:

\[ v_r(t) := \mathbf{V}_t(t) \cdot \frac{\mathbf{R}_t(t) - \mathbf{R}_r}{|\mathbf{R}_t(t) - \mathbf{R}_r|} \tag{2.5} \]

Because of the constant velocity assumption, the radial speed is also assumed to be constant during one chirp, i.e. $v_r(t) \approx v_r(0)$ for $0 \le t \le t_p$. In order to measure the range and the speed from the received baseband signal, multiple successive chirps are transmitted. We analyze one chirp here; the analysis extends easily to the following chirps. Denote $d(t) := |\mathbf{R}_r - \mathbf{R}_t(t)|$ as the distance between the target and the radar, so that the initial distance is $d(0)$. Substituting $\tau(t) = 2(d(0) + v_r(0)t)/c$ into equation (2.3) gives, after some manipulation [5], the phase term (we still assume the same receiver structure as previously described):

\[ \Phi(t) = 2\pi \left[ \frac{2d(0)k}{c}\left( 1 - \frac{2v_r(0)}{c} \right) t + \frac{2v_r(0)}{c} f_c t + \frac{2k v_r(0)}{c}\left( 1 - \frac{v_r(0)}{c} \right) t^2 + \frac{2d(0)}{c}\left( f_c - \frac{k d(0)}{c} \right) \right], \quad (0 \le t \le t_p) \tag{2.6} \]

To further simplify equation (2.6), the range beat frequency ($f_b$) and the Doppler frequency ($f_d$) are defined as:

\[ f_b = \frac{2d(0)k}{c}, \qquad f_d = \frac{2v_r(0)}{c} f_c \tag{2.7} \]

In addition, a slowly moving object implies that $(1 - v_r(0)/c) \approx 1$ and $(1 - 2v_r(0)/c) \approx 1$, because $v_r(t)/c \approx 0$.
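As a quick numerical check of the definitions in equation (2.7), the sketch below evaluates the range beat frequency and the Doppler frequency for one hypothetical parameter set. Only the 24 GHz carrier follows the evaluation kit named in table 3.1; the bandwidth, sweep period, distance and radial speed are illustrative assumptions, and the function names are hypothetical helpers rather than part of the thesis software.

```python
C = 299_792_458.0  # wave propagation speed c in m/s (vacuum value)


def beat_frequency(distance_m, slope_hz_per_s):
    """Range beat frequency f_b = 2 d k / c from equation (2.7)."""
    return 2.0 * distance_m * slope_hz_per_s / C


def doppler_frequency(radial_speed_m_s, carrier_hz):
    """Doppler frequency f_d = 2 v_r f_c / c from equation (2.7)."""
    return 2.0 * radial_speed_m_s * carrier_hz / C


def range_from_beat(beat_hz, slope_hz_per_s):
    """Invert f_b = k * tau with tau = 2 d / c to recover the distance."""
    return beat_hz * C / (2.0 * slope_hz_per_s)


# Assumed, illustrative parameters (only f_c follows the 24 GHz kit).
f_c = 24e9            # carrier frequency in Hz
bandwidth = 500e6     # sweep bandwidth B in Hz
t_p = 1e-3            # sweep period in s
k = bandwidth / t_p   # sweep slope k = B / t_p
d0, v_r = 6.0, 1.5    # initial distance in m, radial speed in m/s

f_b = beat_frequency(d0, k)
f_d = doppler_frequency(v_r, f_c)
print(f"f_b = {f_b:.1f} Hz, f_d = {f_d:.1f} Hz")
print(f"recovered distance = {range_from_beat(f_b, k):.3f} m")
# The slow-target factor is indistinguishable from 1 for human motion.
print(f"1 - 2 v_r / c = {1.0 - 2.0 * v_r / C:.12f}")
```

For walking-speed targets the factor $(1 - 2v_r/c)$ differs from one by roughly $10^{-8}$, which is why it is dropped in the approximation that follows.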
By substitution of the range beat frequency and the Doppler frequency we get:

\[ \Phi(t) \approx 2\pi \left[ f_b t + f_d t + \frac{2k v_r(0)}{c} t^2 + \frac{2d(0)}{c}\left( f_c - \frac{k d(0)}{c} \right) \right], \quad (0 \le t \le t_p) \tag{2.8} \]

Equation (2.8) consists of four terms. From left to right they are a frequency term (range beat) which is proportional to range, another frequency term (Doppler shift) which is proportional to radial speed, a cross term that represents the range and Doppler coupling effect, and a constant phase term. A more detailed derivation can be found in [5].

Extending equation (2.8) to the multiple-chirp situation, the received baseband signal of each chirp contains information about the range and velocity of the target at the start of that chirp. Specifically, assume that $N$ chirps are transmitted and that the chirps are indexed by $i = 0, 1, \cdots, N-1$, so that $i = 0$ recovers equation (2.8). The phase of the received baseband signal corresponding to each chirp can then be expressed as:

\[ \Phi(t) \approx 2\pi \left[ f_{bi} t + f_{di} t + \frac{2k v_r(it_p)}{c} t^2 + \frac{2d(it_p)}{c}\left( f_c - \frac{k d(it_p)}{c} \right) \right], \quad (it_p \le t \le (i+1)t_p) \tag{2.9} \]

where $f_{bi}$ and $f_{di}$ are the beat frequency and the Doppler frequency of the $i$-th received baseband signal:

\[ f_{bi} = \frac{2d(it_p)k}{c}, \qquad f_{di} = \frac{2v_r(it_p)}{c} f_c \tag{2.10} \]

Some algorithms, such as the two-dimensional Fourier transform [6], use multiple chirps to compute the range and the radial velocity. In our micro-Doppler classification problem, the spectrogram produced by multiple successive chirps is also more desirable than that of a single chirp, because a longer measurement time lets the received signal reveal more information about the movement pattern of the target.

2.1.3 Moving target with rapidly changing velocity

In the last case, the target’s velocity changes significantly during the sweep time $t_p$ of the chirp, which is a result of non-zero acceleration. However, a long chirp can be divided (virtually) into multiple successive chirps which are short enough that the assumption of constant velocity is valid.
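To illustrate how the multiple-chirp model of section 2.1.2 yields both range and radial speed, the following sketch simulates the baseband phase of equation (2.3) with a time-varying delay for a single point target and applies a two-dimensional FFT, in the spirit of the approach cited as [6]. All radar parameters and the target state are assumptions chosen for illustration, not the measurement settings of this thesis, and the variable names are hypothetical.

```python
import numpy as np

C = 299_792_458.0                        # propagation speed in m/s

# Assumed, illustrative parameters (not the thesis measurement settings).
f_c, bandwidth, t_p = 24e9, 500e6, 1e-3
n_samples, n_chirps = 128, 64            # samples per chirp, chirps per frame
k = bandwidth / t_p                      # sweep slope
d0, v_r = 6.0, 1.0                       # initial distance (m), radial speed (m/s)

t = np.arange(n_samples) * (t_p / n_samples)   # fast time within one chirp
i = np.arange(n_chirps)[:, None]               # chirp index (slow time)
tau = 2.0 * (d0 + v_r * (i * t_p + t)) / C     # round-trip delay per sample

# Baseband phase of equation (2.3), evaluated with the time-varying delay.
phase = 2.0 * np.pi * (k * t * tau - 0.5 * k * tau**2 + f_c * tau)
baseband = np.exp(1j * phase)                  # shape (n_chirps, n_samples)

# 2-D FFT: axis 1 resolves the beat frequency, axis 0 the Doppler frequency.
rd_map = np.abs(np.fft.fft2(baseband))
row, col = np.unravel_index(np.argmax(rd_map), rd_map.shape)

beat_hz = np.fft.fftfreq(n_samples, d=t_p / n_samples)[col]
doppler_hz = np.fft.fftfreq(n_chirps, d=t_p)[row]

range_est = beat_hz * C / (2.0 * k)            # from f_b = 2 d k / c
speed_est = doppler_hz * C / (2.0 * f_c)       # from f_d = 2 v_r f_c / c
range_bin = C / (2.0 * bandwidth)              # size of one range cell
speed_bin = C / (2.0 * f_c * n_chirps * t_p)   # size of one speed cell

print(f"range estimate {range_est:.2f} m (cell {range_bin:.2f} m)")
print(f"speed estimate {speed_est:.2f} m/s (cell {speed_bin:.3f} m/s)")
```

The estimates are quantized to the FFT grid, so they agree with the true values only to within one range or speed cell. No window function is applied here; a real processing chain would typically add one to reduce leakage.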
In other words, if there exists a maximum chirp duration $t_{max}$ for which the constant velocity assumption holds but $t_p > t_{max}$, we can assume that $N$ successive chirps are transmitted such that $t_p/N \le t_{max}$. Each short chirp then sweeps from $f_c + iB/N$ to $f_c + (i+1)B/N$ during the time interval from $it_p/N$ to $(i+1)t_p/N$. In figure 2.3, a long chirp is divided into four shorter chirps and each lasts $t_p/4$. The range resolution of an FMCW radar [7] is:

\[ \delta d = \frac{c}{2B} \tag{2.11} \]

Because the long chirp is divided, each short chirp covers only the bandwidth $B/N$, although the total bandwidth is fixed. According to equation (2.11), the range resolution cell $\delta d$ therefore grows by a factor of $N$, i.e. the range resolution becomes coarser.

Figure 2.3: A long chirp can be divided into multiple successive short chirps to meet the constant velocity assumption.

2.2 Micro-Doppler model of complex non-rigid bodies

The models in the previous section apply to point targets. A point target is an idealized target that is useful in a mathematical model partly because we assume that it reflects the incident wave equally in all directions. Using a point target simplifies the radar model, but the objects of interest in practice have complex shapes that scatter the incident wave non-uniformly. Non-rigid bodies such as humans and animals can also change their shape during movement. The deformation of the body makes it difficult to simulate the scattered electromagnetic field. One approach to this problem is to approximate a non-rigid body by a collection of connected rigid bodies. In [8], the micro-Doppler signature of a walking human measured by a pulse radar is simulated. By approximating the limbs and the trunk of the human by ellipsoids of different sizes, the motion of the different parts of the human can be clearly identified from the spectrogram.
In the high-frequency limit, the backscattering of an ellipsoid is similar to that of a point scatterer in the sense that the backscattering is associated with a small area on the ellipsoid where the surface normal points towards the radar. This applies to metal ellipsoids in particular and, to a varying degree, to other materials that may approximate metal. However, the Radar Cross Section (RCS) of an ellipsoid depends on its size and its aspect angle to the radar. Here we analyzed the range and radial velocity contributions to the output signal of an FMCW radar for the point target. In chapter 4 we apply this model in combination with ellipsoids in order to simulate the micro-Doppler signature of a walking human.

2.3 Classification of micro-Doppler signature by machine learning

The problem of interest is to classify radar targets based on their micro-Doppler signatures. Related work in this area can be divided into two categories. In the first category, sophisticated pre-processing is first applied to the radar signal to extract certain features, and these features are then used to train a traditional classifier such as a support vector machine (SVM). For example, Kim and Ling [9] used an SVM on six features to classify seven activities of a human. The other category involves deep neural networks that usually require very little pre-processing. Kim et al. [10] recognized seven gestures with the aid of a convolutional neural network (CNN). CNNs have also succeeded in human detection and human activity classification [11]. In this section, an overview of the algorithms commonly used in related work is given.

2.3.1 Traditional machine learning (ML) algorithms

The workflow of traditional ML shown below is an iterative process. Steps four to six are usually iterated until the performance on both the training set and the test set is satisfactory.

1. Preprocess the raw data.
2. Divide the data into a training set and a test set.
3. Extract features from the data.
4. Select a classifier and a set of hyperparameters.
5. Train the classifier with the training set.
6. Validate the accuracy given the test set.

The first step is pre-processing the raw data recorded by the radar sensor. One way is to apply the short-time Fourier transform (STFT) to the raw data. The raw data can be further improved by clutter suppression algorithms such as notch filtering [12] and noise thresholding [9].

To achieve high accuracy, a good feature extraction algorithm must be selected, which is the third step. A "good" feature extraction algorithm significantly reduces the dimension of the input data while keeping the important features for classifiers to work with. It has several advantages over directly feeding the raw data into the classifier. First, the raw data expressed as a spectrogram usually has a dimension of a few thousand, which easily results in overfitting when the training dataset is small. Second, feature extraction saves computational resources by reducing the dimension of the data.

After feature extraction, the workflow enters an iterative process in which a classifier is selected and trained while the hyperparameters are tuned in order to achieve good performance. In the following sections, a few feature extraction algorithms and classifiers are briefly introduced.

2.3.2 Feature extraction algorithms

Some problems involve high dimensional data such as images. Feature extraction algorithms extract the necessary features from the high dimensional data and, thus, form a low dimensional representation of it. Feature extraction is necessary when the classifier suffers from the high dimension of the input data and the number of training samples is small. Features can be either handcrafted or learned by ML algorithms. For example, six physical features are designed and extracted from the spectrogram of human activities for classification in [9]. This is an example of handcrafted feature extraction.
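The STFT pre-processing step described in section 2.3.1 can be sketched as follows. The window length of 64 samples and the overlap of 16 samples follow the settings quoted in the caption of figure A.1; the Hann window, the sampling rate and the synthetic test signal are assumptions made for illustration, and `stft_spectrogram` is a hypothetical helper, not the routine used for the thesis data.

```python
import numpy as np


def stft_spectrogram(x, win_len=64, overlap=16):
    """Power spectrogram of a complex signal, shape (win_len, n_frames).

    Frequency bins are ordered from negative to positive (fftshift), so
    both approaching and receding Doppler components stay visible.
    """
    hop = win_len - overlap                       # step between frames
    n_frames = 1 + (len(x) - win_len) // hop
    window = np.hanning(win_len)                  # assumed window choice
    frames = np.stack([x[m * hop:m * hop + win_len] * window
                       for m in range(n_frames)])
    spectrum = np.fft.fftshift(np.fft.fft(frames, axis=1), axes=1)
    return np.abs(spectrum).T ** 2


# Synthetic micro-Doppler-like signal: the instantaneous frequency is
# 100 * cos(2 * pi * t) Hz, i.e. the phase is 100 * sin(2 * pi * t).
fs = 1000.0                                       # assumed sampling rate in Hz
t = np.arange(2048) / fs
signal = np.exp(1j * 100.0 * np.sin(2.0 * np.pi * t))

spec = stft_spectrogram(signal)
freqs = np.fft.fftshift(np.fft.fftfreq(64, d=1.0 / fs))
peak_track = freqs[np.argmax(spec, axis=0)]       # dominant frequency per frame
print(spec.shape, peak_track.min(), peak_track.max())
```

The dominant frequency per frame follows the sinusoidal Doppler variation, which is the kind of time-frequency pattern that the feature extraction algorithms and classifiers in this section operate on.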
Designing such features requires a lot of domain knowledge about the original data. In contrast, the principal component analysis (PCA) and the linear discriminant analysis (LDA) are two linear ML algorithms that do not require much understanding of the data. They are commonly used to solve problems such as image classification and handwriting recognition. Jingli et al. extend the PCA and the LDA to 2-dimensional spectrograms and use an SVM as the classifier to classify human activities [13]. The PCA and the LDA are described in this subsection, while the detailed derivation of the equations can be found in chapter 15 and chapter 16 of [14].

2.3.2.1 Principal component analysis (PCA)

PCA projects high dimensional vectors to a lower dimensional space. The projection to this lower dimensional space is determined by the dataset. The algorithm is unsupervised, i.e. no class label is needed. Suppose the dataset contains N data points $x_n$ ($1 \le n \le N$) and the data points are D-dimensional real vectors. The PCA projects $x_n$ to an M-dimensional vector $y_n$ by:

$$ y_n = B x_n + c_1 \quad \text{for } 1 \le n \le N \tag{2.12} $$

where $c_1$ is a constant bias vector. The projection matrix $B$, which has orthonormal rows ($B B^T = I$), and the bias vectors are selected to minimize the squared distance error between the reconstructed data point $\tilde{x}_n$ and the original data point:

$$ \tilde{x}_n = B^T y_n + c_2, \qquad L(B, Y, c_2) = \sum_{n=1}^{N} \sum_{j=1}^{D} \left( \tilde{x}_{n,j} - x_{n,j} \right)^2, \qquad Y := [y_1, \cdots, y_N] \tag{2.13} $$

where $L$ is the loss function to minimize and $c_2$ is another constant bias vector. It turns out that the projection matrix $B$ and the bias vectors $c_1$, $c_2$ are closely related to the mean and the covariance matrix of the dataset. The PCA can be conducted as follows:

1. Calculate the mean vector and the covariance matrix of the dataset:

$$ m = \frac{1}{N} \sum_{n=1}^{N} x_n, \qquad S = \frac{1}{N-1} \sum_{n=1}^{N} (x_n - m)(x_n - m)^T \tag{2.14} $$

2. Conduct an eigenvalue decomposition of $S$ and keep the eigenvectors $e_1, \cdots, e_M$ that belong to the M largest eigenvalues $\lambda_1, \cdots, \lambda_M$ (sorted in descending order with respect to the eigenvalues). Discard the rest of the eigenvectors and eigenvalues.

3. Form the matrix $E = [e_1, \cdots, e_M]$. The projection function is given by:

$$ y_n = E^T (x_n - m) \tag{2.15} $$

4. The reconstruction function is:

$$ \tilde{x}_n = E y_n + m \tag{2.16} $$

Comparing (2.15) and (2.16) with (2.12) and (2.13) gives $B = E^T$, $c_1 = -E^T m$ and $c_2 = m$.

2.3.2.2 Linear discriminant analysis (LDA)

The LDA is a supervised feature extraction algorithm that utilizes the labels of the data points. The purpose of the LDA is to find a projection $W$ (of dimension $D \times M$) such that the separation of the projected data points $y = W^T x$ within the same class is minimized while the separation of projected data points from different classes is maximized.

Assume the dataset contains C classes and that each class has $N_c$ data points, where $1 \le c \le C$. Let $X_c$ denote the subset that contains all the data points from class c, where the mean $m_c$ and the covariance matrix $S_c$ of each subset are calculated similarly to equation (2.14). The algorithm is conducted as follows:

1. Compute the between-class scatter matrix $A$:

$$ A = \sum_{c=1}^{C} N_c (m_c - m)(m_c - m)^T \tag{2.17} $$

where $m$ is the mean of the whole dataset.

2. Compute the within-class scatter matrix $B$:

$$ B = \sum_{c=1}^{C} N_c S_c \tag{2.18} $$

3. Compute the Cholesky factor of $B$ and denote it as $\tilde{B}$, i.e. $B = \tilde{B}^T \tilde{B}$.

4. Compute the M principal eigenvectors of $\tilde{B}^{-T} A \tilde{B}^{-1}$ as $\tilde{W}^T = [e_1, \cdots, e_M]$.

5. The projection matrix is $W = \tilde{B}^{-1} \tilde{W}^T$.

2.3.3 Classifiers

A classifier predicts the label of a novel input based on the training data. In the study of Jiajin et al. [15], three kinds of classifiers (k-nearest neighbors, support vector machine and Bayes linear classifier) are used to classify simulated radar data. The first two classifiers are commonly used and usually perform better than the Bayes linear classifier, so they are discussed in this section. A detailed discussion can be found in chapter 14 and chapter 17 of [14].
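As a brief aside on the feature extraction step, the PCA procedure of equations (2.14) to (2.16) in section 2.3.2.1 can be sketched in a few lines of NumPy. This is a minimal illustration and not the implementation used in this thesis: the function names `pca_fit`, `pca_project` and `pca_reconstruct` and the toy dataset are assumptions introduced only for this example.

```python
import numpy as np


def pca_fit(X, M):
    """Return the mean m and the D x M matrix E of leading eigenvectors.

    X is an N x D array whose rows are the data points x_n.
    """
    m = X.mean(axis=0)
    # Covariance estimate with the 1 / (N - 1) factor of equation (2.14).
    S = (X - m).T @ (X - m) / (X.shape[0] - 1)
    # eigh handles symmetric matrices and returns ascending eigenvalues,
    # so the order is reversed to keep the M largest ones.
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1][:M]
    return m, eigvecs[:, order]


def pca_project(X, m, E):
    """Projection (2.15), applied to every row of X."""
    return (X - m) @ E


def pca_reconstruct(Y, m, E):
    """Reconstruction (2.16), applied to every row of Y."""
    return Y @ E.T + m


# Hypothetical toy data: 200 points in D = 5 lying close to a 2-D subspace.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 5))

m, E = pca_fit(X, M=2)
Y = pca_project(X, m, E)
X_rec = pca_reconstruct(Y, m, E)
rms_error = np.sqrt(np.mean((X - X_rec) ** 2))
```

For data that really is close to an M-dimensional subspace, `rms_error` stays at the level of the added noise, which is the property that makes PCA useful as a dimension reduction step before a classifier.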
2.3.3.1 K-nearest neighbors (k-NN)

The k-NN assigns a label to a new input based on the k training data points nearest to the input, by counting the number of neighbors belonging to each class. Here, k is a hyperparameter that is selected empirically by cross-validation. Define the training set as $X = \{x_1, \cdots, x_N\}$ with the corresponding labels $y_1, \cdots, y_N \in \{1, \cdots, C\}$. The k-NN algorithm is as follows:

1. Compute the distance between the input $x$ and every training data point: $d_n = d(x, x_n)$ for $n = 1, \cdots, N$.
2. Select the k nearest training data points, and count the number of data points belonging to each class as $l_1, \cdots, l_C$.
3. Assign the input to the class $y = \arg\max_c l_c$.

The distance function can be the Euclidean distance or another distance such as the Mahalanobis distance [14].

2.3.3.2 Support vector machine (SVM)

For simplicity, we introduce the SVM with a hard decision margin first. Suppose the same notation for the training set as for the k-NN, but with binary labels: $y_1, \cdots, y_N \in \{-1, 1\}$. The SVM finds two hyperplanes $w^T x + b = \pm 1$ that separate the two classes while maximizing the distance (the margin $2 / \lVert w \rVert$) between the two hyperplanes. To find the weights $w$ and the bias $b$, a quadratic optimization problem needs to be solved:

$$ \text{minimize } \frac{1}{2} \lVert w \rVert^2 \quad \text{subject to: } y_n (w^T x_n + b) \ge 1, \quad n \in \{1, \cdots, N\} \tag{2.19} $$

A more practical SVM is the one with a soft margin, which allows some mislabeling to happen in the training data. It is useful because the training data is not always linearly separable. Then, the optimization problem becomes:

$$ \text{minimize } \frac{1}{2} \lVert w \rVert^2 + \frac{1}{2} C \sum_n \xi_n^2 \quad \text{subject to: } y_n (w^T x_n + b) \ge 1 - \xi_n, \quad \xi_n \ge 0, \quad n \in \{1, \cdots, N\} \tag{2.20} $$

or,

$$ \text{minimize } \frac{1}{2} \lVert w \rVert^2 + C \sum_n \xi_n \quad \text{subject to: } y_n (w^T x_n + b) \ge 1 - \xi_n, \quad \xi_n \ge 0, \quad n \in \{1, \cdots, N\} \tag{2.21} $$

Equation (2.21) and equation (2.20) define the 1-norm and the 2-norm soft-margin SVM, respectively. The constant C is a hyperparameter that controls how much mislabeling is allowed.
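At the optimum of the 1-norm problem (2.21), each slack variable takes the value $\xi_n = \max(0, 1 - y_n (w^T x_n + b))$, so the problem can equivalently be written as an unconstrained hinge-loss minimization. The following is a minimal sketch that minimizes this objective by plain subgradient descent on a hypothetical, linearly separable toy dataset. It is not the solver used in this thesis: the function name `train_soft_margin_svm`, the step size and the number of iterations are assumptions chosen for illustration, and in practice a dedicated quadratic programming or library solver would normally be preferred.

```python
import numpy as np


def train_soft_margin_svm(X, y, C=1.0, lr=0.01, n_iter=500):
    """Subgradient descent on 0.5*|w|^2 + C * sum_n max(0, 1 - y_n (w.x_n + b))."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        margins = y * (X @ w + b)
        violated = margins < 1.0  # points that currently have non-zero slack
        # Subgradient of the objective with respect to w and b.
        grad_w = w - C * (y[violated][:, None] * X[violated]).sum(axis=0)
        grad_b = -C * y[violated].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical toy data with labels in {-1, +1}.
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.0, 3.0],
              [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])

w, b = train_soft_margin_svm(X, y)
predictions = np.sign(X @ w + b)  # decision rule sgn(w^T x + b)
```

A larger C penalizes margin violations more strongly and pushes the solution towards the hard-margin SVM of (2.19), while a smaller C tolerates more mislabeled training points.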
The slack variable $\xi_n$ measures by how much $x_n$ violates the margin of its correct class. The optimization problems can be solved efficiently and many software implementations exist. Assume $(w^*, b^*)$ is the optimal solution. Then, the input data is assigned to the class $\mathrm{sgn}(w^{*T} x + b^*)$.

2.3.4 Deep learning algorithms

Strictly speaking, deep learning (or deep neural networks) belongs to ML algorithms. The word "deep" means that the number of layers of a neural network is large. One important reason to develop and use deep learning is that it can learn and represent highly complex functions for data in a high dimensional space, where traditional ML often fails [16]. In this section, we give an overview of some basics of neural networks, beginning with the simplest case, which is logistic regression.

2.3.4.1 Logistic regression

Given the input vector $x$, the weights $w$ and the bias $b$, the logistic regression calculates the following output $\hat{y}$:

$$ a = w^T x + b, \qquad \hat{y} = \sigma(a) = \frac{1}{1 + e^{-a}} \tag{2.22} $$

The sigmoid function $\sigma(a)$ maps any real number to $(0, 1)$. In a binary classification problem with $y \in \{0, 1\}$, one can interpret the output of the sigmoid as the probability $\hat{y} = p(y = 1 \mid x; w, b)$. The loss function is defined as the cross-entropy of the prediction $\hat{y}$ and the ground truth $y$:

$$ L(\hat{y}, y) = -\left( y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \right) \tag{2.23} $$

while the cost function of N training examples is:

$$ J(w, b) = \frac{1}{N} \sum_{i=1}^{N} L(\hat{y}^{i}, y^{i}) \tag{2.24} $$

where the superscript i denotes the i-th training example. To train the model, a gradient descent method with learning rate $\alpha$ is used to find $(w^*, b^*)$ that minimize the cost function:

$$ w := w - \alpha \frac{\partial J}{\partial w}, \qquad b := b - \alpha \frac{\partial J}{\partial b} \tag{2.25} $$

2.3.4.2 Feedforward neural networks

Feedforward neural networks (also known as fully connected neural networks) have multiple layers including one input layer, one output layer and several hidden layers.
For each hidden and output layer, a linear operation is first applied to the output from the previous layer, and then the activation is calculated and fed to the next layer:

$$ h^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]}, \qquad a^{[l]} = g(h^{[l]}) \tag{2.26} $$

where the superscript $[l]$ denotes the values associated with the l-th layer. In equation (2.26), the parameters to be learned are the weight matrices $W^{[l]}$ and the bias vectors $b^{[l]}$ of every layer. The function $g(h)$ is a nonlinear activation function that calculates the element-wise activation of $h$. In figure 2.4, a feedforward neural network with one hidden layer is illustrated.

Figure 2.4: A feedforward neural network with one hidden layer.

Similarly to logistic regression, the weights and biases are updated by a gradient-based optimization method to minimize the cost function during training. The gradients required for this update are computed by backpropagation.

Depending on the application, the activation function of the output layer is not limited to the sigmoid function (suitable for binary classification). For example, in a multiple-class classification problem the activation at the output can be the soft-max function, whose output is a vector. The elements of the output vector of the soft-max function are the predicted probabilities that the input belongs to each class.

2.3.4.3 Convolutional neural networks

The basic CNN consists of three kinds of layers: convolution layers, pooling layers and fully connected (feedforward) layers. The input data is often called a "volume" because it is a three-dimensional array ($m \times n \times c$). During the convolution step, the volume is convolved with p filters, each of which has a dimension of $f \times f \times c$. Much like the convolution operation in one-dimensional space, each filter slides through the input volume in the first two dimensions and produces a two-dimensional array.
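The sliding operation described above can be sketched with explicit loops. This is a minimal illustration under stated assumptions, not the implementation used in this thesis: the function name `conv_layer` is hypothetical, the filter is applied without flipping and without a bias term, the stride is one, no padding is used, and the ReLU default activation is chosen only for the example.

```python
import numpy as np


def conv_layer(volume, filters, activation=lambda h: np.maximum(h, 0.0)):
    """Stride-1 convolution (no padding) of an (m, n, c) volume with p filters.

    filters has shape (p, f, f, c); the result has shape (m-f+1, n-f+1, p).
    """
    m, n, c = volume.shape
    p, f, _, _ = filters.shape
    out = np.zeros((m - f + 1, n - f + 1, p))
    for k in range(p):  # one 2-D output array per filter
        for i in range(m - f + 1):
            for j in range(n - f + 1):
                # Portion of the volume covered by the filter at position (i, j).
                patch = volume[i:i + f, j:j + f, :]
                out[i, j, k] = np.sum(patch * filters[k])
    # The p arrays are stacked along the last axis, then activated element-wise.
    return activation(out)


# Hypothetical single-channel 5 x 5 input and one 2 x 2 filter of ones.
x = np.arange(25, dtype=float).reshape(5, 5, 1)
w = np.ones((1, 2, 2, 1))
a = conv_layer(x, w)
```

With a 5 x 5 input and a 2 x 2 filter the output is a 4 x 4 array, and the same four filter weights are reused at all sixteen positions.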
After stacking the outputs of the p filters and applying an element-wise activation function, the 2-D convolution operation is done.

The pooling layer reduces the dimension of the volume. It is applied to each channel separately, and each element of the output 2-D array is the average or the maximum value of a portion of the input. After some convolution and pooling layers, the volume is flattened and fed into a feedforward neural network as introduced previously.

Figure 2.5: An example of a convolutional layer and a pooling layer (5 × 5 input x, 4 × 4 convolution output a, 2 × 2 pooling output p).

To illustrate the process further, assume that the input x is a 2-D array (single channel). It is first convolved with a 2 × 2 filter w, and a 2 × 2 max pooling layer follows. As figure 2.5 shows, the convolution operation takes a portion of x which has the same size as the filter and computes the sum of the element-wise multiplication of that portion and the filter, followed by a non-linear activation $g()$:

$$ a_{11} = g\left( \sum_{i=1,2} \sum_{j=1,2} w_{ij} x_{ij} \right) \tag{2.27} $$

The rest of the elements are obtained similarly by shifting the filter within the input volume. The second step is max pooling, which slides a window over the volume and selects the maximum value within the window at each position.

An important property of the CNN is parameter sharing. Note that the same filter is applied to multiple portions of the input data in a CNN, while in a feedforward network each element in the input data is multiplied by a different factor. This allows the CNN to be trained with less data, and the number of parameters in the neural network is significantly reduced compared to feedforward neural networks.

2.3.4.4 Recurrent neural networks

Recurrent neural networks (RNN) have succeeded in processing sequential data such as natural language. RNNs have very flexible structures, and we select the example shown in figure 2.6 to illustrate how they work. The recurrent connection with a solid square means that the hidden unit is connected to itself at the next time step. The weight matrices are denoted as $U$, $V$ and $W$.

Figure 2.6: The computational graph of a simple RNN with one hidden layer and a feedback connection of the hidden layer to itself (biases not shown).

Denote the variables at time step t by adding a superscript $(t)$. Then, the output vector $o^{(t)}$ is calculated from the input $x^{(t)}$ and the previous hidden unit $h^{(t-1)}$ as follows:

$$ a^{(t)} = b + W h^{(t-1)} + U x^{(t)}, \qquad h^{(t)} = g(a^{(t)}) \ \text{(activation)}, \qquad o^{(t)} = c + V h^{(t)} \ \text{(output)} \tag{2.28} $$

where $b$ and $c$ are biases, $g()$ denotes the activation function, and the loss at time t is $L^{(t)}$. The loss function is selected based on the application.

2.3.4.5 Optimization and regularization of neural networks

Designing the architecture of a neural network is one piece of the puzzle, while training it can be difficult, especially when the network is very deep. Many optimization methods have been proposed to accelerate the training. Adam [17] is an effective method among them. Another problem arises when the neural network is well optimized on a training set but the performance is poor on a test set. In this case, regularization methods such as dropout [18] and L1 or L2 regularization [16] are useful to prevent the model from overfitting.

3 Hardware Description

3.1 FMCW radar evaluation kit introduction

The hardware used for the measurement campaigns in this thesis is the RS3400K/00 programmable radar development kit from Sivers IMA. The kit includes a transceiver, a control board and a horn antenna.
This section first presents some important parameters of each component of the development kit, which are listed in table 3.1. Second, each component is briefly described. Two important limitations of the radar kit and how these limitations affect the measurement are discussed in the following section 3.2.

Table 3.1: RS3400K/00 24 GHz FMCW Transceiver Evaluation Kit Specification.

Parameter                              Value
RS3400K/00 FMCW transceiver module
  Carrier frequency                    24.75 GHz
  Bandwidth                            1.5 GHz
AN1020K/00 antenna
  Gain                                 20 dB
  Bandwidth                            22 - 33 GHz
  H-plane 3 dB beamwidth @ 24 GHz      18.6 degrees
  E-plane 3 dB beamwidth @ 24 GHz      16.1 degrees
CO1000A/00 power and controller board
  Power supply                         12 V
  Connection                           Serial
  Sample rate                          up to 20 kHz

3.1.1 Hardware connection

As shown in figure 3.1, the transceiver module (RS3400K/00) is connected to the power and controller board through two dual-row PCB headers. The antenna (AN1020K/00) can be connected to the transceiver either through a male-to-male SMA connector or a coaxial cable. In this project, the antenna is attached to the board through a male-to-male SMA connector. The power and controller board (CO1000A/00) requires a 12 V DC power supply. Signals can be transmitted to a computer through an RS-232 serial/USB cable.

Figure 3.1: The radar kit includes three parts: a transceiver board (RS3400K), a power and controller board (CO1000A), and a horn antenna with a gain of 20 dB.

3.1.2 Antenna

The radar is equipped with a horn antenna. The antenna has a rather narrow beam width and, thus, the distance between the radar and the target must be sufficiently large in order to illuminate the target completely with its main lobe. For example, the distance between a typical bike wheel (36 cm in radius) and the radar should be more than 3 m.
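The order of magnitude of this distance can be checked with a simple geometric estimate. The sketch below assumes, as a simplification that is not taken from the thesis, that the target is centered on the antenna boresight and that the illuminated footprint at distance d has radius d·tan(θ/2), where θ is the 3 dB beamwidth from table 3.1. The function name is hypothetical.

```python
import math


def min_distance_for_full_illumination(target_radius_m, beamwidth_deg):
    """Distance at which the 3 dB footprint radius equals the target radius.

    Assumes a target centered on boresight and a footprint radius of
    d * tan(beamwidth / 2); both are simplifying assumptions.
    """
    half_angle = math.radians(beamwidth_deg) / 2.0
    return target_radius_m / math.tan(half_angle)


wheel_radius = 0.36  # m, bike wheel radius quoted in the text
d_h = min_distance_for_full_illumination(wheel_radius, 18.6)  # H-plane
d_e = min_distance_for_full_illumination(wheel_radius, 16.1)  # E-plane
```

Under these assumptions the narrower E-plane beam gives roughly 2.5 m and the H-plane beam roughly 2.2 m, which is in line with the stated 3 m once some margin for misalignment is included.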
3.1.3 Transceiver

The transceiver is a compact module that integrates a radio-frequency circuit and a microcontroller for executing the commands of the control board. The module has a default center frequency of 24.75 GHz. The bandwidth can be set from 0 to 1.5 GHz. This frequency range lies within the ultra-wide bandwidth (UWB) band of the 24 GHz radar. According to spectrum regulations and standards developed by the European Telecommunications Standards Institute (ETSI) and the Federal Communications Commission (FCC), the use of the UWB band will be phased out by the year 2022 in both Europe and the USA. The center frequency should then be limited to the Narrow-Band (NB) from 24.05 to 24.25 GHz. In addition, a relatively high emission power of up to 20 dBm is allowed with a bandwidth of up to 250 MHz [19].

3.1.4 Power and controller board

The power and controller board is used to power the transceiver module. It also has a microcontroller for the FMCW frequency sweep control of the transceiver and a 10-bit ADC to sample the analog signal from the transceiver.

A computer is needed to conduct the radar measurement. Communication between the computer and the control board occurs over RS-232. Human-readable commands can be used to set up the desired parameters. The commonly used commands fall into several main categories, which are listed in table 3.2. Each category has sub-categories. INIT is used to initialize the transceiver to the previous setting rather than the default values. FREQUENCY and SWEEP are used to define the transmitted signal. Some parameters in these categories depend on each other, e.g. start/stop frequency and bandwidth, and this implies that a modification of one parameter might result in changes of the other parameters. TRIG is used to define the trigger method of the transceiver and to put the device in a ready state for measurement. TRACE is used to return the measurement result to the computer.
Command category   Function
INIT               Initialize the transceiver.
FREQUENCY          Control frequency parameters, e.g. bandwidth.
SWEEP              Control chirp parameters, e.g. duration, number of chirps.
TRIG               Control trigger parameters.
TRACE              Return measurement data.
HELP               Provide a simple list of available commands.

Table 3.2: Command categories for radar controller.

3.2 Hardware limitations

The radar kit has some unexpected features that we need to consider while designing the data measurement campaign.

First, the measurement is not efficient. In the theoretical model, we described a high definition radar which can transmit successive short chirps. This allows the measurement to clearly show the change in range beat and Doppler frequency. However, the radar kit is not able to work in that way. The user needs to send a measurement command to the control board for each chirp. After the measurement command is received by the control board, it starts to measure and write measurement samples to its buffer. The buffer can store at most 1501 numeric samples. The control board returns the samples to a computer through the serial interface after the measurement is finished, and this last step takes a very long time (about one second for 1501 samples) as compared to the chirp duration (5 ms). More importantly, it is not possible to send another measurement command during this process. This delay caused by the serial communication makes it impossible to transmit and receive successive short chirps. If we were to transmit 256 5-ms chirps, it could take about 18.34 seconds to complete the data acquisition, where only 1.28 seconds are effective while the rest of the time is merely spent on communication overhead. This estimate is consistent with assuming that the transfer time scales with the number of samples: a 5-ms chirp sampled at 20 kHz yields 100 samples, i.e. roughly 100/1501 ≈ 0.067 s of transfer time per chirp. In this case, the measured spectrogram is not able to show the continuous movement of a target.

Second, the actual frequency sweep of the chirp is not a sawtooth waveform but a staircase approximation of the linear frequency sweep associated with the sawtooth.
Thus, the radar transmits a sinusoidal signal with a piecewise constant frequency that is increased in multiple steps. The radar allows the user to set the duration of the sinusoidal signal for each time interval with constant frequency (called sweep:idle according to the user documentation). Another parameter is freq:points, which corresponds to the number of time intervals with constant frequency. The radar returns freq:points samples to the computer, so the chirp duration equals the product of sweep:idle and freq:points.

Last, measurements show that the radar returns a baseband signal with a strong DC component and some noise in its low frequency band. The amplitude and the occupied bandwidth of the noise increase as the bandwidth increases. The noise measurement was conducted on the top floor of a seven-story building. During the measurement, the antenna was pointed to the sky and 50 chirps were transmitted for each bandwidth. In figure 3.2a, the noise power spectral density (PSD) shows little difference below 400 Hz when different values for the bandwidth are used. After the DC component is suppressed (shown in figure 3.2b), the noise in the higher frequency range (400 - 900 Hz) is about -15 dB/Hz. However, as the bandwidth increases, the noise amplitude at low frequency significantly increases. Increasing the bandwidth can improve the range resolution of the radar but it also introduces noise that can potentially overwhelm the Doppler frequency induced by a slow-moving target.

Here, we need to make a compromise when we select the bandwidth. High bandwidth leads to high range resolution but, unfortunately, it also results in lower SNR for the low frequency band. Due to the reasons mentioned above, we finally choose a 750 ms radar chirp with 250 MHz bandwidth as listed in table 4.1 in the next chapter. Figure 3.3 shows the actual signal transmission model of the radar kit.

Figure 3.2: Noise power spectral density (PSD) measurement of the radar kit for bandwidths from 0 to 1500 MHz in steps of 250 MHz, plotted as PSD (dB/Hz) versus frequency (0 - 1000 Hz). (a) Noise PSD (raw data). (b) Noise PSD (mean subtracted).

Figure 3.3: The actual signal transmission model of the radar kit. Two sawtooth chirps (approximated by a staircase waveform) are plotted. The dashed line in each chirp represents the omitted steps, and a gap of one second lies between the two chirps due to serial communication.

4 Modelling and Simulation

4.1 Introduction

Theoretical modelling and simulation provide insights into the micro-Doppler phenomenon. A radar propagation model with a non-rigid target is difficult to construct. To simplify the problem, a human body can be modeled as jointly connected rigid parts, which is described in section 4.5. To break down the problem even further, the simulation of a single rigid body, like a bicycle wheel, is shown in section 4.4. Some samples of a rotating bike wheel and a walking human were measured in the laboratory. The measurement results are presented after the simulation. At the very beginning of the work, however, a point target is used to evaluate the radar model in section 4.3. Although the radar sensor's limitations in noise performance and resolution cause differences between simulation and measurement, the theoretical model is still a good way to understand how an FMCW radar measures the micro-Doppler signature.

4.2 Simulation and measurement settings

Table 4.1 lists three different sets of parameters that are used in this thesis, and they are labeled as S1, S2 and S3. Here, S1 is the ideal (high resolution) setting used in the simulation of the oscillating point, the bicycle wheel and the walking human. S2 is the low resolution setting, which is similar to the actual hardware setting used in the measurement. S3 is the setting of the radar hardware used in the measurement. These settings are used in the simulation and measurement in the rest of the chapter.

Symbol   Meaning           Unit   S1 (1)     S2 (2)         S3 (3)
t_p      Chirp duration    ms     5          5              750
f_c      Carrier freq.     GHz    24         f_c^{S2} (2)   24
B        Bandwidth         MHz    250        1.67           250
N        # of tx chirps    /      256        600            2
f_s      Sample rate       kHz    20         1.5            2
δd       Range res.        m      0.6        90             /
/        Waveform          /      sawtooth   sawtooth       staircase

(1) S1 is the ideal (high resolution) setting used in the simulation corresponding to figure 2.1. The radar transmits successive 5-ms sawtooth chirps with this setting.
(2) S2 is the low resolution setting corresponding to figure 2.3. The radar transmits four long (750 ms) sawtooth chirps, and each has a bandwidth of 250 MHz. Each long chirp is simulated by 150 short chirps with a bandwidth of 1.67 MHz and a duration of 5 ms. The carrier frequency of the short chirps changes periodically, i.e., $f_c^{S2} = 24 + 1.67 \times 10^{-3} n$ GHz ($0 \le n \le 149$).
(3) S3 is the hardware setting used in the measurement corresponding to figure 3.3. The radar transmits two 750-ms staircase chirps, and a gap between the chirps exists due to serial communication. The exact range resolution cannot be calculated from equation 2.11 because of the staircase waveform.

Table 4.1: Radar parameters for simulation and measurement.

4.3 Point target

4.3.1 Simulation

A moving point target is used to simulate the radar model described in section 2.1. Here, we assume that the chirp duration $t_p$ is small and multiple successive chirps are transmitted. The radar has an isotropic antenna and the point target moves with a sinusoidal velocity along the y-axis.
The position of the target is $R_t = [0,\ r_0 + A \cos(2\pi f_m t),\ 0]$ given the oscillation frequency $f_m = 0.5$ Hz and the amplitude $A = 1$ m, where the average position is $r_0 = 10$ m. To illustrate that the relative position of the target and the radar significantly influences the micro-Doppler signature, the radar is placed at $[0, 0, 0]$ (target moving towards and away from the radar) and at $[r_0, r_0, 0]$ (target moving tangentially to the radar). The radar is assumed to be operated with a short chirp duration and a high sampling rate, where the parameters are listed as S1 in table 4.1.

Here, the parameters $r_0$ and $f_m$ are chosen in accordance with the measurements that will be discussed later. The distance between the target and the radar is 10 m in the simulation, while the distance in the measurement is around 8.5 m. The frequency $f_m = 0.5$ Hz can represent the typical frequency of a human activity such as arm swinging while walking.

The three sub-figures in figure 4.1 show the situation when the point target and the radar are both on the y-axis. This allows the point target to produce the strongest variation in Doppler and range beat frequency. Sub-figure 4.1a is the trajectory of the point target in relation to the radar, sub-figure 4.1b is the variation of the range beat frequency and the Doppler frequency, and sub-figure 4.1c is the spectrogram. The trace with the strongest amplitude in the spectrogram is approximately the sum of the two frequencies.

Figure 4.1: Point target moves straight towards and away from the radar. (a) Trajectory of the target (blue dashed line). (b) Range beat (red) and Doppler frequency (blue). (c) Spectrogram of the baseband signal.

When the radar is moved to $[r_0, r_0, 0]$, the distance of the point target relative to the radar remains approximately constant, as shown in figure 4.2. The radial speed and the variation in the distance are significantly reduced. As a result, the trace in the spectrogram becomes almost a straight line, and it corresponds to a target at an approximately constant distance with an approximately zero radial velocity.

Figure 4.2: Point target oscillates tangentially to the radar. (a) Trajectory of the target (blue dashed line). (b) Range beat (red) and Doppler frequency (blue). (c) Spectrogram of the baseband signal.

4.4 Bicycle wheel

From the theoretical simulation of a point target with the radar placed at different positions, we know that the micro-Doppler signature is highly dependent on the incident angle of the radar signal to the moving target. A bicycle wheel is a useful test case since it produces a periodic movement of the spokes as the wheel is rotating, where the bike spokes may be coarsely modeled by individual point targets (inspired by Victor's study on helicopter engine [2]).

4.4.1 Simulation

Compared to the point target, a bicycle wheel can generate a more complex micro-Doppler signature. All the radar simulation parameters are identical to the case with the point target in the previous section, where these are listed as S1 in table 4.1. The spokes of the wheel are modeled as $N_w$ point targets that rotate in the x-y plane around the z-axis with radius $r_w = 0.3$ m. The position $R_{w,i}(t)$ of each point can be described as:

$$ R_{w,i}(t) = [r_w \cos(\Phi_i(t)),\ r_w \sin(\Phi_i(t)),\ 0], \qquad \Phi_i(t) = \omega t + 2\pi i / N_w, \quad 0 \le i \le N_w - 1 \tag{4.1} $$

where $\omega$ is the angular speed (a typical bike wheel rotates at 80 rpm, which is about 8.38 rad/s). To show how the micro-Doppler signature changes with the aspect angle of the radar, the radar is placed either on the y-axis ($[0, r_0, 0]$) or on the z-axis ($[0, 0, r_0]$), where $r_0$ is again 10 m. Figure 4.3 shows these two situations.

Figure 4.3: The positions of the radar and the bike wheel in two situations: (a) side view simulation, (b) top view simulation. Due to the axis settings, the size of the wheel looks different in both figures.

First, consider the case when the radar is located on the y-axis, so the variation in the radial speed and in the distance of the spokes is at its maximum. Figure 4.4 shows the spectrograms of the wheel as the number of spokes increases. When there is one spoke rotating, the spectrogram looks like a sinusoidal curve. As the number of spokes increases, more sinusoidal curves appear with different phases. For 8 spokes, it is difficult to visually distinguish the corresponding eight sinusoids in the spectrogram.

Figure 4.4: Spectrograms of the wheel when the number of spokes is (a) one, (b) two, or (c) eight.

Second, when the radar is placed on the z-axis, the variation of the radial speed is zero and the distance is constant, which gives a flat curve in the spectrogram as shown in figure 4.5. No matter how many spokes there are, the result for this radar position is similar.

Figure 4.5: Spectrogram of the wheel when the radar is placed on the z-axis. The wheel has one spoke.

4.4.2 Experimental measurement

The radar setting S3, as listed in table 4.1, is used in the measurement. A bicycle with two wheels (36 cm in radius) was placed upside down in front of the radar kit in the experiment (shown in figure 4.6a). Unlike the free space environment used in the simulation, the measurement was conducted in a laboratory with many other objects such as walls and chairs. The radar's height was around 50 cm and the distance between the radar and the bike was 3 m. This allowed the main beam to cover the whole rear wheel (but the ground reflection was inevitable). The radar sent two 0.75 s chirps, and between the chirps there was a pause due to the serial communication delay. First, a sample was measured when the rear wheel was stationary and, then, another one was measured when it was rotating.
The front wheel was kept stationary all the time. The DC component of the samples was suppressed by subtracting the mean. As shown in figure 4.6b, a curve with strong amplitude appears at low frequency when the wheel was stationary. In figure 4.6c, multiple horizontal curves span the spectrogram. Because the slope of the chirp was very small, the measurement result was similar to the one in [3], where a CW radar is used.

Figure 4.6: Bike wheel measurement environment and spectrograms. (a) A photo of the measurement environment. (b) Spectrogram of the stationary bike wheel. (c) Spectrogram of the rotating bike wheel.

4.5 Walking human

4.5.1 Simulation

The global human walking model proposed by Boulic et al. [20] is the kinematic model used in the simulation. The model is based on an empirical mathematical parameterization using biomechanical experimental data instead of motion equations. The model is an average human walking model in which the information of personalized features is averaged out from measurements of many individuals [8]. A MATLAB program that simulates the micro-Doppler effect of human walking with a pulse radar is implemented in [8]. Our simulation of a walking human is based on this program, with some modifications in the radar model.

A human is a non-rigid body since the shape of the body changes during its movement. To simplify the target model, the human is modeled as jointly connected ellipsoids. Each ellipsoid independently reflects the incident wave and the radar receives the summation of the reflections. In this way, a non-rigid body problem is treated as multiple rigid body problems. There are 17 different joints that represent the base, thorax, head, left/right shoulder, left/right elbow, left/right hand, left/right hip, left/right knee, left/right ankle and left/right toes.
Sixteen links for kinematic analysis are defined for the neck, torso, left/right shoulder, left/right upper arm, left/right lower arm, left/right hip, left/right upper leg, left/right lower leg and left/right foot. Figure 4.7 marks all the joints and links. In the high-frequency limit, closed-form expressions for the RCS of perfectly electrically conducting ellipsoids can be found in [8]. The ellipsoids are connected as shown in figure 4.9. The lengths of the ellipsoids' long axes are listed in [21] and marked in figure 4.8.

Figure 4.7: The joints of the human model.

The motion of the walking human can be decomposed into gait cycles. The gait cycle is defined as the period from one foot touching the ground to the same foot touching the ground again. The gait cycle can be decomposed into a sequence of small motions. Forward propulsion of the centre of gravity [22] is involved in the gait cycle. There are two phases in one gait cycle: the stance phase, when the foot remains in contact with the ground, and the swing phase, when the foot is not in contact with the ground.

The model assumes that the human walks at a fixed frequency and speed. For each gait cycle, the kinematic model of the human is described by 12 trajectories, while the so-called Denavit–Hartenberg parameters are used to describe the links' coordinates. Six of the trajectories describe the six degrees of freedom of the human's spine, referred to as the base in the model. Five of the trajectories describe the joint angles of the limbs, where the left and the right trajectories
are mirrored (given a suitable translation) with respect to the vertical plane that divides the body into two equal parts. The last trajectory describes the rotation of the thorax. The 3-D trajectories of some joints are plotted in figure 4.10.

To calculate the reflected signal of the ellipsoids, the trajectory and the RCS of the ellipsoids are required. The trajectory of the ellipsoids' center points can be calculated from the global human walking model. The RCS of each ellipsoid can be calculated from its position and orientation relative to the radar. Since we assume that each ellipsoid reflects independently, the different body parts can be simulated and analyzed separately, i.e. shadowing is ignored.

Figure 4.8: The dimension of human parts. (Segment lengths are given as fractions of the body height H.)

Figure 4.9: Human ellipsoid model.

Figure 4.10: Trajectories of human joints during a gait cycle. (Axes: x/m, y/m and z/m; plotted joints: base, head, left elbow, left hand, right knee and right toe.)

We assume that the human is 1.8 meters tall and walks towards the radar at a speed of 1.8 m/s. The radar is placed 10 meters away and 1.5 meters above the ground. Figure 4.11 shows the simulated spectrograms of four selected body parts of the walking human during three gait cycles. Figure 4.11 corresponds to the high resolution radar setting S1 in table 4.1. The frequency in each spectrogram tends to decrease over time because the human is getting closer to the radar and the range beat frequency decreases. The left foot has a larger instantaneous speed and it generates a larger frequency variation, as shown in the spectrogram. The torso moves steadily forward, so its frequency in the spectrogram tends to be steady. However, the actual radar kit cannot achieve such high resolution as in the simulation
because it is unable to transmit successive chirps and the chirp duration cannot be smaller than 75 ms. In order to observe at least one full gait cycle, which is roughly 1 second long, a long chirp is used in the simulation to model the actual measurement. The setting of a low resolution radar (S2) in table 4.1 is therefore also considered. Figure 4.12 shows the results for a simulation with 750-millisecond-long chirps. The long chirp is modeled as 150 virtual short chirps, as described in section 2.1.3. Each short chirp is 5 ms long, which is the same as in the previous simulation. Each short chirp has a bandwidth of 1.67 MHz, so the range resolution degrades to 90 m. The range beat frequency of the target is around 2 kHz in the previous high resolution simulation S1, while the range beat frequency of the same target is less than 100 Hz in the current setting S2. Since the range resolution is very low, it is hard to extract information about the variation in range from the spectrogram. However, the Doppler frequency is not affected by the low range resolution. In the previous simulation, as seen in figure 4.11, the Doppler frequency is not distinct when the radar resolution is high. In the low resolution case, as seen in figure 4.12, the Doppler frequency is clear over the whole sample frequency range.

Given the superposition of the reflections from every body part, the micro-Doppler signature of the whole human body is shown in figures 4.13 and 4.14 for S1 and S2, respectively. Figure 4.13 illustrates both the range and velocity variation. However, this radar setting cannot be used with the current hardware. Figure 4.14 is the simulation using the actual radar setting with a long chirp duration. It shows a detailed micro-Doppler signature, a clear gait cycle of the movement and almost no signal related to range variation.
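As a quick check of the numbers quoted for setting S2, the following sketch recomputes the virtual-chirp duration, the bandwidth of each virtual chirp and the resulting range resolution c/(2B). It assumes a total sweep bandwidth of 250 MHz for the long chirp (the value stated for setting S3 in chapter 5; table 4.1 remains the authoritative source), and the function names are illustrative only.

```python
# Worked arithmetic for splitting one long chirp into virtual short chirps.
# ASSUMPTION: the long chirp sweeps 250 MHz in 750 ms (taken from the S3
# description in chapter 5; check table 4.1 for the actual S2 values).

C = 299_792_458.0  # speed of light in m/s


def range_resolution(bandwidth_hz):
    """Range resolution of an FMCW chirp with sweep bandwidth B: c / (2 B)."""
    return C / (2.0 * bandwidth_hz)


def split_long_chirp(duration_s, bandwidth_hz, n_virtual):
    """Return (duration, bandwidth) of each of n_virtual equal short chirps."""
    return duration_s / n_virtual, bandwidth_hz / n_virtual


short_duration, short_bandwidth = split_long_chirp(0.75, 250e6, 150)
short_resolution = range_resolution(short_bandwidth)
full_resolution = range_resolution(250e6)

print(f"virtual chirp duration : {short_duration * 1e3:.1f} ms")
print(f"virtual chirp bandwidth: {short_bandwidth / 1e6:.2f} MHz")
print(f"range resolution       : {short_resolution:.1f} m")
print(f"full-sweep resolution  : {full_resolution:.2f} m")
```

Under these assumptions each virtual chirp lasts 5 ms and sweeps about 1.67 MHz, which gives a range resolution of roughly 90 m, in line with the values quoted above.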
4.5.2 Experimental measurement

For the human measurement, the radar is configured according to setting S3 in table 4.1. Before we plot the spectrograms, the mean of the radar signal is subtracted. Figure 4.15 shows the scenario in which a person walks at a normal speed towards the radar from a distance of ten meters. Figure 4.16 shows the scenario in which a person walks towards the radar without swinging the arms. Even though none of the measurements managed to capture a full gait cycle of the person, as the simulation in figure 4.14 does, the measurements provide different micro-Doppler signatures according to the actual movements. It should also be noted that the measurement results are similar to the simulation results.

Figure 4.11: The spectrogram generated by four body parts in high resolution FMCW radar. (Panels: torso, left lower arm, left lower leg, left foot; axes: time in seconds versus frequency in kHz.)

Figure 4.12: The spectrogram generated by four body parts in low resolution FMCW radar. (Panels: torso, left lower arm, left lower leg, left foot; axes: time in seconds versus frequency in Hz.)

Figure 4.13: The spectrogram of a walking human with high resolution radar settings S1. (Plot title: Micro-Doppler of a walking human with 600 short chirps; axes: time in seconds versus frequency in kHz; color scale: power/frequency in dB/Hz.)
Figure 4.14: The spectrogram of a human walking with low resolution radar settings S2. (Plot title: Micro-Doppler of a walking human with 4 long chirps; axes: time in seconds versus frequency in Hz; color scale: power/frequency in dB/Hz.)

Figure 4.15: Micro-Doppler measurement for human walking towards the radar with swinging arms.

Figure 4.16: Micro-Doppler measurement for human walking towards the radar without swinging arms.

5 Classification With Neural Networks

5.1 Data measurement campaign

Four different human activities are considered in the classification problem: walking while swinging arms (0), walking without swinging arms (1), boxing while standing still (2) and standing still (3). As discussed in chapter 4, the micro-Doppler signature changes significantly with the radial velocity of the target. For a target or a body composed of many rigid parts, we assume that the motion is mainly in the direction of translation (or walking for a human). The angle between the direction of translation and the direction towards the radar is referred to as the aspect angle below. In order to explore the potential of deep learning algorithms to distinguish aspect angles, we measured these four activities at different aspect angles by changing the orientation of the person: facing the radar with the front (0), with the back (1) and with the side (2). In total, 12 classes of data are measured (shown in table 5.1). In addition, the data is collected evenly from two people.

In the measurement campaign, the radar is configured to use setting S3 in table 4.1, i.e., the bandwidth is set to 250 MHz and each chirp is 750 ms long. Each chirp has 1501 samples. Therefore, the sampling frequency of the radar is 2000 Hz. A MATLAB script is used to measure the data.
The data is measured in batches of 50 chirps, which means the person performs the same activity during each batch measurement.

Class name        label (0-11)   activity (0-3)   direction (0-2)   # of samples
walk_wa_side      0              0                2                 350
walk_wa_front     1              0                0                 350
walk_wa_back      2              0                1                 350
walk_woa_front    3              1                0                 350
walk_woa_side     4              1                2                 350
walk_woa_back     5              1                1                 350
boxing_front      6              2                0                 450
boxing_back       7              2                1                 450
boxing_side       8              2                2                 350
standing_front    9              3                0                 350
standing_back     10             3                1                 350
standing_side     11             3                2                 350

Table 5.1: Class description and the number of samples.

Figure 5.1: Radar setup model for data measurement campaign. (Radar height 1.5 m; distance between radar and person 8.5 m.)

The measurement campaign is carried out in the laboratory shown in figure 5.3. The picture is taken while the person is walking without swinging the arms and the radar is pointing at the back of the person, corresponding to label 5 in table 5.1. A special 3D printed box, shown in figure 5.2, is built to hold the radar on a 1.5 meter high tripod. The aperture of the horn antenna is oriented horizontally such that the main lobe is parallel to the floor of the laboratory. The person executes the activities around 8.5 meters away from the radar, so that the main lobe illuminates the entire body, as shown in figure 5.1. There is no object blocking the line-of-sight path between the radar and the person. The indoor environment contains multiple large objects such as the walls surrounding the person, but they are stationary and, thus, they do not contribute to the Doppler frequency. They are also not likely to produce a high range beat frequency due to the long duration of the radar chirp.

Figure 5.2: Radar kit assembly with a 3D printed box.

Figure 5.3: Data measurement campaign environment.

5.2 Data preprocessing

Inspired by [23] and natural language processing algorithms, the radar signal in the time domain is converted to the joint time-frequency domain by the STFT to produce a spectrogram. In addition, we use Python instead of MATLAB as the programming language for the deep learning part of the project, for two reasons: first, powerful deep learning frameworks such as TensorFlow and Keras are available in Python; second, small embedded devices such as the Raspberry Pi are able to run light-weight deep learning models written in Python. Therefore, the data measured in the previous section is converted from MATLAB format to a NumPy archive (NumPy is a Python package for the manipulation of arrays and matrices). The NumPy archive contains four data entities for each sample: the time-domain raw data, two spectrograms and one class label. The three-step preprocessing is:

1. The DC component is suppressed by subtracting each chirp's mean. As shown in chapter 3, when the DC component is removed, the SNR of the Doppler signature is significantly higher and, thus, it is easier to distinguish.
2. Two spectrograms with different STFT window lengths are calculated for each chirp.
3. The samples are shuffled and divided into two sets. The training set contains 85% of the samples and the test set contains the rest.

The first kind of spectrogram is generated by the STFT with a window length of 64 samples and an overlap of 16 samples, and its dimension is 33 × 33. The other kind of spectrogram has a window length of 128 samples and an overlap of 32 samples, which results in a spectrogram of dimension 17 × 65. The window is a Hanning window. Figures A.1 and A.2 in appendix A show examples from each class. The spectrogram with a window length of 128 has a high frequency resolution and a low time resolution, whereas the other one has a high time resolution and a low frequency resolution. We will show that using the two kinds of spectrograms together can improve the classification accuracy in a CNN based neural network.
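The spectrogram dimensions and the time/frequency trade-off described above can be reproduced with a few lines of arithmetic. The sketch below is not the thesis code: the helper `stft_shape` is hypothetical, and it assumes that the signal is zero-padded by half a window at both ends and that the last partial segment is padded to full length. This padding convention is an assumption, chosen because it reproduces the stated dimensions for a chirp of 1501 samples.

```python
import math

# Sampling frequency implied by 1501 samples over a 0.75 s chirp.
N_SAMPLES = 1501
CHIRP_DURATION_S = 0.75
FS = (N_SAMPLES - 1) / CHIRP_DURATION_S  # 2000 Hz


def stft_shape(n_samples, window, overlap):
    """Return (time bins, frequency bins) of a one-sided STFT.

    ASSUMPTION: half a window of zeros is added on each side of the
    signal and the final partial segment is zero-padded to full length.
    """
    hop = window - overlap
    n_freq = window // 2 + 1                # one-sided spectrum of a real signal
    padded = n_samples + 2 * (window // 2)  # boundary extension on both sides
    n_time = math.ceil((padded - window) / hop) + 1
    return n_time, n_freq


for window, overlap in [(64, 16), (128, 32)]:
    n_time, n_freq = stft_shape(N_SAMPLES, window, overlap)
    freq_step_hz = FS / window                    # spacing of frequency bins
    time_step_ms = 1e3 * (window - overlap) / FS  # spacing of time bins
    print(f"window {window:3d}: {n_time} x {n_freq} bins, "
          f"{freq_step_hz:.3f} Hz per bin, {time_step_ms:.0f} ms per step")
```

With these assumptions the window of 64 samples gives 33 × 33 bins at 31.25 Hz and 24 ms spacing, and the window of 128 samples gives 17 × 65 bins at 15.625 Hz and 48 ms spacing, which illustrates why the two spectrograms carry complementary information.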
5.3 Neural networks structure and classification results

We use five neural networks to solve the classification problem: three are CNN based and two are RNN based. Among the CNN based neural networks, the first two use only one kind of spectrogram (with a window length of either 64 or 128). The third one uses both spectrograms simultaneously and its accuracy is noticeably higher. Among the RNN based neural networks, bidirectional GRU (Gated Recurrent Unit) [24] and bidirectional LSTM (Long Short-term Memory) [24] are utilized. Among all the neural network models, the CNN that uses both spectrograms, called the Double input CNN, achieves the highest test accuracy (90.15%), followed by the bidirectional RNN with GRU cells (89.55%) and with LSTM cells (89.39%).

For all neural networks, batch normalization is used to reduce internal covariate shift [25], and dropout layers and early stopping are used to prevent overfitting. The Adadelta optimizer [26], a variant of stochastic gradient descent, is used. Generally, each model is trained for 200 to 700 epochs. The weights of the model are saved after an epoch only if the loss on the test set is reduced. The model with the highest test accuracy is selected to represent the best performance of the model.

5.3.1 CNN based models

5.3.1.1 Neural network structure

In this part, we start with the shallow CNN shown in figure 5.4, which has three convolution layers and one fully connected layer. Note that the batch normalization layers and dropout layers are omitted in the figure. This shallow model is able to achieve 97% test accuracy on the MNIST handwritten digit database [27], which has input and output shapes similar to those of our problem. Then, we extend this model with a second input in order to utilize both kinds of spectrograms (shown in figure 5.5). All the detailed model structures are listed in appendix B.

Figure 5.4: The structure of the small CNN unit.

Figure 5.5: The structure of the Double input CNN which is built by two small CNN units.

5.3.1.2 Performance

model              win. length   train acc.   test acc.   train loss   test loss   epochs
small CNN          64            0.8604       0.8727      0.3950       0.3318      240
small CNN          128           0.8543       0.8606      0.4267       0.3884      400
Double input CNN   both          0.9115       0.9015      0.2296       0.3112      170

Table 5.2: Performance comparison between small CNN unit and Double input CNN (12 class problem).

Table 5.2 shows the performance of the three CNN based models. In figure 5.6, the accuracy and loss of the Double input CNN are plotted. After about 200 epochs, the test loss and test accuracy stop improving while the performance on the training set keeps improving, which is a sign of overfitting. The model obtained after 170 epochs is therefore selected. The test accuracy of the Double input CNN is 3 to 4 percentage points higher than that of the two models that use only one type of spectrogram.

Figure 5.6: The plot of training history of Double input CNN.

5.3.1.3 Error analysis

The error analysis of the model is carried out by calculating the confusion matrix from the predictions on the test set. We also pick some mislabeled examples to show why the model makes mistakes. In the confusion matrix of the Double input CNN shown in figure 5.7, it is apparent that the per-class accuracy on the diagonal is either well above the average accuracy (90.15%) or far below it. For example, the Double input CNN is very likely to mistake the activity "walk_woa_side" for "walk_wa_side" or "standing_front".
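For readers who want to repeat this kind of error analysis, the following minimal sketch shows how a row-normalized confusion matrix and the overall accuracy can be computed from true and predicted labels. It uses plain Python and a small made-up example with three classes; it is an illustration of the technique, not the code used in this work.

```python
def confusion_counts(true_labels, predicted_labels, n_classes):
    """counts[t][p] = number of samples with true class t predicted as class p."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        counts[t][p] += 1
    return counts


def normalize_rows(counts):
    """Divide each row by its sum so that the diagonal holds per-class accuracy."""
    normalized = []
    for row in counts:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized


def overall_accuracy(counts):
    """Fraction of all samples that lie on the diagonal."""
    correct = sum(counts[i][i] for i in range(len(counts)))
    total = sum(sum(row) for row in counts)
    return correct / total


# Hypothetical toy data (three classes), for illustration only.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]

counts = confusion_counts(y_true, y_pred, n_classes=3)
matrix = normalize_rows(counts)
for row in matrix:
    print(" ".join(f"{value:.2f}" for value in row))
print("overall accuracy:", overall_accuracy(counts))
```

Each row of the normalized matrix sums to one, so an off-diagonal entry can be read directly as the fraction of a true class that is mislabeled as another class.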
True label       Predicted label (columns in the same class order as the rows)
walk_wa_front    0.73 0.25 0.00 0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
walk_wa_back     0.02 0.98 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
walk_wa_side     0.00 0.00 0.86 0.00 0.00 0.05 0.00 0.02 0.07 0.00 0.00 0.00
walk_woa_front   0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
walk_woa_back    0.00 0.00 0.02 0.02 0.77 0.04 0.00 0.08 0.00 0.00 0.00 0.06
walk_woa_side    0.00 0.00 0.12 0.00 0.02 0.66 0.00 0.02 0.03 0.12 0.02 0.02
boxing_front     0.00 0.00 0.00 0.00 0.00 0.00 0.99 0.01 0.00 0.00 0.00 0.00
boxing_back      0.00 0.00 0.03 0.00 0.04 0.08 0.03 0.77 0.00 0.03 0.03 0.00
boxing_side      0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
standing_front   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.98 0.00 0.02
standing_back    0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.02 0.98 0.00
standing_side    0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.05 0.00 0.95

Figure 5.7: Confusion matrix of Double input CNN (12-class problem).

Table 5.3 lists some commonly mislabeled classes with the error ratio taken from the confusion matrix.

True label       Classified as     Error ratio (%)
walk_wa_front    walk_wa_back      25
walk_woa_side    walk_wa_side      12
walk_woa_side    standing_front    12
boxing_back      walk_woa_side     8
walk_woa_back    boxing_back       8

Table 5.3: Some common mistakes made by Double input CNN.

We also summarize our findings as follows:

1. When the person is standing still, the model has a surprisingly high accuracy in detecting the direction despite the absence of Doppler frequency.
2. Among the activities other than standing still, the model is also able to correctly identify the activity when the person faces the radar with the front, except for walk_wa_front.
3. When the person faces the radar with the side and is not standing still, the model is prone to make mistakes, except for boxing_side. The reason is that the movement is mostly tangential, so the radial velocity is small and very little Doppler frequency can be observed.
4. When the person faces the radar with the back and is not standing still, the model has high accuracy only for walk_wa_back.

Next, we take a closer look at the mislabeled samples. On the left of figure 5.8, an example that belongs to "walk_wa_front" but is classified as "walk_wa_back" is shown (the most frequent error listed in table 5.3). On the right side, there are four correctly labeled examples from the test set. Because of the low frequency resolution and the very slight difference between the two activities, the micro-Doppler signatures look very similar. It may be difficult for the neural network to learn the minor difference given the limited number of training samples.

Figure 5.8: The example of a mislabeled sample. (Left panel: walk_wa_front classified as walk_wa_back; right panels: two correct walk_wa_front and two correct walk_wa_back examples; axes: t/s versus f/Hz.)

From the confusion matrix, we can infer what kinds of physical information the model makes use of. First, the model uses the variation in radial speed when the person faces the radar with the front. Second, the model uses the change in RCS.
Although "boxing_side" has little radial speed variation because the movement is perpendicular to the radial direction as viewed from the radar, the model correctly labels every example of this class in the test set. We examined some samples of "boxing_side" and they look very similar to those of standing still. Because the RCS changes during boxing, the model can still distinguish it from standing still.

5.3.1.4 The influence of DC component

The same Double input CNN structure is trained on spectrograms from which the DC component has not been subtracted. The model easily overfits the data: it achieves 90% training accuracy but only 85% test accuracy. Thus, suppression of the DC component improves the performance of the neural network significantly.

5.3.2 RNN based neural networks

RNN based neural networks are widely applied to sequential data such as speech and other types of sound. The micro-Doppler signature is represented by a spectrogram, which is essentially sequential data that describes how the spectrum of the signal varies with time. We test two RNN based neural networks and their structures are shown in figure 5.9. The dropout layers and batch normalization layers are omitted in this figure. The two networks have the same structure except for the RNN unit.

Figure 5.9: RNN based neural networks' structure. Input: spectrogram of window length 64 or 128; conv1d: 24 one dimensional convolutions with filter size 3; RNN unit: either GRU or LSTM unit with output size 24; avg: the forward output and backward output of bidirectional RNN units are averaged; Dense: two fully connected layers. Dashed lines represent recursive connection.

5.3.2.1 Performance

The accuracy of the RNN based neural networks, shown in table 5.4, is very close to that of the Double input CNN. However, the loss is higher than for the Double input CNN, which means that the RNN based neural networks have a lower confidence level.
We also implemented an RNN with two inputs in an attempt to benefit from both the high time resolution and the high frequency resolution, as in the Double input CNN, but the accuracy is only around 80%.

RNN unit   train acc.   test acc.   train loss   test loss   epochs
GRU        0.8816       0.8955      0.4112       0.4054      633
LSTM       0.8701       0.8939      0.4342       0.4259      466

Table 5.4: RNN based neural networks performance.

5.3.2.2 Error analysis

The confusion matrices of the RNN based neural networks are similar to the one of the Double input CNN. However, the most likely error made by the Double input CNN does not occur in the RNN models. The confusion matrices are shown in figure 5.10 and figure 5.11.

5.4 Predict only the activity (4-class problem)

In the previous discussion we tried to solve a 12-class problem, with around 300 training samples for each class. However, in some applications such as fall detection, people are only interested in predicting the type of activity (such as walking normally or falling down) rather than the aspect angle of the target. Therefore, we also change the size of the neural network's output layer to four, so that the output is the prediction of the activity type. In this case, the number of training samples increases to around 900 for each class.

We trained a small CNN unit and a Double input CNN to predict the activity type, but the accuracy does not increase significantly even though the training set size is tripled for each class. The loss and accuracy are listed in table 5.5. Comparing the result of the 12-class problem (table 5.2) with the result of the 4-class problem (table 5.5), we notice an improvement in the accuracy of the small CNN unit. However, the overall accuracy is still around 90%.
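The relabeling from 12 classes to 4 activities, and the resulting number of training samples per activity, can be sketched from table 5.1 as follows. The names `SAMPLES_PER_LABEL`, `ACTIVITY_NAMES` and `activity_of` are illustrative and not taken from the thesis code; the sketch relies on the observation that, in table 5.1, three consecutive labels always share the same activity, and it applies the 85% training fraction from section 5.2 to every class equally, which is an approximation because the split is random.

```python
# Number of samples per 12-class label, copied from table 5.1.
SAMPLES_PER_LABEL = {
    0: 350, 1: 350, 2: 350,     # walking with swinging arms
    3: 350, 4: 350, 5: 350,     # walking without swinging arms
    6: 450, 7: 450, 8: 350,     # boxing
    9: 350, 10: 350, 11: 350,   # standing still
}
ACTIVITY_NAMES = ["walk_wa", "walk_woa", "boxing", "standing"]


def activity_of(label):
    """Map a 12-class label (0-11) to its activity index (0-3) as in table 5.1."""
    return label // 3


samples_per_activity = [0] * len(ACTIVITY_NAMES)
for label, count in SAMPLES_PER_LABEL.items():
    samples_per_activity[activity_of(label)] += count

# Approximate training-set size per activity for an 85% / 15% split.
train_per_activity = [count * 85 // 100 for count in samples_per_activity]

for name, total, train in zip(ACTIVITY_NAMES, samples_per_activity, train_per_activity):
    print(f"{name:9s} total {total:4d}  training (approx.) {train:4d}")
print("all samples:", sum(samples_per_activity))
```

Under these assumptions three activities have about 890 training samples and boxing has about 1060, which is consistent with the figure of roughly 900 training samples per class quoted above.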
True label       Predicted label (columns in the same class order as the rows)
walk_wa_front    0.94 0.04 0.00 0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
walk_wa_back     0.16 0.84 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
walk_wa_side     0.00 0.00 0.70 0.00 0.09 0.11 0.00 0.05 0.05 0.00 0.00 0.00
walk_woa_front   0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
walk_woa_back    0.00 0.00 0.02 0.00 0.83 0.02 0.00 0.12 0.00 0.00 0.00 0.00
walk_woa_side    0.00 0.00 0.05 0.00 0.03 0.83 0.00 0.02 0.02 0.05 0.00 0.00
boxing_front     0.00 0.00 0.00 0.00 0.01 0.00 0.99 0.00 0.00 0.00 0.00 0.00
boxing_back      0.00 0.00 0.01 0.00 0.01 0.04 0.07 0.81 0.00 0.00 0.04 0.01
boxing_side      0.00 0.00 0.00 0.02 0.02 0.00 0.02 0.00 0.92 0.00 0.00 0.02
standing_front   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.05 0.00 0.93 0.00 0.02
standing_back    0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.04 0.00 0.00 0.96 0.00
standing_side    0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.04 0.00 0.96

Figure 5.10: Confusion matrix of bidirectional GRU model.

The confusion matrix of the Double input CNN on the 4-class problem is shown in figure 5.12. The accuracy in predicting the activity "standing_still" is much higher than for the other three activities, so the model still has bottlenecks in certain activities.

5.5 Tuning parameters of neural networks

We mentioned in section 5.3 that dropout layers and early stopping are used to prevent the model from overfitting. The goal of tuning the dropout factor and the number of epochs is to reduce the gap between the training accuracy and the test accuracy.
The reason is discussed below.

True label       Predicted label (columns in the same class order as the rows)
walk_wa_front    0.94 0.04 0.00 0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
walk_wa_back     0.07 0.93 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
walk_wa_side     0.00 0.00 0.84 0.00 0.00 0.09 0.00 0.00 0.07 0.00 0.00 0.00
walk_woa_front   0.00 0.00 0.00 0.98 0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00
walk_woa_back    0.00 0.00 0.02 0.00 0.85 0.04 0.00 0.08 0.00 0.00 0.00 0.00
walk_woa_side    0.00 0.00 0.15 0.00 0.03 0.69 0.00 0.02 0.03 0.07 0.00 0.00
boxing_front     0.00 0.00 0.00 0.00 0.00 0.00 0.99 0.01 0.00 0.00 0.00 0.00
boxing_back      0.00 0.00 0.01 0.00 0.03 0.07 0.05 0.76 0.01 0.03 0.04 0.00
boxing_side      0.00 0.00 0.04 0.00 0.00 0.00 0.00 0.02 0.92 0.00 0.00 0.02
standing_front   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.02 0.00 0.95 0.00 0.02
standing_back    0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00
standing_side    0.00 0.00 0.00 0.00 0.00 0.04 0.00 0.00 0.02 0.04 0.00 0.91

Figure 5.11: Confusion matrix of bidirectional LSTM model.

In a supervised machine learning problem, the Bayes optimal accuracy is the theoretical upper bound that any machine learning model can possibly achieve [28]. The Bayes optimal accuracy is based on the conditional probability density of the target (or label) given the observation. In our thesis, the target is the type of the activity and the observation is a spectrogram. However, in most real-world problems like ours, this conditional probability is unknown, so the upper bound on the accuracy is also unknown. In some problems, e.g. image classification, it is possible to use human-level accuracy as an estimate of the Bayes optimal accuracy. Usually
the training accuracy is higher than the test accuracy. A special case occurs if dropout layers are used. In some of our results (for example, the small CNN with a window length of 64 in table 5.5), the test accuracy is higher than the training accuracy because the dropout layers are active during training but disabled during the evaluation on the test set. Thus, should the dropout layers be disabled on the training set, the training accuracy would be higher.

Figure 5.12: Confusion matrix of Double input CNN (4-class problem).

model              win. length   train acc.   test acc.   train loss   test loss   epochs
small CNN          64            0.8963       0.9045      0.2749       0.2807      260
small CNN          128           0.9160       0.8984      0.2379       0.2573      240
Double input CNN   both          0.9147       0.9090      0.2313       0.2637      100

Table 5.5: Performance comparison between small CNN unit and Double input CNN (4-class problem).

A model underfits the data when the avoidable bias is high, while it overfits the data when the variance is high. In table 5.6, we list some examples of the status of a machine learning model. We also show an example of an overfitting model in figure 5.13.

human-level acc.: 0.91

model   train acc.   test acc.   bias   variance   status
1       0.80         0.80        high   low        underfitting
2       0.99         0.80        low    high       overfitting
3       0.90         0.89        low    low        acceptable

Table 5.6: Examples of the status of a machine learning model.

If the human-level accuracy were available, the goal would be to make the model's training accuracy approach or even surpass the human-level accuracy while keeping the variance small. However, the human-level accuracy is also unknown in this work. As a result, we first train some models with a rather large capacity (many layers and parameters) to obtain an overfitting model. Then, we use the highest test accuracy as a proxy for the Bayes optimal accuracy. We find that the training accuracy can easily approach 100% because the model overfits, but the test accuracy is always around 90% or lower. After that, we reduce the model capacity by removing
layers or reducing the size of layers until the variance is reduced to an acceptable level. Next, we test different dropout factors until we find a model with a training accuracy and a test accuracy that are both close to 90% regardless of how long it is trained. Finally, the training stops at the epoch at which the test accuracy is at its highest value. We mitigate problems associated with overfitting in this way.

Figure 5.13: Training history of an overfitting model based on Double input CNN when the data is divided into four different classes.

6 Discussion

6.1 Limitations

Although the trained model achieves a test accuracy of around 90%, many problems remain unsolved because of some limitations. In this section, we discuss these problems and their causes.

6.1.1 Overfitting

It is possible that overfitting occurs because the dataset is divided into only two sets, a training set and a test set. A better way to divide the dataset is to create yet another set, which is referred to as the development set. With three sets, we can train the neural network on the training set while observing the accuracy on the development set. The test set should not be seen by the neural network until it does well on the development set. This might avoid overfitting. However, the dataset only contains 4400 samples in total. Thus, we made a compromise such that the size of the training set is kept relatively large by not using any samples to create another set.

6.1.2 Not enough data

With more data, the neural network models may bring us closer to a better classification result. The performance of neural networks relies heavily on the amount of training data. A dataset of 4400 samples is rather small for training neural networks, which forces us to use shallow neural networks to avoid overfitting. Thus, it is worth trying to acquire more data in order to get higher accuracy.
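The three-way split suggested in section 6.1.1 could look like the sketch below. It is not part of the thesis experiments: the function `three_way_split` is hypothetical, and the 10% development and 10% test fractions are assumptions chosen only to illustrate how many training samples would remain out of the 4400 available.

```python
import random


def three_way_split(n_samples, dev_percent, test_percent, seed=0):
    """Shuffle sample indices and split them into train, development and test sets.

    ASSUMPTION: the percentages are illustrative; the thesis itself only
    uses an 85% / 15% split into a training set and a test set.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_test = n_samples * test_percent // 100
    n_dev = n_samples * dev_percent // 100
    test = indices[:n_test]
    dev = indices[n_test:n_test + n_dev]
    train = indices[n_test + n_dev:]
    return train, dev, test


train_idx, dev_idx, test_idx = three_way_split(4400, dev_percent=10, test_percent=10)
print(len(train_idx), len(dev_idx), len(test_idx))
```

With such a split, early stopping and the tuning of the dropout factor would be driven by the development set only, and the test set would be evaluated once at the end, at the cost of a smaller training set.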
6.1.3 Changes of the environment All the samples in the dataset are collected in the same environment. Several large metal objects are close to the measurement subject and it is possible for the radar to measure the second and third reflection from th