Acoustic Signal Analysis and Feature-
Based Classification of BOAS
For the Health and Welfare of Brachycephalic Dogs

Master’s thesis in Biomedical Engineering

JENNIE BERNDTSON

Department of Physics

CHALMERS UNIVERSITY OF TECHNOLOGY
Gothenburg, Sweden 2025
www.chalmers.se

www.chalmers.se


Master’s thesis 2025

Acoustic Signal Analysis and
Feature-Based Classification of BOAS

For the Health and Welfare of Brachycephalic Dogs

JENNIE BERNDTSON

Department of Physics
Division of Material Physics

Chalmers University of Technology
Gothenburg, Sweden 2025


Acoustic Signal Analysis and Feature-Based Classification of BOAS
For the Health and Welfare of Brachycephalic Dogs
JENNIE BERNDTSON

© JENNIE BERNDTSON, 2025.

Supervisor & Examiner: Magnus Karlsteen, Department of Physics

Master’s Thesis 2025
Department of Physics
Division of Material Physics
Chalmers University of Technology
SE-412 96 Gothenburg
Telephone +46 31 772 1000

Cover: A close-up of a French bulldog, CC0 1.0 [1].

Typeset in LATEX, template by Kyriaki Antoniadou-Plytaria
Printed by Chalmers Reproservice
Gothenburg, Sweden 2025

iv

https://creativecommons.org/publicdomain/zero/1.0/deed.en


Acoustic Signal Analysis and Feature-Based Classification of BOAS
For the Health and Welfare of Brachycephalic Dogs
JENNIE BERNDTSON
Department of Physics
Chalmers University of Technology

Abstract
This thesis examines a feature-based approach for classifying Brachycephalic Ob-
structive Airway Syndrome (BOAS) in dogs using acoustic signal analysis and ma-
chine learning. Audio recordings of dogs breathing, collected both before and after
physical exercise, were preprocessed through normalization, filtering, and data aug-
mentation techniques to enhance signal quality. Features were extracted using the
openSMILE toolkit and refined through statistical tests, notably the Mann-Whitney
U-test, to identify those most indicative of BOAS severity. Two modeling strate-
gies were employed: separate classifiers for pre- and post-exercise recordings and
a hybrid model that incorporates both. The hybrid model, trained using decision
tree-based methods including Random Forest and XGBoost, demonstrated superior
performance, achieving an AUC of 1.0 and an average prediction confidence of 88.5%
when evaluated on an unseen dataset of five dogs. Although more data is needed
to ensure the model’s reliability and generalization to unseen data, these findings
highlight the potential of a feature-based tool as a practical and accessible option
for BOAS classification, thereby improving the health and welfare of brachycephalic
dogs.

v


Acknowledgments
First and foremost, I would like to thank Johan Thorell, a Licensed Veterinarian at
Hallands Djursjukhus Slöinge, for allowing me to participate in his functional grading
tests with four French Bulldogs. I also appreciate Gunilla Mattsson, Clinic Manager,
and Henrik Hedberg, Licensed Veterinarian at Viskadalens Djurklinik Evidensia, for
allowing me to participate in their grading test of a pug. Also, a big thanks to the
dog owners for allowing me to collect the data. This project would not have been
the same without your willingness to contribute to this research.

Thank you to my supervisor and examiner, Magnus Karlsteen, for all the support,
helpful advice, and encouragement throughout the project. Also, thanks to Tim
Pagrell for being such a great sounding board while working on a similar thesis.

Special thanks to Isabella Sykkö for her support when starting the project and her
earlier work in gathering data, which provided a great starting point for this project.
Finally, thanks to Maria Dimopoulou at SLU for your important work in improving
the health of brachycephalic dogs, and for your help in gathering the dogs and data
that made this project possible.

Jennie Berndtson, Gothenburg, June 2025

vii


viii


Definitions & Acronyms

BOAS Brachycephalic Obstructive Airway Syndrome
Brachycephalic Means shortened head, used to describe dog breeds with a flat face
RFG-Scheme Respiratory Function Grading Scheme
BOAS-negative When the result from the RFG-Scheme is 0 or 1
BOAS-positive When the result from the RFG-Scheme is 2 or 3
ROC Receiver Operating Characteristic
AUC Area Under the ROC
FFT Fast Fourier Transform
RMS Root Mean Square
openSMILE A toolkit used for extracting audio features
XGBoost Extreme Gradient Boosting

ix


x


Contents

Definitions & Acronyms viii

List of Figures xiii

List of Tables xv

1 Introduction 1
1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Aim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.5 Research Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2 Theory 5
2.1 Signal Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2.1.1 Normalization . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.2 Data Augmentation . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.3 Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.2 Feature Extraction Using OpenSMILE . . . . . . . . . . . . . . . . . 6
2.3 Statistical Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.3.1 Pearson and Spearman . . . . . . . . . . . . . . . . . . . . . . 6
2.3.2 Mann-Whitney U-test . . . . . . . . . . . . . . . . . . . . . . 7

2.4 Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.4.1 Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.4.2 Evaluation Metrics . . . . . . . . . . . . . . . . . . . . . . . . 8
2.4.3 Cross-validation . . . . . . . . . . . . . . . . . . . . . . . . . . 9

3 Methods 11
3.1 Audio Recording . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.2 Data Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

3.2.1 Normalization . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.2.2 Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.2.3 Data Augmentation . . . . . . . . . . . . . . . . . . . . . . . . 14

3.3 Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.4 Statistical Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.5 Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

3.5.1 Model Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

xi


Contents

3.5.2 Training and Evaluating the Classifier . . . . . . . . . . . . . 17

4 Results and Analysis 19
4.1 Audio Recording . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.2 Signal Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

4.2.1 Normalization . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.2.2 Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.2.3 Data Augmentation . . . . . . . . . . . . . . . . . . . . . . . . 25

4.3 Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.4 Statistical Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.5 Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

5 Discussion 31
5.1 Answering the Research Questions . . . . . . . . . . . . . . . . . . . . 31
5.2 Model Performance and Limitations . . . . . . . . . . . . . . . . . . . 33
5.3 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

6 Conclusion 35

References 37

A Results of Model Setup 1 I

B Results of Model Setup 2 V

xii


List of Figures

2.1 Architecture of a Random Forest Model. . . . . . . . . . . . . . . . . 8
2.2 The ROC curve (blue line) shows the trade-off between TPR and

FPR. The shaded area represents the AUC. . . . . . . . . . . . . . . 9
2.3 Illustration of k-fold cross-validation. . . . . . . . . . . . . . . . . . . 9

3.1 Pipeline of the methodology. . . . . . . . . . . . . . . . . . . . . . . . 11
3.2 Processes when performing model Setup 1 and Setup 2. . . . . . . . . 16

4.1 Time-domain plots of audio recordings of a dog’s breathing, showing
the original (blue) and Peak normalized (orange) signals. Plot (a)
shows quiet breathing, while plot (b) shows loud breathing. . . . . . . 21

4.2 Time-domain plots of audio recordings of a dog’s breathing, showing
the original (blue) and RMS normalized (orange) signals. Plot (a)
shows quiet breathing, while plot (b) shows loud breathing. . . . . . . 21

4.3 Frequency-domain plot of a single breath recorded (a) before exercise
and (b) after exercise. . . . . . . . . . . . . . . . . . . . . . . . . . . 22

4.4 Frequency-domain plot of a recorded (a) wind noise and (b) a door
opening in the background. . . . . . . . . . . . . . . . . . . . . . . . 22

4.5 Frequency-domain plot of a (a) single beep tone and (b) the original
recording where the beep occurs. . . . . . . . . . . . . . . . . . . . . 23

4.6 Frequency-domain plot of the same sound recorded during the same
time but with two different phones. . . . . . . . . . . . . . . . . . . . 24

4.7 Frequency-domain plot of a simultaneous recording using (a) OnePlus
and (b) Samsung, with the estimated cutoff frequencies for both the
high-pass and low-pass filters marked by red dotted lines. . . . . . . . 25

4.8 Representation of Jitter and Shimmer perturbation measures in a
speech signal [44] CC BY-NC-ND 3.0. . . . . . . . . . . . . . . . . . . 27

xiii

https://creativecommons.org/licenses/by-nc-nd/3.0/


List of Figures

xiv


List of Tables

3.1 Distribution of dogs by BOAS grade in the first dataset. . . . . . . . 12

4.1 Distribution of dogs by BOAS grade in the new dataset. . . . . . . . 20
4.2 Results of training a Random Forest classifier using cross-validation,

applied to pre- and post-exercise data under three preprocessing set-
tings. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

4.3 Distribution of openSMILE features among 100 Mann-Whitney fea-
tures for non-preprocessed data. . . . . . . . . . . . . . . . . . . . . . 26

4.4 Classification results of the OnePlus recordings preprocessed using
data augmentation, high-pass filtering, and low-pass filtering, and
classified using XGBoost. . . . . . . . . . . . . . . . . . . . . . . . . . 28

4.5 Classification results of the Samsung recordings preprocessed using
RMS normalization, data augmentation, high-pass filtering, and low-
pass filtering, and classified using XGBoost. . . . . . . . . . . . . . . 29

A.1 Model performance for different preprocessing settings of Pre- and
Post-exercise data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . I

A.2 Pre-exercise recordings without preprocessing . . . . . . . . . . . . II
A.3 Post-exercise recordings without preprocessing . . . . . . . . . . . . II
A.4 The mean of the predicted probability of the correct class for the

pre-exercise and post-exercise models. . . . . . . . . . . . . . . . . . . II
A.5 High-pass filtered pre-exercise recordings with an estimated cutoff

threshold of 5%. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . II
A.6 High-pass filtered post-exercise recordings with an estimated cutoff

threshold of 5%. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . II
A.7 The mean of the predicted probability of the correct class for the

pre-exercise and post-exercise models. . . . . . . . . . . . . . . . . . . III

B.1 Model performance for different preprocessing settings. . . . . . . . . V
B.2 Original data without preprocessing. . . . . . . . . . . . . . . . . . . V
B.3 RMS normalized signals. . . . . . . . . . . . . . . . . . . . . . . . . . V
B.4 High-pass filtered with an estimated cutoff threshold of 5%. . . . . . VI
B.5 High-pass filtered with an estimated cutoff threshold of 10%. . . . . . VI
B.6 High-pass filtered with an estimated cutoff threshold of 15%. . . . . . VI
B.7 High-pass filtered with an estimated cutoff threshold of 20%. . . . . . VI
B.8 RMS normalized and high-pass filtered with an estimated cutoff thresh-

old of 15%. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . VI

xv


List of Tables

B.9 Data augmentation by randomly amplifying and reducing the ampli-
tude with 10% and high-pass filtered with an estimated cutoff thresh-
old of 15%. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . VII

B.10 Data augmentation by randomly amplifying and reducing the ampli-
tude with 10% and high-pass filtered with an estimated cutoff thresh-
old of 15% and RMS normalized. . . . . . . . . . . . . . . . . . . . . VII

B.11 Data augmentation by randomly amplifying the amplitude with 15%
and reducing the amplitude with 5% and high-pass filtered with an
estimated cutoff threshold of 10% and RMS normalized. . . . . . . . VII

B.12 Data augmentation by randomly amplifying the amplitude with 15%
and reducing the amplitude with 5% and high-pass filtered with an
estimated cutoff threshold of 15% and RMS normalized. . . . . . . . VII

B.13 High-pass filtered with an estimated cutoff threshold of 15% . . . . . VIII
B.14 High-pass filtered with an estimated cutoff threshold of 15% and RMS

normalized . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . VIII
B.15 Data augmentation by randomly amplifying and reducing the ampli-

tude with 10% and high-pass filtered with an estimated cutoff thresh-
old of 15% and RMS normalized . . . . . . . . . . . . . . . . . . . . . VIII

B.16 Data augmentation by randomly amplifying the amplitude with 15%
and reducing the amplitude with 5% and high-pass filtered with an
estimated cutoff threshold of 15% and RMS normalized . . . . . . . . VIII

B.17 Classification of the OnePlus recordings. . . . . . . . . . . . . . . . . IX
B.18 Classification of the Samsung recordings. . . . . . . . . . . . . . . . . IX
B.19 Classification of the OnePlus recordings. . . . . . . . . . . . . . . . . IX
B.20 Classification of the Samsung recordings. . . . . . . . . . . . . . . . . IX
B.21 Classification of the OnePlus recordings. . . . . . . . . . . . . . . . . X
B.22 Classification of the Samsung recordings. . . . . . . . . . . . . . . . . X
B.23 Classification of the OnePlus recordings. . . . . . . . . . . . . . . . . X
B.24 Classification of the Samsung recordings. . . . . . . . . . . . . . . . . X
B.25 Classification of the OnePlus recordings. . . . . . . . . . . . . . . . . XI
B.26 Classification of the Samsung recordings. . . . . . . . . . . . . . . . . XI

xvi


1
Introduction

Brachycephalic obstructive airway syndrome (BOAS) is a chronic and lifelong patho-
logical condition that significantly impairs breathing and reduces the quality of life
in many popular dog breeds [2]. BOAS primarily affects brachycephalic breeds,
those with characteristically shortened skulls, such as Bulldogs, French Bulldogs,
Pugs, and Boston Terriers [3], [4]. These breeds are increasingly popular among dog
owners, particularly in Western countries.

Several reports have documented a marked rise in the popularity of brachycephalic
breeds over recent years. For example, a UK pet insurance company reported that
registrations of French Bulldogs alone have increased by more than 500% over the
past decade [5]. As the demand for these breeds increases, so does the number of
affected dogs, resulting in a corresponding rise in cases of BOAS and associated
health complications. This highlights the urgent need for accessible and reliable
methods to assess the severity of this condition.

To address this, the University of Cambridge has developed a Respiratory Function
Grading Scheme (RFG-Scheme) in which a dog undergoes a comprehensive veteri-
nary examination [6]. The evaluation includes an auditory assessment of the dog’s
breathing before and after a three-minute trotting exercise. Based on the findings,
the dog is assigned a severity grade ranging from 0 to 3. This grading system has
proven effective in identifying the severity of BOAS and guiding treatment decisions.

However, when this project was proposed approximately five years ago, the Cam-
bridge Functional Grading Test faced a significant accessibility barrier. At that
time, only veterinarians who had completed a certification course at the University
of Cambridge were authorized to conduct the test. This requirement meant that
very few professionals in each country were qualified to perform the assessment,
thereby limiting its practical use on a broader scale. Also, as of early 2025, the
Swedish Kennel Club requires all brachycephalic dogs to pass this test to be eligible
for breeding, further increasing the demand for accessible testing solutions [7]. This
presented a clear gap in diagnosing BOAS.

This project aims to address this gap by developing a classification system for BOAS-
related breathing sounds in dogs, modeled on the principles of the Cambridge grad-
ing scale but designed to operate independently of a certified veterinarian. By in-
creasing accessibility to BOAS assessment, this approach could contribute to earlier
diagnosis and, ultimately, improved welfare for brachycephalic dogs.

1


1. Introduction

The expected outcome of this project is a feature set that effectively represents the
relevant characteristics of the breathing data, enabling a model to classify dogs as
BOAS-negative or BOAS-positive with high probability and reliability.

1.1 Background
In some moderately affected dogs, clinical signs of BOAS may not be apparent while
the dog is at rest. To more accurately assess the condition, the functional grading
test is used [6]. This test provides a standardized evaluation of the dog’s respiratory
function under light physical stress.

The procedure begins with the veterinarian listening to the dog’s breathing using
a stethoscope while the dog is at rest. The dog is then assigned to perform a light
exercise, typically a trotting task intended to cover 400 meters in three minutes,
which corresponds to a light run. Immediately following the exercise, the veteri-
narian listens again to the dog’s breathing. Additionally, the veterinarian observes
the dog’s nostrils for signs of stenosis (narrowing). However, the appearance of the
nostrils is not included in the final grading. According to Maria Dimopoulou, a
doctoral candidate and clinical veterinarian at the Department of Clinical Sciences
at the Swedish University of Agricultural Sciences (SLU), the nostrils may appear
narrower when a dog is stressed, such as during a clinical evaluation, making the
nostrils an unreliable indicator in this context.

As mentioned, the dog’s condition is then graded from 0-3 in the RFG-Scheme, and
the Department of Veterinary Medicine at Cambridge University defines the BOAS
grades as [6]:

• Grade 0 - BOAS free; annual health check is suggested if the dog is under 2
years old.

• Grade 1 - clinically unaffected but with mild respiratory signs, an annual health
check is suggested if the dog is under 3 years old.

• Grade 2 - BOAS affected with moderate respiratory signs. The dog has a clin-
ically relevant disease and requires management, including weight loss and/or
surgical intervention.

• Grade 3 - BOAS affected with severe respiratory signs. The dog should undergo
a thorough veterinary examination, including possible surgical intervention.

Where grades 0 and 1 are considered BOAS-negative, and grades 2 and 3 are BOAS-
positive.

Dogs showing symptoms should undergo a comprehensive veterinary evaluation,
which may include surgical intervention [8]. However, diagnosing and grading BOAS
can be challenging due to limited access to specialized veterinarians and the sub-
jective nature of assessments, which depend heavily on the veterinarian’s expertise
and experience. Therefore, a classification model is needed to provide veterinarians
with valuable support in their decision-making process.

2


1. Introduction

1.2 Aim
This project aims to collect audio data from brachycephalic dogs and apply signal
processing techniques to extract features that strongly correlate with the breathing
characteristics of phone recordings from BOAS-negative and BOAS-positive dogs.
These features will then be used to train a classification model to determine whether
a dog is BOAS-negative or BOAS-positive.

1.3 Related Work
There have been two previous master’s theses focused on classifying BOAS using
machine learning [9], [10]. Both projects aimed to develop models based on spec-
trogram representations of audio signals using a shared dataset recorded with a
dictaphone and a digital stethoscope. However, neither study applied extensive
signal processing to enhance the signal characteristics, and they trained separate
models for pre- and post-exercise recordings. Both reported poor performance on
the pre-exercise data. To address this limitation, this thesis examines whether a
hybrid model can outperform separate models, particularly for the challenging pre-
exercise data. Both also reported problems with overfitting, which is common when
having small datasets. Therefore, decision tree classifiers are used as they reduce
the risk of overfitting [11].

Building on these theses, researchers from Chalmers and SLU investigated the use of
signal analysis for BOAS severity assessment [12]. Using digital stethoscope record-
ings, they extracted seven features from frequency-transformed audio segments and
evaluated them with ANOVA and ROC curves. Their work emphasized feature-
based analysis rather than model complexity. This thesis builds on that direction
by focusing on signal preprocessing and feature extraction to identify parameters
that better capture the distinctions between BOAS-negative and BOAS-positive
dogs. Unlike the earlier theses, spectrograms will not be used.

More recently, Isabella Sykkö at Chalmers introduced the use of smartphones for
audio recording, replacing the earlier hardware tools. The main part of the dataset
used in this thesis was collected during her collaboration with Dimopoulou at SLU.
Sykkö applied basic feature extraction and simple models, such as logistic regression,
and developed a prototype smartphone app for practical use. However, neither her
features nor the app are included in this work.

Another related scientific article used the openSMILE toolkit for audio feature ex-
traction [13]. OpenSMILE will also be the primary tool for feature extraction in
this thesis.

Finally, a closely related work is Tim Pagrell’s master’s thesis, where he aims to
develop a similar model [14]. However, in addition to a feature-based approach, he
also investigates the use of spectrograms to represent the data, a method similar to
that employed in previous master’s thesis works. He is also investigating the most
appropriate machine learning model for the intended task.

3


1. Introduction

1.4 Limitations
This project does not aim to compare different machine learning models in depth.
However, it will include the use of two decision tree-based classifiers and a brief
discussion of other models that may be suitable for this type of data.

The dataset used in this thesis will not include audio recordings from earlier theses,
as those were not recorded using smartphones. Instead, the primary dataset will
consist of recordings collected during Sykkö’s work, supplemented by a small number
of new recordings gathered during this project.

The prototype app developed in earlier work will not be used or further developed
as part of this thesis.

1.5 Research Questions
List of questions that the thesis will answer:

• What signal processing methods enhance the breathing characteristics of a
phone recording of a BOAS-negative and BOAS-positive dog?

• Which features show the strongest correlation with BOAS-negative and BOAS-
positive dog samples?

• Is it more suitable to use separate models for pre- and post-exercise recordings
for the intended task, or a hybrid model that incorporates both?

• Do the audio recordings differ between phones?

4


2
Theory

This section provides the theoretical background for the concepts and methods em-
ployed throughout the thesis. It covers signal processing tools, the feature extrac-
tion method, statistical tests, and relevant information on the machine learning
techniques applied in the study.

2.1 Signal Processing
This section provides information on normalization methods, filters, and data aug-
mentation.

2.1.1 Normalization
Normalization of audio refers to applying a constant audio gain to an audio record-
ing by altering the signal amplitude for all values in the signal [15]. This can be
performed through Peak normalization or Loudness normalization.

Peak normalization involves scaling the signal based on its loudest point (peak)
[15]. A standard method, which many people associate with normalization, involves
adjusting the audio so that its maximum absolute value equals 1. Peak normalization
does not account for the loudness of the signal, which varies with frequency and
duration.

Loudness normalization adjusts the average loudness of the signal to a target level
by adjusting the gain [15]. One approach to estimating the average loudness of
the signal is to determine its average power, such as the root mean square (RMS)
amplitude. Using the RMS amplitude, the signal can be scaled to a desired level.

2.1.2 Data Augmentation
To address variability in loudness caused by differences in measurement distance and
the recording properties of various phones, this project also applies data augmenta-
tion. Data augmentation is a widely used technique in machine learning, particularly
when working with small datasets [16]. It involves a range of methods to expand
the dataset and enhance the diversity of the data, which can help reduce overfitting
and improve the model’s performance. This project uses data augmentation to sim-
ulate the variability introduced by recording with different phone devices. Thus, it

5


2. Theory

serves more as an alternative to normalization than a strategy for improving model
performance.

2.1.3 Filtering
The filters used to preprocess the audio signals are frequency-selective, specifically
high-pass and low-pass filters. A high-pass filter enhances high frequencies by filter-
ing out lower frequencies, while a low-pass filter enhances low frequencies by filtering
out higher ones [17]. In this project, Butterworth low-pass and high-pass filters are
used. The Butterworth filter is well-suited for this purpose because it has no ripple
in either the passband or the stopband, which is a desirable property for frequency-
selective filtering [17], [18]. It, therefore, preserves the desired parts of the signal
without distortion and cleanly removes the undesired frequencies.

2.2 Feature Extraction Using OpenSMILE
OpenSMILE is a toolkit for audio feature extraction designed for applications in
speech, music, and general sound recognition [19], [20]. It processes raw audio sig-
nals and extracts a wide range of features. This project uses the ComParE 2016
feature set from the openSMILE library. This set extracts 6 373 features derived
from both the time and frequency domains, including, for example, signal energy,
loudness, Mel-frequency cepstral coefficients (MFCCs), pitch, and voice quality [20].
A comprehensive description of all features is available in the openSMILE docu-
mentation [20]. OpenSMILE provides an extensive, high-dimensional feature set.
Therefore, feature reduction is necessary to identify the most relevant features for
this type of audio [21]. Feature reduction is performed using a statistical method
described in the following section.

2.3 Statistical Tests
Statistical tests can be used to evaluate the extracted features and determine whether
variables are correlated. This section provides an overview of the main statistical
tests applied in this thesis.

2.3.1 Pearson and Spearman
The Pearson product-moment correlation coefficient measures the strength and di-
rection of the linear relationship between two variables [22]. The correlation coef-
ficient ranges between -1 and 1, where values closer to 1 indicate a strong positive
linear relationship, values closer to -1 indicate a strong negative linear relationship,
and values near 0 suggest little to no linear relationship. The Pearson correlation
coefficient, therefore, describes how closely the data points align with an imaginary
straight line. The closer the points are to this line, the closer the coefficient is to
±1, and the stronger the linear correlation between the two variables.

6


2. Theory

The Spearman’s rank-order correlation coefficient is a non-parametric counterpart to
the Pearson product-moment correlation coefficient [23]. Like Pearson’s, it measures
the strength and direction of the relationship between two variables, but instead of
focusing on linear relationships, it captures monotonic relationships. A monotonic
relationship means that as the value of one variable increases or decreases, the value
of the other tends to do the same, though not necessarily at a constant rate. The
resulting coefficient, like Pearson’s, ranges between -1 and 1 and is interpreted in
the same way; values closer to ±1 indicate a stronger monotonic relationship, while
values closer to 0 indicate little to no monotonic relationship.

2.3.2 Mann-Whitney U-test
The Mann–Whitney U-test is a non-parametric statistical test used to compare two
independent samples or groups [24], [25]. It is beneficial for assessing differences
between groups when the data are continuous but not normally distributed [24].
Often described as the non-parametric equivalent of the t-test, it does not assume
normality and instead relies on the ranks of the data rather than their raw values.
All observations from both groups are combined and ranked, and the test assesses
whether the sum of ranks differs significantly between the two groups.

The null hypothesis of the Mann–Whitney U-test states that the two populations
are equal, meaning there is no difference in the distribution of ranks between the
samples. To evaluate whether this hypothesis can be rejected, the p-value is calcu-
lated. If the p-value exceeds the significance level (typically set at 0.05), we do not
reject the null hypothesis [25]. This analysis aims to identify features that differ sig-
nificantly between the two groups: BOAS-negative and BOAS-positive. Therefore,
we focus on features with p-values less than 0.05.

2.4 Machine Learning
This section includes information about the machine learning models and evaluation
metrics used for the classification.

2.4.1 Models
A Random Forest classifier consists of multiple individual decision trees, as illus-
trated in Figure 2.1 [11]. Each decision tree is trained on a random subset of the
dataset and predicts the class label, in this case, either Class 0 (BOAS-negative) or
Class 1 (BOAS-positive). The final classification is obtained by taking the majority
vote of the predictions from each decision tree. Additionally, the Random Forest
classifier provides a probability estimate for each class, calculated as the proportion
of trees that predicted that class out of the total number of trees. This probability
reflects the model’s confidence in its prediction.

For example, suppose decision tree 1 predicts Class 0, while decision trees 2 and 3
predict Class 1. Since the majority of the trees (2 out of 3) predict Class 1, the final
predicted class will be Class 1. The prediction probability for Class 1, in this case,

7


2. Theory

is 2/3. The Random Forest classifier, therefore, improves predictive accuracy and
reduces the risk of overfitting [11].

Decision Tree 1 Decision Tree 2 Decision Tree 3

Dataset

Prediction 1 Prediction 2 Prediction 3

Average Prediction

Figure 2.1: Architecture of a Random Forest Model.

Another model tested for classification is Extreme Gradient Boosting (XGBoost).
XGBoost is a learning algorithm based on gradient boosting [26]. Unlike Random
Forest, which builds trees independently in parallel, gradient boosting allows XG-
Boost to build decision trees sequentially, where each new tree tries to correct the
errors made by the previous ones [27]. This gradient-boosting approach allows XG-
Boost to achieve high predictive performance.

2.4.2 Evaluation Metrics
The Receiver Operating Characteristics (ROC) plot is a popular measure used
for evaluating the performance of a classifier, especially in medical classification
problems [28], [29]. The ROC curve is a graphical plot with the False Positive Rate
(FPR) on the x-axis and the True Positive Rate (TPR) on the y-axis, as shown in
Figure 2.2. The FPR represents the proportion of negative observations that are
incorrectly classified as positive [29]. Similarly, the TPR represents the proportion
of positive observations that are correctly classified.

Given the ROC curve, the area under the curve (AUC) can be derived, as shown
by the blue area in Figure 2.2. AUC is a useful tool for differentiating between
classifiers, as it summarizes each classifier’s performance into a single measure [30].
An AUC of approximately 0.5 indicates that the model has no class separation
capacity, while a value of 1.0 means that the model has perfectly differentiated
between the classes, with no false positives or false negatives [30], [31]. If an AUC
of 1.0 cannot be achieved, an AUC above 0.8 is often considered acceptable [32].

8


2. Theory

ROC

AUC

TPR

FPR

1

0
0 1

Figure 2.2: The ROC curve (blue line) shows the trade-off between TPR and
FPR. The shaded area represents the AUC.

Accuracy is used to evaluate the classifier when using cross-validation. Classifier
accuracy is determined by:

Accuracy = Number of Correct Predictions
Total Number of Predictions .

Prediction probability represents how confident the Random Forest classifier is in
its decision, based on the averaged outputs from all decision trees. This probability
is used as the evaluation metric when classifying the new dataset, providing not only
a predicted class but also a measure of certainty behind that prediction.

2.4.3 Cross-validation
Cross-validation is a technique used in machine learning to obtain a more reliable
estimate of a model’s performance. In traditional approaches, the dataset is ran-
domly divided into separate training and test subsets. In k-fold cross-validation,
the dataset is split into k equal-sized folds [33]. The model is trained k times. For
training, it uses k − 1 folds, with the remaining fold used for testing. This process
ensures that every data point is used for training and evaluation. An illustration of
this procedure is shown in Figure 2.3.

Test Train Train Train Train

Train Test Train Train Train

Train Train Test Train Train

Train Train Train TrainTest

TestTrain Train Train Train

Dataset

Fold 1 Fold 2 Fold 3 Fold 4 Fold 5

Split 1

Split 2

Split 3

Split 4

Split 5

Performance

Performance

Performance

Performance

Performance

Average
Performance

Figure 2.3: Illustration of k-fold cross-validation.

9


2. Theory

In this project, Stratified k-fold cross-validation is used, as the dataset is imbalanced,
with more BOAS-negative dogs than BOAS-positive dogs. Stratification ensures
that each fold maintains the same class distribution as the whole dataset, which helps
produce more consistent and representative performance estimates, particularly for
classification tasks involving class imbalance [34].

10


3
Methods

This thesis aimed to develop a model capable of classifying whether a dog is BOAS-
negative or BOAS-positive based on two audio recordings of its breathing. The
problem was addressed using the pipeline illustrated in Figure 3.1 to identify fea-
tures that effectively capture the differences between BOAS-negative and BOAS-
positive audio signals. The methodology consisted of five main stages: data collec-
tion, preprocessing, feature extraction, feature reduction using statistical methods,
and classification-based evaluation.

Collection of
audio recordings

Preprocessing the
data

Extracting
features

Finding the best
features using

statistical methods

Evaluating
features using a

classifier

Normalization Filtering

Figure 3.1: Pipeline of the methodology.

The first step involved collecting audio recordings. Although a set of recordings
from previous studies was available, additional data were needed to test the model’s
performance in a real-world setting and to enhance the robustness of the evaluation.
Therefore, new recordings were collected from a group of dogs to supplement the
existing dataset.

In the preprocessing step, raw audio recordings are refined through normalization
and filtering techniques to enhance the signal quality and ensure consistency. Next,
in the feature extraction stage, a broad set of acoustic features was extracted from
the preprocessed signals. To identify the most informative features, statistical meth-
ods were then applied, narrowing down the feature set to those most relevant for
distinguishing between BOAS-negative and BOAS-positive cases. Finally, these se-
lected features were evaluated using a classifier, providing insight into the quality
and discriminative ability of the features.

Since the final evaluation depended on the classification performance, it was essential
to carry out all preceding steps, including preprocessing, to assess the impact of each
methodological choice, such as different normalization and filtering strategies, on the
overall performance.

11


3. Methods

The final code is available on GitHub, including all functions and files mentioned in
the following sections.

3.1 Audio Recording
The first dataset consisted of 85 audio recordings collected from 31 dogs representing
various brachycephalic breeds. Table 3.1 shows the distribution of the dogs along
with their corresponding BOAS grades. Several dogs had multiple recordings, in-
cluding both pre- and post-exercise samples. The recordings were captured using
three different Android phones: OnePlus A5000, Samsung A5, and Samsung S10.
All dogs included in the dataset were over one year of age and free from respiratory
diseases other than BOAS.

Table 3.1: Distribution of dogs by BOAS grade in the first dataset.

BOAS Functional Grading Number of dogs
Grade 0 6
Grade 1 14
Grade 2 10
Grade 3 1
Total 31

To improve the model’s performance, new data were collected to form a smaller
supplementary dataset. A total of five dogs were recorded, comprising one Pug and
four French Bulldogs. The Pug was recorded at Viskadalens Djurklinik Evidensia,
while the four Bulldogs were recorded at Hallands Djursjukhus Slöinge. Recordings
were taken both before and after the 3-minute exercise test. Airway evaluations
were conducted by authorized veterinarians using the validated RFG scheme. The
audio was recorded simultaneously using two different mobile phones, a Samsung
Galaxy Note 20 Ultra and a OnePlus A5000. Before all recordings, the dog owners
signed a written informed consent allowing us to process the data.

The file names of each recording in the dataset, along with its BOAS grade and
a definition of whether it was a pre- or post-exercise recording, were saved in a
Metadata CSV file to easily extract the desired files using the Python library Pandas
[35].

3.2 Data Preprocessing
To begin processing the audio recordings, they were first imported into Python using
the Soundfile module, which reads audio files and returns both the audio data and
its corresponding sample rate [36]. An if-statement was added to ensure that if
the audio is recorded in a two-channel configuration, the code processes only one
channel. Based on the sample rate information, a time axis was then constructed,
allowing the audio recordings to be represented as time series data for subsequent
preprocessing steps.

12

https://github.com/TimPagrell/Masters_thesis_BOAS/tree/main/BOAS%20Jennie


3. Methods

The new data that was recorded using the Samsung was initially stored in the M4A
format, whereas the new OnePlus and earlier recordings were in WAV format. The
M4A files were therefore converted to WAV format to ensure consistency across the
dataset before further processing.

3.2.1 Normalization
To account for variabilities between audio recordings, such as differences in the
distance between the microphone and the dog during recording, normalization was
applied to the signals. Two normalization approaches were tested and evaluated to
determine the most suitable method for this data.

The first normalization approach was peak normalization. When performing this,
the signals were scaled based on their maximum amplitude so that the waveform
was within the range of -1 to 1. The audio is loaded, and Peak normalized using
the function read_and_normalize_file.

Another normalization method was Loudness normalization, in this case, RMS nor-
malization. In this approach, each signal was scaled by a factor calculated as
the signal’s RMS value plus 0.01 to avoid scaling the signal by zero depending
on the RMS value. The audio is loaded, and RMS normalized using the function
load_and_RMS_normalize.

Another approach tested for addressing the variability between audio recordings was
data augmentation, which is described in Section 3.2.3.

The model outputs were analyzed after each technique was applied to evaluate the
performance of the normalization methods. Plots were generated to investigate
further the differences between peak normalization and loudness (RMS) normal-
ization, showing both the original signal and its corresponding normalized signal.
These plots were created for two representative cases: one dog with BOAS grade
0 and very quiet breathing and another with BOAS grade 2 exhibiting noticeably
loud breathing in the audio recordings. This analysis was performed because it is
essential that the model can accurately differentiate between the breathing sounds
of BOAS-negative and BOAS-positive dogs.

3.2.2 Filtering
The audio signals were filtered using high-pass and low-pass Butterworth filters to
reduce background noise and other disturbances introduced by phone recordings.
The high-pass filter was applied to remove low-frequency components, such as wind
noise, that are unlikely to contribute useful information for classification. Simi-
larly, the low-pass filter was used to eliminate high-frequency components, such as
background sounds, which were irrelevant for distinguishing breathing patterns.

To determine appropriate cutoff frequencies for the filters, several representative
recordings were cropped to isolate specific sounds. This was performed in file
cropped_data.py. These included breaths recorded before and after exercise, non-
breathing noises such as wind noise from the dog’s exhalation into the micro-

13


3. Methods

phone, and background sounds like a door opening. Each sound segment was ana-
lyzed in the frequency domain using the Fast Fourier Transform (FFT) in the file
cropped_data_fft_plot.py. This analysis helped identify the frequency ranges
where important breathing-related sounds occur and distinguish them from irrele-
vant or disruptive frequency components that should be filtered out.

To determine an appropriate cutoff frequency for the high-pass filter, the signals
were initially high-pass filtered using a range of fixed cutoff frequencies between 80
Hz and 1000 Hz.

To further improve it, I implemented a dynamic filtering approach that consists of
a function determining an estimated cutoff for each particular signal. The func-
tion is named estimate_cutoff_frequency and analyzes each signal’s frequency
spectrum by first converting the time-domain signal using the FFT. The function
then calculates the cumulative energy distribution of the frequency spectrum and
determines the frequency at which a specified percentage of the total signal energy
is exceeded. This determined frequency is then used as the cutoff for the high-pass
filter. Therefore, if the threshold is set to 5%, the function selects a cutoff frequency
such that the lowest-frequency components, which contain the first 5% of the sig-
nal’s total energy, are removed. This approach allows the filtering to adapt to the
characteristics of each recording. The threshold were set to 5%, 10%, 15% and 20%
to determine an appropriate cutoff frequency threshold for the high-pass filter.

To determine an appropriate cutoff frequency for the low-pass filter, the signals were
processed using a range of fixed cutoff frequencies between 14000 Hz and 5000 Hz.
The methodology was also adapted for low-pass filtering using the same dynamic
filtering function developed for the high-pass filter. In this case, the cumulative
energy distribution of the frequency spectrum was used to identify the frequency
above which a specified percentage of the total signal energy was exceeded. Thresh-
olds of 90% and 95% were tested, resulting in cutoff frequencies that preserved the
lower-frequency components carrying 90% and 95% of the signal’s total energy, re-
spectively, while removing the remaining high-frequency content. This allowed the
low-pass filtering to adapt to the characteristics of each signal.

The signals were filtered using a digital low-pass and high-pass Butterworth filter
from the SciPy library [37]. The functions used for high-pass and low-pass filtering
are called butter_highpass_filter and butter_lowpass_filter, respectively.

3.2.3 Data Augmentation
Data augmentation was another approach used to address the variability between
audio recordings rather than normalization. Data augmentation is performed not
only for this purpose but also to improve the overall performance of the models.
Therefore, it is described as a separate part of the preprocessing methodology rather
than a part of normalization.

The data augmentation approach was used to randomly amplify or reduce the am-
plitude of the entire signal by a specified percentage. The augmentation was per-
formed in the file AUG_random_amplitude_up_or_down.py. First, the signals were

14


3. Methods

randomly amplified or reduced by 10%. Then, they were randomly amplified by 15%
and reduced by 5%. After performing this, along with the other desired preprocess-
ing methodologies, the signals were saved, and features were extracted. These fea-
tures were then saved as a DataFrame and concatenated on top of a similar feature
set in file Concat_CSV_files.py, but that set did not have the random amplification
as part of the preprocessing. This resulted in a dataset twice as large, augmented
to account for the variability in amplitude between recordings taken with different
phones.

3.3 Feature Extraction

Since the recordings are time series signals, feature extraction is necessary to capture
and describe relevant patterns within the recordings in a form suitable for machine
learning. After all signals were preprocessed using various signal processing methods,
features were extracted using the openSMILE toolkit [38]. The extracted features
were organized into DataFrames using Pandas and saved as CSV files for further
analysis using the function create_csv_with_OpenSMILE_features.

3.4 Statistical Methods

After feature extraction, feature reduction was performed to identify the most rele-
vant features. This step is important in machine learning, as it helps retain only the
most informative features, thereby improving prediction accuracy and generalization
to unseen data [21].

Using the feature DataFrame obtained in the previous section, three different statis-
tical tests were applied to analyze the relationships between features and to select
those that best distinguish between BOAS-negative and BOAS-positive dogs.

Spearman and Pearson correlation tests were applied to examine the relationships
between individual features and the target label. Both tests output a correlation
coefficient for each feature, and features with an absolute correlation coefficient
greater than 0.6 were selected for further analysis. Both tests were applied in the
file Spearman_and_Pearson.py.

In addition, the Mann–Whitney U-test was used to identify features that best dif-
ferentiate between BOAS-negative and BOAS-positive dogs. Features with p-values
less than 0.05 were considered statistically significant, as they indicate a difference
between BOAS-negative and BOAS-positive. However, since this often resulted in a
large number of features, only the top 100 features with the smallest p-values were
retained. The implementation included a check to raise an error if fewer than 100
features met the threshold. In that case, all features with p-values less than 0.05
were selected for further analysis. The function Mann_Whitney_U_test_csv extracts
these features and creates a CSV file.

15


3. Methods

3.5 Machine Learning
Once the feature set had been created, the final step was to train a machine learning
model to evaluate its ability to classify the data correctly and test the various pre-
processing settings. This was achieved using two different model setups, as described
in the following section.

3.5.1 Model Setup
As mentioned in the introduction, the classifier was trained using two different
setups, as shown in Figure 3.2. Setup 1 involves training two separate models:
one using the pre-exercise recordings and the other using the post-exercise record-
ings. Setup 2 is a hybrid approach, trained on a combined feature set that in-
cludes both pre- and post-exercise recordings. When Setup 1 was performed, the
dataset was split into two separate feature DataFrames: one for the pre-exercise
recordings and one for the post-exercise recordings. Each of these datasets was pro-
cessed independently following the methodology described in Figure 3.1. The file
Extract_Features_Setup_1.py was developed to process all audio recordings by
performing preprocessing based on user-defined parameters, extracting openSMILE
features, reducing them using the Mann–Whitney U test, and generating the final
feature set.

Post-Exercise
recordings

Pre-Exercise
recordings

Pre-Exercise & Post-Exercise
recordings

Model Model Model

Setup 1 Setup 2

Prediction Prediction Final Prediction

Final Prediction

Figure 3.2: Processes when performing model Setup 1 and Setup 2.

In Setup 2, a hybrid model was constructed using a combined feature set. To do this,
the original dataset was filtered to include only those dogs for which both pre- and
post-exercise recordings were available. Since three dogs were missing one of the two
recordings, there were only 28 dogs included when performing Setup 2. The same
preprocessing and feature extraction methods were then applied to both subsets.
After extracting features using openSMILE, the feature names in each DataFrame
were renamed to indicate the recording context: pre-exercise features were renamed
with the suffix _before_et, and post-exercise features with _after_et. The two
feature DataFrames were then concatenated side by side using Pandas, resulting in a

16


3. Methods

single feature vector that included information from both recordings for each dog, in
file OpenSMILE_feature_extraction_setup_2.py. This combined DataFrame was
subsequently subjected to the statistical test feature selection, allowing the model
to identify relevant features from both pre- and post-exercise recordings.

3.5.2 Training and Evaluating the Classifier
Before the new dataset was introduced, the initial work was done using the first
dataset. After preprocessing the data, features were extracted and reduced using
the Mann-Whitney U-test. The resulting feature set was then saved in a CSV file
for use in training and evaluation.

The primary classifier used in this study was the Random Forest classifier. To
prepare the data for training, the column representing each dog’s BOAS grade (class
0 or class 1) was separated from the rest of the dataset and used as the model’s
label. All remaining columns were treated as input features, consisting of numerical
features derived from previous steps.

To evaluate the model’s performance, the dataset was split into five folds for cross-
validation. The Random Forest model was trained and validated across these five
splits, in file Train_RF_using_Cross_Validation.py. For each fold, accuracy and
the AUC were recorded. The final performance was reported as the mean and
standard deviation of these metrics. The Random Forest classifier and the Stratified
k-folk cross-validation was applied using the scikit-learn library [39].

Once the new dataset had been collected, the best-performing methodology iden-
tified during the initial experiments was further tested and refined. The Random
Forest classifier was still used, along with stratified 5-fold cross-validation, to account
for the imbalanced dataset. The model was trained on the first dataset and then
evaluated on the newly collected dataset to investigate how the model generalized
to new data, in the file Classify_new_files.py. Besides accuracy and AUC, the
performance was assessed by examining which dogs were correctly classified and the
associated prediction probabilities for each dog. Based on the best-performing re-
sults, the same feature set was used to train an XGBoost model to evaluate whether
the prediction probabilities could be improved, in the file XGBoost.py. The model
was applied using the XGBoost Python package [40].

This methodology was initially applied to model Setup 1, where the pre- and post-
exercise data were evaluated individually. The methodology was then extended
to model Setup 2, which incorporated a hybrid solution that combined both pre-
and post-exercise data for evaluation. Performing Setup 2, the tests were conducted
separately with the audio recordings on the OnePlus A5000 and the Samsung Galaxy
Note 20 Ultra separately to ensure that the methodologies worked for both devices.

17


3. Methods

18


4
Results and Analysis

This chapter presents the project’s most relevant results, focusing on signal process-
ing techniques, feature extraction, statistical methods, and classification approaches.

The Appendix provides additional detailed classification results for the two model
setups, focusing on how different preprocessing strategies affect model performance.
Tables A.1 and B.1 display classification results obtained using the Random Forest
Classifier and Stratified 5-fold cross-validation for the first dataset, with different
preprocessing settings applied to model Setups 1 and 2, respectively. The following
tables in Appendices A and B present classification results obtained by training the
classifier on the first dataset and using the new dataset as input.

Appendix A presents results using model Setup 1, where the classification is divided
between pre- and post-exercise data. Tables are presented in pairs, where each
preprocessing condition is evaluated separately on pre- and post-exercise recordings.
For instance, Tables A.2 and A.3 show classification results without preprocessing
for pre- and post-exercise data, respectively. Similarly, Tables A.5 and A.6 present
results obtained by applying a high-pass filter with a 5% cutoff to the pre- and post-
exercise recordings. The individual classification probabilities for each recording
type are further merged in Tables A.4 and A.7 to provide an average performance
estimate across the pre- and post-exercise recordings.

Appendix B presents the classification results for Model Setup 2, initially using
OnePlus recordings, followed by Samsung recordings. It begins with baseline re-
sults using raw data (Table B.2) and progressively adds complexity through RMS
normalization (B.3) and high-pass filtering with an estimated cutoff frequency of
varying thresholds 5–20% in Tables B.4 to B.7. Further refinement is introduced in
Tables B.8 to B.12, which incorporate data augmentation, including both symmet-
ric (±10%) and asymmetric (+15%, -5%) amplitude adjustments, alongside filtering
and normalization, to investigate their cumulative impact on model generalization.
Tables B.13 to B.16 present results obtained by applying selected preprocessing
strategies to Samsung recordings, comparing the differences between phone record-
ings. The final Tables (B.17 to B.26) enable phone-recording comparisons (OnePlus
vs. Samsung) under matched preprocessing conditions, offering insights into how
preprocessing settings transfer across different phones.

19


4. Results and Analysis

4.1 Audio Recording
Table 4.1 presents the distribution of BOAS grades based on the RFG results for the
five dogs included in the new dataset. Four dogs were classified as BOAS-negative,
while one was classified as BOAS-positive.

Table 4.1: Distribution of dogs by BOAS grade in the new dataset.

BOAS Functional Grading Number of dogs
Grade 0 1
Grade 1 3
Grade 2 1
Grade 3 0
Total 5

4.2 Signal Processing
This section presents key results from the signal processing, including normalization,
filtering, and data augmentation.

4.2.1 Normalization
Table 4.2 includes a subset of the results in Table A. These results show that peak
normalization performs poorly for the pre-exercise model. To understand why, we
examine the plots showing the original and normalized signals of a BOAS-negative
dog with quiet breathing and a BOAS-positive dog with loud breathing.

Table 4.2: Results of training a Random Forest classifier using cross-validation,
applied to pre- and post-exercise data under three preprocessing settings.

Preprocessing settings Pre-exercise Post-exercise

Mean accuracy: Mean AUC: Mean accuracy: Mean AUC:

Original 0.867 ± 0.083 0.961 ± 0.054 0.925 ± 0.061 1.000 ± 0.000

RMS normalization 0.867 ± 0.083 0.967 ± 0.044 0.925 ± 0.061 0.987 ± 0.027

Peak normalization 0.844 ± 0.133 0.886 ± 0.115 0.925 ± 0.061 0.987 ± 0.026

Figure 4.1 shows the results of Peak normalization applied to these two recordings.
Peak normalization did not perform well on recordings with relatively quiet breath-
ing, as it uniformly amplified the amplitude of all samples, making the quiet breaths
appear as loud as the originally louder ones. For instance, in Figure 4.1a, the nor-
malized quiet breath fluctuates around ±0.25, and similarly, in Figure 4.1b, the
normalized loud breath fluctuates around a similar amplitude (±0.5). This dimin-
ishes the distinction between quiet and loud breathing. Therefore, it is reasonable
that Peak normalization seems to work for the post-exercise data but not for the

20


4. Results and Analysis

pre-exercise recordings since all signals in the post-exercise are not as quiet as they
can be in the pre-exercise recordings.

(a) (b)

Figure 4.1: Time-domain plots of audio recordings of a dog’s breathing, showing
the original (blue) and Peak normalized (orange) signals. Plot (a) shows quiet

breathing, while plot (b) shows loud breathing.

RMS normalization was applied instead to achieve more consistent results. Figure
4.2 shows the corresponding RMS-normalized recordings. This method scales the
amplitude based on the overall energy of each recording, providing a more consis-
tent loudness across samples. In Figure 4.2a, the quiet breathing remains around
±0.25, but in Figure 4.2b, the loud breathing shows much larger fluctuations (around
±10), preserving the difference in loudness between quiet and loud breathing more
effectively.

(a) (b)

Figure 4.2: Time-domain plots of audio recordings of a dog’s breathing, showing
the original (blue) and RMS normalized (orange) signals. Plot (a) shows quiet

breathing, while plot (b) shows loud breathing.

Comparing Table B.2, B.3 verifies that RMS normalized data performs better than
using original data without preprocessing.

21


4. Results and Analysis

4.2.2 Filtering
The Fourier transform of a single dog’s breath before exercise is shown in Figure
4.3a, while the transform of a breath after exercise is shown in Figure 4.3b. The
figures show that the breath when the dog is resting ranges over a broader range of
frequencies (0 Hz to 14 000 Hz) compared to a breath after exercise, which ranges
from 0 Hz to 5 000 Hz. This indicates which cutoff frequency one should use when
low-pass filtering the signals. In this case, the pre-exercise data should have a cutoff
frequency above 10 000 Hz, while the post-exercise data should have a cutoff fre-
quency around 5 000 Hz. On the other hand, when examining the low-pass filtering
results in Appendix A, it is not apparent which cutoff frequency is most appropriate.

(a) (b)

Figure 4.3: Frequency-domain plot of a single breath recorded (a) before exercise
and (b) after exercise.

An example of noise that appears in several recordings is the scraping noise of the
dog’s breath into the microphone. Figure 4.4a shows the Fourier transform of the
wind noise. The noise has a very low frequency; the peak is at 95 Hz, and the
high-pass filter removes it.

(a) (b)

Figure 4.4: Frequency-domain plot of a recorded (a) wind noise and (b) a door
opening in the background.

22


4. Results and Analysis

A more particular background noise is the sound of a door opening, which is shown
in the frequency domain in Figure 4.4b. Unlike wind noise, this sound spans a
wide range of frequencies, making it more challenging to remove completely through
filtering. However, despite this noise in the recording of dog number 5 from the new
dataset, the model can still correctly classify the BOAS grade. This suggests that
the model is robust to certain types of background noise that cannot be fully filtered
out.

Another sound appearing in several recordings is a beeping noise produced when
the person conducting the recording starts or stops a digital stethoscope. This
beep has a distinct peak at 1 300 Hz, as shown in Figure 4.5a. Therefore, applying
a high-pass filter to remove it would risk eliminating important lower-frequency
information. However, Figure 4.5b shows that the beep’s frequency content relative
to the entire recording shows that its contribution is minimal compared to the
overall frequency distribution. Therefore, I chose not to filter out the beeps, as that
might risk removing other informative lower frequencies. Additionally, I reviewed
all recordings and confirmed that the beeping occurs in both BOAS-negative and
BOAS-positive cases, ensuring it does not bias the classification results.

(a) (b)

Figure 4.5: Frequency-domain plot of a (a) single beep tone and (b) the original
recording where the beep occurs.

By observing the results of the high-pass and low-pass filtered signals in Table
A.1, it is evident that a single, fixed cutoff frequency is unsuitable for all signals.
For the pre-exercise data, the model performed better when a high-pass filter with
a cutoff frequency of 80 Hz was applied. In contrast, the model showed similar
performance across the different filter configurations for the post-exercise data, with
none outperforming the original data without preprocessing. The pre-exercise data
yielded better performance in the low-pass filtering tests with a low-pass filter set
at 13 000 Hz. For the post-exercise data, the classification results were comparable
to those of the original data when using cutoff frequencies of 8 000 Hz and 10 000
Hz. These findings suggest the need for a more dynamic approach, where the cutoff
frequencies for both filters are determined for each individual signal.

23


4. Results and Analysis

From the results in Table B.4 to B.7, it can be concluded that thresholds of 10%
or 15% for estimating the high-pass filter cutoff frequency yield the best results.
Further analysis of Table B.11 and B.12 indicates that a threshold of 15% performs
best overall for estimating the cutoff frequency.

Although the prediction probabilities vary slightly across different preprocessing
settings using the OnePlus recordings, the overall model performance is good, as it
correctly predicts the BOAS grade for all dogs in nearly every setting. However,
the results for the Samsung phone are less promising (see Tables B.13 to B.16).
A frequency-domain analysis of simultaneous phone recordings, shown in Figure
4.6, reveals significant differences in audio preprocessing. The Samsung recording
appears to have been low-pass filtered with a cutoff frequency of 20 000 Hz, likely
to prevent aliasing by the Nyquist theorem, as both phones sample at 48 000 Hz. In
contrast, the OnePlus recording appears to have undergone smoothing around 13 000
Hz, possibly due to an older microphone with reduced high-frequency sensitivity.
This observation clarified that a low-pass filter would be necessary, even though the
initial results in Table A.1 did not suggest its need. Additionally, normalization
became more critical, as the overall loudness in the Samsung recording is higher
than in the OnePlus recording, as shown in Figure 4.6.

Figure 4.6: Frequency-domain plot of the same sound recorded during the same
time but with two different phones.

As mentioned, Table A.1 shows that a single, fixed cutoff frequency is unsuitable
for all signals. It is unclear whether a low-pass filter was necessary, as the results
were almost identical to those from the non-normalized data. However, based on the
observations regarding the differences in preprocessing between the two phones, low-
pass filtering was applied to achieve the best results for preprocessing, including high-
pass filtering, normalization, and data augmentation, as investigated on the OnePlus
recordings, to ensure that the model also predicted correctly for the Samsung phone.

All proceeding results are high-pass filtered with a threshold of 15%. Results from
low-pass filtering with an estimated cutoff threshold of 95% performed better for

24


4. Results and Analysis

both phones (Table B.17 and B.18) than low-pass filtering with an estimated cutoff
threshold of 90% (Table B.19 and B.20). Performing RMS normalization on the
low-pass filtered data with an estimated cutoff threshold of 95% (Table B.21 and
B.22) produced worse results, even when combined with data augmentation.

Figure 4.7 shows the same sound recorded with the OnePlus and the Samsung in the
frequency domain with corresponding estimated cutoff frequencies. The red dotted
line at the lower frequency end represents the cutoff frequency for the high-pass
filter, and the other red dotted line represents the estimated cutoff frequency for the
low-pass filter.

(a) (b)

Figure 4.7: Frequency-domain plot of a simultaneous recording using (a)
OnePlus and (b) Samsung, with the estimated cutoff frequencies for both the

high-pass and low-pass filters marked by red dotted lines.

4.2.3 Data Augmentation
An important observation regarding data augmentation through amplitude ampli-
fication or reduction is that it tends to be canceled out when being combined with
Peak normalization. However, this was not the case when using RMS normalization.

Results related to data augmentation are shown in Table B.9 to B.12 for the OnePlus
recordings, Table B.15 and B.16 for the Samsung recordings, and Table B.23 to B.26
for the final comparison between the OnePlus and Samsung recordings.

Comparing Table B.10 and B.12, where the signals underwent the same preprocess-
ing but differed in amplitude adjustment, the average prediction probability for the
correct class is higher when using 15% amplification and 5% reduction, compared
to using 10% for both amplification and reduction.

To compare RMS normalization with data augmentation, refer to Tables B.21 and
B.22 versus Tables B.25 and B.26. The results show that the model performs better
with data augmentation alone than with RMS normalization for the OnePlus phone.
A combination of data augmentation and RMS normalization yields the best results
for the Samsung phone, see Table B.24.

25


4. Results and Analysis

4.3 Feature Extraction
Among the hundred features extracted by the Mann-Whitney U-test, the feature
types listed in Table 4.3 were the openSMILE features that appeared in most subsets.
Therefore, these are the features that best describe the differences in characteristics
between BOAS-negative and BOAS-positive dog recordings:

Table 4.3: Distribution of openSMILE features among 100 Mann-Whitney
features for non-preprocessed data.

Feature type Number of features
Auditory Spectra 60
MFCC 19
RMS Energy 13
FFT Magnitude 3
Jitter 2
Shimmer 1
HNR 2
Total 100

The Auditory spectra, according to the openSMILE documentation, is used to
describe psychoacoustic sharpness, which, according to a study, is a way to explore
the psychology behind how people perceive sound, often about loudness [20], [41].
This can be studied by looking at the frequency content of a specific sound [41].
One way to do this is by observing the frequency spectrum, a graph showing the
amplitude in decibels of the different frequencies in the sound.

Mel-frequency Cepstral Coefficients (MFCCs) capture the shape of a signal’s
power spectrum in a way that aligns with human auditory perception [42]. Central to
this process is the Mel scale, a perceptual scale of pitch where equal steps correspond
to equal perceived differences in pitch [43]. Because human hearing is more sensitive
to changes in lower frequencies than higher ones, the Mel scale emphasizes finer
resolution at the lower end of the frequency spectrum. To derive MFCCs, the signal
is first transformed into the frequency domain using the Discrete Fourier Transform
(DFT), after which the Mel scale is applied to approximate how humans perceive
sound [42].

RMS Energy or Root Mean Square Energy represents the average loudness of a
signal. The openSMILE documentation does not provide a detailed explanation
of this feature, but several related features appear to be derived from RMS energy.
Based on their feature names, these features likely correspond to different percentiles
of the signal.

Magnitude of the Fast Fourier Transform (FFT) represents the energy distri-
bution across frequencies and is derived by calculating the magnitude of the signal’s
frequency domain, which is determined using the FFT in this case.

26


4. Results and Analysis

Jitter and Shimmer are voice-quality parameters. Jitter refers to the small, rapid
variations in a sound wave’s frequency pitch from one cycle to the next [44]. Shimmer
measures the rapid variations in the amplitude (loudness) of the sound wave across
successive cycles. Figure 4.8 represents Jitter and Shimmer.

Figure 4.8: Representation of Jitter and Shimmer perturbation measures in a
speech signal [44] CC BY-NC-ND 3.0.

Harmonics-to-Noise Ratio (HNR) is a measure of the ratio between the pe-
riodic (harmonic) and aperiodic (noise) components in a speech signal, expressed
in decibels [45]. Higher HNR values indicate a cleaner, more periodic sound, while
lower values suggest increased noise. The HNR is a logarithmic measure of the
signal’s energy ratio, which the following formula can derive,

HNR = 10 × log10

∫
w |H(w)|2∫
w |N(w)|2

where X(ω) corresponds to the speech signal in the frequency domain, H(ω) to the
harmonic component and N(ω) to the noise component.

4.4 Statistical Tests
Applying the Spearman correlation test to the original combined openSMILE feature
set in Setup 2, 29 out of 12 745 features had a Spearman correlation coefficient
greater than 0.6. Using the Pearson correlation test, 13 features had a Pearson
correlation coefficient greater than 0.6.

Applying the Mann–Whitney U-test to the same DataFrame, the test identified
1 179 features with a p-value smaller than 0.05. After selecting 100 of these fea-
tures with the lowest p-value, counting the number of features from the pre-exercise
data (those named before_et), there were 13 features and 87 post-exercise features
(those named after_et). This clearly shows that the post-exercise recordings are
more informative and correlate better with the characteristics of BOAS-negative
and BOAS-positive dogs’ recordings.

27

https://creativecommons.org/licenses/by-nc-nd/3.0/


4. Results and Analysis

4.5 Classification
The best-performing classification results, using a preprocessing methods that was
most effective for both OnePlus and Samsung recordings, are shown in Tables B.25
and B.26. This pipeline includes data augmentation by randomly increasing the
amplitude by 15% and decreasing it by 5%, along with high-pass filtering (cutoff
estimated at 15%) and low-pass filtering (cutoff estimated at 95%). For the Samsung
recordings, the addition of RMS normalization further improved performance, as
demonstrated in Table B.24.

For the OnePlus recordings, the optimal preprocessing setup includes:

• No normalization

• High-pass filtering with an estimated cutoff of 15%

• Low-pass filtering with an estimated cutoff of 95%

• Data augmentation (amplitude increased by 15% and decreased by 5%)

According to Table B.1, this setup yields a mean accuracy of 81.3% and a mean
AUC of 1.0 when training the Random Forest classifier with the filtering methods
alone. Adding data augmentation increases the mean accuracy to 100%, while the
mean AUC remains at 1.0.

For the Samsung recordings, the optimal preprocessing setup includes:

• RMS normalization

• High-pass filtering with an estimated cutoff of 15%

• Low-pass filtering with an estimated cutoff of 95%

• Data augmentation (amplitude increased by 15% and decreased by 5%)

Using only the filtering methods and RMS normalization results in a mean accuracy
of 85.3% and a mean AUC of 1.0. When data augmentation is added, mean accuracy
rises to 96.4%, with the mean AUC still at 1.0.

Classification results using the XGBoost model on the same feature sets are pre-
sented in Tables 4.4 and 4.5.

Table 4.4: Classification results of the OnePlus recordings preprocessed using
data augmentation, high-pass filtering, and low-pass filtering, and classified using

XGBoost.

Dog # True class Predicted class Probability (Class 0) Probability (Class 1)
1 0 0 0.978 0.022
2 0 0 0.864 0.136
3 1 1 0.037 0.963
4 0 0 0.813 0.187
5 0 0 0.751 0.249

The average probability of the correctly predicted classes across all dogs is 87.4%.

28


4. Results and Analysis

Table 4.5: Classification results of the Samsung recordings preprocessed using
RMS normalization, data augmentation, high-pass filtering, and low-pass filtering,

and classified using XGBoost.

Dog # True class Predicted class Probability (Class 0) Probability (Class 1)
1 0 0 0.970 0.030
2 0 0 0.963 0.037
3 1 1 0.424 0.576
4 0 0 0.950 0.049
5 0 0 0.965 0.035

The average probability of the correctly predicted classes in this case is 88.5%.

29


4. Results and Analysis

30


5
Discussion

5.1 Answering the Research Questions

What signal processing methods enhance the breathing characteristics of
a phone recording of a BOAS-negative and BOAS-positive dog?

The results clearly show that Peak normalization is not suitable for this type of
data. Since the amplitude of the breathing signals correlates with whether the
dog is BOAS-negative or BOAS-positive, normalizing using the peak distorts this
important information. BOAS-negative dogs tend to breathe more quietly, and peak
normalization removes this difference by making all signals equally loud.

RMS normalization, however, proved to be a significantly better method. It adjusts
the signal while maintaining the relative loudness, which means that the natural
variations in breathing volume between dogs are preserved.

The results also show that using fixed cutoff frequencies for all recordings when
filtering does not yield optimal results. Each recording can have different frequency
characteristics, and one filter setting does not suit all signals. That is why a dynamic
filtering method, where the cutoff frequencies are determined individually for each
signal, gave better results. This ensures that the filtering adapts to the signal rather
than vice versa. Filtering with a high-pass filter, estimated to have a cutoff at a
threshold of 15%, and low-pass filtering, estimated to have a cutoff at a threshold
of 95%, performed best for both phones.

The resulting filtering methods and data augmentation techniques performed well
across recordings from both phones. However, RMS normalization was less effective
for the OnePlus recordings. Further investigation is needed to develop a preprocess-
ing pipeline that consistently performs well across a broader range of phones.

Which features show the strongest correlation with BOAS-negative and
BOAS-positive dog samples?

The openSMILE features most strongly correlated with BOAS-negative and BOAS-
positive labels were frequency spectra, MFCCs, RMS energy, FFT magnitude, jitter,
shimmer, and HNR. RMS energy, jitter, and shimmer are time-domain features,
while the others are frequency-domain features. This shows that features from the
frequency domain are essential, which also supports the idea of using filtering and

31


5. Discussion

processing techniques that enhance frequency-related properties in the signal. Since
most relevant features are derived from the frequency domain, it is crucial to focus
on preserving and enhancing the signal’s frequency characteristics through effective
preprocessing.

Is it more suitable to use separate models for pre- and post-exercise
recordings for the intended task, or a hybrid model that incorporates
both?

When comparing the two model setups, the classification results indicate that Setup
2, the hybrid model trained on both pre- and post-exercise data, outperforms Setup
1, which employs two separate models. The hybrid model had higher accuracy,
AUC, and prediction confidence overall. This was expected, considering that the
feature selection process (using the Mann-Whitney test) yielded 87 features from
post-exercise data and only 13 from pre-exercise recordings. That means post-
exercise recordings contain more useful information related to the characteristics of
a BOAS-negative and BOAS-positive dog’s recording.

While using separate models (Setup 1) is still a valid alternative, it would require
combining the predictions using some form of weighted average, most likely giving
more weight to post-exercise data. However, doing so adds complexity and is not
guaranteed to yield better results. Therefore, the hybrid model is the best option
for this task.

Do the audio recordings differ between phones?

The audio recordings between different phone brands, in this case a OnePlus and
a Samsung, differ significantly. As shown in the results, the frequency distribution
differed between the two phones. This affects the preprocessing and the classification
of these recordings. Although I found a preprocessing method that worked for both,
it still varied significantly between them. In the future, it would be desirable to have
a preprocessing setup that works for several other phones as well.

Another variable to consider is how the recording conditions impact the data. When
collecting the first dataset, recordings were taken by Sykkö and Dimopoulou, who
were well-informed and careful during the recording. However, the environment at
the veterinary clinics where I performed the recordings was not ideal, and not every-
one was aware that recordings were being made. This resulted in some recordings
having more background noise and other sounds, which could affect quality.

On the other hand, it can be beneficial for the model to be trained on recordings
from different conditions. In the future, this system may be used by regular people
at home, and the recordings will also vary significantly. Training on noisy or varied
data can make the model more robust, if there is sufficient data to train it on.
However, this variability still needs to be accounted for during signal analysis to
avoid classification issues.

32


5. Discussion

5.2 Model Performance and Limitations

One clear observation that can be drawn from the classification results is that the
model generally has the most difficulty classifying the BOAS-positive dog in the
new dataset. This is reasonable since both datasets are imbalanced, with a greater
number of BOAS-negative recordings than BOAS-positive recordings. Since the
model is trained on this imbalanced dataset, it will naturally predict BOAS-negative
dogs more easily than BOAS-positive dogs.

The statistical tests revealed that more features had a strong monotonic relationship
with the BOAS label rather than a linear one. That is one reason I chose to use
a Random Forest classifier rather than a linear model. Still, the difference was not
huge, so a linear model could have been used too, but it likely would not have
performed as well.

It is also important to note that the feature set was selected and optimized specifi-
cally for the Random Forest model. That means the same set might not work well
for other models or neural networks. Although the final results appear promising,
they do not guarantee that the model will perform well on all new data. The dataset
is still small, and the evaluation was based on how well the model predicted the dogs
in the new dataset. It is possible that some of the new recordings were very similar
to older ones. However, even so, it demonstrates that a feature-based classification
of BOAS is possible if the signal processing is designed to highlight the relevant
characteristics in the recordings.

Better models may exist for this task, and it could be worth testing other machine
learning methods or data augmentation techniques. Pagrell’s thesis includes a more
detailed investigation regarding models and data augmentation [14].

An important aspect regarding machine learning models when working with medical
data is that the model should be interpretable [46]. One should avoid using “black
box” models as there is no simple way to interpret the model’s decisions. The Ran-
dom Forest classifier is not the best alternative, as tracking its decisions throughout
all its trees can be challenging. Therefore, it has been considered a black box. On
the other hand, with the right tools, it is possible to make the model interpretable,
thereby gaining insight into its internal decision-making process. The cited arti-
cle presents tools for this, but this project has not investigated it thoroughly [46].
XGBoost is a better choice in this regard, as it is a more advanced model that can
output feature importance, allowing users to see which features have been used most
in each tree’s decision [47]. However, using a less interpretable model for the feature
set created in the project is not particularly risky, as I have complete insight into
which features I input into the model, and I know that all of them represent prop-
erties of the signal’s characteristics. No particular artifacts or background noises
directly influence an extracted feature; the signal is normalized and filtered, and the
features are determined based on the whole signal, not individual segments.

Generally, the classification results are promising. For the final classification results,
the AUC remains 1.0, which is ideal when working with medical data. This means

33


5. Discussion

that the model has perfectly differentiated between the two classes, with no false
positives or false negatives, which is ideal.

5.3 Future Work
The most important thing going forward is to collect more data. A larger, more
varied dataset would improve many of the challenges with classification and gener-
alization. Not only would it diminish the imbalance in the dataset, but it would
also improve differences between phones; for example, recordings from the OnePlus
and Samsung phones were quite different, likely due to differences in microphone
hardware and how the phones process sound.

Although I developed a signal processing pipeline that handled both devices, the
optimal preprocessing methods varied slightly between them. This suggests that
other phone brands may introduce similar issues. Due to these potential additional
variabilities, the first step in future data collection should be to gather recordings
from a broader range of phones. This would likely necessitate further investigation
of the signal processing methodologies until there is sufficient data for the model to
learn and generalize effectively across these differences.

In addition to expanding the dataset, future data collection could benefit from in-
cluding other relevant parameters that may serve as useful features for classification.
For example, since BOAS in dogs is related to their physique, including informa-
tion such as the dog’s weight and neck circumference could be valuable. Similarly,
physiological measures like heart rate and oxygen saturation provide insight into the
dog’s respiratory condition and could enhance model performance. However, these
types of data would only be feasible to collect in veterinary settings, as they require
special instruments and would not be practical for use by dog owners at home.

Another future project is to continue developing the mobile app that Sykkö initiated.
One way would be to create an external API that receives recordings from the app,
performs the preprocessing, openSMILE feature extraction and classification, and
then returns the prediction. In that way, the app could be used more easily in a
real-life setting.

34


6
Conclusion

This project demonstrates the potential of feature-based classification for diagnosing
BOAS. By applying appropriate signal processing methods and extracting features
using openSMILE, a feature set can be constructed that captures the distinguishing
characteristics of recordings from BOAS-negative and BOAS-positive dogs.

Preprocessing the audio using RMS normalization, combined with a dynamic fil-
tering approach that adapts the cutoff frequencies to each recording, enhances the
most relevant aspects of the signal. Additionally, a hybrid model that incorporates
both pre- and post-exercise recordings yields a more informative feature set. Using
the Mann-Whitney U-test for statistical analysis further supports the effectiveness
of feature reduction, resulting in a smaller yet more meaningful set of features.

Despite variations in audio recordings between phones, the preprocessing pipeline
successfully extracted key information from both OnePlus and Samsung recordings.
The Random Forest classifier accurately identified all five test cases, and using the
XGBoost model further improved prediction probability.

Although additional data is needed to ensure the model’s reliability and generaliza-
tion to unseen data, this approach shows promising outcomes for using feature-based
classification of BOAS.

35


6. Conclusion

36


References

[1] “A close up of a dog’s face with a blurry background. french bulldog smart look
dog. - PICRYL - public domain media search engine public domain image.” (),
[Online]. Available: https://timelessmoon.getarchive.net/amp/media/f
rench-bulldog-smart-look-dog-animals-885295 (visited on 04/28/2025).

[2] S. Mitze, V. R. Barrs, J. A. Beatty, S. Hobi, and P. M. Bęczkowski, “Brachy-
cephalic obstructive airway syndrome: Much more than a surgical problem,”
The Veterinary Quarterly, vol. 42, no. 1, pp. 213–223, issn: 0165-2176. doi:
10.1080/01652176.2022.2145621. [Online]. Available: https://www.ncbi
.nlm.nih.gov/pmc/articles/PMC9673814/ (visited on 03/04/2025).

[3] Administrator. “About BOAS.” (Feb. 16, 2016), [Online]. Available: https:
//www.vet.cam.ac.uk/boas/about-boas (visited on 03/04/2025).

[4] “Brachycephalic dogs: What we know about frenchies, pugs and bulldogs,”
Felcana. (), [Online]. Available: https://felcana.com/blogs/blog/brachy
cephalic-dogs (visited on 03/04/2025).

[5] “The rise and fall of popular dog breeds | everypaw.” (), [Online]. Available:
https://www.everypaw.com/all-things-pet/the-rise-and-fall-of-po
pular-dog-breeds (visited on 04/10/2025).

[6] Administrator. “Recognition & diagnosis.” (Feb. 16, 2016), [Online]. Available:
https://www.vet.cam.ac.uk/boas/about-boas/recognition-diagnosis
(visited on 03/04/2025).

[7] S. Kennelklubben. “RFG-Scheme.” (), [Online]. Available: https://www.skk
.se/uppfodning/halsa/andning/rfg-scheme/ (visited on 05/09/2025).

[8] Administrator. “Management & treatment.” (Feb. 16, 2016), [Online]. Avail-
able: https://www.vet.cam.ac.uk/boas/about-boas/management-treatm
ent (visited on 05/09/2025).

[9] M. Mårtensson, “Brachycephalic obstructive airway syndrome (BOAS) classi-
fication in dogs based on respiratory noise analysis using machine learning,”
2021. [Online]. Available: https://hdl.handle.net/20.500.12380/302233
(visited on 04/23/2025).

[10] H. Pettersson and O. Stensöta, “Data augmentation for audio based machine
learning classifying brachycephalic obstructive airway syndrome (BOAS) in
dogs,” 2021. [Online]. Available: https://hdl.handle.net/20.500.12380/3
03984 (visited on 04/23/2025).

37

https://timelessmoon.getarchive.net/amp/media/french-bulldog-smart-look-dog-animals-885295
https://timelessmoon.getarchive.net/amp/media/french-bulldog-smart-look-dog-animals-885295
https://doi.org/10.1080/01652176.2022.2145621
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9673814/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9673814/
https://www.vet.cam.ac.uk/boas/about-boas
https://www.vet.cam.ac.uk/boas/about-boas
https://felcana.com/blogs/blog/brachycephalic-dogs
https://felcana.com/blogs/blog/brachycephalic-dogs
https://www.everypaw.com/all-things-pet/the-rise-and-fall-of-popular-dog-breeds
https://www.everypaw.com/all-things-pet/the-rise-and-fall-of-popular-dog-breeds
https://www.vet.cam.ac.uk/boas/about-boas/recognition-diagnosis
https://www.skk.se/uppfodning/halsa/andning/rfg-scheme/
https://www.skk.se/uppfodning/halsa/andning/rfg-scheme/
https://www.vet.cam.ac.uk/boas/about-boas/management-treatment
https://www.vet.cam.ac.uk/boas/about-boas/management-treatment
https://hdl.handle.net/20.500.12380/302233
https://hdl.handle.net/20.500.12380/303984
https://hdl.handle.net/20.500.12380/303984


References

[11] “RandomForestClassifier,” scikit-learn. (), [Online]. Available: https://scik
it-learn/stable/modules/generated/sklearn.ensemble.RandomForest
Classifier.html (visited on 04/23/2025).

[12] M. Dimopoulou, H. Peterson, O. Stensöta, et al., “Use of respiratory sig-
nal analysis to assess severity of brachycephalic obstructive airway syndrome
(BOAS) in dogs,” The Veterinary Journal, vol. 308, p. 106 261, Dec. 1, 2024,
issn: 1090-0233. doi: 10.1016/j.tvjl.2024.106261. [Online]. Available: htt
ps://www.sciencedirect.com/science/article/pii/S1090023324002004
(visited on 04/23/2025).

[13] A. Oren, J. D. Türkcü, S. Meller, et al., “BrachySound: Machine learning based
assessment of respiratory sounds in dogs,” Scientific Reports, vol. 13, no. 1,
p. 20 300, Nov. 20, 2023, Publisher: Nature Publishing Group, issn: 2045-2322.
doi: 10.1038/s41598-023-47308-0. [Online]. Available: https://www.natu
re.com/articles/s41598-023-47308-0 (visited on 04/23/2025).

[14] T. Pagrell, “Diagnosing Brachycephalic Obstructive Airway Syndrome in Dogs
Using Computer Vision and Machine Learning,” [Online]. Available: https:
//odr.chalmers.se/communities/82b3e123-24a1-47ec-8544-f8ee5b27a
c29 (visited on 05/28/2025).

[15] P. priyanka. “Audio normalization,” Medium. (Sep. 5, 2023), [Online]. Avail-
able: https://medium.com/@poudelnipriyanka/audio-normalization-9d
bcedfefcc0 (visited on 03/20/2025).

[16] K. Maharana, S. Mondal, and B. Nemade, “A review: Data pre-processing and
data augmentation techniques,” Global Transitions Proceedings, International
Conference on Intelligent Engineering Approach(ICIEA-2022), vol. 3, no. 1,
pp. 91–99, Jun. 1, 2022, issn: 2666-285X. doi: 10.1016/j.gltp.2022.04.02
0. [Online]. Available: https://www.sciencedirect.com/science/article
/pii/S2666285X22000565 (visited on 05/05/2025).

[17] S. R. Devasahayam, Signals and Systems in Biomedical Engineering: Physi-
ological Systems Modeling and Signal Processing. Singapore: Springer Singa-
pore, 2019, isbn: 978-981-13-3530-3 978-981-13-3531-0. doi: 10.1007/978-9
81-13-3531-0. [Online]. Available: http://link.springer.com/10.1007/9
78-981-13-3531-0 (visited on 05/05/2025).

[18] “Butterworth filter - an overview | ScienceDirect topics.” (), [Online]. Avail-
able: https://www.sciencedirect.com/topics/engineering/butterwort
h-filter (visited on 05/05/2025).

[19] F. Eyben, M. Wöllmer, and B. Schuller, “Opensmile: The munich versatile
and fast open-source audio feature extractor,” in Proceedings of the 18th ACM
international conference on Multimedia, ser. MM ’10, New York, NY, USA: As-
sociation for Computing Machinery, 2010, pp. 1459–1462, isbn: 978-1-60558-
933-6. doi: 10.1145/1873951.1874246. [Online]. Available: https://dl.acm
.org/doi/10.1145/1873951.1874246 (visited on 05/05/2025).

[20] F. Eyben, F. Weninger, F. Gross, and B. Schuller, “Recent developments in
openSMILE, the munich open-source multimedia feature extractor,” in Pro-
ceedings of the 21st ACM international conference on Multimedia, ser. MM ’13,
New York, NY, USA: Association for Computing Machinery, 2013, pp. 835–
838, isbn: 978-1-4503-2404-5. doi: 10 . 1145 / 2502081 . 2502224. [Online].

38

https://scikit-learn/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
https://scikit-learn/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
https://scikit-learn/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
https://doi.org/10.1016/j.tvjl.2024.106261
https://www.sciencedirect.com/science/article/pii/S1090023324002004
https://www.sciencedirect.com/science/article/pii/S1090023324002004
https://doi.org/10.1038/s41598-023-47308-0
https://www.nature.com/articles/s41598-023-47308-0
https://www.nature.com/articles/s41598-023-47308-0
https://odr.chalmers.se/communities/82b3e123-24a1-47ec-8544-f8ee5b27ac29
https://odr.chalmers.se/communities/82b3e123-24a1-47ec-8544-f8ee5b27ac29
https://odr.chalmers.se/communities/82b3e123-24a1-47ec-8544-f8ee5b27ac29
https://medium.com/@poudelnipriyanka/audio-normalization-9dbcedfefcc0
https://medium.com/@poudelnipriyanka/audio-normalization-9dbcedfefcc0
https://doi.org/10.1016/j.gltp.2022.04.020
https://doi.org/10.1016/j.gltp.2022.04.020
https://www.sciencedirect.com/science/article/pii/S2666285X22000565
https://www.sciencedirect.com/science/article/pii/S2666285X22000565
https://doi.org/10.1007/978-981-13-3531-0
https://doi.org/10.1007/978-981-13-3531-0
http://link.springer.com/10.1007/978-981-13-3531-0
http://link.springer.com/10.1007/978-981-13-3531-0
https://www.sciencedirect.com/topics/engineering/butterworth-filter
https://www.sciencedirect.com/topics/engineering/butterworth-filter
https://doi.org/10.1145/1873951.1874246
https://dl.acm.org/doi/10.1145/1873951.1874246
https://dl.acm.org/doi/10.1145/1873951.1874246
https://doi.org/10.1145/2502081.2502224


References

Available: https://dl.acm.org/doi/10.1145/2502081.2502224 (visited on
05/05/2025).

[21] “Feature reduction - an overview | ScienceDirect topics.” (), [Online]. Available:
https://www.sciencedirect.com/topics/computer-science/feature-re
duction (visited on 04/30/2025).

[22] A. K. Kurtz and S. T. Mayo, “Pearson product moment coefficient of corre-
lation,” in Statistical Methods in Education and Psychology, A. K. Kurtz and
S. T. Mayo, Eds., New York, NY: Springer, 1979, pp. 192–277, isbn: 978-
1-4612-6129-2. doi: 10.1007/978-1-4612-6129-2_8. [Online]. Available:
https://doi.org/10.1007/978-1-4612-6129-2_8 (visited on 05/02/2025).

[23] “Introduction to nonparametric methods | EBSCO research starters.” (), [On-
line]. Available: https://www.ebsco.com/research- starters/busines
s - and - management / introduction - nonparametric - methods (visited on
05/04/2025).

[24] “Mann-whitney u test: Assumptions and example,” Informatics from Technol-
ogy Networks. (), [Online]. Available: http://www.technologynetworks.com
/informatics/articles/mann-whitney-u-test-assumptions-and-examp
le-363425 (visited on 03/04/2025).

[25] J. L. Devore, Probability and statistics for engineering and the sciences, 8th
ed. Boston, MA: Brooks/Cole, Cengage Learning, 2012, OCLC: 696106248,
isbn: 978-0-538-73352-6.

[26] “XGBoost documentation — xgboost 3.0.1 documentation.” (), [Online]. Avail-
able: https://xgboost.readthedocs.io/en/release_3.0.0/ (visited on
05/08/2025).

[27] “Machine learning - a first course for engineers and scientists,” sml-book-page.
(), [Online]. Available: http://smlbook.org/ (visited on 05/08/2025).

[28] R. Kannan and V. Vasanthi, “Machine learning algorithms with ROC curve
for predicting and diagnosing the heart disease,” in Soft Computing and Med-
ical Bioinformatics, N. B. Muppalaneni, M. Ma, and S. Gurumoorthy, Eds.,
Singapore: Springer, 2019, pp. 63–72, isbn: 978-981-13-0059-2. doi: 10.1007
/978-981-13-0059-2_8. [Online]. Available: https://doi.org/10.1007/97
8-981-13-0059-2_8 (visited on 05/02/2025).

[29] “Introduction to the ROC (receiver operating characteristics) plot,” Classifier
evaluation with imbalanced datasets. (Jun. 9, 2015), [Online]. Available: http
s://classeval.wordpress.com/introduction/introduction-to-the-roc
-receiver-operating-characteristics-plot/ (visited on 05/02/2025).

[30] C. Chan. “What is a ROC curve and how to interpret it,” Displayr. (Jul. 5,
2018), [Online]. Available: https://www.displayr.com/what-is-a-roc-cu
rve-how-to-interpret-it/ (visited on 03/04/2025).

[31] S. Narkhede. “Understanding AUC - ROC curve,” TDS Archive. (Jun. 15,
2021), [Online]. Available: https://medium.com/towards-data-science/un
derstanding-auc-roc-curve-68b2303cc9c5 (visited on 03/11/2025).

[32] M. P. Muller, G. Tomlinson, T. J. Marrie, et al., “Can routine laboratory
tests discriminate between severe acute respiratory syndrome and other causes
of community-acquired pneumonia?” Clinical Infectious Diseases: An Offi-

39

https://dl.acm.org/doi/10.1145/2502081.2502224
https://www.sciencedirect.com/topics/computer-science/feature-reduction
https://www.sciencedirect.com/topics/computer-science/feature-reduction
https://doi.org/10.1007/978-1-4612-6129-2_8
https://doi.org/10.1007/978-1-4612-6129-2_8
https://www.ebsco.com/research-starters/business-and-management/introduction-nonparametric-methods
https://www.ebsco.com/research-starters/business-and-management/introduction-nonparametric-methods
http://www.technologynetworks.com/informatics/articles/mann-whitney-u-test-assumptions-and-example-363425
http://www.technologynetworks.com/informatics/articles/mann-whitney-u-test-assumptions-and-example-363425
http://www.technologynetworks.com/informatics/articles/mann-whitney-u-test-assumptions-and-example-363425
https://xgboost.readthedocs.io/en/release_3.0.0/
http://smlbook.org/
https://doi.org/10.1007/978-981-13-0059-2_8
https://doi.org/10.1007/978-981-13-0059-2_8
https://doi.org/10.1007/978-981-13-0059-2_8
https://doi.org/10.1007/978-981-13-0059-2_8
https://classeval.wordpress.com/introduction/introduction-to-the-roc-receiver-operating-characteristics-plot/
https://classeval.wordpress.com/introduction/introduction-to-the-roc-receiver-operating-characteristics-plot/
https://classeval.wordpress.com/introduction/introduction-to-the-roc-receiver-operating-characteristics-plot/
https://www.displayr.com/what-is-a-roc-curve-how-to-interpret-it/
https://www.displayr.com/what-is-a-roc-curve-how-to-interpret-it/
https://medium.com/towards-data-science/understanding-auc-roc-curve-68b2303cc9c5
https://medium.com/towards-data-science/understanding-auc-roc-curve-68b2303cc9c5


References

cial Publication of the Infectious Diseases Society of America, vol. 40, no. 8,
pp. 1079–1086, Apr. 15, 2005, issn: 1537-6591. doi: 10.1086/428577.

[33] T.-T. Wong and P.-Y. Yeh, “Reliable accuracy estimates from k-fold cross
validation,” IEEE Transactions on Knowledge and Data Engineering, vol. 32,
no. 8, pp. 1586–1594, Aug. 2020, issn: 1558-219