Signal processing unit for bone
conduction microphone and speaker
An implementation of delay, pitch shifting and echo cancella-
tion

Master’s thesis in Embedded electronic system design

Daniel Eliasson
Lucien Stauffer-Kee

DEPARTMENT OF ELECTRICAL ENGINEERING

CHALMERS UNIVERSITY OF TECHNOLOGY
Gothenburg, Sweden 2024
www.chalmers.se

www.chalmers.se


Master’s thesis 2024

Signal processing unit for bone conduction
microphone and speaker

An implementation of delay, pitch shifting and echo cancellation

Daniel Eliasson
Lucien Stauffer-Kee

Department of Electrical Engineering
Division of Signal processing and Biomedical engineering

Unit of Biomedical Signals and Systems
Chalmers University of Technology

Gothenburg, Sweden 2024


Signal processing unit for bone conduction microphone and speaker
An implementation of delay, pitch shifting and echo cancellation
Daniel Eliasson
Lucien Stauffer-Kee

© Daniel Eliasson & Lucien Stauffer-Kee, 2024.

Supervisor: Karl-Johan Fredén Jansson, Signal processing and Biomedical engineer-
ing
Examiner: Sabine Reinfeldt, Signal processing and Biomedical engineering

Master’s Thesis 2024
Department of Electrical Engineering
Division of Signal processing and Biomedical engineering
Unit of Biomedical Signals and Systems
Chalmers University of Technology
SE-412 96 Gothenburg
Telephone +46 31 772 1000

Cover: Printed circuit board of an electronic fluency device.

Typeset in LATEX, template by Kyriaki Antoniadou-Plytaria
Printed by Chalmers Reproservice
Gothenburg, Sweden 2024

iv


Signal processing unit for bone conduction microphone and speaker
An implementation of delay, pitch shifting and echo cancellation
Daniel Eliasson & Lucien Stauffer-Kee
Department of Electrical engineering
Chalmers University of Technology

Abstract
This paper introduces the development of a signal processing unit for bone con-
duction microphone and speaker generating real-time adjustable delayed auditory
feedback (DAF) and frequency altered feedback (FAF). These types of feedbacks
are known to effectively increase fluency in people who stutter. The incidence of
stuttering is about 1% of the world population. The effects of DAF have been noted
since the 1950s. The user’s voice is captured by a microphone, processed, and then
relayed back through a speaker. DAF introduces a delay of 50 to 200 milliseconds,
while FAF alters the pitch by a quarter to a full octave. A notable challenge for
bone conduction devices is the potential for strong feedback paths between speaker
and microphone, which can result in echo or oscillation.
The signal processing algorithms, including DAF, FAF, and echo cancellation, were
developed and tested using MATLAB®. Additionally, an analog chain was con-
structed and evaluated on a breadboard, featuring variable amplification and power
amplifier to directly drive a passive speaker. To achieve a standalone device, the
algorithms were ported to a microcontroller, which was further enhanced with a
user-friendly interface, including a rotary encoder and LCD, allowing adjustments
of the algorithms without programming expertise. The entire system was then inte-
grated onto a custom-designed printed circuit board (PCB), combining both analog
and digital circuitry.
The MATLAB® script successfully implements all algorithms; DAF, FAF and echo
cancellation. It can be used either with sound files like wav or mp3, or through
the use of a audio interface the MATLAB® script can be used in real-time for live
application such as a test with a real person.
The hardware implemented design on PCB has a working and tested DAF, and a
untested implementation of echo-cancellation. Due to limitation in floating point
performance of the microcontroller a pitch shifting algorithm remains incomplete.
The hardware device has sufficient audio quality with a total harmonic distortion
of about 3% adhering to IEC 60645-1. This work lays the groundwork for future
enhancements, particularly in refining the pitch-shifting capability.

Keywords: Signal processing, stuttering, DAF, FAF, echo-cancellation, Raspberry
Pi Pico, audio.

v


Acknowledgements
We would like to thank our kind supervisor Karl-Johan Fredén Jansson who have
provided facility, components and expertise. The characterization of the device
would not have been possible to such a degree without you.

Daniel Eliasson & Lucien Stauffer-Kee, Gothenburg, June 2024

vii


List of Acronyms

Below is the list of acronyms that have been used throughout this thesis listed in
alphabetical order:

AAF Altered Auditory Feedback
ADC Analog to Digital Converter
DAC Digital to Analog converter
DAF Delayed Auditory Feedback
FAF Frequency Altered Feedback
FIR Finite Impulse Response
GPIO General Purpose Input Output
LCD Liquid Crystal Display
LED Light Emitting Diode
LMS Least Mean Square

ix


Contents

List of Acronyms ix

1 Introduction 1
1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Aim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.3 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.4 Specification of the issue being investigated . . . . . . . . . . . . . . . 2

2 Theory 3
2.1 Digital delay line . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Fast Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.3 Pitch shifting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.3.1 Resampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.3.2 Time framing . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2.4 Echo cancellation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

3 Methodology 11

4 Results 15
4.1 Software unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

4.1.1 Pitch-shifting . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4.1.2 Frequency shifting . . . . . . . . . . . . . . . . . . . . . . . . 16

4.2 Hardware unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
4.2.1 Circuit design . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.2.2 Firmware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.2.3 Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

4.3 Measurements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

5 Discussion 31

6 Conclusion 33

Bibliography 34

7 Appendix III

xi


Contents

xii


1
Introduction

1.1 Background

Devices based on skin microphones are currently researched to be used as bone con-
ductive stethoscope for hearing aids among other application. A similar device to
hearing aids can be used for altered auditory feedback (AAF). Today an approximate
one percent of our population are experiencing experiencing some amount of stut-
tering [1]. Both delayed auditory feedback (DAF) and frequency altered feedback
(FAF) have been already identified to inhibit stutter to a large degree. Research
in inhibiting and reducing stuttering have yielded between 60-100% stutter inhibi-
tion by altering the auditory feedback [2]. Conclusion of these findings suggest that
the means of altering the audio, primarily DAF and FAF, could be used in clinical
application.
Research in bone conduction microphone and transducers for use within hearing aids
have recently seen great progress. One of the more elusive problems of objectively
measuring the audibility of bone conductive devices have been solved by using skin
microphones [3]. It is therefore possible to verify the operation of a bone conduc-
tion device for audio playback. Bone conduction devices have also been successfully
used as implants with no serious adverse effects [4], it is therefore of great interest
for making better, smaller and seemingly invisible hearing aids among other things.
Previous study have determined the optimal placement for bone conductive micro-
phone for speech intelligibility which have been understood to provide better audio
clarity in noisy environments compared to traditional air microphones [5].
Because bone conduction devices has seen very recent advancements, it has become
ever more interesting to apply the technology to new fields such as stutter inhibition.
Since there’s many system similarities between hearing aids and the devices proposed
to inhibit stutter, there’s also great potential for near-term product development and
thus patient life quality improvement.

1.2 Aim

For the purpose of continuing the research in stutter inhibition and evaluate if skin
microphones are suitable, a suitable signal processing unit needs to be constructed.
Since the best combination of settings for inhibiting stutter can vary between indi-
viduals among other things, a large degree of customization is needed.

1


1. Introduction

1.3 Limitations
Despite the testing of the device on the participants of the project, no clinical studies
will be conducted during the course of the master thesis.
Although many of the functions will be adjustable, it will not be investigated how
they perform while being adjusted in real-time. We assume a steady state of these
adjustable parameters and any artifacts created during parameter change will be
disregarded.

1.4 Specification of the issue being investigated
A complete signal processing system needs to be designed. The system is expected
to receive an analog signal and using a combination of analog and digital circuitry,
apply an adjustable delay and a pitch shift. In addition the microphone signal
may be enhanced with echo-cancellation as this is a issue more prevalent to bone
conducting speakers and microphones. When the signal processing is complete the
system should output the audio as an analog signal to a speaker.

2


2
Theory

In this chapter we will go through some of the more niche knowledge applicable to
this report. It is however assumed that knowledge of electronics as well as signals
and signal processing at a basic level is already known. Therefore no discussion
about the differences in time-domain and frequency-domain or how one signal may
be transformed from one to the other domain. It is also assumed knowledge of
the difference between continuous signals and discrete signals, both in time and
amplitude.

3


2. Theory

2.1 Digital delay line
Digital delay line is a discrete element in signal processing which allows a signal
to be delayed by certain number of samples, see figure 2.1. The delay line may be
realized in various ways, one of those is a circular buffer.

Figure 2.1: Delay line with signal delay m

The circular buffer uses a fixed length of memory and through changing the position
of read and write you can decide when to read back an old value, see figure 2.2. The
delay is not fixed to the buffer size instead a change in the relative position of
the read output to the write input [6] will have different delay. This is because
it effectively creates a part of the buffer which has been written but has not been
outputted, in essence delaying it from being output. The largest spacing that can
exist is the same as the size of the buffer. A delay larger than the buffer size will
otherwise overwrite input-data before ever being read out.

(a) buffer at position 000 (b) buffer at position 001

Figure 2.2: The working principle of circular buffer

4


2. Theory

2.2 Fast Fourier Transform
The Fast Fourier Transform (FFT) is a set of algorithms which aims to transform
a signal from the time domain in to the frequency domain. In most applications it
is equivalent to discrete Fourier Transform (DFT) with the big advantage of being
faster. FFT is not just one algorithm, instead it is a name entitling all algorithms
equalling DFT in function but being computationally faster. The most famous and
widely used varient is the Cooley–Tukey algorithm and is the one we will discuss.
The Cooley–Tukey algorithm aims to break down the computational task in to
smaller ones. In practice this is done by breaking a N long DFT to two lighter
DFTs of length N/2. Through the use of some lighter mathematical manipulation
the results will become the same. This being recursive means that in the ideal case
DFT with the length of a 2 power, it only needs to calculate the trivial DFT of
length 1 and the rest is lighter manipulation [7]. Whenever a move to the frequency
domain is done some consideration about the window function should be done. The
default rectangular window may give substantial artifact, instead it is common to
use a Hanning window.

5


2. Theory

2.3 Pitch shifting
The process of pitch shifting aims to alter the signal in frequency domain to shift it
up or down. One of the core principles is to alter the playback speed. However alter-
ing the playback speed is not possible in real-time applications as having playback
faster/slower than recording speed means either running out of samples to play or
an ever increasing remainder to play neither of which is desirable. There are many
solutions to overcome this issue, one of these is the phase vocoder.
The two fundamental parts of the phase vocoder is resampling and phase realign-
ment. To achieve a playback speed different from the recording speed resampling is
used, which effectively changes the pitch. The other part, phase realignment makes
sure to keep phase coherence, preventing audible artifacts [8].

2.3.1 Resampling
Resampling is the process of changing the sample amount. This means that the
signals still occupies the same window of time but now has more or less amount of
samples, see figure 2.3 for a visual representation. Playback after resampling with
the original sample rate changes the time window of the signal instead. In turn
changing the size of the time frame a signal occurs means it plays faster or slower
in time, in other words you have shifted the signal in frequency domain [9].

(a) Original signal (b) Equivalent time contin-
uous signal

(c) Resampled, original
time frame

(d) Resampled, original
sample rate

Figure 2.3: Resampling of a signal

When resampling it is important to know the general frequency spectrum of the
signal as the signal may not be supported by the new sample rate resulting in
aliasing artifacts.
In practice it is separate in to two different methods, one for increasing sample
rate and one for decreasing. To increase sample rate insertion of zeros in between
samples is done, which is followed with a low-pass filter designed according to the
maximum supported frequency allowed by the original sample rate. This low pass

6


2. Theory

filter is sometimes called a reconstruction filter. To decrease the sample rate is done
by first making sure there is no frequency content not supported by the new sample
rate. Which is done by filtering the signal with a low-pass filter, sometimes called
an anti-aliasing filter. After this is done, exclusion of samples at an even spacing
trims away to desired sample rate. These two methods by themselves will only
allow changes to the sample rate as a multiple or division by an integer factor. To
change the sample rate to any rational number both methods needs to be done in
succession, first increase the sample rate and later decrease it [10].

2.3.2 Time framing
In a real time application an extra step is needed to be able to perform multi
sample processing. One function fulfilling this is to store multiple samples in a
buffer representing a frame of time, see figure 2.4.

Figure 2.4: Signal divided in to frames

In the context of resampling each frame to achieve pitch shift there is two substantial
issues, visualized in figure 2.5. When each frame is stretched out there is an overlap,
in which case there is two signals that needs to be combined. In the second case
there is a far worse problem. When the frame is shrunk it results in spaces of time
in which there is no signal to play.

7


2. Theory

(a) Segmenting without overlap (b) Segmenting with overlap

Figure 2.5: Effect of resampling on time frame signal

A solution to the issue of having no signal to play it to create the initial frames
with an overlap. Thus there’s a margin of time which the frames can be shrunk. To
handle overlapping time frames a window function may be applied to each frame
and then it is possible to simply add them together [11].
One issue will however remain, phase misalignment, the solution of which have given
the phase vocoder algorithm its name. Through the use of FFT, each small time
frame can be transformed into the frequency domain. Samples will then represents
magnitude and phase at a specific frequency instead of an amplitude at a given time.
This leads to the opportunity to manipulate the phase while keeping the frequency
content the same. Inverse FFT can then be applied to resynthesise each time frame
back to time domain [12]. The end algorithm is summarized in figure 2.6.

8


2. Theory

(a) block diagram (b) signal diagram

Figure 2.6: Block diagram and the corresponding changes in signal it makes for
the pitch shifting algorithm

9


2. Theory

2.4 Echo cancellation
Echo cancellation is the process of removing echoes due to an unwanted feedback
path by first estimating the feedback path, secondly reproduce the echoed signal
using the estimated feedback path to then thirdly, remove it from the signal, see
figure 2.7.

Figure 2.7: Block diagram of echo cancellation

The estimation of the feedback path is commonly done using a finite impulse re-
sponse (FIR) filter. This is an adequate solution for most naturally occurring feed-
back as it encompass both signal delay and frequency alteration such as low-pass or
high-pass. It will however not work for signal deterioration like clipping or distortion
as the FIR filter cannot approximate these.
For the echo cancellation to work, there needs to be an accurate estimation of the
feedback path and in order to do so, the filter requires to be trained. To train
the FIR filter it’s possible to play a known sound from the speaker and use the
microphone to listen to it. For better performance, it’s common to use a sound with
a wide frequency spectra and so it is typical to play white noise. Then, comparing
the result from the real feedback path that’s captured through the microphone to
the estimated result, the difference is the error and is called e. By using the least
mean square (LMS) algorithm the filter can be incremented for as low as possible
error e [13]. Ideally this will converge giving you a good approximation.

10


3
Methodology

The aim of this project is to develop a tool to be used for investigations of delayed
auditory feedback for rehabilitation of patients who stutter. In more detail, the goal
is to develop a signal processing prototype with the functions in figure 3.1. The
device will have an analog and a digital section. The analog section main purpose
is to amplify the signal. Meanwhile the digital section contains the different audio
alterations like delay and pitch shifting.

Microphone Input gain ADC Noise cancelling

X'X

Z-1 Pitch shift

DAC

Y

FAFDAF

Output gain

Analog Digital

Speaker

Figure 3.1: Block diagram

We began working in MATLAB® creating scripts of the different algorithms. The
scripts were first tested on audio clips confirming each algorithms function. These
scripts were then combined into a complete model able to process audio clips or
real-time audio through an audio interface.
The learning’s from the MATLAB® experimenting was then used to create a stand
alone unit based around the Raspberry Pi Pico microcontroller. First assembled and
tested on breadboard where alterations on hardware could easily be done. Analog
section was partly simulated in LTSpice and then tested with components on bread-
board. Features not implemented in the MATLAB® scripts such as an interface to

11


3. Methodology

change settings was also added. We then moved to a printed circuit board (PCB)
when we achieved a stable hardware solution. The PCB was designed in KiCad.
The firmware running on the Raspberry Pi Pico was written in C using the Ras-
berry Pi Pico version of Visual Studio Code. The algorithms were initially written
heavily based on previous work in MATLAB®. A lot of work was however put in
to optimizing these algorithms for the specific platform. See figure 3.2 for a block
representation of above.

Figure 3.2: Overview of task schedule

Once the standalone unit was finished work on measuring performance commenced.
To characterize the device a set of measurements have been taken using a dynamic
signal analyzer connected by a multitude of methods. Directly connected, meaning
electrically stimulating the input of the device and simultaneously measuring the di-
rect output voltage. We’ve also measured the device connected to a bone-conducting
microphone and bone vibrator to characterize the behavior in a close to intended
application. The microphone is being stimulated by a speaker inside the anechoic
test chamber BK 4222. The bone vibrators force is being measured through the
artificial mastoid BK 4930, see figure 3.3. All tests have been done with maximum
input and output gain. The measured data collected includes frequency response,
bone vibrator force, total harmonic distortion, noise and input saturation point.
The bone vibrators tested were the Radioear B71w, B81 and B250.

12


3. Methodology

(a) overview (b) artificial mastoid

(c) microphone in anechoic chamber
setup

Figure 3.3: Measurement setup

13


3. Methodology

14


4
Results

We designed two different solutions. Firstly, a software unit consisting of a MATLAB®

prototype running algorithms and using an audio interface for input and output.
Secondly, a custom and completely standalone hardware unit.

4.1 Software unit
The software unit consists of MATLAB® scripts which runs on a computer. The
script may either run using previously recorded data in the form of audio files or
using real time audio. Recorded data is convenient to verify performance of the FAF.
In order to run the script with real time audio, an audio interface is to be used, see
figure 4.1. To be able to drive a speaker directly from the audio interface we used
an additional power amplifier, the same as described later in 4.2.1. For the FAF,
two different methods were implemented, pitch-shifting and frequency shifting.

Figure 4.1: Software unit with real-time sampling of audio

4.1.1 Pitch-shifting
In order to produce FAF we need to alter the sampled audio in frequency domain.
One way we did this was to implement a pitch-shifter using a phase vocoder as
described in 2.3. The implementation in MATLAB® follows the phase vocoder with-
out any major alterations. The only notable difference is how we have chosen to
implement resampling. For improved computational efficiency a simplification has
been done, by instead to use linear interpolation between the two closest samples for
resampling. This will introduce some unwanted harmonics but they are of fairly low

15


4. Results

amplitude, see fig 4.2. The interpolation method is computationally the same effort
independent to the resampling ratio, which is not true for conventional resampling.

10
0

10
1

10
2

10
3

10
4

Frequency [Hz]

-120

-100

-80

-60

-40

-20

0

M
a
g
n
it
u
d
e
 [
d
B

]

X 1186.14

Y -7.43971

X 1000

Y 0

Figure 4.2: Pitch shifting using interpolation resampling of 1.2 times speed in-
crease. Original signal is a sinewave at 1 kHz and is drawn blue, altered signal is
orange

4.1.2 Frequency shifting
Frequency shifting can be used to create a Frequency Altered Audio. It is inherently
different to pitch shifting. The principle we used was to transform the audio in to
frequency domain using FFT. Then the values of amplitude of frequency bins were
shifted up or down a predefined integer. Finally the frequency spectrum is re-
synthesized to time-domain. The resulting sound files differ in timber rather than
sounding pitch-shifted because the frequency spectrum is translated sideways rather
than linearly compressed or expanded. This means that harmonics are not multiples
of the fundamental frequency anymore.
Specifically the algorithm divides the signal into frames. FFT is calculated for
each frame. In the present rendition, a frame is 16000 samples and the sampling
frequency is 16 kHz. The frequency spectrum of each frame is then shifted. Figure
4.3 illustrates the principle with a shift of the frequency spectrum of a single frame
300 bins higher. Then an inverse FFT of the modified spectrum is calculated to
have an audio signal of the frame. See figure 4.4 for a block diagram describing this
process. This solution is inherently less complex than the pitch shifter as it foregoes

16


4. Results

the resampling step.

10
0

10
1

10
2

10
3

Frequency [Hz]

0

5

10

15

20

25

30

35

40

45

M
a
g
n
it
u
d
e
 [
a
rb

ri
ta

ry
 u

n
it
]

Original

Frequency shifted

Figure 4.3: Frequency spectra of a frame before and after a shift of +300 bins

17


4. Results

Figure 4.4: Block diagram of the frequency shift signal path

4.2 Hardware unit
The hardware unit was constructed in two stages, firstly a breadboard prototype,
see figure 4.5 and then, final PCB version, see figure 4.6. A system overview in the
form of a block diagram can be seen in figure 4.7. The signal path starts by sending
the audio through a variable gain amplifier. This allows the user to set the input
level in order to utilize most of the dynamic range and thus also resolution of the
ADC without clipping. The ADC converts the audio signal from analog to digital in
order to further process it in the digital domain by a processor. After processing it
is converted back in to analog form by a DAC. A variable gain amplifier allows the
user to set the output level to a comfortable level. Finally a output power amplifier
allows driving a passive speaker directly from the device.

18


4. Results

Figure 4.5: Breadboard version of the hardware unit

(a) top (b) bottom

Figure 4.6: PCB version of the hardware unit

4.2.1 Circuit design

In more detail the variable gain amplifier consists of an opamp, a resistor and a
potentiometer, see figure 4.8. The circuit is a common inverting amplifier with the
feedback resistor replaced with a potentiometer thus allowing a variable gain. As
the potentiometer is in one of its extreme position the feedback will assume a short
and the output will be zero. In the other extreme position it will assume a value
approximately 10 times larger than the fixed resistor resulting in a good amount
of gain. The fixed resistor could be discarded if we instead feed the input signal
through the unconnected pin on the potentiometer, this would however result in
a very large gain swing from -infinite to +infinite, making the potentiometer very
sensitive.

19


4. Results

Figure 4.7: Block diagram of the signal path

Figure 4.8: Schematic view of variable gain amplifier

20


4. Results

The power amplifier is a op-amp driven class AB bipolar transistor amplifier. When
using a single transistor pair we could observe severe current saturation on the op-
amp preventing a good output level. Therefore, to reduce current draw from the
op-amp and in effect increase output level we’ve chosen transistors in a Darlington
configuration, thus increasing amplification factor of the transistor stage.

Figure 4.9: Schematic view of output power amplifier

21


4. Results

To control various aspects of the signal processing we needed some visual and tactile
interface to the device. We chose to add a character LCD and a rotary encoder in
order to implement a simple menu system in which the user can change certain
parameters. The raspberry pi pico has a built in LED, this was used to indicate the
input level being high and close to clipping.

Figure 4.10: Block diagram of digital interfaces

In addition to the digital interfaces, one gpio is set up as to control the contrast
level of the LCD by acting as an analog negative supply. This is achieved through
some additional circuitry, shown in fig 4.11. The GPIO outputs a pulse train. First
an inline capacitor isolates and removes the dc-bias. Then diodes will only allow the
negative voltage to be passed on to LCD VEE. Lastly an additional capacitor and
resistor smooths the voltage and adds a small load to the output.

Figure 4.11: Block diagram of digital interfaces

22


4. Results

4.2.2 Firmware
The firmware running on the raspberry pi pico has multiple functions to tend to,
see figure 4.12 for a complete overview. The most important one being reading
and writing to the DAC. It’s very important that it is handled consistently for the
echo cancellation and potential pitch shifting to work because these assumes a fixed
and known sampling rate. Even the DAF may suffer severe quality deterioration,
albeit at a lesser degree, if used with a varying sample rate as this could lead to
a sample being played at different rate than it was recorded (due to the delay).
For the best repeatability, we’ve implemented the signal chain through a repeating
interrupt (technically a hardware alarm) thus halting whatever process is running
to guarantee a consistent sampling-rate is fulfilled. However, it’s still important
that we set the repeating time to be longer than the execution time. Not doing so
would cause one interrupt to be called whilst the previous is still running, thereafter
pushing the old interrupt to the stack, which after some time would lead to a stack
overflow and therefor corrupting the memory. Even though we use a hardware alarm
function which would not cause a stack overflow, it would however still malfunction
as the subsequent samples will be delayed, lowering the sample-rate to a unknown
value. See [14] for a complete hardware reference.
We’ve also used interrupts to handle the main input, the rotary encoder. The three
signals, SW, CL and DT, are grouped to one hardware interrupt. The interrupt
function deciphers which signals changed and runs the appropriate function accord-
ing to the menu system followed by requesting an LCD update by setting a flag.
Holding the button down allows the user to reset the device to initial settings.
Finally the main program starts by initialize the hardware and sets the corresponding
GPIO. It sets the proper interrupt mask to be ready for input and it redraws the
LCD. The main loop checks whether the LCD flag is set and if so it then redraws
the LCD.

23


4. Results

Figure 4.12: Block diagram of the firmware

The menu system is fairly concise and consist of a scroll-able subset of functions
using the rotary encoder. Depressing the button of the rotary encoder allows the
user to run or change a parameter of the currently displayed function, see figure
4.13.

24


4. Results

Figure 4.13: Block diagram of the menu system

4.2.3 Optimization
It shall be emphasized that the execution-time has been identified as the limiting
factor for the sample-rate. We could measure this using the built in hardware timers
of the pico, see figure 4.1. Training of the FIR also necessitate some extra time as
it needs to compute not only the feedback filter but also the remaining error. We’ve
settled on a sample-time of 35µs (approximately 28kHz) , which results in a decent
margin for consistent sample-rate.
When implementing the FAF algorithms we stumbled upon a big hurdle, creating a
performant FFT algorithm. Following the Cooley-Tukey approach it would appear
the Raspberry Pi Pico isn’t fast enough for the high enough sampling rates and so
we’ve therefore tried improving it in a multitude ways. Some of these improvements
are general across the Pico platform and is implemented for all algorithms, whilst

25


4. Results

Table 4.1: Execution time for sampling, echo cancellation and playback

execution time (µs)
normal operation 21

training echo cancellation 28

others are algorithm specific.
First of all, we changed the variable type used inside of the firmware. It is common
to use floats in this application and many libraries does so, for instance KISS FFT.
However, due to the lack of a dedicated floating point unit within the Pico, it
is exceptionally slow for this particular variable type. We eventually determined
through testing that 16 bit signed integers are instead the preferred variable type.
Secondly, we choose to implement the FFT in radix 4 mode instead of radix 2 which
is more common. Radix 2 and 4 refers to how many parts each iteration divides
the signal in and we saw a considerable performance increase whilst using radix
4. Depending on hardware and code compilation this isn’t necessarily the optimal
solution since the amount of iteration and therefore also the amount of function
calls is far greater in a radix 2 implementation compared to radix 4. Because each
function call results in a jump instruction and possible some push and pull on the
stack memory it could instead lead to severe slow downs.
Thirdly, we removed the functions input variables in order to prevent unnecessarily
pushing and pulling on the stack. This lead to a big performance increase for the
echo-cancellation algorithm from this small change.
Lastly, we overclocked the microcontroller to 250MHz from it’s default value of
125MHz. The Pico has generally been proven to be reliable even at these speeds,
however, since the FAF could not be implemented with an acceptable sample rate
even when overclocked, it was later returned to default for higher reliability and
lower power consumption.

26


4. Results

4.3 Measurements
For the first test, by directly measuring the device, we characterize the frequency
response of the device over four different input amplitudes, see figure 4.14. By
observing the graphs, we note a very flat response over most of the frequency spectra
with a gentle taper which indicates a slight low pass behavior. The low pass behavior
is believed to be inherent to the specific DAC that we have chosen since there is
mentioning of such behavior in the data-sheet. We can also observe that the gap
between lines diminish with increasing amplitude as we reach saturation.

10
2

10
3

10
4

-22

-20

-18

-16

-14

-12

-10

-8

Figure 4.14: Frequency response with direct electric stimuli of different amplitude

27


4. Results

After connecting vibrator and microphone to the device, we’ve then measured the
frequency response using three different bone vibrators, see figure 4.15. The response
curves are similar to those of the bone vibrator alone from their respective datasheet,
therefore confirming that the system does not substantially alter the sound charac-
teristics. We can also observe that the peak force being fairly high across the board,
which indicates a good power amplifier design.

10
2

10
3

10
4

40

50

60

70

80

90

100

110

120

(a) B71W

10
2

10
3

10
4

30

40

50

60

70

80

90

100

110

120

(b) B81

10
2

10
3

10
4

60

70

80

90

100

110

120

130

140

(c) B250

Figure 4.15: Force frequency response for various input amplitudes and bone
vibrators

28


4. Results

Aside from frequency response, we also intended to characterize distortion levels.
We measured the total harmonic distortion when feeding the device a sine wave
tone. Measurements are done with a bone conducting microphone at 70dBSPL, see
figure 4.16. The input level is measured between the microphone and the device
and is noted to have a very low distortion at less than 1%. The output is measured
directly without a bone vibrator connected and whilst we’ve observed that the device
is causing some distortion, it is still fairly low at about 3% adhering to IEC 60645-1
(less than 6%).

200 300 400 500 600 700 800 900

-140

-120

-100

-80

-60

-40

-20

(a) 125Hz

400 600 800 1000 1200 140016001800

-140

-120

-100

-80

-60

-40

-20

(b) 250Hz

Figure 4.16: Total harmonic distortion measured with two different input signals,
one 125Hz sine, the other 250Hz sine

29


4. Results

Finally we made noise measurements by leaving the input disconnected and whilst
measuring the output, both directly and whilst connected to a B250 bone vibrator,
see figure 4.17. We can establish that the noise is very low and inaudible in most
applications.

10
2

10
3

10
4

-105

-100

-95

-90

-85

-80

-75

-70

(a) direct measure

10
2

10
3

10
4

0

10

20

30

40

50

60

(b) Bone-vibrator measure

Figure 4.17: Noise characteristic without input stimuli measured directly or force
from bone-vibrator

30


5
Discussion

Whilst the software version implemented in Matlab is function complete and encom-
pass DAF, FAF and echo cancellation successfully, a standalone hardware unit can
still be preferable. The primary drawback of using a normal computer is that the
buffering of audio input causes an inherent delay and thus limits the practical range
for DAF. For this reason, it is of some interest to further develop the hardware unit
to encompass all of the same features within a single device.
Since we have established that it is interesting to have a working FAF implementa-
tion on hardware, we can ascertain a few options for future work. A conceptionally
simple solution is to change the digital platform either by using a faster microcon-
troller or by using something more niche and purpose built, like a dedicated digital
signal processor (DSP) or a field programmable gate array (FPGA). The current
software is only using one thread and because the Raspberry Pi Pico has two cores,
another option is to rewrite the code for multi threading which could yield a dou-
bling in performance. A third option is to lower the frequency criteria and thus
the requirement on sample-rate which would decrease computational load. Whilst
lowering the frequency criteria could come at a cost of audio quality, in accordance
with frequency response of the bone conducting vibrators, we can expect a decent
margin due to its poor high frequency behaviour.
Even though a multitude of tests performed on the PCB design seems to indicate
that the design has been successful, we have achieved low distortion values and a
very good spectral performance, the DAF works and can be adjusted in different
increments, from 0 to more than 500ms of delay, there is still a point of contention.
Even though the echo cancellation is also implemented, it has not yet been exten-
sively tested due to it will likely require a very complicated setup to replicate the
intended use.
Whilst radix 4 was determined to be a better option than radix 2 when implementing
FFT, further testing can be done to find the optimal radix for the given hardware.
Although, due to the considerable amount of new code needed to evaluate any radix
over 4 whilst also expected to have diminishing improvements, instead, investigated
the performance of non-recursive FFT is likely of higher interest.
We think that the frequency shift is a promising solution to achieve FAF at a reduced
computational cost as it achieves the core function of altering the sound making it
brighter or darker in tone without resampling. It is however not yet proven if it
has the same effect in real patients as the pitch shifting do. Even further deviation
could be made from the standard pitch shifting, for instance chorus effect is common
frequency type effect which alters between pitching audio up and down, something
which may be similar enough to the proven methods that it works too.

31


5. Discussion

During testing, the power consumption was noted to be poor by the fact that bat-
teries had to be changed more frequently than desired. It stands to reason that the
power amplifier circuit is likely poor performing in terms of power efficiency. Whilst
not confirmed, the Darlington pair configuration is suspected to worsen the already
poor efficiency of the class AB amplifier circuit. Due to the strong need for it early
in the hardware prototype phase, the power circuit design was primarily guided by
our familiarity to its implementation and component availability. For future im-
provement, we believe that a class D amplifier circuit should be investigated due to
its superior power efficiency.
Another point for reduction in power consumption is voltage regulation and battery
management. A 9V battery solution was chosen due to its convenience and due
to the preference for a replaceable battery which can lead to continuous operation.
However, since both the LCD interface and processing parts are powered by 5V, a
linear regulator is used. It would be preferable to use a switched DC to DC power
supply such as a buck converter for voltage regulation since its efficiency is generally
higher.
Lastly, LCD interface is a likely contestant for optimisation in terms of power effi-
ciency. We’ve already omitted LCD backlight for the PCB design due to its large
contribution to power consumption. The LCD used is an old and widely adopted
model but for future work a more modern display with less power consumption could
be investigated.

32


6
Conclusion

The main goal of this project was to provide a means of introducing delayed auditory
feedback, an effect known to induce fluency in people who stutter. A MATLAB®

script and a standalone hardware unit have been produced, with the addition of
frequency altered feedback and echo-cancelling features. The device has been made
to function with bone conduction microphone and transducers. While some elec-
tronic fluency devices are commercially available, further research can be carried
out about the use of bone conduction devices in the field of stutter inhibition. Fu-
ture enhancements include refinement of the pitch-shifting feature, miniaturisation,
energy consumption.

33


6. Conclusion

34


Bibliography

[1] O. Bloodstein, N. B. Ratner, and S. B. Brundage, A Handbook
on Stuttering. Plural publishing Inc, 2021, ch. 2. [Online]. Avail-
able: https://www.google.com/books/edition/A_Handbook_on_Stuttering_
Seventh_Edition/Abw0EAAAQBAJ

[2] D. Hudock and J. Kalinowski, “Stuttering inhibition via altered auditory
feedback during scripted telephone conversations,” Int J Lang Commun
Disord, 2013, pMID: 24372890. [Online]. Available: https://doi.org/10.1111/
1460-6984.12053

[3] A.-C. Persson, B. Håkansson, M. C. Mechanda, W. B. Hodgetts, K.-J. F.
Jansson, M. Eeg-Olofsson, and S. Reinfeldt, “A novel method for objective
in-situ measurement of audibility in bone conduction hearing devices – a
pilot study using a skin drive bcd,” International Journal of Audiology,
vol. 62, no. 4, pp. 357–361, 2023, pMID: 35238713. [Online]. Available:
https://doi.org/10.1080/14992027.2022.2041739

[4] S. Reinfeldt, M. Eeg-Olofsson, K.-J. Fredén Jansson, A.-C. Persson, and
B. Håkansson, “Long-term follow-up and review of the bone conduction
implant,” Hearing Research, vol. 421, p. 108503, 2022, acoustic Implant
Technology. [Online]. Available: https://doi.org/10.1016/j.heares.2022.108503

[5] M. McBride, P. Tran, T. Letowski, and R. Patrick, “The effect of bone
conduction microphone locations on speech intelligibility and sound quality,”
Applied Ergonomics, vol. 42, no. 3, pp. 495–502, 2011. [Online]. Available:
https://www.sciencedirect.com/science/article/pii/S0003687010001523

[6] S. Chandrasekaran. (2014) Implementing circular buffer in c. Ac-
cessed : 2024. [Online]. Available: https://embedjournal.com/
implementing-circular-buffer-embedded-c/

[7] A. W. M. van den Enden and N. A. M. Verhoeckx, Discrete-time signal
processing : an introduction. Prentice Hall, 1989, ch. The DFT and
the FFT, pp. 141-148. [Online]. Available: https://archive.org/details/
discretetimesign0000ende

[8] W. A. Sethares. A phase vocoder in matlab. Accessed : 2024. [Online].
Available: https://sethares.engr.wisc.edu/vocoders/phasevocoder.html

[9] F. Grondin. (2009) Guitar pitch shifter. Ch. 2 Pitch shifting. [Online].
Available: https://www.guitarpitchshifter.com/pitchshifting.html

[10] A. W. M. van den Enden and N. A. M. Verhoeckx, Discrete-time
signal processing : an introduction. Prentice Hall, 1989, ch. Multirate
systems, pp. 235-251. [Online]. Available: https://archive.org/details/
discretetimesign0000ende

35

https://www.google.com/books/edition/A_Handbook_on_Stuttering_Seventh_Edition/Abw0EAAAQBAJ
https://www.google.com/books/edition/A_Handbook_on_Stuttering_Seventh_Edition/Abw0EAAAQBAJ
https://doi.org/10.1111/1460-6984.12053
https://doi.org/10.1111/1460-6984.12053
https://doi.org/10.1080/14992027.2022.2041739
https://doi.org/10.1016/j.heares.2022.108503
https://www.sciencedirect.com/science/article/pii/S0003687010001523
https://embedjournal.com/implementing-circular-buffer-embedded-c/
https://embedjournal.com/implementing-circular-buffer-embedded-c/
https://archive.org/details/discretetimesign0000ende
https://archive.org/details/discretetimesign0000ende
https://sethares.engr.wisc.edu/vocoders/phasevocoder.html
https://www.guitarpitchshifter.com/pitchshifting.html
https://archive.org/details/discretetimesign0000ende
https://archive.org/details/discretetimesign0000ende


Bibliography

[11] F. Grondin. (2009) Guitar pitch shifter. Ch. 3.1 - 3.2. [Online]. Available:
https://www.guitarpitchshifter.com/algorithm.html

[12] ——. (2009) Guitar pitch shifter. Ch. 3.3 - 3.4. [Online]. Available:
https://www.guitarpitchshifter.com/algorithm.html

[13] B. Widrow, J. Glover, J. McCool, J. Kaunitz, C. Williams, R. Hearn, J. Zeidler,
J. Eugene Dong, and R. Goodlin, “Adaptive noise cancelling: Principles and
applications,” Proceedings of the IEEE, vol. 63, no. 12, pp. 1692–1716, 1975.

[14] Hardware apis. Raspberry Pi Ltd. Accessed : 2024. [Online]. Available:
https://www.raspberrypi.com/documentation/pico-sdk/hardware.html

cleardoublepage appendix

I

https://www.guitarpitchshifter.com/algorithm.html
https://www.guitarpitchshifter.com/algorithm.html
https://www.raspberrypi.com/documentation/pico-sdk/hardware.html


Bibliography

II


7
Appendix

Operation directions

Startup
Whilst the device is off, make sure the microphone, speaker and battery is plugged
in. Then, the device can be turned on using the red switch on the front side of the
device. Be mindful there’s no loud feedback which can cause discomfort, if present,
immediately turn the device off and turn to the section about setting input and
output gain.

Echo cancellation
After startup, whenever the microphone or speaker are re-positioned and when gain
is changed, the FIR filter needs to be retrained. If FIR training is not performed
after startup, the device will operate without echo cancellation. Using the rotary
dial on the front of the device, navigate the menu until the LCD display says "Echo
suppression", then press on the rotary dial to begin FIR training. During training,
the wearer will hear a white noise. The FIR training will continue until the rotary
dial is pressed again. Adequate time for training is expected to be roughly 20s but
will vary by changing training parameters within the firmware. If the trained FIR
doesn’t yield a desirable result, consider changing the training parameters within
the firmware.

DAF delay
The DAF delay is running by default. To change the delay value, navigate first to
"DAF" using the rotary dial, then, press down on the rotary dial. The rotary dial
is now used to control the delay time and is immediately updated. In order to turn
DAF off, change the delay to 0. When the desired delay value is achieved, pressing
the rotary dial will return the rotary dial to control the menu.

FAF pitch change
Since FAF is not implemented it will appear as a placeholder menu item. Pressing
the rotary dial whilst navigated to "FAF" will let you adjust the pitch change value
however this is also a placeholder menu item and won’t affect the operation of the
device. Press the rotary dial again to return the rotary dial to control the menu.

III


7. Appendix

Battery
Low battery voltage can lead to reduced volume output and distorted sound. In
order to diagnose if the battery is at fault, navigate to "Battery" using the rotary
dial. Then a voltage measurement can be done by pressing the rotary dial once. For
ideal performance battery voltage should be kept above 8V.

LCD contrast
If the LCD is hard to read then the contrast should be adjusted. This is done by
first navigating to "LCD" using the rotary dial. Press the dial once and it will now
be used to adjust the contrast value. Press the rotary dial again to return the rotary
dial to control the menu.

Setting input and output gain
Two blue potentiometers are located on the back of the device. The two controls can
be adjusted using a flat blade screwdriver. In order to calibrate these controls, first
the gain should be set to its minimum. This is achieved by turning the controls fully
anti-clockwise. The input gain should then be increased by turning clockwise until
slight activity is shown by the clipping LED during normal speech. Too high input
gain will result in severe clipping and deteriorate the sound quality. The output
gain can be set as high as possible accordingly with the comfort of the user. Now
the input and output gain is calibrated.

IV


DEPARTMENT OF ELECTRICAL ENGINEERING
CHALMERS UNIVERSITY OF TECHNOLOGY
Gothenburg, Sweden
www.chalmers.se

www.chalmers.se

	List of Acronyms
	Introduction
	Background
	Aim
	Limitations
	Specification of the issue being investigated

	Theory
	Digital delay line
	Fast Fourier Transform
	Pitch shifting
	Resampling
	Time framing

	Echo cancellation

	Methodology
	Results
	Software unit
	Pitch-shifting
	Frequency shifting

	Hardware unit
	Circuit design
	Firmware
	Optimization

	Measurements

	Discussion
	Conclusion
	Bibliography
	Appendix