Modelling of auditory salience by use of acoustic features, deep neural networks and brain signal analysis
Type: Master's Thesis
Abstract
Auditory salience is the property by which a sound stands out from its surroundings,
a phenomenon the human auditory system deals with at almost every moment
of the day. It is the reason our attention shifts from one sound source to another,
or from instrument to instrument when listening to a piece of music. The
term salience is widely used in the field of perception and cognition to describe
any feature of a stimulus that stands out from the rest, for a variety of reasons. It
can be influenced by emotional, motivational, or cognitive factors, and it is not
always linked to physical characteristics of the stimulus, such as intensity or temporal
and frequency contrast. Salient sound events might be, for example,
your phone's message tone, a melody played on the piano, or a loud car passing
in a quiet environment. In neuroscience, the attention mechanism towards salience is
divided into two parts: stimulus-driven attention, referred to as bottom-up
attention, and cognitive-driven attention, known as top-down attention.
Bottom-up attention in auditory salience has been studied extensively, whereas top-down
attention has received far less study.
This thesis explores a relatively new approach to investigating bottom-up and
top-down attention towards auditory salience, using deep neural network techniques
and EEG brain-signal analysis. Deep neural networks have previously been related to
human visual salience: the early layers of a deep neural network resemble
the physical features of an image (bottom-up attention), whereas the deeper,
later layers resemble higher-level semantic properties (top-down attention). This
project therefore adopts a similar approach, focusing instead on auditory salience.
To assess auditory salience, a computational model was built from three frameworks.
The first framework extracts acoustic features from song audio
using the Python Librosa library. The second framework is based on the existing
pre-trained convolutional neural network VGGish, which serves as the deep neural
network. Lastly, EEG brain-signal activity was analysed and represented
through compact descriptors. We then computed and analysed the correlation between
each pair of the three frameworks' outputs.
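The pairwise-correlation step above can be sketched in a few lines. This is a minimal, self-contained illustration (not the thesis code): spectral flux stands in for a Librosa acoustic-salience feature, a noisy copy of it stands in for another framework's descriptor (e.g. an EEG or network-layer descriptor), and the frame length, hop size, and synthetic input are all illustrative assumptions.

```python
import numpy as np

def spectral_flux(signal, frame_len=512, hop=256):
    """Per-frame spectral flux: summed positive change in the magnitude spectrum,
    a common proxy for acoustic onset salience."""
    frames = np.array([signal[i:i + frame_len]
                       for i in range(0, len(signal) - frame_len, hop)])
    mags = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))
    diff = np.diff(mags, axis=0)
    # Pad with 0 so the descriptor has one value per frame.
    return np.concatenate([[0.0], np.maximum(diff, 0.0).sum(axis=1)])

def pearson(x, y):
    """Pearson correlation between two equal-length descriptor series."""
    x, y = x - x.mean(), y - y.mean()
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)                    # 1 s of noise as stand-in audio
flux = spectral_flux(audio)                           # framework 1: acoustic descriptor
other = flux + 0.1 * rng.standard_normal(len(flux))   # framework 2: noisy stand-in
r = pearson(flux, other)                              # one entry of the correlation table
```

In the thesis, the same Pearson-style comparison would be repeated for each pair of framework outputs (acoustic features, VGGish layer activations, EEG descriptors), after resampling the descriptor series to a common frame rate.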
All three frameworks used data from the open OpenMIIR database, which
contains 12 song stimuli and EEG recordings of 10 participants listening to
those songs.
The correlations between the three frameworks showed varying correlation
values in a layer-dependent pattern, in line with prior
expectations. However, the overall correlation values are suspiciously high and should
therefore be interpreted cautiously. Nevertheless, a preliminary computational model was
developed which, with appropriate modifications, could be used in further studies.
Subject/keywords
auditory, salience, music, perception, acoustic features, bottom-up attention, top-down attention, CNN, EEG