Modelling of auditory salience by use of acoustic features, deep neural networks and brain signal analysis

Type

Master's Thesis

Abstract

Auditory salience is the property by which a sound stands out from its surroundings, a phenomenon the human auditory system deals with at almost every moment of the day. It is the reason our attention shifts from one sound source to another, or from instrument to instrument when listening to a piece of music. The term salience is widely used in the fields of perception and cognition to describe any feature of a stimulus that stands out from the rest for a variety of reasons. Salience can be influenced by emotional, motivational, or cognitive factors, and it is not always linked to physical characteristics of the stimulus such as intensity or temporal and frequency contrast. Salient sound events include, for example, a phone's message tone, a melody played on the piano, or a loud passing car in a quiet environment. In neuroscience, attention towards salience is split into two mechanisms: stimulus-driven attention, referred to as bottom-up attention, and cognitively driven attention, known as top-down attention. Bottom-up attention in auditory salience has been studied extensively; top-down attention less so. This thesis explores a relatively new approach to investigating bottom-up and top-down attention towards auditory salience, using deep neural network techniques and analysis of EEG brain signals. Deep neural networks have previously been related to human visual salience: the early layers of a deep neural network correspond to the physical features of an image (bottom-up attention), whereas the deeper, later layers correspond to higher-level semantic properties (top-down attention). This project adopts a similar approach, focusing instead on auditory salience. To assess auditory salience, a computational model was built from three frameworks. The first framework extracts acoustic features from the song audio using the Python Librosa library.
The second framework is based on the existing pre-trained convolutional neural network VGGish, which serves as the deep neural network. Lastly, EEG brain signal activity was analysed and represented through compact descriptors. We then computed and analysed the correlation between each pair of the three frameworks' outputs. All three frameworks used data from the publicly available OpenMIIR database, which contains 12 song stimuli and EEG recordings from 10 participants made while the songs were played to them. The correlations between the three frameworks showed varying correlation values in a patterned manner across the different network layers, in line with prior expectations. However, the overall correlation values are suspiciously high and should therefore be interpreted with caution. Nevertheless, a preliminary computational model was developed that, with appropriate modifications, could be used in further studies.
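
The first framework's feature extraction can be illustrated with a minimal sketch. The features below (frame-level RMS energy and spectral centroid) are of the kind Librosa provides (e.g. `librosa.feature.rms`, `librosa.feature.spectral_centroid`); they are implemented here with NumPy only, on a synthetic tone, since the thesis's actual feature set and parameters are not listed on this page.

```python
import numpy as np

def frame_features(y, sr, frame_len=2048, hop=512):
    """Per-frame RMS energy and spectral centroid of a mono signal."""
    n_frames = 1 + (len(y) - frame_len) // hop
    rms = np.empty(n_frames)
    centroid = np.empty(n_frames)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    window = np.hanning(frame_len)
    for i in range(n_frames):
        frame = y[i * hop : i * hop + frame_len] * window
        rms[i] = np.sqrt(np.mean(frame ** 2))
        mag = np.abs(np.fft.rfft(frame))
        centroid[i] = np.sum(freqs * mag) / (np.sum(mag) + 1e-12)
    return rms, centroid

# Toy input in place of a song: a 1-second 440 Hz tone at 22050 Hz.
sr = 22050
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440.0 * t)
rms, centroid = frame_features(y, sr)
```

For a pure tone the spectral centroid sits near the tone's frequency, while for real music both features vary frame by frame, yielding the kind of time series the model correlates against the other two frameworks.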
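
The correlation step across frameworks can be sketched as follows. The array names, lengths, and the synthetic data are illustrative assumptions only; in the thesis the inputs would be per-stimulus descriptor series from the acoustic-feature, VGGish, and EEG frameworks.

```python
import numpy as np

def pairwise_correlation(a, b):
    """Pearson correlation between two equally long 1-D descriptor series."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.corrcoef(a, b)[0, 1])

# Illustrative stand-ins for one stimulus: a salience descriptor per frame
# from each framework. vggish_layer is correlated with acoustic by
# construction; eeg is independent by construction.
rng = np.random.default_rng(0)
acoustic = rng.normal(size=100)
vggish_layer = 0.8 * acoustic + 0.2 * rng.normal(size=100)
eeg = rng.normal(size=100)

r_acoustic_vggish = pairwise_correlation(acoustic, vggish_layer)
r_acoustic_eeg = pairwise_correlation(acoustic, eeg)
```

Repeating this for every VGGish layer and every stimulus gives a matrix of correlation values whose layer-wise pattern is what the thesis inspects for bottom-up versus top-down structure.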

Subject / keywords

auditory, salience, music, perception, acoustic features, bottom-up, top-down, attention, CNN, EEG
