DEPARTMENT OF COMMUNICATION AND LEARNING IN SCIENCE
CHALMERS UNIVERSITY OF TECHNOLOGY
Gothenburg, Sweden 2020
www.chalmers.se

Assessing Cognitive Workload Between Different Tasks
Using EEG to develop and examine a method to measure variation of cognitive workload between different levels of difficulty

Master’s thesis in Biomedical Engineering¹ and Learning and Leadership²
FANNY APELGREN¹ & IDA PETTERSSON²

© FANNY APELGREN AND IDA PETTERSSON, 2020

Supervisor: Eva Lendaro, Department of Electrical Engineering, Chalmers University of Technology
Supervisor: Sheila Galt, Department of Communication and Learning in Science, Chalmers University of Technology
Examiner: Samuel Bengmark, Department of Mathematical Science, Chalmers University of Technology

Master’s Thesis 2020
Communication and Learning in Science
Chalmers University of Technology
SE-412 96 Gothenburg
Telephone +46 31 772 1000

Cover: The picture is taken from our own EEG data

Abstract

Assessing cognitive workload is an important tool, for example when evaluating different techniques for improving prostheses. Here, we have developed a method to compare how the cognitive workload differs depending on whether a prosthesis has sensory feedback or not. We used electroencephalogram (EEG) recordings and performed a pilot study on ten intact limb subjects. An easy and a hard level were constructed by changing the weight of a force sensitive cube that was to be lifted back and forth over a barrier while the subject counted sounds in an auditory oddball task. A third level consisted of only the oddball task. The difference in difficulty between the levels was verified by measuring performance and perceived effort. On a group level, these measurements all indicated that the no task condition was the easiest and the hard task condition was the most demanding. Measurements of the number of lifts for different repetitions of the easy and hard conditions also showed signs of a learning effect during the performance of the easy task. The cognitive workload was measured using the event-related potentials (ERP) technique and frequency bands. The results showed that the ERP component P3 was the only one that could indicate a significant difference between all three levels. A compiled measurement (consisting of the sum of ERP components N1, P2, P3, and LPP) and the alpha frequency bands (low-, high-, and broadband alpha) also showed a significant effect between some of the conditions.

Keywords: Cognitive workload, Mental load, Learning, Electroencephalogram, EEG, Event-Related Potential, ERP, P3, Grasping task, Oddball Task, Dual-Task Paradigm

Acknowledgements

The writing of this thesis has, like most other projects, been a rocky road. We have had moments on top where everything went our way and we felt like real scientists ready for our big breakthrough, but we have also stumbled around in dark valleys, unsure of what to do next. Nevertheless, we have finally made it, and here is our finished work! But we would never have made it without the support we have received from the people around us.

We are very grateful to the Biomechatronics and Neurorehabilitation lab under Max Ortiz Catalan for the chance to do this, and for lending us both their time and equipment. Thank you everyone for meeting us and sharing your knowledge and experience, and for changing your plans to let us perform our measurements at the lab. We would especially like to thank Shahrzad Damercheli, who patiently helped us during all our measurements and training. And, of course, a huge thank you to our supervisor Eva Lendaro, who guided us along the way, believed in us, and taught us how to think like scientists.
We are also very grateful for our never-ending source of support, feedback, ideas and hot chocolate (at least this would have been never-ending if hot chocolate could have been sent online): Sheila Galt. Thank you for everything! We also had some other meetings that helped us with our literature review and statistical analysis. So, thank you to Yommine Hjalmarsson and Serik Sagitov for sharing your expertise.

This project would also not have been possible without all the wonderful people who volunteered to spend hours with us while we poked them in the head with gel-filled syringes and asked them to move a cube hundreds of times. Thank you to everyone who took part in our experiment, helped us practice, contributed ideas in our mini-pilot study, or offered to help but were stopped by the arrival of the Covid-19 virus.

We are also very thankful for our friends and family who have supported us through this work, during panic and euphoria: everyone who has taken time to listen when we tried to explain our work, or helped us when it was time to think about something else by sharing a hug, a lunch, or an ice cream.

Lastly, we would like to thank the two people who have been most important for this work: each other! As mentioned, this work has taken us over high mountains and through dark valleys and deep pits, with many treacherous rocks to stumble on. Sometimes, working together has been challenging, but it was by doing it together that we got out of those pits. We would never have reached as far alone. And if we hadn’t done it together, we would have no one to share the amazing views from the top with!

Table of contents

Abstract
Keywords
Acknowledgements
Abbreviations
List of Figures
List of Tables
1 Introduction
  1.1 Background
  1.2 Brief description of this work
  1.3 Aims and limitations
  1.4 Research questions
  1.5 Contribution
  1.6 Thesis outline
2 Theory
  2.1 Cognitive workload
  2.2 Electroencephalogram (EEG)
    2.2.1 Referencing
    2.2.2 Electrode positioning
    2.2.3 Artifacts and Noise
  2.3 Event-Related potentials
    2.3.1 Basic concept
    2.3.2 ERP Components
      2.3.2.1 N1
      2.3.2.2 N2
      2.3.2.3 P2
      2.3.2.4 P3
      2.3.2.5 LPP
    2.3.3 Frequency bands
      2.3.3.1 Delta
      2.3.3.2 Theta
      2.3.3.3 Alpha
      2.3.3.4 Beta
      2.3.3.5 Gamma
      2.3.3.6 Theta/Alpha
    2.3.4 Differences between individuals
3 Method
  3.1 Grasping task
  3.2 Dual-task paradigms and oddball tasks
    3.2.1 The choice of stimuli
    3.2.2 Secondary task: Counting, reacting or ignoring?
    3.2.3 Stimuli timing
  3.3 Experimental procedure
    3.3.1 Participants
    3.3.2 Tasks
    3.3.3 Self-assessment using NASA-RTLX
    3.3.4 Procedure
  3.4 Signal processing
    3.4.1 Offline referencing
    3.4.2 Amplification
    3.4.3 Filtering
    3.4.4 Epoching
    3.4.5 Artifact management
      3.4.5.1 Artifact detection and rejection
      3.4.5.2 Artifact correction
      3.4.5.3 The choice of whether to reject or correct artifacts
    3.4.6 Averaging
  3.5 Measurements to indicate cognitive workload
    3.5.1 Measuring amplitudes of averaged ERPs
      3.5.1.1 Choosing a method to measure amplitude
      3.5.1.2 The choice of which ERP components to examine
      3.5.1.3 The choice of latency window and electrode sites
      3.5.1.4 A compiled measurement for ERP
    3.5.2 Measuring frequency bands
      3.5.2.1 The choice of which frequency bands to examine
      3.5.2.2 Choosing a method
      3.5.2.3 The choice of electrode sites
  3.6 Statistical analysis
  3.7 Design of force sensitive cube
    3.7.1 Design requirements
    3.7.2 Design process
4 Results
  4.1 Notes on the execution of the experiment
  4.2 Performance
  4.3 Perceived effort
  4.4 ERP components
  4.5 Frequency bands
5 Discussion
  5.1 Experiment execution
  5.2 Performance
    5.2.1 Comparison of conditions
    5.2.2 Assessment of a learning process within each condition
  5.3 Perceived difficulty of the different conditions
  5.4 ERP components
  5.5 Frequency bands
  5.6 Differences between individuals
6 Conclusions
7 Future work
  7.1 Experiment with anesthesia
  7.2 Possible improvements of the cube
  7.3 Examination of the usefulness of EEG to evaluate learning processes
Appendix A: Experiment protocol
Appendix B: Earlier versions of the force sensitive cube
Appendix C: Informed consent
Appendix D: Artifact management details
Appendix E: Two-way ANOVA analysis
Appendix F: Electrode clusters

Abbreviations

ABBREVIATION | MEANING | SHORT EXPLANATION
EEG | Electroencephalogram | A technique to record electrical signals arising from brain activity
EMG | Electromyography | A technique to record electrical signals arising from skeletal muscle activity
EOG | Electrooculography | A technique to record electrical signals arising from eye activity
ERP | Event-Related Potential | Small changes in recorded EEG as response to an event
HEOG | Horizontal Electrooculography | Electrodes placed at the side of each eye to measure horizontal eye movements
ICA | Independent Component Analysis | A technique to separate different components of a signal
NASA-RTLX | NASA Raw Task Load Index | A simplified version of a self-assessment questionnaire developed by NASA
VEOG | Vertical Electrooculography | Electrodes placed under and above one eye to measure vertical eye movements and blinks
PSD | Power Spectral Density | A signal’s power content versus frequency

List of Figures

Figure 1. The brain and its four different lobes
Figure 2. Electrode positioning according to the normal and extended 10-20 system
Figure 3. An ERP waveform where each peak is named after the convention used in this thesis
Figure 4. A sketch of the latency ranges of ERP components N1, N2, and P2
Figure 5. A sketch of the latency ranges of ERP components P3 and LPP
Figure 6. The experimental setup for the grasping task, with the force sensitive cube
Figure 7. Person fitted with EEG cap with 128 electrodes
Figure 8. EEG signals with eye blinks and horizontal eye artifacts
Figure 9. Scalp topography showing blinks and horizontal eye movements
Figure 10. Final version of the force sensitive cube
Figure 11. Forces measured by the cube
Figure 12. The performance in the grasping task, measured as the number of lifts per minute
Figure 13. The success rate as a function of the number of lifts per minute
Figure 14. The performance of the subjects as measured by the accuracy of the oddball task
Figure 15. Task load index from the self-assessment questionnaire NASA-RTLX
Figure 16. The ERP waveform averaged over all subjects, conditions and channels
Figure 17. Scalp distribution showing the amplitudes of the ERP components
Figure 18. The grand average over all subjects as shown in the different clusters
Figure 19. The amplitude of each ERP component and condition in the small clusters
Figure 20. The amplitude of each ERP component and condition in the large clusters
Figure 21. Scalp distributions of the spectral power density for frequency bands
Figure 22. The absolute spectral amplitude of each frequency band and condition in the small clusters
Figure 23. The absolute spectral power amplitude of each frequency band and condition in the large clusters
Figure 24. Version 1 of the force sensitive cube
Figure 25. Version 2 of the force sensitive cube
Figure 26. Curve showing the calibration of the force sensitive cube

List of Tables

Table 1. EEG frequency bands
Table 2. An illustration of the relationship between different levels here and in related studies
Table 3. Variables measured by the cube
Table 4. Summary of the data that was gathered for each subject, excluding EEG data
Table 5. Components that were removed following ICA analysis
Table 6. The number of epochs that were accepted after artifact rejection
Table 7. Description of the construction of each of the small and large electrode clusters for measuring the ERP components
Table 8. Description of the construction of each of the small and large electrode clusters chosen for measuring the frequency bands

1 Introduction

Here we introduce this thesis work by presenting the background for why this research is needed. After that we give a short description of the method of this work together with our aims and limitations.
We also present our research questions and how we contribute to the research field. Lastly, we give you as a reader a guide to the structure of this report.

1.1 Background

The loss of a limb would imply a major change in almost any lifestyle, and there are many dedicated scientists, engineers, doctors, therapists etc. around the world working to make this change as manageable as possible. One possibility is to use a prosthesis as a substitute for the lost limb. Most often these prostheses are strapped on with a socket, and electrodes are attached to the remaining part of the limb [1]. The socket often chafes the skin and causes considerable discomfort for the person wearing it [2]. These kinds of prostheses are also unreliable, and patients therefore often choose not to use them [1].

In 1990 the world’s first osseointegrated prosthesis was implemented in Gothenburg, Sweden [3]. This means that the prosthesis is integrated in the bone of the amputee using a titanium rod. Apart from being a more secure and comfortable way of attaching the prosthesis than the conventional socket, this solution also opens up the possibility of connecting it to the muscles and nerves inside the remaining part of the limb. Twenty-nine years after the first osseointegrated prosthesis, in 2019, a Swedish man became the first in the world to receive an osseointegrated hand prosthesis with a neuromuscular interface [4], a so-called e-OPRA [1]. This medical and technical achievement was made possible by the collaboration between Chalmers Biomechatronics and Neurorehabilitation Laboratory (BNL), the Centre for Advanced Reconstruction of Extremities at Sahlgrenska University Hospital and the company Integrum AB as part of their project “Natural Control of Artificial Limb Through an Osseointegrated Implant” [5].

By using the neuromuscular interface, the electrodes can pick up the signals from the muscles and nerves in the remaining part of the limb.
That way, when the amputee executes the movement associated with moving the hand, the hand moves [6]. By introducing sensory feedback in the prosthesis, which reflects when pressure is applied to the surface of the prosthetic hand, the nerves in the limb are stimulated. This way signals can also be sent from the hand to the brain, and the brain can react to the stimuli given by the sensory feedback. Before the implementation of the neuromuscular interface, prosthesis users had to rely only on visual feedback and could not feel how hard they pressed an object or even whether they touched it at all [6]. With the addition of sensory feedback, which gives the user a significantly better experience [4], Chalmers BNL hopes to further improve the quality of life for people with amputated limbs or motor impairments.

Adding sensory feedback to a prosthesis intuitively seems to facilitate performing different tasks, for example picking up a fragile object such as an egg. However, this needs to be investigated formally. One amputee who has received a prosthesis with sensory feedback has tried lifting a fragile object, and broke or dropped the object less frequently with sensory feedback than when that feature was disconnected [7]. It is also possible that sensory feedback increases performance but that the effort increases as well. For this reason, there is a need for a quantitative and objective measurement of the mental effort, or cognitive workload, of performing a task, such as lifting an egg, with and without sensory feedback. Such a method could also be used to evaluate different stimulation paradigms, i.e. different ways to stimulate the nerves.

The first problem that arises when designing a method to measure cognitive workload is that there are currently only four people in Sweden with an implemented e-OPRA system [8]. This makes the sample size insufficient for reaching meaningful conclusions.
Therefore, the conditions of lifting a fragile object with an e-OPRA prosthesis need to be replicated with intact limb subjects as a complement. One way of doing this is to measure the cognitive workload of lifting the fragile object with your hand and compare this to when the sensory feedback is removed by using anesthesia on the hand and digits. The second problem is that research that involves a physical intervention needs to be approved by the Swedish Ethical Review Authority [9]. However, this application normally takes 60 days to be approved [10], which makes this approach unsuitable within the time limit of this project.

When measuring cognitive workload, there are several options when choosing a method. These include pupil size measurements (e.g. [11]–[13]), heart rate variability (e.g. [14]–[16]) and breathing frequency (e.g. [16]). In this work, we use two of the most common techniques to measure cognitive workload: electroencephalography (EEG) using event-related potentials (ERP, e.g. [17]–[19]) and a self-assessment tool called NASA-RTLX (a task load index developed by the National Aeronautics and Space Administration [20], e.g. [15], [21], [22]), to adapt a method and test whether it can be used to assess cognitive workload in this kind of task.

In 2019 a small study was made as part of Linn Berntsson’s master’s thesis at BNL. Her method was designed for testing amputees. It was run with one amputee and compared three conditions: no task, with sensory feedback, and without sensory feedback. In the last two, the subject was instructed to lift a force sensitive cube back and forth over a small barrier. The first two conditions were also run with two intact limb subjects. The cognitive workload was evaluated using a combination of ERP measurements and the NASA-RTLX self-assessment tool. The results showed promise, but the method was tested with too few subjects to be able to draw any real conclusions [23].

1.2 Brief description of this work

This report is part of a master’s thesis at Chalmers University of Technology, where both authors, Fanny Apelgren and Ida Pettersson, have studied Engineering Physics. We then moved on to master’s programmes in Biomedical Engineering and in Learning and Leadership, respectively. The thesis was written at the Department of Communication and Learning in Science (CLS), and the project was executed at Chalmers Biomechatronics and Neurorehabilitation Laboratory (BNL) at the Department of Electrical Engineering, under Associate Professor Max Ortiz Catalán. The project has been supervised by Eva Lendaro (BNL) and Sheila Galt (CLS).

In this study, we measured event-related potentials (ERP) using EEG equipment for three different conditions, each recorded in three blocks. The participants performed a lifting task by moving a force sensitive cube back and forth over a small barrier while they performed a secondary task of listening to and counting sounds, known as an oddball task. The cube lit up when it was pressed too hard, and the weight of the cube could be changed to vary the difficulty of the task between easy and hard. The participants were told that if the cube lights up, this indicates that the cube has been pressed too hard and has “broken”, and that they should try to move the cube as many times as possible without “breaking” it. The third condition consisted of only the secondary task, i.e. counting sounds. This is called the no task condition. The EEG data were studied to examine whether differences in event-related potential components and frequency bands could be seen.

During each condition, the number of times the cube was lifted over the barrier and the number of times that it was “broken” were counted. The participants also reported the number of sounds that they counted in each block and filled out a self-assessment form, the NASA-RTLX [20], to report the effort of each condition.
The number of lifts and “breaks” per minute, together with the difference between presented and counted sounds, was studied as an indication of the subject’s performance. The result of the NASA-RTLX was used to measure perceived effort, and the EEG data served as a quantitative measurement of the cognitive workload.

The different types of data and the experiment procedure used in this study were gathered from previous work on cognitive workload and from recommendations by experienced scientists in the field. The force sensitive cube was designed with a few similar models as inspiration but was adapted to the criteria of this study. It has also been designed with further development in mind, to enable use in future studies.

1.3 Aims and limitations

The present study aims to develop and examine a method that can be used to measure the difference in cognitive workload with and without sensory feedback. The method by Linn Berntsson [23] has served as an inspiration, but we have mainly looked at other studies that have been tested with more subjects to develop our own improved methodology that is also adapted for intact limb subjects.

Since anesthesia cannot be used without ethical approval, we test the method with other conditions that simulate the difference between having and not having sensory feedback. Different levels of difficulty are therefore used as a substitute. The aim is to investigate how the variation in cognitive workload between the different levels of difficulty can be measured. We have also looked for signs of a learning process, and the method has been tested using ten intact limb subjects. If the method can detect differences between different levels of difficulty, it could also be expected to be able to measure the difference between with and without sensory feedback, since these conditions are also believed to differ in difficulty.
Therefore, the goal is for this study to serve as a pilot test in preparation for a future study where this methodology, or an adaptation of it, will be used to investigate the difference in cognitive workload of performing a lifting task with and without sensory feedback, by using anesthesia.

With the limited timeframe of this work, we have done our best to process and analyze all the EEG data. However, there remain other ways to examine the data that have been recorded; these will be discussed further in the section about future work. Among other things, we will not examine the EEG data or the results from NASA-RTLX for different parts of each condition. Signs of a possible learning process within the conditions will only be examined by looking at the factors measuring performance.

1.4 Research questions

1) Will the subjects experience the expected difference in difficulty between the different conditions, on group and/or individual level, as indicated by…
   a) …the perceived effort, given by the scores on the NASA-RTLX?
   b) …the performance, given by number of lifts, success rate and accuracy of the oddball task?
2) Can the proposed method be used to measure differences in cognitive workload, on group and/or individual level, as indicated by…
   a) …event-related potential (ERP) components? And if so, which components?
   b) …frequency bands? And if so, which frequency bands?
3) Which latency windows (for ERP components only) and electrode sites should be used to examine the differences in cognitive workload with ERP components and frequency bands?
4) Can a learning effect be observed during each condition by comparing the performance for each of the three blocks?

1.5 Contribution

The aim of this study is part of a larger goal that ultimately comes down to improving the quality of life of amputees.
By preparing for the future study with anaesthesia, this work is a step towards providing quantitative evidence that adding sensory feedback in artificial limbs does lower the cognitive workload. This knowledge will in turn provide an incentive for the further development of prostheses. In addition to providing a method and a pilot study for this future study, the method might, as mentioned, also be used to evaluate different aspects of the prosthesis design, for example different stimulation paradigms.

This work has also contributed to the total knowledge about ERP experiments and has uncovered several conflicting opinions about the best procedures in the field. We have also found that studies often lack, and would benefit from, motivation and reasoning that explain why certain methods were chosen. Besides the results of this thesis, the collected data could be analysed further, and more aspects could be examined using for example ANOVA statistical analysis, which seems to be the most common procedure in ERP studies (e.g. [17], [18], [24]).

Studies using this or similar methods might also examine the learning process of receiving and learning to use a prosthesis. Even though the neuromuscular interface and the sensory feedback can be shown to decrease the cognitive workload, learning to live with a prosthesis will still demand practice and new strategies. The study of this progress can be an important step in the development of both prosthesis technology and the strategies used to teach someone to use a prosthesis.

1.6 Thesis outline

In the following chapters the thesis work will be described in further detail, starting with the theoretical background that lays the foundation of the work. After that follows the methods section, where different parts of the method, from experiment procedures to data processing and analyses, are discussed. We describe possible approaches, discuss how they have been used in other studies and present what we have chosen to do and why.
The results for the performance, perceived effort and EEG data are presented and thereafter discussed. Lastly, there is a conclusion and ideas for future work.

2 Theory

We will start by introducing the main concept of this work: cognitive workload. We will also give a background of the measurement methods: electroencephalogram (EEG) and event-related potentials (ERP). The latter will also be discussed further in the methods section (section 3). Here we will give a brief introduction to the technique together with how ERP components and frequency bands can be used to assess cognitive workload. We will then conclude the section by discussing how and why EEG measurements can differ between individuals.

2.1 Cognitive workload

We are all aware that some tasks demand more cognitive resources than others. Most people have no trouble walking and talking at the same time, but when you are asked to solve some equations it might be harder to keep up an interesting conversation. This comes back to attention and cognitive workload (also known as mental workload or cognitive load). There are different definitions of these rather familiar concepts, and the relationship between attention and cognitive workload is also a matter of discussion. Rietschel et al. [19] state that “attention refers to the directed allocation of cognitive resources”. Similarly, Kantowitz [25] argues that cognitive workload is a subset of attention. Magill [26] elaborates on this statement by saying that “attention refers to several characteristics associated with perceptual, cognitive, and motor activities” and that “a related view extends the notion of attention to the amount of cognitive effort we put into performing activities”. In this work there is no need to keep these interlaced concepts apart, so attention and cognitive workload will both be used in reference to the cognitive resources demanded by a person to perform a certain activity or task.
To get back to the question of why we can perform some tasks simultaneously but not others, we need to introduce what is known as attentional reserve, or attention capacity. This theory states that we have a certain amount of attention, or cognitive workload, and that this can be split between several things. Each task demands some of the attention from our reserve and leaves the rest. In the example above, walking does not demand a lot of cognitive workload and leaves some attention that you can use, for example, for talking. Meanwhile, solving equations might not leave enough spare attention in the reserve for conversation, and perhaps walking and talking at the same time does not allow you to follow a map to find your way in a new place. That means that cognitive workload has an inverse relationship to the remaining resources of the attentional reserve [27]. When the cognitive workload increases for a task, for example if you try to solve increasingly complex equations, the resources left for other tasks decrease.

Workload and attention seem to be closely related to performance and learning. Kantowitz [25] suggests a model where too low or too high workload leads to lower performance, and this view is supported by Winnie et al. [28], who say that efficient learning happens at the optimal level of cognitive workload. As a further link to learning, Magill [4] suggests that a new task takes a lot of cognitive effort in the beginning, but that learning takes place and thereby the attentional demands decrease with practice. This is known as the practice effect and means that learning of a task can be indicated in different ways: by a decrease of cognitive workload together with a stable level of performance, by an increase of performance with a stable level of cognitive workload, or by the combination of increased performance and decreased workload.
Something else that needs to be considered when looking at attention and cognitive workload is how it is balanced by the demands of the task at hand. If the challenge of the task is too low one will experience boredom, and frustration will emerge if the challenge is too great compared to the skill level. The area in between these two outer limits, where skill and challenge are perfectly matched, is usually called flow. This is the feeling that can make you keep up a task, for example a video game, for a long time. If you are bored or frustrated because the game is too easy or too hard, you are likely to stop playing. It is therefore believed that both these conditions will decrease the attention given to the task [29].

2.2 Electroencephalogram (EEG)

One commonly used way to measure cognitive workload is the electroencephalogram (EEG). EEG is a clinical tool that measures the electrical activity of the cerebral cortex with electrodes attached to the human scalp. The cerebral cortex is the outermost layer of the cerebrum, which is the largest part of the brain, and is divided into a left and a right hemisphere. Each hemisphere is in turn divided into four lobes, the frontal, temporal, parietal and occipital lobes, which are associated with different functions of the human body [30]. The brain and its different regions can be seen in Figure 1.

Figure 1. The brain and its four different lobes: frontal, parietal, occipital and temporal.

The electrodes that are used to measure the electrical activity of the brain usually consist of a metal disk or pellet. They can be attached to the head with stickers, but since the number of electrodes used for a measurement is normally more than 16, they are usually attached to a cap that can be fitted to the subject’s head much more easily. The electrodes pick up electrical activity from the brain in the form of electrical potentials, i.e. the potential for current to flow from one electrode to a ground electrode.
Since the recorded signals are in the range of 0 to 100 µV, they typically need to be amplified by a factor of 1000-100000 before they are further processed [31].

Electrodes are mainly characterized along two dimensions: they can be either wet or dry, and either passive or active [32]. Wet electrodes are generally Ag/AgCl electrodes, and one needs to put a conductive gel between the scalp and the electrode to get a good and stable electrical connection. This also helps lower the impedance of the electrode-scalp connection. A lower impedance introduces less noise, which is important for the quality of the measurement [31]. Dry electrodes instead consist of a single metal, often stainless steel, that acts as a conductor, and the electrode is put directly on the scalp. The difference between passive and active electrodes is that active electrodes include a pre-amplification module after the conductive material. This means that the signal can be amplified before additional noise is introduced as the signal travels from the electrode to the system that measures the signal, which increases the signal-to-noise ratio. For passive electrodes there is no pre-amplification, which means that noise arising as the signals travel from the electrode to the measuring system will be amplified as much as the EEG signals. The different characteristics can be combined; for example, one can use active, wet electrodes [32].

2.2.1 Referencing

When creating an EEG amplifier, the ground electrode must be connected to a ground circuit for the EEG amplifier to work. This ground circuit is typically connected to other parts of the amplifier, which means that electrical noise is introduced at the site of the ground electrode. This means that there is noise present in the signal from the ground electrode that is not present in the signal from the other electrodes. To get rid of this noise, EEG recording systems use differential amplifiers.
With the differential amplifier, a reference electrode is used together with the operating electrode and the ground electrode to cancel out the noise. The differential amplifier records the potential between the operating electrode (O) and the ground electrode (G), as well as the potential between the reference electrode (R) and G. The amplifier then outputs the difference between these potentials, (O - G) - (R - G) = O - R, and since the noise from the ground circuit is the same for both O - G and R - G, any noise generated at G will be eliminated in O - R. In other words, to get a single channel of EEG all three electrodes (operating, reference and ground) are needed [31].

2.2.2 Electrode positioning

To get useful data that are possible to analyse and comparable to other studies, it is important to position the electrodes correctly on the head. The most commonly used system to define the position of the electrodes is the 10-20 system [31]. Originally, this system used 21 electrodes, where two of them were placed on the earlobes and the rest were placed according to measurements of specific landmarks on the scalp. The landmarks used are the nasion (just above the nose, between the eyes), the inion (the indent in the back of the head) and the left and right pre-auricular points (right in front of each ear), see Figure 2. An equator through the nasion, inion and the left and right pre-auricular points, together with a line between the nasion and inion and a line between the left and right pre-auricular points, defines the measurements used to place the electrodes. The equator and the lines are then divided into sections with the first mark at 10 % and the following marks at 20 % intervals, resulting in the electrode positioning in Figure 2a. An extended 10-20 system with 128 electrodes can also be used, with marks at every 10 %, see Figure 2b [33]. Here we can also see that the electrodes are marked with letters and numbers. This is a way to indicate the location of the electrode.
The letter gives the scalp region (F: frontal, T: temporal, C: central, P: parietal, O: occipital). The numbers indicate the distance from the center, where larger numbers are further from the central line. Even numbers are used for the right hemisphere and odd numbers for the left. The letter “z” stands for the number zero and is used instead of the number “0” to avoid confusion with the letter “O”.

Figure 2. Electrode positioning according to the normal and extended 10-20 system. The letters stand for scalp region (F: frontal, T: temporal, C: central, P: parietal, O: occipital). The numbers represent the distance from the center, with even numbers to the right and odd numbers to the left. “z” stands for zero and is used instead of the number to avoid confusion with the letter “O”. a) 10-20 system b) Extended 10-20 system

2.2.3 Artifacts and Noise

The electrodes do not only detect signals from activity in the brain, but also pick up other, non-neural, signals. Every time the subject moves, clenches their jaw, frowns, moves their eyes, blinks or does something similar, this gives an electrical signal that can be picked up by the electrodes. Electrical signals from muscle movements are called electromyographic (EMG) signals. External sources, such as electrical equipment, can also emit electrical signals that can be picked up by the EEG electrodes. Another source of disturbance is the equipment itself, for example if the connection between an electrode and the scalp is unstable. All of these non-EEG signals are called artifacts [31].

Some of the muscle movements, especially those of the eyes since they are located close to the electrodes, can cause big disturbances in the recorded signal. Others, like electrical equipment or some muscle activations, are smaller and more regular. Both of these kinds of artifacts need to be handled to be able to see the subtle changes of the small, often below 100 µV, neural activity. How this can be done is discussed in section 3.4.5.
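As a small illustration of the labelling convention described in section 2.2.2, the following Python sketch decodes a basic electrode label into scalp region and hemisphere. It is a minimal sketch only: the function and dictionary names are hypothetical, and it deliberately handles just the single-letter labels listed above (for example C3 or Pz), not the combined labels of the extended system.

```python
import re

# Region letters as listed in section 2.2.2; the dictionary name is hypothetical.
REGIONS = {
    "F": "frontal",
    "T": "temporal",
    "C": "central",
    "P": "parietal",
    "O": "occipital",
}


def describe_electrode(label):
    """Split a basic 10-20 label such as 'C3' or 'Pz' into region and side."""
    # One region letter followed by either 'z' (midline) or a positive number.
    match = re.fullmatch(r"([FTCPO])(z|[1-9]\d*)", label)
    if match is None:
        raise ValueError(f"unsupported electrode label: {label!r}")
    letter, position = match.groups()
    if position == "z":
        # 'z' stands for zero, i.e. the electrode sits on the midline.
        side, distance = "midline", 0
    else:
        distance = int(position)
        # Even numbers are on the right hemisphere, odd numbers on the left.
        side = "right" if distance % 2 == 0 else "left"
    return {"region": REGIONS[letter], "side": side, "distance_index": distance}


for name in ("C3", "P4", "Oz"):
    print(name, describe_electrode(name))
```

The number is returned as a plain index, since the convention only says that larger numbers lie further from the midline, not how far in absolute terms.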
2.3 Event-Related potentials

Here we will briefly introduce the event-related potential (ERP) technique, which is the cornerstone of the method of this thesis. The different aspects of this technique will be discussed in further detail in section 3. We will also present the concept of ERP components and frequency bands as a way to measure cognitive workload. Lastly, we will discuss different reasons why measurements of cognitive workload can differ between individuals.

2.3.1 Basic concept

A common technique when measuring EEG is to use event-related potentials (ERPs). This is a way to single out certain activities in the brain. Raw EEG data is often hard to use, since it is a mix of all the neural activities in the brain. Even if you are told to focus on a certain task, your mind easily wanders. The ERP technique was first used in 1977 by Wickens et al. [34] and builds upon the idea that a certain stimulus, or event, can trigger a specific brain activity. The ERPs are measured by presenting some kind of stimuli, for example sounds or flashes of light, repeatedly while measuring EEG. Each stimulus is time-locked to the EEG data and marked by a line at the appropriate time. Later, a short section of EEG data, a so-called epoch, is extracted around every stimulus. The epoch begins a short time before the stimulus and ends a certain time after the stimulus. It is common to use 100-200 milliseconds pre-stimulus and 800-1000 milliseconds post-stimulus. The idea is that noise that is unrelated to the stimulus will cancel out when many epochs are averaged together, leaving the EEG signals that are related to the stimuli.

The book “An Introduction to the Event-Related Potential Technique” by Steven J. Luck [31] is a commonly used reference in this work. This, together with articles using the ERP technique, has helped us make all of the decisions involved in conducting an ERP experiment.

2.3.2 ERP Components

Luck [29, p.
68] gives the following definition of ERP components: “An ERP component can be operationally defined as a set of voltage changes that are consistent with a single neural generator site and that systematically vary in amplitude across conditions, time, individuals, and so forth. That is, an ERP component is a source of systematic and reliable variability in an ERP data set.” These voltage changes can then be picked up by the EEG electrodes, with different weights depending on the relative location of the source and each electrode.

Here it is important to note the difference between ERP components and ERP peaks. The peaks in the ERP also show voltage changes, but these changes do not necessarily reflect changes in a given component. For example, if the voltage of a positive peak is reduced it might reflect a reduction of an underlying positive component, but it might also reflect an increase of a negative component at the same latency, i.e. at the same time relative to the stimulus. There are some techniques to extract the components from the data, a common one being independent component analysis (ICA), which will be used and discussed in this work (see section 3.4.5.2). However, none of these methods can be completely trusted, and they should be used with caution [31]. So, a single peak can never be assumed to represent a single component. Nevertheless, one can look at many different electrodes and study the latency of a peak. Since the time for a signal to travel the different distances from the source to each electrode can be closely estimated to be equal, the timing will coincide for one component at different electrodes.

To avoid having to investigate all electrode sites (since they can be many), and still be able to draw conclusions from an ERP waveform, Luck recommends using the components that have been shown to be useful in earlier studies, either from other similar experiments or, if you are first in your field, from other fields [31].
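The epoching and averaging idea introduced in section 2.3.1 can be illustrated with a short, self-contained Python sketch on synthetic data. Everything numeric here is an assumption made for illustration (the 250 Hz sampling rate, the noise level, the 5 µV bump peaking 300 ms after each stimulus, and the number of stimuli), and the function names are hypothetical; it is not the processing pipeline used in this thesis.

```python
import math
import random


def extract_epochs(signal, event_samples, pre, post):
    """Cut one epoch per event: `pre` samples before and `post` samples after onset."""
    epochs = []
    for onset in event_samples:
        start, stop = onset - pre, onset + post
        # Skip events whose epoch would run outside the recording.
        if start >= 0 and stop <= len(signal):
            epochs.append(signal[start:stop])
    return epochs


def average_epochs(epochs):
    """Average the epochs sample by sample to estimate the ERP waveform."""
    n = len(epochs)
    return [sum(values) / n for values in zip(*epochs)]


random.seed(0)
fs = 250                                   # assumed sampling rate in Hz
pre, post = int(0.2 * fs), int(0.8 * fs)   # 200 ms pre- and 800 ms post-stimulus
events = list(range(500, 500 + 200 * 300, 300))  # 200 stimulus onsets (sample indices)
n_samples = events[-1] + post + 1

# Background activity unrelated to the stimuli, modelled as Gaussian noise (µV).
signal = [random.gauss(0.0, 10.0) for _ in range(n_samples)]

# Add a small stimulus-locked positive deflection peaking 300 ms after onset.
for onset in events:
    for k in range(post):
        t = k / fs
        signal[onset + k] += 5.0 * math.exp(-((t - 0.3) ** 2) / (2 * 0.05 ** 2))

erp = average_epochs(extract_epochs(signal, events, pre, post))
peak_index = max(range(len(erp)), key=lambda i: erp[i])
peak_latency_ms = 1000.0 * (peak_index - pre) / fs
print(f"Largest positive peak at about {peak_latency_ms:.0f} ms post-stimulus")
```

In a single epoch the 5 µV deflection is buried in noise with a standard deviation of 10 µV, but averaging 200 epochs reduces the noise by roughly the square root of 200, so the stimulus-locked deflection becomes visible in the average.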
As a way to facilitate discussions about ERP and comparisons between studies, there is a conventional method for naming the different peaks of an averaged ERP waveform. These names start with a letter, either N or P, to denote whether a peak is negative or positive. After that follows a number, describing one of two things. In the first convention the number describes the ordinal position of the specific peak, i.e. the first positive peak would be called P1 and the third negative peak N3. This convention is depicted in Figure 3. However, this plot uses the old convention of plotting ERP waveforms with the negative axis directed upwards. In this work we will use the same naming convention but with the positive axis upwards, as is common in most modern ERP studies [31]. The other possible way is to name the peak according to latency (i.e. the time after stimulus onset), so that a positive peak occurring around 300 ms after the stimulus onset would be called P300. In some cases, peaks are also named to describe their function or location, such as the error related negativity (when the subject discovers that he or she did something wrong) or the late positive potential [31].

Figure 3. An example of an ERP waveform where each peak is named after the convention used in this thesis. The letter (P or N) stands for positive or negative (although note that negative is upwards in this plot) and the number stands for the peak’s ordinal position.

As mentioned, each component can be referred to either by the ordinal position of the peak (e.g. N1) or by the latency (e.g. P200). The latter describes the latency at which the component is usually found, but this varies between different experiments and therefore this notation can be confusing. Luckily, the latency is often about 100 times the ordinal position, so that P1~P100, N2~N200 and so on [31].
However, some old conventions linger and P3 is still often referred to as P300 because it was first found about 300 ms post-stimulus, even though it more commonly arises later than that [31]. The latency also tells us something about the stage of the stimulus processing by the brain. That means that earlier components arise from perceptual processing in the brain while later components reflect later stages of the reaction, including evaluation of the stimulus [17].

Here we describe some components that have been shown to be an indication of cognitive workload and that were the most common in our literature study. We will use the naming convention based on ordinal position.

2.3.2.1 N1

The N1 component is specific to auditory stimuli and is characterized as one of the initial components in an auditory ERP, called long-latency auditory ERP components. It has been suggested that the N1 component signals the detection of acoustic change in the environment. The single-peak N1 component is evoked by short transient stimuli or by onsets of noise and has been shown to consist of three temporally overlapping constituents. The dominant contribution to the N1 component is most prominent at the fronto-central electrodes [35].

The N1 component has been linked to cognitive workload in several studies [18], [36]–[39]. One of the examined studies could show that N1 varied between some of the levels but not all [24], and two failed to show significance for N1 [17], [19]. In these studies, N1 was found between 75 and 180 ms post-stimulus, where the studies that were successful in linking N1 to cognitive workload seem to have found it in the later region of that interval, see Figure 4.

2.3.2.2 N2

The N2 component is known for containing several different subcomponents: N2a, N2b and N2c. However, the basic N2 component (that will be discussed here) is said to be elicited by a repetitive, nontarget stimulus and it gets a larger amplitude if the stimulus is novel (not repeated).
Depending on whether the stimulus is task-relevant or task-irrelevant, the N2 component appears with different latency, with later latency if the stimulus is task-relevant (the difference between task-relevant and task-irrelevant stimuli will be discussed more in 3.2). Also, if the stimulus is auditory a larger effect is seen at the central sites, and if the stimulus is visual the effect shifts to be larger at the posterior sites instead [31].

The N2 component has been examined in two of the studies that we have looked at [36], [37], and both showed that it successfully assessed the cognitive workload. They found N2 in the interval 200 to 400 ms post-stimulus, see Figure 4.

2.3.2.3 P2

The P2 component is most prominent at the frontal and central scalp sites and is typically larger for stimuli containing simple, infrequent target features. At posterior sites, the P2 component often overlaps with N1, N2 and P3 and is therefore hard to distinguish there [31].

Two of the studies in our literature study could show a significant correlation between P2 and different levels [17], [18], but three others could not verify this correlation [19], [24], [37]. P2 was found between 166 and 270 ms post-stimulus, see Figure 4.

Figure 4. A sketch of the latency ranges where the different ERP components have been found according to our literature study. Each line is marked with the reference for the study it is taken from. Green lines indicate that there has been a significant difference between the different levels of difficulty. Orange means that a difference could only be seen between some of the tested levels, but not all. Grey lines mean that no significant differences could be shown. Studies that have not specified the latency range are marked with “?”.

2.3.2.4 P3

The P3 component is the most examined ERP component when it comes to cognitive workload [40], and that shows in our literature study.
There are also several other components that are closely related to P3, and sometimes hard to differentiate from it. The ones examined in the studies we have read are novelty P3, P3a, P3b and early and late P3a.

The P3 component is typically evoked by rare task-relevant events and it is said to reflect an updating of the context information, which is often interpreted as an update of the working memory. There is also clear evidence that the amplitude of the P3 component can be influenced by the amount of attention allocated to a stimulus, which has been most clearly observed in dual-task experiments where the subject is to perform two tasks at the same time. The latency of the P3 component changes over the scalp and is shorter over the frontal areas and longer over the parietal areas. It also differs between individuals depending on how rapidly the subject can allocate their attentional resources, such that the latency is shorter for subjects with higher mental speed [35].

As mentioned above, the P3 component can be divided into several subcomponents: mainly the P3a, P3b and novelty P3 components. These subcomponents are typically elicited by different task conditions and can be recognized by their different topographic distributions. The P3a subcomponent has a centro-parietal maximum amplitude distribution and is elicited by rare tones presented in a series of frequent tones without a task. If novel distracters (such as a dog barking) are used in a sequence of frequent tones, a fronto-central P3 potential is elicited, which is called the novelty P3. The P3b subcomponent is elicited by task-relevant stimuli and has a parietal maximum amplitude distribution. Often, P3b and the classic P300 (P3) are said to be the same component. It has also been found that the novelty P3 differs from the classic P300 component and that the P3a and novelty P3 are most likely variants of the same ERP that varies in scalp topography depending on attentional and task demands.
[35]

Many studies have linked the P3 component and its relatives to cognitive workload [17], [18], [34], [37], [39]–[42]. One study failed to show significance for P3 [29] and one could only show significance between some of the levels [36]. In the successful cases, P3 has been found in the interval between 270 and 517 ms post-stimulus, and more commonly in the earlier part of that interval, see Figure 5.

The related novelty P3 has also proven successful in assessing cognitive workload in several studies [19], [24], [43]. One study has only shown a change of amplitude between some of the levels examined [44]. The novelty P3 component has been observed between 250 and 332 ms post-stimulus, see Figure 5.

Lastly, some studies have linked P3a to cognitive workload [38], [45], where one split the component into early and late P3a. Another study saw no correlation between different levels and either P3a or P3b [43]. The components were found somewhere in the range between 210 and 405 ms post-stimulus, see Figure 5.

2.3.2.5 LPP

The late positive potential (LPP) is commonly identified as a midline centro-parietal ERP with a strong connection to emotional stimuli such as pleasant and unpleasant pictures. It becomes evident at 300 ms and can therefore be mistaken for the P3 component, but the LPP component often continues for latencies up to 2000 ms, even though it is maximal in the latency range of 300-1000 ms. LPP has also been shown to indicate reaction time to a stimulus, in that the LPP amplitude increases when the reaction time increases [35].

The LPP component has been shown to be an indicator of cognitive workload [17], [18], [21]. The findings have been within the interval 400 to 610 ms post-stimulus, but two of these three studies found LPP close to the end of this interval, see Figure 5.

Figure 5.
Like the previous figure, this is a sketch of the latencies where each component has been found, and each study is marked with its reference number. This figure uses the same color coding as the previous one (green: significance, orange: partial significance and grey: no significance), but here darker colors are used to indicate different versions of P3. This is also indicated by letters, where “a” is P3a, “b” is P3b, “N” is Novelty P3, “ea” is early P3a and “la” is late P3a. A dot indicates that no interval was given, only the latency of the peak. As before, “?” denotes studies where the latency has not been specified.

2.3.3 Frequency bands

The measured EEG signals often have an oscillatory, repetitive behaviour, and therefore the collective electrical activity of the cerebral cortex is often called a rhythm. The EEG rhythms differ between individuals and depend on things like the mental state of the subject, for example whether they are awake or sleeping. Since the electrical activity arises from the activation of neurons in the brain, the rhythms can have different frequencies depending on how synchronous the activated neurons are. The frequency range of the rhythms is approximately between 0.5 and 30-40 Hz and is often divided into five frequency bands: Delta, Theta, Alpha, Beta and Gamma [30]. The Alpha band is also sometimes subdivided into Low- and High-Alpha. The ranges of each band differ slightly between different studies. However, the differences in how the frequency bands are defined are relatively small (around 1 Hz). In this work we will therefore discuss previous findings about a certain frequency band, such as Alpha, without considering the fact that the studies have used slightly different definitions of Alpha. We have decided to use the same ranges as were used by Rietschel et al. [46], which are presented in Table 1.
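To make the idea of band-limited activity concrete, the following Python sketch sums spectral power within each frequency band for a synthetic signal. It is a rough illustration under stated assumptions rather than the analysis used in this thesis: the band edges follow the ranges adopted from Rietschel et al. [46], with a lower Delta edge of 0.5 Hz and an upper Gamma edge of 40 Hz assumed from the approximate range given above; the sampling rate and test signal are invented; and a plain discrete Fourier transform replaces the windowed spectral estimates normally used in practice. The function names are hypothetical.

```python
import cmath
import math

# Band edges in Hz; the 0.5 Hz and 40 Hz outer limits are assumptions.
BANDS = {
    "delta": (0.5, 3.0),
    "theta": (3.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
    "gamma": (30.0, 40.0),
}


def power_spectrum(signal, fs):
    """Return (frequencies, power) from a plain discrete Fourier transform."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the constant offset
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(centered)
        )
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2 / n)
    return freqs, power


def band_power(signal, fs, bands=BANDS):
    """Sum the spectral power in each half-open band [low, high)."""
    freqs, power = power_spectrum(signal, fs)
    return {
        name: sum(p for f, p in zip(freqs, power) if low <= f < high)
        for name, (low, high) in bands.items()
    }


fs = 128  # assumed sampling rate in Hz
# Two seconds of synthetic data: a strong 10 Hz and a weaker 6 Hz oscillation.
signal = [
    20.0 * math.sin(2 * math.pi * 10 * i / fs)
    + 5.0 * math.sin(2 * math.pi * 6 * i / fs)
    for i in range(2 * fs)
]
powers = band_power(signal, fs)
print(max(powers, key=powers.get))  # band holding the most power
```

The power values are in arbitrary units, so only comparisons between bands, or between conditions for the same band, are meaningful. A ratio such as Theta/Alpha can be formed directly from the returned dictionary.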
Now we will present each of these frequency bands and their connection to cognitive workload, as shown by other studies that are part of the literature study of this work. We also describe the quotient Theta/Alpha.

Table 1. EEG frequency bands [46].

EEG frequency band      Range
Delta rhythm            <3 Hz
Theta rhythm            3-8 Hz
Alpha rhythm            8-13 Hz
  Low-Alpha             8-10 Hz
  High-Alpha            10-13 Hz
Beta rhythm             13-30 Hz
Gamma rhythm            >30 Hz

2.3.3.1 Delta

The Delta rhythm has a large amplitude and is mostly present during deep sleep. In healthy adults it is normally not observed in the awake state; when it is, it is indicative of cerebral damage or brain disease [30]. It has also been shown that Delta rhythms are involved in motivational processes such as the necessity to satisfy basic biological needs [47].

Our literature study has shown that there seems to be no significant correlation between the Delta frequency band and cognitive workload. We found two studies that measured Delta in tasks of varying difficulty, but neither saw any significant results [46], [48].

2.3.3.2 Theta

The Theta rhythm mostly occurs during drowsiness and certain stages of sleep [31], but it has also been shown to correlate with a variety of behavioural, cognitive and emotional variables. The main domains seem to be memory and emotional regulation, but there are also indications that Theta activity occurs when performance of a learned task is increasing most rapidly and that it declines as tasks become familiar [47]. Especially at frontal scalp sites, Theta activity can be facilitated by emotions, focused concentration and mental tasks [49], meaning that it is expected to increase with increasing workload.

Theta is, together with Alpha (described below), the frequency band that has been shown to relate most to cognitive workload [40]. Several studies have shown that Theta can show the difference in cognitive workload between different levels [11], [21], [40], [50].
However, our literature study has also shown that several studies have failed to show this correlation [14], [29], [45], [46], [48], and a few studies have seen statistical significance for Theta between some levels, but not between all [24], [44]. This can for example mean that there is a difference between the easy condition and the medium and hard conditions, but that no difference can be seen between the two latter conditions.

2.3.3.3 Alpha

The Alpha rhythm occurs during wakefulness over the posterior regions of the head and normally has a higher amplitude over the occipital areas. It is typically characterized by rounded or sinusoidal waveforms. The amplitude varies between individuals, and within a given individual from time to time, but is normally below 50 µV in adults. The rhythm is commonly blocked or attenuated by attention and mental effort, especially visual attention, and is most prominent when the eyes are closed [51]. The amplitude of the Alpha frequency band is therefore expected to decrease with increasing workload. As mentioned, Alpha and Theta have been shown to indicate cognitive workload [40]. Alpha is also the most studied frequency band in the literature reviewed for this work, sometimes split into the sub-bands Low- and High-Alpha. Several studies have seen a significant difference in Alpha between different levels of difficulty [11], [14], [21], [29], [40]. One study has, however, failed to show a significant effect [50]. When comparing High- and Low-Alpha, the upper frequency range often yields significance [24], [44], [46], [48], while the lower range often only shows a difference between some of the levels [24], [44].

2.3.3.4 Beta

Activity recognized as Beta rhythm is mainly found over the frontal and central regions of the head and is present in almost every healthy adult. The amplitude does not normally exceed 30 µV, and the rhythm can be blocked by motor activity and tactile stimulation [51].
Beta activity normally increases with drowsiness and light sleep, and also with mental activation [49]. In our literature study, there has been little evidence of a correlation between Beta and cognitive workload. Most studies that have examined Beta have not been able to show a significant effect [14], [29], [46], [48], [50], while one has seen a difference only between some of the conditions [24].

2.3.3.5 Gamma

The Gamma rhythm consists of high-frequency oscillations and is said to be related to a state of active information processing [30]. Induced Gamma activity has been reported during sensory, cognitive and motor processing and may be related to sensory binding as well as sensorimotor integration [51]. One of the studies that we have read found a significant difference between different levels for the Gamma frequency band [46]. One study saw effects between some of the conditions but not all [24]. However, two studies failed to show a correlation between Gamma and cognitive workload [50], [52].

2.3.3.6 Theta/Alpha

Besides the frequency bands, the quotient Theta/Alpha is also commonly used when assessing cognitive workload. There are several ways of calculating this ratio, often by using either frontal or parietal (see Figure 1) electrodes when measuring Alpha and Theta. Frontal Theta/parietal Alpha [44] and frontal Theta/frontal Alpha [24] have both been used to indicate cognitive workload. Another study, by Gentili et al. [45], showed that the Theta/Alpha ratio could be calculated from electrodes in the same area and still show significantly higher values for a higher level of difficulty.

2.3.4 Differences between individuals

As mentioned, cognitive workload is here defined as the cognitive resources demanded by a person to perform a certain activity or task. This means that the cognitive workload is not only correlated to the difficulty of the task, but also to the abilities of the individual.
When the task demands are close to exceeding a person’s ability, the workload is high, and the limits to boredom and frustration depend on both the task and the individual skill level. Apart from this, ERP measurements also vary between individuals. Differences between subjects can reflect biological differences such as skull thickness or cortical folding patterns [31]. Other factors that can affect the ERPs when measuring cognitive workload are age, lack of sleep, time of day, time since the last meal, time of year and geographic location (mainly because of differences in daylight), exercise (mainly affects older people), and the intake of common drugs such as caffeine, nicotine and alcohol [53].

3 Method

Here we present how we have constructed our method, by describing the general theory for its different parts and discussing the choices made in other studies.

3.1 Grasping task

As mentioned in section 1.1, this work is a pilot study in preparation for measuring cognitive workload on intact limb subjects performing a grasping task with and without sensory feedback, where the latter condition will be achieved by using anesthesia on their hands and digits. Further, the conditions of this study are meant to mimic the conditions with and without sensory feedback in prosthetic hands. An illustration of the connection between the easy and hard conditions of the different studies can be found in Table 2.

Table 2. A schematic illustration of how the different levels are meant to be represented in our study and the future studies with anesthesia and prosthetic hands, respectively.

            OUR STUDY      STUDY WITH ANESTHESIA   PROSTHETIC STUDY
EASY TASK   Lighter cube   Without anesthesia      With sensory feedback
HARD TASK   Heavier cube   With anesthesia         Without sensory feedback

So, the main task to be examined in this work is a grasping task.
This is performed by lifting a force sensitive cube (described further in section 3.7) back and forth over a barrier as many times as possible without pressing it too hard, i.e. breaking it. If the cube is pressed too hard, a red LED bar lights up. The weight of the cube can be increased by adding extra weights to it, which makes it harder to lift the cube without pressing it too hard. In this manner there are two difficulty levels for the grasping task: easy and hard. These are meant to represent the conditions with and without sensory feedback that will be used in the future study with anaesthesia that this work is in preparation for. Pictures of the experimental setup and the force sensitive cube can be seen in Figure 6. The force sensitive cube and its design process are described in section 3.7. The grasping task is comparable to the modified Box and Blocks test, i.e. the Virtual Eggs Test developed by Clemente et al. [54].

Figure 6. The experimental setup for the grasping task, with a closeup of the force sensitive cube. a) The setup for the grasping task, with two boards separated by a barrier. The cube was to be lifted back and forth over the barrier. b) The force sensitive cube, with force sensors, LED bar and weights. The weights could be removed to reduce the difficulty of the grasping task.

3.2 Dual-task paradigms and oddball tasks

Studies using the ERP (event-related potential) technique are commonly performed by measuring ERPs of a secondary task that is performed simultaneously with a primary task of interest. This design is needed when it is not possible to directly assess the workload of the primary task, for example if there are no clear stimuli. The subjects are to perform the primary task as well as possible and use their remaining cognitive resources for the secondary task, doing it as well as possible under the circumstances.
The secondary task in an ERP study can for example be to see flashes of light while performing a primary task of solving equations. Using ERP, the brain potentials related to the stimuli are measured. The brain’s responses to the secondary task stimuli are expected to decrease as the difficulty of the primary task increases, which shows as a decrease in the amplitude of the different ERP components presented in section 2.3.2. The ERP technique thereby uses the inverse relationship between cognitive workload and attentional reserve, mentioned in section 2.1. This use of two simultaneous tasks, a primary and a secondary task, to measure the cognitive workload of the primary task is called a dual-task paradigm. In some studies, the subjects are required to react to the stimuli in some way, for example by pressing a button or by silently counting, while other studies tell the subjects to ignore the stimuli.

A common dual-task paradigm is the so-called oddball task. Here, the stimuli contain common non-targets and rare targets, differentiated by for example pitch or colour. The common non-targets usually represent 80 % of the stimuli. The ERPs are measured around the rare targets, since several ERP components are larger for a stimulus from a rare category than from a common one. The stimuli are usually either visual, auditory or somatosensory.

The dual-task paradigm is widely used, but also questioned. One argument is that adding a secondary task will affect the performance of the primary task, and thereby change the variable under investigation [18]. To deal with this problem, it is often recommended to use task-irrelevant stimuli, i.e. stimuli that the subject should ignore [25]. However, Castellar et al. [55] examined this and could not find evidence that the primary task, in this case a game, was affected by the secondary task of reacting to target sounds as fast as possible by pressing a button.
When applying a dual-task paradigm, there are many different factors to consider. These include deciding whether the subjects should ignore or react to the stimuli, what type of stimuli to use and the timing of the stimuli. We will continue by discussing these options.

3.2.1 The choice of stimuli

As mentioned, stimuli can be either visual, auditory or somatosensory. Since the primary task of this work (lifting a force sensitive cube) involves using visual and sensory feedback, we have chosen to use auditory stimuli for the secondary oddball task, so that the secondary task interferes with the primary task as little as possible. When using auditory stimuli, an approach that has become common is the novelty oddball task, which includes novel, complex sounds (e.g. [18], [19], [39], [43], [45], [55]). This means a collection of complex sounds (e.g. a dog barking or a car honking) that are not repeated within each subject. When comparing different kinds of auditory stimuli, Dyke et al. [38] showed that complex sounds were better for measuring cognitive workload than simple sounds (e.g. a tone of a certain frequency). They could, however, not see any difference between repeated and non-repeated sounds.

In novelty oddball studies, it is common to use 80 % common, simple sounds (e.g. a low pitch tone), 10 % rare, simple sounds (e.g. a high pitch tone) and 10 % novel, complex sounds (e.g. a person coughing or a mosquito buzzing) (e.g. [39], [43], [55]). The ERPs are usually measured around the novel, complex sounds since this is a better way to elicit ERP components [38]. These sounds are most often made task-irrelevant, either by having the subjects react to the rare, simple sounds by pressing a button or counting them (e.g. [39], [55]), or by asking the subjects to ignore all sounds and only focus on the primary task (e.g. [14], [18], [19], [45]). This means that the novel, complex sounds are used for the ERP measurement but are not relevant for any of the tasks.
However, according to a study by Debener et al. [43], task irrelevance is not necessary when applying the novelty oddball task. They also found that the novelty P3 was actually larger for task-relevant sounds. That is to say, it was more effective to let the subjects count the novel sounds, which were also used for the ERPs, than to count the rare, task-irrelevant sounds. This is evidence against the common view. All other studies that we have looked at, both before and after Debener’s finding, use task-irrelevant stimuli when measuring ERPs.

For this study, we apply the novelty oddball task using 80 % common, simple sounds, 10 % rare, simple sounds and 10 % novel, complex sounds, as described above. Henceforth, these sounds will be referred to as common, rare and novel, respectively. We chose to use 500 Hz tones as common sounds and 1500 Hz tones as rare sounds, since these represented the broadest range of frequencies that was deemed comfortable for the subjects to listen to. The novel, complex sounds were randomly chosen from 93 different audio clips and were only played once during each condition. The ERPs were measured using the novel sounds, as recommended above [38], and the subjects were asked to count the rare sounds. This choice goes against the findings of Debener et al. [43], since we preferred the common approach of using task-irrelevant stimuli to measure ERPs. This makes it easier to compare the results of this study to others.

3.2.2 Secondary task: Counting, reacting or ignoring?

If the subjects are instructed to count sounds, this can also be used as an indication of cognitive workload. Since a harder task should decrease the attention available for counting, more errors should be made with a harder task than with an easy one. However, Luck [31] raises an issue with this method.
Since errors could arise from missing a target, from mistaking a non-target for a target or from losing count, it is impossible to tell whether a correct number means that no error was made. For example, a combination of the first two errors would result in a correct number of counted targets. For this reason, the alternative of pressing a button in reaction to a target can be superior, since it allows both misses and false presses to be considered. A problem with this approach is that it requires a movement that will result in artifacts. We decided to ask our subjects to count the rare, simple sounds of the oddball task. This choice was made to avoid subjects getting bored, since the primary task is very repetitive, and boredom might affect the willingness to focus attention on a task, as discussed in section 2.1.

3.2.3 Stimuli timing

If the interstimulus interval, i.e. the time between two stimuli, is too short, there is a risk that the epochs will overlap, meaning that potentials resulting from one sound might affect the next one. On the one hand, the interval should be as short as possible to maximize the number of epochs to draw data from. On the other hand, the ERP components are larger the longer the interstimulus interval is, and if the stimuli are played too often it might be tiring for the subject. Also, if the interval is too long, so-called stimulus-preceding negativity can occur, which means that the subject is anticipating a sound. This can be confusing when analysing the results. Luck recommends an interstimulus interval of around 1000 ms with a temporal jitter of at least ±100 ms, since varying the interval also prevents stimulus-preceding negativity. This means that the interstimulus interval could vary randomly between 900 and 1100 ms, according to Luck.
Also, since the epochs around each novel sound will later be averaged together (for more details, see 3.4.6), varying the interstimulus interval helps to prevent regular noise, such as alpha waves, from showing in the averaged ERP waveforms. When it comes to the duration of stimuli, Luck recommends 50-100 ms for simple sounds and 300-400 ms for novel sounds, with 5-20 ms rise and fall time [31]. Most of the earlier studies that have applied the novelty oddball task (e.g. [13], [16]–[18], [23], [37], [42], [44], [54]) have, however, used a longer duration for the simple sounds. They have all used the same sounds, originally from [56], and to facilitate comparing our results to theirs we have decided to use the same source for our sounds. We have played the sounds in random order with an interstimulus interval varying between 960 and 1360 ms, as this follows Luck’s recommendation and has been used by for example Debener [43] and Castellar Núñez [55]. The novel sounds are from the work of Fabiani et al. [56] and their duration is between 159 and 399 ms (mean 335.43 ms). The pure tones from the same source were 336 ms long, and as mentioned we chose to use 500 Hz for the frequent sounds and 1500 Hz for the rare. The rise and fall times are 10 ms for the pure tones, but vary for the novel sounds, depending on their properties.

3.3 Experimental procedure

This thesis mainly resulted in a method to measure cognitive workload that consists of the parts described above: a dual-task paradigm consisting of an oddball task and a grasping task, where the cognitive workload is evaluated through EEG measurements and the self-assessment questionnaire NASA-RTLX. To test whether the method could be used to measure cognitive workload, we performed a pilot study including 10 subjects.

3.3.1 Participants

A good average ERP waveform can be obtained either by using long trials or many trials.
However, the long preparation time for each subject (about one hour with the subject in our experiment) makes it unrealistic to examine many subjects. Normally, a study uses about 10-20 subjects for ERP measurements [31]. We have measured ERPs for 10 subjects. The 10 participants (six females and four males) were students at Chalmers University of Technology aged 24 to 28, with mean age 25.5 and standard deviation 1.43. All had normal or corrected-to-normal vision and hearing. The subjects’ handedness was evaluated through the Waterloo Handedness Questionnaire, which is made up of a series of questions about which hand one would use for performing certain tasks. The options were left always, left usually, both equally often, right usually and right always. The score is computed by assigning the options the values -2, -1, 0, 1 and 2, respectively, and adding them together. The score ranges from -72 to +72, where the extremes indicate a strong preference for either the left (-72) or the right (+72) hand. According to this, all subjects were right-handed, with scores in the range 41 to 56, a mean score of 49.2 and a standard deviation of 4.38. The subjects also read and signed an informed consent form, which can be found in Appendix C, before the experiment.

To measure EEG we used an EEG system from g.Tec Medical Engineering, including the g.HIamp multi-channel biosignal amplifier for 144 channels, a g.GAMMA EEG cap with 128 g.SCARABEO active Ag-AgCl electrodes and the g.TRIGbox trigger pulse box. The software used to collect the EEG data was g.RECORDER, a biosignal recording system from g.Tec. EEG was recorded at 2400 Hz from 128 electrodes according to the extended 10-20 system, which is described in section 2.2.2 and can be seen in Figure 2. These 128 electrodes comprise four eye electrodes (EOG), two earlobe electrodes (one on each earlobe) and 122 scalp electrodes. As ground electrode we used the AFz electrode (between Fp and F in Figure 2).
No online reference was used; instead the data was referenced offline. For a picture of a subject fitted with the cap and electrodes, see Figure 7. The trigger pulse box was used to time-lock the audio stimuli of the oddball task described in section 3.2, which were played to the subjects through in-ear headphones. Headphones were used so that the subjects would hear the sounds equally in both ears; with speakers there is a risk that the sound is heard differently in the two ears. For localizing and digitizing the exact individual position of the electrodes in 3D for all subjects we used the Polaris Krios system from Northern Digital Inc.

Figure 7. Person fitted with EEG cap with 128 electrodes. The EOG electrodes can also be seen around the eyes.

3.3.2 Tasks

The developed method consists of three conditions: no task, easy task and hard task, using combinations of the grasping task and the oddball task, described in sections 3.1 and 3.2, respectively. No task means that the subject performs only the auditory oddball task while fixing their gaze on a plus sign on a computer screen. In the easy and hard task conditions the subject performed the grasping task at the same time as the auditory oddball task. The easy and hard conditions represent a dual-task paradigm, described in section 3.2, where the grasping task is the primary task and the oddball task is the secondary task. The easy and hard conditions are, as mentioned in section 3.1, meant to replicate the conditions with and without sensory feedback that will be used in the future study with anaesthesia. The no task condition adds another level of cognitive workload that can be used as a baseline to examine whether we can measure differences in cognitive workload between different levels.
3.3.3 Self-assessment using NASA-RTLX

To get an indication of how much workload the subjects themselves thought they put into each task, we used a self-assessment questionnaire, or task load index, developed by the National Aeronautics and Space Administration called NASA-TLX [57]. The NASA-TLX is commonly used to assess perceived effort (e.g. [14], [21], [39], [44], [45]). The questionnaire consists of six subscales which represent the variables mental, physical and temporal demands, frustration, effort and performance. Each subscale is a twenty-step scale from 0 to 100, and the subjects were asked to put a cross on the step of each subscale that best represented their effort on each task. For each task, the values for all subscales were added together and divided by six to get the averaged NASA Raw Task Load Index (NASA-RTLX). This index is used in many studies because it is simpler to apply than the NASA-TLX, which also includes an additional process to weight the subscales against each other [20]. We used the NASA-RTLX for each task and subject to get an indication of whether the perceived workload differed between the tasks. This is used as an indicator of whether the subjects experienced the expected difference between the conditions. Subjects were also told to mark the different blocks in each condition with 1, 2 or 3 if they experienced a difference in effort, or with an X if they estimated the same value for all blocks. In the end, most of the subjects did not experience a difference between the blocks, so only conditions were examined. For subjects who marked a difference, we have used the mean value.

3.3.4 Procedure

Before the experiment, the subject was fitted with the EEG head cap and a connection was made between each electrode and the scalp using a conductive gel.
Then the EOG electrodes were attached around the eyes using adhesive labels, and the reference electrodes, used for offline referencing, were clipped to the earlobes. The impedance of the connections was kept below 50 kΩ and was checked regularly between the measurements. Lastly, the electrode positions were scanned. The participant also filled out the informed consent form, a photo agreement and the Waterloo Handedness Questionnaire. They were asked to use their dominant hand for the grasping task.

During the experiment the subject was seated in a chair with an adjustable table in front of them. They were given in-ear headphones through which the sounds for the auditory oddball task were played. Before the experiment started, the subjects were informed about the tasks they were going to perform and had the opportunity to ask questions about the procedure. We emphasised that the cube should be lifted as many times as possible without breaking it, and that the grasping task was the main task. The subjects also got to listen to one sound (or more if requested) of each type: frequent, rare, novel and start/stop sound, so that they knew what to listen for. The start/stop sound, consisting of three consecutive tones, was used to notify the participant when to start and stop the task. We also adjusted the audio volume to fit the subject’s preference. This could give rise to some differences between the individual results, since the intensity of a sound affects the amplitude of the response, i.e. the ERP components [31]. However, keeping the volume constant would have meant that the subjects experienced different volumes because of differences in hearing, which would also give rise to individual differences. If the sounds had been hard to hear or painfully loud, this would have contributed to exhausting the subjects faster. Therefore, the subjects got to adjust the volume so that they felt most comfortable.
The subject was also instructed not to blink excessively, frown, clench their jaws or keep unnecessary tension in any other muscles. This is to avoid artifacts and will be discussed further in section 3.4.5.

The time needed for each condition depends on how many epochs are needed to get a satisfactory ERP waveform. This in turn depends on what one is looking for in the data and how much noise there is, but Luck [31] recommends 10-50 epochs for larger components, such as P3, and 100-500 for smaller ones, such as P1. This is because more measurements increase the signal-to-noise ratio and thereby make it possible to study smaller components. Since stimulus duration, interstimulus interval and the percentage of novel sounds are already set (see section 3.2.3), the time for each condition depends on the number of epochs we choose to measure. We have chosen to use an algorithm that plays 600-720 stimuli in total. With novel sounds being 10 % of the sounds, this gives us 60-72 novel sounds per condition. That way we also have some margin if some of the epochs need to be rejected because of blinks or other artifacts, at least for large components. To preserve the novelty of the novel sounds, they were not repeated within a condition, which means that each was played a maximum of three times for each subject, with at least five minutes between the repetitions. Some of the sounds were not repeated at all. The algorithm randomizes the order of the played sounds as well as the interstimulus interval, such that frequent, rare and novel sounds are mixed and played with varying intervals between them.

Maximizing the number of epochs needs to be weighed against making the blocks too long. Long blocks exhaust the subjects and might affect the number of subjects willing to participate. More importantly, they affect the subject’s ability to stay focused on the task, and longer times might therefore do more harm than good.
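The stimulus scheme described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the algorithm used in the experiment: the function name `make_sequence`, the fixed choice of 600 stimuli, the uniform distribution of the interstimulus interval and the clip indices 0-92 are assumptions, while the 80/10/10 proportions, the 960-1360 ms interval and the 93 novel clips are taken from section 3.2.

```python
import random

def make_sequence(n_stimuli=600, n_clips=93, isi_range_ms=(960, 1360), seed=None):
    """Build one condition's stimulus list as (kind, clip_id, isi_ms) tuples.

    80 % frequent tones, 10 % rare tones and 10 % novel sounds, in random
    order. Each novel clip is used at most once within the condition.
    """
    rng = random.Random(seed)
    n_rare = n_stimuli // 10
    n_novel = n_stimuli // 10
    n_frequent = n_stimuli - n_rare - n_novel
    if n_novel > n_clips:
        raise ValueError("not enough novel clips to avoid repetition")

    # Draw novel clips without replacement so that none repeats in a condition.
    novel_ids = rng.sample(range(n_clips), n_novel)
    stimuli = ([("frequent", None)] * n_frequent
               + [("rare", None)] * n_rare
               + [("novel", clip) for clip in novel_ids])
    rng.shuffle(stimuli)

    # Assumption: the interval following each stimulus is drawn uniformly.
    lo, hi = isi_range_ms
    return [(kind, clip, rng.uniform(lo, hi)) for kind, clip in stimuli]

sequence = make_sequence(n_stimuli=600, seed=1)
total_minutes = sum(isi for _, _, isi in sequence) / 60000
print(f"{len(sequence)} stimuli, about {total_minutes:.1f} minutes in total")
```

If the interval is taken as the time from one stimulus onset to the next (also an assumption), 600 stimuli with a mean interval of 1160 ms amount to roughly 11.6 minutes of stimulation per condition, which is in line with three blocks of about four minutes.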
It is also important to allow enough time for rest between the measurements. This helps to keep the subjects alert and focused on the task. It can also reduce blinking and muscle artifacts during the measurement, since the breaks give the subject time to blink and stretch. For this reason, each condition was divided into three blocks of about four minutes with a break of at least one minute between them. Between the conditions there was also a break of about five minutes, or more depending on what the subject wanted. By letting subjects perform the same task for three hours, or until they were exhausted, Trejo et al. [58] have shown that fatigue affects the measurement by increasing the amplitude of both the alpha and theta frequency bands and the P2 component. However, the same study showed that N1 and P3 were not significantly affected by the time on task. In total, each subject thus completed nine blocks, three for each condition. At the start and end of each block the special start/stop sound was played. After each block the subjects reported the number of rare sounds they had counted. During the longer break after each condition the subjects also filled in the NASA-RTLX questionnaire.

All subjects started with the no task condition and moved on to the easy task and hard task, in that order. With a fixed order, the level of arousal will tend to vary between the conditions. To avoid this, Luck [31] recommends varying conditions unpredictably within each trial block. However, in the future study that this work is in preparation for, it will not be possible to switch back and forth between the conditions, since the conditions in that case will be with and without anaesthesia, respectively. It would be possible to use a random order between the conditions, for example by inviting the subjects on two separate days, but we decided against this since using the same order makes it easier to compare the different subjects’ learning processes.
When the subjects performed the grasping task, the number of times they lifted the cube over the barrier was counted to get an indication of how well they accomplished the task at the different levels of difficulty. This was then divided by the time to compute the number of lifts per minute, taking into account that the total duration of the blocks varied slightly. Since the task was to lift the cube without breaking it, we also counted the number of times they broke the cube (pressed it so hard that the LED bar lit up). A success rate was then computed by subtracting the number of times the cube was broken from the total number of lifts. These measurements of performance will, together with the NASA-RTLX, be used to verify the differences between the conditions. They will also be investigated for each block and compared to look for learning effects. It is expected that if learning takes place, performance will increase between the blocks.

An experiment procedure for this work can be found in Appendix A, with more information about preparation, execution and the work needed after each experiment. At the end of the experiment the subjects also participated in another study. However, the procedures and results of that study are not discussed in this work, and since it was performed at the end it should not affect the results of this work.

3.4 Signal processing

Before analysing the EEG data, it needs to be processed to increase the signal-to-noise ratio and obtain clean averaged curves from which to measure ERP components and frequency bands. This section describes common steps for signal processing and motivates our choices for this work. The signal processing has been done using EEGLAB [59], which is a freely available MATLAB toolbox, and the plugin ERPLAB [60]. These are specifically designed to analyse EEG and ERP data.
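To illustrate why averaging many epochs yields cleaner ERP curves, the Python sketch below averages simulated epochs that share a fixed waveform but carry independent noise. It is not the EEGLAB/ERPLAB pipeline used in this work; the Gaussian-shaped, P3-like waveform, the noise level, the 240 Hz sampling rate and the function name `residual_noise` are assumptions chosen for the illustration. With independent noise, the residual noise in the average is expected to shrink roughly in proportion to one over the square root of the number of epochs.

```python
import numpy as np

rng = np.random.default_rng(42)
fs = 240                              # assumed sampling rate in Hz
t = np.arange(-0.2, 0.8, 1 / fs)      # epoch from -200 ms to 800 ms

# Hypothetical ERP: a 5 µV positive peak centred around 350 ms.
erp = 5.0 * np.exp(-((t - 0.35) ** 2) / (2 * 0.05 ** 2))

def residual_noise(n_epochs, noise_sd=20.0):
    """RMS difference between the average of n_epochs noisy epochs and the true ERP."""
    # Each simulated epoch is the same ERP plus independent Gaussian noise.
    epochs = erp + rng.normal(0.0, noise_sd, size=(n_epochs, t.size))
    average = epochs.mean(axis=0)
    return float(np.sqrt(np.mean((average - erp) ** 2)))

for n in (1, 15, 60):
    print(f"{n:3d} epochs: residual noise ~ {residual_noise(n):.2f} µV")
```

Real EEG noise is not independent between samples or perfectly stationary, so the improvement seen in measured data can be smaller than in this idealised simulation.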
3.4.1 Offline referencing

Even if the EEG equipment uses a reference site during the measurements, as discussed in section 2.2.1, this site needs to be specified and sometimes changed offline before analysing the data. This is called offline referencing or, if the reference site is changed, re-referencing. Since there are no sites on the head or the body that are electrically neutral in terms of neural activity, there are no perfect reference sites. This means that the ERP measured at an active electrode reflects the EEG at both the active electrode site and the reference site. Therefore, it is important to choose the reference site with caution, so that it does not cancel out important information in the data. This means for example that a reference site near the site of interest is not a good choice. Also, reference sites that pick up much noise should be avoided to not get extra noise in the data. Which reference site is best depends on the application [31]. Common reference sites are one or both of the earlobes (e.g. [17]–[19], [21], [37], [39], [44], [46], [61]) or one or both of the mastoids (the bones directly behind the ears, e.g. [36], [40], [41], [55], [