Human intent-recognition system for safety-critical human-machine interactions
Master's thesis in Complex Adaptive Systems
Simon Künzler
Department of Mechanics and Maritime Sciences, Vehicle Safety Division
CHALMERS UNIVERSITY OF TECHNOLOGY
Gothenburg, Sweden 2020

Master's Thesis 2020:52
Supervisor: Pinar Boyraz Baykas, Department of Mechanics and Maritime Sciences
Examiner: Pinar Boyraz Baykas, Department of Mechanics and Maritime Sciences
Chalmers University of Technology, SE-412 96 Gothenburg
Cover: Subject performing the experiment under assistance of the shared-control algorithm
Typeset in LaTeX, Gothenburg, Sweden 2020

Abstract

The aim of this thesis was to investigate the potential of eye tracking technology to help recognize the intent of humans when working with a machine under shared control. An experiment was designed to study the eye gaze behaviour of test subjects while manipulating a two degrees-of-freedom (DOF) SCARA robot. The subjects were given the task of maneuvering the end-effector of the robot through a sequence of LEDs located on the robot action plane. The LED sequence was different for each experiment run and not known by the subjects before the start of each run. In the first step, eye gaze data was collected while the robot was unactuated. The fixation point of the subjects' gaze was 4.5 times more likely to lie in the proximity of the goal LEDs they intended to connect than outside the intended area. In addition, when the subjects planned to move from one LED to the next, their gaze tended to fixate on the next LED between one and two seconds before reaching the position with the robot end-effector, depending on how much distance the subject had to cover when moving from the current LED to the next. After reaching the fixated position, the gaze is shifted almost immediately (with a 0.1–0.2 s delay) onto the next LED, while movement onset is delayed by about 0.5 seconds. This information was then used to develop an algorithm to predict which LED a subject is intending to reach. During a second set of tests, more data was collected, this time under shared control with the robot. The implemented algorithm was able to successfully identify the next goal LED in the subject's planned path and to provide assistance in the movement of the robot arm. How far ahead of time the goals were recognized depended on how soon the subject's gaze shifted from a reached LED to the next planned goal LED. If the subject fixated on a goal LED at least 0.3 s before initiating the movement towards it, the robot was able to perform the whole movement between the LEDs. In most cases the algorithm initiated the support halfway through the planned motion of the subjects. No significant differences in the subjects' gaze data between passive robot manipulation and shared control could be identified.

Keywords: Human intent-recognition, Eye tracking, Shared-control, Human-robot shared manipulation

Contents

Abbreviations
1 Introduction
1.1 Background
1.2 Aim
1.3 Limitations
2 Theory
2.1 Intent recognition
2.2 Shared control
2.3 Eye tracking / Hand-eye coordination
3 Experiment design and setup
3.1 Hardware
3.1.1 Eye tracking device
3.1.2 SCARA robot
3.1.3 LED surface
3.2 Experiment design
4 Data collection
4.1 Experiment participants
4.2 Eye tracking data
4.3 End-effector position
4.4 Experiment time constraints
5 Data analysis
5.1 Preprocessing
5.2 End effector position
5.3 Fixations on surface
5.4 Fixations vs end effector
5.5 Pupil diameter during task execution
5.6 Fixation dispersion
5.7 Fixations comparison between test subjects
6 Predictive algorithm design
6.1 Goal of algorithm
6.2 Information used from passive robot data analysis relevant to algorithm
6.3 Input parameters
6.4 Limitations of algorithm
6.5 Algorithm structure
7 Results
7.1 Evaluation of Algorithm
7.1.1 Recorded data during shared control experiment
7.1.2 Comparison between passive experiment and shared control
7.2 Subjects' evaluation of algorithm
8 Conclusion
8.1 Summary of results
8.2 Further work
Bibliography
List of Figures
List of Tables
Appendix
A.1 Pseudo code of algorithm
Abbreviations

RW Real-world
MDP Markov Decision Process
POMDP Partially Observable Markov Decision Process
DOF Degrees of freedom
SCARA Selective Compliance Assembly Robot Arm
IR Infrared
HSV Hue, saturation, value

Chapter 1 Introduction

1.1 Background

Human-machine interactions play a major part in our daily lives. To improve user experience and efficiency, recent systems have succeeded by adapting to their operators. Dialogue systems, autonomous driving and intelligent user interfaces are only a few of the emerging technologies that rely heavily on predicting people's intentions, goals or next planned actions. Eye gaze has been proven to be a rich source of information when humans are performing a task. For example, gaze fixations in the scene can reveal how humans perceive a task, and pupil size is an important indicator of cognitive load. If these indicators can be factored into a shared-control application, better collaboration between humans and automated machines/systems could be accomplished.

1.2 Aim

The aim of this thesis is to investigate the potential of eye tracking technology to help recognize the intent of humans when working with a machine under shared control. While several studies have been conducted on identifying key objects in a scene depending on directed gaze onto these objects, the literature on eye gaze data regarding human task planning is rather sparse. When interacting with a robot arm in a shared autonomy setting, eye gaze could yield additional information about path planning and action sequencing. While some studies have focused on teleoperation of a robot arm to compensate for motor disabilities of humans, this thesis focuses on the direct co-manipulation of a robot's end-effector. The experiment setting will include a two degrees-of-freedom (DOF) SCARA robot, and the experiment participants will be presented with a path planning task to guide the robot's end-effector. The test subjects will be equipped with an eye tracker to study emerging gaze patterns during the robot manipulation, and in a later stage the gaze data will be incorporated into a shared control algorithm, providing a cue on human intention for the planned motion. A survey on the user experience and an investigation of potential differences in the gaze data between the free and supported task execution will also be conducted. In summary, this thesis is dedicated to answering the following research questions:
• How can eye gaze data during a shared manipulation task between a robot and a human subject (path guidance of the end-effector) be leveraged to infer the user's plan and goal of the task?
• Can eye gaze be integrated into a shared control algorithm to enhance task performance?
• Is there a noticeable difference in gaze data when a user encounters support by the robot system, compared to when the user freely moves the end-effector?
• What are the limitations of the predictions? How significant is the accuracy of the eye tracker when factored into shared control? Can it be used for identifying fine movements or only to recognize the underlying plan of the interacting subjects?

1.3 Limitations

• Accuracy of eye tracker: The accuracy of the subject's gaze location can have a direct impact on the limitations of the predictive algorithm.
• Diversity and number of test subjects for data collection: To guarantee the validity of the experiment and safety for the experiment participants, a research permit has to be applied for.
This can take up to several months and is thus a bottleneck for this thesis. Because the research permit could not be obtained in time, the main focus of the thesis is a proof of concept (involving 3–5 human subjects), and the groundwork for larger test-subject involvement will be laid.
• Sensory inputs limited to eye gaze and the subject's direct influence on robot dynamics: The experiment is focused on eye tracking and motion input from human subjects; further human-machine interfaces, such as electroencephalography (EEG), will not be considered. Since the thesis is conducted by one student, the reasons for this limitation are time restrictions and the fact that the experiment would require at least two people to simultaneously install the EEG electrodes and perform the synchronization of the computers.
• Servo motors of robot: The servo motors of the robot do not have a feedback signal. The motors are controlled by setting their position with a PWM signal. This made it difficult to assess the impact of the user's forces on the robot dynamics. In addition, this also limited the motion planning of the robot, since more complex motion profiles are difficult to realize.
• Self-isolation inhibits availability of test subjects: During the later stage of the thesis, the outbreak of the COVID-19 pandemic heavily limited the number of test subjects that could be involved in the experiment. It is important to reduce the interactions between people and guarantee a minimum interaction distance. The safety distance during the experiment execution was difficult to maintain at all times, and thus the number of test subjects had to be limited as much as possible. This mainly affected the recordings and testing of the shared-control algorithm, which were performed towards the end of the thesis.

Chapter 2 Theory

2.1 Intent recognition

The concept of intent recognition is to infer a human agent's plan and goal based on observed actions. Depending on which user actions are being monitored, several different approaches to intent recognition can be taken. For example, in [9] gestures were used to communicate the user's intention to an autonomous "servant" robot. Notably, they stressed the importance of context and interaction history to distinguish between similar gestures with different goals. They included object recognition in the scene to derive additional context. In [4], video footage of dual-agent interactions (such as handshake, hug, push) was used to study behavioural patterns between humans. By recording the actions of one person, they were able to predict the reaction of the second, unobserved person. They achieved that by developing a novel algorithm based on the principles of maximum causal entropy and inverse optimal control, which will be further elaborated on in section 2.2. The advances in autonomous driving have also had a major impact on the demand for intent recognition. Autonomous driving is a multi-agent problem, where an intelligent system constantly has to predict the actions of its surrounding environment. Autonomous vehicles have to consider different types of human agents when planning a route. In [10], the human agents are divided into three main categories: humans in the vehicle cabin of the autonomous vehicle, humans around the vehicle and humans in surrounding vehicles. When planning trajectories of surrounding vehicles, a great deal of driving intent can be extracted from the road layout and by identifying the lanes being chosen [5].
In practice, a partially observable Markov decision process (POMDP) is often used as an underlying framework to model intent recognition. Partially observable refers to the fact that the data an agent receives about its environment and interaction partner(s) is incomplete, and often stochastic interference is present in the process. Heuristic methods such as probability distributions over the set of possible states can be used to deal with information incompleteness. Another common method to predict user intent is neural networks, but their black-box nature makes reasoning about the produced outputs hard, and thus the focus of this thesis lies on more interpretable models.

2.2 Shared control

The main difference of shared control, compared to traditional control systems, is the integration of the user into the control loop. Shared control aims to merge an automated system with a user to reach a common goal in a safe and collaborative manner. Shared control is heavily reliant on intent recognition. It is important to model the uncertainty arising from interactions with humans. As shown in [8], humans are rarely rational decision makers. By integrating the more risk-oriented nature of humans into the model, improvements to predicting human actions can be made. A relevant framework to account for the stochastic nature of the human influence on the control loop is a Markov decision process (MDP). More precisely, the user's effect on the control task is included in the state transition function. The state-of-the-art algorithms developed in [14], [4] use inverse optimal control (inverse reinforcement learning) to recover an unknown reward function of an MDP. This recovery is based on samples from the behavior of the human-robot interaction, usually in the form of a probability distribution. In [3], a concept called policy blending was developed for shared control. Their algorithm considers two policies, the user's input and the robot's prediction of the user's intent. Depending on the confidence level of the intent policy, the robot applies weaker or stronger corrections to the user's input to reach the predicted goal.

2.3 Eye tracking / Hand-eye coordination

Eye tracking refers to the concept of recording eye metrics and movements and mapping them onto a scene view, which usually represents the field of view of the user. The recording of the eye is typically done with an infrared (IR) camera. The image obtained by the IR camera is then processed by an algorithm to obtain pupil position and eye angle with regard to a reference frame, as well as pupil diameter. A calibration procedure is then applied to retrieve the gaze location in the scene view. Since the eye parameter detection algorithm and the scene view mapping vary between the different eye tracking manufacturers, we limit the explanation to the device used in this thesis, which is a head-mounted device from Pupil Labs [12]. The device explanation is given in chapter 3. The gaze data can then be further processed into formats such as scan-paths (for temporal information) and heat-maps (to identify areas of interest) to study correlations and implications for a task. For example, the coordination of eye, head and hand movements has been studied in [11], where subjects performed a mechanical building task with LEGO blocks.
The recorded eye gaze when building simple block patterns is ahead of the motor execution, and this latency depends on the building strategies of the different subjects, as well as the complexity of the task. Another important measure in eye tracking is pupillometry, which describes the size and shape of the pupil. In [6] it was shown that the rate of change of pupil diameter correlates directly with cognitive load and task difficulty. When subjects were presented with a digit memorization task, the pupil diameter tended to dilate during mental processing of the string and constrict while they were reporting the string. The more digits the participants had to memorize, the more significant was the rate of change of pupil diameter.

Chapter 3 Experiment design and setup

3.1 Hardware

The experiment setup consists of a two-degrees-of-freedom SCARA robot, which operates on a 2D plane. A top-mounted camera records the position of a color marker placed on the end-effector. A total of 12 LEDs are located on the 2D operating plane of the robot.

3.1.1 Eye tracking device

The human subjects of the experiment were equipped with an eye tracker, consisting of an infrared (IR) eye camera, recording the eye movements of the subjects at 120 Hz, and a scene view camera, recording the field of view of the subjects at 30 Hz. The eye tracker used throughout the experiments is the Pupil Core device manufactured by Pupil Labs [12]. The pupil detection algorithm attempts to find a 2D ellipse in the IR eye camera image that represents the pupil geometry. To do so, a series of image processing methods are applied to filter for the dark pupil. To map the detected pupil positions of the eye camera to the scene view, a calibration process needs to be performed. The result of the calibration routine is a transfer function consisting of two bivariate polynomials. During calibration, the degrees of the polynomials are determined by making the user focus on markers in the scene view [7]. Using the surface tracking plugin provided by Pupil Labs' open source software, the LED surface of the robot's operating plane was isolated with the help of four AprilTag markers [1]. This made it possible to limit the subjects' gaze points to the ones located on the robot's operating plane.

Figure 3.1: Experiment setup

3.1.2 SCARA robot

The two joint axes of the robot (shoulder and elbow) are driven by two hobby servos operating at 5 V. Their angular range is up to 180 degrees. The control signal of the motors is a pulse-width modulation (PWM) signal, whose pulse width corresponds to an angle between 0 and 180 degrees. Attached to the robot's end-effector is a color marker, which is detected by a top-mounted camera. After obtaining the end-effector position, the angles of the joint motors can be calculated with the robot's inverse kinematics.

3.1.3 LED surface

To obtain a common reference frame for both the robot arm and gaze positions, the LED surface is defined with four AprilTag markers representing the corners. The same four markers are detected by both cameras: the scene view camera of the eye tracking device and the top-view camera of the robot arm.

3.2 Experiment design

The LED surface contains the start location (blue LED on the right-hand side of the participant) and the end location (blue LED on the left-hand side of the participant).
Between the start and end locations are a total of 10 LEDs, of which five (randomly chosen) light up during each experiment trial. The participant's goal is to guide the robot's end-effector from the start LED to the end LED while connecting all the lit-up LEDs. The order of the interconnecting LEDs can be chosen freely by the participants, but with some regard to the shortest path. This condition reflects the fact that time is a somewhat critical factor for task performance. The experiment can be divided into two parts. First the participants execute the task without robot assistance, and in a later stage with robot assistance (shared control).
1. Without robot assistance: The focus lies on collecting the participants' gaze data while executing the manipulation task. In addition, the position of the robot's end-effector will be recorded at all times.
2. Shared control: The second part of the experiment aims to evaluate the performance of the shared control algorithm while performing the same manipulation task. This evaluation is done by the participants and focuses on the following criteria:
• How intuitive or counter-intuitive does the robot's assistance feel during task execution?
• Could an increase in task performance be achieved?
• Does the confidence threshold apply correctly to the robot support?
• Does the subjects' gaze data differ between shared control manipulation and manipulation without robot assistance?

Chapter 4 Data collection

4.1 Experiment participants

A total of 5 people took part in the experiment. The participants' parameters are summarized in table 4.1.

Table 4.1: Participants' parameters
Nbr. | Age | Gender | Vision impairment | Experiment runs with valid data
1 | 29 | Male | No | 0/5
2 | 36 | Female | No | 0/5
3 | 25 | Male | No | 4/5
4 | 27 | Female | No | 5/5
5 | 31 | Male | No | 4/5

Since the accuracy of the eye tracking device heavily depends on parameters such as eye shape, eye lashes and calibration accuracy, the quality of the gaze data varied among subjects. In addition, for subject 2 the surface detection was incomplete, due to a large portion of frames not having all four markers in the scene (i.e. the whole surface was not visible on some frames). To guarantee a meaningful data analysis, data with poor quality (all data of subjects 1 and 2, as well as one experiment run each for subjects 3 and 4) was not included in the evaluation.

4.2 Eye tracking data

The eye tracking data recorded from the participants consists of pupil diameter, blink frequency and the gaze position on the LED surface. The gaze position, which is mapped to the LED surface by Pupil Labs' open source software, can be further divided into the following subcategories [2]:
• Fixations: If a participant's gaze is resting on a single location for a prolonged time, it is considered a fixation. Depending on the task, the duration of a fixation is typically between 100 and 500 ms, which are the limits applied in the Pupil Labs software throughout this thesis. Fixations above 500 ms are split into multiple fixations by the software, but are labeled with the same fixation index.
• Saccades: Saccades are rapid, jerking movements of the eye, usually occurring between fixations. They last between 20 and 200 ms, and their amplitude can have a narrow or wide range, depending on the distance between two fixations. Saccades can be involuntary and can show up even during fixations.
• Smooth pursuit: Allows the eyes to slowly track a moving target by adjusting the eye's angular velocity to the target's angular velocity.
This type of eye movement is voluntary; only practiced observers are able to make smooth pursuit movements without a moving stimulus.
• Vestibulo-ocular movements: When focusing on a scene, vestibulo-ocular movements counteract head movements, so that the visual image remains stable and does not slip. Head-mounted eye tracking devices perceive these eye movements as identical to smooth pursuit movements; the two can only be differentiated by recording head movements. Since the Pupil Labs eye tracking device does not record head movements, a head rest was used to minimize vestibulo-ocular movements.

The Pupil Labs software assigns a detection confidence value (between 0 and 1) to each gaze measurement taken. This confidence value can be used to filter out gaze data with potentially low accuracy. The manufacturer recommends only using gaze data with a confidence above 0.6, which is the filter criterion used throughout this thesis. The fixation detection algorithm implemented in Pupil Labs' software uses a dispersion-based method to identify the fixations. This means fixations are identified as groups of consecutive points in the scene view within a particular dispersion, or maximum separation. The number of consecutive points considered depends on the minimum fixation duration, in this case 100 ms. The dispersion of these consecutive points is defined as [13]:

D = [max(x) − min(x)] + [max(y) − min(y)] (4.1)

While the definition of gaze dispersion in [13] is calculated with consecutive x- and y-positions in the scene view, Pupil Labs' software calculates the dispersion as the maximum angle between all eye vectors recorded during the fixation time window. If this maximum eye angle is below a chosen threshold, a fixation is detected.

4.3 End-effector position

The end-effector's real-world position was recorded during each trial at 100 Hz. The timestamp format used for the end-effector recording is the UNIX epoch time of the computer on which the data collection was performed. Pupil Labs' software has its own time scale for the collected gaze data, which can be synchronized to the UNIX epoch time. This guaranteed a valid comparison between the two data streams.

4.4 Experiment time constraints

The experiment participants were not subject to any completion time constraint. A start countdown, which also started the gaze and position recordings, preceded each trial, and the recordings were stopped after reaching the goal LED. The task completion time averaged around 10 seconds.

Chapter 5 Data analysis

5.1 Preprocessing

The fixation coordinates on the LED surface are given as normalized coordinates with respect to a different point of origin than the end-effector coordinates. Thus, the fixations are mapped to the real-world coordinate system of the robot action plane. The mapping is shown in fig. 5.1:

X_RW = (0.5 − X_N) X_S (5.1)
Y_RW = Y_O + (1 − Y_N) Y_S (5.2)

X_RW, Y_RW: X and Y RW coordinates
X_N, Y_N: normalized X and Y gaze coordinates
X_S: RW length of LED surface, 404 mm
Y_S: RW height of LED surface, 280 mm
Y_O: Y offset of robot origin from LED surface, 43 mm
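To make this mapping concrete, a minimal Python sketch of equations (5.1) and (5.2) is given below. The constants are the surface dimensions listed above; the function name and the example call are illustrative assumptions, not part of the actual analysis scripts.

# Minimal sketch of the normalized-gaze to real-world mapping of eqs. (5.1)-(5.2).
# Function and variable names are illustrative, not those of the implementation.

X_S = 404.0  # RW length of LED surface [mm]
Y_S = 280.0  # RW height of LED surface [mm]
Y_O = 43.0   # Y offset of robot origin from LED surface [mm]

def gaze_to_rw(x_norm, y_norm):
    """Map normalized surface coordinates (0..1) to robot RW coordinates [mm]."""
    x_rw = (0.5 - x_norm) * X_S        # eq. (5.1): x axis centered on the surface
    y_rw = Y_O + (1.0 - y_norm) * Y_S  # eq. (5.2): y axis flipped and offset
    return x_rw, y_rw

# Example: a fixation in the middle of the surface maps to (0.0, 183.0) mm.
print(gaze_to_rw(0.5, 0.5))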
Figure 5.1: Surface mapping of gaze coordinates to robot RW coordinates

5.2 End effector position

For each trial, the path of the end-effector, as well as the LEDs specific to that trial, was visualized as in fig. 5.2. The end-effector position is determined with an accuracy of approximately 5 mm and, as can be seen in fig. 5.2, the test subject guided the end-effector through the LED sequence within roughly 10 mm proximity. Some test subjects were more precise than others, but all test subjects usually stayed within the 10 mm, apart from some outliers at single positions. For example, in fig. 5.2 the goal LED at (176, 148) was not reached very precisely.

Figure 5.2: End-effector and LED positions in RW coordinates, subject 4

5.3 Fixations on surface

In the top half of fig. 5.3, the average position during each fixation span is plotted in relation to the LED positions. The size of the fixation markers corresponds to the fixation duration. The bottom half shows the exact duration of each successive fixation. The positional accuracy of the fixations on the surface depends on the surface detection, as well as the gaze mapping precision of Pupil Labs' software. Since in some frames of the scene view camera of the eye tracker one or more of the four surface markers might be obscured by either the test subject's arm or the robot arm, the surface cannot always be fully detected. In those cases, the surface detection algorithm calculates the missing corners with the given side ratios of the LED surface. While in most of these cases only one marker is obscured at a time, which still yields accurate surface positions, the accuracy drops significantly with only two or fewer detected markers. Trials containing portions of the video showing only two or fewer detected markers were discarded. The gaze mapping accuracy mainly depends on the calibration and can vary depending on the position on the surface; e.g. gaze positions closer to the boundaries of the surface tend to be less accurate due to squinting of the eyes. Only eye gaze data with a positional accuracy below 20 mm was evaluated. As can be seen in the upper part of fig. 5.3, most fixations fall approximately onto the LED positions. Compared with the end-effector position in fig. 5.2, the fixations are very close to the position where the end-effector passes the LED position. Notable exceptions are the first four fixations at the start. As we will see in section 5.6, some of these fixations are actually smooth pursuit eye movements.

5.4 Fixations vs end effector

In fig. 5.4, the fixations and end-effector positions were plotted over the total trial duration to illustrate timing differences between arm and gaze coordination. The subject's gaze tends to fixate on the next LED between one and two seconds before reaching the position with the robot end-effector, depending on how much distance the subject has to cover when moving from the current LED to the next.
After reaching the fixated position, the gaze is shifted almost immediately (with a 0.1–0.2 s delay) onto the next LED, while movement onset is delayed by about 0.5 seconds.

5.5 Pupil diameter during task execution

The pupil diameter during task execution is measured in pixels, which is a valid measure, since the relative pupil diameter is the information we are interested in. As can be seen in fig. 5.6, the pupil diameter is largest at the start of the experiment and gradually decreases with time. This indicates increased mental load of the subject at the start, i.e. during path planning. About 7.3 s into the experiment run, the pupil dilates again for approximately 0.7 s; the relative change in pupil diameter is roughly 22%. A pupil diameter of zero occurred when the subjects blinked.

5.6 Fixation dispersion

When comparing all gaze points on the LED surface with the fixations, it becomes apparent that some of the fixations should rather be considered smooth pursuit eye movements. This makes sense, since slow smooth pursuit movements have a small enough dispersion to be considered a fixation. This effect is also reflected in the position of the gaze points. In fig. 5.7 this effect is apparent in fixations 1–3. Usually smooth pursuit eye movements are hinted at by a short fixation duration. During smooth pursuit eye movements the dispersion increases until it surpasses the dispersion threshold used to detect a fixation.

5.7 Fixations comparison between test subjects

The fixation durations are compared between subjects 3–5 and split into fixations lying within 20 mm of an LED and fixations further away than 20 mm from any LED position. The median of the fixation durations close to an LED position is on average 40 ms longer than the median of those not close to an LED position. The maximum fixation duration is limited to 0.5 s and the minimum to 0.1 s by the Pupil Labs software. No clear distinction can be made for the maxima and minima of the fixation durations, because all three subjects cover the whole spectrum of fixation durations between maximum and minimum. Notable for subject 3 is that the fixations not falling onto an LED position are more spread out, resulting in a longer box and a shorter lower whisker. The number of fixations used from each subject is summarized in table 5.1.

Table 5.1: Nbr. of fixations used for box plot
Subject nbr. | On LED | Nbr. of fixations | Total nbr. of trials
3 | Yes | 79 | 4
3 | No | 17 | 4
4 | Yes | 103 | 5
4 | No | 24 | 5
5 | Yes | 85 | 4
5 | No | 16 | 4

Figure 5.3: Average positions of fixations on surface, subject 4
Figure 5.4: Fixations and end-effector through time, subject 4

Figure 5.5: 3D plot, fixations on surface through time, subject 4

Figure 5.6: Pupil diameter during task execution, subject 4

Figure 5.7: All gaze points on surface, fixations on surface and dispersion of the fixations for subject 4. The color spectrum for the top graph changes with experiment time; for the bottom two graphs each color represents a fixation.

Figure 5.8: Fixation durations of all trials of all subjects

Chapter 6 Predictive algorithm design

This chapter explains the design choices made to develop the algorithm, as well as its functionalities. To start off, the goal of the algorithm is stated, followed by a discussion on how the findings of the passive robot experiment from chapter 5 can potentially be integrated into the algorithm to make predictions about how the users intend to complete the experiment task. After that, the input parameters of the algorithm, i.e. which states and observations the algorithm has access to during experiment execution, are explained. Next, the limitations of the algorithm are listed, and finally the implementation and structure are explained.

6.1 Goal of algorithm

The overarching goal of the algorithm is to exploit the subjects' gaze behavior during task execution to derive which LEDs they are intending to reach and how they plan to sequence through them. If the algorithm can deduce the position the subject is trying to steer the end-effector to, the next step will be to assist the subject in reaching that position.

6.2 Information used from passive robot data analysis relevant to algorithm

Based on the results obtained in chapter 5, where the subjects' goal was to guide the passive (unactuated) robot end-effector through seven trial-specific LEDs, the following observations were investigated for their informational value to the algorithm:
• Positional precision of subjects' end-effector guidance: While guiding the robot's end-effector through the LED positions, the subjects considered an LED as visited by passing it within a 10 mm radius.
• Fixations' proximity to LED positions: To decide if the subject's gaze is fixated on an LED position, a proximity region has to be chosen for each LED on the surface. A simple, but nonetheless efficient heuristic is to define a circular area with fixed radius around each LED position. The length of the radius depends on how accurately the subject's gaze is mapped onto the LED surface.
To achieve optimal classification, the radius should be as small as possible, but still large enough to include less accurate fixations on LEDs. A radius of 20 mm proved to yield the best results. Since fixations tend to become less accurate the closer they are to the boundaries of the surface, the length of the radius was chosen in accordance with this observation.
• Most fixations fall onto LED positions during task execution: Referring to fig. 5.8 of the previous chapter, approximately 4.5 times as many fixations fall onto one of the trial-specific LEDs as compared to those not in any LED proximity area.
• Duration differences between the fixations falling onto LED positions and those not in proximity of LED positions: While it was shown in fig. 5.8 that the median fixation duration was longer for those falling onto an LED position compared to those not in any of the LEDs' proximity areas, both categories had fixation durations ranging from 100 to 500 ms. Because the fixations have to be processed by the algorithm at run-time, i.e. each fixation is processed individually, no reliable distinction can be made from the duration alone.
• Fixation duration and dispersion: When considering the fixation dispersion in addition to the duration, it was shown that Pupil Labs' software classifies slow smooth pursuit eye movements as fixations. The key to identifying these false classifications is to look at the ratio between the dispersion and the duration. A fixation with a short duration and high dispersion is most likely a smooth pursuit eye movement (a minimal sketch combining this check with the proximity test is given just before section 6.5).
• Timing differences between fixations and movement onset: As discussed in chapter 5, when subjects reach a goal LED, they tend to shift their gaze onto the next LED after a short delay of 0.1–0.2 s, while movement onset of the end-effector is usually delayed by 0.3–1 s.
• Pupil diameter during task execution: The subjects' increased mental load at the start of the experiment, likely due to path planning, correlates with a wider pupil diameter. Although this information is in accordance with the theory, little value can be gained by incorporating it into the algorithm, and the computational expense is better invested in more promising information extraction.

6.3 Input parameters

• All LED positions on surface: The locations of all LEDs on the surface are known to the algorithm. It is important to note that the five randomly chosen LEDs specific to each new experiment run are unknown to the algorithm, as is the order in which subjects choose to pass them.
• Current end-effector position: The current end-effector position detected by the top-mounted camera is updated at 100 Hz.
• Fixation position, duration and dispersion: The fixation position on the LED surface and the corresponding duration and dispersion are broadcast to the algorithm.

6.4 Limitations of algorithm

• Since the servo motors of the robot arm, when activated, are locked in place at the current position, the robot has either full control of the motion (when activated) or no control (in passive configuration). This limitation did not allow for a complex shared control algorithm, where the motors could exert varying torque depending on the goal detection confidence.
• It is not possible for the robot to recognize when the user does not agree with the robot's chosen path to a detected goal. This is a result of the lack of force feedback from the servo motors.
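Combining the 20 mm proximity check and the dispersion-duration ratio discussed in section 6.2, the goal-detection step can be sketched in Python as follows. The ratio threshold and all names are illustrative assumptions, not the exact values or identifiers of the implemented system.

# Minimal sketch of the fixation-based goal detection described in section 6.2.
import math
from collections import deque

LED_PROXIMITY_MM = 20.0  # proximity radius around each LED (section 6.2)
MAX_DISP_PER_DUR = 2.0   # dispersion/duration ratio above which a "fixation" is
                         # treated as a smooth pursuit episode (assumed value)

goal_queue = deque()     # goals waiting to be served by the robot
reached_goals = set()    # goals that have already been visited

def process_fixation(fix_rw_pos, duration_s, dispersion_deg, led_positions):
    """Append an LED to the goal queue if the fixation looks like a genuine goal."""
    # Reject likely smooth pursuit: large dispersion relative to duration.
    if dispersion_deg / duration_s > MAX_DISP_PER_DUR:
        return None
    # Accept the first LED whose proximity circle contains the fixation.
    for led in led_positions:
        if math.hypot(fix_rw_pos[0] - led[0], fix_rw_pos[1] - led[1]) <= LED_PROXIMITY_MM:
            if led not in goal_queue and led not in reached_goals:
                goal_queue.append(led)
            return led
    return None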
6.5 Algorithm structure

The structure of the algorithm and its main functionalities are displayed in fig. 6.1. The algorithm consists of three processes running in parallel:
• Receive gaze data and detect subject's goal: This process receives the user's fixations on the surface, which are sent by the Pupil Labs software. The fixation data consists of the normalized position on the surface, as well as the duration and dispersion of the fixation. To exclude potential smooth pursuit eye movements labeled as fixations, the ratio between dispersion and duration is calculated. If that ratio is above a threshold, the fixation could be a smooth pursuit eye movement and thus is not considered a potential goal. If the dispersion-duration ratio is below the threshold, the normalized position on the surface is mapped to the RW coordinates of the robot action plane. If the RW position of the fixation is in the proximity area of an LED, it is added as a goal to the goal queue.
• Detect end-effector position and calculate RW coordinates: To obtain the current end-effector position, we first need to grab the current camera frame of the top-mounted camera and detect the color marker placed on the end-effector. This is done by converting the image into HSV (hue, saturation, value) space, followed by image thresholding to isolate the marker from its surrounding environment. The resulting image is then searched for connected components (i.e. shapes). To segment the marker from other detected shapes, each detected shape is checked for whether its area corresponds to that of the marker. After obtaining the pixel coordinates of the marker in the image, the pixel coordinates are mapped to the RW coordinates of the robot action plane. This is done with an affine-transformation approach using the four AprilTag markers (also detected in the image) on the corners of the surface. This works because the AprilTag markers are fixed in place and their RW positions are known; the affine transformation thus makes use of where the color marker is in relation to the AprilTag markers. Since the color marker on top of the end-effector is offset from the LED surface, a parallax error occurs when moving away from the projected center (onto the LED surface) of the top-mounted camera. To correct the parallax error, an automatic calibration routine is executed at the start of each experiment run.
• Move robot to current goal position: As soon as a goal is added to the goal queue by the goal detection process, the current goal is set to the first element in the queue. Since the robot's servo motors are in passive mode (no voltage applied to the servos) until there is a current goal, the motors first need to be attached at the current angles. Because the position detected by the previous process is given as the x-y coordinates of the end-effector, the current joint angles need to be calculated. This is done with the robot's inverse kinematics (a sketch of this step is given after this list). After the current joint angles are obtained and the servo motors are attached, the robot begins to move towards the current goal. After reaching the current goal, the current goal is removed from the goal queue. If in the meantime a new goal has been detected and added to the queue, the current goal is set to the newly detected goal. If the goal queue is empty instead, the servo motors are detached and the process waits for a new goal to be added to the queue.
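As a concrete illustration of the inverse-kinematics step in the third process, a minimal Python sketch for a planar two-link (shoulder/elbow) arm is given below. The link lengths and the elbow-configuration choice are assumed example values; the real robot's measured geometry and calibration are not reproduced here.

# Minimal inverse-kinematics sketch for a planar two-link (shoulder/elbow) arm.
# Link lengths are assumed example values, not the dimensions of the actual robot.
import math

L1 = 200.0  # shoulder-to-elbow link length [mm] (assumed)
L2 = 200.0  # elbow-to-end-effector link length [mm] (assumed)

def inverse_kinematics(x, y, elbow_up=True):
    """Return (shoulder, elbow) angles in degrees for an end-effector at (x, y) in mm."""
    r2 = x * x + y * y
    cos_elbow = (r2 - L1 * L1 - L2 * L2) / (2.0 * L1 * L2)
    if abs(cos_elbow) > 1.0:
        raise ValueError("target outside the reachable workspace")
    elbow = math.acos(cos_elbow) if elbow_up else -math.acos(cos_elbow)
    shoulder = math.atan2(y, x) - math.atan2(L2 * math.sin(elbow), L1 + L2 * math.cos(elbow))
    return math.degrees(shoulder), math.degrees(elbow)

# The returned angles would then be clamped to the servos' 0-180 degree range and
# written out as PWM position commands to move the arm towards the current goal.
print(inverse_kinematics(283.0, 0.0))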
Figure 6.1: Basic structure of the algorithm explaining the three main processes

Chapter 7 Results

7.1 Evaluation of Algorithm

7.1.1 Recorded data during shared control experiment

To test the algorithm's performance and the effect of the active robot on the subject during co-manipulation, the RW position of the end-effector and the subject's gaze data were recorded. All recorded trials had to be made with the same subject (subject 4), since the self-isolation period due to the COVID-19 pandemic had started. The recorded data is visualized in fig. 7.1. Even though the first fixation lies inside the proximity area of LED 1, the algorithm does not consider it a goal. This is because of its high dispersion, which indicates it might be the end of a smooth pursuit eye movement. Approximately 0.3 s before the subject starts moving from LED 1 towards LED 2, the subject fixates on LED 2 with low dispersion. This results in a quick response from the algorithm, identifying it as a goal LED. The robot activates almost immediately after the subject begins moving towards LED 2. Since the subject's gaze keeps fixating on LED 2 for about 0.7 s after the active robot reaches it, the next goal cannot yet be identified (goal queue empty) and the algorithm detaches the servo motors of the robot to hand the control back to the subject. Although the subject's gaze switches to LED 3 before moving towards it, LED 3 is detected as a goal only halfway through the motion. This is because the subject starts moving the end-effector almost at the same time as the gaze is redirected at LED 3, which requires the corresponding fixation (number 7) to be long enough to be considered a potential goal by the algorithm. Similarly, when the subject moves the end-effector from LED 3 to 4 and from LED 4 to 5, the robot aids the subject only about 0.2 s into the movement towards the goal LED. Surprisingly, the subject starts fixating on the end LED about 0.2 s into the motion of moving towards it and not, as with previous goals, before starting the motion. Since the start and end LEDs are the same in each trial, this could indicate that the subject has acquired muscle memory of the end LED position. Because the fixation occurred late, the algorithm also detected the goal late. In conclusion, if the dispersion of the fixation is low enough, the algorithm manages to identify the goal after the subject fixates on it for at least 0.3 s, which is the minimum delay for the algorithm's goal identification. How far ahead of time the robot can predict the subject's next goal depends mainly on how far ahead of the initiation of the end-effector movement the fixation occurs.
In theory, if the subject fixates on a goal LED 0.3 s before initiating the movement towards it, the robot can perform the whole movement between the LEDs. This was almost the case between LEDs 1 and 2. A problem that came up during one of the trials was that low calibration accuracy of the eye tracker resulted in a fixation lying close to an inactive LED (one that was not in the sequence of the trial). As a consequence, the algorithm considered it a goal LED. However, this error can be avoided completely by making sure the eye tracker is well calibrated before each trial.

7.1.2 Comparison between passive experiment and shared control

When comparing the passive robot and shared control experiments, no apparent difference in gaze data could be detected. However, this could be due to the limited amount of data that could be collected. In addition, the subjects were not informed about the goal queue of the algorithm, which could potentially be used to increase task performance if the subjects trust the algorithm's goal detection enough. While in the passive experiment the users in some cases were not very accurate in landing on the exact LED positions, the active robot had a higher accuracy and could make up for this.

7.2 Subjects' evaluation of algorithm

The subject was asked to evaluate the algorithm on the following criteria:
• How intuitive is the robot's support during task execution; do the subjects feel the robot is supporting them during their task execution, or are they irritated by the support? During the first test runs, the subject was taken by surprise when the servo motors were attached, because it was noticeable and interrupted the user's flow of motion. However, the subject got used to this quickly, after the first two experiment runs.
• Do the subjects think they could increase their task performance with robot assistance? Once the motors are active and move towards the correct goal, the subject could start planning the next goal(s), which could result in less time spent at the actual goal positions. They argued, however, that it usually took them a short while to recognize whether the robot was actually moving to the correct position. This delayed their planning for the next goal slightly.
• What do they think could improve the algorithm? The subject's opinion was that if the robot gradually increased its assistance when recognizing a goal, it would feel a lot more natural, and new subjects who are not used to the abrupt activation of the motors would be less surprised by it. This, however, would require installing new servo motors that allow torque control.

Figure 7.1: Recorded data of subject 4 during shared control, blue end-effector coordinates = passive robot and green = active robot

Chapter 8 Conclusion

8.1 Summary of results

The aim of this thesis was to investigate the potential of eye tracking technology to help recognize the intent of humans when working with a machine under shared control.
Eye gaze has been proven to be a rich source of information when a human is planning a task. Especially when performing a physical task, the human gaze tends to fixate on key objects before physical action is taken. The chosen shared-control setting was the co-manipulation of a two degrees-of-freedom (DOF) SCARA robot. Test subjects were presented with a path planning task, where they were requested to maneuver the robot's end-effector through a sequence of LEDs on the robot action plane. The LED sequence of each run had the same start and end positions, while the remaining 5 were chosen randomly out of 10 in total. In a first stage, the robot arm was unactuated and the focus was to collect eye gaze data from the subjects while they were guiding the passive robot through the sequence. It was found that most gaze fixations of the subjects tend to fall onto the LEDs they intended to connect; in fact, around 4.5 times as many as the fixations not in proximity of these goal LEDs. In addition, when the subjects planned to move from one LED to the next, the subject's gaze tended to fixate on the next LED between one and two seconds before reaching the position with the robot end-effector, depending on how much distance the subject had to cover when moving from the current LED to the next. After reaching the fixated position, the gaze is shifted almost immediately (with a 0.1–0.2 s delay) onto the next LED, while movement onset is delayed by about 0.5 seconds. This information was then used to develop an algorithm to predict which LED a subject is intending to reach. In a second data collection, where the algorithm was tested, it was shown that the algorithm successfully recognized the LEDs the subjects were trying to navigate the robot to and could support them in the movement. How far ahead of time the goals were recognized depended on how soon the subject's gaze shifted from a reached LED to their next planned goal LED. If the subject fixated on a goal LED 0.3 s before initiating the movement towards it, the robot was able to perform the whole movement between the LEDs. In most cases the algorithm initiated the support halfway through the planned motion of the subjects.

8.2 Further work

• By encouraging the subjects to perform the experiment faster, or by defining a time constraint on the experiment completion time, the goal queue implemented in the algorithm could achieve increased execution speed.
• Since the algorithm could only be tested with one subject, due to recommended self-isolation in connection with the COVID-19 pandemic, the collected data was limited. If more subjects could be recorded, a better understanding of the differences between the passive and active robot experiments could be obtained.
• If the servo motors of the robotic arm could be replaced by ones that offer more control possibilities, a more subtle switch between active and passive robot could be achieved. This would result in more comfort for the user, as the support would be less irritating.
• To receive feedback from the user during manipulation of the end-effector, it could prove valuable to monitor the user's reaction forces on the robotic arm. This can be achieved by installing pressure pads, piezo-sensors or torque sensors on the end-effector.

Bibliography

[1] AprilTag markers. https://april.eecs.umich.edu/software/apriltag/. Accessed: 2020-02-21.
[2] Aronson, Reuben M, Santini, Thiago, Kübler, Thomas C, Kasneci, Enkelejda, Srinivasa, Siddhartha, and Admoni, Henny.
“Eye-hand behavior in human-robot shared manipulation”. In: Proceedings of the 2018 ACM/IEEE International Conference on Human-Robot Interaction. 2018, pp. 4–13.
[3] Dragan, Anca D and Srinivasa, Siddhartha S. “A policy-blending formalism for shared control”. In: The International Journal of Robotics Research 32.7 (2013), pp. 790–805.
[4] Huang, De-An, Farahmand, Amir-massoud, Kitani, Kris M, and Bagnell, James Andrew. “Approximate maxent inverse optimal control and its application for mental simulation of human interactions”. In: Twenty-Ninth AAAI Conference on Artificial Intelligence. 2015.
[5] Huang, Rulin, Liang, Huawei, Zhao, Pan, Yu, Biao, and Geng, Xinli. “Intent-estimation- and motion-model-based collision avoidance method for autonomous vehicles in urban environments”. In: Applied Sciences 7.5 (2017), p. 457.
[6] Kahneman, Daniel and Beatty, Jackson. “Pupil diameter and load on memory”. In: Science 154.3756 (1966), pp. 1583–1585.
[7] Kassner, Moritz, Patera, William, and Bulling, Andreas. “Pupil: an open source platform for pervasive eye tracking and mobile gaze-based interaction”. In: Proceedings of the 2014 ACM international joint conference on pervasive and ubiquitous computing: Adjunct publication. 2014, pp. 1151–1160.
[8] Kwon, Minae, Biyik, Erdem, Talati, Aditi, Bhasin, Karan, Losey, Dylan P, and Sadigh, Dorsa. “When Humans Aren’t Optimal: Robots that Collaborate with Risk-Aware Humans”. In: arXiv preprint arXiv:2001.04377 (2020).
[9] Nehaniv, Chrystopher L, Dautenhahn, Kerstin, Kubacki, Jens, Haegele, Martin, Parlitz, Christopher, and Alami, Rachid. “A methodological approach relating the classification of gesture to identification of human intent in the context of human-robot interaction”. In: ROMAN 2005. IEEE International Workshop on Robot and Human Interactive Communication, 2005. IEEE. 2005, pp. 371–377.
[10] Ohn-Bar, Eshed and Trivedi, Mohan Manubhai. “Looking at humans in the age of self-driving and highly automated vehicles”. In: IEEE Transactions on Intelligent Vehicles 1.1 (2016), pp. 90–104.
[11] Pelz, Jeff, Hayhoe, Mary, and Loeber, Russ. “The coordination of eye, head, and hand movements in a natural task”. In: Experimental Brain Research 139.3 (2001), pp. 266–277.
[12] Pupil Labs eye tracking device. https://pupil-labs.com/products/core/. Accessed: 2020-02-02.
[13] Salvucci, Dario D and Goldberg, Joseph H. “Identifying fixations and saccades in eye-tracking protocols”. In: Proceedings of the 2000 symposium on Eye tracking research & applications. 2000, pp. 71–78.
[14] Ziebart, Brian D, Bagnell, J Andrew, and Dey, Anind K. “The principle of maximum causal entropy for estimating interacting processes”. In: IEEE Transactions on Information Theory 59.4 (2013), pp. 1966–1980.

List of Figures

3.1 Experiment setup
5.1 Surface mapping of gaze coordinates to robot RW coordinates
5.2 End-effector and LED positions in RW coordinates, subject 4
5.3 Average positions of fixations on surface, subject 4
5.4 Fixations and end-effector through time, subject 4
5.5 3D plot, fixations on surface through time, subject 4
5.6 Pupil diameter during task execution, subject 4
5.7 All gaze points on surface, fixations on surface and dispersion of the fixations for subject 4. The color spectrum for the top graph changes with experiment time; for the bottom two graphs each color represents a fixation.
5.8 Fixation durations of all trials of all subjects
6.1 Basic structure of algorithm explaining the three main processes
7.1 Recorded data of subject 4 during shared control, blue end-effector coordinates = passive robot and green = active robot

Appendix

A.1 Pseudo code of algorithm

Algorithm 1: Receive gaze data and detect subject's goal
while program running do
    if robot is calibrated then
        receive fixations;
        if fixation is on surface then
            if confidence of gaze data > confidence threshold then
                if fixation's dispersion-duration ratio < threshold then
                    RW position of fixation = map normalized position to RW position on surface;
                    for all LED positions on surface do
                        if RW position of fixation in proximity of LED then
                            potential goal = LED position on surface;
                        end
                    end
                end
                if potential goal not current goal and not in goal queue and not in reached goals then
                    add potential goal to goal queue;
                end
            end
        end
    end
end

Algorithm 2: Detect and calculate current RW position of end-effector
calculate affine transformation matrix with known RW positions of AprilTag markers;
while program running do
    grab current camera frame;
    pixel coordinates of end-effector = detect marker of end-effector in current frame;
    if no marker detected then
        set pixel coordinates of end-effector to previous one;
    end
    RW position of end-effector = affine transform pixel coordinates of end-effector;
    if robot is calibrated then
        RW position of end-effector = parallax error correction of RW position of end-effector;
    end
end

Algorithm 3: Move robot to current goal position
while program running do
    if robot is calibrated then
        if no current goal then
            if goal queue is not empty then
                current goal = get first goal in goal queue;
                current end-effector position = get current end-effector position;
                current joint motor angles = calculate inverse kinematics of current end-effector position;
                attach servo motors at current joint motor angles;
            end
        else
            while current joint motor angles not equal to goal joint angles do
                move motors towards goal joint angles;
            end
            add current goal to reached goals;
            if goal queue is not empty then
                current goal = get first goal in goal queue;
            else
                current goal = no current goal;
                detach servo motors;
            end
        end
    end
end