Evaluation of 3D camera technology for patient positioning within the field of radiation therapy

Master of Science Thesis

ARMIN BÖLLER

Department of Computer Science and Engineering
Division of Computing Science
CHALMERS UNIVERSITY OF TECHNOLOGY
Göteborg, Sweden, 2010

Evaluation of 3D camera technology for patient positioning within the field of radiation therapy

by Armin Böller

A thesis submitted to Chalmers University of Technology in partial fulfillment of the requirements for the degree of Master of Science

Department of Computer Science and Engineering
Chalmers University of Technology
Gothenburg, August 2010

ABSTRACT

This thesis briefly describes the Time-of-Flight (ToF) technology as well as the experiments, algorithms and results for the registration of point clouds acquired by a Time-of-Flight camera. The aim is to explore possible applications of the ToF camera for patient positioning in the field of radiotherapy. A distance function between two point clouds has been defined for the registration of point clouds in different frames. Afterwards the distance between the reference frame and a linearly transformed point cloud is minimized. For this purpose the UOBYQA (unconstrained optimization by quadratic approximation) algorithm has been used. The following results were obtained: a single pixel of the camera itself had a standard deviation of 0.5mm; the translation of a spherical object could be determined with an average error of 0.5mm. The algorithm for the registration of arbitrary surfaces with six degrees of freedom is not completely finished and will be improved in further work. The cameras are sensitive to temperature fluctuations; an attempt was made to compensate for this by using two cameras at the same time. For this purpose the cameras have been calibrated and the coordinate system of one camera transformed into that of the other. Finally a motion surveillance application is presented, which creates a real-time video stream with a visualization of the distance from the current frame to the reference frame.

ACKNOWLEDGMENTS

I would like to thank Professor Peter Damaschke for supervising this master's thesis. I very much wanted to study in Sweden, and Chalmers University of Technology was then a natural choice for me. I would like to thank Stephan Erbel and Dr. Kajetan Berlinger for their supervision at Brainlab. I would like to thank Brainlab for giving me the possibility of writing my master's thesis. I would like to thank my family for their support. And finally I would like to thank all my friends for the good time I had while writing this thesis.

Contents

1 Introduction
  1.1 State of the Art
  1.2 Time-of-Flight Camera
  1.3 Application of the ToF Camera
  1.4 Recent Work
2 Time-of-Flight Technology
  2.1 SwissRanger4000
  2.2 Error Sources
3 Sphere Matching
  3.1 Experiment Setup
    3.1.1 Phantoms
    3.1.2 Tested Data
  3.2 Calculation of the Sphere Center
    3.2.1 Hough Circle Transform
    3.2.2 Local Minima Search
    3.2.3 Sphere Fitting
    3.2.4 Candidate Filtering
    3.2.5 Iterations to the Final Coordinates
  3.3 Result
    3.3.1 Static Observation
    3.3.2 Shifted Sphere
    3.3.3 Temperature Influence
  3.4 Discussion
4 Stereo Camera System
  4.1 Experiment
  4.2 Transformation Matrix
  4.3 Algorithms
    4.3.1 Calibration
    4.3.2 Combining the Positions
  4.4 Result
  4.5 Discussion
5 Surface Matching
  5.1 Experiment
  5.2 Optimization Algorithm
    5.2.1 Point to Point Cloud Distance
    5.2.2 Objective Function
    5.2.3 Approximation Algorithm
    5.2.4 Starting Value
  5.3 Result
  5.4 Discussion
6 Patient Surveillance
7 Conclusion
Bibliography

List of Abbreviations

3D      Three Dimensional
CT      Computed Tomography
DRR     Digitally Reconstructed Radiograph
IGRT    Image Guided Radiation Therapy
LED     Light Emitting Diode
LINAC   Linear Accelerator
PMD     Photonic Mixer Device
UOBYQA  Unconstrained Optimization By Quadratic Approximation
ToF     Time-of-Flight
RGB     Red Green Blue color level

Chapter 1
Introduction

Radiation therapy is one of the major treatment methods for cancer patients, besides surgery and chemotherapy. During the therapy, several sessions of radiation treatment are used to destroy the tumor. At first a computed tomography (CT) scan is done to determine the affected tissue. This is followed by the treatment with the linear accelerator (LINAC). The radiation is emitted from different directions with the focus on the tumor to avoid the destruction of healthy tissue. To ensure that all tumor cells are destroyed and cannot regrow after the therapy, a larger volume than the tumor itself needs to be irradiated. The tumor could have grown since the last CT scan or might not have been fully visible at the time of the CT scan. Further errors like the positioning of the patient relative to the treatment machine extend the required safety margin even more.
In radiation therapy the time in between the treatment sessions gives the healthy tissue some time to regenerate, and the repeated positioning of the patient partially averages out the errors. There are several techniques to minimize the positioning error. One possibility is to fix the patient with a frame; another one is to position and monitor the patient with the help of image processing, a so-called Image Guided Radiation Therapy (IGRT). Together with a precise LINAC and treatment plan, it is sometimes possible to reduce the number of treatment sessions to a single one. In this case a radiologist speaks of radiation surgery instead of radiation therapy. This speeds up the healing process, reduces the emitted radiation and may reduce the side effects of the therapy [1].

1.1 State of the Art

One commonly used IGRT system is ExacTrac® of Brainlab, which is part of the widely used Novalis Tx system (figure 1.1) [2]. Reflective markers are placed onto the patient and tracked by a stereo infrared camera system. After the tumor is placed roughly in the focus of the LINAC, two X-ray acquisitions are done from different sides. These images are matched with digitally reconstructed radiographs (DRR), virtual X-ray images generated from the planning CT data, which includes the position of the tumor. The patient is automatically moved according to the distance vector between the LINAC focus and the tumor. Another X-ray acquisition is done to ensure a high precision with a maximum alignment error of 0.5 mm before the radiation treatment starts.

Figure 1.1 Brainlab ExacTrac® system with both X-ray tubes activated.

For a tumor in a moving body part, as for instance the lung, so-called gating can be used to reduce the amount of irradiated healthy tissue. During the gating the LINAC emits radiation while the tumor is in the focus. When the tumor moves out of focus the treatment pauses and resumes as soon as the tumor is back in focus. To observe the movement, a camera system is needed. In the Brainlab ExacTrac® system 5 to 8 reflective markers are fixed to the patient's skin. The clinical personnel needs to take care that the markers are attached to body parts which move with the tumor and are visible to the camera system. A benefit would be a camera system which works without markers. This would speed up the patient setup and would avoid mistakes from placing the markers in wrong positions. An IGRT system without markers is AlignRT® from VisionRT [3] [4]. It uses two stereo camera systems, which reconstruct the surface of the patient to be able to place the patient in the same position as in the last session. There is no matching of the CT image with the inner body parts through an X-ray scan. So this system is especially suited for a cancer which is close to the surface, for example breast cancer.

1.2 Time-of-Flight Camera

Figure 1.2 Mesa SwissRanger4000 ToF camera mounted on a ladder.

An alternative sensor for an IGRT system is the Time-of-Flight (ToF) camera technology (figure 1.2), which has been developed over the last years. The ToF camera system uses a light pulse to determine a three dimensional point cloud with distance information for each pixel. This can be done in real time at a video frame rate. The ToF camera system uses an array of light emitting diodes (LED) to send out near infrared light pulses. The light gets reflected at the objects in the field of view and returns partly to the ToF sensor. There are two possible ways to determine the distance to the object for each pixel: either by measuring the travel time the light needs from the LED back to the ToF sensor, or with the PMD (Photonic Mixer Device) principle by means of the phase shift of the light. Depending on the distance from the LED to the reflecting object and back to the sensor, the light phase has changed. By measuring the shift, the distance can be determined (figure 1.3). Most of the ToF systems use the phase-shift measurement to avoid the use of a high precision clock. But depending on the modulation frequency there is a maximum unambiguous distance range. E.g. for the MESA SwissRanger4000 with a modulation frequency of 30 MHz the maximum distance is 5 meters. With a modulation frequency of 15 MHz the unambiguous distance is 10 meters [5].

Figure 1.3 PMD measurement principle.
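To make the relation between modulation frequency, phase shift and distance concrete, the following short sketch evaluates the standard continuous-wave ToF relations. It is an illustration only; the constant and function names are chosen for this example and are not taken from any camera vendor's API.

```python
# Illustrative sketch of the continuous-wave phase-shift principle.
# The formulas are the standard ToF relations; names are chosen for this example.
import math

C = 299_792_458.0  # speed of light [m/s]

def unambiguous_range(f_mod_hz: float) -> float:
    """Maximum distance that can be measured without phase wrap-around."""
    return C / (2.0 * f_mod_hz)

def distance_from_phase(phase_shift_rad: float, f_mod_hz: float) -> float:
    """Distance corresponding to a measured phase shift of the modulated light."""
    return (C * phase_shift_rad) / (4.0 * math.pi * f_mod_hz)

print(unambiguous_range(30e6))              # ~5.0 m, matching the stated SwissRanger4000 range at 30 MHz
print(unambiguous_range(15e6))              # ~10.0 m at 15 MHz
print(distance_from_phase(math.pi, 30e6))   # half the unambiguous range, ~2.5 m
```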
1.3 Application of the ToF Camera

The ToF point cloud of a radiation patient can be matched with the surface part of a CT image. Together with the information about the inner body parts from the CT image, the position of the tumor can be estimated. The body movements can be determined in real time and the distance information makes the image segmentation easy and robust. The target of this thesis is to determine how precisely a point cloud obtained by a ToF camera can be registered with a surface model.

1.4 Recent Work

Similar work has recently been done by Schaller [6], who observed the breathing curve of a patient. Studies about the error rate of the raw signal were done in [7]. There are also studies to improve the result of the ToF sensor, either by further calibration [8], by a superresolution method [9] or by combining the ToF sensor with a stereo camera [10]. The ToF sensor in general has a wide field of image analysis applications, e.g. space docking [11], surveillance systems [12], robotics [13] or the automotive industry [14]. Microsoft is developing a motion sensing system for its game console with the help of a CMOS distance sensor from PrimeSense [15], which does not work with pulsed light as the PMD sensor does and is much cheaper.

Chapter 2
Time-of-Flight Technology

More details about the Time-of-Flight technology are provided in this chapter. The principle of the ToF sensor functionality is described in section 1.2. The benefits of a ToF camera are the distance information for each pixel and consequently the easy and robust segmentation of different objects, together with a high frame rate. As part of this thesis, the CamCube 2.0 from PMD Technologies and the SwissRanger4000 from Mesa Imaging have been tested. Both ToF cameras measure the phase shift of the infrared light with a PMD sensor. Due to a non-disclosure agreement with PMD Technologies only results from Mesa's SwissRanger4000 are published in this thesis.

Figure 2.1 A ToF acquired point cloud of a person with a keyboard in the hand, standing in front of a wall.

2.1 SwissRanger4000

The SwissRanger4000 has a resolution of 176 x 144 pixels over a 43.6° x 35.6° field of view, with a maximum of 54 fps and a maximum distance of 10 meters. To illustrate the distance of each pixel in the obtained point cloud, colors can be used. In the frame of figure 2.1 red to orange is in close distance to the camera, yellow in a medium distance and green in a larger distance.

2.2 Error Sources

To receive a valuable 3D point cloud from the camera there are several conditions to fulfill.
For a high precision measurement it is important that a large amount of light is reflected back to the sensor. Therefore, black surfaces lead to unreliable results, since they absorb a large amount of light. A higher integration time allows the sensor to receive a larger amount of light, but on the other hand an oversaturated pixel does not provide any valuable information at all. Dark objects might return enough light to the sensor with a high integration time, but close white objects may then cause oversaturated pixels. Infrared light from the LED which gets reflected over multiple paths (figure 2.2) falsifies the result as well [5]. Due to the complex technology every pixel is affected by a high noise ratio, but averaging over several pixels and/or frames can increase the accuracy significantly. Furthermore, sunlight contains a large amount of light of the same wavelength as used by the ToF technology and yields further inaccuracy, while artificial light does not provoke this problem [16].

Figure 2.2 Multipath leads to false measurements.

Chapter 3
Sphere Matching

Before real patients are observed by a ToF camera, tests are required to determine how exactly the camera works in an ideal environment.

3.1 Experiment Setup

A single acquired pixel itself is noisy, so a better result is achievable when using several pixels. If a geometrical model of the acquired object is fitted into a point cloud, the error can be averaged over all used pixels, which reduces the noise ratio. A sphere is geometrically easy to describe, looks identical from every angle and is therefore a good object to test with. The obtained point cloud is matched with the geometrical sphere model to receive the center of the sphere. This is done several times to determine the accuracy of the camera.

Figure 3.1 Self constructed phantom out of four table tennis balls: (a) phantom, (b) phantom with measurement device on top.

3.1.1 Phantoms

Four extra large table tennis balls with a diameter of 55mm are glued to an acrylic glass plate (figure 3.1a). To measure the distances between the four spheres, the camera system of ExacTrac® was used with the help of a self constructed measuring tool (figure 3.1b) with known geometry. The device was used to measure the distances between the white spheres with only a small uncertainty. For the testing, three of the spheres were used to create an orthogonal coordinate system and the position of the fourth sphere in this coordinate system could be measured. The table tennis balls are quite small and at one meter distance to the camera they return only around 80 pixels. To get a better measurement and to use a phantom which has a similar size as a human head, another phantom has been used, a large Styrofoam sphere (figure 3.2).

Figure 3.2 Styrofoam sphere with a diameter of 250mm.

3.1.2 Tested Data

To be able to test the software before the ToF camera was available, an artificial point cloud has been used (figure 3.3a). The point cloud simulates the view of the camera from directly above the table tennis ball phantom (section 3.1.1) on a plate. A point cloud acquired by the ToF camera looks like figure 3.3b. Pixels whose values were determined by a signal of low amplitude are filtered out, as well as pixels at a distance larger than 1.2 meters, to save computation time.
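As an illustration of this preprocessing step, the following sketch filters a point cloud by amplitude and distance. The array layout, function name and default threshold values are assumptions made for this example; they are not part of the camera driver.

```python
# Minimal sketch of the amplitude/distance filtering described above.
# `points` is assumed to be an (N, 3) array of camera coordinates in mm and
# `amplitude` an (N,) array of the corresponding signal amplitudes.
import numpy as np

def filter_point_cloud(points: np.ndarray, amplitude: np.ndarray,
                       min_amplitude: float = 3000.0,
                       max_distance_mm: float = 1200.0) -> np.ndarray:
    distances = np.linalg.norm(points, axis=1)           # radial distance of each pixel to the camera
    keep = (amplitude >= min_amplitude) & (distances <= max_distance_mm)
    return points[keep]
```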
Figure 3.3 Visualized input data for the sphere matching algorithm: (a) artificial sphere image with added Gaussian noise, (b) ToF point cloud of the phantom.

3.2 Calculation of the Sphere Center

To determine the center points of the spheres, a sequence of calculations is necessary. The first task is to detect which pixels of the point cloud are part of the sphere surface. To do so, a sphere center is estimated with the local minima search (section 3.2.2). Afterwards all pixels which are located approximately at a distance of the given radius r to the estimated sphere center are used for the sphere fitting. To get a first estimation of the sphere center, the point cloud in the ToF coordinate system is first transformed to a two dimensional plane, where (x, y) remain and the distance information is represented by a gray value. This is just an ordinary 2D image, which can be used in a standard image processing algorithm.

3.2.1 Hough Circle Transform

Sonka [17] suggests the Hough transform to detect circular objects in an image. The line detecting Hough transformation (like in Gonzalez [18]) would suggest using a three dimensional voting matrix to detect a circular object: two dimensions for the x and y coordinates of the circle center and one for the circle radius r. An implementation of the circular Hough transformation could work like this. The original image is transformed to a binary edge image. Every edge pixel votes for the circle centers, for all possible x, y and r values for which it could be a circle point (figure 3.4). The entry in the matrix with the most votes determines the variables x, y and r.

Figure 3.4 Hough voting (dashed) with a fixed radius r for a circular segment (solid line).

The implementation of OpenCV works a little differently. To avoid the memory and time consuming three dimensional voting matrix, it votes only for the circle center coordinates (x, y) and determines the radius r for the most probable center (x, y) after the voting has been completed. To vote for a circle center the algorithm first determines the local gradient direction and votes along a straight line orthogonal to the circular arc [19]. The OpenCV implementation of the circle detection does not yield a stable solution for the tested data, since the images of the table tennis balls have too low a resolution to contain enough border pixels to vote along their orthogonal line. A Hough transformation with an ordinary three dimensional voting matrix should work fine. But instead another simple algorithm, which is described in the next section 3.2.2, has been implemented.

3.2.2 Local Minima Search

Since an algorithm to fit a sphere with known radius r is needed anyway for the sphere fitting (section 3.2.3), a simple algorithm has been designed to detect the center of the sphere, which will be used as a start value for the optimization algorithm. If you look at a sphere, then the center of the sphere is just r mm behind the closest spherical surface point (figure 3.5), where r is the radius. The point with the smallest distance z to the camera is the closest point. To allow several spheres in an image, a local neighborhood must be considered. To determine this point, it is just necessary to erode [17] the 2D distance image with a large kernel. Every z value which is equal to the eroded pixel at the same coordinate is a local minimum for a neighborhood of the same size as the kernel.
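A compact way to express this erosion-based search is shown below. The sketch uses SciPy's grey-scale erosion as a stand-in for the image processing library used in the thesis; the array name and the kernel size are assumptions for this illustration.

```python
# Sketch of the local minima search via grey-scale erosion.
# `depth` is assumed to be the 2D distance image (one z value per pixel, in mm);
# pixels equal to the eroded image are local minima within the kernel neighborhood.
import numpy as np
from scipy.ndimage import grey_erosion

def find_local_minima(depth: np.ndarray, kernel_size: int = 21) -> np.ndarray:
    # kernel_size is an example value for the "large kernel" mentioned in the text
    eroded = grey_erosion(depth, size=(kernel_size, kernel_size))
    minima_mask = (depth == eroded)             # local minimum within the kernel neighborhood
    ys, xs = np.nonzero(minima_mask)
    return np.column_stack([xs, ys])            # candidate (x, y) pixel coordinates
```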
In general, due to the noisy measurement, the local minimum is not situated at the circular center (figure 3.6). This is treated by several iterations of the sphere fitting, which is described in the following sections.

3.2.3 Sphere Fitting

Not all pixels obtained by the camera can be used for the geometrical fitting, so only the pixels which are part of the spherical surface must be segmented. To do this, the sphere candidate from the 2D image is transformed back to its original 3D coordinate. All pixels which are closer than the radius plus a tolerance of 40 mm to the candidate center (a, b, c) are used for the fitting algorithm. If the candidate is the center of a sphere, then the surrounding pixels are the observed spherical surface.

Figure 3.5 The closest point of the sphere to the camera is just r mm in front of the spherical center.

Figure 3.6 Gray pixels represent the distance, while red pixels are local minima.

The segmented surface pixels (xi, yi, zi) are now used to determine a new estimation of the sphere center (a, b, c). The distance di is the distance of each pixel (xi, yi, zi) to the spherical center (a, b, c). The objective function f(a, b, c) is defined by the squared sum of the differences between the radius and the pixel distances to the center. To minimize the objective function f(a, b, c) the Levenberg-Marquardt algorithm [20] is used. The stop criterion is a small error with at most 300 iterations. For the chosen examples there was a clear global minimum for f(a, b, c) without a risk of ending at a local minimum. More about optimization and the corresponding objective is described for the more general case of arbitrary surface matching in section 5.2.3.

d_i = \sqrt{(x_i - a)^2 + (y_i - b)^2 + (z_i - c)^2}

\min_{a, b, c} f(a, b, c) = \frac{1}{n} \sum_{i=1}^{n} \left( \sqrt{(x_i - a)^2 + (y_i - b)^2 + (z_i - c)^2} - r \right)^2

3.2.4 Candidate Filtering

If some local minima have the same z coordinate, they might lie close together. So, if a candidate has a distance of less than r to already found candidates, it is skipped. Thus, if we are looking for several spheres in one frame, there is at most one candidate pixel for each sphere. For all remaining candidates, a set of surface pixels is created and fitted with a sphere (see section 3.2.3). The new coordinates (a, b, c) of the candidates are used for a second iteration. The surrounding pixels are extracted again and the objective function minimized. This is necessary since the first candidate coordinates are usually not in the spherical center due to the noise in the acquired data. The candidates with the lowest value of the objective function are selected to be the observed spheres. Nevertheless, objects with only a few pixels tend to have a lower error sum than the sphere. Therefore, all candidates which have fewer than a thresholded amount of surface pixels are removed from the candidate set before the selection.

3.2.5 Iterations to the Final Coordinates

The selected surface pixels, which are used for the spherical fitting, have a strong influence on the calculated sphere position. On the one hand, as many pixels as possible should be used to minimize the error. On the other hand, pixels which are part of the objective function but do not belong to the sphere in reality lead to a bias towards their position. To filter out outliers, only the 95 percent of the pixels which are closest to the estimated spherical surface are used (compare figure 3.8).
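A minimal sketch of one such fitting step is given below. It uses SciPy's least-squares routine (which wraps a Levenberg-Marquardt-type solver) for the objective above and then keeps the 95 percent of surface pixels closest to the fitted sphere. The function and variable names are chosen for this illustration and are not the thesis implementation.

```python
# Sketch of one sphere fitting iteration: least-squares fit of the center
# (a, b, c) for a known radius r, followed by keeping the best 95% of pixels.
import numpy as np
from scipy.optimize import least_squares

def fit_sphere_center(points: np.ndarray, radius: float,
                      center0: np.ndarray) -> np.ndarray:
    """points: (N, 3) surface pixels in mm; center0: initial center estimate."""
    def residuals(center):
        # distance of every pixel to the current center, minus the known radius
        return np.linalg.norm(points - center, axis=1) - radius
    result = least_squares(residuals, center0, method="lm", max_nfev=300)
    return result.x

def keep_best_fraction(points: np.ndarray, center: np.ndarray,
                       radius: float, fraction: float = 0.95) -> np.ndarray:
    """Drop the pixels that deviate most from the estimated spherical surface."""
    deviation = np.abs(np.linalg.norm(points - center, axis=1) - radius)
    keep = np.argsort(deviation)[: int(fraction * len(points))]
    return points[keep]
```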
Several iterations of the Levenberg-Marquardt algorithm are used to determine the sphere position, and every time the surrounding pixels are newly determined, since they might change for a new estimated spherical center. This continues until the position of the sphere moves less than 0.1 mm. Finally, a summary of the complete algorithm is displayed in figure 3.7. Furthermore the algorithm is presented in the following abstract code with a simplified program structure. Note that all distance and coordinate values are in mm.

Figure 3.7 Diagram of the sphere fitting algorithm.

Figure 3.8 Sphere (black center point) is fitted into the selected pixels (green) with the removed percentile (red).

Algorithm 3.2.1: ExtractSphereCenter(p, r, n)
comment: Extracts n centers of spheres with radius r from the point cloud p

  (image2d, mapTo3d) ← convertTo2D(p)
  minImage ← erodeWithLargeKernel(image2d)
  possibleCenter ← Empty
  for each (x, y) ∈ image2d do
      if minImage[x, y] = image2d[x, y]
          if distance3D(mapTo3d(x, y), possibleCenter) ≥ r
              possibleCenter.add(mapTo3d(x, y))
  for each pC ∈ possibleCenter do
      pC.z ← pC.z − r
      surfacePoints ← getPointsCloserThan(pC, r + 50)
      pC ← fitSphere(surfacePoints)
  for each pC ∈ possibleCenter do
      surfacePoints ← getPointsCloserThan(pC, r + 50)
      if surfacePoints.size < minPixel
          then remove(pC)
          else pC ← fitSphere(surfacePoints)
  keep only the n possibleCenter points with the lowest error in the fitSphere optimization
  for each pC ∈ possibleCenter do
      while the sphere center is moving do
          surfacePoints ← getPointsCloserThan(pC, r + 50)
          surfacePoints ← keepBestQuantile(pC, surfacePoints, 0.95)
          pC ← fitSphere(surfacePoints)
  return possibleCenter

For the phantom with four visible spheres (figure 3.1) it is easy to identify which sphere is A, B, C and D. The longest distance between the spheres is AD and the shortest is CD. So it is just necessary to extract the sphere which is closest and farthest away from the neighboring spheres at the same time to get D. A is the remaining sphere with the farthest distance, C the one with the shortest distance, and B must be the sphere that is left over.

3.3 Result

All tests which were done with spherical objects and a single camera are published in this section. Two phantoms were used, the four table tennis balls (figure 3.1) and the Styrofoam sphere (figure 3.2). All measurements were done with a camera that had at least one hour of time to heat up before the measurement, in order to run the camera at a constant temperature.

3.3.1 Static Observation

Tests in which neither the camera nor the phantom moved are presented in this section. A sequence of 100 frames was acquired with the table tennis phantom at a distance of 1.1 meters. The standard deviation of a single sphere was around 0.7mm in the (x, y) coordinates and 0.8mm in the z distance. The distances between the spheres had an absolute error of circa 2mm and a relative error of circa 0.008. All this under the premise that the reference measurement itself has an absolute error of around 0.5mm. Furthermore the standard deviation of sphere D was observed in an orthonormal coordinate system defined by the three other spheres, with B as the origin and the axes

e_1 := \frac{\vec{AB}}{\|\vec{AB}\|}, \qquad e_2 := \frac{w_2}{\|w_2\|} \text{ with } w_2 := \vec{CB} - \langle \vec{CB}, e_1 \rangle \, e_1, \qquad e_3 := e_1 \times e_2.

The standard deviation for the coordinate of D is 2.05mm in e_1, 2.22mm in e_2, and 1.72mm in e_3 direction.
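As an illustration of this construction, the following sketch builds the orthonormal basis (e1, e2, e3) from the measured centers A, B, C by Gram-Schmidt orthogonalization and expresses D in it. The function name and the sign convention (AB and CB read as vectors pointing from the origin B towards A and C) are assumptions for this example.

```python
# Sketch: orthonormal basis from the sphere centers A, B, C (B as origin)
# and the coordinates of sphere D expressed in that basis.
import numpy as np

def sphere_basis_coordinates(A, B, C, D):
    """A, B, C, D: 3D sphere centers as numpy arrays (mm)."""
    e1 = (A - B) / np.linalg.norm(A - B)
    w2 = (C - B) - np.dot(C - B, e1) * e1       # Gram-Schmidt step
    e2 = w2 / np.linalg.norm(w2)
    e3 = np.cross(e1, e2)
    d = D - B                                    # vector from the origin B to D
    return np.array([np.dot(d, e1), np.dot(d, e2), np.dot(d, e3)])
```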
All further measurements were done with the Styrofoam phantom, whose size is closer to the size of a human head, which would be one possible object for the final application. The plotted positions of the calculated center for a static observation are displayed in figure 3.9. The distribution seems reasonable, without clusters. The (x, y, z) coordinates over several frames had a standard deviation of 0.3mm in (x, y) and of 0.8mm in the z direction.

Figure 3.9 Position of the fitted centers of the spheres.

The histogram in figure 3.10 shows the distance of the surface pixels of one frame to the fitted sphere surface. Pixels which had a distance of less than the sphere radius plus 20mm to the estimated center of the sphere were used for the fitting and plotted in the histogram. The pixels are symmetrically distributed, which is no surprise since the sphere was fitted into this point cloud. Nevertheless the histogram looks reasonable and shows the noise of a single pixel. The standard deviation of the distance to the spherical surface is 4.1mm.

Figure 3.10 Distance of the measured surface points to the fitted sphere surface.

3.3.2 Shifted Sphere

After the studies of the random error of the measurement, the focus is now on how exactly a change in position can be determined. The Styrofoam phantom was positioned 1m away from the camera and 80 frames were acquired. The phantom was moved 100mm closer to the camera and another 80 frames were acquired. After a median filtering, both positions were compared. For a median filter of size 80 the absolute error for the 100mm shift was 0.5mm. For a median filter with a size of 4 the error is up to 1mm (figure 3.11), and for a smaller median filter even larger.

Figure 3.11 Measurement error for a 100mm shifted sphere.

3.3.3 Temperature Influence

The temperature of the camera also has an influence on the measured values. The camera and the Styrofoam phantom were set in a fixed position while the position was observed by the camera. For the first experiment (figure 3.12a) the camera was started at room temperature and heated itself up to circa 40°C. It took about one hour until equilibrium was reached. During this time, the calculated z coordinate changed by 10mm before it stabilized. The coordinates x and y were only affected by a minor change. In the second experiment about the influence of heat, the camera started at operating temperature (40°C) and was cooled by a small ventilator for certain time periods. The start and the end of the cooling phases are plotted with red circles. At the start of the cooling phase the measured distance at first increases by two millimeters and afterwards drops significantly, to 7mm less than at the start (figure 3.12b). The jump of two millimeters might be explainable by the temperature sensor in the camera, which is used by the camera to correct the measured values. After the cooling stops, the distance value increases slowly to the original value. The process until the original distance value is reached takes about 50 minutes. The influence on the x and y coordinates is, at less than 2mm, rather small compared to the z distance (figures 3.12c and 3.12d).

3.4 Discussion

The absolute accuracy does not matter that much; relevant for the application is the relative vector from the object to the LINAC focus. The accuracy of a static observation with a standard deviation of 0.3-0.8mm and an error of 0.5mm for a 100mm shift is quite promising and justifies further studies.
The error through multipath should not be a big problem in the therapy room, since the camera will be fixed at one position and the room inventory will change only slightly. Multipath on the patient's skin might be a problem, which could be solved by using two cameras, such that for every relevant body part one camera always has a clear view. In contrast, the temperature has too strong an influence on the camera to be ignored. At first, the camera can be supplied with power one hour before the therapy starts, to heat up to the working temperature. But the camera should also return feasible positions when the temperature in the room is changing. One solution would be to keep the camera body temperature constant by means of control engineering. Another one would be the use of two cameras, which is discussed in the next chapter, particularly in section 4.3.2.

Figure 3.12 Temperature influence for static observations: (a) time to reach operating temperature, (b) temperature effect on the z coordinate, (c) temperature effect on the x coordinate, (d) temperature effect on the y coordinate.

Chapter 4
Stereo Camera System

What is better than a ToF camera? Two cameras? This chapter is about how two cameras can increase the accuracy of the patient positioning.

4.1 Experiment

The same Styrofoam sphere was used as in chapter 3 to measure the accuracy of the camera system as well as for the calibration. The cameras were fixed on an aluminum frame on top of a table. The angle between the cameras was about 50°. Both cameras independently acquired positions of the sphere center with the algorithm of chapter 3. The next sections are about how those positions can be combined. Furthermore, in the result section 4.4 the interference between the cameras is described.

Figure 4.1 Stereo camera system on top of a paper stack recording the Styrofoam sphere.

4.2 Transformation Matrix

To transform a point x1 in the coordinate system of one camera to a point x2 in the coordinate system of another camera, a transformation matrix T can be used, x2 = T · x1 (Bronstein [21]). For the three dimensional space there are six degrees of freedom for the transformation matrix T(x, y, z, α, β, γ): the three translations x, y and z and the three rotations α, β and γ. While x, y and z are the translations along the axes, α, β and γ are the rotations around the axes. A translation cannot be expressed by a 3 by 3 transformation matrix. With extended coordinates x = (x, y, z, 1) the translation can be expressed by a linear transformation T̃. The transformation matrix T can be split into several parts, since matrix multiplication is associative. So we have the following matrix

T(x, y, z, \alpha, \beta, \gamma) = \tilde{T}(x, y, z) \cdot R_x(\alpha) \cdot R_y(\beta) \cdot R_z(\gamma)

with

\tilde{T}(x, y, z) = \begin{pmatrix} 1 & 0 & 0 & x \\ 0 & 1 & 0 & y \\ 0 & 0 & 1 & z \\ 0 & 0 & 0 & 1 \end{pmatrix}
\qquad
R_x(\alpha) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha & 0 \\ 0 & \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}

R_y(\beta) = \begin{pmatrix} \cos\beta & 0 & \sin\beta & 0 \\ 0 & 1 & 0 & 0 \\ -\sin\beta & 0 & \cos\beta & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\qquad
R_z(\gamma) = \begin{pmatrix} \cos\gamma & -\sin\gamma & 0 & 0 \\ \sin\gamma & \cos\gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}

4.3 Algorithms

With the help of the transformation matrix it is possible to transfer the coordinate system of one camera to the other camera system. The next section 4.3.1 describes how to obtain such a transformation matrix. The following section 4.3.2 describes how to combine the two coordinates of the sphere in the best way.
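For reference, a minimal sketch of how such a homogeneous transformation matrix can be assembled from the six parameters is given below. The composition order (translation followed by Rx·Ry·Rz) is the one from section 4.2; the function name is illustrative.

```python
# Sketch: assemble the 4x4 homogeneous transformation T(x, y, z, alpha, beta, gamma)
# as the product T_tilde * Rx * Ry * Rz described in section 4.2. Angles in radians.
import numpy as np

def transformation_matrix(x, y, z, alpha, beta, gamma):
    T_tilde = np.eye(4)
    T_tilde[:3, 3] = [x, y, z]                  # translation part

    Rx = np.eye(4)
    Rx[1:3, 1:3] = [[np.cos(alpha), -np.sin(alpha)],
                    [np.sin(alpha),  np.cos(alpha)]]

    Ry = np.eye(4)
    Ry[0, 0], Ry[0, 2] = np.cos(beta), np.sin(beta)
    Ry[2, 0], Ry[2, 2] = -np.sin(beta), np.cos(beta)

    Rz = np.eye(4)
    Rz[:2, :2] = [[np.cos(gamma), -np.sin(gamma)],
                  [np.sin(gamma),  np.cos(gamma)]]

    return T_tilde @ Rx @ Ry @ Rz

# A point is transformed in extended coordinates: x2 = T @ np.append(x1, 1.0)
```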
4.3.1 Calibration

For the calibration of the cameras, we need a transformation matrix T which allows us to transform the position pB of an object in the right camera system to the position pA the same object has in the left camera system. It follows that pA = T · pB plus a measurement error. With three known points (p1, p2, p3) it is possible to calculate a unique transformation matrix T. The error g of the transformation should be as small as possible:

g = \sum_{i \in \{1,2,3\}} \left\| T \cdot p_B^i - p_A^i \right\|

To get the transformation matrix, three positions of the sphere are acquired and the calibration error g is minimized to obtain T. To minimize the error of the calibration, the median position of 100 frames was used for each calibration point and the spheres were not placed inside a plane.

4.3.2 Combining the Positions

With the help of T it is now possible to transform all coordinates from the left camera to the right camera. This should be used to increase the accuracy of the determination of the position of an object. Camera A has acquired the position pA. Camera B has acquired the position p̃B = T · pB for the same object. Since each measurement contains errors, both positions are not equal. One possibility to halve the error would be to average over both positions, p = ½ (pA + p̃B). We know from section 3.3 that the (x, y) coordinates are more accurate than the z coordinate of pA and p̃B. With this premise, pS can be calculated as the position where the lines (A, pA) and (T · B, T · pB) intersect (figure 4.2), with A the position of the left camera and B the position of the right camera. Since two lines do not intersect in general in three dimensional space, we take pS as the midpoint of the shortest line segment between both lines.

Figure 4.2 Possible combination of a stereo camera system.

4.4 Result

A ToF camera sends out a pulsed light signal and measures the phase of the light when it comes back. Therefore other light sources might disturb the signal, for instance the second camera. A minimal disturbance is achieved if both cameras are set to different modulation frequencies and an equal integration time. Nevertheless, there is a small disturbance measurable if both cameras are operating at the same time. For the experiment setup in figure 4.1 the standard deviation of the x coordinate doubled, while the y and z coordinates remained unchanged (compare section 3.3.1).

Standard deviation [mm]
x      y      z
1.3    0.7    1.0

The transformed position of an object in camera B should be equal to the position of the object in camera A. There are always errors present in measurements, such that the positions are not exactly the same. 100 frames of the Styrofoam phantom were taken and the positions pA and p̃B = T · pB compared. The median distance was 1.5mm and the largest distance 3mm. The two points pA and p̃B combined yield the position pS. Figure 4.3 shows a test about the influence of temperature on pS. Cameras A and B were observing the Styrofoam sphere. After 5 minutes camera A was cooled for 10 minutes. The x coordinate of pS varies in a range of 2mm and depends only on pA and not on pB. The y coordinate varies in a range of 0.6mm and is an average of pA and pB. The z coordinate varies in a range of 3mm and depends only on pB.
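A minimal sketch of the combination step from section 4.3.2 is given below: it computes the midpoint of the shortest segment between the two viewing lines (A, pA) and (T·B, T·pB). The linear algebra is the standard closest-points-between-two-lines construction; the function and variable names are illustrative.

```python
# Sketch: combine the two camera measurements as the midpoint of the shortest
# segment between the viewing lines (A, pA) and (B_t, pB_t), where B_t = T*B and
# pB_t = T*pB are already expressed in the coordinate system of camera A.
import numpy as np

def combine_positions(A, pA, B_t, pB_t):
    """All arguments are 3D points (numpy arrays). Returns pS."""
    u = pA - A                      # direction of the viewing ray of camera A
    v = pB_t - B_t                  # direction of the transformed ray of camera B
    w0 = A - B_t
    a, b, c = u @ u, u @ v, v @ v
    d, e = u @ w0, v @ w0
    denom = a * c - b * b           # zero only if the rays are parallel
    s = (b * e - c * d) / denom     # parameter of the closest point on ray A
    t = (a * e - b * d) / denom     # parameter of the closest point on ray B
    closest_on_A = A + s * u
    closest_on_B = B_t + t * v
    return 0.5 * (closest_on_A + closest_on_B)
```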
4.5 Discussion

During the cooling experiment, the range of the z coordinate for the stereo camera system decreased to 3mm, compared to 7mm with a single camera. In contrast, the x coordinate is not as accurate as with a single camera. This is explainable by the higher standard deviation of the cameras in the x coordinate due to the interference between the cameras. That the interference appears only in the x coordinate is explainable, since the cameras are positioned along their x-axis. To verify this, tests with other camera positions could be done.

The strong dependency of the coordinates of pS in figure 4.3 might be surprising at first glance. Actually it is exactly what was intended. At first, the z coordinate of pA is not used; instead the information of pB is used. Secondly, the z coordinate of pB is not used. The plot displays the coordinates in the camera system of camera A, or in other words pB has been transformed with T. x̃B depends on zB due to the camera positions. That means that the distance information z is not used in this system, which is actually the big benefit compared to a regular RGB camera. Instead of a noisy ToF stereo system, a stereo system with regular RGB cameras could be used. Only the detection of objects without the distance information would be more difficult, but the whole system could achieve a higher accuracy.

One step back, the transformed position T · pB and pA of the same object have a median distance of 1.5mm. This is quite high and might be due to the calibration. Several tests could not improve the results, neither considering more frames for determining the calibration positions (p1, p2, p3) nor the use of more calibration points. If the calibration positions are all inside a plane, the results are a little worse. A smarter calibration might still yield an increased accuracy for calculating the position of an object. Regardless, two cameras still increase the field of view, which is useful to recognize rotations, and allow combining the information of two cameras for objects whose shape causes multipath effects. To calculate the rotation of an object between two images, a registration of arbitrary point clouds is necessary, which is the topic of the next chapter.

Figure 4.3 Temperature influence on the stereo ToF camera system in the coordinate system of camera A: (a) temperature effect on the x coordinate, (b) temperature effect on the y coordinate, (c) temperature effect on the z coordinate.

Chapter 5
Surface Matching

After the matching of spherical objects, it is time to handle the surface matching for arbitrary surfaces.

5.1 Experiment

At first a reference image of a head dummy (figure 5.1a) was acquired at a distance of one meter. Afterwards, the dummy was moved a determined distance or rotated by 5°, 10° or 15°.

Figure 5.1 Tests for the surface registration: (a) head dummy on top of a rotation device, (b) torso dummy made of plastic, (c) registration setup for a human body; the ToF camera is marked with a red arrow.

Another shift was performed with a plastic phantom (figure 5.1b) at a camera distance of 1m and an angle of about 50 degrees from above. For the third setup a test person was lying on a couch, breathing calmly (figure 5.1c). The pixels which belong to the human surface could be easily extracted by using only pixels with high amplitude. The black couch as well as objects at a large distance or with bad reflectivity could thereby be removed from the calculation. The couch was moved electronically by a specific shift, which was compared with the calculated result to determine the accuracy of the registration. The calculation of the shift was done with the following algorithm.
5.2 Optimization Algorithm

To determine the translation and rotation between two surface point clouds f1 and f2, a transformation matrix T(x̄) must be calculated such that the distance between the surface f1 and the transformed surface T(x̄) · f2 is minimized. To do this, it is necessary to define an objective function g(x̄), which describes the distance between the point clouds, and to use an optimizer, which finds the best translation and rotation values x̄ for the transformation matrix T(x̄) to minimize g(x̄). In section 5.2.1 the distance function d(p, f) from a point p to a point cloud f is defined. In section 5.2.2 these distances are summed up to define the distance d(f1, f2) between two point clouds f1 and f2, yielding finally an objective function g(x̄). At last, in section 5.2.3 an algorithm is presented to calculate a transformation matrix T(x̄) such that the objective function, and therefore the distance between f1 and T(x̄) · f2, is minimized.

5.2.1 Point to Point Cloud Distance

There are several ways to define the distance d(p, f) between a point p and a point cloud f. The easiest distance function would be d(p, f) = ‖p − f(i)‖, such that f(i) is the closest point of the point cloud f to the point p (figure 5.2a). The only problem is that the space between the pixels of the point cloud is empty, although the surface of the real object is there. Since the acquired pixels are approximately arranged on a grid, all the pixels p would snap to the reference grid f (figure 5.2b) and no subpixel accuracy could be achieved. To fill the space between the pixels for the modeled distance, a linear approximation can be used. For each pixel p, the three closest pixels f(i), f(j) and f(k) are computed and the distance d(p, f) is defined by the distance of p to the triangular subspace t(i, j, k) determined by f(i), f(j) and f(k) (figure 5.2c). In some cases it is possible that some grid pixels are missing. This might be the case because of a surface with bad reflectivity or because the point is hidden by another object in front; for instance the nose can cover a part of the cheek. In this case the three closest neighbors would not interpolate the correct surface t(i, j, k) and yield an error (figure 5.2d). Nevertheless this simple distance function is used for the algorithm, due to limited time and the possibility to manually avoid these bad cases in test situations. For the final application a triangulation should be computed and the distance between p and the closest triangular subspace should be used. Lee and Schachter [22] describe how to compute a feasible Delaunay triangulation.

Figure 5.2 Distance of a point (red) to a point cloud (blue): (a) point to point distance, (b) the points would snap into the reference grid, (c) point to triangle constructed through the 3 closest points, (d) a bad case for the point to triangle distance.
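A sketch of this point-to-triangle distance is given below. It looks up the three nearest reference pixels with a k-d tree and computes the exact distance to the triangle they span; the closest-point-on-triangle routine follows the standard region-based construction from computational geometry. All names are illustrative, and the degenerate cases discussed above are not handled.

```python
# Sketch of the point to point cloud distance d(p, f): find the three closest
# reference pixels with a k-d tree and return the distance from p to the
# triangle they span. Degenerate triangles (e.g. collinear neighbors) and the
# occlusion cases discussed above are not handled in this illustration.
import numpy as np
from scipy.spatial import cKDTree

def closest_point_on_triangle(p, a, b, c):
    """Closest point to p on triangle (a, b, c); standard region-based test."""
    ab, ac, ap = b - a, c - a, p - a
    d1, d2 = ab @ ap, ac @ ap
    if d1 <= 0 and d2 <= 0:
        return a                                    # vertex region a
    bp = p - b
    d3, d4 = ab @ bp, ac @ bp
    if d3 >= 0 and d4 <= d3:
        return b                                    # vertex region b
    vc = d1 * d4 - d3 * d2
    if vc <= 0 and d1 >= 0 and d3 <= 0:
        return a + (d1 / (d1 - d3)) * ab            # edge region ab
    cp = p - c
    d5, d6 = ab @ cp, ac @ cp
    if d6 >= 0 and d5 <= d6:
        return c                                    # vertex region c
    vb = d5 * d2 - d1 * d6
    if vb <= 0 and d2 >= 0 and d6 <= 0:
        return a + (d2 / (d2 - d6)) * ac            # edge region ac
    va = d3 * d6 - d5 * d4
    if va <= 0 and (d4 - d3) >= 0 and (d5 - d6) >= 0:
        return b + ((d4 - d3) / ((d4 - d3) + (d5 - d6))) * (c - b)  # edge region bc
    denom = va + vb + vc
    v, w = vb / denom, vc / denom
    return a + v * ab + w * ac                      # inside the triangle

def point_to_cloud_distance(p, cloud, tree=None):
    """d(p, f): distance of p to the triangle spanned by its 3 nearest neighbors in f."""
    if tree is None:
        tree = cKDTree(cloud)
    _, idx = tree.query(p, k=3)
    q = closest_point_on_triangle(p, cloud[idx[0]], cloud[idx[1]], cloud[idx[2]])
    return float(np.linalg.norm(p - q))
```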
5.2.2 Objective Function

The distance d(f1, f2) of the first point cloud f1 to the second point cloud f2 can be defined by the quadratic sum of the distances d(pi, f2) of the points of one cloud pi ∈ f1 to the other cloud f2:

d(f_1, f_2) = \frac{1}{|f_1|} \sum_{p_i \in f_1} d^2(p_i, f_2)

Since the objective is to calculate the translation and rotation between f1 and f2, we compute the transformation matrix T(x̄) with x̄ = (x, y, z, α, β, γ) (more about transformation matrices in section 4.2) which minimizes the distance between T(x̄) · f1 and f2:

\min_{\bar{x} \in \mathbb{R}^6} g(\bar{x}) = \frac{1}{|f_1|} \sum_{p_i \in f_1} d^2\left( T(\bar{x}) \cdot p_i, \, f_2 \right)

Only one side of the object reflects information to the camera, so there are no pixels on the backside of the object and only a limited amount of pixels on the sides. To allow an object rotation, the objective function needs to skip pixels which are only in the field of view of one of the frames f1 or f2. The smiley in figure 5.3(a-b) visualizes that it is not useful to match all pixels on the margin. For a correct matching it is more important to match eye to eye and mouth to mouth. To allow this constraint, all pixels of f1 which are farther away than 7mm from f2 are not accumulated into the distance sum; instead they increase the sum by a quadratic penalty factor (|f1|/|f'1|)² (compare figure 5.3c). The objective function becomes

\min_{\bar{x} \in \mathbb{R}^6} g(\bar{x}) = \frac{1}{|f_1|} \left( \frac{|f_1|}{|f'_1|} \right)^2 \sum_{p_i \in f'_1} d^2\left( T(\bar{x}) \cdot p_i, \, f_2 \right)

with pi ∈ f'1 if d(pi, f2) < 7mm, and the distance function d(pi, f2) from section 5.2.1, which is the distance of pi to the triangular area constructed by the three closest points of f2 to pi.

Figure 5.3 (a) Smiley, (b) rotated smiley, (c) f1 to f'1 proportion.

5.2.3 Approximation Algorithm

To keep the running time feasible, the optimizer should minimize g(x̄) with as few evaluations of g(x̄) as possible. One good algorithm for this task is the UOBYQA (unconstrained optimization by quadratic approximation) algorithm by Powell [23]. It is also popular since it is robust for noisy data, which is another benefit for the noisy data from a ToF camera. UOBYQA optimizes the values x̄ ∈ R^n for a given objective function g(x̄). The algorithm starts with a given starting value x̄0 and calculates the corresponding objective value g(x̄0). For each dimension of x̄ the algorithm changes one entry of x̄ by +ρstart and −ρstart. For the changed values of x̄ it calculates the new g(x̄). That makes 2n evaluations of g(x̄) for a dimension of n = dim(x̄). With these values, UOBYQA uses a quadratic approximation to estimate a new vector x̄i and continues to alter the entries by ±ρ. This continues until g(x̄) cannot be decreased any more by a step of size ρ, whereupon the step size ρ is reduced. The search is done until a previously specified ρend is reached, where ρend must be less than ρstart.

Back to the matching of the two surface point clouds f1 and f2. We look for the translation and rotation values x̄ such that the transformation matrix T(x̄) minimizes the distance of f1 and T(x̄) · f2 for the objective function g(x̄). This calculation is implemented with the UOBYQA algorithm, even though an approximation algorithm might end up only in a local minimum; it is not possible to try every single possible value of x̄ to minimize g(x̄). UOBYQA does not distinguish between the units [mm] and [°] of the different variables, but it allows a scaling. The scaling has been chosen for rotation and translation as sα = sβ = sγ = π/180 and sx = sy = sz = 4 [mm], such that 1° has the same weight as 4mm. The parameter ρstart is, according to Powell, supposed to be one tenth of the expected maximal change. A ρstart of 1.0 is suitable for changes of the starting value x̄0 for rotations up to 20° and movements up to 40mm. A sketch of this optimization step is given below.
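The following sketch shows how the scaled six-parameter search could be wired up with a derivative-free optimizer. SciPy does not ship UOBYQA, so Powell's direction-set method is used here purely as a stand-in; the scaling values are the ones stated above, while the function names, the data layout and the helpers transformation_matrix() and point_to_cloud_distance() (from the earlier sketches) are assumptions for this illustration.

```python
# Sketch of the surface registration: minimize g(x_bar) over the six scaled
# parameters with a derivative-free optimizer. scipy's "Powell" method is a
# stand-in for UOBYQA, which is not available in SciPy.
import numpy as np
from scipy.optimize import minimize
from scipy.spatial import cKDTree

SCALE = np.array([4.0, 4.0, 4.0,                          # 4 mm per unit step
                  np.pi / 180, np.pi / 180, np.pi / 180]) # 1 degree per unit step

def register(f1, f2, x0=np.zeros(6), cutoff_mm=7.0):
    """f1, f2: (N, 3) point clouds in mm. Returns (x, y, z, alpha, beta, gamma)."""
    tree = cKDTree(f2)
    homogeneous = np.hstack([f1, np.ones((len(f1), 1))])

    def g(u):
        x, y, z, a, b, c = u * SCALE
        T = transformation_matrix(x, y, z, a, b, c)       # from the section 4.2 sketch
        moved = (homogeneous @ T.T)[:, :3]
        d = np.array([point_to_cloud_distance(p, f2, tree) for p in moved])
        inliers = d[d < cutoff_mm]
        if len(inliers) == 0:
            return np.inf
        penalty = (len(f1) / len(inliers)) ** 2           # quadratic penalty for unmatched pixels
        return penalty * np.sum(inliers ** 2) / len(f1)

    result = minimize(g, x0 / SCALE, method="Powell")
    return result.x * SCALE
```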
5.2.4 Starting Value

A bad starting value x̄0 could lead to ending up in a local minimum at the end of the optimization. The difference (Δx, Δy, Δz) between the centroids of the point clouds f1 and f2 is a good approximation for the translation (x0, y0, z0) ∈ x̄0. The implementation of this starting value accelerated the calculation by 10% but changed the result for T(x̄) only insignificantly.

(\Delta x, \Delta y, \Delta z) = \frac{1}{|f_1|} \sum_{p_i \in f_1} p_i \; - \; \frac{1}{|f_2|} \sum_{p_i \in f_2} p_i \qquad \text{with } p_i \in \mathbb{R}^3

The second moment can be used to get a second vector (Δx̂, Δŷ, Δẑ). Together with the centroid shift there are two vectors to estimate the rotations (α0, β0, γ0).

(\Delta\hat{x}, \Delta\hat{y}, \Delta\hat{z}) = \frac{1}{|f_1|} \sqrt{\sum_{p_i \in f_1} p_i^2} \; - \; \frac{1}{|f_2|} \sqrt{\sum_{p_i \in f_2} p_i^2}

5.3 Result

The results for rotations of the Styrofoam head are presented in table 5.1. The head was rotated around the y-axis; the nose moved from the middle to the right. With the help of a coordinate transformation, the origin was set to the center of the head to be able to compare the rotation values. The translation (x, y, z) is summarized by the Euclidean distance of the coordinates, d = \sqrt{\Delta x^2 + \Delta y^2 + \Delta z^2}.

Table 5.1 Rotations for the Styrofoam phantom in degrees [°] with measured translation [mm]

case   translation [mm]   expected (β)   α      β      γ     difference (β)
1      1.2                 5.0           -0.2    3.8    0.2   -1.2
2      1.3                -5.0           -0.4   -3.7    0.0   -1.3
3      2.5                10.0            0.7    7.5    0.0   -2.5

Tests acquired at 0.5m instead of 1.0m distance between camera and phantom improved the result only slightly. Rotations were less than 0.5° for tests with pure translations. The difference for translation experiments with and without enabled rotations was less than 2mm. Since the rotation did not work that well, it was disabled. That way, rotation could not be the error source in the later measurements and the degrees of freedom were reduced to three.

The results for translation tests of the Styrofoam head and the plastic body are presented in table 5.2. The phantom was moved a certain distance with the help of a scaled paper. The difference between the real and the measured distance is given for the Euclidean distance.

Table 5.2 Translation for a plastic (case 1 & 2) and Styrofoam phantom (case 3-5) in [mm]

case   expected   measured   difference
1      100.0       97.9       -2.1
2       50.0       37.2      -12.8
3        5.0        4.9       -0.1
4        5.0        4.7       -0.3
5       10.0        8.1       -1.9

For the next test, the setup was like in figure 5.1, with a patient lying on the couch while the ToF camera acquires images. The couch was moved electronically by a specific distance and this was compared with the measured result. There are some tests which worked fine, and two tests with an error of more than 10 mm. In table 5.3 below the results of seven tests with the Euclidean distance are displayed.

Table 5.3 Translation for a human body in [mm]

case   expected   measured   difference
1      11.5       14.2         2.7
2      27.6       16.8       -10.8
3      27.0       21.9        -5.1
4       4.0        2.4        -1.6
5      11.0       10.7        -0.3
6      12.0       10.7        -1.3
7      64.3       51.7       -12.6

Figure 5.4 shows the (x, y, z) components of the translation vector for a breathing patient as in the test setup of figure 5.1. The (x, y) translation of the registration is mostly lower than 0.5mm. Due to breathing, the z translation moves up and down in a range of 4mm, which might be boosted by the noise, which is strongest in the z coordinate (section 3.3.1).

Figure 5.4 Registration of a calmly lying and breathing human.

5.4 Discussion

The experiments for the registration of human body parts are partly very good and partly too poor for later applications. The calculated rotation angle, with an error of 20%, is not acceptable. The translation achieved some good results, especially for shifts over a short distance. But there are also some outliers, which must be investigated and excluded in later tests. Due to time limitations this could not be achieved in this thesis.
Still, here are some ideas presented which can improve the result, and some reasoning to justify parts of the algorithm. The plots of g(x̄) suggest that UOBYQA finds the global minimum or at least a local minimum close to the global minimum. Since it is difficult to visualize a six dimensional space, figure 5.5 displays only the variables x and β, where x is the translation along the x-axis and β is the rotation around the y-axis, for a registration of a head rotated by β = 10°. The optimum found is at the coordinate (x, β) = (0, 0). x and β are the most correlated variables, and plots for other pairs of variables show an even clearer valley for g(x̄) than the (x, β) plot.

Figure 5.5 g(x̄) with modified values for x and β: (a) calculated points, (b) contour plot.

The point clouds of a body registration are visible in figure 5.6. f1 is cyan, T(x̄) · f1 is blue and f2 is red, such that g(x̄) is minimized. The registration looks reasonable, since f1 was shifted towards the right arm. Nevertheless, the calculated shift was only 21.9mm instead of 27mm (case 3 in table 5.3).

Figure 5.6 Point clouds for a shift towards the right arm.

There are several promising improvements still undone which could significantly improve the final result, like the starting value for the rotation (α0, β0, γ0) (section 5.2.4), which is not implemented so far. Another is the penalty factor of the objective function, which leaves pixels unmatched if they have a distance of more than 7mm (section 5.2.2). It successfully filters out outliers, but it did not succeed in leaving border pixels unmatched, as would be necessary for a rotation (compare the smiley in figure 5.3(a-b)). Also, a better result could be obtained by removing some of the pixels before they are used for the matching. The pixels at the object border are noisy and maybe not visible at all in the other frame, so a better match is possible without them. In figure 5.7, only the pixels in the red frame could be used to match with another point cloud, instead of all pixels.

Figure 5.7 Removal of margin pixels for a better matching.

Chapter 6
Patient Surveillance

One application which is easy to realize with the help of a ToF camera is the observation of a patient. Just before the treatment starts, a reference image is taken. Afterwards, a live stream visualizes movements of the patient to the clinical personnel. The clinical personnel is then able to stop the treatment in case the patient has moved out of the LINAC focus. To do so, the camera is mounted above the patient. The distance values of each pixel are subtracted from the reference image and displayed in green and red (figure 6.1). Green in this case means that the pixel is closer to the camera than in the reference image; pixels which are farther away are red. To segment the patient, only pixels are taken which have an amplitude value higher than a threshold of 3000 either in the reference or in the live image. The background is black and therefore has a low amplitude. Furthermore this filters out a big part of the noisy pixels. Additionally, an improved version of the surface registration (chapter 5) can display the movements as numbers. A short sketch of this color coding is given below, after the figure caption.

Figure 6.1 Pictures from an observation video: (a) amplitude image, (b) ground noise due to surfaces with bad reflection, (c) breathing, (d) 10mm body shift towards the head, (e) 20mm body shift towards the head, (f) additionally lifted arm.
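The sketch below illustrates the visualization: the live depth frame is compared with the reference frame, pixels are kept only if either frame exceeds the amplitude threshold, and closer/farther pixels are colored green/red. The array names, the RGB output format and the minimum-change threshold are assumptions made for this illustration, not the thesis implementation.

```python
# Sketch of the surveillance visualization: green where the live frame is closer
# to the camera than the reference, red where it is farther away. Only pixels
# whose amplitude exceeds the threshold in the reference or live image are shown.
import numpy as np

def difference_image(ref_depth, live_depth, ref_amp, live_amp,
                     amp_threshold=3000.0, min_change_mm=2.0):
    """All inputs are 2D arrays of equal shape; returns an (H, W, 3) RGB image."""
    valid = (ref_amp > amp_threshold) | (live_amp > amp_threshold)
    diff = ref_depth - live_depth               # positive: live pixel is closer to the camera
    rgb = np.zeros(ref_depth.shape + (3,), dtype=np.uint8)
    closer = valid & (diff > min_change_mm)
    farther = valid & (diff < -min_change_mm)
    rgb[closer, 1] = 255                        # green channel
    rgb[farther, 0] = 255                       # red channel
    return rgb
```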
Chapter 7
Conclusion

The ToF camera is able to register translations of large spherical objects with a median error of less than 1mm. Together with a frame rate which enables real time applications at a resolution of several thousand pixels, the camera is very promising for new patient monitoring systems in the field of radiotherapy. The visualization of the patient movements worked fine. Such an application, which makes an emergency stop as soon as a patient makes an unexpected movement, is realizable. A gating system is also possible, a system which pauses the treatment as soon as the breathing moves the tumor out of focus. Only the influence of temperature fluctuations must be controlled, for instance by a camera case with constant temperature.

For the positioning of the patient with respect to steady inner body parts, the registration of the camera is not yet precise enough. An X-ray scan matched with a CT image is more precise; it has an error of less than 1mm. On the other hand, sub-millimeter precision is of little use for moving body parts. So a ToF camera with a high frame rate is more suitable for treatments with a tumor in a moving body part. It is not yet possible to state the accuracy of the registration of arbitrary surfaces in numbers. There are some good results, but also too many outliers. At Brainlab, further research will be done in order to improve the registration algorithm. Furthermore, the developers of the cameras are creating new models, which are getting cheaper and more precise, which will improve the results. In general, the ToF camera is a very promising technology which could be used in a large variety of applications and opens new application fields for image analysis.

Bibliography

[1] W. Dörr, Klinische Strahlenbiologie - kurz und bündig (Urban & Fischer bei Elsevier, 2006).
[2] Brainlab, "Radiosurgery," http://oncology.brainlab.com/ (Accessed January 25, 2010).
[3] VisionRT, http://www.visionrt.com/ (Accessed January 25, 2010).
[4] M. Krengli, S. Gaiano, E. Mones, A. Ballarè, D. Beldì, C. Bolchini, and G. Loi, "Reproducibility of patient setup by surface image registration system in conformal radiotherapy of prostate cancer," Radiation Oncology 4:9 (2009).
[5] Mesa, "Knowledgebase," http://www.mesa-imaging.ch/ (Accessed January 25, 2010).
[6] C. Schaller, A. Adelt, J. Penne, and J. Hornegger, "Time-of-flight sensor for patient positioning," In Proceedings of SPIE, 7258 (2009).
[7] M. Frank, M. Plaue, H. Rapp, U. Köthe, B. Jähne, and F. A. Hamprecht, "Theoretical and Experimental Error Analysis of Continuous-Wave Time-Of-Flight Range Cameras," Optical Engineering 48 (2009).
[8] T. Kahlmann, F. Remondino, and H. Ingensand, "Calibration for increased accuracy of the range imaging camera SwissRanger," In International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, ISPRS Commission V Symposium, volume XXXVI, pp. 136–141 (2006).
[9] S. Schuon, C. Theobalt, J. Davis, and S. Thrun, "LidarBoost: Depth superresolution for ToF 3D shape scanning," pp. 343–350 (IEEE, 2009).
[10] C. Beder, B. Bartczak, and R. Koch, "A Combined Approach for Estimating Patchlets from PMD Depth Images and Stereo Intensity Images," In DAGM-Symposium, pp. 11–20 (2007).
[11] P. Khongsab, "Signal Processing and Performance Evaluation of a PMD Camera for Space Docking," Master thesis (Luleå tekniska universitet, 2009).
[12] F. Tombari, L. Stefano, S. Mattoccia, and A. Zanetti, "Graffiti Detection Using a Time-Of-Flight Camera," In ACIVS '08: Proceedings of the 10th International Conference on Advanced Concepts for Intelligent Vision Systems, pp. 645–654 (Springer-Verlag, Berlin, Heidelberg, 2008).
[13] S. May, B. Werner, H. Surmann, and K. Pervölz, "3D time-of-flight cameras for mobile robotics," In IROS, pp. 790–795 (IEEE, 2006).
[14] S. Hsu, S. Acharya, A. Rafii, and R. New, "Performance of a Time-of-Flight Range Camera for Intelligent Vehicle Safety Applications," 2006.
[15] PrimeSense, "PrimeSense Supplies 3-D-Sensing Technology to Project Natal for Xbox 360," http://www.primesense.com/ (March 31, 2010; Accessed April 27, 2010).
[16] H. Rapp, "Experimental and Theoretical Investigation of Correlating TOF-Camera Systems," Diploma Thesis (University of Heidelberg, 2007).
[17] M. Sonka, V. Hlavac, and R. Boyle, Image Processing, Analysis, and Machine Vision, 2nd ed. (Chapman & Hall, 1998).
[18] R. C. Gonzalez and R. E. Woods, Digital Image Processing (Prentice Hall, Upper Saddle River, N.J., 2008).
[19] G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library (O'Reilly, Cambridge, MA, 2008).
[20] K. Levenberg, "A method for the solution of certain nonlinear problems in least squares," The Quarterly of Applied Mathematics 2, pp. 164–168 (1944).
[21] I. Bronstein, K. Semendjajew, G. Musiol, and H. Mühlig, Taschenbuch der Mathematik, 5th ed. (Verlag Harry Deutsch, Thun und Frankfurt am Main, 2001).
[22] D. T. Lee and B. J. Schachter, "Two algorithms for constructing a Delaunay triangulation," International Journal of Parallel Programming 9, 219–242 (1980).
[23] M. Powell, "UOBYQA: unconstrained optimization by quadratic approximation," Mathematical Programming 92, 555–582 (2000).