Figure 1.1: Application of 3D head tracking in a vehicle environment. A successful head tracking application should be robust to significant head motion.

Figure 3.2: Influence of surface normals on matching (taken from [1]). Figure 3.2 shows how using surface normals helps to find better correspondences.

Performance optimization using kd-trees

Figure 3.3: Example of a 2D kd-tree construction. All the points on the left-hand side of A possess smaller x coordinates and all the points on the right-hand side of A have greater x coordinates. The median point E according to the next key, in this case y, is added to the tree as a left child of A. The construction of the tree continues iteratively until all points are assigned to a node in the tree.

and when rewritten in matrix representation as described in (3.21),

Figure 4.1: Differences of Gaussians are computed from a pyramid of Gaussians. Adjacent Gaussian images are subtracted to produce difference-of-Gaussian (DoG) images.

Figure 4.2: Maxima and minima of the DoG images are detected by comparing the pixel of interest with its 26 neighbors in the current and adjacent scales.

Figure 4.3: SIFT descriptor. For each pixel around the keypoint, gradient magnitudes and orientations are computed. These samples are weighted by a Gaussian and accumulated into 16 orientation histograms for the 16 subregions.

The SIFT features for the current and the previous frame are extracted, creating two sets of features F_t and F_{t-1}. Feature matching is then performed between the features in the two sets. For each feature in F_t, the nearest neighbor is found in F_{t-1}, where the nearest neighbor is defined using some distance measure based on feature descriptors or other properties such as scale and orientation. In order to speed up the matching, a nearest-neighbor search on a 128-dimensional kd-tree is applied.
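The kd-tree-accelerated descriptor matching described above can be sketched as follows. This is a minimal illustration using SciPy's cKDTree; the function name, the perturbation in the usage example, and the distance-ratio rejection of ambiguous matches (Lowe's standard test, not stated in the text) are assumptions, not the thesis's exact implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

def match_features(desc_prev, desc_curr, ratio=0.8):
    """Match 128-D SIFT descriptors between two frames via a kd-tree.

    desc_prev: (N, 128) descriptors from the previous frame (F_{t-1}).
    desc_curr: (M, 128) descriptors from the current frame (F_t).
    Returns (curr_index, prev_index) pairs whose nearest neighbor is
    clearly closer than the second-nearest (distance-ratio test).
    """
    tree = cKDTree(desc_prev)                 # 128-dimensional kd-tree
    dists, idxs = tree.query(desc_curr, k=2)  # two nearest neighbors each
    matches = []
    for i, (d, j) in enumerate(zip(dists, idxs)):
        if d[0] < ratio * d[1]:               # reject ambiguous matches
            matches.append((i, int(j[0])))
    return matches
```

Querying the kd-tree costs roughly logarithmic time per feature on average, which is what makes it preferable to exhaustive pairwise comparison for large feature sets.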
Figure 4.4: Matching of keypoints.

Figure 5.1: Block diagram of the tracker system.

Figure 5.1 presents the block diagram of the whole tracker system. All the tracking algorithms, whether reviewed or proposed in this thesis, are differential methods that can only estimate the pose change between consecutive frames. In order to obtain the pose of the head at a given time t relative to the first frame, pose differences between adjacent frames need to be accumulated. However, since each pose change measurement is noisy, the accumulation of these measurements becomes noisier with time, potentially resulting in an unbounded drift [2].

Table 5.1: Computational complexity of the three tracking algorithms.

Table 5.1 presents the time complexity comparison of the SIFT algorithm against the NFC and ICP algorithms. Without special optimizations, the SIFT tracker can update the pose changes at around 2 frames per second on a Celeron 1500 MHz laptop with 512 MB of RAM.

Table 5.2: Performance results of the trackers over synthetic video sequences.

For testing the performance of our tracker we have used a hardware-based tracker, the Polhemus Fastrak [37], as the ground truth. Polhemus is a 3D motion tracker with 6 degrees of freedom. The tracker consists of three parts: a System Electronics Unit (SEU), a transmitter, and a receiver. The SEU contains the hardware necessary to generate and sense the magnetic fields, compute position and orientation, and interface with the host computer via an RS-232 or a USB interface. The transmitter contains electromagnetic coils and is responsible for emitting the magnetic fields; it serves as the system's reference frame for receiver measurements. The receiver unit contains the magnetic coils that detect the magnetic field emitted by the transmitter.
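The accumulation of frame-to-frame pose changes discussed above can be sketched as a composition of homogeneous transforms; the function and variable names here are illustrative, not taken from the thesis.

```python
import numpy as np

def accumulate_poses(pose_deltas):
    """Compose per-frame rigid-motion increments into absolute poses.

    pose_deltas: iterable of 4x4 homogeneous transforms, each mapping
    frame t-1 coordinates to frame t. Returns the list of absolute
    poses relative to the first frame. Because every increment carries
    measurement noise, the error of the composed pose grows with t,
    which is the source of the unbounded drift noted in the text.
    """
    pose = np.eye(4)
    absolute = []
    for delta in pose_deltas:
        pose = delta @ pose      # left-compose the newest increment
        absolute.append(pose.copy())
    return absolute
```

For pure translations the composition simply sums the translation vectors; once rotations enter, the order of composition matters, which is why the increments must be chained as matrices rather than added componentwise.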
The tracker has an accuracy of 0.076 cm RMS with 0.0005 cm resolution for the position, and an accuracy of 0.15° RMS with a 0.025° resolution for the orientation. In a laboratory environment with computers and metal furniture around, we observed lower accuracy than the specifications. However, the ground truth captured was still visually good enough to evaluate the head tracking algorithms considered in the thesis.

The output of the Polhemus tracker must be converted to the coordinate system of the vision-based trackers. As shown in Figure 5.6, we have positioned the camera and the magnetic transmitter as close as possible to minimize the offset between the two coordinate systems, and aligned them so that the two coordinate systems are parallel to each other. The position vector acquired through the Polhemus tracker is translated along the Polhemus orientation vector by 10 cm so that the ground truth position is roughly aligned with the neck's pivot of rotation. At the initialization and model building step, the center of the head is calculated by both the Polhemus tracker and the vision-based trackers, the offset between the two coordinate systems is calculated, and the measured offset is added to the magnetic tracker's position vector in order to obtain the Polhemus tracker output in the vision-based tracker's coordinate system.

Table 5.3: Performance results of the trackers over real video sequences.

Table 5.4: Performance results of the trackers under illumination changes.

Performance Under Spatially Varying Illumination

Table 5.5: Performance results of the trackers over sequences with occlusion.
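The coordinate-system conversion of the Polhemus output described above can be sketched as follows. This is a minimal sketch assuming, as the text states, that the two frames are aligned to be parallel (so no rotation is applied); the function and parameter names are hypothetical.

```python
import numpy as np

def polhemus_to_vision(position, orientation_dir, frame_offset, neck_shift=10.0):
    """Map a Polhemus position vector into the vision tracker's frame.

    position:        3-vector from the magnetic tracker (cm).
    orientation_dir: unit vector along the receiver's orientation.
    frame_offset:    offset between the two coordinate systems,
                     measured at the initialization step (cm).
    neck_shift:      distance (cm) the position is translated along
                     the orientation vector to reach the neck pivot.
    """
    # Translate toward the neck's pivot of rotation, then shift into
    # the vision-based tracker's coordinate system.
    shifted = np.asarray(position, float) + neck_shift * np.asarray(orientation_dir, float)
    return shifted + np.asarray(frame_offset, float)
```

Because the frames are only parallel, not coincident, the additive offset measured at model building is what reconciles the two origins; any residual rotational misalignment would require a full rigid transform instead.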