Papers by Robby Firnaldo Tan
2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05)
Robby T. Tan and Katsushi Ikeuchi, Institute of Industrial Science, The University of Tokyo ({robby,ki}@cvl.iis.u-tokyo.ac.jp). The method principally utilizes the coefficients of the reflectance basis functions of the input image and its specular-free image.
18th International Conference on Pattern Recognition (ICPR'06), 2006
Specular Free Spectral Imaging Using Orthogonal Subspace Projection. Zhouyu Fu, Robby T. Tan, and Terry Caelli, NICTA, RSISE Bldg. The illumination spectra were measured by taking an image of the white calibration target and averaging the spectral responses.
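
As a rough illustration of the calibration step described above, here is a minimal sketch (hypothetical function and variable names, not the authors' code) that averages the spectral responses over a white-target region:

```python
import numpy as np

def illumination_spectrum(cube, mask):
    """Estimate the illumination spectrum from a hyperspectral cube.

    cube : (H, W, B) array of spectral responses.
    mask : (H, W) boolean array marking the white calibration target.
    Returns a (B,) array: the mean spectral response over the target,
    which for a white (flat-reflectance) target approximates the
    illumination spectral power distribution.
    """
    return cube[mask].mean(axis=0)
```
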
The Journal of The Institute of Image Information and Television Engineers, 2010

Lecture Notes in Computer Science, 2015
In rainy scenes, visibility can be degraded by raindrops that have adhered to the windscreen or camera lens. To resolve this degradation, we propose a method that automatically detects and removes adherent raindrops. The idea is to use long-range trajectories to discover the motion and appearance features of raindrops locally along the trajectories. These motion and appearance features are obtained through our analysis of trajectory behavior when encountering raindrops. The features are then cast into a labeling problem whose cost function can be optimized efficiently. Having detected the raindrops, the removal is achieved by utilizing the indicated patches, enabling motion consistency to be preserved. Our trajectory-based video completion method not only removes the raindrops but also completes the motion field, which benefits motion estimation algorithms in rainy scenes. Experimental results on real videos show the effectiveness of the proposed method.
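To make the labeling problem concrete, here is a minimal sketch of a binary trajectory-labeling cost with a data term and a Potts smoothness term; the feature-derived unary costs and the neighbor pairs are assumed given, and none of the names come from the paper:

```python
import numpy as np

def labeling_cost(labels, unary, pairs, lam=1.0):
    """Cost of a binary raindrop/non-raindrop labeling over trajectories.

    labels : (N,) array of {0, 1}, 1 = raindrop.
    unary  : (N, 2) array; unary[i, l] is the data cost of assigning
             label l to trajectory i (derived from its motion and
             appearance features in the paper; assumed given here).
    pairs  : list of (i, j) neighboring-trajectory index pairs.
    lam    : weight of the Potts smoothness term.
    """
    data = unary[np.arange(len(labels)), labels].sum()
    smooth = sum(labels[i] != labels[j] for i, j in pairs)
    return data + lam * smooth
```

In practice such a cost would be minimized with graph cuts or a similar solver rather than evaluated exhaustively.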

Computer Vision – ECCV 2014, 2014
Contrast enhancement is used by many algorithms in computer vision. It is applied either explicitly, as in histogram equalization and tone-curve manipulation, or implicitly via methods that deal with degradation from physical phenomena such as haze, fog, or underwater imaging. While contrast enhancement boosts the image appearance, it can unintentionally boost unsightly image artifacts, especially artifacts from JPEG compression. Most JPEG implementations optimize the compression in a scene-dependent manner, such that low-contrast images exhibit few perceivable artifacts even at relatively high compression factors. After contrast enhancement, however, these artifacts become significantly visible. Although numerous approaches target JPEG artifact reduction, they are generic in nature and are applied either as pre- or post-processing steps. When applied as pre-processing, existing methods tend to over-smooth the image; when applied as post-processing, they are often ineffective at removing the boosted artifacts. To resolve this problem, we propose a framework that suppresses compression artifacts as an integral part of the contrast enhancement procedure. We show that this approach can produce compelling results superior to those obtained by existing JPEG artifact removal methods for several types of contrast enhancement problems.
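As a small illustration of the failure mode the paper targets (not its solution), a global tone curve raises local contrast and, with it, any residual JPEG blocking; the function below is a generic sketch with names of our own choosing:

```python
import numpy as np

def tone_curve(img, gamma=0.5):
    """Simple global contrast enhancement (gamma curve) on a float
    image in [0, 1]. Any residual 8x8 JPEG blocking in `img` is
    amplified wherever the curve's slope exceeds 1, which is the
    effect the paper sets out to suppress."""
    return np.clip(img, 0.0, 1.0) ** gamma
```
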

2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), 2011
The surface of most natural objects is composed of two or more layers whose optical properties jointly determine the surface's overall reflectance. Light transmission through these layers can be approximated by the Lambert-Beer (LB) model, which provides a good trade-off between accuracy and simplicity in handling layer decomposition. Recently, a layer decomposition method based on the LB model was proposed. Assuming surfaces with two layers, it estimates the reflectance of the top and bottom layers, as well as the opacity of the top layer. The method introduces the "spider model", named after its color distribution in RGB space, which resembles the shape of a spider. In this paper, we set out to verify the accuracy of the spider model and of the optical model on which it is based (i.e., the LB-based model). We verify the LB-based model by comparing it to the Kubelka-Munk (KM) model, which has previously been shown to be reliably accurate. The benefits of layer decomposition are easy to see. First, many computer vision algorithms assume a single layer and tend to fail when encountering multi-layered surfaces. Second, knowing the optical properties of each layer can provide further knowledge of the target objects.
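For reference, a common two-layer form of the Lambert-Beer approximation is the following (our notation, which may differ from the paper's):

```latex
% Two-layer reflection under the Lambert-Beer approximation (notation ours):
% I        : observed reflectance
% R_t, R_b : reflectances of the top and bottom layers
% \alpha   : opacity of the top layer, \alpha = 1 - e^{-\sigma d}
%            (\sigma: absorption coefficient, d: top-layer thickness)
I = \alpha\, R_t + (1 - \alpha)\, R_b
```
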

2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010
Many object surfaces are composed of layers of different physical substances, known as layered surfaces. These surfaces, such as patinas, watercolors, and wall paintings, have more complex optical properties than diffuse surfaces. Although the characteristics of layered surfaces, like layer opacity, mixtures of colors, and color gradations, are significant, they are usually ignored in the analysis of many methods in computer vision, causing inaccurate or even erroneous results. Therefore, the main goals of this paper are twofold: to solve problems of layered surfaces by focusing mainly on surfaces with two layers (i.e., top and bottom layers), and to introduce a decomposition method based on a novel representation of a nonlinear correlation in the color space that we call the "spider" model. When we plot the mixtures of colors of one bottom layer and n different top layers in the RGB color space, we obtain n different curves intersecting at one point, resembling the shape of a spider. Hence, given a single input image containing one bottom layer and at least one top layer, we can fit their color distributions using the spider model and then decompose those layered surfaces. This last step is equivalent to extracting the approximate optical properties of the two layers: the top layer's opacity, and the top and bottom layers' reflections. Experiments with real images, including photographs of ancient wall paintings, show the effectiveness of our method.
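A minimal sketch of the mixture curves behind the spider shape, assuming simple linear opacity mixing for illustration (the paper models a nonlinear correlation, so this is not its exact formulation; all names are ours):

```python
import numpy as np

def mixture_curve(bottom, top, n=50):
    """Colors of a two-layer surface as the top layer's opacity
    sweeps from 0 to 1.

    bottom, top : (3,) RGB reflectances of the bottom and top layers.
    Returns an (n, 3) array of mixed colors; with several different
    `top` colors the curves all meet at `bottom`, forming the body
    of the spider in RGB space.
    """
    alpha = np.linspace(0.0, 1.0, n)[:, None]
    return (1.0 - alpha) * bottom + alpha * top
```
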

Lecture Notes in Computer Science, 2015
When foreground objects have variegated appearance and/or manifest articulated motion, not to mention momentary occlusions by other unintended objects, a segmentation method based on a single video and a bottom-up approach is often insufficient for their extraction. In this paper, we present a video co-segmentation method to address these challenges. Departing from the objectness attributes and motion coherence used by traditional figure-ground separation methods, we place central importance on the role of "common fate": the different parts of the foreground should persist together in all the videos. To realize this idea, we first extract seed superpixels using a motion-based figure/ground segmentation method. We then formulate a set of linkage constraints between these superpixels based on whether they exhibit the characteristics of common fate. An iterative constrained clustering algorithm is then proposed to trim away incorrect and accidental linkage relationships. The clustering algorithm also performs automatic model selection to estimate the number of individual objects in the foreground (e.g., male and female birds in courtship), while at the same time binding the parts of a variegated object together into a unified whole. Finally, a multi-class labeling Markov random field is used to obtain a refined segmentation result. Experimental results on two datasets show that our method successfully addresses the challenges of extracting complex foregrounds and outperforms state-of-the-art video segmentation and co-segmentation methods.
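
A minimal sketch of one assignment step of such an iterative constrained clustering, with hypothetical must-link ("common fate") and cannot-link pairs; this is a generic skeleton, not the paper's algorithm:

```python
import numpy as np

def constrained_assignment_step(feats, centers, must, cannot):
    """One assignment step of a constrained clustering pass.

    feats   : (N, D) superpixel features.
    centers : (K, D) current cluster centers.
    must    : list of (i, j) pairs that should share a label.
    cannot  : list of (i, j) pairs that should not share a label.
    Returns labels after propagating must-links, plus the violated
    cannot-links, which a full algorithm would trim or penalize in
    the next iteration.
    """
    d = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    labels = d.argmin(axis=1)
    for i, j in must:                      # propagate common-fate links
        labels[j] = labels[i]
    violated = [(i, j) for i, j in cannot if labels[i] == labels[j]]
    return labels, violated
```
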
2011 18th IEEE International Conference on Image Processing, 2011
Multiple-people tracking from multiple cameras can suffer from various problems, particularly inter-person occlusions. This paper attempts to solve these problems by analyzing view visibility and ranking the reliability of the cues from the 2D views. It combines the visibility with smoothness constraints in a probabilistic framework, which offers a more flexible and robust estimation. Moreover, it introduces 3D reference lines to estimate the 2D position of every individual in the input images; these lines yield more accurate and robust 2D positions. Experimental results and quantitative evaluations on a standard dataset show the effectiveness of the method.
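A toy sketch of how visibility-weighted view cues and a smoothness prior might be combined into one score; the exact form and all names here are our assumptions, not the paper's:

```python
import numpy as np

def position_score(likelihoods, visibility, prev_pos, candidates, sigma=1.0):
    """Score candidate 2D positions for one person.

    likelihoods : (V, C) appearance likelihoods of C candidates in V views.
    visibility  : (V,) reliability weights of the views (occlusion-aware).
    prev_pos    : (2,) previous position, used by the smoothness term.
    candidates  : (C, 2) candidate positions.
    Returns a (C,) log-score; its argmax gives the estimated position.
    """
    view_term = (visibility[:, None] * np.log(likelihoods + 1e-9)).sum(0)
    smooth = -((candidates - prev_pos) ** 2).sum(1) / (2 * sigma ** 2)
    return view_term + smooth
```
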

Lecture Notes in Computer Science, 2014
We propose a method for detecting dyadic interactions: fine-grained, coordinated interactions between two people. Our model is capable of recognizing interactions such as a hand shake or a high five, and of locating them in time and space. At the core of our method is a pictorial structures model that additionally takes into account the fine-grained movements around the joints of interest during the interaction. Compared to a bag-of-words approach, our method not only detects the specific type of action more accurately, but also provides the specific location of the interaction. The model is trained with both video data and body joint estimates obtained from Kinect; during testing, only video data is required. To demonstrate the efficacy of our approach, we introduce the ShakeFive dataset, which consists of videos and Kinect data of hand shake and high five interactions. On this dataset, we obtain a mean average precision of 49.56%, outperforming a bag-of-words approach by 23.32%. We further demonstrate that the model can be learned from just a few interactions.
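For context, a generic pictorial structures energy over part locations is shown below; the paper augments such a model with fine-grained motion around the joints, which we assume enters through the per-part score (notation ours):

```latex
% Generic pictorial structures energy over part locations L = (l_1, ..., l_n):
% m_i(l_i)         : appearance (and, here assumed, motion) score of part i at l_i
% d_{ij}(l_i, l_j) : deformation cost between connected parts (i, j) \in E
E(L) = \sum_{i=1}^{n} m_i(l_i) + \sum_{(i,j) \in E} d_{ij}(l_i, l_j)
```
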

2010 20th International Conference on Pattern Recognition, 2010
Most of the development of pose recognition has focused on a single person. However, many applications of computer vision essentially require the estimation of multiple people. Hence, in this paper, we address the problem of estimating the poses of multiple persons using volumes estimated from multiple cameras. One of the main issues that makes the multiple-person, multiple-camera setting problematic is the presence of 'ghost' volumes. This problem arises when the projections of the silhouettes of two different persons onto the 3D world overlap in a place where in fact there is no person. To solve this problem, we first introduce a novel principal-axis-based framework to estimate the 3D ground-plane positions of multiple people, and then use the position cues to label the multi-person volumes (voxels), while considering voxel connectivity. Having labeled the voxels, we fit the volume of each person with a body model and determine the pose of the person based on the model. Results on real videos demonstrate the accuracy and efficiency of our approach.
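A simplified sketch of the position-cue voxel labeling, using only nearest ground-plane positions and a hypothetical distance threshold (the paper additionally considers voxel connectivity; all names and units are our assumptions):

```python
import numpy as np

def label_voxels(voxels, positions, max_dist=0.6):
    """Assign each occupied voxel to a person by its ground-plane position.

    voxels    : (N, 3) occupied voxel centers in world coordinates.
    positions : (P, 2) estimated ground-plane (x, y) positions of people.
    max_dist  : radius (metres, assumed) beyond which a voxel is
                treated as a 'ghost' volume and labeled -1.
    """
    d = np.linalg.norm(voxels[:, None, :2] - positions[None], axis=2)
    labels = d.argmin(axis=1)
    labels[d.min(axis=1) > max_dist] = -1   # prune ghost volumes
    return labels
```
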

2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008
Bad weather, such as fog and haze, can significantly degrade the visibility of a scene. Optically, this is due to the substantial presence of particles in the atmosphere that absorb and scatter light. In computer vision, the absorption and scattering processes are commonly modeled by a linear combination of the direct attenuation and the airlight. Based on this model, a few methods have been proposed, and most of them require multiple input images of a scene taken either with different degrees of polarization or under different atmospheric conditions. This requirement is the main drawback of these methods, since in many situations it is difficult to fulfill. To resolve the problem, we introduce an automated method that requires only a single input image. The method is based on two observations: first, images with enhanced visibility (or clear-day images) have more contrast than images plagued by bad weather; second, airlight, whose variation mainly depends on the distance of objects from the viewer, tends to be smooth. Relying on these two observations, we develop a cost function in the framework of Markov random fields, which can be efficiently optimized by various techniques, such as graph cuts or belief propagation. The method does not require geometric information about the input image and is applicable to both color and gray images.
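The linear model referred to above is the standard haze imaging equation:

```latex
% I(x): observed intensity,  J(x): scene radiance,
% t(x) = e^{-\beta d(x)}: transmission (direct attenuation factor),
% A: atmospheric light; A(1 - t(x)) is the airlight term.
I(x) = J(x)\, t(x) + A\,\bigl(1 - t(x)\bigr)
```
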

IPSJ Digital Courier, 2005
In the real world, the color appearance of objects is generally not consistent. It depends principally on two factors: the illumination spectral power distribution (illumination color) and intrinsic surface properties. Consequently, to obtain consistent color descriptors of objects, we have to deal with both factors. The former is commonly referred to as color constancy: the capability to estimate and discount the illumination color. The latter is the problem of recovering body color from highlights; this recovery is crucial because highlights emitted from opaque inhomogeneous objects can cause surface colors to be inconsistent under changes of viewing and illuminant directions. We base our color constancy methods on analyzing highlights, or specularities, emitted from opaque inhomogeneous objects. We derive a linear correlation between image chromaticity and illumination chromaticity. This linear correlation is clearly described in inverse-intensity chromaticity space, a novel two-dimensional space we introduce. Through this space, we can effectively estimate the illumination chromaticity (illumination color) from both uniformly colored and highly textured surfaces in a single integrated framework, a significant advance over existing methods. For separating reflection components, we propose an approach based on an iterative framework and a specular-free image. The specular-free image is an image that is free from specularities yet has a different body color from the input image. In general, the approach relies principally on image intensity and color. All the color constancy and reflection-component separation methods proposed in this paper are analyzed based on the physical phenomena of the real world, making the estimation more accurate and giving the analysis a firm physical grounding. In addition, all methods require only a single input image, which is not only practical but also challenging in terms of complexity.
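In our notation, the linear correlation in inverse-intensity chromaticity space takes the following form, with the intercept being the illumination chromaticity that is sought:

```latex
% \sigma_c : image chromaticity of channel c
% \Gamma_c : illumination chromaticity of channel c (the sought quantity)
% m        : pixel-dependent coefficient
% \sum I   : total intensity, I_r + I_g + I_b
\sigma_c = m \cdot \frac{1}{\sum I} + \Gamma_c
```

Specular pixels thus fall on lines whose common intercept, as the inverse intensity approaches zero, reveals the illumination chromaticity.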

2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06)
In this paper, we propose a novel approach to object material identification in spectral imaging that combines the use of absorption features with statistical machine learning. We start from the significance of spectral absorption features for material identification and link their use with statistical learning by casting the identification problem into a classification setting that can be tackled using support vector machines. We first propose a novel method for the robust detection of absorption bands in the spectra. With these bands at hand, we show how the absorptions most relevant to the classification task at hand may be selected via discriminant learning. We then train a support vector machine for classification, making use of an absorption-feature representation scheme that is robust to varying photometric conditions. We perform experiments on real-world data and compare the results yielded by our approach with those recovered using an alternative method. We also illustrate the invariance of the absorption features recovered by our method to different photometric effects.
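A minimal sketch of the classification stage using a support vector machine (scikit-learn here; the paper's feature representation, kernel, and parameters are not specified in this abstract, so those below are assumptions):

```python
import numpy as np
from sklearn.svm import SVC

def train_material_classifier(features, labels):
    """Train an SVM on absorption-feature vectors.

    features : (N, D) absorption-band descriptors per spectrum
               (e.g., band position, depth, width; assumed layout).
    labels   : (N,) material class labels.
    """
    clf = SVC(kernel="rbf", C=10.0, gamma="scale")  # assumed settings
    clf.fit(features, labels)
    return clf
```
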

Digitally Archiving Cultural Objects, 2008
Separating a color signal into its components, namely the illumination spectral power distribution and the surface spectral reflectance, is an important problem in computer vision. To accomplish this separation, we propose a minimization technique that, unlike existing methods, uses multiple color signals. In our implementation, we introduce three different approaches: first, color signals obtained from two different surface reflectances lit by an identical illumination spectral power distribution; second, color signals from an identical surface reflectance lit by different illumination spectral power distributions; and third, color signals from an identical surface reflectance but with different types of reflection components (diffuse and specular) lit by an identical illumination spectral power distribution. Using multiple color signals improves the robustness of the estimation, since we obtain more constraints from the input data.
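The underlying color-signal model, in our notation, is:

```latex
% C(\lambda): observed color signal,  E(\lambda): illumination SPD,
% S(\lambda): surface spectral reflectance.
C(\lambda) = E(\lambda)\, S(\lambda)
```

Two signals sharing the same illumination, C_1 = E S_1 and C_2 = E S_2, constrain E more tightly than either signal alone, which is the intuition behind using multiple color signals.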

Digitally Archiving Cultural Objects, 2008
The color appearance of an object is significantly influenced by the color of the illumination. When the illumination color changes, the color appearance of the object changes accordingly, causing its appearance to be inconsistent. To achieve color constancy, we have developed a physics-based method of estimating and removing the illumination color. In this paper, we focus on using this method for outdoor scenes, since very few physics-based methods have successfully handled outdoor color constancy. Our method is principally based on shadowed and non-shadowed regions. Researchers have previously observed that shadowed regions are illuminated by sky light, while non-shadowed regions are illuminated by a combination of sky light and sunlight. Based on this difference of illumination, we estimate the illumination colors (both the sunlight and the sky light) and then remove them. To reliably estimate the illumination colors in outdoor scenes, we include an analysis of noise, since the presence of noise is inevitable in natural images. As a result, compared to existing methods, the proposed method is more effective and robust in handling outdoor scenes. In addition, it requires only a single input image, making it useful for many computer vision applications.
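A rough illustration of the shadow/non-shadow cue (this ignores the paper's physics-based estimation and noise analysis; the shadow mask and all names are assumptions of ours):

```python
import numpy as np

def illumination_chromaticities(img, shadow_mask):
    """Mean chromaticities of shadowed vs. non-shadowed regions.

    img         : (H, W, 3) linear RGB image.
    shadow_mask : (H, W) boolean, True on shadowed pixels.
    Returns the mean chromaticity of shadowed regions (roughly the
    sky light) and of non-shadowed regions (sky light plus sunlight).
    """
    def chroma(pix):
        return (pix / (pix.sum(axis=1, keepdims=True) + 1e-9)).mean(axis=0)
    return chroma(img[shadow_mask]), chroma(img[~shadow_mask])
```
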

2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013
Raindrops adhered to a windscreen or window glass can significantly degrade the visibility of a scene. Detecting and removing raindrops will therefore benefit many computer vision applications, particularly outdoor surveillance systems and intelligent vehicle systems. In this paper, we introduce a method that automatically detects and removes adherent raindrops. The core idea is to exploit the local spatio-temporal derivatives of raindrops. First, the method detects raindrops based on the motion and the intensity temporal derivatives of the input video. Second, relying on the analysis that some areas of a raindrop completely occlude the scene while the remaining areas occlude it only partially, the method removes the two types of areas separately. Partially occluding areas are restored by retrieving as much information of the scene as possible, namely by solving a blending function on the detected areas using the temporal intensity change. Completely occluding areas are recovered using a video completion technique. Experimental results on various real videos show the effectiveness of the proposed method.
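A minimal sketch of inverting an alpha-blending model on partially occluding raindrop pixels, assuming the blending factor and raindrop layer were already estimated (the paper estimates the blending from temporal intensity changes; the model form and names below are our assumptions):

```python
import numpy as np

def unblend(I, alpha, rain):
    """Recover the background behind partially occluding raindrop pixels.

    I     : observed intensities.
    alpha : per-pixel blending factor of the raindrop layer.
    rain  : estimated raindrop-layer intensity.
    Inverts I = (1 - alpha) * background + alpha * rain.
    """
    return (I - alpha * rain) / np.maximum(1.0 - alpha, 1e-6)
```
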

2007 IEEE Intelligent Vehicles Symposium, 2007
Bad weather, particularly fog and haze, commonly prevents drivers from clearly observing road conditions, which frequently leads to road accidents. To address this problem, automatic methods have been proposed to enhance visibility in bad weather. Methods that work in the visible wavelengths can be categorized, based on the type of their input, into two approaches: those using polarizing filters and those using images taken under different fog densities. Both approaches require multiple images taken from exactly the same point of view. While they can produce reasonably good results, this requirement makes them impractical, particularly in real-time applications such as vehicle systems. Considering these drawbacks, our goal is to develop a method that requires only a single image taken by an ordinary digital camera, without any additional hardware. The method principally uses color and intensity information: it enhances the visibility after estimating the color of the skylight and the values of the airlight. Experimental results on real images show the effectiveness of the approach.
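Given the estimates the paper mentions, recovering the scene radiance is a textbook inversion of the haze model I = J t + A(1 - t); a minimal sketch (how the paper actually obtains A and t is not specified here):

```python
import numpy as np

def remove_airlight(I, A, t, t_min=0.1):
    """Invert the haze model to recover scene radiance.

    I : (H, W, 3) observed image in [0, 1].
    A : (3,) estimated atmospheric light (skylight color).
    t : (H, W) estimated transmission.
    Clamping t avoids amplifying noise in dense haze.
    """
    t = np.clip(t, t_min, 1.0)[..., None]
    return np.clip((I - A * (1.0 - t)) / t, 0.0, 1.0)
```
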