
 Ko Nishino, Ph.D.

  Associate Professor
  Department of Computer Science
  College of Engineering
  Drexel University

My research interests primarily lie in Computer Vision. In particular, I strive to understand the intrinsic structures of visual data, i.e., images and videos, induced by the inherent variability of our physical world. The drastic but coherent variations of visual phenomena, induced, for instance, by geometry, radiometry, and motion, are neatly tucked into the pixel values. The primary goal of my recent research activities is to derive novel models, representations, and algorithms that fully leverage such latent structures to better analyze and understand image and video content.
In particular, I have recently been focusing on applying this basic idea to research in three domains: analyzing and leveraging 1) the scale variability of 3D geometric structures, 2) the statistical characteristics of real-world appearance, and 3) the spatio-temporal relationships of local motion patterns in cluttered real-world scenes to tackle challenging problems in 3D geometry processing, radiometric scene understanding, and video analysis, respectively. Each of these research programs focuses on a seemingly different entity, namely 3D geometry, local light reflection, and spatio-temporal visual data. Yet, the theoretical and computational foundations derived within each program are interrelated and, in fact, reinforce each other.

Appearance Modeling: A Statistical Approach to Material Estimation/Recognition

I have been building a comprehensive research program for deriving a novel theoretical and computational framework that will allow us to freely explore the appearance of real-world objects for analysis and synthesis. Our research program aims to achieve this by establishing a sound statistical framework: introducing novel directional statistics-based local appearance models that can represent a wide variety of real-world materials; deriving canonical methods for analyzing the space of real-world appearance to extract concise representations that encapsulate the latent structure of the space; and deriving a series of probabilistic formulations and inference algorithms that leverage such statistical characterizations to solve long-standing, often inherently ill-posed, problems in scene understanding, such as object recognition based on materials, illumination and material estimation in the wild, and geometry reconstruction of arbitrary objects under arbitrary illumination. In short, we want to derive the right models and representations that allow us to extract powerful priors on real-world appearance, and to identify the right formulations and algorithms to fully leverage them to tackle the hard problems that haunt us in radiometric scene understanding.

To date, we have made considerable progress on all fronts of these research thrusts. We have derived a spherical specular reflection model that extends the conventional Torrance-Sparrow model to the spherical domain. This enables the encoding of reflected radiance as a spherical distribution, based on which we formulate and solve joint estimation of illumination and reflectance as mixture modeling on the surface of a unit sphere. We further extended this idea to model and estimate spatially-varying reflectance properties using a radial basis network and a novel variational inference formulation. We have also studied the general problem of robustly solving bilinear problems that often arise in radiometric scene understanding. In particular, we derived a novel formulation based on a factorial Markov random field and its inference algorithm so that realistic constraints can be imposed as statistical priors on the two latent variables that suffer from bilinear ambiguity. We have demonstrated the effectiveness of this approach on a specific problem of defogging a single image (estimating true scene color and scene depth from a single foggy image).
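As a toy illustration of what modeling directional data on the unit sphere involves (this is not the actual estimation algorithm from the papers, and all function names are hypothetical), the following sketch fits a single von Mises-Fisher component to directional samples using the standard closed-form approximation for the concentration parameter; the full method fits a mixture of such components.

```python
import math
import random

def fit_vmf(directions):
    """Fit a single von Mises-Fisher component to unit vectors on S^2.

    Returns (mean_direction, kappa), using the common approximation
    kappa ~ rbar * (3 - rbar^2) / (1 - rbar^2).
    """
    n = len(directions)
    s = [sum(d[i] for d in directions) for i in range(3)]
    r = math.sqrt(sum(c * c for c in s))
    mu = tuple(c / r for c in s)        # mean direction (normalized resultant)
    rbar = r / n                        # mean resultant length in [0, 1)
    kappa = rbar * (3.0 - rbar ** 2) / (1.0 - rbar ** 2)
    return mu, kappa

def sample_near(center, spread, n, rng):
    """Crude sampler: perturb a direction with Gaussian noise, renormalize."""
    out = []
    for _ in range(n):
        v = [m + rng.gauss(0.0, spread) for m in center]
        norm = math.sqrt(sum(c * c for c in v))
        out.append(tuple(c / norm for c in v))
    return out

rng = random.Random(0)
samples = sample_near((0.0, 0.0, 1.0), 0.1, 500, rng)
mu, kappa = fit_vmf(samples)
# mu should point close to +z, and kappa should be large (tight cluster)
```

A mixture version would alternate soft assignments of samples to components with per-component fits of this form, in the usual EM fashion.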

On the modeling side, we have introduced a novel parametric BRDF model based on a mixture model of a newly derived hemispherical directional statistics distribution that can accurately encode a wide variety of real-world isotropic BRDFs with a very small number of parameters. The novel directional statistics BRDF model enables the derivation of a canonical probabilistic method for estimating its parameters, including the number of mixture components. We showed that the model captures the full spectrum of real-world isotropic BRDFs with accuracy comparable to the state-of-the-art non-parametric model but with a much more compact representation. The model directly provides us with sound bases for analyzing the space of real-world reflectance, which we have demonstrated by computing physically meaningful low-dimensional embeddings of the otherwise massively high-dimensional space of isotropic BRDFs based on functional principal component analysis.
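The embedding idea can be sketched with a toy analogue (the actual work applies functional PCA to directional statistics BRDF parameters; the synthetic "reflectance curves" and names below are purely illustrative): given sampled curves that vary along one latent axis, the first principal direction, found here by power iteration, recovers that axis as a one-dimensional embedding coordinate.

```python
import math
import random

def first_principal_component(data, iters=200):
    """First PCA direction of row vectors, via power iteration on X^T X."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    x = [[row[j] - mean[j] for j in range(d)] for row in data]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        xv = [sum(x[i][j] * v[j] for j in range(d)) for i in range(n)]
        w = [sum(x[i][j] * xv[i] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return mean, v

def project(row, mean, v):
    """Embedding coordinate of one curve along the principal direction."""
    return sum((row[j] - mean[j]) * v[j] for j in range(len(row)))

# Illustrative "reflectance curves": a base lobe plus a scaled variation,
# so the 1D embedding should recover the latent strength coordinate.
rng = random.Random(1)
base = [math.exp(-(j - 8) ** 2 / 8.0) for j in range(16)]
curves, strengths = [], []
for _ in range(40):
    a = rng.uniform(-1.0, 1.0)
    strengths.append(a)
    curves.append([b * (1.0 + a) + rng.gauss(0, 0.01) for b in base])
mean, v = first_principal_component(curves)
coords = [project(c, mean, v) for c in curves]
```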

Related Publications

  • Reflectance and Natural Illumination from a Single Image
    S. Lombardi and K. Nishino,
    to appear in Proc. of European Conference on Computer Vision ECCV'12, Oct., 2012. [ pre-print ]
  • Shape and Reflectance from Natural Illumination
    G. Oxholm and K. Nishino,
    to appear in Proc. of European Conference on Computer Vision ECCV'12, Oct., 2012. [ pre-print ]
  • Single Image Multimaterial Estimation
    S. Lombardi and K. Nishino,
    in Proc. of IEEE International Conference on Computer Vision and Pattern Recognition CVPR'12, pp238-245, Jun., 2012. [ PDF ]
  • Bayesian Defogging
    K. Nishino, L. Kratz, and S. Lombardi,
    in Int'l Journal of Computer Vision, vol. 98, no. 3, pp263-278, Jul., 2012. [ pre-print ] [ Springer ] [ errata ]
  • Directional Statistics-based Reflectance Model for Isotropic Bidirectional Reflectance Distribution Functions
    K. Nishino and S. Lombardi,
    in OSA Journal of Optical Society of America A, vol. 28, no. 1, pp8-18, Jan., 2011. [ PDF ] [ errata ]
  • Variational Estimation of Inhomogeneous Specular Reflectance and Illumination from A Single View
    K. Hara and K. Nishino,
    in OSA Journal of Optical Society of America A, vol. 28, no. 2, pp136-146, Feb., 2011. [ PDF ]
  • Directional Statistics BRDF Model
    K. Nishino,
    in Proc. of IEEE Twelfth International Conference on Computer Vision ICCV'09, pp476-483, Oct., 2009. [ PDF ]
  • Factorizing Scene Albedo and Depth from a Single Foggy Image
    L. Kratz and K. Nishino,
    in Proc. of IEEE Twelfth International Conference on Computer Vision ICCV'09, pp1701-1708, Oct., 2009. [ PDF ]
  • Illumination and Spatially Varying Specular Reflectance from a Single View
    K. Hara and K. Nishino,
    in Proc. of IEEE Conference on Computer Vision and Pattern Recognition CVPR '09, pp619-626, Jun., 2009. [ PDF ]
  • Mixture of Spherical Distributions for Single-View Relighting
    K. Hara, K. Nishino, and K. Ikeuchi,
    in IEEE Trans. on Pattern Analysis and Machine Intelligence, vol.30, no.1, pp25-35, Jan., 2008. [ PDF ]

Video Analysis: Modeling and Exploiting Crowd Flow

Since joining Drexel University, I have been building a research program focused on analyzing real-world videos of highly cluttered scenes - scenes containing hundreds of thousands of freely moving objects. This is a particularly challenging problem. Conventional video analysis methods generally assume that foreground objects can be isolated from a static background and tracked, which is not always the case for complex real-world scenes. The real-world scenes we target contain an excessive number of constituents that cause severe and frequent occlusions and exhibit significant spatio-temporal scale differences in their actions and motion paths. In such scenes, hundreds of people will be moving in various directions within a few frames, and one cannot discern foreground from background - an organically moving crowd constitutes the background, which is at the same time the foreground we want to analyze.

The key idea underlying our research is that, although each individual constituent of a cluttered scene would cause highly complex motion and appearance variations, the collection of such individuals seen as a whole will manifest itself as a structured pattern within the video. In other words, the video viewed as a spatio-temporal volume of pixels would contain intrinsic structures that characterize the scene and the behavior of the objects in the scene. One can expect to see a set of natural flows formed by objects in the scene and these flows, i.e., spatio-temporal motion patterns, will exhibit localized but correlated structures that dynamically vary. The goal of this research program is to derive suitable mathematical models and computational tools for exploiting such intrinsic structures formed by the constituents of the scene to analyze and extract critical information for higher-level analysis, such as detection, recognition, and comparison of actions and events.

To date, we have derived the foundation for statistically characterizing the latent structures of motion in videos of cluttered scenes based on the idea of viewing them as collections of local motion patterns, whose spatial and temporal relationships are encoded with distribution-based hidden Markov models; demonstrated their use for detecting anomalous events, such as pedestrians walking against the flow of people, by identifying statistical deviations from the learned spatio-temporal models; and introduced a novel method for tracking individuals in such overly crowded scenes by predicting the local motion pattern in the neighborhood of the target using the spatio-temporal statistical model and then by converting it into a transition probability distribution that is used in a particle filtering based tracking algorithm.
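The anomaly-detection idea, flagging observations that deviate statistically from a learned local motion model, can be illustrated with a deliberately simplified sketch. The actual work uses distribution-based hidden Markov models over spatio-temporal cuboids; this toy replaces them with an independent Gaussian per cell over a scalar motion feature, and all names and the threshold are illustrative.

```python
import math

def learn_cell_model(observations):
    """Mean/variance of a scalar motion feature (e.g., flow angle) in one cell."""
    n = len(observations)
    mean = sum(observations) / n
    var = sum((o - mean) ** 2 for o in observations) / n
    return mean, max(var, 1e-6)      # floor the variance for stability

def log_likelihood(x, model):
    """Gaussian log-likelihood of a new observation under the cell model."""
    mean, var = model
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def is_anomalous(x, model, threshold=-8.0):
    """Flag observations whose likelihood under the learned model is too low."""
    return log_likelihood(x, model) < threshold

# Training: flow angles in one cell cluster around 0 rad (the usual direction).
train = [0.05, -0.1, 0.0, 0.08, -0.04, 0.02, -0.06, 0.03]
model = learn_cell_model(train)
```

For example, `is_anomalous(math.pi, model)` flags motion in the opposite direction, the analogue of a pedestrian walking against the flow, while observations near the learned direction pass.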

Related Publications

  • Going With the Flow: Pedestrian Efficiency in Crowded Scenes
    L. Kratz and K. Nishino,
    to appear in Proc. of European Conference on Computer Vision ECCV'12, Oct., 2012. [ pre-print ]
  • Tracking Pedestrians using Local Spatio-Temporal Motion Patterns in Extremely Crowded Scenes
    L. Kratz and K. Nishino,
    to appear in IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 34, no. 5, pp987-1002, May, 2012. [ pre-print ] [ IEEE ] [ Video mov/avi ]
  • Tracking with Local Spatio-Temporal Motion Patterns in Extremely Crowded Scenes
    L. Kratz and K. Nishino,
    in Proc. of IEEE International Conference on Computer Vision and Pattern Recognition CVPR'10, Jun., 2010. [ PDF ]
  • Anomaly Detection in Extremely Crowded Scenes Using Spatio-Temporal Motion Pattern Models
    L. Kratz and K. Nishino,
    in Proc. of IEEE Conference on Computer Vision and Pattern Recognition CVPR '09, pp1446-1453, Jun., 2009. [ PDF ]

Geometry Processing: 3D Geometric Scale Variability

Real-world objects and scenes consist of geometric structures of varying scales. A scene may contain various objects of different dimensions, and each individual object may consist of local structures of varying spatial extents. For instance, in a 3D human face model, both the tip of the nose and dimples are discriminative facial features suitable for representing the underlying surface. The spatial extents of such geometric features, however, differ significantly from one another - the local surfaces that give rise to them have very different sizes. While this geometric scale variability can be deemed another cause of error in subsequent processing, it can in turn be exploited as an additional source of information to enrich the representation of the actual object/scene geometry. One of my current research programs focuses on establishing a general theoretical and computational foundation for analyzing and exploiting this hidden dimension of 3D geometry, which I like to refer to as the 3D geometric scale variability.

To date, we have derived theoretically sound methods for analyzing the 3D geometric scale variability by formulating scale-space analysis for 3D mesh models and range images based on 2D embeddings of surface normals; developed novel computational methods for detecting geometric features, such as corners and edges, together with their natural scales; derived novel local geometric descriptors that encode the scale variation or are invariant to it; and established methods for matching such descriptors while fully leveraging the hierarchy they induce. We have demonstrated the use of these descriptors and matching algorithms for range image registration and 3D object localization in cluttered scenes, and have shown that we achieve performance better than or comparable to state-of-the-art methods.
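The notion of a feature's natural scale can be illustrated in one dimension (the actual work operates on 2D embeddings of surface normals; this 1D sketch and its names are illustrative only): smooth a signal at increasing scales and select the scale at which the scale-normalized second-derivative response at the feature peaks, in the spirit of classical scale selection.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at ~3 sigma."""
    radius = max(1, int(3 * sigma))
    k = [math.exp(-i * i / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]

def convolve(signal, kernel):
    """Convolution with border values clamped (replicated)."""
    r = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, kv in enumerate(kernel):
            idx = min(max(i + j - r, 0), n - 1)
            acc += signal[idx] * kv
        out.append(acc)
    return out

def natural_scale(signal, pos, sigmas):
    """Scale at which the scale-normalized second difference peaks at pos."""
    best_sigma, best_resp = None, -1.0
    for sigma in sigmas:
        sm = convolve(signal, gaussian_kernel(sigma))
        lap = sm[pos - 1] - 2 * sm[pos] + sm[pos + 1]
        resp = sigma * sigma * abs(lap)      # scale normalization
        if resp > best_resp:
            best_sigma, best_resp = sigma, resp
    return best_sigma

# A Gaussian bump of width ~4 should select an intermediate scale
# proportional to its intrinsic width, not the extremes of the range.
signal = [math.exp(-(i - 32) ** 2 / (2 * 4.0 ** 2)) for i in range(64)]
sigmas = [1.0, 2.0, 4.0, 8.0, 16.0]
sel = natural_scale(signal, 32, sigmas)
```

The selected scale grows with the spatial extent of the structure, which is exactly what lets scale-dependent descriptors separate a nose tip from a dimple.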

For 2D image analysis, but based on the very same idea of leveraging latent 3D geometric structures, we have derived a novel nonrigid image registration method that models the image as an elastic membrane in spatio-intensity space. We formulate deformable alignment as probabilistic inference with statistical priors reflecting the rigidity and elasticity of the membrane's local geometric structures. This idea is based on the observation that smaller-scale structures should be assigned stronger rigidity to preserve the characteristic shape of the membrane geometry, i.e., the intensity structure in the original image, as it deforms. We've shown that this approach overcomes the shortcomings of previous intensity-based and feature-based approaches with conventional uniform smoothing or diffeomorphic constraints, which suffer from large errors in textureless regions and in areas in-between specified features.
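The effect of spatially varying rigidity, as opposed to uniform smoothing, can be sketched in one dimension (this is a minimal illustration, not the membrane model or its inference algorithm; the blending scheme and names are assumptions of the sketch): each point's displacement is iteratively blended with its neighborhood's, with a per-point rigidity weight controlling how strongly it must follow its neighbors.

```python
def regularize_displacements(disp, rigidity, iters=50):
    """Blend each point's displacement with its neighbors' average.

    rigidity[i] in [0, 1]: rigid points are pulled toward their
    neighborhood's motion, flexible points keep their own data-driven
    estimate (disp is re-used as the data term every iteration).
    """
    d = list(disp)
    n = len(d)
    for _ in range(iters):
        new = []
        for i in range(n):
            left = d[max(i - 1, 0)]
            right = d[min(i + 1, n - 1)]
            neighborhood = 0.5 * (left + right)
            w = rigidity[i]
            new.append((1 - w) * disp[i] + w * neighborhood)
        d = new
    return d

# A spurious displacement spike at index 2: under high rigidity it is
# suppressed toward the neighborhood motion; under low rigidity it survives.
disp = [1.0, 1.0, 3.0, 1.0, 1.0]
rigid = regularize_displacements(disp, [0.9] * 5)
flex = regularize_displacements(disp, [0.1] * 5)
```

Assigning rigidity per structure rather than uniformly is what allows small, characteristic intensity structures to deform coherently while textureless regions are strongly regularized.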

Related Publications

  • The Scale of Geometric Texture
    G. Oxholm, P. Bariya, and K. Nishino,
    to appear in Proc. of European Conference on Computer Vision ECCV'12, Oct., 2012. [ pre-print ]
  • 3D Geometric Scale Variability in Range Images: Features and Descriptors
    P. Bariya, J. Novatnack, G. Schwartz, and K. Nishino,
    in Int'l Journal of Computer Vision, vol. 99, no. 2, pp232-255, Sept., 2012. [ pre-print ] [ Springer ]
  • A Flexible Approach to Reassembling Thin Objects of Unknown Geometry
    G. Oxholm and K. Nishino,
    to appear in Journal of Cultural Heritage, 2012. [ DOI ]
  • Locally Rigid Globally Non-rigid Surface Registration
    K. Fujiwara, K. Nishino, J. Takamatsu, B. Zheng and K. Ikeuchi,
    in Proc. of IEEE Thirteenth International Conference on Computer Vision ICCV'11, pp1527-1534, Nov., 2011. [ PDF ]
  • Reassembling Thin Artifacts of Unknown Geometry
    G. Oxholm and K. Nishino,
    in Proc. of International Symposium on Virtual Reality, Archaeology and Cultural Heritage, 2011. [ PDF ] [ video ]
  • Membrane Nonrigid Image Registration
    G. Oxholm and K. Nishino,
    in Proc. of 11th European Conference on Computer Vision ECCV'10, pp763-776, Sep., 2010. [ PDF ]
  • Scale-Hierarchical 3D Object Recognition in Cluttered Scenes
    P. Bariya and K. Nishino,
    in Proc. of IEEE International Conference on Computer Vision and Pattern Recognition CVPR'10, Jun., 2010. [ PDF ]
  • Scale-Dependent/Invariant Local 3D Shape Descriptors for Fully Automatic Registration of Multiple Sets of Range Images
    J. Novatnack and K. Nishino,
    in Proc. of Tenth European Conference on Computer Vision ECCV'08, Oct., 2008. [ PDF ]
  • Scale-Dependent 3D Geometric Features
    J. Novatnack and K. Nishino,
    in Proc. of IEEE Eleventh International Conference on Computer Vision ICCV'07, Oct., 2007. [ PDF ]

Other Work

Most of my past research also centers on the general theme of extracting and exploiting latent structures of visual data. Examples include: analyzing reflections in people's eyes to visualize what the person is seeing and to estimate the illumination conditions for relighting objects/scenes; achieving joint estimation of reflectance and illumination from a sparse set of images using the characteristics of specular reflection; deriving a compact representation of object appearance under varying illumination and viewing conditions based on subspace analysis on the object surface; blockwise subspace analysis of spatio-temporal visual data that strikes the optimal trade-off between representation size and accuracy via subspace clustering; subspace analysis of the intrinsic images of a fixed-view scene to remove shadows for more effective video surveillance; and identifying a novel color space for robust estimation of illumination color. I have also done extensive work in digital archaeology, including robust range image registration and integration as well as 2D-3D alignment for seamless texture mapping. I am currently working with the National Park Service Independence Living History Center in Philadelphia on some interesting digital archaeology projects, too.

Please go to the publications page for a complete list of publications resulting from these research activities.