Further, over the past twenty-five years, computer scientists have developed computational systems that can outperform humans at a number of cognitive tasks. Computers can now beat all but the best human chess players, and computational mathematics systems are routinely used to solve calculus problems that are too involved for people to do. However, computers have not achieved human levels of performance at literally any visual perception task. One of the most successful areas of visual information processing has been the development of systems for recognizing printed text. Yet such Optical Character Recognition (OCR) software still makes mistakes that a grade-school student would not make, even if the child did not know the particular words being recognized.
A central problem with artificial vision systems is that they are brittle, in the sense that small variations in the input may cause enormous changes in the output. For example, commercial OCR systems work very well on original documents such as this newsletter. A photocopy or a fax of the newsletter, however, while still easily readable to a human, will often cause an OCR system to output near gibberish. This brittleness is due to fundamental limitations in our scientific understanding, rather than solely to a need for better engineering. The scientific community has not developed adequate methods for formalizing perceptual problems in which there is inherent ambiguity or uncertainty. Research areas such as fuzzy logic and neural networks are attempting to address the representation of ambiguity and uncertainty in general. My own research has focused more narrowly on the problem of representing uncertainty in visual shape recognition.
The ability to recognize shapes is a fundamental part of human visual perception. For example, young children have little problem recognizing a line drawing of an object. A line drawing encodes primarily shape information, rather than properties such as color, texture, or reflectiveness. From a computational point of view this is interesting, because the shape information encoded by a line drawing is quite an abstract representation of an object. A common representation of an image in computer vision is the so-called intensity-edge map, which looks a lot like a line drawing. An intensity edge occurs in an image when there is a sudden change in brightness. Such changes tend to occur at object boundaries, and thus the locations of these changes reflect (to some degree) the shapes of objects. The accompanying figure shows several images and their intensity-edge maps (each image and edge map also has a small box superimposed on it, which will be explained below).
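The notion of an intensity edge can be illustrated with a minimal Python sketch. It marks a pixel as an edge when the brightness jump to its right-hand or lower neighbor reaches a threshold. The function name `edge_map`, the `threshold` parameter, and the tiny test image are assumptions made for illustration; practical edge detectors are considerably more elaborate, and this is not necessarily how the edge maps in the figure were produced.

```python
def edge_map(image, threshold):
    """Return a binary edge map: 1 where brightness changes sharply, else 0.

    `image` is a list of rows of brightness values. `threshold` is a
    hypothetical tuning parameter, not a value taken from the article.
    """
    rows, cols = len(image), len(image[0])
    edges = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Brightness difference to the right-hand and lower neighbors
            # (treated as zero at the image border).
            dx = image[r][c + 1] - image[r][c] if c + 1 < cols else 0
            dy = image[r + 1][c] - image[r][c] if r + 1 < rows else 0
            if max(abs(dx), abs(dy)) >= threshold:
                edges[r][c] = 1
    return edges


# A dark region (brightness 10) on the left, a bright region (200) on the right.
image = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
edges = edge_map(image, threshold=50)
for row in edges:
    print(row)  # each row prints [0, 1, 0, 0]
```

The result is a binary image whose marked pixels trace the boundary between the two regions, which is the sense in which an edge map resembles a line drawing.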
My research group is investigating the problem of comparing shapes, in order to determine the extent to which one shape differs from another. We represent shapes using binary images, such as the edge maps in the figure. A binary image is one that has just two "colors," such as black ink on a white page. Most techniques for comparing binary images operate by determining how well two images can be aligned with one another. The quality of an alignment is measured by the extent to which two shapes overlap when aligned. In our work, we measure the degree of overlap based on a mathematical formula known as the Hausdorff distance. The key idea underlying the approach is to measure how many of the black points in one image are close to black points in the other image, and vice versa. In contrast, most other methods measure how many black points in the two images are directly overlapping, which does not account well for uncertainty.
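The contrast between counting exact overlaps and counting nearby points can be made concrete with a short Python sketch. It computes the classical Hausdorff distance between two sets of black-pixel coordinates, together with the fraction of points in one set that lie within a tolerance of the other. The function names, the `tolerance` parameter, and the sample shapes are assumptions for illustration; the brute-force nearest-point search is chosen for clarity, not speed.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from any point of `a` to its nearest point of `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def fraction_close(a, b, tolerance):
    """Fraction of points of `a` lying within `tolerance` of some point of `b`."""
    close = sum(1 for p in a if min(math.dist(p, q) for q in b) <= tolerance)
    return close / len(a)


# Black-pixel coordinates of two small shapes (hypothetical data).
shape_a = [(0, 0), (0, 1), (0, 2), (0, 3)]
shape_b = [(1, 0), (1, 1), (1, 2), (5, 3)]

exact_overlap = len(set(shape_a) & set(shape_b))
print(exact_overlap)                          # 0: no pixel coincides exactly
print(fraction_close(shape_a, shape_b, 1.0))  # 0.75: most points are nearby
print(hausdorff(shape_a, shape_b))            # 5.0: set by the one stray point
```

In this example the two shapes are one pixel apart, so a direct-overlap count reports no agreement at all, while the nearby-point fraction reports that three quarters of the points match. The plain Hausdorff distance is dominated by the single outlying point, which is one reason a fraction-based variant is attractive when the data are uncertain.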
There are many applications of shape matching, ranging from recognition of objects using pre-stored models, to visually guided navigation of mobile robots, to automated inspection of manufactured parts. Perhaps the easiest application to illustrate is that of tracking moving objects in video data. A video can be thought of as a sequence of images taken close together in time (about 1/30th of a second apart). The idea underlying our approach to tracking is to use our shape-matching method to find where an object moved to in the image from one time to the next. Given the shape of the object in an image, we find the best match of that shape in the next image in order to determine where the object moved to. This match determines the portion of the next image that corresponds to the object.
The figure shows every tenth image from a sequence of forty images (1.3 seconds of video) of a football game. Player #82 is marked with a box. This player is tracked during the sequence by finding the best match of the edge map inside the box in a given image to the edge map of the next image in the sequence. This best match defines a new box in the next image, which is used as the model for that image frame. Thus the boxes in each image frame show the location of the object being tracked at that time. In order for the method to work, the shape cannot change greatly between successive images. (Note that the figure shows only every tenth image, so it is not these images that were matched to one another; there are nine intervening frames between each of them that were used to generate the matches.)
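The frame-to-frame procedure described above can be sketched in Python as follows. The article does not spell out how candidate positions are searched or scored, so the exhaustive search over integer shifts, the tie-breaking rule, and the names `score`, `best_offset`, `track`, `search_radius`, and `tolerance` are all assumptions made for illustration. The sketch assumes that the model and every frame contain at least one edge point.

```python
import math


def nearest_distance(point, points):
    """Distance from `point` to the closest member of `points`."""
    return min(math.dist(point, q) for q in points)


def score(model, frame_points, offset, tolerance):
    """Rank a candidate shift: first by the fraction of model points that land
    close to a frame point, then (as a tie-breaker) by smaller total distance."""
    dx, dy = offset
    dists = [nearest_distance((x + dx, y + dy), frame_points) for x, y in model]
    fraction = sum(d <= tolerance for d in dists) / len(dists)
    return fraction, -sum(dists)


def best_offset(model, frame_points, search_radius, tolerance):
    """Try every integer shift within `search_radius`; keep the best-scoring one."""
    shifts = range(-search_radius, search_radius + 1)
    candidates = [(dx, dy) for dx in shifts for dy in shifts]
    return max(candidates, key=lambda o: score(model, frame_points, o, tolerance))


def track(model, frames, search_radius=2, tolerance=1.0):
    """Follow `model` through `frames`, rebuilding the model after each match."""
    offsets = []
    for frame_points in frames:
        dx, dy = best_offset(model, frame_points, search_radius, tolerance)
        offsets.append((dx, dy))
        # New model: the frame's edge points that lie near the shifted old model.
        shifted = [(x + dx, y + dy) for x, y in model]
        model = [q for q in frame_points
                 if nearest_distance(q, shifted) <= tolerance]
    return offsets, model


# Hypothetical edge points: an L-shaped object that moves and gains a point,
# with one unrelated clutter point at (9, 9) in each later frame.
initial_model = [(0, 0), (0, 1), (0, 2), (1, 0)]
frames = [
    [(1, 0), (1, 1), (1, 2), (2, 0), (9, 9)],
    [(2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (9, 9)],
]
offsets, final_model = track(initial_model, frames)
print(offsets)      # [(1, 0), (1, 1)]
print(final_model)  # [(2, 1), (2, 2), (2, 3), (3, 1), (3, 2)]
```

Because the model is rebuilt from each matched frame, the tracked shape can drift gradually (here it gains a point) while the clutter point is never absorbed, mirroring the requirement that the shape change only modestly between successive images.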
Notice how the object being tracked changes shape significantly over the course of the image sequence. Initially the player is leaning on his right elbow, then on his left elbow, then leaning backwards, then sitting upright. Thus the method can handle significant changes in shape over time, as long as the changes between successive images are not too large (which is generally the case in video sequences). Tracking methods such as these can be used for applications such as visually guided navigation, remote surveillance, and interactive television.
(More about Hausdorff distances and shape comparison can be found on the World Wide Web at http://www.cs.cornell.edu/info/people/dph/dph.html.)
Daniel Huttenlocher (computer science) has been at Cornell since 1988. In addition to teaching and conducting research at Cornell, Huttenlocher directs Xerox PARC's Image Understanding research group. In 1993, he received the New York State Professor of the Year Award in recognition of his commitment to undergraduate education.
This article is Copyright © 1995 Daniel Huttenlocher. All Rights Reserved.