Outline of Object Recognition - Feature-based Methods

Feature-based Methods

- A search is used to find feasible matches between object features and image features.

- The primary constraint is that a single position of the object must account for all of the feasible matches.

- These methods extract features from the objects to be recognized and from the images to be searched, for example:

surface patches
corners
linear edges

1. Interpretation trees

One way to search for feasible matches is to search through a tree.
Each node in the tree represents a set of matches.

The root node represents the empty set.
Each other node is the union of the matches in the parent node and one additional match.
A wildcard is used for features with no match.

Nodes are “pruned” when the set of matches is infeasible.

A pruned node has no children
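
The tree search described above can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes 2D point features, and it assumes a feasibility test (hypothetical, not given in this outline) that says two matches are compatible when the distance between the object points equals the distance between the matched image points. The names `interpretation_tree`, `feasible` and `max_wildcards` are made up for the example.

```python
from math import dist

WILDCARD = None  # marks an object feature that has no image match


def feasible(matches, obj_pts, img_pts, tol):
    """Check the newest match against all earlier ones.

    Assumed constraint: a rigid motion preserves pairwise distances,
    and each image point may be used at most once.
    """
    if not matches or matches[-1] is WILDCARD:
        return True
    i_new, j_new = len(matches) - 1, matches[-1]
    for i_old, j_old in enumerate(matches[:-1]):
        if j_old is WILDCARD:
            continue
        if j_old == j_new:
            return False
        d_obj = dist(obj_pts[i_new], obj_pts[i_old])
        d_img = dist(img_pts[j_new], img_pts[j_old])
        if abs(d_obj - d_img) > tol:
            return False
    return True


def interpretation_tree(obj_pts, img_pts, tol=1e-6, max_wildcards=1):
    """Depth-first search; returns every complete feasible set of matches.

    matches[i] is the image index matched to object feature i, or WILDCARD.
    """
    complete = []

    def expand(matches):
        too_many = matches.count(WILDCARD) > max_wildcards
        if too_many or not feasible(matches, obj_pts, img_pts, tol):
            return  # pruned: this node gets no children
        if len(matches) == len(obj_pts):
            complete.append(tuple(matches))
            return
        for choice in [*range(len(img_pts)), WILDCARD]:
            expand(matches + [choice])  # child = parent's matches + one more

    expand([])  # the root node is the empty set of matches
    return complete


obj = [(0, 0), (3, 0), (0, 4)]
img = [(10, 5), (13, 5), (10, 9), (20, 20)]  # shifted object plus one clutter point
print(interpretation_tree(obj, img, max_wildcards=0))
```

Limiting the number of wildcards is one simple way to keep the tree from accepting interpretations in which almost nothing is matched.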

Historically significant and still used, but less commonly

2. Hypothesize and test

General Idea:

Hypothesize a correspondence between a collection of image features and a collection of object features
Then use this to generate a hypothesis about the projection from the object coordinate frame to the image frame
Use this projection hypothesis to generate a rendering of the object. This step is usually known as backprojection
Compare the rendering to the image, and, if the two are sufficiently similar, accept the hypothesis
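
The four steps above can be sketched in a deliberately simplified setting. The sketch assumes 2D point features and a translation-only pose, so that a single correspondence is enough to hypothesize the pose; "rendering" is just shifting the model points, and "sufficiently similar" is taken to mean that a chosen fraction of rendered points land near some image point. The function name and the `tol` and `min_fraction` parameters are assumptions made for the example.

```python
def hypothesize_and_test(model, image, tol=0.5, min_fraction=0.8):
    """Return (translation, score) for the best accepted hypothesis, or None."""
    best = None
    for mx, my in model:
        for ix, iy in image:
            # 1-2. Hypothesize a correspondence and derive the pose from it.
            t = (ix - mx, iy - my)
            # 3. Backproject: render the whole model under this pose.
            rendered = [(x + t[0], y + t[1]) for x, y in model]
            # 4. Compare the rendering with the image.
            hits = sum(
                any(abs(rx - px) <= tol and abs(ry - py) <= tol for px, py in image)
                for rx, ry in rendered
            )
            score = hits / len(model)
            if score >= min_fraction and (best is None or score > best[1]):
                best = (t, score)
    return best


model = [(0, 0), (2, 0), (2, 1), (0, 3)]
image = [(x + 5, y + 7) for x, y in model] + [(1, 1), (9, 2)]  # object plus clutter
print(hypothesize_and_test(model, image))
```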

Obtaining Hypothesis:

There are a variety of different ways of generating hypotheses.
When camera intrinsic parameters are known, the hypothesis is equivalent to a hypothetical position and orientation – pose – for the object.
Utilize geometric constraints
Construct a correspondence between small sets of object features and every correctly sized subset of image points; these correspondences are the hypotheses.

Three basic approaches:

Obtaining Hypotheses by Pose Consistency
Obtaining Hypotheses by Pose Clustering
Obtaining Hypotheses by Using Invariants

The search is expensive and also redundant, but it can be improved using randomization and/or grouping.

Randomization

- Examine small sets of image features until the likelihood of missing the object becomes small.

- For each set of image features, all possible matching sets of model features must be considered.

- Formula:

$(1 - W^c)^k = Z$

W = the fraction of image points that are “good” (W ≈ m/n)

c = the number of correspondences necessary

k = the number of trials

Z = the probability of every trial using one (or more) incorrect correspondences
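
Reading the formula as (1 - W^c)^k = Z, it can be solved for the number of trials: k = log Z / log(1 - W^c). The helper below is a small sketch of that arithmetic; the function name `trials_needed` is an assumption made for the example, and the result is rounded up to the next whole trial.

```python
import math


def trials_needed(w, c, z):
    """Smallest integer k with (1 - w**c)**k <= z.

    w: fraction of good image points, c: correspondences per hypothesis,
    z: acceptable probability that every trial contains a bad correspondence.
    """
    p_all_good = w ** c  # chance that one trial uses only good points
    if not (0 < p_all_good < 1 and 0 < z < 1):
        raise ValueError("w**c and z must lie strictly between 0 and 1")
    return math.ceil(math.log(z) / math.log(1 - p_all_good))


# Half the image points are good, 3 correspondences per hypothesis,
# and we accept a 1% chance of never drawing an all-good set.
print(trials_needed(0.5, 3, 0.01))  # 35
```

The count grows quickly as W shrinks or c grows, which is why grouping (raising the effective W) and using the smallest workable c both pay off.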

Grouping

- If we can determine groups of points that are likely to come from the same object, we can reduce the number of hypotheses that need to be examined.

3. Pose consistency

Also called Alignment, since the object is being aligned to the image
Correspondences between image features and model features are not independent, because they are tied together by geometric constraints
A small number of correspondences yields the object position – the others must be consistent with this
General Idea:

If we hypothesize a match between a sufficiently large group of image features and a sufficiently large group of object features, then we can recover the missing camera parameters from this hypothesis (and so render the rest of the object)

Strategy:

Generate hypotheses using small number of correspondences (e.g. triples of points for 3D recognition)
Project other model features into image (backproject) and verify additional correspondences

Use the smallest number of correspondences necessary to achieve discrete object poses
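
A sketch of this strategy in a simplified 2D setting follows. It assumes the pose is a planar similarity transform (rotation, scale and translation), for which two point correspondences are the smallest set that fixes the pose; the 3D case mentioned above would need triples instead. Points are stored as Python complex numbers so that the transform is simply `b = s*a + t`. The function names and the choice of the first two model points as the hypothesis pair are assumptions made for the example.

```python
from itertools import permutations


def similarity_from_two(a1, a2, b1, b2):
    """Solve b = s*a + t from two correspondences (points as complex numbers)."""
    s = (b2 - b1) / (a2 - a1)
    return s, b1 - s * a1


def align(model, image, tol=1e-6):
    """Return (s, t, n_verified) for the hypothesis with the most support."""
    best = (None, None, 0)
    a1, a2 = model[0], model[1]  # assumed: distinct and visible in the image
    for b1, b2 in permutations(image, 2):
        # Hypothesize a pose from the smallest set of correspondences.
        s, t = similarity_from_two(a1, a2, b1, b2)
        # Backproject the remaining model points and verify them.
        projected = [s * p + t for p in model]
        support = sum(any(abs(q - b) <= tol for b in image) for q in projected)
        if support > best[2]:
            best = (s, t, support)
    return best


model = [0j, 2 + 0j, 2 + 1j, 3j]
true_s, true_t = 2j, 1 + 1j  # rotate by 90 degrees, scale by 2, then shift
image = [true_s * p + true_t for p in model] + [5 + 5j]  # plus one clutter point
print(align(model, image))
```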

4. Pose clustering

General Idea:

Each object leads to many correct sets of correspondences, each of which has (roughly) the same pose
Vote on pose. Use an accumulator array that represents pose space for each object
This is essentially a Hough transform

Strategy:

For each object, set up an accumulator array that represents pose space – each element in the accumulator array corresponds to a “bucket” in pose space.
Then take each image frame group, and hypothesize a correspondence between it and every frame group on every object
For each of these correspondences, determine pose parameters and make an entry in the accumulator array for the current object at the pose value.
If there are large numbers of votes in any object’s accumulator array, this can be interpreted as evidence for the presence of that object at that pose.
The evidence can be checked using a verification method
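
The voting scheme can be sketched for a single object as follows. The sketch assumes 2D point features, takes a "frame group" to be an ordered pair of points, and assumes the pose is a planar rigid motion (rotation plus translation), so pairs whose lengths disagree are skipped. Pose space is quantized into 10-degree rotation bins and unit-sized translation bins; these bucket sizes, like the function name `pose_votes`, are assumptions made for the example.

```python
import cmath
import math
from collections import Counter
from itertools import permutations


def pose_votes(model, image, n_angle_bins=36, shift_step=1.0, scale_tol=0.05):
    """Accumulate votes over (rotation bin, x bin, y bin) for one object.

    Points are complex numbers; a pose maps a model point a to s*a + t.
    """
    acc = Counter()  # sparse accumulator array over pose space
    angle_step = 2 * math.pi / n_angle_bins
    for a1, a2 in permutations(model, 2):
        for b1, b2 in permutations(image, 2):
            s = (b2 - b1) / (a2 - a1)
            if abs(abs(s) - 1) > scale_tol:
                continue  # rigid motion assumed: pair lengths must agree
            t = b1 - s * a1
            bucket = (
                round(cmath.phase(s) / angle_step) % n_angle_bins,
                round(t.real / shift_step),
                round(t.imag / shift_step),
            )
            acc[bucket] += 1
    return acc


model = [0j, 2 + 0j, 2 + 1j, 3j]
image = [1j * p + (4 + 2j) for p in model] + [7 + 7j]  # rotated 90 degrees, shifted
print(pose_votes(model, image).most_common(1))
```

A peak in the accumulator is only evidence; as noted above, it would normally be passed on to a verification step.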

Note that this method uses sets of correspondences, rather than individual correspondences

Implementation is easier, since each set yields a small number of possible object poses.

Improvement

The noise resistance of this method can be improved by not counting votes for objects at poses where the vote is obviously unreliable

- For example, in cases where, if the object were at that pose, the object frame group would be invisible.

These improvements are sufficient to yield working systems

5. Invariance

There are geometric properties that are invariant to camera transformations
Most easily developed for images of planar objects, but can be applied to other cases as well
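
One standard example of such a property (not named in this outline, so offered here only as an illustration) is the cross-ratio of four collinear points, which is unchanged by any projective map of the line. The sketch below checks this numerically with exact fractions; the particular map coefficients are arbitrary values chosen for the example.

```python
from fractions import Fraction


def cross_ratio(a, b, c, d):
    """Cross-ratio of four distinct collinear points given by 1D coordinates."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))


def projective_map(x, p, q, r, s):
    """1D projective transformation x -> (p*x + q) / (r*x + s), with p*s != q*r."""
    return (p * x + q) / (r * x + s)


points = [Fraction(0), Fraction(1), Fraction(3), Fraction(7)]
viewed = [projective_map(x, 2, 1, 1, 5) for x in points]

print(cross_ratio(*points))  # 9/7
print(cross_ratio(*viewed))  # 9/7
```

Individual distances and ratios of distances change under the map, but the cross-ratio does not, which is what makes it usable as a viewpoint-independent index.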

6. Geometric hashing

An algorithm that uses geometric invariants to vote for object hypotheses
Similar to pose clustering, but instead of voting on pose, we vote on geometry
A technique originally developed for matching geometric features (uncalibrated affine views of plane models) against a database of such features
Widely used for pattern-matching, CAD/CAM, and medical imaging.
It is difficult to choose the size of the buckets
It is hard to be sure what “enough” votes means, so there may be some danger that the table will get clogged.
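
The idea can be sketched for affine views of planar point sets. The sketch assumes the invariant is the pair of affine coordinates of a point relative to an ordered basis of three non-collinear points, which does not change under an affine transformation of the plane. Offline, every model point is hashed under every basis; online, one image basis is chosen and the remaining image points vote for (model, basis) entries. The function names and the bucket size of 0.25 are assumptions made for the example.

```python
from collections import Counter, defaultdict
from itertools import permutations


def affine_coords(p, o, e1, e2):
    """(u, v) with p = o + u*(e1 - o) + v*(e2 - o); None for a collinear basis."""
    ax, ay = e1[0] - o[0], e1[1] - o[1]
    bx, by = e2[0] - o[0], e2[1] - o[1]
    det = ax * by - ay * bx
    if abs(det) < 1e-9:
        return None
    px, py = p[0] - o[0], p[1] - o[1]
    return ((px * by - py * bx) / det, (ax * py - ay * px) / det)


def quantize(uv, bucket):
    return (round(uv[0] / bucket), round(uv[1] / bucket))


def build_table(models, bucket=0.25):
    """Offline stage: hash every model point under every ordered basis triple."""
    table = defaultdict(list)
    for name, pts in models.items():
        for basis in permutations(range(len(pts)), 3):
            o, e1, e2 = (pts[i] for i in basis)
            for k, p in enumerate(pts):
                uv = None if k in basis else affine_coords(p, o, e1, e2)
                if uv is not None:
                    table[quantize(uv, bucket)].append((name, basis))
    return table


def vote(table, image, image_basis, bucket=0.25):
    """Online stage: image points vote for (model, basis) via their invariants."""
    o, e1, e2 = (image[i] for i in image_basis)
    votes = Counter()
    for k, p in enumerate(image):
        uv = None if k in image_basis else affine_coords(p, o, e1, e2)
        if uv is not None:
            votes.update(table.get(quantize(uv, bucket), []))
    return votes


models = {"L": [(0, 0), (4, 0), (0, 4), (4, 4), (2, 1)]}
table = build_table(models)
image = [(2 * x + y + 3, y - 1) for x, y in models["L"]]  # an affine view of "L"
print(vote(table, image, image_basis=(0, 1, 2))[("L", (0, 1, 2))])
```

The `bucket` parameter is where the difficulty noted above shows up: buckets that are too small miss matches under noise, while buckets that are too large fill the table with unrelated entries.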

7. Scale-invariant feature transform (SIFT)

Keypoints of objects are first extracted from a set of reference images and stored in a database
An object is recognized in a new image by individually comparing each feature from the new image to this database and finding candidate matching features based on Euclidean distance of their feature vectors.
Lowe (2004)
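
The matching step can be sketched independently of how the descriptors are computed. The sketch below assumes descriptors are plain numeric tuples (real SIFT descriptors are 128-dimensional) and finds each query feature's nearest database feature by Euclidean distance. It also applies a ratio test against the second-nearest neighbour, which is an addition not described in this outline; the threshold of 0.8 and the function name are assumptions made for the example.

```python
from math import dist


def match_descriptors(query, database, ratio=0.8):
    """Return (query_index, database_index) pairs for unambiguous nearest neighbours."""
    matches = []
    for qi, q in enumerate(query):
        # Rank database entries by Euclidean distance to this query descriptor.
        ranked = sorted(range(len(database)), key=lambda di: dist(q, database[di]))
        if not ranked:
            continue
        best = ranked[0]
        if len(ranked) > 1:
            second = ranked[1]
            # Reject matches whose best distance is not clearly below the runner-up.
            if dist(q, database[best]) >= ratio * dist(q, database[second]):
                continue
        matches.append((qi, best))
    return matches


database = [(0, 0, 0, 0), (10, 0, 0, 0), (0, 10, 0, 0)]
query = [(1, 0, 0, 0), (5, 5, 0, 0)]  # the second one is equally far from everything
print(match_descriptors(query, database))  # [(0, 0)]
```

The exhaustive sort is only for clarity; a real database would need an approximate nearest-neighbour index.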

8. Speeded Up Robust Features (SURF)

A robust image detector & descriptor
The standard version is several times faster than SIFT and claimed by its authors to be more robust against different image transformations than SIFT
It is based on sums of approximated 2D Haar wavelet responses and makes efficient use of integral images.
Bay et al. (2008)
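
The role of integral images can be illustrated with a short sketch. Once the integral image is built, the sum over any axis-aligned box costs four lookups regardless of the box size, so a box-shaped Haar response (right half minus left half) costs eight. This is a generic illustration of the data structure, not the SURF detector or descriptor itself; the function names are assumptions made for the example.

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows < y and columns < x (one row/column of padding)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum of the image over rows top..bottom-1 and columns left..right-1."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def haar_x(ii, top, left, size):
    """Horizontal Haar-like response: right half minus left half of a square window."""
    half = size // 2
    right = box_sum(ii, top, left + half, top + size, left + size)
    left_part = box_sum(ii, top, left, top + size, left + half)
    return right - left_part


img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
ii = integral_image(img)
print(box_sum(ii, 1, 1, 3, 3))  # 34
print(haar_x(ii, 0, 0, 4))      # 16
```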
