Outline of Object Recognition - Feature-based Methods

Feature-based Methods

- a search is used to find feasible matches between object features and image features.

- the primary constraint is that a single position of the object must account for all of the feasible matches.

- these methods extract features from the objects to be recognized and from the images to be searched, for example:

  • surface patches
  • corners
  • linear edges

1. Interpretation trees

  • One method for finding feasible matches is to search through a tree.
  • Each node in the tree represents a set of matches.
  • Root node represents empty set
  • Each other node is the union of the matches in the parent node and one additional match.
  • Wildcard is used for features with no match
  • Nodes are “pruned” when the set of matches is infeasible.
  • A pruned node has no children
  • Historically significant and still used, but less commonly
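
The tree search described above can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes 2D point features and uses pairwise-distance agreement as the feasibility test, and the names `consistent`, `interpretation_tree`, and `tol` are hypothetical.

```python
from math import dist

WILDCARD = None  # marks an image feature that has no model match


def consistent(matches, new, image_pts, model_pts, tol):
    """Check a new match (for image feature len(matches)) against its ancestors."""
    if new is WILDCARD:
        return True
    i_new = len(matches)
    for i_old, m_old in enumerate(matches):
        if m_old is WILDCARD:
            continue
        if m_old == new:
            return False  # a model feature may be used only once
        d_img = dist(image_pts[i_new], image_pts[i_old])
        d_mod = dist(model_pts[new], model_pts[m_old])
        if abs(d_img - d_mod) > tol:
            return False  # distances disagree: this node is pruned
    return True


def interpretation_tree(image_pts, model_pts, tol=0.1, matches=()):
    """Depth-first search; yields every feasible full set of matches."""
    if len(matches) == len(image_pts):
        yield matches
        return
    for m in list(range(len(model_pts))) + [WILDCARD]:
        if consistent(matches, m, image_pts, model_pts, tol):
            # child node = parent's matches plus one additional match
            yield from interpretation_tree(image_pts, model_pts, tol, matches + (m,))


model = [(0, 0), (3, 0), (0, 4)]
image = [(10, 10), (13, 10), (10, 14), (50, 50)]  # shifted model plus clutter
results = list(interpretation_tree(image, model))
best = min(results, key=lambda ms: sum(m is WILDCARD for m in ms))
```

Here `best` is the interpretation with the fewest wildcards; the clutter point can only be explained by the wildcard.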

2. Hypothesize and test

  • General Idea:
  • Hypothesize a correspondence between a collection of image features and a collection of object features
  • Then use this to generate a hypothesis about the projection from the object coordinate frame to the image frame
  • Use this projection hypothesis to generate a rendering of the object. This step is usually known as backprojection
  • Compare the rendering to the image, and, if the two are sufficiently similar, accept the hypothesis
  • Obtaining Hypothesis:
  • There are a variety of different ways of generating hypotheses.
  • When camera intrinsic parameters are known, the hypothesis is equivalent to a hypothetical position and orientation – pose – for the object.
  • Utilize geometric constraints
  • Construct a correspondence from each small set of object features to every correctly sized subset of image points. (These are the hypotheses.)
  • Three basic approaches:
  • Obtaining Hypotheses by Pose Consistency
  • Obtaining Hypotheses by Pose Clustering
  • Obtaining Hypotheses by Using Invariants
  • This search is expensive and redundant, but it can be improved using randomization and/or grouping
  • Randomization
§ Examining small sets of image features until likelihood of missing object becomes small
§ For each set of image features, all possible matching sets of model features must be considered.
§ Formula:
$(1 - W^c)^k = Z$
W = the fraction of image points that are “good” (W ≈ m/n)
c = the number of correspondences necessary
k = the number of trials
Z = the probability of every trial using one (or more) incorrect correspondences
  • Grouping
§ If we can determine groups of points that are likely to come from the same object, we can reduce the number of hypotheses that need to be examined
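
The randomization formula, (1 - W^c)^k = Z, can be solved for the number of trials k needed to push the failure probability Z below a chosen target. The sketch below is only that algebra; the function name `trials_needed` and the example values are assumptions for illustration.

```python
from math import ceil, log


def trials_needed(w, c, z):
    """Smallest integer k such that (1 - w**c)**k <= z."""
    if not (0 < w < 1 and 0 < z < 1 and c >= 1):
        raise ValueError("expected 0 < w < 1, 0 < z < 1 and c >= 1")
    # Take logs of (1 - w^c)^k = z and solve for k; both logs are negative.
    return ceil(log(z) / log(1 - w ** c))


# Half of the image points are "good", 3 correspondences per hypothesis,
# and we accept a 1% chance that every trial was contaminated.
k = trials_needed(w=0.5, c=3, z=0.01)  # 35
```

Note how quickly k grows as W shrinks or c grows, which is why grouping (raising the effective W) helps.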

3. Pose consistency

  • Also called Alignment, since the object is being aligned to the image
  • Correspondences between image features and model features are not independent: they are linked by geometric constraints
  • A small number of correspondences yields the object position – the others must be consistent with this
  • General Idea:
  • If we hypothesize a match between a sufficiently large group of image features and a sufficiently large group of object features, then we can recover the missing camera parameters from this hypothesis (and so render the rest of the object)
  • Strategy:
  • Generate hypotheses using small number of correspondences (e.g. triples of points for 3D recognition)
  • Project other model features into image (backproject) and verify additional correspondences
  • Use the smallest number of correspondences necessary to achieve discrete object poses
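
A minimal sketch of this strategy follows, under simplifying assumptions that are not in the outline: features are 2D points stored as complex numbers, and the pose is a 2D similarity (rotation, scale, translation), so two correspondences are the smallest number that fixes a discrete pose. The names `similarity_from_pairs`, `align`, and `tol` are hypothetical.

```python
from itertools import permutations


def similarity_from_pairs(p0, p1, q0, q1):
    """Solve q = a*p + b (complex a, b) from two point correspondences."""
    a = (q1 - q0) / (p1 - p0)
    return a, q0 - a * p0


def align(model, image, tol=0.05):
    """Return (support, a, b) for the pose that the most model points verify."""
    best = (0, None, None)
    for i0, i1 in permutations(range(len(model)), 2):
        for j0, j1 in permutations(range(len(image)), 2):
            # hypothesis: model[i0] -> image[j0] and model[i1] -> image[j1]
            a, b = similarity_from_pairs(model[i0], model[i1], image[j0], image[j1])
            # backproject every model point and count those that land on an image point
            support = sum(
                any(abs(a * p + b - q) <= tol for q in image) for p in model
            )
            if support > best[0]:
                best = (support, a, b)
    return best


model = [0j, 1 + 0j, 1 + 2j, 3j]
true_a, true_b = 2j, 5 + 5j                # rotate 90 degrees, scale by 2, translate
image = [true_a * p + true_b for p in model] + [100 + 100j]  # plus one clutter point
support, a, b = align(model, image)
```

With this toy data the best hypothesis is verified by all four model points and recovers the pose used to generate the image.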

4. Pose clustering

  • General Idea:
  • Each object leads to many correct sets of correspondences, each of which has (roughly) the same pose
  • Vote on pose. Use an accumulator array that represents pose space for each object
  • This is essentially a Hough transform
  • Strategy:
  • For each object, set up an accumulator array that represents pose space – each element in the accumulator array corresponds to a “bucket” in pose space.
  • Then take each image frame group, and hypothesize a correspondence between it and every frame group on every object
  • For each of these correspondences, determine pose parameters and make an entry in the accumulator array for the current object at the pose value.
  • If there are large numbers of votes in any object’s accumulator array, this can be interpreted as evidence for the presence of that object at that pose.
  • The evidence can be checked using a verification method
  • Note that this method uses sets of correspondences, rather than individual correspondences
  • Implementation is easier, since each set yields a small number of possible object poses.
  • Improvement
  • The noise resistance of this method can be improved by not counting votes for objects at poses where the vote is obviously unreliable
§ For example, a vote for a pose at which the object frame group would be invisible.
  • These improvements are sufficient to yield working systems
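
The voting scheme can be sketched as below. This is an illustration under stated assumptions rather than a working recognizer: frame groups are pairs of 2D points (as complex numbers), the pose is a 2D similarity, the accumulator is a `Counter` keyed by quantized pose, and the bucket sizes in `pose_bucket` are arbitrary choices.

```python
from cmath import phase
from collections import Counter
from itertools import permutations
from math import log


def pose_bucket(a, b, ang_step=0.1, scale_step=0.1, trans_step=0.5):
    """Quantize a similarity pose q = a*p + b into an accumulator cell."""
    return (
        round(phase(a) / ang_step),        # rotation angle
        round(log(abs(a)) / scale_step),   # log of scale
        round(b.real / trans_step),        # translation x
        round(b.imag / trans_step),        # translation y
    )


def pose_clustering(model, image):
    """Vote once for every (model pair, image pair) correspondence."""
    acc = Counter()
    for i0, i1 in permutations(range(len(model)), 2):
        for j0, j1 in permutations(range(len(image)), 2):
            a = (image[j1] - image[j0]) / (model[i1] - model[i0])
            b = image[j0] - a * model[i0]
            acc[pose_bucket(a, b)] += 1
    return acc


model = [0j, 1 + 0j, 1 + 2j, 3j]
image = [2j * p + (5 + 5j) for p in model] + [100 + 100j]  # true pose plus clutter
acc = pose_clustering(model, image)
top_bucket, votes = acc.most_common(1)[0]
```

Every correct pair-to-pair correspondence lands in the same bucket, so that bucket stands out against the scattered votes from incorrect correspondences; a verification step would then check the winning pose.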

5. Invariance

  • There are geometric properties that are invariant to camera transformations
  • Most easily developed for images of planar objects, but can be applied to other cases as well
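
One classical example of such a property is the cross-ratio of four collinear points, which is unchanged by a projective transformation of the line. The sketch below checks this numerically with exact fractions; the function names and the particular transformation are chosen only for illustration.

```python
from fractions import Fraction


def cross_ratio(a, b, c, d):
    """Cross-ratio of four distinct points given by their position along a line."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))


def projective_1d(x, h):
    """Apply a 1D projective map x -> (p*x + q) / (r*x + s)."""
    (p, q), (r, s) = h
    return (p * x + q) / (r * x + s)


points = [Fraction(v) for v in (0, 1, 2, 4)]
h = ((2, 1), (1, 3))  # an arbitrary invertible map (2*3 - 1*1 != 0)
warped = [projective_1d(x, h) for x in points]

before = cross_ratio(*points)
after = cross_ratio(*warped)
```

Both `before` and `after` equal 3/2 even though the individual point positions change, which is what makes such quantities usable as pose-independent descriptors.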

6. Geometric hashing

  • An algorithm that uses geometric invariants to vote for object hypotheses
  • Similar to pose clustering, however instead of voting on pose, we are now voting on geometry
  • A technique originally developed for matching geometric features (uncalibrated affine views of plane models) against a database of such features
  • Widely used for pattern-matching, CAD/CAM, and medical imaging.
  • It is difficult to choose the size of the buckets
  • It is hard to be sure how many votes count as “enough”. Therefore there may be some danger that the table will get clogged.
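
A small sketch of the table-and-vote idea follows. It makes a simplification that should be read as an assumption: instead of the affine case with three-point bases, it uses two-point bases and coordinates that are invariant to 2D similarity transforms, with points stored as complex numbers. The names `invariant_key`, `build_table`, `recognize`, and the bucket size `step` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def invariant_key(p, e0, e1, step=0.25):
    """Quantized coordinate of p in the frame defined by basis pair (e0, e1)."""
    z = (p - e0) / (e1 - e0)  # unchanged by rotation, scale and translation
    return (round(z.real / step), round(z.imag / step))


def build_table(models, step=0.25):
    """Offline stage: store (model, basis) under every invariant coordinate."""
    table = defaultdict(list)
    for name, pts in models.items():
        for i0, i1 in permutations(range(len(pts)), 2):
            for k, p in enumerate(pts):
                if k not in (i0, i1):
                    table[invariant_key(p, pts[i0], pts[i1], step)].append((name, i0, i1))
    return table


def recognize(table, image, j0, j1, step=0.25):
    """Online stage: pick an image basis pair and vote for (model, basis) entries."""
    votes = Counter()
    for k, q in enumerate(image):
        if k not in (j0, j1):
            votes.update(table.get(invariant_key(q, image[j0], image[j1], step), []))
    return votes


models = {"L": [0j, 1 + 0j, 1 + 2j, 3j]}
table = build_table(models)
image = [2j * p + (5 + 5j) for p in models["L"]] + [100 + 100j]
votes = recognize(table, image, 0, 1)
winner, count = votes.most_common(1)[0]
```

The choice of `step` is exactly the bucket-size problem noted above: too small and noisy points miss their bucket, too large and unrelated entries pile into the same cell.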

7. Scale-invariant feature transform (SIFT)

  • Keypoints of objects are first extracted from a set of reference images and stored in a database
  • An object is recognized in a new image by individually comparing each feature from the new image to this database and finding candidate matching features based on Euclidean distance of their feature vectors.
  • Lowe (2004)
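
The matching step can be sketched as a nearest-neighbour search over descriptor vectors. The example below is a toy, not SIFT itself: it uses 4-dimensional vectors instead of real 128-dimensional descriptors, a brute-force search, and a nearest-to-second-nearest distance ratio test with an assumed threshold of 0.8 to reject ambiguous matches. The function name `match_descriptors` is hypothetical.

```python
from math import dist


def match_descriptors(query, database, ratio=0.8):
    """Return (query_index, database_index) pairs that pass the ratio test."""
    matches = []
    for qi, q in enumerate(query):
        # rank database descriptors by Euclidean distance to the query descriptor
        ranked = sorted(range(len(database)), key=lambda di: dist(q, database[di]))
        if len(ranked) < 2:
            continue  # the ratio test needs a second-nearest neighbour
        best, second = ranked[0], ranked[1]
        if dist(q, database[best]) < ratio * dist(q, database[second]):
            matches.append((qi, best))
    return matches


database = [(0, 0, 0, 0), (10, 0, 0, 0), (0, 10, 0, 0)]
query = [(0.5, 0, 0, 0),   # clearly closest to database[0]
         (5, 5, 0, 0)]     # equally far from all three: ambiguous
matches = match_descriptors(query, database)
```

Only the first query descriptor yields a candidate match; the second is discarded because its nearest and second-nearest neighbours are equally close.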

8. Speeded Up Robust Features (SURF)

  • A robust local feature detector and descriptor
  • The standard version is several times faster than SIFT and claimed by its authors to be more robust against different image transformations than SIFT
  • Based on sums of approximated 2D Haar wavelet responses and makes efficient use of integral images.
  • Bay et al (2008)
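
To show why integral images make box-shaped responses cheap, here is a minimal sketch of an integral image and a Haar-like difference of two boxes. It illustrates the building block only and is not the SURF detector or descriptor; the function names are hypothetical.

```python
def integral_image(img):
    """ii[y][x] holds the sum of img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum over rows top..bottom-1 and columns left..right-1 using four lookups."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def haar_x(ii, top, left, size):
    """Haar-like response in x: right half minus left half of a size-by-size box."""
    half = size // 2
    right_half = box_sum(ii, top, left + half, top + size, left + size)
    left_half = box_sum(ii, top, left, top + size, left + half)
    return right_half - left_half


img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
ii = integral_image(img)
centre = box_sum(ii, 1, 1, 3, 3)   # 6 + 7 + 10 + 11 = 34
response = haar_x(ii, 0, 0, 4)     # 76 - 60 = 16
```

Each box sum costs four table lookups regardless of the box size, so filter responses at large scales are no more expensive than at small ones.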
