Feature-based Methods
- A search is used to find feasible matches between object features and image features.
- The primary constraint is that a single position of the object must account for all of the feasible matches.
- These methods extract features from both the objects to be recognized and the images to be searched, for example:
- surface patches
- corners
- linear edges
1. Interpretation trees
- One way to search for feasible matches is to search through a tree.
- Each node in the tree represents a set of matches.
- Root node represents empty set
- Each other node is the union of the matches in the parent node and one additional match.
- Wildcard is used for features with no match
- Nodes are “pruned” when the set of matches is infeasible.
- A pruned node has no children
- Historically significant and still in use, though less commonly than before
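The tree search described above can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes features are 2D points, that feasibility means matched pairs preserve pairwise distances within a tolerance `tol`, and that each tree level assigns one image feature. The names `feasible`, `search`, and `best_interpretation` are hypothetical.

```python
from itertools import combinations
from math import dist

WILDCARD = None  # stands for "this image feature matches nothing"

def feasible(matches, model, image, tol):
    """Assumed constraint: matched pairs must preserve pairwise distances."""
    real = [(i, m) for i, m in enumerate(matches) if m is not WILDCARD]
    for (i1, m1), (i2, m2) in combinations(real, 2):
        if m1 == m2:
            return False  # one model feature cannot explain two image features
        if abs(dist(image[i1], image[i2]) - dist(model[m1], model[m2])) > tol:
            return False
    return True

def search(model, image, tol=0.1, matches=()):
    """Depth-first walk from the root (empty set); yields every feasible leaf."""
    if len(matches) == len(image):
        yield matches
        return
    for candidate in list(range(len(model))) + [WILDCARD]:
        child = matches + (candidate,)  # parent's matches plus one more
        if feasible(child, model, image, tol):  # otherwise pruned: no children
            yield from search(model, image, tol, child)

def best_interpretation(model, image, tol=0.1):
    """Pick the leaf that explains the most image features without wildcards."""
    return max(search(model, image, tol),
               key=lambda ms: sum(m is not WILDCARD for m in ms))

model = [(0, 0), (3, 0), (0, 4)]
image = [(10, 10), (13, 10), (10, 14), (50, 50)]  # shifted model plus clutter
print(best_interpretation(model, image))
```

Each returned tuple lists, per image feature, the index of the matched model feature or the wildcard. Pruning matters because an infeasible partial assignment removes its entire subtree from the search.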
2. Hypothesize and test
- General Idea:
- Hypothesize a correspondence between a collection of image features and a collection of object features
- Then use this to generate a hypothesis about the projection from the object coordinate frame to the image frame
- Use this projection hypothesis to generate a rendering of the object. This step is usually known as backprojection
- Compare the rendering to the image, and, if the two are sufficiently similar, accept the hypothesis
- Obtaining Hypothesis:
- There are a variety of different ways of generating hypotheses.
- When camera intrinsic parameters are known, the hypothesis is equivalent to a hypothetical position and orientation – pose – for the object.
- Utilize geometric constraints
- Construct a correspondence between each small set of object features and every correctly sized subset of image points; these correspondences are the hypotheses.
- Three basic approaches:
- Obtaining Hypotheses by Pose Consistency
- Obtaining Hypotheses by Pose Clustering
- Obtaining Hypotheses by Using Invariants
- This search is expensive and also redundant, but it can be improved using randomization and/or grouping:
- Randomization
- Examine small sets of image features until the likelihood of missing the object becomes small
- For each set of image features, all possible matching sets of model features must be considered.
- Formula:
- $(1 - W^c)^k = Z$
- W = the fraction of image points that are “good” (W ≈ m/n)
- c = the number of correspondences necessary
- k = the number of trials
- Z = the probability of every trial using one (or more) incorrect correspondences
- Grouping
- If we can determine groups of points that are likely to come from the same object, we can reduce the number of hypotheses that need to be examined
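To show how the randomization formula is used in practice, it can be solved for the number of trials: k = log Z / log(1 - W^c). The sketch below assumes 0 < W < 1 and 0 < Z < 1; the function names are hypothetical.

```python
from math import ceil, log

def failure_probability(W, c, k):
    """Z = (1 - W^c)^k: chance that all k trials contain a bad correspondence."""
    return (1 - W**c) ** k

def trials_needed(W, c, Z):
    """Smallest k with (1 - W^c)^k <= Z, assuming 0 < W < 1 and 0 < Z < 1."""
    return ceil(log(Z) / log(1 - W**c))

# Half the image points are good, three correspondences per hypothesis,
# and we tolerate a 1% chance of never drawing an all-good sample.
print(trials_needed(0.5, 3, 0.01))
```

Because W is raised to the power c, the required number of trials grows quickly with c, which is one reason to use the smallest set of correspondences that fixes the pose.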
3. Pose consistency
- Also called Alignment, since the object is being aligned to the image
- Correspondences between image features and model features are not independent: they are tied together by geometric constraints
- A small number of correspondences yields the object position, and all other correspondences must be consistent with it
- General Idea:
- If we hypothesize a match between a sufficiently large group of image features and a sufficiently large group of object features, then we can recover the missing camera parameters from this hypothesis (and so render the rest of the object)
- Strategy:
- Generate hypotheses using a small number of correspondences (e.g. triples of points for 3D recognition)
- Project other model features into image (backproject) and verify additional correspondences
- Use the smallest number of correspondences necessary to achieve discrete object poses
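The strategy above can be illustrated in a simplified 2D setting. This sketch assumes the unknown pose is a planar similarity transform (rotation, scale, translation), for which two point correspondences are the minimal set, rather than the point triples used for 3D recognition. Points are written as complex numbers so that a pose is z -> a*z + b. The helper names, the tolerance `tol`, and the choice of one fixed model pair are all assumptions of this example.

```python
from itertools import permutations

def pose_from_pairs(m1, m2, i1, i2):
    """Similarity z -> a*z + b that maps model m1, m2 onto image i1, i2."""
    a = (i2 - i1) / (m2 - m1)
    b = i1 - a * m1
    return a, b

def support(pose, model, image, tol):
    """Backproject every model point; count those landing near an image point."""
    a, b = pose
    return sum(any(abs(a * m + b - p) <= tol for p in image) for m in model)

def align(model, image, tol=0.05):
    """Hypothesize a pose from each ordered image pair, keep the best verified.

    Assumes the first two model points are both visible in the image.
    """
    m1, m2 = model[0], model[1]
    best = (0, None)
    for i1, i2 in permutations(image, 2):
        pose = pose_from_pairs(m1, m2, i1, i2)
        score = support(pose, model, image, tol)
        if score > best[0]:
            best = (score, pose)
    return best

model = [0j, 2 + 0j, 2 + 1j, 3j]
true_a, true_b = 2j, 5 + 5j                      # rotate 90 degrees, scale 2, shift
image = [true_a * m + true_b for m in model] + [40 - 7j]  # plus one clutter point
score, (a, b) = align(model, image)
print(score, a, b)
```

The hypothesis step uses only two correspondences; every remaining model point then serves as an independent check on the hypothesized pose.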
4. Pose clustering
- General Idea:
- Each object leads to many correct sets of correspondences, each of which has (roughly) the same pose
- Vote on pose. Use an accumulator array that represents pose space for each object
- This is essentially a Hough transform
- Strategy:
- For each object, set up an accumulator array that represents pose space – each element in the accumulator array corresponds to a “bucket” in pose space.
- Then take each image frame group, and hypothesize a correspondence between it and every frame group on every object
- For each of these correspondences, determine pose parameters and make an entry in the accumulator array for the current object at the pose value.
- If there are large numbers of votes in any object’s accumulator array, this can be interpreted as evidence for the presence of that object at that pose.
- The evidence can be checked using a verification method
- Note that this method uses sets of correspondences, rather than individual correspondences
- Implementation is easier, since each set yields a small number of possible object poses.
- Improvement
- The noise resistance of this method can be improved by not counting votes for objects at poses where the vote is obviously unreliable
- For example, discard votes for a pose at which the object frame group would be invisible.
- These improvements are sufficient to yield working systems
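A small sketch of the voting scheme for a single object follows. It makes several simplifying assumptions: poses are 2D similarity transforms z -> a*z + b on points written as complex numbers, a frame group is an ordered pair of points, and the accumulator is a sparse `Counter` keyed by quantized pose. The bucket sizes and function names are hypothetical choices.

```python
from cmath import phase
from collections import Counter
from itertools import permutations
from math import pi

def bucket(a, b, angle_step=pi / 18, scale_step=0.25, shift_step=1.0):
    """Quantize a pose (a, b) into one cell of the pose-space accumulator."""
    return (round(phase(a) / angle_step), round(abs(a) / scale_step),
            round(b.real / shift_step), round(b.imag / shift_step))

def vote(model, image):
    """Every (model pair, image pair) correspondence votes for the pose it implies."""
    accumulator = Counter()
    for m1, m2 in permutations(model, 2):
        for i1, i2 in permutations(image, 2):
            a = (i2 - i1) / (m2 - m1)
            b = i1 - a * m1
            accumulator[bucket(a, b)] += 1
    return accumulator

model = [0j, 2 + 0j, 2 + 1j, 3j]
image = [2j * m + (5 + 5j) for m in model] + [40 - 7j]  # true pose plus clutter
accumulator = vote(model, image)
print(accumulator[bucket(2j, 5 + 5j)], sum(accumulator.values()))
```

Each of the 12 ordered model pairs has exactly one correct image pair, so the cell containing the true pose receives at least 12 of the 240 votes, while incorrect correspondences tend to scatter across pose space. A strong cell is then passed to a verification step.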
5. Invariance
- There are geometric properties that are invariant to camera transformations
- Most easily developed for images of planar objects, but can be applied to other cases as well
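As one concrete example of such a property for planar objects, the coordinates of a point expressed in the frame of three non-collinear points on the same plane do not change under an affine map of the plane. The sketch below checks this numerically. The matrix `A` and offset `t` are arbitrary illustrative values, and the function names are hypothetical.

```python
def affine_coords(p, basis):
    """Coordinates (alpha, beta) with p = b0 + alpha*(b1 - b0) + beta*(b2 - b0)."""
    (x0, y0), (x1, y1), (x2, y2) = basis
    ux, uy = x1 - x0, y1 - y0
    vx, vy = x2 - x0, y2 - y0
    det = ux * vy - uy * vx  # zero when the basis points are collinear
    px, py = p[0] - x0, p[1] - y0
    return ((px * vy - py * vx) / det, (ux * py - uy * px) / det)

def affine_map(p, A=((2.0, 1.0), (0.5, 3.0)), t=(4.0, -1.0)):
    """A hypothetical affine view of the plane: p -> A p + t."""
    x, y = p
    return (A[0][0] * x + A[0][1] * y + t[0],
            A[1][0] * x + A[1][1] * y + t[1])

basis = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
p = (0.25, 0.5)
before = affine_coords(p, basis)
after = affine_coords(affine_map(p), [affine_map(b) for b in basis])
print(before, after)  # the two pairs agree up to rounding
```

Because the coordinates survive the transformation, they can be computed from the image alone and compared directly with values stored for the model, without first recovering the pose.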
6. Geometric hashing
- An algorithm that uses geometric invariants to vote for object hypotheses
- Similar to pose clustering, but instead of voting on pose we now vote on geometry
- A technique originally developed for matching geometric features (uncalibrated affine views of plane models) against a database of such features
- Widely used for pattern-matching, CAD/CAM, and medical imaging.
- It is difficult to choose the size of the buckets
- It is hard to be sure how many votes count as “enough”. There may therefore be some danger that the table will get clogged.
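The table-and-vote idea can be sketched compactly. To keep the example short it swaps in a two-point basis with similarity invariance (rotation, scale, translation) in place of the three-point affine basis of the original setting. Points are complex numbers, and `cell` is the bucket size whose choice is noted above as difficult. All names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations

def invariant_key(p, b0, b1, cell=0.25):
    """Quantized coordinate of p in the frame of the basis pair (b0, b1)."""
    z = (p - b0) / (b1 - b0)  # unchanged by rotation, scaling, translation
    return (round(z.real / cell), round(z.imag / cell))

def build_table(models, cell=0.25):
    """Offline: file (model name, basis pair) under every invariant key."""
    table = defaultdict(set)
    for name, pts in models.items():
        for b0, b1 in permutations(pts, 2):
            for p in pts:
                if p not in (b0, b1):
                    table[invariant_key(p, b0, b1, cell)].add((name, b0, b1))
    return table

def recognize(table, image, b0, b1, cell=0.25):
    """Online: fix one image basis pair and let the other image points vote."""
    votes = Counter()
    for p in image:
        if p not in (b0, b1):
            for entry in table.get(invariant_key(p, b0, b1, cell), ()):
                votes[entry] += 1
    return votes

models = {"quad": [0j, 2 + 0j, 2 + 1j, 3j], "tri": [0j, 1 + 0j, 5j]}
table = build_table(models)
image = [2j * m + (5 + 5j) for m in models["quad"]] + [40 - 7j]
print(recognize(table, image, image[0], image[1]))
```

In this run the basis pair happens to come from the model "quad", so both of its remaining points vote for the entry `("quad", 0j, 2+0j)`, and the clutter point finds no entry. A full system would try several image basis pairs and verify any entry with enough votes.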
7. Scale-invariant feature transform (SIFT)
- Keypoints of objects are first extracted from a set of reference images and stored in a database
- An object is recognized in a new image by individually comparing each feature from the new image to this database and finding candidate matching features based on Euclidean distance of their feature vectors.
- Lowe (2004)
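The matching step can be sketched as a nearest-neighbour search over descriptor vectors. The example uses short toy vectors in place of real SIFT descriptors, and it adds a distance-ratio check (nearest versus second-nearest neighbour, threshold 0.8) of the kind described by Lowe (2004). The function name and data are hypothetical, and the database is assumed to hold at least two descriptors.

```python
from math import dist

def match_features(query, database, ratio=0.8):
    """Return (query index, database index) pairs that pass the ratio check."""
    matches = []
    for qi, q in enumerate(query):
        # Rank database descriptors by Euclidean distance to the query vector.
        ranked = sorted(range(len(database)), key=lambda di: dist(q, database[di]))
        best, second = ranked[0], ranked[1]
        # Keep the match only if it is clearly better than the runner-up.
        if dist(q, database[best]) < ratio * dist(q, database[second]):
            matches.append((qi, best))
    return matches

database = [(0, 0, 0, 0), (1, 0, 0, 0), (0, 5, 5, 0)]
query = [(0.1, 0, 0, 0), (0.5, 0, 0, 0), (0, 5, 4.8, 0)]
print(match_features(query, database))
```

The second query vector is equally close to two database entries, so it is rejected as ambiguous. The linear scan is only for illustration; large databases call for an approximate nearest-neighbour index.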
8. Speeded Up Robust Features (SURF)
- A robust local feature detector and descriptor
- The standard version is several times faster than SIFT and claimed by its authors to be more robust against different image transformations than SIFT
- Based on sums of approximated 2D Haar wavelet responses, and makes efficient use of integral images.
- Bay et al. (2008)
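The role of integral images can be shown with a short sketch: once the integral image is built, the sum over any axis-aligned box costs four lookups, so a Haar-like response (one box minus another) costs the same at every scale. This is a generic illustration of the idea, not the SURF implementation, and the function names are hypothetical.

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows < y and columns < x (zero border)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, top, left, bottom, right):
    """Sum over rows top..bottom-1 and columns left..right-1 in four lookups."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

def haar_x(ii, top, left, size):
    """Horizontal Haar-like response: right half minus left half of a square."""
    half = size // 2
    return (box_sum(ii, top, left + half, top + size, left + size)
            - box_sum(ii, top, left, top + size, left + half))

img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
ii = integral_image(img)
print(box_sum(ii, 1, 1, 3, 3), haar_x(ii, 0, 0, 4))  # 34 16
```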