LabelMe - Problems With The Data

The LabelMe dataset has several problems worth noting. Some are inherent in the data: objects are not uniformly distributed with respect to size or location in the image, because the images were taken mostly by people, who tend to point the camera at the interesting objects in a scene. Randomly cropping and rescaling the images can, however, simulate a uniform distribution. Other problems stem from the freedom given to users of the annotation tool, including:

  • The user can choose which objects in the scene to outline. Should an occluded person be labeled? Should the sky be labeled?
  • The user has to describe the shape of the object themselves by outlining a polygon. Should the fingers of a hand on a person be outlined with detail? How much precision must be used when outlining objects?
  • The user chooses what text to enter as the label for the object. Should the label be person, man, or pedestrian?
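The labeling question in the last item is often handled downstream by mapping free-text labels onto a smaller set of canonical class names. The following is a minimal sketch of that idea, not part of LabelMe itself: the `SYNONYMS` table and the `canonical_label` helper are hypothetical, and the grouping of "man" and "pedestrian" under "person" is an assumption made only for illustration.

```python
# Hypothetical synonym table: canonical class -> free-text labels that map to it.
# LabelMe does not prescribe such a table; this grouping is an assumption.
SYNONYMS = {
    "person": {"person", "man", "pedestrian"},
}

# Invert the table into a lookup from raw label to canonical class.
LOOKUP = {raw: canon for canon, raws in SYNONYMS.items() for raw in raws}


def canonical_label(raw: str) -> str:
    """Return the canonical class for a free-text label.

    The label is lowercased and its whitespace collapsed first.
    Labels missing from the table are returned in cleaned form.
    """
    cleaned = " ".join(raw.lower().split())
    return LOOKUP.get(cleaned, cleaned)
```

Whether such merging is desirable depends on the task: a pedestrian detector may want "pedestrian" kept distinct from "person", so the table would need to be chosen per experiment.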

The creators of LabelMe left these decisions to the annotator, reasoning that people tend to annotate images according to what they consider the natural labeling. This freedom also introduces variability into the data, which can help researchers tune their algorithms to account for such variability.
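The random cropping and rescaling mentioned earlier, as a remedy for the uneven distribution of object sizes and locations, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than a procedure taken from LabelMe: the function names, the `min_frac` parameter, and the representation of an outline as a list of `(x, y)` vertices are all hypothetical. Only the annotation geometry is transformed here; the same crop window and output size would be applied to the pixels with an image library.

```python
import random


def random_crop_box(width, height, min_frac=0.5, rng=random):
    """Pick a random crop window (left, top, w, h) inside a width x height image.

    min_frac is an assumed lower bound on the crop size, expressed as a
    fraction of the original width and height.
    """
    frac = rng.uniform(min_frac, 1.0)
    w = max(1, int(width * frac))
    h = max(1, int(height * frac))
    left = rng.randint(0, width - w)   # randint is inclusive on both ends
    top = rng.randint(0, height - h)
    return left, top, w, h


def transform_polygon(polygon, box, out_size):
    """Map polygon vertices into the cropped and rescaled coordinate frame.

    Vertices that lay outside the crop window end up outside the output
    frame; clipping the polygon is deliberately left out of this sketch.
    """
    left, top, w, h = box
    out_w, out_h = out_size
    sx, sy = out_w / w, out_h / h
    return [((x - left) * sx, (y - top) * sy) for x, y in polygon]
```

Drawing a fresh box for each training sample moves and resizes the same object from one sample to the next, which is what spreads the size and location statistics. Passing a seeded `random.Random` instance as `rng` makes the crops reproducible.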
