Internet Media Type - Limitations

Limitations

Internet media types are often used as part of a communication protocol between two applications (the source and destination). In this context, internet media type specifiers experience several problems.

The first problem is the ability of the source application (i.e. web server, email client) to correctly determine an internet media type for a piece of content. Many applications attempt to heuristically classify a file using its filename extension or with magic numbers. Neither approach is perfect, and may incorrectly classify a content's media type:

  • Incorrect filename extension: a filename extension classifier will report an incorrect media type. For instance, some applications incorrectly give Rich text format files the .doc file extensions, instead of the correct .rtf extension.
  • No filename extension: a filename extension classifier will report no media type, or will (incorrectly) report a catch-all type such as application/octet-stream. Files without extension are common on unix systems.
  • Filename extension collisions: when multiple formats use the same filename extension, a filename extension classifier will choose one media type arbitrarily. For instance, both Microsoft Word templates and graphviz graph files use the extension .dot.
  • Ambiguous container formats: a magic number classifier can may give an correct, though non-specific, media type, thus preventing a meaningful interpretation of the content. For instance, Office Open XML (.docx) format and Java executable (.jar) are both implemented internally as a zipped archive. A magic number system may classify such files as application/zip instead of the more specific type. Similar problems occur between XML and application formats implemented on top of XML.
  • Ambiguous magic numbers: an attacker can create a file which is identified simultaneously as two separate internet media types. For instance, the internal structure of a Gifar makes it both a valid GIF image and Java executable.

The second problem is the destination application's ability to trust the internet media type reported by the sender. As above, the internet media type is incorrect in some circumstances, and must be treated with skepticism. As early as 2002, the W3C unambiguously warned that it is a "serious error" if internet media type is incorrect, and that software should not attempt to guess a correct media type. Nonetheless, software engineering principles encourage software that forgives a certain degree of malformed input, and user experience suffers when software fails to correctly interpret the content. Consequently, the many destination applications are designed to attempt recovery from such errors and identify a correct media type.

The destination application has no more knowledge of the content than the source application, and attempts to infer the media type at the destination are equally difficult. This can lead to incompatibilities between source and destination applications, and in the worst-case, security vulnerabilities such as the Gifar attack or Cross-site scripting attacks. Advanced content sniffing approaches have been proposed to balance interoperability and security in such situations.

Read more about this topic:  Internet Media Type

Famous quotes containing the word limitations:

    Much of what contrives to create critical moments in parenting stems from a fundamental misunderstanding as to what the child is capable of at any given age. If a parent misjudges a child’s limitations as well as his own abilities, the potential exists for unreasonable expectations, frustration, disappointment and an unrealistic belief that what the child really needs is to be punished.
    Lawrence Balter (20th century)

    The limitations of pleasure cannot be overcome by more pleasure.
    Mason Cooley (b. 1927)

    To note an artist’s limitations is but to define his talent. A reporter can write equally well about everything that is presented to his view, but a creative writer can do his best only with what lies within the range and character of his deepest sympathies.
    Willa Cather (1876–1947)