Open Document Technical Specification - Format Internals

Format Internals

An OpenDocument file is usually a standard ZIP archive (JAR archive) that holds several subdocuments as files and directories. A file can also consist of a single XML document under one root element, intended for use in document processing, but that form is not widely used. According to the OpenDocument 1.0 specification, the ZIP file format is defined in Info-ZIP Application Note 970311, 1997.

The compression applied to a package normally makes OpenDocument files significantly smaller than equivalent Microsoft ".doc" or ".ppt" files. The smaller size matters to organizations that store a vast number of documents for long periods of time, and to those that must exchange documents over low-bandwidth connections. Once uncompressed, most data is contained in simple text-based XML files, so the contents are as easy to modify and process as any other XML file.
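As a rough illustration of the two forms described above, the following Python sketch checks whether a byte string is a ZIP package or a single XML document and reports the root element it finds. The function names `describe_opendocument` and `make_demo_package` are hypothetical, and the demo package is a stand-in built only for this example, not a valid OpenDocument file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def describe_opendocument(data: bytes) -> dict:
    """Report whether `data` looks like a ZIP package or a single XML document."""
    buffer = io.BytesIO(data)
    if zipfile.is_zipfile(buffer):
        with zipfile.ZipFile(buffer) as package:
            names = package.namelist()
            # In a packaged file, content.xml carries the document body.
            root = ET.fromstring(package.read("content.xml"))
        return {"kind": "package", "members": names, "root_tag": root.tag}
    # Assumption: anything that is not a ZIP archive is one XML document.
    root = ET.fromstring(data)
    return {"kind": "single-xml", "members": [], "root_tag": root.tag}


def make_demo_package() -> bytes:
    """Build a tiny in-memory package for demonstration (hypothetical content)."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w") as package:
        package.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        package.writestr("content.xml", "<doc><p>Hello</p></doc>")
    return out.getvalue()
```

Calling `describe_opendocument(make_demo_package())` takes the package branch, while passing plain XML bytes takes the single-document branch. Only the Python standard library is used, which is possible because the container is an ordinary ZIP archive.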

The standard allows the inclusion of directories to store images, non-SMIL animations, and other files that are used by the document but cannot be expressed directly in the XML.

Because the compression format is openly specified, a user can extract the container file and edit the contained files by hand. This allows repair of a corrupted file or low-level manipulation of the contents. Many programs that implement the OpenDocument format do not use high compression levels, so users can reduce file sizes by recompressing the package with more aggressive compression programs. Combined with image optimisation of the contained pictures, this has been seen to give over 40% reduction in file size compared with a file saved directly from an OpenDocument-compatible program.
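The recompression idea can be sketched in Python with the standard `zipfile` module. The function name `recompress_package` is hypothetical. The sketch assumes the `mimetype` entry should stay first and uncompressed, which is the usual packaging convention, and it does not touch the images inside the package. Actual savings depend on the document and are not guaranteed to match the figure quoted above.

```python
import io
import zipfile


def recompress_package(data: bytes, level: int = 9) -> bytes:
    """Rewrite a ZIP package, compressing members with DEFLATE at `level`."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, zipfile.ZipFile(out, "w") as dst:
        # infolist() preserves the original member order.
        for info in src.infolist():
            payload = src.read(info.filename)
            if info.filename == "mimetype":
                # Assumption: the mimetype entry is kept uncompressed.
                dst.writestr(info.filename, payload,
                             compress_type=zipfile.ZIP_STORED)
            else:
                dst.writestr(info.filename, payload,
                             compress_type=zipfile.ZIP_DEFLATED,
                             compresslevel=level)
    return out.getvalue()
```

Timestamps and other per-member attributes are not copied in this sketch, so it is best tried on a copy of the file.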

The zipped set of files and directories includes the following:

  • XML files
    • content.xml
    • meta.xml
    • settings.xml
    • styles.xml
  • Other files
    • mimetype
  • Directories
    • META-INF/
      • manifest.xml
    • Thumbnails/
      • thumbnail.png
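The listing above can be turned into a simple presence check. The Python sketch below reports which of the listed members are absent from a package. The names `EXPECTED_MEMBERS` and `missing_members` are hypothetical, the thumbnail is left out because it is treated here as optional, and the result is only a comparison against this list, not a conformance test.

```python
import io
import zipfile

# Members taken from the listing above (thumbnail omitted as optional).
EXPECTED_MEMBERS = (
    "mimetype",
    "content.xml",
    "meta.xml",
    "settings.xml",
    "styles.xml",
    "META-INF/manifest.xml",
)


def missing_members(data: bytes) -> list:
    """Return the expected member names that are absent from the package."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        present = set(package.namelist())
    return [name for name in EXPECTED_MEMBERS if name not in present]
```

An empty result means every listed member is present; a non-empty result names the members to look for when repairing a damaged file.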

The OpenDocument format provides a strong separation between content, layout and metadata. The most notable components of the format are described in the subsections below. The structure of the XML files is defined by schemas written in RELAX NG, a language for defining XML schemas. RELAX NG is itself defined by an OASIS specification, as well as by part two of the international standard ISO/IEC 19757: Document Schema Definition Languages (DSDL).
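One practical consequence of this separation is that metadata can be read without parsing the document body. The Python sketch below pulls a title out of the text of a meta.xml file. The function name `read_title` is hypothetical, and the sketch assumes the title is stored in a Dublin Core `title` element, as is typical for OpenDocument metadata.

```python
import xml.etree.ElementTree as ET

# Dublin Core namespace, assumed to qualify the title element in meta.xml.
DC_NS = "http://purl.org/dc/elements/1.1/"


def read_title(meta_xml: str):
    """Return the text of the first dc:title element, or None if absent."""
    root = ET.fromstring(meta_xml)
    node = next(root.iter(f"{{{DC_NS}}}title"), None)
    return node.text if node is not None else None
```

The same approach applies to styles.xml for layout and content.xml for the body, each of which can be processed on its own.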
