Comma-separated Values - Basic Rules and Examples

Basic Rules and Examples

Many informal documents exist that describe "CSV" formats. IETF RFC 4180 (summarized above) defines the format for the "text/csv" MIME type registered with the IANA (Shafranovich 2005). Another relevant specification is provided by Fielded Text. Creativyst (2010) provides an overview of the variations found in the most widely used applications and explains how CSV can best be used and supported.

Rules typical of these and other "CSV" specifications and implementations are as follows:

  • CSV is a delimited data format that has fields/columns separated by the comma character and records/rows terminated by newlines.
  • A CSV file does not require a specific character encoding, byte order, or line terminator format (some software does not support all line-end variations).
  • A record ends at a line terminator. However, line terminators can be embedded as data within fields, so software must recognize quoted line terminators (see below) in order to correctly assemble an entire record that may span multiple lines.
  • All records should have the same number of fields, in the same order.
  • Data within fields is interpreted as a sequence of characters, not as a sequence of bits or bytes (see RFC 2046, section 4.1). For example, the numeric quantity 65535 may be represented as the 5 ASCII characters "65535" (or perhaps other forms such as "0xFFFF", "000065535.000E+00", etc.); but not as a sequence of 2 bytes intended to be treated as a single binary integer rather than as two characters. If this "plain text" convention is not followed, then the CSV file no longer contains sufficient information to interpret it correctly, the CSV file will not likely survive transmission across differing computer architectures, and will not conform to the text/csv MIME type.
  • Adjacent fields must be separated by a single comma. However, "CSV" formats vary greatly in this choice of separator character. In particular, in locales where the comma is used as a decimal separator, semicolon, TAB, or other characters are used instead.
1997,Ford,E350
  • Any field may be quoted (that is, enclosed within double-quote characters). Some fields must be quoted, as specified in following rules.
"1997","Ford","E350"
  • Fields with embedded commas must be quoted.
1997,Ford,E350,"Super, luxurious truck"
  • Fields with embedded double-quote characters must be quoted, and each embedded double-quote character must be represented by a pair of double-quote characters.
1997,Ford,E350,"Super, ""luxurious"" truck"
  • Fields with embedded line breaks must be quoted (however, many CSV implementations simply do not support this).
1997,Ford,E350,"Go get one now
they are going fast"
  • In some CSV implementations, leading and trailing spaces and tabs are trimmed. This practice is controversial, and does not accord with RFC 4180, which states "Spaces are considered part of a field and should not be ignored."
1997, Ford, E350
not same as
1997,Ford,E350
  • In CSV implementations that do trim leading or trailing spaces, fields with such spaces as meaningful data must be quoted.
1997,Ford,E350," Super luxurious truck "
  • The first record may be a "header", which contains column names in each of the fields (there is no reliable way to tell whether a file does this or not; however, it is uncommon to use characters other than letters, digits, and underscores in such column names).
Year,Make,Model
1997,Ford,E350
2000,Mercury,Cougar
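
The quoting rules above can be checked with a short round trip. The following is a minimal sketch, assuming Python and its standard-library csv module (neither is mentioned in the sources cited here); the Description column and the variable names are illustrative only. It writes fields containing a comma, double quotes, a line break, and surrounding spaces, reads them back, and then parses a semicolon-separated line of the kind used in locales where the comma is the decimal separator.

```python
import csv
import io

# Sample records built from the examples above; the "Description"
# column name is an assumption added for illustration.
rows = [
    ["Year", "Make", "Model", "Description"],
    ["1997", "Ford", "E350", 'Super, "luxurious" truck'],
    ["1997", "Ford", "E350", "Go get one now\nthey are going fast"],
    ["1997", "Ford", "E350", " Super luxurious truck "],
]

# Write with the module's default dialect: comma delimiter, fields
# quoted only when necessary, embedded double quotes doubled.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
encoded = buffer.getvalue()

# Read the text back. The reader reassembles the record whose quoted
# field spans two physical lines, and keeps leading/trailing spaces.
parsed = list(csv.reader(io.StringIO(encoded, newline="")))

# Spaces after a comma are part of the field unless the reader is
# explicitly told to skip them.
spaced = next(csv.reader(["1997, Ford, E350"]))

# A semicolon-delimited variant, where "3,5" is a decimal number.
semicolon_rows = list(csv.reader(["1997;Ford;3,5"], delimiter=";"))

if __name__ == "__main__":
    print(encoded)
    print(parsed == rows)
    print(spaced)
    print(semicolon_rows)
```

Other CSV libraries may differ in their defaults, particularly for space trimming and embedded line breaks, so a round trip of this kind is worth repeating with whatever tool actually consumes the file.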
