Basic Rules and Examples
Many informal documents describe "CSV" formats. IETF RFC 4180 (summarized above) defines the format for the "text/csv" MIME type registered with the IANA (Shafranovich 2005). Another relevant specification is provided by Fielded Text. Creativyst (2010) provides an overview of the variations used in the most widely used applications and explains how CSV can best be used and supported.
Rules typical of these and other "CSV" specifications and implementations are as follows:
- CSV is a delimited data format that has fields/columns separated by the comma character and records/rows terminated by newlines.
- A CSV file does not require a specific character encoding, byte order, or line terminator format (some software does not support all line-end variations).
- A record ends at a line terminator. However, line-terminators can be embedded as data within fields, so software must recognize quoted line-separators (see below) in order to correctly assemble an entire record from perhaps multiple lines.
- All records should have the same number of fields, in the same order.
- Data within fields is interpreted as a sequence of characters, not as a sequence of bits or bytes (see RFC 2046, section 4.1). For example, the numeric quantity 65535 may be represented as the 5 ASCII characters "65535" (or perhaps other forms such as "0xFFFF", "000065535.000E+00", etc.), but not as a sequence of 2 bytes intended to be treated as a single binary integer rather than as two characters. If this "plain text" convention is not followed, the CSV file no longer contains sufficient information to be interpreted correctly, is unlikely to survive transmission across differing computer architectures, and does not conform to the text/csv MIME type.
- Adjacent fields must be separated by a single comma. However, "CSV" formats vary greatly in this choice of separator character. In particular, in locales where the comma is used as a decimal separator, a semicolon, TAB, or another character is used as the field separator instead.
- Any field may be quoted (that is, enclosed within double-quote characters). Some fields must be quoted, as specified in following rules.
- Fields with embedded commas must be quoted.
- Fields with embedded double-quote characters must be quoted, and each of the embedded double-quote characters must be represented by a pair of double-quote characters.
- Fields with embedded line breaks must be quoted (however, many CSV implementations simply do not support this).
- In some CSV implementations, leading and trailing spaces and tabs are trimmed. This practice is controversial, and does not accord with RFC 4180, which states "Spaces are considered part of a field and should not be ignored."
- In CSV implementations that do trim leading or trailing spaces, fields with such spaces as meaningful data must be quoted.
- The first record may be a "header", which contains column names in each of the fields (there is no reliable way to tell whether a file does this or not; however, it is uncommon to use characters other than letters, digits, and underscores in such column names).
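The quoting rules above can be illustrated with a short sketch using Python's standard csv module. The sample record and the variable names are hypothetical, chosen only so that each field exercises one rule. The module's default dialect is one implementation among many, and other CSV software may behave differently, particularly for embedded line breaks and surrounding spaces.

```python
import csv
import io

# Hypothetical sample record: each field exercises one of the quoting rules.
record = [
    "plain",               # no special characters
    "Smith, John",         # embedded comma: must be quoted
    'He said "hi"',        # embedded double quotes: quoted, each one doubled
    "line one\nline two",  # embedded line break: must be quoted
    "  padded  ",          # surrounding spaces: part of the field data
]

# Write the record; the default dialect quotes a field only when it needs to.
buffer = io.StringIO(newline="")
csv.writer(buffer, lineterminator="\r\n").writerow(record)
encoded = buffer.getvalue()
print(repr(encoded))

# Read it back: the parser assembles one record from two physical lines
# because the line break sits inside a quoted field.
decoded = next(csv.reader(io.StringIO(encoded, newline="")))
print(decoded == record)

# A semicolon-separated variant, as used where the comma is the decimal mark.
semicolon_row = next(csv.reader(io.StringIO("1,5;2,5\r\n", newline=""), delimiter=";"))
print(semicolon_row)
```

In this sketch the reader does not trim the spaces around the last field, which matches the RFC 4180 statement that spaces are part of a field. An implementation that trims would need that field quoted in order to preserve them.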