Comma-separated Values - Toward Standardization


The huge variety among "CSV" formats has led to the assertion that there is no "CSV standard". In common usage, almost any delimiter-separated text data may be referred to as a "CSV" file, and a file written under one set of conventions (delimiter, quoting, line endings) may not be read correctly by software that expects another.

Nevertheless, RFC 4180 is an effort to formalize CSV. It defines the MIME type "text/csv", and CSV files that follow its rules should be very widely portable. Among its requirements:

  • MS-DOS-style lines that end with CR LF characters (the line break is optional on the last line).
  • An optional header record (there is no sure way to detect whether it is present, so care is required when importing).
  • Each record "should" contain the same number of comma-separated fields.
  • Any field may be quoted (with double quotes).
  • Fields containing a line break, double quote, or comma should be quoted (if they are not, the file will likely be impossible to process correctly).
  • A double-quote character inside a field must be represented by two double-quote characters.
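The rules above can be illustrated with a short writer. This is a minimal Python sketch, not a reference implementation: the function names `encode_field` and `encode_records` are hypothetical, fields are assumed to be strings already, and a field is quoted only when it contains a comma, double quote, CR, or LF (the rules also permit quoting any field). It rejects records whose field count differs from the first record's, which is stricter than the RFC's "should".

```python
def encode_field(value: str) -> str:
    """Quote a field only if it contains a comma, double quote, CR, or LF,
    doubling any embedded double quotes."""
    if any(ch in value for ch in (",", '"', "\r", "\n")):
        return '"' + value.replace('"', '""') + '"'
    return value


def encode_records(records: list[list[str]]) -> str:
    """Join fields with commas and end every record with CRLF.

    Assumption: this sketch treats a differing field count as an error,
    although RFC 4180 only says records "should" have the same count.
    """
    width = len(records[0]) if records else 0
    lines = []
    for record in records:
        if len(record) != width:
            raise ValueError(f"expected {width} fields, got {len(record)}")
        lines.append(",".join(encode_field(f) for f in record) + "\r\n")
    return "".join(lines)


if __name__ == "__main__":
    rows = [
        ["name", "comment"],            # optional header record
        ["Alice", 'She said "hi"'],     # embedded double quotes
        ["Bob", "line one\nline two"],  # embedded line break
        ["Carol", "a, b"],              # embedded comma
    ]
    print(repr(encode_records(rows)))
    # 'name,comment\r\nAlice,"She said ""hi"""\r\nBob,"line one\nline two"\r\nCarol,"a, b"\r\n'
```

Quoting only when needed keeps plain fields readable; a writer that quotes every field would also conform.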

The format is simple and can be processed by most programs that claim to read CSV files. The main exceptions are that (a) some programs do not support line breaks within quoted fields, and (b) some programs may mistake the optional header for data, or interpret the first data line as a header.
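Exception (a) arises because a reader that splits its input on line breaks before handling quotes will cut a quoted field in two. The following Python sketch (the name `parse_records` is hypothetical) scans character by character instead, so commas and line breaks inside quotes stay part of the field. It assumes the input is already decoded text, leniently accepts a bare LF or CR as a record terminator, and does not reject every malformed input (for example, a quote in the middle of an unquoted field). It also makes no attempt to detect a header: the caller must know whether the first record is one, which is the ambiguity behind (b).

```python
def parse_records(text: str) -> list[list[str]]:
    """Split CSV text into records of fields, honouring quoted fields.

    Inside quotes, commas, CR and LF are literal data and "" decodes to ".
    """
    records: list[list[str]] = []
    record: list[str] = []
    field: list[str] = []
    in_quotes = False
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < n and text[i + 1] == '"':
                    field.append('"')   # doubled quote -> one literal quote
                    i += 1
                else:
                    in_quotes = False   # closing quote
            else:
                field.append(ch)        # commas and line breaks kept verbatim
        elif ch == '"':
            in_quotes = True            # opening quote
        elif ch == ",":
            record.append("".join(field))
            field = []
        elif ch in "\r\n":
            if ch == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1                  # consume the LF of a CRLF pair
            record.append("".join(field))
            records.append(record)
            record, field = [], []
        else:
            field.append(ch)
        i += 1
    if in_quotes:
        raise ValueError("unterminated quoted field")
    if text and text[-1] not in "\r\n":
        record.append("".join(field))   # last record without a trailing line break
        records.append(record)
    return records


if __name__ == "__main__":
    sample = 'id,note\r\n1,"first line\r\nsecond line"\r\n2,plain\r\n'
    naive = [line.split(",") for line in sample.splitlines()]
    print(len(naive))              # 4: the quoted line break splits record 1
    header, *data = parse_records(sample)  # caller asserts a header is present
    print(header)                  # ['id', 'note']
    print(data)                    # [['1', 'first line\r\nsecond line'], ['2', 'plain']]
```

In practice, Python's standard `csv` module already handles quoted line breaks when files are opened with `newline=''`; the sketch only makes the mechanism visible.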
