Crlf - Representations

Representations

Software applications and operating systems usually represent a newline with one or two control characters:

  • Systems based on ASCII or a compatible character set use either LF (Line feed, '\n', 0x0A, 10 in decimal) or CR (Carriage return, '\r', 0x0D, 13 in decimal) individually, or CR followed by LF (CR+LF, '\r\n', 0x0D0A). These characters are based on printer commands: The line feed indicated that one line of paper should feed out of the printer thus instructed the printer to advance the paper one line, and a carriage return indicated that the printer carriage should return to the beginning of the current line. Some rare systems, such as QNX before version 4, used the ASCII RS (record separator, 0x1E, 30 in decimal) character as the newline character.
    • LF: Multics, Unix and Unix-like systems (GNU/Linux, Mac OS X, FreeBSD, AIX, Xenix, etc.), BeOS, Amiga, RISC OS and others.
    • CR+LF: Microsoft Windows, DEC TOPS-10, RT-11 and most other early non-Unix and non-IBM OSes, CP/M, MP/M, DOS (MS-DOS, PC-DOS, etc.), Atari TOS, OS/2, Symbian OS, Palm OS
    • LF+CR: Acorn BBC and RISC OS spooled text output.
    • CR: Commodore 8-bit machines, Acorn BBC, TRS-80, Apple II family, Mac OS up to version 9 and OS-9
    • RS: QNX pre-POSIX implementation.
  • EBCDIC systems—mainly IBM mainframe systems, including z/OS (OS/390) and i5/OS (OS/400)—use NL (New Line, 0x15) as the character combining the functions of line-feed and carriage-return. The equivalent UNICODE character is called NEL (Next Line). Note that EBCDIC also has control characters called CR and LF, but the numerical value of LF (0x25) differs from the one used by ASCII (0x0A). Additionally, there are some EBCDIC variants that also use NL but assign a different numeric code to the character.
  • Operating systems for the CDC 6000 series defined a newline as two or more zero-valued six-bit characters at the end of a 60-bit word. Some configurations also defined a zero-valued character as a colon character, with the result that multiple colons could be interpreted as a newline depending on position.
  • ZX80 and ZX81, home computers from Sinclair Research Ltd used a specific non-ASCII character set with code NEWLINE (0x76, 118 decimal) as the newline character.
  • OpenVMS uses a record-based file system, which stores text files as one record per line. In most file formats, no line terminators are actually stored, but the Record Management Services facility can transparently add a terminator to each line when it is retrieved by an application. The records themselves could contain the same line terminator characters, which could either be considered a feature or a nuisance depending on the application.
  • Fixed line length was used by some early mainframe operating systems. In such a system, an implicit end-of-line was assumed every 72 or 80 characters, for example. No newline character was stored. If a file was imported from the outside world, lines shorter than the line length had to be padded with spaces, while lines longer than the line length had to be truncated. This mimicked the use of punched cards, on which each line was stored on a separate card, usually with 80 columns on each card, often with sequence numbers in columns 73-80. Many of these systems added a carriage control character to the start of the next record; this could indicate if the next record was a continuation of the line started by the previous record, or a new line, or should overprint the previous line (similar to a CR). Often this was a normal printing character such as '#' that thus could not be used as the first character in a line. Some early line printers interpreted these characters directly in the records sent to them.

Most textual Internet protocols (including HTTP, SMTP, FTP, IRC and many others) mandate the use of ASCII CR+LF (0x0D 0x0A) on the protocol level, but recommend that tolerant applications recognize lone LF as well. In practice, there are many applications that erroneously use the C newline character '\n' instead (see section Newline in programming languages below). This leads to problems when trying to communicate with systems adhering to a stricter interpretation of the standards; one such system is the qmail MTA that actively refuses to accept messages from systems that send bare LF instead of the required CR+LF.

FTP has a feature to transform newlines between CR+LF and LF only when transferring text files. This must not be used on binary files. Usually binary files and text files are recognised by checking their filename extension.

Read more about this topic:  Crlf