- published: 01 Oct 2012
In computing, plain text is the contents of an ordinary sequential file that is readable as textual material without much processing. It is usually contrasted with formatted text and with "binary files", in which some portions must be interpreted as binary objects (encoded integers, real numbers, images, etc.).
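The distinction between plain text and binary data can be approximated in code. The sketch below is a rough heuristic, not a standard test: it assumes a file is plain text if its bytes decode cleanly in a chosen encoding (UTF-8 by default) and the result contains only printable characters plus common whitespace. The function name `looks_like_plain_text` is hypothetical.

```python
import struct


def looks_like_plain_text(data: bytes, encoding: str = "utf-8") -> bool:
    """Hypothetical heuristic: True if `data` decodes cleanly in `encoding`
    and contains no control characters other than common whitespace."""
    try:
        text = data.decode(encoding)
    except UnicodeDecodeError:
        # Byte sequences that are invalid in the encoding suggest binary data.
        return False
    allowed_whitespace = {"\n", "\r", "\t", "\f"}
    return all(ch.isprintable() or ch in allowed_whitespace for ch in text)


if __name__ == "__main__":
    print(looks_like_plain_text(b"Hello, world\n"))         # readable text
    print(looks_like_plain_text(struct.pack("<i", 1)))      # packed integer
```

A packed integer such as `struct.pack("<i", 1)` yields bytes that do decode as UTF-8 but map to control characters, so the heuristic rejects it; real tools use more elaborate rules.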
The encoding has traditionally been ASCII, one of its many derivatives such as ISO/IEC 646, or sometimes EBCDIC. Unicode-based encodings such as UTF-8 and UTF-16 are gradually replacing the older ASCII derivatives, which are limited to 7- or 8-bit codes.
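To illustrate how these encodings relate, the following minimal sketch (using only Python's built-in codecs; the helper name `encode_all` is hypothetical) encodes the same string as ASCII, UTF-8, and UTF-16 (little-endian, no byte-order mark). For ASCII-only text, UTF-8 produces exactly the same bytes as ASCII, while UTF-16 uses two bytes per character; characters outside ASCII cannot be encoded in ASCII at all.

```python
from typing import Dict, Optional


def encode_all(text: str) -> Dict[str, Optional[bytes]]:
    """Hypothetical helper: encode `text` in several encodings.
    None marks an encoding that cannot represent some character."""
    results: Dict[str, Optional[bytes]] = {}
    for name in ("ascii", "utf-8", "utf-16-le"):
        try:
            results[name] = text.encode(name)
        except UnicodeEncodeError:
            results[name] = None  # character not representable in this encoding
    return results


if __name__ == "__main__":
    print(encode_all("Plain text"))
    print(encode_all("naïve"))  # "ï" has no ASCII code
```

This backward compatibility of UTF-8 with ASCII is one reason UTF-8 can replace older ASCII-based encodings with little disruption.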
Files that contain markup or other metadata are generally considered plain text, as long as the entire file remains in directly human-readable form, as in HTML and XML (as Coombs, Renear, and DeRose argue, punctuation is itself markup). Using plain text rather than bit streams to express markup enables files to survive much better "in the wild", in part because it makes them largely immune to incompatibilities between computer architectures.
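One concrete architecture incompatibility is byte order. The sketch below, which uses Python's standard `struct` module, stores the integer 258 as a raw 32-bit value in little-endian and big-endian order and as the text "258". The raw bytes differ between the two orders, and reading one with the other's convention gives a wrong value, whereas the textual form reads the same on any machine. The choice of value and encodings is only for illustration.

```python
import struct

value = 258

# Raw binary storage depends on the machine's byte order.
little = struct.pack("<I", value)  # b'\x02\x01\x00\x00'
big = struct.pack(">I", value)     # b'\x00\x00\x01\x02'

# Reading big-endian bytes as little-endian yields a different number.
misread = struct.unpack("<I", big)[0]

# The plain-text form is byte-order independent.
as_text = str(value).encode("ascii")  # b'258'
recovered = int(as_text.decode("ascii"))

if __name__ == "__main__":
    print(little, big, misread, as_text, recovered)
```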
More formally, the fundamental distinction of "plain text" is that no information would be lost if the file were translated to a completely different character encoding, or translated to no encoding at all by simply printing it out (provided the printer's font lets every character be distinguished correctly). No information is conveyed by the fact that an "A" in the printout was originally stored as a byte with value 65 (as in ASCII) or with value 193 (as in EBCDIC), and that byte was certainly not meant to encode part of a multi-byte binary integer.
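The "no information lost" property can be checked with a round trip between encodings. This minimal sketch assumes Python's built-in `cp037` codec as a representative EBCDIC code page (EBCDIC has several variants); the helper `transcode` is hypothetical. It shows that "A" is byte 193 in EBCDIC and 65 in ASCII, and that converting a message from EBCDIC to ASCII and back preserves the text exactly.

```python
def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Hypothetical helper: reinterpret bytes from encoding `src` as text,
    then store that text in encoding `dst`."""
    return data.decode(src).encode(dst)


ebcdic_a = "A".encode("cp037")  # EBCDIC (code page 037): b'\xc1'
ascii_a = "A".encode("ascii")   # ASCII: b'A'

message = "PLAIN TEXT 123"
as_ebcdic = message.encode("cp037")
as_ascii = transcode(as_ebcdic, "cp037", "ascii")
back_to_ebcdic = transcode(as_ascii, "ascii", "cp037")

if __name__ == "__main__":
    print(ebcdic_a[0], ascii_a[0])           # byte values differ
    print(as_ascii.decode("ascii") == message)  # text is unchanged
```

Only the byte values change; the characters, and therefore the information, are the same in both encodings.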