A code is a rule for converting a piece of information (for example, a letter, word, phrase, or gesture) into another form or representation (one sign into another sign), not necessarily of the same type.
In communications and information processing, encoding is the process by which information from a source is converted into symbols to be communicated. Decoding is the reverse process, converting these code symbols back into information understandable by a receiver.
One reason for coding is to enable communication in places where ordinary spoken or written language is difficult or impossible. An example is semaphore, where the configuration of flags held by a signaller, or of the arms of a semaphore tower, encodes parts of the message, typically individual letters and numbers. Another person standing a great distance away can interpret the flags and reproduce the words sent.
Theory
In information theory and computer science, a code is usually considered as an algorithm which uniquely represents symbols from some source alphabet by ''encoded'' strings, which may be in some other target alphabet. An extension of the code for representing sequences of symbols over the source alphabet is obtained by concatenating the encoded strings.
Before giving a mathematically precise definition, we give a brief example. The mapping
: $C = \{\, a \mapsto 0,\ b \mapsto 01,\ c \mapsto 011 \,\}$
is a code, whose source alphabet is the set $\{a, b, c\}$ and whose target alphabet is the set $\{0, 1\}$. Using the extension of the code, the encoded string ''0011001011'' can be grouped into codewords as ''0 – 011 – 0 – 01 – 011'', and these in turn can be decoded to the sequence of source symbols ''acabc''.
Using terms from formal language theory, the precise mathematical definition of this concept is as follows: Let $S$ and $T$ be two finite sets, called the source and target alphabets, respectively. A code $C : S \to T^*$ is a total function mapping each symbol from $S$ to a sequence of symbols over $T$. The extension of $C$ to a homomorphism of $S^*$ into $T^*$, which naturally maps each sequence of source symbols to a sequence of target symbols, is referred to as its extension.
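The definition can be illustrated with a minimal Python sketch. It assumes the mapping a ↦ 0, b ↦ 01, c ↦ 011, which is the one consistent with the worked example above; the names `CODE`, `encode` and `decodings` are illustrative only. Encoding is the extension by concatenation, and decoding is done by a simple dynamic-programming search that lists every source string with the given encoding.

```python
from typing import Dict, List

# Example code: source alphabet {a, b, c}, target alphabet {0, 1}.
# (Mapping assumed from the worked example in the text.)
CODE: Dict[str, str] = {"a": "0", "b": "01", "c": "011"}


def encode(message: str, code: Dict[str, str] = CODE) -> str:
    """Extension of the code: concatenate the codeword of each source symbol."""
    return "".join(code[symbol] for symbol in message)


def decodings(encoded: str, code: Dict[str, str] = CODE) -> List[str]:
    """Return every source string whose encoding equals `encoded`."""
    # parses[i] holds all decodings of the first i target symbols.
    parses: List[List[str]] = [[] for _ in range(len(encoded) + 1)]
    parses[0] = [""]
    for end in range(1, len(encoded) + 1):
        for symbol, word in code.items():
            start = end - len(word)
            if start >= 0 and encoded[start:end] == word:
                parses[end].extend(prefix + symbol for prefix in parses[start])
    return parses[len(encoded)]


print(encode("acabc"))          # 0011001011
print(decodings("0011001011"))  # ['acabc']
```

For this particular code the list always has at most one element, because no codeword is a suffix of another; for an ambiguous code the same function would return several candidates.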
Variable-length codes
In this section we consider codes that encode each source (clear text) character by a code word from some dictionary; concatenating such code words gives an encoded string.
Variable-length codes are especially useful when clear text characters have different probabilities; see also entropy encoding.
A ''prefix code'' is a code with the "prefix property": there is no valid code word in the system that is a prefix (start) of any other valid code word in the set. Huffman coding is the best-known algorithm for deriving prefix codes, so prefix codes are also widely referred to as "Huffman codes", even when the code was not produced by a Huffman algorithm.
Other examples of prefix codes are country calling codes, the country and publisher parts of ISBNs, and the Secondary Synchronization Codes used in the UMTS W-CDMA 3G Wireless Standard.
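As an illustration of how a prefix code can be derived from symbol frequencies, the following is a minimal sketch of the textbook form of Huffman's algorithm. The function name `huffman_code` and the handling of the single-symbol case are choices made for this example, not part of any standard API.

```python
import heapq
from collections import Counter
from typing import Dict


def huffman_code(text: str) -> Dict[str, str]:
    """Build a binary prefix code from the symbol frequencies in `text`."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a non-empty code word.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: partial code word}).
    # The unique tie_breaker keeps the dicts from ever being compared.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prepending one more bit to each side.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


code = huffman_code("aaaabbc")
print({s: len(w) for s, w in sorted(code.items())})  # {'a': 1, 'b': 2, 'c': 2}
```

The most frequent symbol receives the shortest code word, which is the property that makes variable-length codes useful for compression.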
Kraft's inequality characterizes the sets of code word lengths that are possible in a prefix code. In fact, any uniquely decodable code, not necessarily a prefix code, must satisfy Kraft's inequality.
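For code word lengths $l_1, \dots, l_n$ over a target alphabet of $r$ symbols, Kraft's inequality reads $\sum_i r^{-l_i} \le 1$. The sketch below checks both the inequality and the prefix property; the function names are illustrative, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from typing import Iterable


def kraft_sum(lengths: Iterable[int], alphabet_size: int = 2) -> Fraction:
    """Sum of alphabet_size ** -length over all code word lengths."""
    return sum((Fraction(1, alphabet_size ** n) for n in lengths), Fraction(0))


def is_prefix_code(words: Iterable[str]) -> bool:
    """True if no code word is a prefix of (or equal to) another."""
    ordered = sorted(words)
    # After sorting, a word that is a prefix of any later word
    # is also a prefix of its immediate successor.
    return all(not b.startswith(a) for a, b in zip(ordered, ordered[1:]))


print(is_prefix_code(["0", "10", "11"]), kraft_sum([1, 2, 2]))   # True 1
print(is_prefix_code(["0", "01", "011"]), kraft_sum([1, 2, 3]))  # False 7/8
print(kraft_sum([1, 1, 2]) <= 1)                                 # False
```

The second line shows a code that is not a prefix code yet still satisfies the inequality; the lengths in the third line violate it, so no uniquely decodable binary code can have them.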
Block codes
Error correcting codes
Codes may also be used to represent data in a way more resistant to errors in transmission or storage. Such a code is called an error-correcting code, and works by including carefully crafted redundancy with the stored (or transmitted) data. Examples include Hamming codes, Reed–Solomon, Reed–Muller, Walsh–Hadamard, Bose–Chaudhuri–Hochquenghem, Turbo, Golay, Goppa, low-density parity-check codes, and space–time codes.
Error-detecting codes can be optimised to detect ''burst errors'' or ''random errors''.
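To show how redundancy allows an error to be corrected, here is a minimal sketch of the Hamming(7,4) code, one of the Hamming codes mentioned above. It assumes the common layout with even-parity bits at positions 1, 2 and 4; the function names are illustrative.

```python
from typing import List


def hamming74_encode(data: List[int]) -> List[int]:
    """Encode 4 data bits as a 7-bit Hamming code word (even parity)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # checks positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # checks positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # checks positions 4, 5, 6, 7
    # Positions 1..7 hold: p1 p2 d1 p4 d2 d3 d4
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(word: List[int]) -> List[int]:
    """Correct at most one flipped bit and return the 4 data bits."""
    w = list(word)
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]
    s4 = w[3] ^ w[4] ^ w[5] ^ w[6]
    # The syndrome is the 1-based position of the flipped bit, or 0 if none.
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        w[syndrome - 1] ^= 1
    return [w[2], w[4], w[5], w[6]]


sent = hamming74_encode([1, 0, 1, 1])
print(sent)                        # [0, 1, 1, 0, 0, 1, 1]
received = list(sent)
received[4] ^= 1                   # simulate a single-bit transmission error
print(hamming74_decode(received))  # [1, 0, 1, 1]
```

Three redundant bits are enough to locate any single flipped bit among the seven; two or more flipped bits would be miscorrected by this decoder.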
Examples
Codes in communication used for brevity
A cable code replaces words (e.g., ''ship'' or ''invoice'') with shorter words, allowing the same information to be sent with fewer characters, more quickly, and, most importantly, less expensively.
Codes can be used for brevity. When telegraph messages were the state of the art in rapid long distance communication, elaborate systems of commercial codes that encoded complete phrases into single words (commonly five-letter groups) were developed, so that telegraphers became conversant with such "words" as ''BYOXO'' ("Are you trying to weasel out of our deal?"), ''LIOUY'' ("Why do you not answer my question?"), ''BMULD'' ("You're a skunk!"), or ''AYYLU'' ("Not clearly coded, repeat more clearly."). Code words were chosen for various reasons: length, pronounceability, etc. Meanings were chosen to fit perceived needs: commercial negotiations, military terms for military codes, diplomatic terms for diplomatic codes, any and all of the preceding for espionage codes. Codebooks and codebook publishers proliferated, including one run as a front for the American Black Chamber run by Herbert Yardley between the First and Second World Wars. The purpose of most of these codes was to save on cable costs. The use of data coding for data compression predates the computer era; an early example is the telegraph Morse code where more-frequently used characters have shorter representations. Techniques such as Huffman coding are now used by computer-based algorithms to compress large data files into a more compact form for storage or transmission.
Character encodings
Probably the most widely known data communications code (also known as a character representation) in use today is ASCII. In one or another (somewhat compatible) version, it is used by nearly all personal computers, terminals, printers, and other communication equipment. It represents 128 characters with seven-bit binary numbers, that is, as a string of seven 1s and 0s. In ASCII a lowercase "a" is always 1100001, an uppercase "A" always 1000001, and so on. There are many other encodings, which represent each character by a byte (usually referred to as code pages), an integer code point (Unicode), or a byte sequence (UTF-8).
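These representations can be inspected directly with Python's built-in string functions. The sketch below is only an illustration: the helper `ascii_bits` is a name invented for this example, and Windows-1252 (`cp1252`) is chosen arbitrarily as a sample code page.

```python
def ascii_bits(char: str) -> str:
    """Seven-bit binary string for an ASCII character."""
    code_point = ord(char)
    if code_point > 127:
        raise ValueError(f"{char!r} is not an ASCII character")
    return format(code_point, "07b")


print(ascii_bits("a"))  # 1100001
print(ascii_bits("A"))  # 1000001

# A character outside ASCII: e with acute accent, written as an escape.
e_acute = "\u00e9"
print(ord(e_acute))              # 233 (Unicode code point U+00E9)
print(e_acute.encode("cp1252"))  # b'\xe9' (one byte in this code page)
print(e_acute.encode("utf-8"))   # b'\xc3\xa9' (two-byte UTF-8 sequence)
```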
Genetic code
Biological organisms contain genetic material that is used to control their function and development. This is DNA, which contains units named genes that can produce proteins through a code (the genetic code) in which each triplet (codon) of nucleotides, drawn from four possible nucleotides, is translated into one of twenty possible amino acids. A sequence of codons results in a corresponding sequence of amino acids that form a protein.
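The codon-to-amino-acid mapping behaves like a lookup table applied to consecutive triplets. The following sketch uses a small excerpt of the standard genetic code, written with messenger RNA codons (U in place of the T found in DNA); the table is deliberately partial and the function name `translate` is illustrative.

```python
from typing import Dict, List

# Partial codon table (mRNA codons); the full table has 64 entries.
CODON_TABLE: Dict[str, str] = {
    "AUG": "Met",  # also the usual start codon
    "UUU": "Phe", "UUC": "Phe",
    "GGU": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "GCU": "Ala", "GCC": "Ala", "GCA": "Ala", "GCG": "Ala",
    "UGG": "Trp",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}


def translate(mrna: str) -> List[str]:
    """Translate an mRNA string codon by codon until a stop codon."""
    protein: List[str] = []
    for i in range(0, len(mrna) - 2, 3):
        # Raises KeyError for codons missing from this partial table.
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


print(translate("AUGUUUGGCUAA"))  # ['Met', 'Phe', 'Gly']
```

Several codons map to the same amino acid, so the code is many-to-one: 64 possible triplets encode twenty amino acids plus the stop signal.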
Gödel code
In mathematics, a Gödel code was the basis for the proof of Gödel's incompleteness theorem. Here, the idea was to map mathematical notation to a natural number (using a Gödel numbering).
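One commonly described numbering scheme encodes a sequence of positive integers $x_1, x_2, \dots, x_n$ (each standing for a symbol) as the product $2^{x_1} \cdot 3^{x_2} \cdot 5^{x_3} \cdots$, which can be decoded uniquely by prime factorization. The sketch below assumes that scheme; how symbols are assigned to integers is left open, and the function names are illustrative.

```python
from typing import Iterator, List


def primes() -> Iterator[int]:
    """Yield 2, 3, 5, ... by trial division (fine for short sequences)."""
    found: List[int] = []
    candidate = 2
    while True:
        if all(candidate % p for p in found):
            found.append(candidate)
            yield candidate
        candidate += 1


def godel_encode(symbols: List[int]) -> int:
    """Encode positive integers (x1, ..., xn) as 2**x1 * 3**x2 * 5**x3 * ..."""
    if any(x < 1 for x in symbols):
        raise ValueError("symbol codes must be positive integers")
    number = 1
    for p, x in zip(primes(), symbols):
        number *= p ** x
    return number


def godel_decode(number: int) -> List[int]:
    """Recover the sequence by reading off the exponents of successive primes."""
    if number < 1:
        raise ValueError("expected a positive integer")
    symbols: List[int] = []
    for p in primes():
        if number == 1:
            break
        exponent = 0
        while number % p == 0:
            number //= p
            exponent += 1
        if exponent == 0:
            raise ValueError("not a valid encoding of a sequence of positive integers")
        symbols.append(exponent)
    return symbols


print(godel_encode([3, 1, 2]))  # 600, i.e. 2**3 * 3**1 * 5**2
print(godel_decode(600))        # [3, 1, 2]
```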
Other
There are codes using colors, like traffic lights, the color code employed to mark the nominal value of electrical resistors, or that of the trashcans devoted to specific types of garbage (paper, glass, biological, etc.).
In marketing, coupon codes can be used for a financial discount or rebate when purchasing a product from an internet retailer.
In military environments, specific cornet calls are used for different purposes: to mark certain moments of the day, to command the infantry on the battlefield, etc.
Communication systems for sensory impairments, such as sign language for deaf people and braille for blind people, are based on movement or tactile codes.
Musical scores are the most common way to encode music.
Specific games, such as chess, have their own code systems to record the matches (chess notation).
Cryptography
In the history of cryptography, codes were once common for ensuring the confidentiality of communications, although ciphers are now used instead. See code (cryptography).
Secret codes intended to obscure the real messages, ranging from serious (mainly espionage in military, diplomatic, business, etc.) to trivial (romance, games) can be any kind of imaginative encoding: flowers, game cards, clothes, fans, hats, melodies, birds, etc., in which the sole requisite is the previous agreement of the meaning by both the sender and the receiver.
Other examples
Other examples of encoding include:
Encoding (in cognition) is a basic perceptual process of interpreting incoming stimuli; technically speaking, it is a complex, multi-stage process of converting relatively objective sensory input (e.g., light, sound) into subjectively meaningful experience.
A content format is a specific encoding format for converting a specific type of data to information.
Text encoding uses a markup language to tag the structure and other features of a text to facilitate processing by computers. (See also Text Encoding Initiative.)
Semantics encoding of formal language A in formal language B is a method of representing all terms (e.g. programs or descriptions) of language A using language B.
Electronic encoding transforms a signal into a code optimized for transmission or storage, generally done with a codec.
Neural encoding is the way in which information is represented in neurons.
Memory encoding is the process of converting sensations into memories.
Television encoding: NTSC, PAL and SECAM
Other examples of decoding include:
Digital-to-analog converter, the use of analog circuit for decoding operations
Decoding (Computer Science)
Decoding methods, methods in communication theory for decoding codewords sent over a noisy channel
Digital signal processing, the study of signals in a digital representation and the processing methods of these signals
Word decoding, the use of phonics to decipher print patterns and translate them into the sounds of language
Codes and acronyms
Acronyms and abbreviations can be considered codes, and in a sense all languages and writing systems are codes for human thought.
International Air Transport Association airport codes are three-letter codes used to designate airports and used for bag tags.
Occasionally a code word achieves an independent existence (and meaning) while the original equivalent phrase is forgotten or at least no longer has the precise meaning attributed to the code word. For example, '30' was widely used in journalism to mean "end of story", and it is sometimes used in other contexts to signify "the end".
See also
Asemic writing
Equipment codes
Semiotics
Quantum error correction