- published: 02 Jan 2014
In character encoding terminology, a code point or code position is any of the numerical values that make up the code space (or code page). For example, ASCII comprises 128 code points in the hexadecimal range 0 to 7F, Extended ASCII comprises 256 code points in the hexadecimal range 0 to FF, and Unicode comprises 1,114,112 code points in the hexadecimal range 0 to 10FFFF. The Unicode code space is divided into seventeen planes (the basic multilingual plane and 16 supplementary planes), each with 65,536 (= 2^16) code points. Thus the total size of the Unicode code space is 17 × 65,536 = 1,114,112.
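The plane arithmetic above can be checked with a short sketch. The names `PLANE_COUNT`, `PLANE_SIZE`, `MAX_CODE_POINT`, and `plane_of` are illustrative choices made here, not terms from any standard library; the only assumption is that a plane index is the code point divided by 65,536.

```python
PLANE_COUNT = 17          # basic multilingual plane + 16 supplementary planes
PLANE_SIZE = 2 ** 16      # 65,536 code points per plane
MAX_CODE_POINT = PLANE_COUNT * PLANE_SIZE - 1  # highest code point, hex 10FFFF


def plane_of(code_point: int) -> int:
    """Return the plane index (0 = basic multilingual plane) of a code point."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError(f"outside the Unicode code space: {code_point:#x}")
    # Each plane spans 2^16 values, so the plane is the value above the low 16 bits.
    return code_point >> 16


print(PLANE_COUNT * PLANE_SIZE)   # 1114112
print(hex(MAX_CODE_POINT))        # 0x10ffff
print(plane_of(0x41))             # 0 (LATIN CAPITAL LETTER A)
print(plane_of(0x1F600))          # 1 (first supplementary plane)
```

Shifting right by 16 bits is equivalent to integer division by 65,536, which is why the seventeen planes are numbered 0 through 16 (hex 0 through 10).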
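The notion of a code point is used for abstraction, to distinguish both the number from its encoding as a particular sequence of bits, and the abstract character from a particular graphical representation (glyph).
This is because one may wish to encode a given code space in different ways, or to display a character via different glyphs.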
For Unicode, the particular sequence of bits is made up of code units. In the UCS-4 encoding, each code point is encoded as a single 4-byte (octet) code unit, which is fixed-width and simple but inefficient. In the UTF-8 encoding, each code point is encoded as a sequence of one to four 1-byte code units, which is variable-width, hence more efficient but more complex, and backward-compatible with ASCII.
Code points are normally assigned to abstract characters. An abstract character is not a graphical glyph but a unit of textual data; the precise appearance of the character depends on the font. However, code points may also be left reserved for future assignment (most of the Unicode code space is unassigned), or given other designated functions.
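As a minimal sketch of the distinction between a code point and its encoded form, the following encodes the same four code points in UTF-8 and in a fixed-width 4-byte form. The helper `code_units` is hypothetical, written for this illustration. Python's `utf-32-be` codec is used here as a stand-in for UCS-4, on the assumption that the two produce the same 4-byte units for valid Unicode code points.

```python
def code_units(text: str, encoding: str, unit_size: int) -> list[int]:
    """Encode text, then split the bytes into code units of unit_size bytes each."""
    data = text.encode(encoding)
    return [
        int.from_bytes(data[i:i + unit_size], "big")
        for i in range(0, len(data), unit_size)
    ]


# Code points for: A, e with acute accent, euro sign, an emoji in plane 1.
for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
    ch = chr(cp)  # the abstract character assigned to this code point
    utf8 = code_units(ch, "utf-8", 1)        # one to four 1-byte code units
    utf32 = code_units(ch, "utf-32-be", 4)   # always one 4-byte code unit
    utf8_hex = " ".join(f"{u:02X}" for u in utf8)
    utf32_hex = " ".join(f"{u:08X}" for u in utf32)
    print(f"U+{cp:04X}  UTF-8: {utf8_hex:<11}  UTF-32: {utf32_hex}")
```

The code point itself never changes; only the number and size of the code units differ. In the fixed-width form the single code unit equals the code point, while UTF-8 spends one byte on ASCII and up to four bytes on supplementary-plane characters.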