Unihan Database Lookup

 


About the Unihan Database Lookup Tool

As a handy reference, the Unicode Consortium provides this search interface to the Unicode Hàn (漢) Database (Unihan).

The Unihan Database organizes information relating to the properties of CJK Unified Ideographs. Unihan Database Documentation is available in UAX #38.

For production reasons, the version of the Unihan database available via this lookup tool may not be in sync with the latest version of the Unicode Standard. The current version available via the lookup tool is Version 6.0. For access to the most recent version of the raw data files (Unihan.zip), see http://www.unicode.org/Public/UNIDATA/.

The lookup interface on this page provides access to Unihan information on individual characters through the “Lookup” button and text field above. Enter the four- or five-digit hexadecimal identifier for the character (if you know it), or copy and paste a character (if you have one), and then click the “Lookup” button.
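The two input styles described above (a hexadecimal identifier or a pasted character) can be reduced to a single code point before any lookup. The following is a minimal sketch, not the tool's actual implementation: the function names `parse_lookup_input` and `format_code_point` are hypothetical, and accepting an optional `U+` prefix is an assumption that the page does not mention.

```python
import re

# Assumption: four or five hex digits, optionally prefixed with "U+".
_HEX_ID = re.compile(r"(?:U\+)?([0-9A-Fa-f]{4,5})")


def parse_lookup_input(text: str) -> int:
    """Return the code point for a hex identifier or a single pasted character."""
    text = text.strip()
    match = _HEX_ID.fullmatch(text)
    if match:
        # Input such as "6F22" or "2A6D6" is read as a hexadecimal identifier.
        return int(match.group(1), 16)
    if len(text) == 1:
        # Input such as "漢" is a pasted character; take its code point.
        return ord(text)
    raise ValueError(f"not a hex identifier or a single character: {text!r}")


def format_code_point(code_point: int) -> str:
    """Format a code point in the conventional U+XXXX notation."""
    return f"U+{code_point:04X}"


print(format_code_point(parse_lookup_input("漢")))
print(format_code_point(parse_lookup_input("6f22")))
```

Both calls resolve to the same character, so either input style leads to the same record.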

The resulting data set will contain various types of information available in the Unihan database, for example, mappings to legacy encoding standards, references to major dictionaries, and meaning and pronunciation information according to various authorities.
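For readers working with the raw data files rather than the lookup page, the same kinds of information can be gathered per character. The sketch below assumes the tab-separated line layout described in UAX #38 (code point, field name, value, with `#` marking comment lines). The function name `parse_unihan_lines` is hypothetical, and the sample lines are illustrative stand-ins rather than text quoted from a release.

```python
from collections import defaultdict


def parse_unihan_lines(lines):
    """Group Unihan-style lines into {code_point: {field: value}}."""
    records = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        # Assumed layout: "U+XXXX<TAB>kFieldName<TAB>value".
        code, field, value = line.split("\t", 2)
        records[int(code[2:], 16)][field] = value
    return dict(records)


# Illustrative input only; real files are much larger.
sample = [
    "# sample comment line",
    "U+6F22\tkMandarin\thàn",
    "U+6F22\tkTotalStrokes\t14",
]

for code_point, fields in parse_unihan_lines(sample).items():
    print(f"U+{code_point:04X}", fields)
```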

The Use images, not text check-box allows you to control whether CJK ideographs in the search results are displayed as text (reliant upon your CJK system fonts, if any) or as embedded images. Using images is relatively system-independent, but the images are fixed-size and loading time may be longer. If you choose to use images, results of some queries will display CJK ideographs as an image (for example, HAN) rather than as the text character 漢. Note that images may not be available for every character.

If you do not know the hexadecimal value of the code point, and have no example of the character in text to copy and paste, another Unihan Search Page supports queries on a few select fields (for example, keyword and pronunciation fields).

There are also two indices for the database: a grid index grouping the characters in blocks of 256; and a radical-stroke index.
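To illustrate how a grid of 256-character blocks locates a character, the sketch below computes the block that contains a given code point. It assumes that the blocks are aligned on multiples of 256 (that is, on the last two hexadecimal digits), which the page does not state explicitly. The function name `grid_block` is hypothetical.

```python
def grid_block(code_point: int) -> tuple[int, int]:
    """Return the (first, last) code points of the 256-character block."""
    # Clearing the low 8 bits gives the start of an aligned 256-wide block.
    first = code_point & ~0xFF
    return first, first + 0xFF


first, last = grid_block(ord("漢"))
print(f"U+{first:04X}..U+{last:04X}")
```

Under that assumption, 漢 (U+6F22) falls in the block that runs from U+6F00 to U+6FFF.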

Unihan Code Charts and Indices

See The Unicode Standard, Chapter 12 (PDF) for discussion of Han (CJKV) unification principles. The Unihan Radical-Stroke indices are documented in a short PDF file. The indices are available online in two PDF files, the Full RS Index and the IICore RS Index (version 5.0). Code Charts covering all of Unihan are available in PDF format, linked from the main chart index page along with other code charts.

Disclaimers

The Unihan database is provided as a public service by Unicode, Inc. These data are provided as-is by Unicode, Inc. (The Unicode Consortium). No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided.

The data in the Unihan database come from various sources. The majority of the data is released as part of the Unihan.zip data file for each version of the standard. Some of the data come from other sources. For example, the Japanese and Chinese compound data derive from CEDICT and Jim Breen's EDICT projects.