UTF-8, UTF-16 and UTF-32 are three Unicode-based encoding schemes. They differ in the number of bytes needed to store a character. The smallest unit within a stored file is 1 byte, a block of 2 bytes and a block of 4 bytes, respectively. The character sizes vary per code point range (in hexadecimal) as follows:

start	end	UTF-8	UTF-16	UTF-32
0	7F	1 byte	2 bytes	4 bytes
80	7FF	2 bytes	2 bytes	4 bytes
800	FFFF	3 bytes	2 bytes	4 bytes
10000	10FFFF	4 bytes	4 bytes	4 bytes
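
The sizes in this table can be checked with a short Python sketch. The helper name `encoded_sizes` and the four sample code points are chosen here purely for illustration; the encoding is done by Python's built-in codecs.

```python
def encoded_sizes(code_point):
    """Return the number of bytes each encoding needs for one code point."""
    ch = chr(code_point)
    return {
        "UTF-8": len(ch.encode("utf-8")),
        # The "-le" codecs are used so that no byte order mark is prepended.
        "UTF-16": len(ch.encode("utf-16-le")),
        "UTF-32": len(ch.encode("utf-32-le")),
    }


# One sample code point from each row of the table.
for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
    print(f"U+{cp:04X}", encoded_sizes(cp))
```

Surrogate code points (D800 to DFFF) cannot be encoded on their own, so they are left out of the samples.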

UTF-8 has the additional advantage of being compatible with US-ASCII; because it needs only one byte per ASCII character, UTF-8 is very efficient for Western texts.
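
As a small illustration of this compatibility, the following Python sketch encodes an ASCII-only string in several ways. The sample string is an arbitrary choice made for this example.

```python
sample = "Unicode"  # ASCII-only sample string (arbitrary choice)

ascii_bytes = sample.encode("ascii")
utf8_bytes = sample.encode("utf-8")
# The "-le" codecs are used so that no byte order mark is prepended.
utf16_bytes = sample.encode("utf-16-le")
utf32_bytes = sample.encode("utf-32-le")

# For pure ASCII text the UTF-8 bytes are identical to the ASCII bytes.
print(ascii_bytes == utf8_bytes)  # True

# Storage needed for the same 7 characters in each encoding.
print(len(utf8_bytes), len(utf16_bytes), len(utf32_bytes))  # 7 14 28
```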

Because the smallest unit in UTF-16 and UTF-32 is larger than one byte, two different byte orders can be distinguished here: big-endian and little-endian. This leads to UTF-16BE, UTF-16LE, UTF-32BE and UTF-32LE.
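
The effect of the byte order can be made visible with a minimal Python sketch. The helper name `byte_orders` and the choice of the euro sign (U+20AC) as sample are assumptions made for this example; the codec names are those of Python's standard library.

```python
def byte_orders(ch):
    """Return the hex-encoded bytes of one character in all four variants."""
    codecs = ("utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le")
    return {name: ch.encode(name).hex() for name in codecs}


# EURO SIGN, U+20AC
for name, hex_bytes in byte_orders("\u20ac").items():
    print(name, hex_bytes)
# utf-16-be 20ac
# utf-16-le ac20
# utf-32-be 000020ac
# utf-32-le ac200000
```

The same code point yields the same bytes in both variants, only in reversed order within each 2-byte or 4-byte unit.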

The characters

Most Unicode characters are grouped into so-called blocks. The following blocks are available:

start	end	block
0	7F	Basic Latin
80	FF	Latin-1 Supplement
100	17F	Latin Extended-A
180	24F	Latin Extended-B
250	2AF	IPA Extensions
2B0	2FF	Spacing Modifier Letters
300	36F	Combining Diacritical Marks
370	3FF	Greek and Coptic
400	4FF	Cyrillic
500	52F	Cyrillic Supplement
530	58F	Armenian
590	5FF	Hebrew
600	6FF	Arabic
700	74F	Syriac
750	77F	Arabic Supplement
780	7BF	Thaana
7C0	7FF	NKo
800	83F	Samaritan
840	85F	Mandaic
900	97F	Devanagari
980	9FF	Bengali
A00	A7F	Gurmukhi
A80	AFF	Gujarati
B00	B7F	Oriya
B80	BFF	Tamil
C00	C7F	Telugu
C80	CFF	Kannada
D00	D7F	Malayalam
D80	DFF	Sinhala
E00	E7F	Thai
E80	EFF	Lao
F00	FFF	Tibetan
1000	109F	Myanmar
10A0	10FF	Georgian
1100	11FF	Hangul Jamo
1200	137F	Ethiopic
1380	139F	Ethiopic Supplement
13A0	13FF	Cherokee
1400	167F	Unified Canadian Aboriginal Syllabics
1680	169F	Ogham
16A0	16FF	Runic
1700	171F	Tagalog
1720	173F	Hanunoo
1740	175F	Buhid
1760	177F	Tagbanwa
1780	17FF	Khmer
1800	18AF	Mongolian
18B0	18FF	Unified Canadian Aboriginal Syllabics Extended
1900	194F	Limbu
1950	197F	Tai Le
1980	19DF	New Tai Lue
19E0	19FF	Khmer Symbols
1A00	1A1F	Buginese
1A20	1AAF	Tai Tham
1B00	1B7F	Balinese
1B80	1BBF	Sundanese
1BC0	1BFF	Batak
1C00	1C4F	Lepcha
1C50	1C7F	Ol Chiki
1CD0	1CFF	Vedic Extensions
1D00	1D7F	Phonetic Extensions
1D80	1DBF	Phonetic Extensions Supplement
1DC0	1DFF	Combining Diacritical Marks Supplement
1E00	1EFF	Latin Extended Additional
1F00	1FFF	Greek Extended
2000	206F	General Punctuation
2070	209F	Superscripts and Subscripts
20A0	20CF	Currency Symbols
20D0	20FF	Combining Diacritical Marks for Symbols
2100	214F	Letterlike Symbols
2150	218F	Number Forms
2190	21FF	Arrows
2200	22FF	Mathematical Operators
2300	23FF	Miscellaneous Technical
2400	243F	Control Pictures
2440	245F	Optical Character Recognition
2460	24FF	Enclosed Alphanumerics
2500	257F	Box Drawing
2580	259F	Block Elements
25A0	25FF	Geometric Shapes
2600	26FF	Miscellaneous Symbols
2700	27BF	Dingbats
27C0	27EF	Miscellaneous Mathematical Symbols-A
27F0	27FF	Supplemental Arrows-A
2800	28FF	Braille Patterns
2900	297F	Supplemental Arrows-B
2980	29FF	Miscellaneous Mathematical Symbols-B
2A00	2AFF	Supplemental Mathematical Operators
2B00	2BFF	Miscellaneous Symbols and Arrows
2C00	2C5F	Glagolitic
2C60	2C7F	Latin Extended-C
2C80	2CFF	Coptic
2D00	2D2F	Georgian Supplement
2D30	2D7F	Tifinagh
2D80	2DDF	Ethiopic Extended
2DE0	2DFF	Cyrillic Extended-A
2E00	2E7F	Supplemental Punctuation
2E80	2EFF	CJK Radicals Supplement
2F00	2FDF	Kangxi Radicals
2FF0	2FFF	Ideographic Description Characters
3000	303F	CJK Symbols and Punctuation
3040	309F	Hiragana
30A0	30FF	Katakana
3100	312F	Bopomofo
3130	318F	Hangul Compatibility Jamo
3190	319F	Kanbun
31A0	31BF	Bopomofo Extended
31C0	31EF	CJK Strokes
31F0	31FF	Katakana Phonetic Extensions
3200	32FF	Enclosed CJK Letters and Months
3300	33FF	CJK Compatibility
3400	4DBF	CJK Unified Ideographs Extension A
4DC0	4DFF	Yijing Hexagram Symbols
4E00	9FFF	CJK Unified Ideographs
A000	A48F	Yi Syllables
A490	A4CF	Yi Radicals
A4D0	A4FF	Lisu
A500	A63F	Vai
A640	A69F	Cyrillic Extended-B
A6A0	A6FF	Bamum
A700	A71F	Modifier Tone Letters
A720	A7FF	Latin Extended-D
A800	A82F	Syloti Nagri
A830	A83F	Common Indic Number Forms
A840	A87F	Phags-pa
A880	A8DF	Saurashtra
A8E0	A8FF	Devanagari Extended
A900	A92F	Kayah Li
A930	A95F	Rejang
A960	A97F	Hangul Jamo Extended-A
A980	A9DF	Javanese
AA00	AA5F	Cham
AA60	AA7F	Myanmar Extended-A
AA80	AADF	Tai Viet
AB00	AB2F	Ethiopic Extended-A
ABC0	ABFF	Meetei Mayek
AC00	D7AF	Hangul Syllables
D7B0	D7FF	Hangul Jamo Extended-B
D800	DB7F	High Surrogates
DB80	DBFF	High Private Use Surrogates
DC00	DFFF	Low Surrogates
E000	F8FF	Private Use Area
F900	FAFF	CJK Compatibility Ideographs
FB00	FB4F	Alphabetic Presentation Forms
FB50	FDFF	Arabic Presentation Forms-A
FE00	FE0F	Variation Selectors
FE10	FE1F	Vertical Forms
FE20	FE2F	Combining Half Marks
FE30	FE4F	CJK Compatibility Forms
FE50	FE6F	Small Form Variants
FE70	FEFF	Arabic Presentation Forms-B
FF00	FFEF	Halfwidth and Fullwidth Forms
FFF0	FFFF	Specials
10000	1007F	Linear B Syllabary
10080	100FF	Linear B Ideograms
10100	1013F	Aegean Numbers
10140	1018F	Ancient Greek Numbers
10190	101CF	Ancient Symbols
101D0	101FF	Phaistos Disc
10280	1029F	Lycian
102A0	102DF	Carian
10300	1032F	Old Italic
10330	1034F	Gothic
10380	1039F	Ugaritic
103A0	103DF	Old Persian
10400	1044F	Deseret
10450	1047F	Shavian
10480	104AF	Osmanya
10800	1083F	Cypriot Syllabary
10840	1085F	Imperial Aramaic
10900	1091F	Phoenician
10920	1093F	Lydian
10A00	10A5F	Kharoshthi
10A60	10A7F	Old South Arabian
10B00	10B3F	Avestan
10B40	10B5F	Inscriptional Parthian
10B60	10B7F	Inscriptional Pahlavi
10C00	10C4F	Old Turkic
10E60	10E7F	Rumi Numeral Symbols
11000	1107F	Brahmi
11080	110CF	Kaithi
12000	123FF	Cuneiform
12400	1247F	Cuneiform Numbers and Punctuation
13000	1342F	Egyptian Hieroglyphs
16800	16A3F	Bamum Supplement
1B000	1B0FF	Kana Supplement
1D000	1D0FF	Byzantine Musical Symbols
1D100	1D1FF	Musical Symbols
1D200	1D24F	Ancient Greek Musical Notation
1D300	1D35F	Tai Xuan Jing Symbols
1D360	1D37F	Counting Rod Numerals
1D400	1D7FF	Mathematical Alphanumeric Symbols
1F000	1F02F	Mahjong Tiles
1F030	1F09F	Domino Tiles
1F0A0	1F0FF	Playing Cards
1F100	1F1FF	Enclosed Alphanumeric Supplement
1F200	1F2FF	Enclosed Ideographic Supplement
1F300	1F5FF	Miscellaneous Symbols And Pictographs
1F600	1F64F	Emoticons
1F680	1F6FF	Transport And Map Symbols
1F700	1F77F	Alchemical Symbols
20000	2A6DF	CJK Unified Ideographs Extension B
2A700	2B73F	CJK Unified Ideographs Extension C
2B740	2B81F	CJK Unified Ideographs Extension D
2F800	2FA1F	CJK Compatibility Ideographs Supplement
E0000	E007F	Tags
E0100	E01EF	Variation Selectors Supplement
F0000	FFFFF	Supplementary Private Use Area-A
100000	10FFFF	Supplementary Private Use Area-B
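
Looking up the block of a given code point can be sketched in Python as a binary search over the start values. The names `BLOCKS`, `STARTS` and `block_of` are hypothetical, and `BLOCKS` holds only a small excerpt of the table above, so code points outside that excerpt yield `None`.

```python
from bisect import bisect_right

# A small excerpt of the block table: (start, end, block name).
# The entries must be sorted by start value.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x20A0, 0x20CF, "Currency Symbols"),
    (0x1F600, 0x1F64F, "Emoticons"),
]
STARTS = [start for start, _, _ in BLOCKS]


def block_of(code_point):
    """Return the block name for a code point, or None if it is not in BLOCKS."""
    # Find the last block whose start is <= code_point.
    i = bisect_right(STARTS, code_point) - 1
    if i >= 0:
        _, end, name = BLOCKS[i]
        if code_point <= end:
            return name
    return None


print(block_of(ord("\u00e9")))  # Latin-1 Supplement
print(block_of(0x20AC))         # Currency Symbols
print(block_of(0x0250))         # None (IPA Extensions is not in this excerpt)
```

The explicit end check matters because the ranges are not contiguous: some code points fall between two blocks.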

