XHTML5.NL

Unicode character sets

UTF-8, UTF-16 and UTF-32 are three encoding schemes based on Unicode. They differ in the number of bytes needed to store a character. The smallest unit (code unit) within a stored file is 1 byte, a block of 2 bytes and a block of 4 bytes, respectively. The size of a character varies per code point range as follows (start and end are hexadecimal code points):

start   end     UTF-8    UTF-16   UTF-32
0       7F      1 byte   2 bytes  4 bytes
80      7FF     2 bytes  2 bytes  4 bytes
800     FFFF    3 bytes  2 bytes  4 bytes
10000   10FFFF  4 bytes  4 bytes  4 bytes
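
The sizes listed above can be checked with a few lines of Python. This is only a minimal sketch: the function name `encoded_sizes` is made up for this illustration, and the `-le` codec variants are used so that Python does not prepend a byte order mark to the result.

```python
def encoded_sizes(code_point):
    """Return (UTF-8, UTF-16, UTF-32) byte counts for a single code point."""
    ch = chr(code_point)
    # The explicit little-endian codecs emit no byte order mark,
    # so the lengths below count only the character itself.
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),
        len(ch.encode("utf-32-le")),
    )

# One sample code point from each range in the table.
for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
    utf8, utf16, utf32 = encoded_sizes(cp)
    print(f"U+{cp:04X}: UTF-8 {utf8}, UTF-16 {utf16}, UTF-32 {utf32} bytes")
```

Code points in the surrogate range D800 to DFFF are not characters in their own right, and Python refuses to encode them with these codecs, so they are left out of the sample.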

UTF-8 has the added advantage of being compatible with US-ASCII: every ASCII character is stored as the same single byte. Because it needs only one byte per ASCII character, UTF-8 is very efficient for Western texts.
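
The ASCII compatibility and the space savings can be made concrete with a small Python sketch. The helper `byte_length` and both sample strings are assumptions chosen for this illustration only.

```python
def byte_length(text, encoding):
    """Number of bytes `text` occupies in the given encoding."""
    return len(text.encode(encoding))

ascii_text = "Hello, world!"
# Pure ASCII text yields exactly the same bytes in US-ASCII and in UTF-8.
print(ascii_text.encode("ascii") == ascii_text.encode("utf-8"))

# Hypothetical Western sample: 10 characters, 2 of them outside ASCII.
western_text = "na\u00efve caf\u00e9"
for encoding in ("utf-8", "utf-16-le", "utf-32-le"):
    print(encoding, byte_length(western_text, encoding))
```

For text like this, only the accented letters cost a second byte in UTF-8, whereas UTF-16 and UTF-32 spend 2 and 4 bytes on every character.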

Because the smallest unit in UTF-16 and UTF-32 is larger than one byte, two byte orders can be distinguished: big-endian and little-endian. This leads to UTF-16BE, UTF-16LE, UTF-32BE and UTF-32LE.
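
The effect of the byte order is easiest to see in a hex dump of a single character. The following Python sketch uses the euro sign as an arbitrary sample; the function name `code_unit_bytes` is hypothetical.

```python
def code_unit_bytes(ch):
    """Hex dump of one character in each fixed byte-order variant."""
    encodings = ("utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le")
    return {enc: ch.encode(enc).hex() for enc in encodings}

euro = "\u20ac"  # EURO SIGN, U+20AC
for enc, hexdump in code_unit_bytes(euro).items():
    print(f"{enc:10} {hexdump}")

# When the byte order is not named explicitly, a leading byte order
# mark (BOM) tells the decoder which of the two orders was used.
little_endian_with_bom = b"\xff\xfe" + euro.encode("utf-16-le")
print(little_endian_with_bom.decode("utf-16") == euro)
```

The big-endian variants store the most significant byte first (20 AC), the little-endian variants store it last (AC 20).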

The characters

Most Unicode characters are grouped into so-called blocks. The following blocks are available (start and end are hexadecimal code points):

start   end     block
0       7F      Basic Latin
80      FF      Latin-1 Supplement
100     17F     Latin Extended-A
180     24F     Latin Extended-B
250     2AF     IPA Extensions
2B0     2FF     Spacing Modifier Letters
300     36F     Combining Diacritical Marks
370     3FF     Greek and Coptic
400     4FF     Cyrillic
500     52F     Cyrillic Supplement
530     58F     Armenian
590     5FF     Hebrew
600     6FF     Arabic
700     74F     Syriac
750     77F     Arabic Supplement
780     7BF     Thaana
7C0     7FF     NKo
800     83F     Samaritan
840     85F     Mandaic
900     97F     Devanagari
980     9FF     Bengali
A00     A7F     Gurmukhi
A80     AFF     Gujarati
B00     B7F     Oriya
B80     BFF     Tamil
C00     C7F     Telugu
C80     CFF     Kannada
D00     D7F     Malayalam
D80     DFF     Sinhala
E00     E7F     Thai
E80     EFF     Lao
F00     FFF     Tibetan
1000    109F    Myanmar
10A0    10FF    Georgian
1100    11FF    Hangul Jamo
1200    137F    Ethiopic
1380    139F    Ethiopic Supplement
13A0    13FF    Cherokee
1400    167F    Unified Canadian Aboriginal Syllabics
1680    169F    Ogham
16A0    16FF    Runic
1700    171F    Tagalog
1720    173F    Hanunoo
1740    175F    Buhid
1760    177F    Tagbanwa
1780    17FF    Khmer
1800    18AF    Mongolian
18B0    18FF    Unified Canadian Aboriginal Syllabics Extended
1900    194F    Limbu
1950    197F    Tai Le
1980    19DF    New Tai Lue
19E0    19FF    Khmer Symbols
1A00    1A1F    Buginese
1A20    1AAF    Tai Tham
1B00    1B7F    Balinese
1B80    1BBF    Sundanese
1BC0    1BFF    Batak
1C00    1C4F    Lepcha
1C50    1C7F    Ol Chiki
1CD0    1CFF    Vedic Extensions
1D00    1D7F    Phonetic Extensions
1D80    1DBF    Phonetic Extensions Supplement
1DC0    1DFF    Combining Diacritical Marks Supplement
1E00    1EFF    Latin Extended Additional
1F00    1FFF    Greek Extended
2000    206F    General Punctuation
2070    209F    Superscripts and Subscripts
20A0    20CF    Currency Symbols
20D0    20FF    Combining Diacritical Marks for Symbols
2100    214F    Letterlike Symbols
2150    218F    Number Forms
2190    21FF    Arrows
2200    22FF    Mathematical Operators
2300    23FF    Miscellaneous Technical
2400    243F    Control Pictures
2440    245F    Optical Character Recognition
2460    24FF    Enclosed Alphanumerics
2500    257F    Box Drawing
2580    259F    Block Elements
25A0    25FF    Geometric Shapes
2600    26FF    Miscellaneous Symbols
2700    27BF    Dingbats
27C0    27EF    Miscellaneous Mathematical Symbols-A
27F0    27FF    Supplemental Arrows-A
2800    28FF    Braille Patterns
2900    297F    Supplemental Arrows-B
2980    29FF    Miscellaneous Mathematical Symbols-B
2A00    2AFF    Supplemental Mathematical Operators
2B00    2BFF    Miscellaneous Symbols and Arrows
2C00    2C5F    Glagolitic
2C60    2C7F    Latin Extended-C
2C80    2CFF    Coptic
2D00    2D2F    Georgian Supplement
2D30    2D7F    Tifinagh
2D80    2DDF    Ethiopic Extended
2DE0    2DFF    Cyrillic Extended-A
2E00    2E7F    Supplemental Punctuation
2E80    2EFF    CJK Radicals Supplement
2F00    2FDF    Kangxi Radicals
2FF0    2FFF    Ideographic Description Characters
3000    303F    CJK Symbols and Punctuation
3040    309F    Hiragana
30A0    30FF    Katakana
3100    312F    Bopomofo
3130    318F    Hangul Compatibility Jamo
3190    319F    Kanbun
31A0    31BF    Bopomofo Extended
31C0    31EF    CJK Strokes
31F0    31FF    Katakana Phonetic Extensions
3200    32FF    Enclosed CJK Letters and Months
3300    33FF    CJK Compatibility
3400    4DBF    CJK Unified Ideographs Extension A
4DC0    4DFF    Yijing Hexagram Symbols
4E00    9FFF    CJK Unified Ideographs
A000    A48F    Yi Syllables
A490    A4CF    Yi Radicals
A4D0    A4FF    Lisu
A500    A63F    Vai
A640    A69F    Cyrillic Extended-B
A6A0    A6FF    Bamum
A700    A71F    Modifier Tone Letters
A720    A7FF    Latin Extended-D
A800    A82F    Syloti Nagri
A830    A83F    Common Indic Number Forms
A840    A87F    Phags-pa
A880    A8DF    Saurashtra
A8E0    A8FF    Devanagari Extended
A900    A92F    Kayah Li
A930    A95F    Rejang
A960    A97F    Hangul Jamo Extended-A
A980    A9DF    Javanese
AA00    AA5F    Cham
AA60    AA7F    Myanmar Extended-A
AA80    AADF    Tai Viet
AB00    AB2F    Ethiopic Extended-A
ABC0    ABFF    Meetei Mayek
AC00    D7AF    Hangul Syllables
D7B0    D7FF    Hangul Jamo Extended-B
D800    DB7F    High Surrogates
DB80    DBFF    High Private Use Surrogates
DC00    DFFF    Low Surrogates
E000    F8FF    Private Use Area
F900    FAFF    CJK Compatibility Ideographs
FB00    FB4F    Alphabetic Presentation Forms
FB50    FDFF    Arabic Presentation Forms-A
FE00    FE0F    Variation Selectors
FE10    FE1F    Vertical Forms
FE20    FE2F    Combining Half Marks
FE30    FE4F    CJK Compatibility Forms
FE50    FE6F    Small Form Variants
FE70    FEFF    Arabic Presentation Forms-B
FF00    FFEF    Halfwidth and Fullwidth Forms
FFF0    FFFF    Specials
10000   1007F   Linear B Syllabary
10080   100FF   Linear B Ideograms
10100   1013F   Aegean Numbers
10140   1018F   Ancient Greek Numbers
10190   101CF   Ancient Symbols
101D0   101FF   Phaistos Disc
10280   1029F   Lycian
102A0   102DF   Carian
10300   1032F   Old Italic
10330   1034F   Gothic
10380   1039F   Ugaritic
103A0   103DF   Old Persian
10400   1044F   Deseret
10450   1047F   Shavian
10480   104AF   Osmanya
10800   1083F   Cypriot Syllabary
10840   1085F   Imperial Aramaic
10900   1091F   Phoenician
10920   1093F   Lydian
10A00   10A5F   Kharoshthi
10A60   10A7F   Old South Arabian
10B00   10B3F   Avestan
10B40   10B5F   Inscriptional Parthian
10B60   10B7F   Inscriptional Pahlavi
10C00   10C4F   Old Turkic
10E60   10E7F   Rumi Numeral Symbols
11000   1107F   Brahmi
11080   110CF   Kaithi
12000   123FF   Cuneiform
12400   1247F   Cuneiform Numbers and Punctuation
13000   1342F   Egyptian Hieroglyphs
16800   16A3F   Bamum Supplement
1B000   1B0FF   Kana Supplement
1D000   1D0FF   Byzantine Musical Symbols
1D100   1D1FF   Musical Symbols
1D200   1D24F   Ancient Greek Musical Notation
1D300   1D35F   Tai Xuan Jing Symbols
1D360   1D37F   Counting Rod Numerals
1D400   1D7FF   Mathematical Alphanumeric Symbols
1F000   1F02F   Mahjong Tiles
1F030   1F09F   Domino Tiles
1F0A0   1F0FF   Playing Cards
1F100   1F1FF   Enclosed Alphanumeric Supplement
1F200   1F2FF   Enclosed Ideographic Supplement
1F300   1F5FF   Miscellaneous Symbols And Pictographs
1F600   1F64F   Emoticons
1F680   1F6FF   Transport And Map Symbols
1F700   1F77F   Alchemical Symbols
20000   2A6DF   CJK Unified Ideographs Extension B
2A700   2B73F   CJK Unified Ideographs Extension C
2B740   2B81F   CJK Unified Ideographs Extension D
2F800   2FA1F   CJK Compatibility Ideographs Supplement
E0000   E007F   Tags
E0100   E01EF   Variation Selectors Supplement
F0000   FFFFF   Supplementary Private Use Area-A
100000  10FFFF  Supplementary Private Use Area-B
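
A table of this kind lends itself to a simple lookup: given a code point, find the block that contains it. The Python sketch below uses a binary search over the start values. It is an illustration only: `BLOCKS` is a small excerpt of the table, and the names `BLOCKS`, `STARTS` and `block_of` are made up for this example.

```python
from bisect import bisect_right

# Small excerpt of the block table (start, end, name), sorted by start.
# A real lookup would list every block.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x20A0, 0x20CF, "Currency Symbols"),
    (0x1F600, 0x1F64F, "Emoticons"),
]
STARTS = [start for start, _end, _name in BLOCKS]

def block_of(code_point):
    """Return the block name for a code point, or None if it is not listed."""
    # Index of the last block whose start is <= code_point.
    i = bisect_right(STARTS, code_point) - 1
    if i >= 0:
        _start, end, name = BLOCKS[i]
        if code_point <= end:
            return name
    return None

for ch in ("A", "\u00e9", "\u03a9", "\u20ac"):
    print(f"U+{ord(ch):04X}", block_of(ord(ch)))
```

Because the blocks do not cover every code point (there are gaps between some of them), the end value has to be checked as well; a code point that falls in a gap yields `None`.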
