Unicode support
===============

Last update: 2005-01-17, version 1.4

This file is maintained by H. Peter Anvin <unicode@lanana.org> as part
of the Linux Assigned Names And Numbers Authority (LANANA) project.
The current version can be found at:

    http://www.lanana.org/docs/unicode/admin-guide/unicode.rst

Introduction
------------

The Linux kernel code has been rewritten to use Unicode to map
characters to fonts. By downloading a single Unicode-to-font table,
both the eight-bit character sets and UTF-8 mode are changed to use
the font as indicated.

This changes the semantics of the eight-bit character tables subtly.
The four character tables are now:

=============== =============================== ================
Map symbol      Map name                        Escape code (G0)
=============== =============================== ================
LAT1_MAP        Latin-1 (ISO 8859-1)            ESC ( B
GRAF_MAP        DEC VT100 pseudographics        ESC ( 0
IBMPC_MAP       IBM code page 437               ESC ( U
USER_MAP        User defined                    ESC ( K
=============== =============================== ================

In particular, ESC ( U is no longer "straight to font", since the font
might be completely different from the IBM character set. This
permits, for example, the use of block graphics even with a Latin-1
font loaded.

Note that although these codes are similar to ISO 2022, neither the
codes nor their uses match ISO 2022; Linux has two 8-bit codes (G0 and
G1), whereas ISO 2022 has four 7-bit codes (G0-G3).

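As a minimal sketch of how the G0 escape codes in the table could be
generated, the following builds each sequence as bytes. It assumes, as
is conventional for this notation, that the spaces in "ESC ( B" are
only for readability and are not sent. The names ``G0_SELECT`` and
``select_g0`` are hypothetical and do not come from the kernel.

```python
# Escape sequences taken from the table above. The dictionary and
# function names are illustrative only.
ESC = b"\x1b"

G0_SELECT = {
    "LAT1_MAP": ESC + b"(B",   # Latin-1 (ISO 8859-1)
    "GRAF_MAP": ESC + b"(0",   # DEC VT100 pseudographics
    "IBMPC_MAP": ESC + b"(U",  # IBM code page 437
    "USER_MAP": ESC + b"(K",   # User defined
}


def select_g0(map_symbol: str) -> bytes:
    """Return the byte sequence that selects map_symbol as G0."""
    return G0_SELECT[map_symbol]
```

Writing ``select_g0("GRAF_MAP")`` to a console would then request the
DEC VT100 pseudographics map, and ``select_g0("LAT1_MAP")`` would
switch back to Latin-1.
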
In accordance with the Unicode standard/ISO 10646, the range U+F000 to
U+F8FF has been reserved for OS-wide allocation. The Unicode Standard
refers to this as a "Corporate Zone"; since this is inaccurate for
Linux, we call it the "Linux Zone". U+F000 was picked as the starting
point since it lets the direct-mapping area start on a large power of
two (in case 1024- or 2048-character fonts ever become necessary).
This leaves U+E000 to U+EFFF as the End User Zone.

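The split described above can be sketched as a small classifier. The
range boundaries come from the text; the names ``END_USER_ZONE``,
``LINUX_ZONE`` and ``classify`` are hypothetical.

```python
# Boundaries as given in the text; range() excludes its upper bound.
END_USER_ZONE = range(0xE000, 0xF000)  # U+E000..U+EFFF
LINUX_ZONE = range(0xF000, 0xF900)     # U+F000..U+F8FF


def classify(cp: int) -> str:
    """Name the zone a code point falls into, per the split above."""
    if cp in END_USER_ZONE:
        return "End User Zone"
    if cp in LINUX_ZONE:
        return "Linux Zone"
    return "outside U+E000..U+F8FF"


# U+F000 is a multiple of 2048, so even a 2048-character font would
# start on an aligned boundary.
assert 0xF000 % 2048 == 0
```
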
[v1.2]: The code points in the range U+F000 to U+F7FF have been
hard-coded to map directly to the loaded font, bypassing the
translation table. The user-defined map now defaults to U+F000 to
U+F0FF, emulating the previous behaviour. In practice, this range
might be shorter; for example, vgacon can only handle 256-character
(U+F000..U+F0FF) or 512-character (U+F000..U+F1FF) fonts.


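To illustrate the direct-to-font range, here is a minimal sketch of
the arithmetic, assuming font position ``n`` corresponds to code point
U+F000 + ``n``. The helper names and the ``font_size`` parameter are
hypothetical; this is not the kernel's implementation.

```python
DIRECT_BASE = 0xF000  # first direct-to-font code point
DIRECT_LAST = 0xF7FF  # last direct-to-font code point


def glyph_to_codepoint(index: int, font_size: int = 256) -> int:
    """Code point that addresses font position `index` directly."""
    if not 0 <= index < font_size:
        raise ValueError("glyph index outside the loaded font")
    cp = DIRECT_BASE + index
    if cp > DIRECT_LAST:
        raise ValueError("index beyond the direct-mapping range")
    return cp


def codepoint_to_glyph(cp: int, font_size: int = 256):
    """Font position for `cp`, or None if `cp` is not direct-mapped
    for a font of this size."""
    index = cp - DIRECT_BASE
    if DIRECT_BASE <= cp <= DIRECT_LAST and index < font_size:
        return index
    return None
```

With a 256-character font, ``codepoint_to_glyph(0xF100)`` returns
``None``, while the same call with ``font_size=512`` returns 256. This
mirrors the vgacon limits mentioned above.
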
Actual characters assigned in the Linux Zone
--------------------------------------------

In addition, the following characters not present in Unicode 1.1.4
have been defined; these are used by the DEC VT graphics map. [v1.2]
THIS USE IS OBSOLETE AND SHOULD NO LONGER BE USED; PLEASE SEE BELOW.

====== ======================================
U+F800 DEC VT GRAPHICS HORIZONTAL LINE SCAN 1
U+F801 DEC VT GRAPHICS HORIZONTAL LINE SCAN 3
U+F803 DEC VT GRAPHICS HORIZONTAL LINE SCAN 7
U+F804 DEC VT GRAPHICS HORIZONTAL LINE SCAN 9
====== ======================================

The DEC VT220 uses a 6x10 character matrix, and these characters form
a smooth progression in the DEC VT graphics character set. I have
omitted the scan 5 line, since it is also used as a block-graphics
character, and hence has been coded as U+2500 FORMS LIGHT HORIZONTAL.

[v1.3]: These characters have been officially added to Unicode 3.2.0;
they are added at U+23BA, U+23BB, U+23BC, U+23BD. Linux now uses the
new values.

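Text that still contains the obsolete Linux Zone values could be
converted along these lines. The sketch assumes the four old code
points correspond to the four new ones in the order listed; the names
``OBSOLETE_TO_UNICODE`` and ``modernize`` are hypothetical.

```python
# Obsolete Linux Zone value -> value added in Unicode 3.2.0,
# paired in the order the two lists appear above.
OBSOLETE_TO_UNICODE = {
    0xF800: 0x23BA,  # scan 1
    0xF801: 0x23BB,  # scan 3
    0xF803: 0x23BC,  # scan 7
    0xF804: 0x23BD,  # scan 9
}


def modernize(text: str) -> str:
    """Replace obsolete scan-line code points; leave the rest as-is."""
    return text.translate(OBSOLETE_TO_UNICODE)
```

The scan 5 line is untouched by this table because it was always
coded as U+2500.
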
[v1.2]: The following characters have been added to represent common
keyboard symbols that are unlikely to ever be added to Unicode proper
since they are horribly vendor-specific. This, of course, is an
excellent example of horrible design.

====== ======================================
U+F810 KEYBOARD SYMBOL FLYING FLAG
U+F811 KEYBOARD SYMBOL PULLDOWN MENU
U+F812 KEYBOARD SYMBOL OPEN APPLE
U+F813 KEYBOARD SYMBOL SOLID APPLE
====== ======================================

Klingon language support
------------------------

In 1996, Linux was the first operating system in the world to add
support for the artificial language Klingon, created by Marc Okrand
for the "Star Trek" television series. This encoding was later
adopted by the ConScript Unicode Registry and proposed (but ultimately
rejected) for inclusion in Unicode Plane 1. Thus, it remains as a
Linux/CSUR private assignment in the Linux Zone.

This encoding has been endorsed by the Klingon Language Institute.
For more information, contact them at:

    http://www.kli.org/

Since the characters at the beginning of the Linux Zone have been more
of the dingbats/symbols/forms type and this is a language, I have
located it at the end, on a 16-cell boundary in keeping with standard
Unicode practice.

.. note::

  This range is now officially managed by the ConScript Unicode
  Registry. The normative reference is at:

      http://www.evertype.com/standards/csur/klingon.html

Klingon has an alphabet of 26 characters, a positional numeric writing
system with 10 digits, and is written left-to-right, top-to-bottom.

Several glyph forms for the Klingon alphabet have been proposed.
However, since the set of symbols appears to be consistent throughout,
with only the actual shapes being different, in keeping with standard
Unicode practice these differences are considered font variants.

====== =======================================================
U+F8D0 KLINGON LETTER A
U+F8D1 KLINGON LETTER B
U+F8D2 KLINGON LETTER CH
U+F8D3 KLINGON LETTER D
U+F8D4 KLINGON LETTER E
U+F8D5 KLINGON LETTER GH
U+F8D6 KLINGON LETTER H
U+F8D7 KLINGON LETTER I
U+F8D8 KLINGON LETTER J
U+F8D9 KLINGON LETTER L
U+F8DA KLINGON LETTER M
U+F8DB KLINGON LETTER N
U+F8DC KLINGON LETTER NG
U+F8DD KLINGON LETTER O
U+F8DE KLINGON LETTER P
U+F8DF KLINGON LETTER Q
       - Written <q> in standard Okrand Latin transliteration
U+F8E0 KLINGON LETTER QH
       - Written <Q> in standard Okrand Latin transliteration
U+F8E1 KLINGON LETTER R
U+F8E2 KLINGON LETTER S
U+F8E3 KLINGON LETTER T
U+F8E4 KLINGON LETTER TLH
U+F8E5 KLINGON LETTER U
U+F8E6 KLINGON LETTER V
U+F8E7 KLINGON LETTER W
U+F8E8 KLINGON LETTER Y
U+F8E9 KLINGON LETTER GLOTTAL STOP

U+F8F0 KLINGON DIGIT ZERO
U+F8F1 KLINGON DIGIT ONE
U+F8F2 KLINGON DIGIT TWO
U+F8F3 KLINGON DIGIT THREE
U+F8F4 KLINGON DIGIT FOUR
U+F8F5 KLINGON DIGIT FIVE
U+F8F6 KLINGON DIGIT SIX
U+F8F7 KLINGON DIGIT SEVEN
U+F8F8 KLINGON DIGIT EIGHT
U+F8F9 KLINGON DIGIT NINE

U+F8FD KLINGON COMMA
U+F8FE KLINGON FULL STOP
U+F8FF KLINGON SYMBOL FOR EMPIRE
====== =======================================================

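Because the letters and digits occupy consecutive code points, the
table above can be rebuilt from two base values. The following is a
minimal sketch; the names ``LETTERS`` and ``klingon_digits`` are
hypothetical, and writing the most significant digit first is an
assumption, since the text only says the system is positional.

```python
# Letter names in table order, starting at U+F8D0.
LETTER_NAMES = [
    "A", "B", "CH", "D", "E", "GH", "H", "I", "J", "L", "M", "N", "NG",
    "O", "P", "Q", "QH", "R", "S", "T", "TLH", "U", "V", "W", "Y",
    "GLOTTAL STOP",
]
LETTERS = {name: 0xF8D0 + i for i, name in enumerate(LETTER_NAMES)}

DIGIT_ZERO = 0xF8F0  # KLINGON DIGIT ZERO; ONE..NINE follow in order


def klingon_digits(n: int) -> str:
    """Write a non-negative integer with Klingon digit code points.

    Assumes the most significant digit comes first.
    """
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    return "".join(chr(DIGIT_ZERO + int(d)) for d in str(n))
```

For example, ``LETTERS["TLH"]`` evaluates to ``0xF8E4``, matching the
table entry for KLINGON LETTER TLH.
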
Other Fictional and Artificial Scripts
--------------------------------------

Since the assignment of the Klingon Linux Unicode block, a registry of
fictional and artificial scripts has been established by John Cowan
<jcowan@reutershealth.com> and Michael Everson <everson@evertype.com>.
The ConScript Unicode Registry is accessible at:

    http://www.evertype.com/standards/csur/

The ranges used fall at the low end of the End User Zone and hence
cannot be normatively assigned, but it is recommended that people who
wish to encode fictional scripts use these codes, in the interest of
interoperability. For Klingon, CSUR has adopted the Linux encoding.
The CSUR maintainers are working to add Tengwar and Cirth to Unicode
Plane 1; the addition of Klingon to Unicode Plane 1 has been rejected
and so the above encoding remains official.