KOI8-U (RFC 2319) is an 8-bit character encoding designed to cover Ukrainian, which uses a Cyrillic alphabet. It is based on KOI8-R, which covers Russian and Bulgarian, but replaces eight box-drawing characters with the four Ukrainian letters Ґ, Є, І, and Ї, each in upper case and lower case.
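The eight replaced positions can be listed by comparing the two encodings byte by byte. The following is a minimal sketch, assuming Python's built-in `koi8_r` and `koi8_u` codecs follow the respective RFC tables; the helper name `differing_bytes` is hypothetical and chosen only for this illustration.

```python
# Sketch: list the byte values that decode differently in KOI8-R and KOI8-U.
# Assumes the standard-library codecs "koi8_r" and "koi8_u" are available.

def differing_bytes(enc_a="koi8_r", enc_b="koi8_u"):
    """Return (byte, char_in_a, char_in_b) for every high byte that differs."""
    diffs = []
    for value in range(0x80, 0x100):  # the lower half is plain ASCII in both
        raw = bytes([value])
        a = raw.decode(enc_a)
        b = raw.decode(enc_b)
        if a != b:
            diffs.append((value, a, b))
    return diffs

for value, old, new in differing_bytes():
    print(f"0x{value:02X}: {old} (U+{ord(old):04X}) -> {new} (U+{ord(new):04X})")
```

Under that assumption, the loop should report exactly eight positions, each showing a box-drawing character giving way to one of the Ukrainian letters.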

KOI8-RU is closely related, but adds Ў for Belarusian. In both, the letter allocations match those in KOI8-E, except for Ґ which is added to KOI8-F.

Microsoft Windows assigns KOI8-U the code page number 21866. IBM assigns it code page/CCSID 1168.

KOI8 remains far more widely used than ISO 8859-5, which never saw broad adoption.[citation needed] Windows-1251 is another common Cyrillic character encoding. Both may eventually give way to Unicode.

KOI8 stands for Kod Obmena Informatsiey, 8 bit (Russian: Код Обмена Информацией, 8 бит) which means "Code for Information Exchange, 8 bit".

The KOI8 character sets place the Cyrillic letters in pseudo-Latin alphabetical order rather than in Cyrillic alphabetical order, as ISO 8859-5 does. As a result, if the eighth bit is stripped and the text is displayed in any character set based on ASCII (including the KOI8 sets themselves), it remains reasonably readable as a case-reversed transliteration. For instance, the expansion of the "KOI" acronym, "Код Обмена Информацией", becomes kOD oBMENA iNFORMACIEJ.
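The bit-stripping effect can be reproduced in a few lines. This is a minimal sketch, assuming Python's built-in `koi8_u` codec; the function name `strip_eighth_bit` is hypothetical.

```python
# Sketch: encode text as KOI8-U, clear the eighth (high) bit of every byte,
# and read the remaining 7-bit values as ASCII.

def strip_eighth_bit(text, encoding="koi8_u"):
    """Return the ASCII text left after masking each encoded byte with 0x7F."""
    raw = text.encode(encoding)
    stripped = bytes(b & 0x7F for b in raw)
    return stripped.decode("ascii")

print(strip_eighth_bit("Код Обмена Информацией"))  # kOD oBMENA iNFORMACIEJ
```

Only the letters in rows Cx to Fx transliterate this way. The four Ukrainian additions sit in rows Ax and Bx, so under this sketch they come out as punctuation (for example, є at 0xA4 becomes `$`).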

Character set

The following table shows the KOI8-U encoding. Each character is shown with its equivalent Unicode code point.

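KOI8-U
    0      1      2      3      4      5      6      7      8      9      A      B      C      D      E      F
0x
1x
2x  SP     !      "      #      $      %      &      '      (      )      *      +      ,      -      .      /
3x  0      1      2      3      4      5      6      7      8      9      :      ;      <      =      >      ?
4x  @      A      B      C      D      E      F      G      H      I      J      K      L      M      N      O
5x  P      Q      R      S      T      U      V      W      X      Y      Z      [      \      ]      ^      _
6x  `      a      b      c      d      e      f      g      h      i      j      k      l      m      n      o
7x  p      q      r      s      t      u      v      w      x      y      z      {      |      }      ~
8x  ─ 2500 │ 2502 ┌ 250C ┐ 2510 └ 2514 ┘ 2518 ├ 251C ┤ 2524 ┬ 252C ┴ 2534 ┼ 253C ▀ 2580 ▄ 2584 █ 2588 ▌ 258C ▐ 2590
9x  ░ 2591 ▒ 2592 ▓ 2593 ⌠ 2320 ■ 25A0 ∙ 2219 √ 221A ≈ 2248 ≤ 2264 ≥ 2265 NBSP   ⌡ 2321 ° 00B0 ² 00B2 · 00B7 ÷ 00F7
Ax  ═ 2550 ║ 2551 ╒ 2552 ё 0451 є 0454 ╔ 2554 і 0456 ї 0457 ╗ 2557 ╘ 2558 ╙ 2559 ╚ 255A ╛ 255B ґ 0491 ╝ 255D ╞ 255E
Bx  ╟ 255F ╠ 2560 ╡ 2561 Ё 0401 Є 0404 ╣ 2563 І 0406 Ї 0407 ╦ 2566 ╧ 2567 ╨ 2568 ╩ 2569 ╪ 256A Ґ 0490 ╬ 256C © 00A9
Cx  ю 044E а 0430 б 0431 ц 0446 д 0434 е 0435 ф 0444 г 0433 х 0445 и 0438 й 0439 к 043A л 043B м 043C н 043D о 043E
Dx  п 043F я 044F р 0440 с 0441 т 0442 у 0443 ж 0436 в 0432 ь 044C ы 044B з 0437 ш 0448 э 044D щ 0449 ч 0447 ъ 044A
Ex  Ю 042E А 0410 Б 0411 Ц 0426 Д 0414 Е 0415 Ф 0424 Г 0413 Х 0425 И 0418 Й 0419 К 041A Л 041B М 041C Н 041D О 041E
Fx  П 041F Я 042F Р 0420 С 0421 Т 0422 У 0423 Ж 0416 В 0412 Ь 042C Ы 042B З 0417 Ш 0428 Э 042D Щ 0429 Ч 0427 Ъ 042A

NBSP (0x9A) is the no-break space, U+00A0.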

Although RFC 2319 specifies that character 0x95 is U+2219 (∙), it may instead be mapped to U+2022 (•) to match the bullet character in Windows-1251.

Some references have a typo and incorrectly state that character 0xB4 is U+0403, rather than the correct U+0404. This typo is present in Appendix A of RFC 2319 (but the table in the main text of the RFC gives the correct mapping).
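A given implementation can be checked against both of these notes by decoding the bytes in question. The sketch below assumes Python's built-in `koi8_u` codec and uses a hypothetical helper, `describe`; it reports whatever the installed codec does rather than asserting which mapping is authoritative.

```python
# Sketch: report which Unicode code point a codec assigns to a single byte.
import unicodedata

def describe(byte_value, encoding="koi8_u"):
    """Return a line such as '0xNN -> U+XXXX CHARACTER NAME' for one byte."""
    ch = bytes([byte_value]).decode(encoding)
    return f"0x{byte_value:02X} -> U+{ord(ch):04X} {unicodedata.name(ch)}"

print(describe(0xB4))  # the correct mapping is U+0404, not U+0403
print(describe(0x95))  # either U+2219 or U+2022, depending on the codec
```

The same helper can be pointed at another codec name through the `encoding` parameter to compare mappings across encodings.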
