THIS IS AN OUTDATED VERSION OF THIS PAGE. SEE THE CURRENT ONE AT:
http://kermitproject.org/utf8.html
¥ · £ · € · $ · ¢ · ₡ · ₢ · ₣ · ₤ · ₥ · ₦ · ₧ · ₨ · ₩ · ₪ · ₫ · ₭ · ₮ · ₯ · ₹
Frank da Cruz
Data for this table is available as a comma-separated, UTF-8 text file. This and other examples, including examples for Supplementary Plane characters, can be found in the navigation table in the 'Introduction To The Unicode Examples For Business Usage'.
UTF-8 is an ASCII-preserving encoding method for Unicode (ISO 10646), the Universal Character Set (UCS). The UCS encodes most of the world's writing systems in a single character set, allowing you to mix languages and scripts within a document without needing any tricks for switching character sets. This web page is encoded directly in UTF-8.
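As a minimal sketch of what "ASCII-preserving" means in practice, the snippet below encodes a few characters and counts the resulting bytes. The sample characters and variable names are illustrative choices, not something taken from this page.

```python
# Sample characters (illustrative choices), written as escapes so the
# example does not depend on the encoding of the source file.
samples = {
    "DOLLAR SIGN": "\u0024",
    "CENT SIGN": "\u00a2",
    "EURO SIGN": "\u20ac",
    "GOTHIC LETTER AHSA": "\U00010330",
}

# Number of UTF-8 bytes needed for each character.
utf8_lengths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}

# Pure ASCII text produces identical bytes in ASCII and in UTF-8.
ascii_text = "Kermit 95"
ascii_preserved = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

for name, n in utf8_lengths.items():
    print(f"{name}: {n} byte(s)")
print("ASCII preserved:", ascii_preserved)
```

Characters in the ASCII range take one byte each, so an existing ASCII file is already valid UTF-8. Everything else takes two to four bytes.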
As shown HERE, Columbia University's Kermit 95 terminal emulation software can display UTF-8 plain text in Windows 95, 98, ME, NT, XP, Vista, or Windows 7 when using a monospace Unicode font like Andale Mono WT J or Everson Mono Terminal, or the lesser-populated Courier New, Lucida Console, or Andale Mono. C-Kermit can handle it too, if you have a Unicode display. As many languages as are representable in your font can be seen on the screen at the same time.
This, however, is a Web page, which started out as a kind of stress test for UTF-8 support in Web browsers, which was spotty when this page was first created but which has become standard in all modern browsers. The problem now is mainly the fonts and the browser's (or font's) support for the nonzero Unicode planes (as in, e.g., the Gothic example below), and to some extent the rendition of combining sequences, right-to-left rendition (Arabic, Hebrew), and so on. CLICK HERE for a survey of Unicode fonts for Windows.
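To make "nonzero Unicode planes" concrete, here is a small sketch that computes the plane of a character from its code point. The helper name `plane` and the two sample characters are assumptions chosen for illustration.

```python
def plane(ch: str) -> int:
    """Return the Unicode plane number (0 to 16) of a single character."""
    return ord(ch) >> 16  # each plane holds 0x10000 code points


rupee = "\u20b9"       # INDIAN RUPEE SIGN, Basic Multilingual Plane (plane 0)
gothic = "\U00010330"  # GOTHIC LETTER AHSA, Supplementary Multilingual Plane (plane 1)

planes = {"rupee": plane(rupee), "gothic": plane(gothic)}
utf8_sizes = {
    "rupee": len(rupee.encode("utf-8")),
    "gothic": len(gothic.encode("utf-8")),
}

print(planes)
print(utf8_sizes)
```

Characters outside plane 0 always need four bytes in UTF-8, which is one reason older software and fonts tended to handle them less reliably.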
The subtitle above shows currency symbols of many lands. If they don't appear as blobs, we're off to a good start! (The one on the end is the new Indian Rupee sign, which won't show up in fonts for a while.)
Poetry
From the Anglo-Saxon Rune Poem (Rune version):
ᚠᛇᚻ᛫ᛒᛦᚦ᛫ᚠᚱᚩᚠᚢᚱ᛫ᚠᛁᚱᚪ᛫ᚷᛖᚻᚹᛦᛚᚳᚢᛗ
From Laȝamon's Brut (The Chronicles of England, Middle English, West Midlands):
An preost wes on leoden, Laȝamon was ihoten
(The third letter in the author's name is Yogh, missing from many fonts; CLICK HERE for another Middle English sample with some explanation of letters and encoding).
From the Tagelied of Wolfram von Eschenbach (Middle High German):
Sîne klâwen durh die wolken sint geslagen,
Some lines of Odysseus Elytis (Greek):
The first stanza of Pushkin's Bronze Horseman (Russian):
На берегу пустынных волн
Šota Rustaveli's Veṗxis Ṭq̇aosani, The Knight in the Tiger's Skin (Georgian):
ვეპხის ტყაოსანი
შოთა რუსთაველი
ღმერთსი შემვედრე, ნუთუ კვლა დამხსნას სოფლისა შრომასა,
ცეცხლს, წყალსა და მიწასა, ჰაერთა თანა მრომასა;
მომცნეს ფრთენი და აღვფრინდე, მივჰხვდე მას ჩემსა ნდომასა,
დღისით და ღამით ვჰხედვიდე მზისა ელვათა კრთომაასა.
Tamil poetry of Subramaniya Bharathiyar: சுப்ரமணிய பாரதியார் (1882-1921):
யாமறிந்த மொழிகளிலே தமிழ்மொழி போல் இனிதாவது எங்கும் காணோம்,
Kannada poetry by Kuvempu — ಬಾ ಇಲ್ಲಿ ಸಂಭವಿಸು
ಬಾ ಇಲ್ಲಿ ಸಂಭವಿಸು ಇಂದೆನ್ನ ಹೃದಯದಲಿ
ನಿತ್ಯವೂ ಅವತರಿಪ ಸತ್ಯಾವತಾರ
ಮಣ್ಣಾಗಿ ಮರವಾಗಿ ಮಿಗವಾಗಿ ಕಗವಾಗೀ...
ಮಣ್ಣಾಗಿ ಮರವಾಗಿ ಮಿಗವಾಗಿ ಕಗವಾಗಿ ಭವ ಭವದಿ ಭತಿಸಿಹೇ ಭವತಿ ದೂರ ನಿತ್ಯವೂ ಅವತರಿಪ ಸತ್ಯಾವತಾರ || ಬಾ ಇಲ್ಲಿ ||
I Can Eat Glass
And from the sublime to the ridiculous, here is a certain phrase¹ in an assortment of languages:
(Additions, corrections, completions gratefully accepted.)
For testing purposes, some of these are repeated in a monospace font . . .
The Quick Brown Fox... Pangrams
The 'I can eat glass' sentences do not necessarily show off the orthography of each language to best advantage. In many alphabetic written languages it is possible to include all (or most) letters (or 'special' characters) in a single (often nonsense) pangram. These were traditionally used in typewriter instruction; now they are useful for stress-testing computer fonts and keyboard input methods. Here are a few examples (SEND MORE):
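A pangram claim is easy to check mechanically. The sketch below tests a sentence against an alphabet; the function names `is_pangram` and `missing_letters` are hypothetical, and the default alphabet is the 26 basic Latin letters, so other languages would need their own alphabet string passed in.

```python
import string


def is_pangram(text: str, alphabet: str = string.ascii_lowercase) -> bool:
    """True if every letter of `alphabet` occurs in `text`, ignoring case."""
    return set(alphabet) <= set(text.lower())


def missing_letters(text: str, alphabet: str = string.ascii_lowercase) -> str:
    """Letters of `alphabet` that do not occur in `text`, in sorted order."""
    return "".join(sorted(set(alphabet) - set(text.lower())))


sentence = "The quick brown fox jumps over the lazy dog"
print(is_pangram(sentence))
print(missing_letters("The quick brown fox"))
```

Passing a different `alphabet` (for example the lowercase letters of the Russian alphabet) adapts the same check to another script, as long as the alphabet string is given in lowercase.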
Accented Cyrillic:
(This section contributed by Vladimir Marinov.)
In Bulgarian it is desirable, customary, or in some cases required to write accents over vowels. Unfortunately, no computer character sets contain the full repertoire of accented Cyrillic letters. With Unicode, however, it is possible to combine any Cyrillic letter with any combining accent. The appearance of the result depends on the font and the rendering engine. Here are two examples.
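The combining mechanism can be sketched with Python's standard `unicodedata` module. The grave accent (U+0300) and the two base letters below are assumptions chosen for illustration; the point is only that most accented Cyrillic letters exist solely as a base letter followed by a combining mark.

```python
import unicodedata

COMBINING_GRAVE = "\u0300"  # COMBINING GRAVE ACCENT (accent choice is an assumption)
cyr_a = "\u0430"            # CYRILLIC SMALL LETTER A
cyr_i = "\u0438"            # CYRILLIC SMALL LETTER I

# A base letter followed by a combining mark forms one visible character.
a_grave = cyr_a + COMBINING_GRAVE
i_grave = cyr_i + COMBINING_GRAVE

# NFC normalization composes the pair only if a precomposed letter exists.
a_nfc = unicodedata.normalize("NFC", a_grave)
i_nfc = unicodedata.normalize("NFC", i_grave)

print(len(a_nfc), [f"U+{ord(c):04X}" for c in a_nfc])
print(len(i_nfc), [f"U+{ord(c):04X}" for c in i_nfc])
print(unicodedata.combining(COMBINING_GRAVE))  # nonzero for combining marks
```

Here the accented "а" stays as two code points because Unicode has no precomposed form for it, while the accented "и" composes to the single code point U+045D. Either way, how well the accent sits over the letter is up to the font and rendering engine.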
HTML Features
Here is the Russian alphabet (uppercase only) coded in three different ways, which should look identical:
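A sketch of how the same letters can be written three ways in HTML, assuming the three forms are literal UTF-8 text, decimal numeric character references, and hexadecimal numeric character references (the text above does not name them). It uses `html.unescape` from the Python standard library to confirm that all three decode to the same string.

```python
import html

# U+0410..U+042F: the 32 contiguous uppercase letters from А to Я.
# (Ё, U+0401, lies outside this range and is left out of this sketch.)
literal = "".join(chr(cp) for cp in range(0x0410, 0x0430))

# The same letters as decimal and hexadecimal numeric character references.
decimal_refs = "".join(f"&#{ord(ch)};" for ch in literal)
hex_refs = "".join(f"&#x{ord(ch):X};" for ch in literal)

# A browser (or html.unescape) should decode both back to the literal text.
same_decimal = html.unescape(decimal_refs) == literal
same_hex = html.unescape(hex_refs) == literal

print(literal)
print(decimal_refs[:21])
print(hex_refs[:21])
print(same_decimal, same_hex)
```

The literal form depends on the page being served and read as UTF-8, while the numeric references are plain ASCII and survive any ASCII-compatible encoding.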
In another test, we use HTML language tags to distinguish Bulgarian, Russian, and Serbian, which have different italic forms for lowercase б, г, д, п, and/or т:
Credits, Tools, and Commentary