Handling Multilingual Data: Background for Database Designers
- | Item Description | Access |
---|---|---|
Display and/or Download |
Exploring Alphabets
( i18n Database Design Note #1 -- A,B,C's, Alphabets, Abjads and Abugidas )
Before dealing with mixed Scripts and Languages, you need to know that the common word Alphabet doesn't quite mean the same thing in other Languages. This Note may help expose some implications for your upgrade. |
Display * |
Display and/or Download |
Exploring Complex Text Layout
( i18n Database Design Note #2 -- Handling all those "Alphabets" )
If you've only used simple alphabets, where each letter is an independent entity, and text runs from left to right, you'll need to become more aware of the variety of issues you may encounter in a multi-lingual database. This introduction will help by presenting examples of Thai, Devanagari, Arabic, Hebrew and Korean Scripts in order to see what that variety entails. NO KNOWLEDGE OF ANY LANGUAGES USING THESE SCRIPTS IS REQUIRED TO HANDLE THEM EFFECTIVELY IN YOUR DATABASE DESIGNS!! (We show you how!) |
Display * |
Display and/or Download |
Exploring UTF-8
( i18n Database Design Note #3 -- Efficient Storage of Multi-Lingual/Multi-Script Data )
The UTF-8 encoding of Unicode values has proven to be the best compromise between efficient storage and the ability to easily handle multiple scripts, but there are implications for your schema. This Note explains why. |
Display * |
Display and/or Download |
Evaluating Fonts for use in Multi-Lingual Documents
( i18n Database Design Note #4 -- Displaying the Data )
As you add data in new Languages and Scripts to your database, you'll soon learn that many apps can't display that data well (or at all). Here's why. A Bash Script is included to assist in evaluating your font collection. |
Display * |
Display and/or Download |
Exploring Bi-Directional Text Entry
( i18n Database Design Note #5 -- Entering the Data )
Entering or editing text in multiple Languages and Scripts can be confusing, particularly if bi-directional text (bidi) is to be entered in the same paragraph or sentence. This tutorial is intended to clarify that process. |
Display * |
Display and/or Download |
Exploring Arabic Script Behavior
( i18n Database Design Note #6 -- Unique Behavior of Arabic Script )
Arabic Script, used by millions of people to write many unrelated Languages, was introduced in Note #2, but the complexity that makes Arabic Script so elegant to look at can also make its behavior in your systems difficult to evaluate, particularly if you are unfamiliar with any of those Languages. This Note provides an overview. |
Display * |
Display and/or Download |
Exploring Han-Chinese Script Behavior
( i18n Database Design Note #7 -- Unique Qualities of Syllabic Scripts )
Well over a billion people write their languages using character-based Scripts, the oldest and most common being what are called Hànzì by Chinese speakers and Kanji by the Japanese. The ability to read at least several hundred of these symbols is required for someone to be considered literate, but there are more than twenty thousand in use. So what sort of keyboard can produce all of these? And how are words in these languages sorted? With a significant part of the world's population and economy centered in countries using such Scripts, a basic understanding of how they are handled in a database is appropriate. This Note provides that. |
Display * |
Display and/or Download |
IME Keyboard Layouts - Hello, World
( i18n Database Design Note #8 -- IME Keyboard Layouts - Hello, World )
The ability to enter and analyze text in a variety of Languages and Writing Systems is a requirement for those who maintain multi-lingual data stores. This Note will show how to use some representative Input Method Editors (IMEs) to do that - providing examples in a dozen languages common to business and/or academia that can be used by data managers who have no fluency in those Languages themselves. |
Display * |
Display and/or Download |
Exploring Tones
( i18n Database Design Note #10 -- Exploring Tones ) ---- 𝄞 𝅘𝅥 𝅘𝅥𝅮 ----
Tones are critical to many spoken languages, but because data managers concentrate on storage rather than speech, tones have not been considered in earlier notes of this series. As it happens, however, exposure to tones is helpful in gaining familiarity with these Scripts and Languages, and some traditional phrases used to teach tone usage to non-native speakers also provide an entertaining diversion. |
Display * |
* Click to View the document; OR Right-Click, then "Save link as ..." to download (or Use browser control while viewing). |
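
The schema implication mentioned in the description of Note #3 can be previewed with a minimal Python sketch. It is not taken from the handouts: the sample strings and the helper name `utf8_profile` are illustrative assumptions. It shows only that the number of characters (code points) in a string and the number of bytes UTF-8 needs to store it diverge once the text leaves the basic Latin range.

```python
# Illustrative sample strings (assumptions, not taken from the handouts).
samples = {
    "Latin": "data",
    "Hebrew": "\u05e9\u05dc\u05d5\u05dd",   # four Hebrew letters
    "Thai": "\u0e44\u0e17\u0e22",           # three Thai letters
    "Han": "\u6f22\u5b57",                  # two Han characters
}


def utf8_profile(text):
    """Return (code point count, UTF-8 byte count) for a string."""
    return len(text), len(text.encode("utf-8"))


for script, text in samples.items():
    code_points, size = utf8_profile(text)
    print(f"{script}: {code_points} code points, {size} bytes")
```

Under these assumptions the Latin sample takes one byte per character, the Hebrew sample two, and the Thai and Han samples three. Whether a column length limit counts characters or bytes depends on the database product and its settings, so that is worth checking before sizing columns for multilingual data.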