The i18n Cookbook - recipes for a global society

Convert text from one script to another

Problem:

You want to convert text from one script (writing system) to another.

Solution:

The Transliterator class from icu4j makes it easy to convert text from one writing system to another.  To transliterate text, get a Transliterator instance by passing getInstance a compound id of the form "Source-Target" (the script you are converting from, a hyphen, then the script you are converting to), and then call the transliterate method on that instance.

 

Note that no writing system represents every sound, and different systems represent the same sound in different ways.  As a result, round-trip transliteration (converting text to another script and back again) often does not reproduce the original text exactly.
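To see why a round trip loses information, consider the following minimal sketch.  It does not use icu4j at all: the two lookup tables and the class name LossyRoundTripSketch are hypothetical and far simpler than the real Latin-Hangul rules.  They only imitate the r/l and v/b substitutions visible in the sample output further down, where two Latin letters share one Hangul letter and the reverse step can pick only one of them.

```java
import java.util.Map;

// Toy illustration only: these tables are hypothetical and are NOT the rules
// icu4j applies. They show why a many-to-one mapping cannot be undone.
final class LossyRoundTripSketch {

    // Forward table: two Latin consonants share each Hangul consonant letter.
    static final Map<Character, Character> TO_HANGUL = Map.of(
            'r', '\u3139',   // HANGUL LETTER RIEUL
            'l', '\u3139',
            'v', '\u3142',   // HANGUL LETTER PIEUP
            'b', '\u3142');

    // Reverse table: each Hangul letter can map back to only one Latin letter.
    static final Map<Character, Character> TO_LATIN = Map.of(
            '\u3139', 'l',
            '\u3142', 'b');

    // Replace every character that has a table entry; pass the rest through.
    static String convert(String text, Map<Character, Character> table) {
        StringBuilder out = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            char mapped = table.getOrDefault(c, c);
            out.append(mapped);
        }
        return out.toString();
    }

    // Convert to the toy Hangul letters and straight back to Latin.
    static String roundTrip(String text) {
        return convert(convert(text, TO_HANGUL), TO_LATIN);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("very")); // prints "bely": v and r are lost
        System.out.println(roundTrip("lab"));  // prints "lab": l and b survive
    }
}
```

Because 'r' and 'l' collapse onto the same letter in the forward direction, no reverse table can tell them apart afterwards.  Real transliterators face the same constraint with much larger rule sets.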

 

To transliterate English text from Latin script to Hangul, and then back again:

import com.ibm.icu.text.Transliterator;

public class TransliterateExample {
    public static void main(String[] args) {
        // Get a Transliterator instance for converting Latin script to Hangul (Korean) script
        Transliterator trans = Transliterator.getInstance("Latin-Hangul");
        // The text to transliterate
        String txt = "Transliteration is very cool.";
        // Output the example text
        System.out.println(txt);
        // Transliterate the text
        String korean = trans.transliterate(txt);
        // Output the Hangul text
        System.out.println("To Hangul: " + korean);
        // Get a transliterator that goes in the reverse direction.
        // This is the same as calling Transliterator.getInstance("Hangul-Latin");
        trans = Transliterator.getInstance("Latin-Hangul", Transliterator.REVERSE);
        // Output the text transliterated back to Latin script.
        // Note that the result doesn't match the original value exactly.
        // This is due to the different sounds available in each script.
        System.out.println("Back to English: " + trans.transliterate(korean));
    }
}

 


The output:

Transliteration is very cool.
To Hangul: 트란스리테라티온 잇 베류 초올.
Back to English: teulanseulitelation is belyu chool.


If you are testing any of these recipes in Eclipse and the characters are not displaying correctly in your console, visit http://i18ncookbook.com/eclipse_settings.
