Java Internationalization Cookbook

Internationalization and localization are ever more important areas of software development.  As more and more people in Asia and around the world gain access to computers and the internet, a company that does not meet the needs of those users stands to lose their business.  Java provides a number of tools to help the developer create localized applications.  ICU4J from IBM provides further capabilities.

 

The internet changes everything.  To sell widgets in China takes an enormous amount of money.  You need packaging, distribution channels, and a salesforce.  To open your website to Chinese users is a far simpler matter.  Even so, there is a great amount of confusion about how to accomplish many of the tasks needed to have a global application.  Java provides so many ways to handle localization that it is hard to figure out which is best.

 

I created this site to help other programmers develop global applications, and I chose the cookbook style because it is the easiest way for an experienced developer to get to the information they need.

 

Please feel free to use any of the code you find here.   If you would like to contribute content or have any comments, please contact me.

Locales

The concept of the locale is central to internationalization. It is the foundation for all other aspects of software internationalization.  At its most basic a locale represents the language of a user.  This is handled by combining a language and country code.
 
Java uses ISO 639 codes for languages and ISO 3166-1 alpha-2 codes for countries.  ICU4J combines the two codes with an optional script code and appends additional keywords to the end after an '@'.  Those keywords can specify a calendar type, a collation order, and so on.
 
For example:
zh = Chinese
zh_HK = Chinese in Hong Kong
zh_Hans_HK = Chinese in Hong Kong using the Simplified Han writing system
zh_Hans_HK@calendar=buddhist = Chinese in Hong Kong using the Simplified Han writing system and a Buddhist calendar
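
The same parts can be inspected from core Java. The following is a minimal sketch, assuming Java 7 or later, where Locale.forLanguageTag is available. Note that core Java uses the hyphenated BCP 47 form ("zh-Hans-HK-u-ca-buddhist") rather than the ICU underscore and '@' form shown above; the class name LocaleTagDemo is only illustrative.

```java
import java.util.Locale;

class LocaleTagDemo {
    // Describe the parts of a BCP 47 language tag such as "zh-Hans-HK".
    static String describe(String tag) {
        Locale locale = Locale.forLanguageTag(tag);
        // The "ca" Unicode extension key carries the calendar type; null when absent.
        String calendar = locale.getUnicodeLocaleType("ca");
        return "language=" + locale.getLanguage()
                + " script=" + locale.getScript()
                + " country=" + locale.getCountry()
                + " calendar=" + (calendar == null ? "" : calendar);
    }

    public static void main(String[] args) {
        System.out.println(describe("zh"));
        System.out.println(describe("zh-HK"));
        System.out.println(describe("zh-Hans-HK"));
        System.out.println(describe("zh-Hans-HK-u-ca-buddhist"));
    }
}
```

Parts that are not present in the tag come back as empty strings, so the same method works for every example in the list.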
 

How to make a Locale object using language and country arguments

Problem:

You want to create a representation of your user's language and country.

Solution:

A Java locale represents a unique language and country combination. You create a Locale by passing an ISO 639 language code and an ISO 3166-1 alpha-2 country code to the constructor.

To create a Locale representing the English language:

Locale locale = new Locale("en");

 

To create a Locale representing English as spoken in Great Britain:

Locale locale = new Locale("en","GB");

 

To create a Locale representing English as spoken in Great Britain using a POSIX variant:

Locale locale = new Locale("en","GB","POSIX");

How to use a static constant to retrieve a common locale

Problem:

You want to use a common locale without having to create a new locale object.

Solution:

Many common locales are stored in static constants within the Locale class for convenience. To retrieve one simply use the locale name attached to the class name.

To get a Japanese locale:

Locale myLocale = Locale.JAPANESE;

To get a Locale for the country of Japan:

Locale myLocale = Locale.JAPAN;

To get a simplified Chinese Locale:

Locale myLocale = Locale.SIMPLIFIED_CHINESE;

The available constants as of 1.5 are:

  • CANADA
  • CANADA_FRENCH
  • CHINA
  • CHINESE
  • ENGLISH
  • FRANCE
  • FRENCH
  • GERMAN
  • GERMANY
  • ITALIAN
  • ITALY
  • JAPAN
  • JAPANESE
  • KOREA
  • KOREAN
  • PRC
  • SIMPLIFIED_CHINESE
  • TAIWAN
  • TRADITIONAL_CHINESE
  • UK
  • US
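
The constants come in two flavours: language-only (JAPANESE, FRENCH) and language plus country (JAPAN, FRANCE). A small sketch that makes the difference visible; the helper name parts is hypothetical.

```java
import java.util.Locale;

class LocaleConstantsDemo {
    // Render a locale as "language/country" so the two kinds of constant are easy to compare.
    static String parts(Locale locale) {
        return locale.getLanguage() + "/" + locale.getCountry();
    }

    public static void main(String[] args) {
        System.out.println(parts(Locale.JAPANESE));            // language only
        System.out.println(parts(Locale.JAPAN));               // language and country
        System.out.println(parts(Locale.SIMPLIFIED_CHINESE));  // same locale as PRC and CHINA
        System.out.println(parts(Locale.TRADITIONAL_CHINESE)); // same locale as TAIWAN
    }
}
```

Choose the language-only constant when you only care about translated text, and the country constant when regional conventions such as the first day of the week matter.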

How to get an array of available Locales

Problem:

You want to list all locales your system supports.

Solution:

To get an array of all installed locales you can use the static getAvailableLocales method. This retrieves the full locales, and not only their ids.

To loop through available locales and output their ids:

Locale[] locales = Locale.getAvailableLocales();

for(int x = 0; x < locales.length; x++){

  System.out.println(locales[x].toString());

}
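
The full list is long and unordered. As a hedged sketch, the helper below (tagsFor is a hypothetical name) narrows it to one language and sorts the result; it assumes Java 7 or later for toLanguageTag.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class LocalesByLanguage {
    // Collect the language tags of every installed locale for one ISO 639 language code.
    static List<String> tagsFor(String language) {
        List<String> tags = new ArrayList<String>();
        for (Locale locale : Locale.getAvailableLocales()) {
            if (locale.getLanguage().equals(language)) {
                tags.add(locale.toLanguageTag());
            }
        }
        Collections.sort(tags);
        return tags;
    }

    public static void main(String[] args) {
        // The exact contents depend on the JVM and its installed locale data.
        for (String tag : tagsFor("en")) {
            System.out.println(tag);
        }
    }
}
```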

Get all ISO language and country codes

Problem:

You want to retrieve all ISO639 language codes or ISO3166 region codes.

Solution:

You can retrieve both ISO 639 language codes and ISO 3166 country codes by calling static methods on the Locale class.

To retrieve the ISO 639 language codes:

String[] locales = Locale.getISOLanguages();

for(int x = 0; x < locales.length; x++){

  System.out.println(locales[x]);

}

 

To retrieve the ISO 3166 country codes:

String[] locales = Locale.getISOCountries();

for(int x = 0; x < locales.length; x++){

  System.out.println(locales[x]);

}
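
One practical use for these arrays is validating user-supplied codes. A minimal sketch, with hypothetical helper names isLanguage and isCountry:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class IsoCodeCheck {
    // Load both code lists once; lookups in a HashSet are cheap.
    private static final Set<String> LANGUAGES =
            new HashSet<String>(Arrays.asList(Locale.getISOLanguages()));
    private static final Set<String> COUNTRIES =
            new HashSet<String>(Arrays.asList(Locale.getISOCountries()));

    // Language codes are lower case by convention.
    static boolean isLanguage(String code) {
        return LANGUAGES.contains(code.toLowerCase(Locale.ROOT));
    }

    // Country codes are upper case by convention.
    static boolean isCountry(String code) {
        return COUNTRIES.contains(code.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println("ja is a language: " + isLanguage("ja"));
        System.out.println("JP is a country: " + isCountry("JP"));
        System.out.println("ZZ is a country: " + isCountry("ZZ"));
    }
}
```

Case is normalized with Locale.ROOT so the check does not depend on the default locale of the machine.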

Get localized display names

Problem:

You want to retrieve a translated (localized) display name for a language, region, or locale.

Solution:

The Locale class contains methods to retrieve the localized display names for languages and countries. To retrieve a display name use the getDisplayLanguage, getDisplayCountry, and getDisplayName methods.

To get the display name for the language, country or locale using the system locale:

Locale[] locales = Locale.getAvailableLocales();

for(int x = 0; x < locales.length; x++){

  System.out.println(locales[x].getDisplayLanguage() + " - " + locales[x].getDisplayCountry() + " - " + locales[x].getDisplayName());

}

 

To retrieve a localized version of the display names you can specify a Locale as an argument to the method. This Locale is used to retrieve the appropriate display name.

Locale[] locales = Locale.getAvailableLocales();

for(int x = 0; x < locales.length; x++){

  System.out.println(locales[x].getDisplayLanguage(locales[x]) + " - " + locales[x].getDisplayCountry(locales[x]) + " - " + locales[x].getDisplayName(locales[x]));

}

 

To loop through all the locales again, but get a Japanese display name:

Locale[] locales = Locale.getAvailableLocales();

Locale japanese = Locale.JAPANESE;

for(int x = 0; x < locales.length; x++){

  System.out.println(locales[x].getDisplayLanguage(japanese) + " - " + locales[x].getDisplayCountry(japanese) + " - " + locales[x].getDisplayName(japanese));

}
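
Because the no-argument versions depend on the default locale of the machine, it is often safer to pass the display locale explicitly. A small sketch with a hypothetical describe helper:

```java
import java.util.Locale;

class DisplayNameDemo {
    // Build "language - country - full name" for one locale, shown in another locale.
    static String describe(Locale target, Locale displayIn) {
        return target.getDisplayLanguage(displayIn)
                + " - " + target.getDisplayCountry(displayIn)
                + " - " + target.getDisplayName(displayIn);
    }

    public static void main(String[] args) {
        System.out.println(describe(Locale.FRANCE, Locale.ENGLISH));
        System.out.println(describe(Locale.FRANCE, Locale.FRANCE));
        System.out.println(describe(Locale.FRANCE, Locale.JAPANESE));
    }
}
```

Passing the target as its own display locale gives each name in its native language, which is a common choice for a language picker.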

Create an ICU4J ULocale

Problem:

You want more options and power than provided by a core Java locale.

Solution:

ICU4J provides a locale class, ULocale, that offers significantly more capabilities than the Locale class in core Java.  ULocale is the foundation of all internationalization classes in ICU4J.

 

A ULocale is defined with at minimum a language code.  It can also contain a script code, a region code, and other locale-specific metadata such as calendar type.

 

To get an English ULocale:

ULocale english = new ULocale("en");

 

Dates and Times

Calendars and date formatting are central to a properly internationalized application.  Java provides several classes that are very useful to the localization developer for handling dates and times. 
 
The Calendar class and its subclasses handle the complex date calculations that go with the Gregorian calendar and, when you use ICU4J's version of the class, the Chinese, Islamic, and Hebrew calendars, as well as a number of others.
 
Java's DateFormat class handles formatting dates and times in a locale appropriate manner.
 
The DateFormatSymbols class contains localized information for date parts like weekdays, eras, months, and so on.

Calendars

Most Java developers are familiar with the Gregorian calendar and perhaps even Julian Day, but to have a well localized application it is often necessary to go beyond that.
 
To do business across Asia you need to be able to handle the Chinese, Japanese, and Buddhist calendars.  Muslim nations require an Islamic Calendar, and Israel needs a Hebrew calendar.  Each of these poses problems for the internationalization developer, but thankfully core Java and, to a greater extent, ICU provide some excellent capabilities for handling diverse calendars.

Gregorian Calendar

Problem:

You want to create a standard calendar as used in the west.

Solution:

The Gregorian Calendar is the predominant calendar of western countries today. To create a new Gregorian Calendar call the static getInstance factory method on the Calendar class. If you specify a Locale as an argument, the Calendar returned will reflect that Locale's conventions, such as the first day of the week.

Locale locale = new Locale("en","US");
Calendar calendar = Calendar.getInstance(locale);

 

Hebrew Calendar

 Problem:

You want to use a Hebrew Calendar in your application.

Solution:

The Hebrew calendar is a lunisolar calendar that poses some interesting challenges for the programmer.  It uses a leap month to synchronize the lunar and solar components.  The first day of the year can vary depending upon how it will affect Jewish holidays, and the day ends at sundown.

 

Thankfully icu4j provides an excellent Hebrew calendar.  It is a simple matter to create a calendar, and the usage is similar to a standard Gregorian calendar.

 

To get an instance of a Hebrew calendar simply call the getInstance method passing in a ULocale that has a calendar type specified.

//Specify the calendar type on the ULocale

ULocale he = new ULocale("he_IL@calendar=hebrew");

//Call getInstance passing the ULocale

Calendar cal = Calendar.getInstance(he);

 

 

To get the months of the Jewish Calendar get an instance of DateFormatSymbols with a ULocale specifying the Hebrew Calendar.

//Get a Hebrew calendar for a Hebrew locale

ULocale he = new ULocale("he_IL@calendar=hebrew");

//Get a Hebrew calendar on an English locale

ULocale en = new ULocale("en@calendar=hebrew");

//Get DateFormatSymbols for the locales

DateFormatSymbols dfsHe = DateFormatSymbols.getInstance(he);

DateFormatSymbols dfsEn = DateFormatSymbols.getInstance(en);

//Get arrays of months

String[] hebrewMonths = dfsHe.getMonths();

String[] englishMonths = dfsEn.getMonths();

//loop through the arrays and print out the English and Hebrew months

for(int x = 0; x < hebrewMonths.length; x++){

   System.out.println(englishMonths[x] + " = " + hebrewMonths[x]);

}

 

The output is:

 

Tishri = תשרי

Heshvan = חשון

Kislev = כסלו

Tevet = טבת

Shevat = שבט

Adar I = אדר ראשון

Adar = אדר שני

Nisan = ניסן

Iyar = אייר

Sivan = סיון

Tamuz = תמוז

Av = אב

Elul = 

 

 

 

 

 

Japanese Calendar

 Problem:

You want a calendar that is most familiar to your Japanese users.

Solution:

The Japanese calendar is identical to the Gregorian calendar, except the year is represented by the year of the emperor's reign.  At the time of writing (2008) it is the 20th year of the Heisei emperor's reign, so the year is Heisei 20.

You can create a Japanese calendar by using icu4j.

To create a calendar object:

ULocale ul = new ULocale("ja_JP@calendar=japanese");

Calendar cal = Calendar.getInstance(ul);

 

To format a simple date example:

ULocale ul = new ULocale("en@calendar=japanese");

DateFormat df = DateFormat.getDateInstance(DateFormat.FULL, ul);

System.out.println(df.format(new Date()));

 

The result is "Friday, November 28, 20 Heisei".  If we change the ULocale to Japanese ("ja_JP@calendar=japanese") the output is "平成20年11月28日金曜日".

To get the era of the current date:

//Get Japanese locale with attribute specifying japanese calendar

ULocale ul = new ULocale("ja_JP@calendar=japanese"); 

//calling getInstance using the locale bearing a calendar type will give us the appropriate calendar

Calendar cal = Calendar.getInstance(ul);

//Output the era number.  This is a largely arbitrary number, but it can be used to compare

//against the static constants on Japanese Calendar

System.out.println(cal.get(JapaneseCalendar.ERA) + " Heisei era = " + JapaneseCalendar.HEISEI);

 

Chinese Calendar

Problem:

You want to create a Chinese calendar.

Solution:

Use icu4j to create a calendar instance.

//Get a ULocale for Chinese written in simplified Chinese for China using a Chinese calendar
ULocale chinese = new ULocale("zh_Hans_CN@calendar=chinese");
//Get a Calendar instance specifying our ULocale.  This will give us a Chinese calendar
Calendar c = Calendar.getInstance(chinese);
//Get a DateTimeFormat appropriate for the calendar.
DateFormat df = c.getDateTimeFormat(DateFormat.FULL, DateFormat.FULL, chinese);
//Output a formatted date.
System.out.println(df.format(new Date()));
//Output the pattern.
System.out.println(((SimpleDateFormat)df).toPattern());

 

Discussion:

The Chinese calendar is a lunisolar calendar: the months are lunar months, and a leap month is inserted when needed so that the calendar does not drift away from the solar year.  The calendar is still in wide use across Asia.

 

The Chinese calendar measures years in 60 year cycles.  Each year in a cycle is named by pairing one of ten heavenly stems, which are associated with the five elements, with one of twelve earthly branches, which are associated with animal names.

 

Icu4j offers an excellent implementation of the Chinese calendar, and it is very easy to use.  By default it will output the year as "year x cycle".  I will demonstrate in another recipe how to create a more interesting format.

Find the Chinese zodiac for a Gregorian year

Problem:

You want to know the Chinese zodiac sign for a Gregorian year.

Solution:

Use the icu4j Chinese calendar to get the appropriate stem and branch.  The following code illustrates an example:

//An array of the elements for the ten heavenly stems.  The stems start with wood and end with
//water, but since we are using a modulus to determine the element we have moved the last water stem
//to position 0
String[] enElements = {"Water","Wood","Wood","Fire","Fire","Earth",
        "Earth","Metal","Metal","Water"};
String[] zhElements = {"癸","甲","乙","丙","丁","戊","己","庚","辛","壬"};

//The 12 animals of the Chinese calendar.  Rat is first and Boar is last but we have
//moved boar to position zero so our modulus returns a correct result.
String[] enAnimals = {"Boar","Rat","Ox","Tiger","Rabbit","Dragon","Snake",
        "Horse","Sheep","Monkey","Rooster","Dog"};
String[] zhAnimals = {"亥","子","丑","寅","卯","辰","巳",
        "午","未","申","酉","戌"};
//Get an instance of a Chinese calendar by specifying a calendar type of Chinese on our ULocale
ULocale chinese = new ULocale("zh_Hans_CN@calendar=chinese");
Calendar c = Calendar.getInstance(chinese);
//Get a Gregorian calendar instance for setting our dates
Calendar g = Calendar.getInstance();
//We will set the year, month, and day since the stem and branch change on
//Chinese New Year, not January 1st.
//Set the year
g.set(Calendar.YEAR, 2008);
//Set the month.  Remember the month is 0 based so January is 0 and December is 11.
g.set(Calendar.MONTH, 0);
//Set the day of the month
g.set(Calendar.DATE, 7);
//Output the date to confirm
System.out.println(DateFormat.getDateInstance().format(g.getTime()));
//Set the Chinese calendar for your specified date
c.setTimeInMillis(g.getTimeInMillis());
//Get the element stem.  The current year mod 10 will give us a value between 0 and 9.
//We use that value to retrieve the element from the elements array.
String enElement = enElements[c.get(Calendar.YEAR)%10];
String zhElement = zhElements[c.get(Calendar.YEAR)%10];
//Get the animal branch. The current year mod 12 will give us a value between 0 and 11.
//We use that value to retrieve the animal from the animals array.
String enAnimal = enAnimals[c.get(Calendar.YEAR)%12];
String zhAnimal = zhAnimals[c.get(Calendar.YEAR)%12];
System.out.println(enElement + " " + enAnimal);
System.out.println(zhElement + zhAnimal);

Discussion:

The Chinese Calendar uses a 60 year cycle.  Each year is named by a heavenly stem, tied to one of the 5 elements, and an earthly branch, tied to one of the 12 animals of the zodiac.  The zodiac changes on the Chinese New Year rather than on January 1st, so the calculation is a little more complex and needs the month and day to be truly accurate.
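
When only the Gregorian year is known, the stem and branch can also be approximated with plain arithmetic. This is a sketch under a stated assumption, not the icu4j approach used above: it treats the whole Gregorian year as one Chinese year, so it is only right for dates after the Chinese New Year (dates in January or early February may belong to the previous year). It relies on 4 CE having been a 甲子 (Wood Rat) year; the class and method names are hypothetical.

```java
class ZodiacSketch {
    // Ten heavenly stems and their elements, in natural order (no rotation needed here).
    private static final String[] STEMS = {"甲", "乙", "丙", "丁", "戊", "己", "庚", "辛", "壬", "癸"};
    private static final String[] ELEMENTS = {"Wood", "Wood", "Fire", "Fire", "Earth",
            "Earth", "Metal", "Metal", "Water", "Water"};
    // Twelve earthly branches and their animals, in natural order.
    private static final String[] BRANCHES = {"子", "丑", "寅", "卯", "辰", "巳",
            "午", "未", "申", "酉", "戌", "亥"};
    private static final String[] ANIMALS = {"Rat", "Ox", "Tiger", "Rabbit", "Dragon", "Snake",
            "Horse", "Sheep", "Monkey", "Rooster", "Dog", "Boar"};

    // Zero-based position in the 60 year cycle, counted from the Wood Rat year 4 CE.
    static int cycleOffset(int gregorianYear) {
        return Math.floorMod(gregorianYear - 4, 60);
    }

    static String english(int gregorianYear) {
        int offset = cycleOffset(gregorianYear);
        return ELEMENTS[offset % 10] + " " + ANIMALS[offset % 12];
    }

    static String chinese(int gregorianYear) {
        int offset = cycleOffset(gregorianYear);
        return STEMS[offset % 10] + BRANCHES[offset % 12];
    }

    public static void main(String[] args) {
        System.out.println(english(2008) + " " + chinese(2008));
        System.out.println(english(2007) + " " + chinese(2007));
    }
}
```

For January 7th, 2008, the date used in the recipe, the Chinese year is still the one that began in 2007, so the matching call is english(2007).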

 

 

Get the name of the current month

Problem:

You want to retrieve a localized display name of the current month.

Solution:

To get the name of a month you first retrieve an instance of Calendar for the desired locale. You use calendar.get(int) passing in the constant Calendar.MONTH to get the current month of the calendar. You then use that to pull the correct value out of the array of month names returned by getMonths() on an instance of DateFormatSymbols.

To get the display name for the current month:

//Get an instance of Calendar for a Japanese locale

Calendar calendar = Calendar.getInstance(Locale.JAPANESE);

//Get a new DateFormatSymbols object for the Japanese locale

DateFormatSymbols dfs = new DateFormatSymbols(Locale.JAPANESE);

//Get an Array of months

String[] months = dfs.getMonths();

System.out.println(months[calendar.get(Calendar.MONTH)]);


 

To loop through all of the months for a locale:

//Get a new DateFormatSymbols object for the French locale

DateFormatSymbols dfs = new DateFormatSymbols(Locale.FRENCH);

//Get the months

String[] months = dfs.getMonths();

//Months are zero based, so loop from 0 to 11

for(int x = 0; x < 12; x++){

  System.out.println(months[x]);

}
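
If you only need the name of one month, Calendar can return it directly. A minimal sketch, assuming Java 6 or later where Calendar.getDisplayName is available; the helper name monthName is hypothetical.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

class MonthNameDemo {
    // Return the full month name for a given year and zero-based month in the given locale.
    static String monthName(int year, int month, Locale locale) {
        Calendar calendar = new GregorianCalendar(year, month, 1);
        return calendar.getDisplayName(Calendar.MONTH, Calendar.LONG, locale);
    }

    public static void main(String[] args) {
        System.out.println(monthName(2008, Calendar.NOVEMBER, Locale.FRENCH));
        System.out.println(monthName(2008, Calendar.NOVEMBER, Locale.ENGLISH));
        System.out.println(monthName(2008, Calendar.NOVEMBER, Locale.JAPANESE));
    }
}
```

Using the Calendar.NOVEMBER constant instead of the number 10 avoids the usual off-by-one mistake with zero-based months.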

Get the first day of the week

Problem:

You want to find the first day of the week for a locale and display it.

Solution:

To get the first day of the week for a calendar in a particular locale you need to first get the Calendar instance by passing in the Locale, then retrieve a DateFormatSymbols object using the same Locale. You can then use the DateFormatSymbols to retrieve an array of weekdays and use the numeric first day of week value returned from the Calendar to get a human readable first day of week.

 

To get a human readable first day of week for Great Britain:

Locale locale = new Locale("en","GB");

Calendar calendar = Calendar.getInstance(locale);

DateFormatSymbols dfs = new DateFormatSymbols(locale);

System.out.println(dfs.getWeekdays()[calendar.getFirstDayOfWeek()]);

 

To do the same for Japan using the static Locale shortcut:

Calendar calendar = Calendar.getInstance(Locale.JAPANESE);

DateFormatSymbols dfs = new DateFormatSymbols(Locale.JAPANESE);

System.out.println(dfs.getWeekdays()[calendar.getFirstDayOfWeek()]);
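
The region that decides the first day and the language used to print it do not have to be the same. The sketch below separates the two; firstDayName is a hypothetical helper. It works because getWeekdays() is indexed by the Calendar day constants (Calendar.SUNDAY is 1), which is also why index 0 of that array is unused.

```java
import java.text.DateFormatSymbols;
import java.util.Calendar;
import java.util.Locale;

class FirstDayDemo {
    // Look up the first day of the week for one region and name it in another language.
    static String firstDayName(Locale region, Locale displayIn) {
        int firstDay = Calendar.getInstance(region).getFirstDayOfWeek();
        return new DateFormatSymbols(displayIn).getWeekdays()[firstDay];
    }

    public static void main(String[] args) {
        System.out.println("US: " + firstDayName(Locale.US, Locale.ENGLISH));
        System.out.println("France: " + firstDayName(Locale.FRANCE, Locale.ENGLISH));
        System.out.println("France, in French: " + firstDayName(Locale.FRANCE, Locale.FRENCH));
    }
}
```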

Add time to Calendar

Problem:

You need to add or subtract time from a calendar.

Solution:

There are two ways to perform date math with a Calendar: add(int,int) and roll(int,int). Roll increments or decrements only the specified field; when the maximum or minimum value is reached it wraps around and begins again without touching larger fields. Rolling the month forward by 13 with roll will not increment the year.

Add will bubble up such that adding 13 months to a Calendar will also increment the year.

To get the current date and add 3 months:

Calendar c = Calendar.getInstance(Locale.GERMAN);
System.out.println(c.getTime());
c.add(Calendar.MONTH, 3);
System.out.println(c.getTime());

To get the current date and subtract 7 weeks:

Calendar c = Calendar.getInstance(Locale.GERMAN);
System.out.println(c.getTime());
c.add(Calendar.WEEK_OF_MONTH, -7);
System.out.println(c.getTime());
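
The difference between add and roll is easiest to see with a fixed starting date. A small sketch that starts from November 15th, 2008 (an arbitrary date chosen for illustration) and moves the month forward both ways:

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

class AddVersusRoll {
    // Format the calendar as "year-month" with a one-based month.
    static String yearMonth(Calendar c) {
        return c.get(Calendar.YEAR) + "-" + (c.get(Calendar.MONTH) + 1);
    }

    // add() carries overflow into the year.
    static String afterAdd(int months) {
        Calendar c = new GregorianCalendar(2008, Calendar.NOVEMBER, 15);
        c.add(Calendar.MONTH, months);
        return yearMonth(c);
    }

    // roll() wraps the month around and leaves the year alone.
    static String afterRoll(int months) {
        Calendar c = new GregorianCalendar(2008, Calendar.NOVEMBER, 15);
        c.roll(Calendar.MONTH, months);
        return yearMonth(c);
    }

    public static void main(String[] args) {
        System.out.println("add 3:  " + afterAdd(3));   // 2009-2
        System.out.println("roll 3: " + afterRoll(3));  // 2008-2
    }
}
```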

Get an array of Holidays

 Problem:

You want to know what holidays are common for a locale.

Solution:

Icu4j contains a system to help the Java developer handle international holidays.  The Holiday class is simple to use, and has fairly complete data.

 

To get an Array of all Holidays for Mexico:

//Get a ULocale

ULocale locale = new ULocale("es_MX");

//Get an array of Holidays

Holiday[] holidays = Holiday.getHolidays(locale);

//Loop through all of the Holidays and output the localized display name

for(int x = 0; x < holidays.length; x++){

    System.out.println(holidays[x].getDisplayName(locale));

}

 

The output:

 

New Year's Day

Constitution Day

Benito Juárez Day

May Day

Cinco de Mayo

Navy Day

Independence Day

Día de la Raza

All Saints' Day

Day of the Dead

Revolution Day

Flag Day

Christmas

 

 

Find the date for the Chinese New Year

Problem:

You want to find the date of the Chinese new year for a given year.

Solution:

Use icu4j's Chinese calendar to perform the calculations.  You will specify a year in a Gregorian calendar and use that to set the time of the Chinese calendar.  Then set the calendar to the first month and first day.

//Get a ULocale for Chinese written in simplified Chinese for China using a Chinese calendar
ULocale chinese = new ULocale("zh_Hans_CN@calendar=chinese");
//Get a Calendar instance specifying our ULocale.  This will give us a Chinese calendar
Calendar c = Calendar.getInstance(chinese);
//Get a Gregorian calendar instance
Calendar c2 = Calendar.getInstance();
//Set the year you want to find new Years for.  We will also set the month
//to half way through the year since the Chinese New Year won't occur until perhaps late February
c2.set(Calendar.YEAR, 2008);
c2.set(Calendar.MONTH, 5);
//Set the year of the Chinese Calendar
c.setTimeInMillis(c2.getTimeInMillis());
//Set the month to the first month.  Remember months are zero based
c.set(Calendar.MONTH, 0);
//Set the day to the first.  Remember dates are one based.
c.set(Calendar.DATE, 1);
//Output the date.  getTime() returns a Gregorian date.
System.out.println(c.getTime());

The output:

Thu Feb 07 22:35:16 EST 2008

This shows us that the New Year occurred on Feb 7th in 2008.  The time output is irrelevant.

 Discussion:

The length of the Chinese calendar's year varies from year to year.  Since the Calendar uses a leap month, the New Year can occur anywhere from late January to late February. 

 

Get all the Era names for the Japanese Calendar

Problem:

You want to retrieve all of the era names for the Japanese calendar.

Solution:

Use the icu4j Japanese calendar and DateFormatSymbols to retrieve the era names.

//Get a ULocale that specifies a Japanese Calendar in Japanese
ULocale jp = new ULocale("ja_JP@calendar=japanese");
//Get a Japanese calendar instance
Calendar jc = Calendar.getInstance(jp);
//The era field contains the numeric identifier for the reigning emperor at the current time
System.out.println(jc.get(Calendar.ERA));
//Get an instance of the DateFormatSymbols for Japanese using the Japanese calendar
DateFormatSymbols dfs = DateFormatSymbols.getInstance(jp);
//getEras() returns a String array of era names
String[] eras = dfs.getEras();
//Loop through the eras and output
for(int x = 0; x < eras.length; x++){
    System.out.println(eras[x]);
}
//Now let's get the English Era names
//Get a ULocale that specifies a Japanese Calendar in English
ULocale en = new ULocale("en@calendar=japanese");
//Get the DateFormatSymbols for English using a Japanese Calendar
dfs = DateFormatSymbols.getInstance(en);
eras = dfs.getEras();
for(int x = 0; x < eras.length; x++){
    System.out.println(eras[x]);
}

Discussion:

The Japanese calendar counts years based upon the reign of the emperor.  Since the reign of an emperor varies in length, the year cannot be derived by arithmetic alone; it has to be looked up from era data.  icu4j's DateFormatSymbols can provide us with a list of all era names.  To find the Japanese year for a Gregorian date, see the recipe "Get the Japanese era for a Gregorian date".

Get the Japanese era for a Gregorian date

Problem:

You want to know the Japanese era for a Gregorian date.

Solution:

Use a Japanese Calendar and DateFormatSymbols from icu4j to retrieve the Japanese year and era.

//Get a ULocale specifying a locale of Japanese using the Japanese calendar
ULocale jp = new ULocale("ja_JP@calendar=japanese");
//Get a calendar instance
Calendar jc = Calendar.getInstance(jp);
//Get a default Gregorian calendar instance
Calendar gc = Calendar.getInstance();
//Set calendar for desired date
gc.set(Calendar.YEAR, 1971);
//The month is zero based
gc.set(Calendar.MONTH, 9);
gc.set(Calendar.DATE, 22);
//Set the Japanese calendar using your Gregorian calendar
jc.setTimeInMillis(gc.getTimeInMillis());
//Get a DateFormat appropriate for the ULocale
DateFormat df = jc.getDateTimeFormat(DateFormat.FULL, DateFormat.FULL, jp);
//Output a formatted date
System.out.println(df.format(jc.getTime()));
//Output the era.
System.out.println(jc.get(Calendar.ERA));
//The era is a numeric value.  To get a meaningful value we need to
//get an instance of DateFormatSymbols and retrieve an array of era names.
//We can then use the era to retrieve the appropriate display name
DateFormatSymbols dfs = DateFormatSymbols.getInstance(jp);
//Concatenate the era and year.
String jYear = dfs.getEras()[jc.get(Calendar.ERA)]+jc.get(Calendar.YEAR);
System.out.println(jYear);

Output:

昭和46年10月22日金曜日 12時53分44秒アメリカ合衆国 (ニューヨーク)
234
昭和46

Discussion:

The Japanese year is represented by the year of the Emperor's reign.  Because the length of an Emperor's reign fluctuates, the year cannot be computed by a formula alone.  We will rely on the data stored in the CLDR and access it via icu4j.
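
If icu4j is not available, core Java can do a similar conversion. This is an alternative sketch rather than the icu4j technique shown above, and it assumes Java 8 or later, where the java.time.chrono package provides JapaneseDate. The class and method names are hypothetical.

```java
import java.time.chrono.JapaneseDate;
import java.time.chrono.JapaneseEra;
import java.time.temporal.ChronoField;

class JapaneseYearSketch {
    // Era (for example SHOWA or HEISEI) that contains the given Gregorian date.
    static JapaneseEra era(int year, int month, int day) {
        return JapaneseDate.of(year, month, day).getEra();
    }

    // Year counted from the start of that era; months are one-based in java.time.
    static int yearOfEra(int year, int month, int day) {
        return JapaneseDate.of(year, month, day).get(ChronoField.YEAR_OF_ERA);
    }

    public static void main(String[] args) {
        // October 22nd, 1971, the same date as in the recipe.
        System.out.println(era(1971, 10, 22) + " " + yearOfEra(1971, 10, 22));
        // November 28th, 2008.
        System.out.println(era(2008, 11, 28) + " " + yearOfEra(2008, 11, 28));
    }
}
```

Note that java.time uses one-based months, unlike the zero-based Calendar.MONTH field used elsewhere in these recipes.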

 

Formatting dates and times

Problem:

You want to format a date correctly for a locale.

Solution:

Formatting dates and times correctly for a locale can prove challenging, especially considering the variety of possibilities.  Thankfully, Java provides a number of date formatting methods and classes.  icu4j adds even more tools to the programmer's toolbox.

 

To format a date you need to get an instance of the DateFormat class passing the locale in:

//Get a locale object.  In this case we could also use the static constant Locale.FRENCH

Locale french = new Locale("fr");

//Get a DateFormat instance using the locale.  We also specify a length.

//Possible length values are DateFormat.SHORT, DateFormat.MEDIUM, DateFormat.LONG, DateFormat.FULL

//To demonstrate we will get an example of each

int[] lengths = {DateFormat.SHORT, DateFormat.MEDIUM, DateFormat.LONG, DateFormat.FULL};

//Loop through the lengths.  Get a DateFormat instance for each length and format a date

for(int x = 0; x < lengths.length; x++){

    DateFormat df = DateFormat.getDateInstance(lengths[x], french);

    System.out.println(df.format(new Date()));

}

 

 

The output:

 

29/11/08

29 nov. 2008

29 novembre 2008

samedi 29 novembre 2008
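
The same localized styles are available through the newer java.time API. The following is a sketch that assumes Java 8 or later and is not part of the original recipe; it formats a fixed date so the result does not depend on when it is run. The helper name format is hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;

class LocalizedDateSketch {
    // Format a date in one of the predefined localized styles for the given locale.
    static String format(LocalDate date, FormatStyle style, Locale locale) {
        return DateTimeFormatter.ofLocalizedDate(style).withLocale(locale).format(date);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2008, 11, 29);
        // SHORT and MEDIUM output can vary between Java versions and locale data sets.
        for (FormatStyle style : FormatStyle.values()) {
            System.out.println(style + ": " + format(date, style, Locale.FRENCH));
        }
    }
}
```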

 

Format a time amount

Problem:

You want to format a time amount like "5 days" for a locale.

Solution:

TimeUnitFormat from icu4j provides some excellent formatting options for time units.  To use it, create a TimeUnitAmount, which contains an amount and a unit.  Then specify a locale on the TimeUnitFormat.

To format 1 day and 3 days in French:

//Get a TimeUnit
TimeUnit unit = TimeUnit.DAY;
//Specify the unit amount
TimeUnitAmount amount = new TimeUnitAmount(1,unit);
TimeUnitAmount amount2= new TimeUnitAmount(3,unit);
//Get a unit format
TimeUnitFormat format = new TimeUnitFormat();
//Get and set a ULocale
ULocale uloc = new ULocale("fr");
format.setLocale(uloc);
//Format and output
System.out.println(format.format(amount));
System.out.println(format.format(amount2));


The output:

1 jour
3 jours
 

Format a time interval

Problem:

You want to format a span of time.

Solution:

As of version 4, icu4j contains a DateIntervalFormat class.  This class makes it relatively easy to format an interval value.  The format will automatically format as compactly as possible. For example: if the difference between the two dates is only a few hours and both dates occur on the same day, the year, month, and day parts of the date will be omitted.

 

In order to try this example out you will need to download a copy of icu4j from http://icu-project.org/download/4.0.html#ICU4J .

 

Import the correct classes

import java.text.FieldPosition;
import com.ibm.icu.text.DateFormat;
import com.ibm.icu.text.DateIntervalFormat;
import com.ibm.icu.util.Calendar;
import com.ibm.icu.util.ULocale;

Then simply get an instance of the DateIntervalFormat class and format:

//Get two calendar instances
Calendar cal1 = Calendar.getInstance();
Calendar cal2 = (Calendar)cal1.clone();
//Increment one calendar by 4 hours and 34 minutes.
cal2.add(Calendar.HOUR, 4);
cal2.add(Calendar.MINUTE, 34);
//Get an instance of DateIntervalFormat.  Specify the date format skeleton to use.
//The skeleton is an example format that will be used as a base for the interval format.
//You can specify your own skeleton format, or use one of the defaults in the DateFormat class
DateIntervalFormat dtIntervalFmt = DateIntervalFormat.getInstance(DateFormat.HOUR_MINUTE, new ULocale("ja"));
//Create a StringBuffer to hold the formatted value.
StringBuffer str = new StringBuffer("");
//Create a FieldPosition to specify the position for the StringBuffer
FieldPosition pos = new FieldPosition(0);
//format the interval.  The formatted value is placed in the StringBuffer.
dtIntervalFmt.format(cal1, cal2, str, pos);
System.out.println(str);

 

 Running this we get an output of "19時20分~23時54分".

Format and cast a date to a timezone

Problem:

You want to format a date and cast it to a timezone.

Solution:

The Java DateFormat class can cast a date to a new time zone at the same time as formatting it.  This can be accomplished simply by specifying a TimeZone on the DateFormat object.

 

To format and cast a date to PST:

//Get a Locale.  In this case we are going to use Afrikaans in South Africa
Locale afrikaans = new Locale("af","ZA");
//Get a DateFormat instance.
//We specify length for date and time and a locale to format for.
DateFormat df = DateFormat.getDateTimeInstance(DateFormat.FULL, DateFormat.FULL, afrikaans);
//Get a TimeZone instance for a specified id.
//The ids are Olson TimeZone ids
TimeZone tz = TimeZone.getTimeZone("America/Los_Angeles");
//Specify the time zone
df.setTimeZone(tz);
//output the format
System.out.println(df.format(new Date()));

 

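Output example (this sample is in Spanish rather than Afrikaans, so expect different day and month names from the code above):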

viernes 5 de diciembre de 2007 06:49:46 PM PST
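
To see that only the displayed wall-clock time changes and not the underlying instant, format one fixed Date in two zones. A minimal sketch; the pattern string and the helper name format are illustrative choices, not part of the recipe above.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class ZoneFormatSketch {
    // Format the same instant as wall-clock time in the given time zone.
    static String format(Date instant, String zoneId) {
        SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd HH:mm zzz", Locale.ENGLISH);
        df.setTimeZone(TimeZone.getTimeZone(zoneId));
        return df.format(instant);
    }

    public static void main(String[] args) {
        // Date(0) is midnight on January 1st, 1970 in UTC.
        Date epoch = new Date(0L);
        System.out.println(format(epoch, "UTC"));
        System.out.println(format(epoch, "America/Los_Angeles"));
    }
}
```

Los Angeles is eight hours behind UTC in winter, so the second line shows 16:00 on the previous day.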

Get all time zone ids

Problem:

You want to know what time zones are available on your system.

Solution:

Java uses the TimeZone class to handle date/time casting to and from UTC.  A TimeZone is retrieved by specifying a string id.  Most of these ids are Olson time zone ids.  You can retrieve an array of time zone ids by calling the static getAvailableIDs() method.

 

To get all available IDs:

//Get all Timezone ids
String[] ids = TimeZone.getAvailableIDs();
//Loop through ids and output
for(int x = 0; x < ids.length; x++){
    System.out.println(ids[x]);
}

 

The output is very long, so I will omit it here.

Get an array of day names

 Problem:

You want to retrieve all of the day of the week names for a locale.

Solution:

The DateFormatSymbols class contains a wealth of information useful to the localization programmer.  You can easily retrieve day names, month names, era names, etc...

 

To get the days of the week in Korean and English:

//Get a DateFormatSymbols object using a locale as an argument

DateFormatSymbols dfsKorean = new DateFormatSymbols(Locale.KOREAN);

DateFormatSymbols dfsEnglish = new DateFormatSymbols(Locale.ENGLISH);

//Get an array of weekday names

String[] kWeekdays = dfsKorean.getWeekdays();

String[] eWeekdays = dfsEnglish.getWeekdays();

//Loop through the array and output the names

for(int x = 1; x < 8; x++){

    System.out.println(eWeekdays[x] + " = " + kWeekdays[x]);

}

 

 


The output:

Sunday = 일요일

Monday = 월요일

Tuesday = 화요일

Wednesday = 수요일

Thursday = 목요일

Friday = 금요일

Saturday = 토요일
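
The loop above starts at 1 because the array returned by getWeekdays() is indexed by the Calendar day constants, which begin with Calendar.SUNDAY = 1. As an alternative sketch, assuming Java 8 or later, the java.time DayOfWeek enum avoids the offset and lists the days Monday first; the helper name dayName is hypothetical.

```java
import java.time.DayOfWeek;
import java.time.format.TextStyle;
import java.util.Locale;

class DayNameSketch {
    // Full name of a day of the week in the given locale.
    static String dayName(DayOfWeek day, Locale locale) {
        return day.getDisplayName(TextStyle.FULL, locale);
    }

    public static void main(String[] args) {
        // DayOfWeek.values() runs from MONDAY to SUNDAY.
        for (DayOfWeek day : DayOfWeek.values()) {
            System.out.println(dayName(day, Locale.ENGLISH) + " = " + dayName(day, Locale.KOREAN));
        }
    }
}
```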

 

Get an array of timezone ids for offset

Problem:

You know what your offset from UTC is and you want to know what timezones are available for that offset.

Solution:

The Java TimeZone class uses string ids to retrieve an instance.  You can get a list of all TimeZone ids, but the list is quite large.  Luckily you can narrow the results by specifying an offset from UTC and retrieve only the results with that raw offset.

 

The offset is specified as milliseconds from UTC.  Go figure.

 

To get all TimeZone ids for an offset of -7:00 from UTC:

//Get all Timezone ids for an offset.
//For some reason the offset is in milliseconds.
//Offsets are not always whole hours: some zones use 30 or 45 minute increments
//Here we get all ids for an offset of -7 hours from UTC
String[] ids = TimeZone.getAvailableIDs(-1000*60*60*7);
//Loop through ids and output
for(int x = 0; x < ids.length; x++){
    System.out.println(ids[x]);
}


 

Output:

America/Boise
America/Cambridge_Bay
America/Chihuahua
America/Dawson_Creek
America/Denver
America/Edmonton
America/Hermosillo
America/Inuvik
America/Mazatlan
America/Phoenix
America/Shiprock
America/Yellowknife
Canada/Mountain
Etc/GMT+7
MST
MST7MDT
Mexico/BajaSur
Navajo
PNT
SystemV/MST7
SystemV/MST7MDT
US/Arizona
US/Mountain
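
Multiplying out the milliseconds by hand is error prone, especially for offsets that are not whole hours. The following is a small sketch that does the conversion with TimeUnit. The class and method names (ZoneIdsByOffset, toRawOffsetMillis, idsFor) are my own and not part of any library.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TimeZone;
import java.util.concurrent.TimeUnit;

public class ZoneIdsByOffset {

    // Converts an hours/minutes offset to the millisecond value getAvailableIDs expects.
    // For offsets west of UTC pass negative hours (and negative minutes, if any).
    public static int toRawOffsetMillis(int hours, int minutes) {
        return (int) (TimeUnit.HOURS.toMillis(hours) + TimeUnit.MINUTES.toMillis(minutes));
    }

    // Returns every time zone id whose raw (non-daylight) offset matches.
    public static List<String> idsFor(int hours, int minutes) {
        return Arrays.asList(TimeZone.getAvailableIDs(toRawOffsetMillis(hours, minutes)));
    }

    public static void main(String[] args) {
        // Seven hours west of UTC
        System.out.println(idsFor(-7, 0));
        // Five hours and forty-five minutes east of UTC
        System.out.println(idsFor(5, 45));
    }
}
```

The exact lists depend on the time zone data shipped with your JVM, so they may differ from the output shown above.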

Get the best date format pattern

 Problem:

You know what parts of a date you want to display but you don't know the exact pattern to use.

Solution:

Often a Java programmer will know what type of pattern he wants, but not exactly what the pattern would be.  With ICU4J's DateTimePatternGenerator you can specify a pattern skeleton, and it will return the most appropriate localized match.

 

To get the best pattern for the skeleton "MMMddYYYYHH" in Japanese:

//Get a ULocale object
ULocale locale = new ULocale("ja_JP");
//Get an instance of the DateTimePatternGenerator for the locale
DateTimePatternGenerator dp = DateTimePatternGenerator.getInstance(locale);
//Get the best pattern match.
//Note that in pattern syntax "Y" is the week-based year and "y" is the calendar year;
//the two differ only for dates near a year boundary.
String best = dp.getBestPattern("MMMddYYYYHH");
//output the best pattern match
System.out.println(best);
//Get a DateFormat for the pattern
DateFormat df = new SimpleDateFormat(best);
//output a formatted date
System.out.println(df.format(new Date()));

 


The output:

 

YYYY年MM月dd日 HH

2008年12月06日 15

 

As you can see, the closest match uses a 4 digit year followed by a year symbol, a 2 digit month followed by a month symbol, a 2 digit day followed by a day symbol, and a 2 digit hour on the 24-hour clock (00 to 23).

 

 

Get the display name for a Timezone

Problem:

You want to get a localized display name for a time zone.

Solution:

You can get a display name for a TimeZone in Java very easily by specifying a locale in the getDisplayName method.

 

To get a display name for Canada/Mountain in Japanese:

//Get a locale for Japanese in Japan
Locale jp = new Locale("ja","JP");
//Get a TimeZone for the id Canada/Mountain
TimeZone tz = TimeZone.getTimeZone("Canada/Mountain");
//output the display name
System.out.println(tz.getDisplayName(jp));

 


The output:

山地標準時
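
getDisplayName also has an overload that takes a daylight flag and a style (TimeZone.LONG or TimeZone.SHORT), which is handy when you need the summer-time name or an abbreviation. Here is a minimal sketch; the class and method names (ZoneDisplayNames, longName) are illustrative only.

```java
import java.util.Locale;
import java.util.TimeZone;

public class ZoneDisplayNames {

    // Long name for either the standard-time or the daylight-time variant of a zone.
    public static String longName(String zoneId, boolean daylight, Locale locale) {
        TimeZone tz = TimeZone.getTimeZone(zoneId);
        return tz.getDisplayName(daylight, TimeZone.LONG, locale);
    }

    public static void main(String[] args) {
        // Standard and daylight names in US English
        System.out.println(longName("Canada/Mountain", false, Locale.US));
        System.out.println(longName("Canada/Mountain", true, Locale.US));
        // Standard name in Japanese
        System.out.println(longName("Canada/Mountain", false, Locale.JAPAN));
    }
}
```

The exact strings come from the locale data in your JVM, so they can vary a little between Java versions.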

Parse a formatted date string

Problem:

You want to parse a localized date string to a Java Date.

Solution:

As important as date formatting is, you have to be able to parse the formatted dates.  Parsing a date is very similar to formatting the date.  You just call the parse method on a DateFormat instance.

 

One important point is that you need to specify the length of the DateFormat, and it has to match the string you are parsing.  This can be difficult to know in advance.  You can wrap the parse attempt in a try/catch to trap the ParseException and change the length of the DateFormat until a parse succeeds.

 

To parse a date:

//Get a Locale
Locale chinese = new Locale("zh","CN");
//Get an instance of DateFormat.
DateFormat df = DateFormat.getDateInstance(DateFormat.FULL, chinese);
System.out.println(df.format(new Date()));
//Parse the date
//We need to handle the ParseException the parse method can throw
//The length specified in the DateFormat.getDateInstance method has to be the
//same as the string you are trying to parse, or you will get a ParseException in many cases.
try {
    //parse(String str) returns a Date
    Date date = df.parse("2008年11月30日 星期日");
    System.out.println(date.toString());
} catch (ParseException e) {
    e.printStackTrace();
}


 

The output:

2008年11月30日 星期日
Sun Nov 30 00:00:00 EST 2008
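
The "try each length until one works" idea described above could be sketched roughly like this. It uses only the four predefined styles, and the class and method names (FlexibleDateParser, parseAnyStyle) are hypothetical.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.util.Date;
import java.util.Locale;

public class FlexibleDateParser {

    private static final int[] STYLES = {
        DateFormat.FULL, DateFormat.LONG, DateFormat.MEDIUM, DateFormat.SHORT
    };

    // Tries each predefined style in turn; returns null when none of them matches.
    public static Date parseAnyStyle(String text, Locale locale) {
        for (int style : STYLES) {
            DateFormat df = DateFormat.getDateInstance(style, locale);
            try {
                return df.parse(text);
            } catch (ParseException e) {
                // This style did not match; fall through to the next one.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Locale chinese = Locale.CHINA;
        // Format today's date in one style, then parse it without saying which style it was.
        String text = DateFormat.getDateInstance(DateFormat.MEDIUM, chinese).format(new Date());
        System.out.println(text + " -> " + parseAnyStyle(text, chinese));
    }
}
```

Returning null keeps the sketch short. In real code you might prefer to rethrow the last ParseException so the caller knows why parsing failed.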

Numerical Systems

Formatting numbers is straightforward in Java, yet many developers overlook this aspect of localization.  Numeric systems vary around the world, and even more importantly the delimiters and separators vary.  ICU4J provides even more capabilities.

Format and parse an integer

Problem:

You want to format and parse integer values.

Solution:

The Java NumberFormat class makes formatting and parsing a number in a localized manner easy and efficient.  Simply specify the locale when retrieving the NumberFormat instance.

 

To format and parse a number in Hindi:

//Get a locale.  Hindi in India here.
ULocale hindi = new ULocale("hi_IN");
//Get an instance of an integer format.  Because a ULocale is passed in,
//this is ICU4J's NumberFormat (com.ibm.icu.text.NumberFormat).
NumberFormat nf = NumberFormat.getIntegerInstance(hindi);
//format and output.  The value is rounded to the nearest integer.
String f = nf.format(1234234234.556);
System.out.println(f);

//Parse the value.  We must handle the potential ParseException
try {
    System.out.println(nf.parse("१,२३,४२,३४,२३५"));
} catch (ParseException e) {
    e.printStackTrace();
}

 


The output:

१,२३,४२,३४,२३५
1234234235

Format and parse a decimal

Problem:

You want to format and parse decimal values.

Solution:

Localized formatting of decimal values is easy in Java.  It can be accomplished by passing the locale into the getInstance method on the NumberFormat class.  Then simply call format and parse.

 

To format and parse a decimal value for Arabic:

//Get a locale.  We use Arabic without a country here.
ULocale arabic = new ULocale("ar");
//Get a decimal formatter instance
NumberFormat nf = NumberFormat.getInstance(arabic);
//format and output
String f = nf.format(123456.789);
System.out.println(f);
//Parse the value.  We must handle the potential ParseException
try {
   System.out.println(nf.parse("١٢٣٬٤٥٦٫٧٨٩"));
} catch (ParseException e) {
    e.printStackTrace();
}

 


The output :

١٢٣٬٤٥٦٫٧٨٩
123456.789
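
One thing to watch when parsing: parse(String) is allowed to stop at the first character it cannot read and ignore the rest of the string. Below is a minimal sketch of a stricter parse using ParsePosition. It uses the core java.text.NumberFormat with Locale.GERMANY rather than the ICU4J classes above, and the names (StrictNumberParser, parseStrict) are my own.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.text.ParsePosition;
import java.util.Locale;

public class StrictNumberParser {

    // Parses the whole string or fails. NumberFormat.parse(String) alone may stop
    // early and return only the leading number.
    public static Number parseStrict(String text, Locale locale) throws ParseException {
        NumberFormat nf = NumberFormat.getInstance(locale);
        ParsePosition pos = new ParsePosition(0);
        Number result = nf.parse(text, pos);
        if (result == null || pos.getIndex() != text.length()) {
            int errorAt = pos.getErrorIndex() >= 0 ? pos.getErrorIndex() : pos.getIndex();
            throw new ParseException("Unparseable number: " + text, errorAt);
        }
        return result;
    }

    public static void main(String[] args) throws ParseException {
        // German uses '.' for grouping and ',' for the decimal separator
        System.out.println(parseStrict("123.456,789", Locale.GERMANY));
        try {
            parseStrict("123abc", Locale.GERMANY);
        } catch (ParseException e) {
            System.out.println("Rejected at index " + e.getErrorOffset());
        }
    }
}
```

Whether you want this strictness depends on where the input comes from. For user-typed values it usually catches more mistakes than it causes.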

Format and parse a percent

Problem:

You want to format and parse percent values.

Solution:

The Java NumberFormat class makes it easy to create a localized format for a percent value.  Simply pass the locale when retrieving your instance.  The format will see 1 as 100% and .1 as 10%.

 

To format and parse a percent value for Brazilian Portuguese:

//Get a locale.  In this case Portuguese in Brazil
ULocale pt = new ULocale("pt_BR");
//Get an instance of a percent formatter
NumberFormat nf = NumberFormat.getPercentInstance(pt);
//format and output.  1 == 100% and .5 = 50%
String f = nf.format(.78);
System.out.println(f);
//Parse the value.  We must handle the potential ParseException
try {
    System.out.println(nf.parse("78%"));
} catch (ParseException e) {
    e.printStackTrace();
}

 


The output:

78%
0.78

Format and parse a currency

Problem:

You want to format and parse localized currency values.

Solution:

Currency formatting at its most basic is straightforward in Java programming.  Simply retrieve a currency instance of the NumberFormat class and format your number.

 

To format a currency for Japanese for Japan:

//Get a locale.
Locale ja = new Locale("ja","JP");
//Get a currency instance
NumberFormat nf = NumberFormat.getCurrencyInstance(ja);
//format and output.  Notice the output is rounded to a whole number.  The number of
//fraction digits is governed by the currency, and the yen has none.
String f = nf.format(123456.789);
System.out.println(f);

//Parse the value.  We must handle the potential ParseException
try {
    System.out.println(nf.parse("¥123,457"));
} catch (ParseException e) {
    e.printStackTrace();
}

 


The output:

¥123,457
123457

 

Currencies also have a rounding increment that is important.  A US dollar has cents, so a dollar amount should be rounded to two decimal places.  A Japanese yen amount should be rounded to the nearest integer.  If you specify a currency other than the default for the locale, the rounding will reflect the locale and not the currency.

 

Format Ordinal Numbers

 Problem:

You want to format an ordinal number like "1st" or "2nd".

Solution:

Java programmers can handle ordinal number formatting by leveraging ICU4J's RuleBasedNumberFormat class.  

 

An ordinal number is a number like "1st" "2nd" etc.

 

To format an ordinal number:

//Get a RuleBasedNumberFormat appropriate for English ordinal format

RuleBasedNumberFormat rbnf = new RuleBasedNumberFormat(ULocale.ENGLISH,RuleBasedNumberFormat.ORDINAL);

//Format the number

System.out.println(rbnf.format(21));

 


The output:

21st

 

Parse a spelled-out number

Problem:

You want to convert a spelled out number to a Number.

Solution:

Parsing a formatted number in Java takes a new twist when the number is fully spelled out.   This can be easily accomplished thanks to the ICU4J library from IBM.  

 

To parse a spelled out number:

//Get a RuleBasedNumberFormat appropriate for English spellout
RuleBasedNumberFormat rbnf = new RuleBasedNumberFormat(ULocale.ENGLISH,RuleBasedNumberFormat.SPELLOUT);
//The String to parse
String number = "three hundred and forty-five";
//We need to handle the potential ParseException
try {
    //Parse the number
    System.out.println(rbnf.parse(number));
} catch (ParseException e) {
    e.printStackTrace();
}

 

The output:
345

 

The same thing for a French Locale:

RuleBasedNumberFormat rbnf = new RuleBasedNumberFormat(ULocale.FRENCH,RuleBasedNumberFormat.SPELLOUT);
String number = "trois cents quarante-cinq";
try {
    System.out.println(rbnf.parse(number));
} catch (ParseException e) {
    e.printStackTrace();
}

 

Spell out a numeric value

Problem:

You want to spell out a localized number like "thirty-six."

Solution:

ICU4J provides some number formatting capabilities that core Java does not.  One of those features is the ability to spell out a numeric value. 

 

This uses the RuleBasedNumberFormat class to convert a numeric value to a spelled out value.  In other words "75" can be converted to "seventy-five."  This can also be combined with MessageFormat.

 

To spell out a numeric value:

//Get a RuleBasedNumberFormat appropriate for French spellout

RuleBasedNumberFormat rbnf = new RuleBasedNumberFormat(ULocale.FRENCH,RuleBasedNumberFormat.SPELLOUT);

//Call format passing in the numeric value to be formatted

System.out.println(rbnf.format(345));

 

 


 The output:

trois cents quarante-cinq

Use a non-default currency

 

Problem:

You want to use a currency other than the default for the region.

Solution:

We can specify a currency separately from a region.  This gives us a numeric format that is appropriate to the locale, but uses the currency code of the currency we specify.

//Get a locale.
Locale italian = new Locale("it","IT");
//Get the Currency for the ISO 4217 code "JPY"
Currency c = Currency.getInstance("JPY");
//Get a currency formatter for the locale and switch its currency
NumberFormat nf = NumberFormat.getCurrencyInstance(italian);
nf.setCurrency(c);
//format and output.  Notice the output keeps two decimal places even though the yen has none.
//Rounding is now governed by the locale.  Also notice we now use the ISO code instead of a symbol
String f = nf.format(123456.789);
System.out.println(f);

//Parse the value.  We must handle the potential ParseException
//Notice that we can not handle the '¥' mark in the parse
try {
    System.out.println(nf.parse("JPY 123.456,79"));
} catch (ParseException e) {
    e.printStackTrace();
}


 

The output:

JPY 123.456,79
123456.79
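
If you would rather have the rounding follow the currency than the locale, one option is to copy the currency's default fraction digits onto the formatter after calling setCurrency. This is a sketch of that idea using the core Java classes; the class and method names (CurrencyDigits, currencyFormat) are invented for the example.

```java
import java.text.NumberFormat;
import java.util.Currency;
import java.util.Locale;

public class CurrencyDigits {

    // Builds a currency formatter for the locale, then aligns its fraction
    // digits with the requested currency instead of the locale's own currency.
    public static NumberFormat currencyFormat(Locale locale, String isoCode) {
        Currency currency = Currency.getInstance(isoCode);
        NumberFormat nf = NumberFormat.getCurrencyInstance(locale);
        nf.setCurrency(currency);
        int digits = currency.getDefaultFractionDigits();
        // -1 is returned for pseudo-currencies, in which case we leave the format alone
        if (digits >= 0) {
            nf.setMinimumFractionDigits(digits);
            nf.setMaximumFractionDigits(digits);
        }
        return nf;
    }

    public static void main(String[] args) {
        // Yen formatted with Italian separators and no decimals
        System.out.println(currencyFormat(Locale.ITALY, "JPY").format(123456.789));
        // Dollars formatted with Italian separators and two decimals
        System.out.println(currencyFormat(Locale.ITALY, "USD").format(123456.789));
    }
}
```

The position of the currency code or symbol in the output depends on the locale data in your JVM.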

Misc

This section contains hints and ideas that don't fit anywhere else.

Viewing Unicode characters in the Eclipse console.

Problem:

You want to display Japanese, Chinese, or other Unicode characters in the Eclipse console.

Solution:

By default the Eclipse console does not handle Unicode.  All Unicode characters in the Eclipse console are replaced by question marks. Luckily it is a simple matter to get Unicode characters to display. You will need to make two simple modifications to the run dialog.

 

  1. Open the Run Dialog.  
  2. Click on the second tab, "Arguments"
  3. Enter "-Dfile.encoding=UTF-8" in the second text area, titled "VM arguments".
  4. Click on the "Common" tab.
  5. Click on "Other" under "Console Encoding"
  6. Select "UTF-8" from the drop down.
  7. Apply and run.
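
If you cannot change the VM arguments, a possible alternative to step 3 is to wrap System.out in a PrintStream that always writes UTF-8. This is only a sketch, and the console encoding from steps 4 to 6 still has to be UTF-8 for the characters to display. The class and method names (Utf8Console, utf8) are hypothetical.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Console {

    // Wraps any stream in a PrintStream that always encodes text as UTF-8,
    // regardless of the JVM's default encoding.
    public static PrintStream utf8(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is required of every Java implementation
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8(System.out);
        // The escapes spell the Japanese word for "Japanese language"
        out.println("\u65E5\u672C\u8A9E");
    }
}
```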

Resource Bundles

One of the first steps in internationalizing an application is externalizing its text.  This simplifies the translation process and makes localization far easier.
The standard way of handling resources in Java has been the use of properties files.  A properties file is simply a text file containing key-value pairs.  It is named with the syntax name_language_country_variant.properties.  So the Japanese version of a bundle called errors would be named errors_ja.properties.
An example file might look like:


firstName=First Name
lastName=Last Name

 
 
There are a few limitations of properties files.  The biggest problem is that they are traditionally read as ISO 8859-1 rather than a Unicode encoding.  This means that many characters, such as Japanese or Chinese, need to be written as Unicode escapes in the properties files.  This makes them unreadable to the common user.  The native2ascii tool packaged with Java can convert your Unicode text to escaped ASCII for you.
 
To load a resource bundle and retrieve a key from it:


ResourceBundle rb = ResourceBundle.getBundle("com.cookbook.bundles.foo");
System.out.println(rb.getString("firstName"));

 
 
To load a resource bundle for a particular locale if available:


Locale en = Locale.ENGLISH;
ResourceBundle rb = ResourceBundle.getBundle("com.cookbook.bundles.foo", en);
System.out.println(rb.getString("firstName"));
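
One way around the escaping limitation mentioned above is to build the bundle from a Reader, since the Reader decides the encoding. The sketch below uses a StringReader so it runs on its own; with a real file you would open the Reader with the charset you want. The names ReaderBundle and fromReader are made up for the example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class ReaderBundle {

    // Builds a bundle from any Reader and closes the Reader afterwards.
    public static ResourceBundle fromReader(Reader reader) {
        try (Reader r = reader) {
            return new PropertyResourceBundle(r);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // firstName holds a raw Japanese character, lastName holds an escaped one.
        // Both forms are accepted when the bundle is read from a Reader.
        String content = "firstName=\u540D\nlastName=\\u59D3\n";
        ResourceBundle rb = fromReader(new StringReader(content));
        System.out.println(rb.getString("firstName"));
        System.out.println(rb.getString("lastName"));
    }
}
```

Note that this bypasses the getBundle lookup, so you lose the automatic locale fallback and have to pick the file yourself.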

 
 

Handle plural text in a localizable manner.

Problem:

You want to use the correct forms for plurals in a sentence including dynamic values.

Solution:

Plurals can be very difficult to deal with.  It is not always as easy as adding an s to the end of a word.  Consider the sentence, "Enter your child(rens)'s names(s)."  That looks pretty ugly, and not human at all.

ChoiceFormat allows us to specify blocks of text that appear depending upon a numeric value passed in.  For example:

//Create a collection of options separated by pipes, and preceded by numeric markers.
String msg = "0#no dogs| 1#a dog| 2#a couple dogs| 3#several dogs| 3<many dogs";
ChoiceFormat fmt = new ChoiceFormat(msg);
//Pass a choice into the format and that option will be selected.
System.out.println("I have " + fmt.format(2) + ".");

Now obviously our hard coded sentence in the previous example is not a good thing for localization.  We can combine a MessageFormat and ChoiceFormat to solve that issue and give us an easily localizable solution.

//Create an array of formats.  In this case a single ChoiceFormat
Format[] choices = {new ChoiceFormat("0#no dogs| 1#a dog| 2#a couple dogs| 3#several dogs| 3<many dogs")};
//Create a message format with the type of choice
String msg = "I have {0,choice}.";
MessageFormat mft = new MessageFormat(msg);
//call the setFormats method on the MessageFormat and pass the ChoiceFormat array in
mft.setFormats(choices);
for(int x = 0; x < 5; x++){
    Object[] args = {x};
    System.out.println(mft.format(args));
}
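
The choice rules can also be written inside the message pattern itself, which keeps the whole sentence in one string that a translator can edit. Here is a minimal sketch of that variant; the class and method names (DogCount, describe) are illustrative.

```java
import java.text.MessageFormat;
import java.util.Locale;

public class DogCount {

    // The whole sentence, choice rules included, is one string that could live in a bundle.
    static final String PATTERN =
        "I have {0,choice,0#no dogs|1#a dog|2#a couple dogs|3#several dogs|3<many dogs}.";

    // Formats the sentence for the given number of dogs.
    public static String describe(int dogs) {
        return new MessageFormat(PATTERN, Locale.ENGLISH).format(new Object[] {dogs});
    }

    public static void main(String[] args) {
        for (int x = 0; x < 5; x++) {
            System.out.println(describe(x));
        }
    }
}
```

With this layout each language can define its own thresholds, since the rules travel with the translated text.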

How to combine dynamic and static data

Problem:

You want to combine dynamic values with your static resource files.

Solution:

There are many situations where part of a sentence comes from a dynamic source, or is provided by the user.  Obviously translating a sentence for every possible scenario is not an option.  Thankfully Java provides a solution for this situation.  It is called MessageFormat.

A message format is a way of replacing variables in a sentence in a locale sensitive manner.

A simple message format:

String msg = "Hello, {0}. My name is {1}.";
Object[] args = {"John", "Tom"};
System.out.println(MessageFormat.format(msg, args));

You can also specify formatting information with the variables, and they will be formatted in a locale appropriate manner.

//In the message we specify a format style and type.
//Type can be specified as number,time, date, or choice.
//Style can be short, medium, long, or full for date and time types.  It can be
//integer, currency, or percent for number types.
String msg = "Today is {0,date,full}. I have {1,number,currency} in my wallet";
//The arguments to replace the values are specified as elements in an object array
Object[] args = {new Date(), 1876};
//Create a MessageFormat instance
MessageFormat mf = new MessageFormat(msg,new Locale("fr","FR"));
//call format passing in the argument array
System.out.println(mf.format(args));
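
In practice the pattern would come out of a resource bundle rather than a string literal. The following sketch shows the two pieces working together. To keep it runnable on its own, the bundle is built from an in-memory string; the key "inbox" and the names BundleMessages, message and sampleBundle are assumptions for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.text.MessageFormat;
import java.util.Locale;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class BundleMessages {

    // Looks up a pattern by key and fills in the arguments using the locale's formats.
    public static String message(ResourceBundle rb, Locale locale, String key, Object... args) {
        return new MessageFormat(rb.getString(key), locale).format(args);
    }

    // Stand-in for a real properties file so the sketch runs on its own.
    public static ResourceBundle sampleBundle() {
        String content = "inbox=Hello, {0}. You have {1,number,integer} new messages.\n";
        try {
            return new PropertyResourceBundle(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Same pattern and arguments, different number formatting per locale
        System.out.println(message(sampleBundle(), Locale.US, "inbox", "John", 1234));
        System.out.println(message(sampleBundle(), Locale.GERMANY, "inbox", "John", 1234));
    }
}
```

Passing the locale to the MessageFormat constructor matters. Without it the default locale of the JVM is used for the number and date arguments.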

Load properties from outside the classpath

 

 Problem:

You want to locate your resource bundles outside your application classpath.

Solution:

Create an InputStream and use it to create the PropertyResourceBundle directly using the constructor.

To loop through a collection of locales and look for a matching properties file outside the classpath then extract a resource from it:

try {
   //Create an array of locale ids.  This should be created programmatically
    String[] localeChain = {"fr_CA","fr","en_CA","en"};
    //Get the root directory you will locate the resources in
    File directory = new File("C:\\temp\\");
    //Get a List of files contained in the directory.  We could use a FileFilter to
    //reduce this list
    List<String> files = Arrays.asList(directory.list());
    //The InputStream will be used to load the properties file.  Here we initialize as null;
    InputStream is = null;
    //Loop through the locales in the chain
    for(int x = 0; x < localeChain.length; x++) {
        //Check if a properties file exists for that locale.
        //Naturally in real usage the file name would be specified programmatically.
        //If a match is found we create the InputStream and break out of the loop
        String fileName = "foo_"+localeChain[x]+".prop";
        if(files.contains(fileName)){
            is = new FileInputStream(new File(directory,fileName));
            break;
        }
    }
    //If the InputStream is still null look for a root file
    if(is == null && files.contains("foo.prop") ) {
        is = new FileInputStream(new File(directory,"foo.prop"));
    }
    //If a file has been found create the ResourceBundle and output some text.
    //The bundle reads everything up front, so the stream can be closed afterwards.
    if(is != null) {
        try {
            PropertyResourceBundle rb2 = new PropertyResourceBundle(is);
            System.out.println(rb2.getString("foo"));
        } finally {
            is.close();
        }
    }
}
catch (IOException e) {
    e.printStackTrace();
}
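
Another approach, if you are free to name the files with the usual .properties extension, is to point a URLClassLoader at the directory and let ResourceBundle.getBundle do the locale fallback for you. This is a sketch under that assumption; the class and method names (ExternalBundles, load) are hypothetical, and a temporary directory stands in for a real resource folder.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.ResourceBundle;

public class ExternalBundles {

    // Loads baseName*.properties from a directory that is not on the classpath.
    public static ResourceBundle load(Path directory, String baseName, Locale locale) {
        try {
            URL[] urls = {directory.toUri().toURL()};
            // The bundle is read during getBundle, so the loader can be closed right away.
            try (URLClassLoader loader = new URLClassLoader(urls)) {
                return ResourceBundle.getBundle(baseName, locale, loader);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Create a throwaway directory with a French bundle and a root bundle
        Path dir = Files.createTempDirectory("bundles");
        Files.write(dir.resolve("foo_fr.properties"),
                "foo=bonjour\n".getBytes(StandardCharsets.ISO_8859_1));
        Files.write(dir.resolve("foo.properties"),
                "foo=hello\n".getBytes(StandardCharsets.ISO_8859_1));
        // There is no foo_fr_CA file, so the lookup falls back to foo_fr
        System.out.println(load(dir, "foo", Locale.CANADA_FRENCH).getString("foo"));
    }
}
```

The trade-off is that getBundle also tries the JVM's default locale before the root file, which the hand-written loop above does not.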

 

Spell out the numeric value in a concatenated string

 Problem:

You want to spell out a dynamic numeric value being combined with a static resource.

Solution:

Java MessageFormat provides a number of options for formatting the values passed in.  ICU4J provides even more options.

One of the more interesting formatting options is the spellout format.  Spell out will actually write out a numeric value, such as thirty-five vs. 35.

ICU4J's MessageFormat is used much like the core Java class.

To format a pattern using a spellout pattern:

//Create an Array of Objects
Object[] tmp = {345};
//Create our pattern.  We specify that we will replace using the first element in the array
//and use a spellout pattern
String msg = "I have {0,spellout} coffee cups in my car.";
//Retrieve the MessageFormat
MessageFormat mf = new MessageFormat(msg,ULocale.ENGLISH);
//Call format passing in the array
System.out.println(mf.format(tmp));
//Now let's do it in Japanese
String jMsg = "私の車の中に{0,spellout}個のコーヒーカップがあります。";
MessageFormat mf2 = new MessageFormat(jMsg,ULocale.JAPANESE);
System.out.println(mf2.format(tmp));


The output is:

I have three hundred and forty-five coffee cups in my car.

私の車の中に三百四十五個のコーヒーカップがあります。

 

 

Unicode, Transliteration, and Character Sets

One of the most important tasks a Java internationalization programmer has is the handling of different writing systems.  Character set conversion used to be a huge task for all developers working with multiple locales.  While the widespread acceptance of Unicode has made that job much easier than before, there are still many situations where character set conversion becomes important.

 

Transliteration is another area that the localization programmer will find useful.  Transliteration should not be confused with translation.  Translation is the conversion of text from one language to another.  Transliteration is the conversion of text from one writing system (Alphabet) to another.  This is useful for handling names and other non-translatable text.

Convert text from one script to another

Problem:

You want to convert the script text is written in.

Solution:

The Transliterator class from ICU4J allows for easy conversion of text from one writing system to another.  To transliterate text you simply get an instance of the Transliterator class using an id made up of the writing system you are converting from and the one you want to convert into, and then call the transliterate method on the object.

 

It is important to note that not all writing systems contain all sounds, nor do they handle sounds in the same way.  This means that round trip transliteration is often faulty.

 

To transliterate English text from Latin script to Hangul:

//Get a Transliterator instance for converting Latin script to Hangul (Korean) script
Transliterator trans = Transliterator.getInstance("Latin-Hangul");
//The text to transliterate
String txt = "Transliteration is very cool.";
//Output the example text
System.out.println(txt);
//Transliterate the text
String korean = trans.transliterate(txt);
//Output the Hangul text
System.out.println("To Hangul: " + korean);
//Get an instance of a transliterator going in the reverse.
//This is the same as calling Transliterator.getInstance("Hangul-Latin");
trans = Transliterator.getInstance("Latin-Hangul",Transliterator.REVERSE);
//Output the transliterated English value
//Note the English doesn't match the original value exactly. 
//This is due to the different sounds available in each script
System.out.println("Back to English: " + trans.transliterate(korean));

 


The output:

Transliteration is very cool.
To Hangul: 트란스리테라티온 잇 베류 초올.
Back to English: teulanseulitelation is belyu chool.

Detect the Charset of a URL

 Problem:

You want to know what character set a web page is encoded in.

Solution:

The Java programmer can use icu4j's CharsetDetector class to create a best guess for the charset of a specified input stream.  Unfortunately this is very much a guess, and should not be overly relied upon.

 

To get the Charset of a web page:

try{
    //The url to check
    URL url = new URL("http://ru.yahoo.com");
    //get an inputstream from the url
    InputStream is = url.openStream();
    //get a byte array from the input stream and close the inputstream
    byte[] bytes = new byte[2000];
    is.read(bytes);
    is.close();
    //get a CharsetDetector
    CharsetDetector cd = new CharsetDetector();
    //set the text to detect
    cd.setText(bytes);
    //detect
    CharsetMatch match = cd.detect();
    //Get the name of the most likely match
    System.out.println(match.getName());
}catch (MalformedURLException e){
    e.printStackTrace();
}catch (IOException e){
    e.printStackTrace();
}

 


The output:

UTF-8
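
Since the detector only gives a best guess, it can be worth double checking the result. This is not part of ICU4J; it is a plain JDK sketch that asks a strict decoder whether the bytes are valid in the guessed charset. The class and method names (CharsetCheck, decodesCleanly) are my own. Be aware that a buffer cut off in the middle of a multi-byte character, like the 2000 byte sample above could be, will also fail this check.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class CharsetCheck {

    // True when the bytes decode without errors in the given charset, false otherwise.
    public static boolean decodesCleanly(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Two Japanese characters encoded as UTF-8
        byte[] utf8 = "\u65E5\u672C".getBytes(StandardCharsets.UTF_8);
        // The same two characters encoded as Shift_JIS, which is not valid UTF-8
        byte[] notUtf8 = {(byte) 0x93, (byte) 0xFA, (byte) 0x96, (byte) 0x7B};
        System.out.println(decodesCleanly(utf8, StandardCharsets.UTF_8));
        System.out.println(decodesCleanly(notUtf8, StandardCharsets.UTF_8));
    }
}
```

A clean decode does not prove the guess is right, since many byte sequences are valid in several charsets, but a failed decode is a strong hint that it is wrong.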

 

Get Transliterators available source ids

Problem:

You want to retrieve all available source ids for a Transliterator.

Solution:

A Transliterator is created using a combination of a source and a target id joined by a hyphen. 

To get a list of all sources:

//Get an Enumeration of available source ids
Enumeration<String> ids = Transliterator.getAvailableSources();
//Loop through available ids and output them to the console
while(ids.hasMoreElements()){
    String id = ids.nextElement();
    System.out.println(id);
}

 


The output:

Arabic
Hangul
Tamil
Thaana
Gujarati
Simplified
Han
Telugu
Syriac
Devanagari
Name
Publishing
Digit
Latin
Kannada
NumericPinyin
Jamo
Any
Fullwidth
Cyrillic
Armenian
Georgian
Katakana
Hex
Malayalam
Oriya
Pinyin
Tone
Thai
Greek
Hiragana
Halfwidth
Hebrew
Accents
Traditional
Bengali
Gurmukhi
 

 

Get all available transliterator ids

Problem:

You want to get all available transliterator ids.

Solution:

You obtain a Transliterator by specifying an id.  The id is a String that contains two script names separated by a hyphen, for example: "Latin-Hangul."

 

To obtain a list of all ids:

//Get an Enumeration of available ids
Enumeration<String> ids = Transliterator.getAvailableIDs();
//Loop through available ids and output them to the console
while(ids.hasMoreElements()){
    String id = ids.nextElement();
    System.out.println(id);
}

 


This outputs:

Accents-Any
Any-Accents
Any-Publishing
Arabic-Latin
Armenian-Latin
Bengali-Devanagari
Bengali-Gujarati
Bengali-Gurmukhi
Bengali-Kannada
Bengali-Latin
Bengali-Malayalam
Bengali-Oriya
Bengali-Tamil
Bengali-Telugu
Cyrillic-Latin
Devanagari-Bengali
Devanagari-Gujarati
Devanagari-Gurmukhi
Devanagari-Kannada
Devanagari-Latin
Devanagari-Malayalam
Devanagari-Oriya
Devanagari-Tamil
Devanagari-Telugu
Digit-Tone
Fullwidth-Halfwidth
Georgian-Latin
Greek-Latin
Greek-Latin/UNGEGN
Gujarati-Bengali
Gujarati-Devanagari
Gujarati-Gurmukhi
Gujarati-Kannada
Gujarati-Latin
Gujarati-Malayalam
Gujarati-Oriya
Gujarati-Tamil
Gujarati-Telugu
Gurmukhi-Bengali
Gurmukhi-Devanagari
Gurmukhi-Gujarati
Gurmukhi-Kannada
Gurmukhi-Latin
Gurmukhi-Malayalam
Gurmukhi-Oriya
Gurmukhi-Tamil
Gurmukhi-Telugu
Halfwidth-Fullwidth
Han-Latin
Hangul-Latin
Hebrew-Latin
Hiragana-Katakana
Hiragana-Latin
Jamo-Latin
Kannada-Bengali
Kannada-Devanagari
Kannada-Gujarati
Kannada-Gurmukhi
Kannada-Latin
Kannada-Malayalam
Kannada-Oriya
Kannada-Tamil
Kannada-Telugu
Katakana-Hiragana
Katakana-Latin
Latin-Arabic
Latin-Armenian
Latin-Bengali
Latin-Cyrillic
Latin-Devanagari
Latin-Georgian
Latin-Greek
Latin-Greek/UNGEGN
Latin-Gujarati
Latin-Gurmukhi
Latin-Han
Latin-Hangul
Latin-Hebrew
Latin-Hiragana
Latin-Jamo
Latin-Kannada
Latin-Katakana
Latin-Malayalam
Latin-NumericPinyin
Latin-Oriya
Latin-Syriac
Latin-Tamil
Latin-Telugu
Latin-Thaana
Latin-Thai
Malayalam-Bengali
Malayalam-Devanagari
Malayalam-Gujarati
Malayalam-Gurmukhi
Malayalam-Kannada
Malayalam-Latin
Malayalam-Oriya
Malayalam-Tamil
Malayalam-Telugu
NumericPinyin-Latin
NumericPinyin-Pinyin
Oriya-Bengali
Oriya-Devanagari
Oriya-Gujarati
Oriya-Gurmukhi
Oriya-Kannada
Oriya-Latin
Oriya-Malayalam
Oriya-Tamil
Oriya-Telugu
Pinyin-NumericPinyin
Publishing-Any
Simplified-Traditional
Syriac-Latin
Tamil-Bengali
Tamil-Devanagari
Tamil-Gujarati
Tamil-Gurmukhi
Tamil-Kannada
Tamil-Latin
Tamil-Malayalam
Tamil-Oriya
Tamil-Telugu
Telugu-Bengali
Telugu-Devanagari
Telugu-Gujarati
Telugu-Gurmukhi
Telugu-Kannada
Telugu-Latin
Telugu-Malayalam
Telugu-Oriya
Telugu-Tamil
Thaana-Latin
Thai-Latin
Tone-Digit
Traditional-Simplified
Any-Null
Any-Remove
Any-Hex/Unicode
Any-Hex/Java
Any-Hex/C
Any-Hex/XML
Any-Hex/XML10
Any-Hex/Perl
Any-Hex
Hex-Any/Unicode
Hex-Any/Java
Hex-Any/C
Hex-Any/XML
Hex-Any/XML10
Hex-Any/Perl
Hex-Any
Any-Lower
Any-Upper
Any-Title
Any-Name
Name-Any
Any-NFC
Any-NFD
Any-NFKC
Any-NFKD
Any-Latin
Any-Telugu
Any-Malayalam
Any-Oriya
Any-Gurmukhi
Any-Gujarati
Any-Bengali
Any-Devanagari
Any-Kannada
Any-Tamil
Any-Han
Any-Katakana
Any-Hiragana
Any-Armenian
Any-Cyrillic
Any-Hangul
Any-Arabic
Any-Greek
Any-Greek/UNGEGN
Any-Hebrew
Any-Thai
Any-Syriac
Any-Thaana
Any-Georgian

 

Get available target ids for a Transliterator source id

Problem:

You have a source id but you want to retrieve all available target ids for the source id.

Solution:

A Transliterator is retrieved using an id that is a combination of a source and target id.

 

To retrieve all possible target IDs for the source "Latin":
//Get an Enumeration of available target ids for the source "Latin"
Enumeration<String> ids = Transliterator.getAvailableTargets("Latin");
//Loop through available ids and output them to the console
while(ids.hasMoreElements()){
    String id = ids.nextElement();
    System.out.println(id);
}

 


The output:

Gujarati
Jamo
Han
Katakana
Hiragana
Armenian
Cyrillic
NumericPinyin
Gurmukhi
Bengali
Hangul
Arabic
Greek
Devanagari
Hebrew
Thai
Oriya
Tamil
Syriac
Malayalam
Kannada
Thaana
Telugu
Georgian
 

Read a Unicode file

Problem:

You want to read a Unicode encoded file into memory.

Solution:

To read a file with a charset other than the system default you should specify the charset in the reader.  You can specify either a Charset or the String id of the Charset.

To read a Unicode file containing Japanese characters:

//Surround with try catch to handle potential exception
try{
    //Get an InputStream
    FileInputStream fis = new FileInputStream("C:\\files\\test.txt");
    //Get a reader specifying the charset
    InputStreamReader isr = new InputStreamReader(fis,"UTF-8");
    //wrap with a buffered reader for performance
    BufferedReader br = new BufferedReader(isr);
    //Read it line by line and output
    String txt;
    while((txt = br.readLine()) != null){
        System.out.println(txt);
    }
    //close the reader, which also closes the underlying streams
    br.close();
}catch (FileNotFoundException e){
    e.printStackTrace();
}catch (IOException e){
    e.printStackTrace();
}

 


The output:

これはテストです。
高松
日本
米国
英国
世界

Write a Shift_JIS Japanese file

Problem:

You want to write a file to disk in an encoding other than the default.

Solution:

The convenience FileWriter class writes files in the default character encoding of the JVM.  If you want to specify an encoding you should create a FileOutputStream and pass it to an OutputStreamWriter.  The OutputStreamWriter class allows you to specify an encoding Charset.

To write a file to disk as Shift_JIS:

//Handle potential exceptions
try{
   //Our text to write out to the file. In this case garbage Japanese
   String example = "これはテストです。高松日本米国英国世界";
   //Create an output stream
   FileOutputStream fos = new FileOutputStream("C:\\files\\testOut.html");
   //Create a writer specifying our output stream and character set.
   OutputStreamWriter osw = new OutputStreamWriter(fos,"Shift_JIS");
   //Let's buffer it for performance
   BufferedWriter bw = new BufferedWriter(osw);
   //write the file
   bw.write(example);
   //close the writer
   bw.close();
} catch (FileNotFoundException e){
   e.printStackTrace();
} catch (IOException e){
   e.printStackTrace();
}


To test the output, open the file in your browser and change the encoding to Shift_JIS.
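
On Java 7 and later the same write, and the read from the previous recipe, can be done a little more safely with java.nio.file.Files and try-with-resources, which closes the file even when an exception is thrown. A minimal sketch follows; the class and method names (EncodedFiles, write, read) are invented, and a temporary file is used instead of a fixed path.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

public class EncodedFiles {

    // Writes the text to the file using the given charset.
    public static void write(Path file, String text, Charset charset) throws IOException {
        // try-with-resources flushes and closes the writer even if write() fails
        try (BufferedWriter bw = Files.newBufferedWriter(file, charset)) {
            bw.write(text);
        }
    }

    // Reads the whole file back as a String using the given charset.
    public static String read(Path file, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = Files.newBufferedReader(file, charset)) {
            int c;
            while ((c = br.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Charset sjis = Charset.forName("Shift_JIS");
        Path file = Files.createTempFile("testOut", ".html");
        // The escapes spell the Japanese sentence for "This is a test."
        write(file, "\u3053\u308C\u306F\u30C6\u30B9\u30C8\u3067\u3059\u3002", sjis);
        System.out.println(read(file, sjis));
    }
}
```

Reading the file back with the same charset is a quick way to confirm the round trip without opening a browser.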