Language tags define a standard way to express a user's preferences, but how do we extract those preferences in a cohesive fashion in ex_cldr?

What can a language tag tell us?

Consider the language tag en. We know from the previous post that the user is telling us that they prefer information presented in the English language.

It may surprise you to learn that from this simple language tag we also know a lot more! We know, by implication, at least the following:

  • The territory[1] preferred by the user
  • The preferred currency
  • The preferred calendar
  • The preferred number system (system of digits)

The user's preferred territory

Certain localisation functions depend on knowing which territory a user affiliates themselves with. One example is units of measurement: the US tends to use US customary units (close relatives of the Imperial system) whereas most of the world uses the metric system. It is therefore helpful to know the user's territory in order to present units of measure in an appropriate manner.

In ex_cldr we can identify the user’s territory with Cldr.Locale.territory_from_locale/1. For example:

iex> Cldr.Locale.territory_from_locale "en"
:US

iex> Cldr.Locale.territory_from_locale "en-AU"
:AU

iex> Cldr.Locale.territory_from_locale "en-GB"
:GB

iex> Cldr.Locale.territory_from_locale "ja"
:JP

You may be wondering how we derived that the territory for the language tag en is :US. At current count, 105 territories have English defined as an active language, so a decision has to be made.

CLDR’s policy is that when a language tag consists solely of a language identifier, the territory with the largest population of speakers of that language is defined to be the default. This means that en is considered to be the same as en-US since the US has the largest English-speaking population worldwide.

In a similar manner, pt is considered to be the same as pt-BR since Brazil has the largest population of Portuguese speakers worldwide. This is not always what developers or users expect!

As a result of this policy, there is no pt-BR or en-US locale data defined in CLDR, although language tags of this form are most definitely accepted and resolved correctly.

In some cases it is desirable to override the territory used for some kinds of localisation. In this case a regional override can be specified as part of the BCP 47 U extension of a language tag. We will explore the U extension in a later post but for now we can show some examples of its use:

iex> Cldr.Locale.territory_from_locale "en-u-rg-auzzzz"
:AU

iex> Cldr.Locale.territory_from_locale "ja-u-rg-uszzzz"
:US

The syntax is unusual in order to comply with the overall specification of language tags. However, it is straightforward to understand once you recognise that the zzzz is padding that turns the two-letter region code into a 6-character region override code.
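To make the resolution order concrete, here is a minimal sketch in plain Elixir. It is not how ex_cldr is implemented: the module name TerritorySketch, its tiny default-territory table and the simplified subtag handling are all assumptions made for illustration. It only shows the idea of "rg override first, then an explicit region, then the default territory for the language".

```elixir
defmodule TerritorySketch do
  # Toy table of default territories for a bare language subtag.
  # Illustrative only; these entries are not read from CLDR.
  @default_territory %{"en" => :US, "pt" => :BR, "ja" => :JP}

  # Assumed resolution order: rg override, then an explicit region subtag,
  # then the default territory for the language.
  def territory(tag) do
    subtags = tag |> String.downcase() |> String.split("-")
    {core, extension} = Enum.split_while(subtags, &(&1 != "u"))

    region_override(extension) ||
      explicit_region(core) ||
      Map.get(@default_territory, hd(core))
  end

  # The rg value is a 2-letter region code padded with "zzzz" to 6 characters.
  defp region_override(extension) do
    case Enum.drop_while(extension, &(&1 != "rg")) do
      ["rg", <<region::binary-size(2), "zzzz">> | _] -> to_territory(region)
      _ -> nil
    end
  end

  # Simplification: treat any 2-letter subtag after the language as the region.
  defp explicit_region([_language | rest]) do
    case Enum.find(rest, &(String.length(&1) == 2)) do
      nil -> nil
      region -> to_territory(region)
    end
  end

  defp to_territory(region), do: region |> String.upcase() |> String.to_atom()
end
```

With this sketch, TerritorySketch.territory("en-u-rg-auzzzz") resolves to :AU while TerritorySketch.territory("en") falls through to the default of :US. A real implementation validates every subtag against the specification, which this sketch does not attempt.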

Why is overriding the territory (region) preference useful? In CLDR, the region override (rg) is used to override the territory when:

  • Identifying the desired calendar to be used
  • Identifying which unit preferences to use
  • Identifying the currency formatting to be used

Overriding the territory is rarely a requirement, so it is not something to use without a specific use case.

The user's preferred currency

In the same way that we can infer the user's preferred territory from a language tag, we can infer the preferred currency, or override it if desired. Since most territories have a single authorised currency, it's a simple matter to identify the preferred territory using the process defined above and then look up the currency associated with that territory. The function Cldr.Currency.currency_from_locale/1 takes care of all of that for us. For example:

iex> Cldr.Currency.currency_from_locale "en"
:USD

iex> Cldr.Currency.currency_from_locale "en-AU"
:AUD

iex> Cldr.Currency.currency_from_locale "en-JP"
:JPY

iex> Cldr.Currency.currency_from_locale "en-IR"
:IRR

However, in some circumstances a user may prefer that a specific currency be used by default. Perhaps you have your secret stash in a Swiss bank account and prefer to have currency reporting in Swiss francs (:CHF). In that case:

iex> Cldr.Currency.currency_from_locale "en-u-cu-chf"
:CHF
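The lookup described above can be pictured as a two-step decision: use the cu value if the tag carries one, otherwise use the currency of the resolved territory. The following is a minimal sketch under that assumption. The module CurrencySketch and its four-entry table are hypothetical and are not part of ex_cldr or ex_money.

```elixir
defmodule CurrencySketch do
  # Toy excerpt mapping a territory to its currency.
  # Illustrative only; these entries are not read from CLDR.
  @territory_currency %{US: :USD, AU: :AUD, JP: :JPY, IR: :IRR}

  # `territory` is the already-resolved territory. `cu` is the value of the
  # cu key from the U extension, or nil when the tag has none.
  def currency(territory, cu \\ nil)

  # No override: fall back to the currency of the territory.
  def currency(territory, nil), do: Map.get(@territory_currency, territory)

  # An explicit cu value always wins over the territory's currency.
  def currency(_territory, cu), do: cu |> String.upcase() |> String.to_atom()
end
```

For example, CurrencySketch.currency(:AU) gives :AUD and CurrencySketch.currency(:AU, "chf") gives :CHF. The sketch does not check that the override is a known currency code, which a real implementation would need to do.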

It’s much preferred that currencies be clearly specified in all operations. However, a default currency is helpful when parsing user input. Money.parse/2 can use this default currency. For example:

iex> Money.parse("100")
#Money<:USD, 100>

iex> Money.parse("100", locale: "en-AU")
#Money<:AUD, 100>

iex> Money.parse("100", locale: "en-AU-u-cu-chf")
#Money<:CHF, 100>

In order to be especially cautious (and with money that’s never a bad thing), the option default_currency: false can be passed. For example:

iex> Money.parse("100", default_currency: false)
{:error, {Money.Invalid,
  "A currency code, symbol or description must be specified but was not found in \"100\""}}

We’ll cover localised money in a separate post, but money parsing is really quite flexible and can recognise both currency symbols and textual currencies like:

iex> Money.parse "USD 100,00", locale: "de"
#Money<:USD, 100.00>

iex> Money.parse("100", default_currency: :EUR)
#Money<:EUR, 100>

iex> Money.parse("100 eurosports", fuzzy: 0.8)
#Money<:EUR, 100>

iex> Money.parse("100 eurosports", fuzzy: 0.9)
{:error, {Money.UnknownCurrencyError, "The currency \"eurosports\" is unknown or not supported"}}

iex> Money.parse("100 afghan afghanis")
#Money<:AFN, 100>

The user's preferred calendar

Although the proleptic Gregorian calendar may be the most commonly used calendar for civil purposes, many other calendars are in use around the world for civil or religious purposes.

CLDR maintains content to allow the localisation of calendar information for a number of calendars which can be returned by Cldr.known_calendars/0. For example:

iex> Cldr.known_calendars
[:buddhist, :chinese, :coptic, :dangi, :ethiopic, :ethiopic_amete_alem,
 :gregorian, :hebrew, :indian, :islamic, :islamic_civil, :islamic_rgsa,
 :islamic_tbla, :islamic_umalqura, :japanese, :persian, :roc]

Since ex_cldr attempts to make localisation as simple as possible, localising dates and times requires a calendar implementation for any CLDR calendar that is to be localised. As of November 2020, the following calendars are supported:

Many locales allow more than one calendar type, with a preferred calendar as the default. The supported calendars for a given territory can be returned with Cldr.Calendar.Preference.preferences_for_territory/1. For example:

iex> Cldr.Calendar.Preference.preferences_for_territory(:AU)
{:ok, [:gregorian]}

iex> Cldr.Calendar.Preference.preferences_for_territory(:IR)
{:ok, [:persian, :gregorian, :islamic, :islamic_civil, :islamic_tbla]}

iex> Cldr.Calendar.Preference.preferences_for_territory(:JP)
{:ok, [:gregorian, :japanese]}

The first entry in the list is the preferred and default calendar. The other calendars in the list are also valid selections, as long as the calendar has an implementation in ex_cldr.

If the requested calendar is valid but has no implementation in ex_cldr then the first calendar in the preference list that does have an implementation is selected. This will most commonly be the Gregorian calendar.
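The fallback rule just described amounts to "take the first candidate that has an implementation". Here is a minimal sketch of that rule in plain Elixir. The module CalendarFallbackSketch and its list of implemented calendars are assumptions for illustration and do not reflect which calendars ex_cldr actually implements.

```elixir
defmodule CalendarFallbackSketch do
  # Hypothetical list of calendar types that have an implementation.
  @implemented [:gregorian, :persian]

  # Returns the requested calendar when it is valid for the territory and
  # implemented. Otherwise returns the first implemented calendar in the
  # territory's preference list, or nil when none is implemented.
  def select(requested, preferences) do
    candidates =
      if requested in preferences do
        [requested | preferences]
      else
        preferences
      end

    Enum.find(candidates, &(&1 in @implemented))
  end
end
```

Under these assumptions, requesting :japanese with the preferences [:gregorian, :japanese] falls back to :gregorian, because :japanese is valid for the territory but is not in the hypothetical implemented list.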

Now we can put all the pieces together and identify a user’s calendar preference for a given language tag.

iex> Cldr.Calendar.calendar_from_locale "en"
{:ok, Cldr.Calendar.US}

iex> Cldr.Calendar.calendar_from_locale "en-AU"
{:ok, Cldr.Calendar.AU}

iex> Cldr.Calendar.calendar_from_locale "fa-IR"
{:ok, Cldr.Calendar.Persian}

Variations on the theme of the Gregorian calendar

Although the Gregorian calendar may be in common use, the application of that calendar does vary from territory to territory. For example, in the US the first day of the week is considered to be Sunday and in the UK it is considered to be Monday. In conjunction with ex_cldr_calendars, ex_cldr will return a calendar module that most closely reflects the calendar preferences of the territory appropriate for the given language tag. Some examples:

iex> Cldr.Calendar.calendar_from_locale "en"
{:ok, Cldr.Calendar.US}

iex> Cldr.Calendar.calendar_from_locale "en-GB"
{:ok, Cldr.Calendar.GB}

You might reasonably ask why, since both territories use the Gregorian calendar, a different calendar is returned. ex_cldr_calendars builds on the Elixir Calendar behaviour to provide a very flexible calendar model that encompasses week-based calendars like the ISO Week calendar, month-based calendars, and fiscal calendars like the 445 calendar. ex_cldr_calendars calendars can have different start days of the week, different start months and different month formations of a quarter too.

In the examples above the difference is solely that the US calendar starts the week on a Sunday and the GB calendar starts the week on a Monday. Unless you are using the functionality of ex_cldr_calendars, the two calendars will work interchangeably with the standard Calendar.ISO calendar.

The user's preferred number system

Although the Hindu-Arabic number system is the most familiar to developers, there are many other number systems in use around the world. Some have the same decimal structure as the Hindu-Arabic system and others use a different, often algorithmic, number system. The number systems known to CLDR can be returned with Cldr.Number.System.known_number_systems/0. There are more than 80 number systems defined. For example:

iex> Cldr.Number.System.known_number_systems
[:adlm, :ahom, :arab, :arabext, :armn, :armnlow, :bali, :beng, :bhks, :brah,
 :cakm, :cham, :cyrl, :deva, :diak, :ethi, :fullwide, :geor, :gong, :gonm,
 :grek, :greklow, :gujr, :guru, :hanidays, :hanidec, :hans, :hansfin, :hant,
 :hantfin, :hebr, :hmng, :hmnp, :java, :jpan, :jpanfin, :jpanyear, :kali,
 :khmr,  :knda, :lana, :lanatham, :laoo, :latn, :lepc, :limb, :mathbold,
 :mathdbl, :mathmono, :mathsanb, ...]

Not all number systems are applicable to all locales. The valid number systems for a given locale can be returned by Cldr.Number.System.number_systems_for/1. For example:

iex> Cldr.Number.System.number_systems_for "en"
{:ok, %{default: :latn, native: :latn}}

iex> Cldr.Number.System.number_systems_for "fa"
{:ok, %{default: :arabext, native: :arabext}}

iex> Cldr.Number.System.number_systems_for "he"
{:ok, %{default: :latn, native: :latn, traditional: :hebr}}

From these examples we can see that number systems can be referenced directly by name, such as :latn, :arabext and :hebr, or by a number system type such as :default, :native and :traditional. In ex_cldr a number system can be referenced either by name or by number system type, except when referenced in a language tag. In that case, only the number system name is recognised.

Putting it all together as part of a language tag, we use the nu subtag of the U extension as the following examples show:

iex> Cldr.Number.System.number_system_from_locale "he"
:latn

iex> Cldr.Number.System.number_system_from_locale "he-u-nu-hebr"
:hebr

iex> Cldr.Number.System.number_system_from_locale "en"
:latn

iex> Cldr.Number.System.number_system_from_locale "en-u-nu-thai"
:thai

Although CLDR states that any nu parameter must be valid for the given locale, ex_cldr does not apply this constraint and any number system known to CLDR may be used in a language tag. Note, however, that Cldr.Number.to_string/2 will enforce that the requested number system is known to the given locale.

How is the number system applied?

A number system is typically used when formatting a number. The following examples illustrate:

iex> Cldr.Number.to_string 123456, locale: "th"
{:ok, "123,456"}

iex> Cldr.Number.to_string 123456, locale: "th-u-nu-thai"
{:ok, "๑๒๓,๔๕๖"}

# In the "ar" locale the number system is always :arab
iex> Cldr.Number.to_string 123456, locale: "ar"
{:ok, "١٢٣٬٤٥٦"}
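For decimal number systems, part of what happens here is a one-for-one substitution of digits. The following minimal sketch shows only that substitution step, assuming the number has already been formatted with Latin digits. The module DigitSketch is hypothetical and is not the ex_cldr implementation. Real formatting can also change separators and symbols, which this sketch leaves untouched.

```elixir
defmodule DigitSketch do
  # Unicode code point of the zero digit for a few decimal number systems.
  @zero %{latn: ?0, thai: 0x0E50, arab: 0x0660}

  # Replace each ASCII digit with the matching digit of the target system,
  # leaving every other character (such as separators) unchanged.
  def transliterate(formatted, system) do
    zero = Map.fetch!(@zero, system)

    for <<char::utf8 <- formatted>>, into: "" do
      if char in ?0..?9 do
        <<(zero + char - ?0)::utf8>>
      else
        <<char::utf8>>
      end
    end
  end
end
```

Calling DigitSketch.transliterate("123,456", :thai) produces the same digits as the th-u-nu-thai example above. This works because the digits of a decimal number system occupy ten consecutive code points starting at that system's zero. Algorithmic number systems such as :hebr cannot be handled this way.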

In Summary

In this post we learned that a language tag can convey a lot more information about the user's preferences than may seem obvious on the surface. Building on this basic knowledge we can construct more complex and functionally rich expressions of the user's preferences in a canonical way to drive content localisation and parsing.


Footnotes

[1] The term territory is used deliberately. Although the term country is more widely used, a country is a political entity and as such there are always shifting boundaries and territorial disputes. Since our purpose is only to express user preferences without endorsing any particular political claim, we use the term territory. CLDR uses the term region, which is considered here to be interchangeable with territory.