Stop using Locale::Maketext

The following is a translation of a talk given by Steffen Winkler at German Perl Workshop 10, Internationalisierungs-Framework auswählen, in 2010. You can find my translation of the POD on Github. (In fact you are better off reading it there, because the CSS formatting on blogs.perl.org sucks, at least for now. If you’re seeing this in a feed reader, feel free to stay.)

Aside from Steffen’s talk, there is also Nikolai Prokoschenko’s rant On the state of i18n in Perl, which mentions to some extent the fact that gettext has established workflows as well as supporting software (such as graphical tools) known in many software communities and even outside the open source world, whereas Maketext has… none of that, and is known only to Perl folk.

Be smart. Don’t use Maketext.


Selecting an Internationalization Framework

Author

Steffen Winkler perl-ws@steffen-winkler.de

Bio

I was born in 1960.

I've been programming Perl since late 2000, first privately and then professionally.

Currently I work for SIEMENS AG in Erlangen, primarily in the area of web programming.

I have been attending the German Perl Workshop since 2003.

Abstract

Why use Locale::TextDomain when so many frameworks on CPAN use Locale::Maketext?

Following my presentation on DBD::PO in Frankfurt/Main, there was a lively discussion, both in Frankfurt and at Erlangen-PM.

There are two internationalization frameworks on CPAN: Locale::TextDomain (Perl interface to Uniform Message Translation) and Locale::Maketext (framework for localization).

What are the differences?

What are the limitations?

What I want to talk about today

From source to multilingual application in two ways.

No matter which internationalization framework from CPAN you use, you have to live with limitations. A good choice greatly reduces them.

It begins with the application's source code

 print  'You can log out here.';
 printf 'He lives in %s, %s.', $town, $address;
 printf '%d people live here.', $people;
 printf 'These are %d books.', $books;
 printf 'He has %s houses in %s, %s.', $houses, $town, $address;
 printf '%s books are in %s shelves.', $books, $shelves;

PO files - what are they?

PO is an abbreviation for "portable object".

GNU gettext PO files can be used to make programs multilingual.

Along with the original text and its translation the file contains various comments and flags.

MO files are the binary version of PO files.
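To make "binary version" concrete, here is a small sketch based on the MO layout described in the GNU gettext manual: a header of 32-bit words (magic number 0x950412de, format revision, number of strings, offsets of the original-string and translation tables, size and offset of an optional hash table), followed by two tables of (length, offset) pairs that point at NUL-terminated strings. Plural entries store msgid and msgid_plural, and likewise the msgstr forms, joined by NUL. The helpers `write_mo` and `read_mo` are hypothetical: they skip the hash table and assume byte strings, and they are not how any particular library implements this.

```perl
use strict;
use warnings;

# Hypothetical helper: build a minimal little-endian MO image from
# msgid => msgstr pairs (byte strings). The optional hash table is left
# out (size 0); entries are sorted by msgid, as the format expects.
sub write_mo {
    my (%catalog) = @_;
    my @ids        = sort keys %catalog;
    my $n          = @ids;
    my $orig_tab   = 7 * 4;                 # header: seven 32-bit words
    my $trans_tab  = $orig_tab + 8 * $n;    # each table entry: length, offset
    my $data_start = $trans_tab + 8 * $n;

    my ($orig, $trans, $data) = ('', '', '');
    for my $id (@ids) {
        $orig .= pack 'VV', length($id), $data_start + length($data);
        $data .= "$id\0";
    }
    for my $id (@ids) {
        my $str = $catalog{$id};
        $trans .= pack 'VV', length($str), $data_start + length($data);
        $data  .= "$str\0";
    }
    return pack('V7', 0x950412de, 0, $n, $orig_tab, $trans_tab, 0, 0)
         . $orig . $trans . $data;
}

# Hypothetical helper: read an MO image (either byte order) into a hashref.
sub read_mo {
    my ($mo) = @_;
    my $fmt = unpack('V', $mo) == 0x950412de ? 'V'    # little-endian file
            : unpack('N', $mo) == 0x950412de ? 'N'    # big-endian file
            : die "not an MO file\n";
    my ($revision, $n, $orig_tab, $trans_tab) = unpack "x4 ${fmt}4", $mo;
    die "unsupported MO revision\n" if $revision >> 16;
    my %catalog;
    for my $i (0 .. $n - 1) {
        my ($olen, $ooff) = unpack "${fmt}2", substr($mo, $orig_tab + 8 * $i, 8);
        my ($tlen, $toff) = unpack "${fmt}2", substr($mo, $trans_tab + 8 * $i, 8);
        $catalog{ substr($mo, $ooff, $olen) } = substr($mo, $toff, $tlen);
    }
    return \%catalog;
}

my $catalog = read_mo(write_mo(
    '' => "Plural-Forms: nplurals=2; plural=n != 1;\n",    # header entry
    'He lives in {town}, {address}.' => 'Er wohnt in {town}, {address}.',
    # plural entry: msgid/msgid_plural and the msgstr forms joined by NUL
    "{num} person lives here.\0{num} people live here."
        => "{num} Mensch wohnt hier.\0{num} Menschen wohnen hier.",
));
print $catalog->{'He lives in {town}, {address}.'}, "\n";
print join(' | ', split(/\0/,
    $catalog->{"{num} person lives here.\0{num} people live here."})), "\n";
```

The header entry is simply the translation of the empty msgid, which is why the "Plural-Forms" line discussed below travels inside the catalog itself.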

Rewriting to Locale::Maketext::Simple

Here we use the base module Locale::Maketext together with Locale::Maketext::Lexicon::Gettext, a module that reads gettext PO/MO files. Locale::Maketext::Simple exports the function "loc".

 [_n] where n = 1, 2, ...

is the general notation for placeholders. Inside [], a function name can come first, followed by its parameters, separated by commas. quant, abbreviated *, is the function that handles plurals.

 print loc('You can log out here.');

 print loc(
     'He lives in [_1], [_2].',
     $town,
     $address,
 );

 print loc(
     '[quant,_1,person lives,people live] here.',
     $people,
 );

I have no idea how to write the following phrase with "quant". With "quant" you always get the value followed by the unit, but here the phrase already changes before the value ("It is" vs. "These are"). The problem is that with "quant" the plural forms must leave out "_1" and the space after it, because "quant" inserts both itself.

 print loc(  
     '[myplural,_1,It is _1 book,These are _1 books].',
     # ????????    ^^^^^ ???     ^^^^^^^^^ ???
     $books, 
 );

 print loc(
     'He has [quant,_1,house,houses] in [_2], [_3].',
     $houses,
     $town,
     $address,
 );

 print loc(
     '[quant,_1,book is,books are] in [*,_2,shelf,shelves].',
     $books,
     $shelves,
 );
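The bracket notation can be tried with Locale::Maketext alone, which ships with Perl. The sketch below assumes a hypothetical project class `MyApp::L10N` and uses the `_AUTO` lexicon entry so that the English source phrases are compiled as-is without any catalog (so it bypasses Locale::Maketext::Simple and Locale::Maketext::Lexicon::Gettext). The last call shows the limitation described above: quant always emits the number, a space and the form, so the words in front of it cannot change with the count.

```perl
use strict;
use warnings;

package MyApp::L10N;                 # hypothetical project base class
use parent 'Locale::Maketext';       # Locale::Maketext ships with Perl

package MyApp::L10N::en;             # hypothetical English lexicon
use parent -norequire, 'MyApp::L10N';
# _AUTO: phrases missing from the lexicon are compiled from the key itself,
# so the English source strings work without any catalog.
our %Lexicon = (_AUTO => 1);

package main;

my $lh = MyApp::L10N->get_handle('en') or die "no language handle\n";

print $lh->maketext('He lives in [_1], [_2].', 'Erlangen', 'Hauptstrasse 1'), "\n";
print $lh->maketext('[quant,_1,person lives,people live] here.', $_), "\n"
    for 1, 3;
# "*" is shorthand for quant; an optional third form is used for zero.
print $lh->maketext('[*,_1,house,houses,no houses] in [_2].', 0, 'Erlangen'), "\n";
# quant always emits "number space form", so the words before it are fixed:
print $lh->maketext('These are [quant,_1,book,books].', 1), "\n";
```

The last line prints "These are 1 book.", which is exactly the sentence the hypothetical "myplural" would be needed to avoid.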

Rewriting to Locale::TextDomain

Locale::TextDomain is part of the libintl-perl distribution. There are several exported functions. Function names follow a simple scheme.

 x for a placeholder,
 n for plural and
 p for context.

The order of parameters, when present:

 Context,
 singular,
 plural,
 number for plural selection and
 finally a hash with placeholder data.

Not all combinations of n, p and x are implemented. If you use the x variants even where there is no placeholder, and stick to alphabetical order, you are left with __x, __nx, __px and __npx.

 __('msgid')
 __x(
     'msgid',
     name1 => $value1, name2 => $value2, ...
 )
 __n('msgid', 'msgid_plural', $count)
 __nx(
     'msgid', 'msgid_plural', $count,
     name1 => $value1, name2 => $value2, ...
 )
 __xn(
     'msgid', 'msgid_plural',
     $count, name1 => $value1, name2 => $value2, ...
 )
 __p('context', 'msgid')
 __px(
     'context', 'msgid',
     name1 => $value1, name2 => $value2, ...
 )
 __np('context', 'msgid', 'msgid_plural', $count)
 __npx(
     'context', 'msgid', 'msgid_plural', $count,
     name1 => $value1, name2 => $value2, ...
 )

 print __('You can log out here.');

 print __x(
     'He lives in {town}, {address}.',
     town    => $town,
     address => $address,
 );

 print __nx(
     '{num} person lives here.',
     '{num} people live here.',
     $people,
     num => $people,
 );


 print __nx(
     'It is {num} book.',
     'These are {num} books.',
     $books,
     num => $books,
 );

 print __nx(
     'He has {num} house in {town}, {address}.',
     'He has {num} houses in {town}, {address}.',
     $houses,
     num     => $houses,
     town    => $town,
     address => $address,
 );

 print
     __nx(
         '{num} book is',
         '{num} books are',
         $books,
         num => $books,
     ),
     __nx(
         ' in {num} shelf.',
         ' in {num} shelves.',
         $shelves,
         num => $shelves,
     );
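For a feel of what the x and n variants do, here is a plain-Perl sketch that needs no libintl-perl. `expand` and `nx` are hypothetical stand-ins, not Locale::TextDomain's code: `expand` replaces {name} placeholders, and `nx` imitates gettext's behavior when no translation is found (msgid for a count of 1, msgid_plural otherwise). With a catalog loaded, the form would instead come from the msgstr entries selected by the Plural-Forms formula.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the placeholder step of __x: replace {name}
# with the value passed for "name"; unknown placeholders stay as they are.
sub expand {
    my ($template, %vars) = @_;
    $template =~ s/\{(\w+)\}/exists $vars{$1} ? $vars{$1} : "{$1}"/ge;
    return $template;
}

# Hypothetical stand-in for __nx when no catalog is loaded: like gettext's
# untranslated fallback, use msgid for a count of 1, msgid_plural otherwise.
sub nx {
    my ($msgid, $msgid_plural, $count, %vars) = @_;
    return expand($count == 1 ? $msgid : $msgid_plural, %vars);
}

print expand('He lives in {town}, {address}.',
             town => 'Erlangen', address => 'Hauptstrasse 1'), "\n";
print nx('It is {num} book.', 'These are {num} books.', $_, num => $_), "\n"
    for 1, 2;
# The split book/shelf phrase: two independent plural choices, concatenated.
print nx('{num} book is', '{num} books are', 3, num => 3),
      nx(' in {num} shelf.', ' in {num} shelves.', 1, num => 1), "\n";
```

Note that the count is passed twice: once to choose the form and once as the value for {num}. That is the same redundancy the __nx calls above show.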

What do you see at first glance?

Locale::Maketext has numbered parameters. If there are many, this may be confusing. All the translator can tell is that something is being included, but not what.

 [_1] is a [_2] in [_3].

Locale::Maketext can handle several plural constructs within one phrase.

 [quant,_1,book is,books are] in [*,_2,shelf,shelves].

The text inside the plural forms (quant) cannot be translated automatically, because it sits in a kind of "or" block.

Within this "or" block, placeholders such as _1 are no longer present. Thus it is impossible to represent plural forms which start before the number.

 [myplural,_1,It is _1 book,These are _1 books].

Of course this "myplural" function does not exist.

***

Locale::TextDomain has named parameters, which are easier to translate because the translator can understand the meaning of the sentence in spite of the placeholders.

 {name} is a {locality} in {country}.

A phrase containing several plural forms has to be split up, which means it can no longer be translated automatically.

Things you won't spot immediately

Number of plural forms

Locale::Maketext:

 singular
 singular + plural
 singular + plural + zero

Locale::TextDomain:

 2 in the source language
 arbitrarily many in the target language

The header of each PO/MO file contains an entry called "Plural-Forms". This is a formula written in C syntax, with one exception: "OR" is allowed in place of "||". Each language's PO/MO files carry their own formula. Locale::Maketext ignores this entry.

German/English:

 "Plural-Forms: nplurals=2; plural=n != 1;\n"

Russian:

 "Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11"
 " ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"

An example from the Russian language:

 0          books -> книг  (Plural 2)
 1          book  -> книга (Singular)
 2 .. 4     books -> книги (Plural 1)
 5 .. 20    books -> книг  (Plural 2)
 21         books -> книга (Singular)
 22 .. 24   books -> книги (Plural 1)
 25 .. 30   books -> книг  (Plural 2)
 ...
 100        books -> книг  (Plural 2)
 101        books -> книга (Singular)
 102 .. 104 books -> книги (Plural 1)
 105 .. 120 books -> книг  (Plural 2)
 121        books -> книга (Singular)
 122 .. 124 books -> книги (Plural 1)
 125 .. 130 books -> книг  (Plural 2)
 ...
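The Plural-Forms formula is evaluated at run time to pick msgstr[0], [1] or [2]. A sketch of how that can be done in Perl, relying on the fact that the operators used in these formulas (?:, ||, &&, ==, !=, <, >, <=, >=, %, !) have the same relative precedence in Perl as in C. The helper name `compile_plural_forms` is hypothetical, and the character whitelist is there because the expression is handed to a string eval. One caveat: C's "/" is integer division and Perl's is not, so a formula using "/" would need extra care.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a Plural-Forms header into ($nplurals, $sub).
sub compile_plural_forms {
    my ($header) = @_;
    my ($nplurals, $expr) =
        $header =~ /nplurals\s*=\s*(\d+)\s*;\s*plural\s*=\s*([^;]+)/
        or die "no Plural-Forms found\n";
    # Only n, digits, whitespace, parentheses and C operators may pass,
    # because the expression is handed to a string eval below.
    die "unexpected characters in plural expression\n"
        unless $expr =~ m{^[n0-9\s()?:%!=<>&|+*/-]+$};
    (my $perl = $expr) =~ s/n/\$n/g;
    my $sub = eval "sub { my \$n = shift; return 0 + ($perl); }"
        or die "cannot compile plural expression: $@";
    return ($nplurals, $sub);
}

my $header = "Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11"
           . " ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n";
my ($nplurals, $plural) = compile_plural_forms($header);
my @label = ('Singular', 'Plural 1', 'Plural 2');
print "nplurals = $nplurals\n";
printf "n = %3d -> msgstr[%d] (%s)\n", $_, $plural->($_), $label[ $plural->($_) ]
    for 0, 1, 2, 5, 11, 21, 22, 25, 101, 111, 121, 125;
```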

There are also 3 plural forms in, e.g., Czech, Lithuanian, Polish, Romanian and Slovak, and 4 in, e.g., Slovenian and Celtic. So in the EU we can get by with 4 plural forms. Arabic has 6 plural forms.

Because Locale::Maketext ignores "Plural-Forms" in PO/MO files, it can only support languages with 2 plural forms, i.e. singular and plural, like we are familiar with in German and English. There is a function "quant" which essentially corresponds to "quant2" (singular + 1st plural) assuming we ignore the zero form. It is quite possible to imagine functions "quant3" to "quant6" for Locale::Maketext. But then the programmer would need to already know which text phrases need 2, 3, 4, 5 or 6 plural forms. Because he does not know, he would have to always use "quant6". That's a whole lot of typing.
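Locale::Maketext does let a language class define extra methods for bracket notation, so a "quant3" can be written. A minimal sketch, with the hypothetical classes `MyApp::L10N` and `MyApp::L10N::ru` and the Russian rule hard-coded into the method; the forms are transliterated (kniga, knigi, knig) to keep the output ASCII. It illustrates the point above: the rule lives in code, and every phrase that might need three forms has to be written with quant3 from the start.

```perl
use strict;
use warnings;

package MyApp::L10N;                 # hypothetical project base class
use parent 'Locale::Maketext';

package MyApp::L10N::ru;             # hypothetical Russian lexicon
use parent -norequire, 'MyApp::L10N';
our %Lexicon = (_AUTO => 1);

# Hypothetical "quant3": the number, a space, then one of three forms
# chosen by the Russian rule (the Plural-Forms formula, hard-coded).
sub quant3 {
    my ($lh, $n, @forms) = @_;
    my $i = $n % 10 == 1 && $n % 100 != 11 ? 0
          : $n % 10 >= 2 && $n % 10 <= 4
            && ($n % 100 < 10 || $n % 100 >= 20) ? 1
          : 2;
    return "$n $forms[$i]";
}

package main;

my $lh = MyApp::L10N->get_handle('ru') or die "no language handle\n";
# Transliterated forms of "book": kniga (one), knigi (few), knig (many).
print $lh->maketext('[quant3,_1,kniga,knigi,knig]', $_), "\n" for 1, 3, 5, 21;
```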

Position of words in a sentence in different languages

The positions of individual words can differ in different languages e.g. in one language it is

 I have 2 books.

and in another

 2 books I have.

If that is the case, then with Locale::Maketext you have to write complete sentences as the plural forms. A programmer writing in English cannot know that in advance, so the conflict is only discovered during translation.

If you want to avoid the conflict, you always write entire sentences.

But even that doesn't always work, because Locale::Maketext always expects "quant" to be followed by "_1" and then implicitly adds a space and then the text.

Yet what's needed is:

 [myplural,_1,It is _1 book.,These are _1 books.]

But that would be nothing other than Locale::TextDomain.

Comma in plural forms, or the "join and never can split" trap

Because commas separate the parameters, the plural texts themselves must not contain commas, for example in enumerations.

Is there any simple quoting mechanism such as in Text::CSV? I know of none.

 I need 1 book, computer or notebook to do this.

Here's a dirty workaround using ";".

 I need [*,_1,book; computer or notebook,books; computers or notebooks] to do this.
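For what it's worth, the Locale::Maketext documentation describes "~" as the escape character of bracket notation, with "~," standing for a literal comma inside a bracket group. A quick check of the example above, assuming a Locale::Maketext version that behaves as documented and the same kind of hypothetical `MyApp::L10N` class; whether translators and their tools cope with "~," is a separate question.

```perl
use strict;
use warnings;

package MyApp::L10N;                 # hypothetical project base class
use parent 'Locale::Maketext';

package MyApp::L10N::en;
use parent -norequire, 'MyApp::L10N';
our %Lexicon = (_AUTO => 1);

package main;

my $lh = MyApp::L10N->get_handle('en') or die "no language handle\n";

# Inside a bracket group "~," is a literal comma; a bare "," would start
# a new parameter.
my $phrase = 'I need [*,_1,book~, computer or notebook,'
           . 'books~, computers or notebooks] to do this.';
print $lh->maketext($phrase, $_), "\n" for 1, 2;
```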

Value and unit may get wrapped

Because value and unit are joined with an ordinary space, a line break may occur between them.

Depending on line length you get

 I have
 1 book.

or

 I have 1
 book.

With Locale::TextDomain you can write:

 I have {num}\N{NO-BREAK SPACE}book.
 I have {num}\N{NO-BREAK SPACE}books.

Locale::Maketext has the space hardcoded.
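The difference can be checked character by character. This sketch, again with a hypothetical `MyApp::L10N` class, prints the code point that follows the number: quant's built-in separator versus the no-break space carried by a template like the one above. The `{num}` substitution is done by hand as a stand-in for Locale::TextDomain's `__nx`.

```perl
use strict;
use warnings;
use charnames ':full';               # \N{...} by name on perls before 5.16

package MyApp::L10N;                 # hypothetical project base class
use parent 'Locale::Maketext';

package MyApp::L10N::en;
use parent -norequire, 'MyApp::L10N';
our %Lexicon = (_AUTO => 1);

package main;

my $lh = MyApp::L10N->get_handle('en') or die "no language handle\n";
my $maketext = $lh->maketext('I have [quant,_1,book,books].', 1);

# Template as in the Locale::TextDomain example; {num} is filled in by hand.
(my $template = "I have {num}\N{NO-BREAK SPACE}book.") =~ s/\{num\}/1/;

# Code point right after the number "1" in each result.
my $pos = length 'I have 1';
printf "quant:    U+%04X\n", ord substr($maketext, $pos, 1);
printf "template: U+%04X\n", ord substr($template, $pos, 1);
```

This prints U+0020 for quant and U+00A0 for the template.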

Excerpt from a PO file for Locale::Maketext

 # header
 msgid ""
 msgstr ""
 "...\n"
 "Plural-Forms: nplurals=2; plural=n != 1;\n"
 "..."

 msgid  "You can log out here."
 msgstr "Sie können sich hier abmelden."

 msgid  "He lives in %1, %2."
 msgstr "Er wohnt in %1, %2."

 msgid  "%quant(%1,person lives,people live) here."
 msgstr "%quant(%1,Mensch wohnt,Menschen wohnen) hier."

 # a bad workaround (no singular before placeholder)
 msgid  "These are %quant(%1,book,books)."
 msgstr "Das sind %quant(%1,Buch,Bücher)."

 msgid  "%quant(%1,book is,books are) in %quant(%2,shelf,shelves)."
 msgstr "%quant(%1,Buch ist,Bücher sind) in %quant(%2,Regal,Regalen)."

Excerpt from a PO file for Locale::TextDomain

 # header
 msgid ""
 msgstr ""
 "...\n"
 "Plural-Forms: nplurals=2; plural=n != 1;\n"
 "..."

 msgid        "You can log out here."
 msgstr       "Sie können sich hier abmelden."

 msgid        "He lives in {town}, {address}."
 msgstr       "Er wohnt in {town}, {address}."

 msgid        "{num} person lives here."
 msgid_plural "{num} people live here."
 msgstr[0]    "{num} Mensch wohnt hier."
 msgstr[1]    "{num} Menschen wohnen hier."

 msgid        "It is {num} book."
 msgid_plural "These are {num} books."
 msgstr[0]    "Es ist {num} Buch."
 msgstr[1]    "Es sind {num} Bücher."

 msgid        "He has {num} house in {town}, {address}."
 msgid_plural "He has {num} houses in {town}, {address}."
 msgstr[0]    "Er hat {num} Haus in {town}, {address}."
 msgstr[1]    "Er hat {num} Häuser in {town}, {address}."

 msgid        "{num} book is"
 msgid_plural "{num} books are"
 msgstr[0]    "{num} Buch ist"
 msgstr[1]    "{num} Bücher sind"

 msgid        " in {num} shelf."
 msgid_plural " in {num} shelves."
 msgstr[0]    " in {num} Regal."
 msgstr[1]    " in {num} Regalen."
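Both excerpts share the same PO syntax. To show how msgctxt, msgid, msgid_plural and the indexed msgstr[n] lines fit together, here is a deliberately small reader. It is a sketch under stated assumptions: "#" comments are skipped, continuation lines are appended to the preceding string, a new msgctxt or msgid after a msgstr starts the next entry, and only the \n, \t, \" and \\ escapes are handled. `parse_po` is a hypothetical name; real projects would use an existing PO parser.

```perl
use strict;
use warnings;

# Hypothetical minimal PO reader: returns a list of hashrefs with the keys
# msgctxt, msgid, msgid_plural and msgstr (an arrayref of forms).
sub parse_po {
    my ($text) = @_;
    my (@entries, %cur, $slot);    # $slot: ref to the string being built
    my %esc = (n => "\n", t => "\t", '"' => '"', '\\' => '\\');
    my $unquote = sub {
        my ($s) = @_;
        $s =~ s/\\(.)/exists $esc{$1} ? $esc{$1} : $1/ge;
        return $s;
    };
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*(?:#.*)?$/;    # blank lines and comments
        if ($line =~ /^(msgctxt|msgid_plural|msgid|msgstr)(?:\[(\d+)\])?\s+"(.*)"\s*$/) {
            my ($key, $idx, $raw) = ($1, $2, $3);
            if (($key eq 'msgctxt' || $key eq 'msgid') && exists $cur{msgstr}) {
                push @entries, +{%cur};      # previous entry is complete
                %cur = ();
            }
            $slot = $key eq 'msgstr' ? \$cur{msgstr}[ $idx // 0 ] : \$cur{$key};
            $$slot = $unquote->($raw);
        }
        elsif ($line =~ /^\s*"(.*)"\s*$/ && $slot) {    # continuation line
            $$slot .= $unquote->($1);
        }
        else {
            die "cannot parse PO line: $line\n";
        }
    }
    push @entries, +{%cur} if exists $cur{msgid};
    return @entries;
}

my $po = <<'PO';
msgid ""
msgstr ""
"Plural-Forms: nplurals=2; plural=n != 1;\n"

msgid        "{num} person lives here."
msgid_plural "{num} people live here."
msgstr[0]    "{num} Mensch wohnt hier."
msgstr[1]    "{num} Menschen wohnen hier."
PO

for my $e (parse_po($po)) {
    next if $e->{msgid} eq '';                     # skip the header entry
    print "$e->{msgid} => ", join(' | ', @{ $e->{msgstr} }), "\n";
}
```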

PO file for English/Russian translation

for Locale::Maketext

 # header
 msgid ""
 msgstr ""
 "...\n"
 "Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11"
 " ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"
 "..."

 msgid  "You can log out here."
 msgstr "Выход из системы."

 # The town name should be inflected here: 
 # Москва -> в Москве
 # Киев   -> в Киеве
 # Мытищи -> в Мытищах (irregular)
 msgid  "He lives in %1, %2."
 msgstr "Он живет в %1, %2."

 # This is not correctly translatable.
 # The plural form for number 2 to 4 (человека живут) is not storable.
 msgid  "%quant(%1,person lives,people live) here."
 msgstr "%quant(%1,человек живет,человек живут) здесь."

 # This is not correctly translatable.
 # The plural form for number 2 to 4 (дома) is not storable.
 msgid  "He has %quant(%1,house,houses) in %2, %3."
 msgstr "У него %quant(%1,дом,домов) в %2, %3."

 # This is not correctly translatable.
 # The plural form for number 2 to 4 (книги) is not storable.
 msgid  "%quant(%1,book is,books are) in %quant(%2,shelf,shelves)."
 msgstr "%quant(%1,книга,книг) на %quant(%2,полке,полках)."

for Locale::TextDomain

 # header
 msgid ""
 msgstr ""
 "...\n"
 "Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11"
 " ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"
 "..."

 msgid        "You can log out here."
 msgstr       "Выход из системы."

 # The town name should be inflected here: 
 # Москва -> в Москве
 # Киев   -> в Киеве
 # Мытищи -> в Мытищах (irregular)
 msgid        "He lives in {town}, {address}."
 msgstr       "Он живет в {town}, {address}."

 msgid        "{num} person lives here."
 msgid_plural "{num} people live here."
 msgstr[0]    "{num} человек живет здесь."
 msgstr[1]    "{num} человека живут здесь."
 msgstr[2]    "{num} человек живут здесь."

 msgid        "It is {num} book."
 msgid_plural "These are {num} books."
 msgstr[0]    "Это {num} книга."
 msgstr[1]    "Это {num} книги."
 msgstr[2]    "Это {num} книг."

 msgid        "He has {num} house in {town}, {address}."
 msgid_plural "He has {num} houses in {town}, {address}."
 msgstr[0]    "У него {num} дом в {town}, {address}."
 msgstr[1]    "У него {num} дома в {town}, {address}."
 msgstr[2]    "У него {num} домов в {town}, {address}."

 # Translate this phrase together with the next one.
 msgid        "{num} book is"
 msgid_plural "{num} books are"
 msgstr[0]    "{num} книга"
 msgstr[1]    "{num} книги"
 msgstr[2]    "{num} книг"

 # Translate this phrase together with the previous one.
 msgid        " in {num} shelf."
 msgid_plural " in {num} shelves."
 msgstr[0]    " на {num} полке."
 msgstr[1]    " на {num} полках."
 msgstr[2]    " на {num} полках."
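At run time the gettext side does roughly the following: compute the plural index for the count, take that msgstr, then fill in the placeholders. A sketch with a hypothetical `ngettext_ru` helper that hard-codes the Russian rule from the header (a real library evaluates the header's formula instead) and uses the "{num} person lives here." forms above:

```perl
use strict;
use warnings;
use utf8;                              # the Cyrillic literals below
binmode STDOUT, ':encoding(UTF-8)';

# Russian Plural-Forms rule, hard-coded for this sketch: 0, 1 or 2.
sub ru_plural_index {
    my ($n) = @_;
    return $n % 10 == 1 && $n % 100 != 11 ? 0
         : $n % 10 >= 2 && $n % 10 <= 4
           && ($n % 100 < 10 || $n % 100 >= 20) ? 1
         : 2;
}

# Hypothetical helper: pick msgstr[index] for $count, then expand {name}.
sub ngettext_ru {
    my ($msgstr, $count, %vars) = @_;
    my $form = $msgstr->[ ru_plural_index($count) ];
    $form =~ s/\{(\w+)\}/exists $vars{$1} ? $vars{$1} : "{$1}"/ge;
    return $form;
}

my @people = (
    '{num} человек живет здесь.',     # msgstr[0]
    '{num} человека живут здесь.',    # msgstr[1]
    '{num} человек живут здесь.',     # msgstr[2]
);
print ngettext_ru(\@people, $_, num => $_), "\n" for 1, 3, 5, 21, 22;
```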

Inflecting "in {town}"

 Berlin    -> Берлин
 in Berlin -> в Берлине

If you want this, you need to also translate placeholder values and only then insert them.

That's doable, but then the phrase into which the value is inserted can no longer be translated automatically. It also becomes harder to translate by hand, because once again part of the context is lost.

You can only tinker here.
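One way to tinker, sketched here under loud assumptions: the translator maintains a separate, hypothetical table that maps place names to already-inflected "in <place>" phrases, and the msgstr uses a single placeholder for the whole phrase ("Он живет {in_town}, {address}." instead of "в {town}"). The `fill` helper and the `%in_place_ru` table are invented for illustration. The phrase still loses context this way, and a missing entry is flagged in brackets rather than guessed.

```perl
use strict;
use warnings;
use utf8;                              # the Cyrillic literals below
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical translator-maintained table: place name -> already inflected
# "in <place>" phrase in Russian (forms from the PO comments above).
my %in_place_ru = (
    'Moscow'    => 'в Москве',
    'Kiev'      => 'в Киеве',
    'Mytishchi' => 'в Мытищах',       # irregular
);

# Hypothetical helper: look up the inflected phrase first, then insert it
# as {in_town}; an unknown town is flagged in brackets instead of guessed.
sub fill {
    my ($template, $in_place, %vars) = @_;
    $vars{in_town} = $in_place->{ $vars{town} } // "[$vars{town}]";
    $template =~ s/\{(\w+)\}/exists $vars{$1} ? $vars{$1} : "{$1}"/ge;
    return $template;
}

print fill('Он живет {in_town}, {address}.', \%in_place_ru,
           town => 'Moscow', address => 'ул. Арбат, 1'), "\n";
```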

neutral/masculine/feminine singular/plural

Inflection of nouns (German, "doctor"):

 masculine singular -> Arzt
 feminine  singular -> Ärztin
 masculine plural   -> Ärzte
 feminine  plural   -> Ärztinnen

Inflection of verbs:

 Mascha went to school. -> Маша пошла в школу.
 Petja went to school.  -> Петя пошёл в школу.

Context

 msgid   "design"
 msgstr  "Design"

 msgctxt "automobile"
 msgid   "design"
 msgstr  "Konstruktion"

 msgctxt "verb"
 msgid   "design"
 msgstr  "zeichnen"
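In an MO file, a context-qualified entry is stored under one key: the msgctxt, the EOT character (\x04), then the msgid. Here is a sketch of a lookup over such keys using the three entries above. `pgettext_lookup` is a hypothetical helper that, like gettext, returns the untranslated msgid when there is no entry for that context. It does not fall back to the context-free translation.

```perl
use strict;
use warnings;

# Catalog keyed the way MO files store context: "msgctxt\x04msgid".
my %catalog = (
    "design"               => 'Design',
    "automobile\x04design" => 'Konstruktion',
    "verb\x04design"       => 'zeichnen',
);

# Hypothetical helper: look up msgid in the given context (undef = none),
# returning the msgid itself when no translation exists.
sub pgettext_lookup {
    my ($ctxt, $msgid) = @_;
    my $key = defined $ctxt ? "$ctxt\x04$msgid" : $msgid;
    return exists $catalog{$key} ? $catalog{$key} : $msgid;
}

print pgettext_lookup(undef,        'design'), "\n";   # Design
print pgettext_lookup('automobile', 'design'), "\n";   # Konstruktion
print pgettext_lookup('verb',       'design'), "\n";   # zeichnen
print pgettext_lookup('noun',       'design'), "\n";   # design (no entry)
```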

Locale::Maketext::TPJ13 - Article by Sean M. Burke about software localization

He writes:

Since I wrote this article in 1998, I now see that the gettext docs are now trying more to come to terms with plurality. Whether useful conclusions have come from it is another question altogether. -- SMB, May 2001


Many years later, there is still no solution that does it all.

Software for translation agencies

In the case I know of, the translation agency uses "SDL Trados". Like other similar software, it is based on a "translation memory". This works very well for static documents.

Such software seems less suited to the dynamic parts of software localization, namely plural forms and context, because it assumes a 1:1 relation between source text and translation. One must therefore expect that the relatively small share of phrases that need context or plural forms cannot be handled well with the software's help.

In this case, the POT file had to be converted to XML, and the target-language fields had to be prefilled from the source language. One would have expected this to be part of a translation agency’s services.

Recommendation: Have a translation done with a smaller test file. The test should contain all the typical constructs. Repeat per-language, because subcontractors may be involved.


8 Comments

which mentions to some extent the fact that gettext has established workflows as well as supporting software (such as graphical tools) known in many software communities and even outside the open source world, whereas Maketext has… none of that, and is known only to Perl folk.

I followed the slides and got what they say (I think), but I don't really understand why that leads to the conclusion not to use Maketext.

Locale::Maketext::Lexicon::Gettext provides a perfectly good way to use the gettext workflow, PO/MO files and tools, including the GUI tools that support translation memory, with Perl software that uses Maketext.

We use Maketext at work, on typepad.com, for translations into Japanese and many European languages: it works really well.

Could you put your recommendation for what SHOULD be used in your post, before you start the translation?

Additional functions can be defined with Locale-Maketext. Also, L-M is easily subclassable and extendable with a common interface, which means I have a common place to do things like internationalization of numbers, dates and such.

Recently I had to choose between Locale::TextDomain and Locale::Maketext. I chose Locale::Maketext, because it is not a problem to add plural support to Locale::Maketext, but it is a problem to localise a web application with Locale::TextDomain: it requires you to set the locale, which in turn means that you will get messages in different languages in error.log.

About having to set locale: you can always use lower level Locale::Gettext rather than higher level Locale::TextDomain.

About being able to add plural text support: the complaint about Locale::Maketext (whether or not it uses gettext as a lexicon) is that you have to 1) program plural form support yourself and 2) do it at the wrong level: as a programmer, not as a translator.

Locale::gettext still requires you to set the locale, which makes it unsuitable for multiuser server applications. IMO that's the most serious design flaw in gettext. As for plural support in Locale::Maketext::Lexicon, I extract the Plural-Forms header from the .po file, just like gettext does.

I don't agree that translation should be the translator's sole responsibility. The translator should be supported by the programmer in order to achieve a good result. E.g. Russian plural forms do not always fit into the gettext formula, and with gettext you can do nothing about it. Every language has its own special features, which should be supported by the localisation library, or the translation will not look natural. Maketext lets you add language-specific functions; gettext doesn't.

Errr... I meant Locale::Messages, not the non-existent Locale::Gettext. There you have, directly from gettext itself,

dgettext $textdomain, $msgid;

with an explicitly stated domain (translation).

As for plural support in Locale::Maketext::Lexicon, I extract the Plural-Forms header from the .po file, just like gettext does.

Did you make this code available somewhere?

I think there's some misunderstanding. My problem with gettext is as follows; consider this code:


say __"<h1>Hello!</h1>";
open my $fh, "<", "/ets/passwd" or die $!;

I want it to say "Hello!" to English speaking user, "你好!" to Chinese, "Привет!" to Russian, but at the same time it should always write "No such file or directory" to error log, I don't want any language there, but English.

I posted code in question on perlmonks.


About Aristotle
