Stop using Locale::Maketext

The following is a translation of a talk given by Steffen Winkler at German Perl Workshop 10, Internationalisierungs-Framework auswählen, in 2010. You can find my translation of the POD on GitHub. (In fact you are better off reading it there, because the CSS formatting on blogs.perl.org sucks, at least for now. If you’re seeing this in a feed reader, feel free to stay.)

Aside from Steffen’s talk, there is also Nikolai Prokoschenko’s rant On the state of i18n in Perl, which mentions to some extent the fact that gettext has established workflows as well as supporting software (such as graphical tools) known in many software communities and even outside the open source world, whereas Maketext has… none of that, and is known only to Perl folk.

Be smart. Don’t use Maketext.


Selecting an Internationalization Framework

Author

Steffen Winkler perl-ws@steffen-winkler.de

Bio

I have existed since 1960.

I've been programming Perl since the end of 2000, first privately and then professionally.

Currently I am working at SIEMENS AG in Erlangen, primarily in the area of web programming.

I have been attending the German Perl Workshop since 2003.

Abstract

Why use Locale::TextDomain when so many frameworks on CPAN use Locale::Maketext?

Following my presentation on DBD::PO in Frankfurt/Main there was a lively discussion, both in Frankfurt and at Erlangen-PM.

There are 2 internationalization frameworks on CPAN, Locale::TextDomain (Perl interface to Uniform Message Translation) and Locale::Maketext (framework for localization).

What are the differences?

Where are the limits?

What I want to talk about today

From source to multilingual application in 2 ways.

No matter which internationalization framework from CPAN you use, you have to live with limitations. A good choice greatly reduces them.

It begins with the application's source code

print  'You can log out here.';
printf 'He lives in %s, %s.', $town, $address;
printf '%d people live here.', $people;
printf 'These are %d books.', $books;
printf 'He has %s houses in %s, %s.', $houses, $town, $address;
printf '%s books are in %s shelves.', $books, $shelves;

PO files - what's that?

PO is an abbreviation for "portable object".

GNU gettext PO files can be used to make programs multilingual.

Along with the original text and its translation the file contains various comments and flags.

MO files are the binary version of PO files.

Rewriting to Locale::Maketext::Simple

Here we use the basic module Locale::Maketext and a module which reads gettext PO/MO files, namely Locale::Maketext::Lexicon::Gettext. Locale::Maketext::Simple exports the function "loc".

[_n] where n = 1, 2, ...

is the general notation for placeholders. Within [] a function name can be used as a prefix followed by its parameters, separated by ",". "quant", or "*", is the function name for plural processing.

print loc('You can log out here.');

print loc(
    'He lives in [_1], [_2].',
    $town,
    $address,
);

print loc(
    '[quant,_1,person lives,people live] here.',
    $people,
);

I have no idea how to write the following phrase with "quant". With "quant" you write something along the lines of value followed by unit, but here the plural form starts before the value. The problem is that "quant" requires "_1", and the space that follows it, to be left out of the plural forms.

print loc(  
    '[myplural,_1,It is _1 book,These are _1 books].',
    # ????????    ^^^^^ ???     ^^^^^^^^^ ???
    $books, 
);

print loc(
    'He has [quant,_1,house,houses] in [_2], [_3].',
    $houses,
    $town,
    $address,
);

print loc(
    '[quant,_1,book is,books are] in [*,_2,shelf,shelves].',
    $books,
    $shelves,
);

Rewriting to Locale::TextDomain

Locale::TextDomain is part of the libintl-perl distribution. There are several exported functions. Function names follow a simple scheme.

x for a placeholder,
n for plural and
p for context.

The order of parameters, when present:

Context,
singular,
plural,
number for plural selection and
finally a hash with placeholder data.

Not all combinations of n, p and x are implemented. If you always use x, even when there is no placeholder, and stick to alphabetical order, then only __x, __nx, __px and __npx are left.

__('msgid')
__x(
    'msgid',
    name1 => $value1, name2 => $value2, ...
)
__n('msgid', 'msgid_plural', $count)
__nx(
    'msgid', 'msgid_plural', $count,
    name1 => $value1, name2 => $value2, ...
)
__xn(
    'msgid', 'msgid_plural',
    $count, name1 => $value1, name2 => $value2, ...
)
__p('context', 'msgid')
__px(
    'context', 'msgid',
    name1 => $value1, name2 => $value2, ...
)
__np('context', 'msgid', 'msgid_plural', $count)
__npx(
    'context', 'msgid', 'msgid_plural', $count,
    name1 => $value1, name2 => $value2, ...
)

print __('You can log out here.');

print __x(
    'He lives in {town}, {address}.',
    town    => $town,
    address => $address,
);

print __nx(
    '{num} person lives here.',
    '{num} people live here.',
    $people,
    num => $people,
);



print __nx(
    'It is {num} book.',
    'These are {num} books.',
    $books,
    num => $books,
);

print __nx(
    'He has {num} house in {town}, {address}.',
    'He has {num} houses in {town}, {address}.',
    $houses,
    num     => $houses,
    town    => $town,
    address => $address,
);

print
    __nx(
        '{num} book is',
        '{num} books are',
        $books,
        num => $books,
    ),
    __nx(
        ' in {num} shelf.',
        ' in {num} shelves.',
        $shelves,
        num => $shelves,
    );

What do you see at first glance?

Locale::Maketext has numbered parameters. If there are many, you may confuse them. All the translator knows is that something is included, but not what.

[_1] is a [_2] in [_3].

Locale::Maketext can handle multiple plural forms in a text phrase.

[quant,_1,book is,books are] in [*,_2,shelf,shelves].

The text in plural forms (quant) is not automatically translatable because it's contained in a kind of "or" block.

Within this "or" block, placeholders such as _1 are not available. Therefore plural forms that start before the number cannot be represented.

[myplural,_1,It is _1 book,These are _1 books].

Of course this "myplural" function does not exist.


Locale::TextDomain has named parameters, which are easier to translate because the translator can understand the meaning of the sentence in spite of the placeholders.

{name} is a {locality} in {country}.
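To make the point about named placeholders concrete, here is a minimal sketch in plain Perl. The helper expand_named is made up for this illustration and only imitates the placeholder step; it is not the Locale::TextDomain implementation, and the sample values are assumptions.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the placeholder step of __x():
# replace every {key} that has a value, leave unknown keys untouched.
sub expand_named {
    my ($text, %arg) = @_;
    $text =~ s/\{([^{}]+)\}/defined $arg{$1} ? $arg{$1} : "{$1}"/ge;
    return $text;
}

# Sample values, invented for this example.
my %arg = (name => 'Erlangen', locality => 'town', country => 'Germany');

# The source phrase ...
print expand_named('{name} is a {locality} in {country}.', %arg), "\n";

# ... and a "translation" that moves the placeholders around freely.
print expand_named('In {country}, {name} is a {locality}.', %arg), "\n";
```

Because the keys carry names, the translator can reorder them without having to guess what each one stands for.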

A text phrase containing several plural forms needs to be split up, which means it can no longer be translated automatically.

Things you won't spot immediately

Number of plural forms

Locale::Maketext:

Singular
Singular + Plural
Singular + Plural + Zero

Locale::TextDomain:

2 in the source language
arbitrarily many in the target language

The header of each PO/MO file contains something called "Plural-Forms". This is a calculation formula written in C syntax, with one exception: "OR" is allowed in place of "||". The formula differs from one PO/MO file to the next, depending on the language. Locale::Maketext ignores this entry.

German/English:

"Plural-Forms: nplurals=2; plural=n != 1;\n"

Russian:

"Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11"
" ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"

An example from the Russian language:

0          books -> книг  (Plural 2)
1          book  -> книга (Singular)
2 .. 4     books -> книги (Plural 1)
5 .. 20    books -> книг  (Plural 2)
21         books -> книга (Singular)
22 .. 24   books -> книги (Plural 1)
25 .. 30   books -> книг  (Plural 2)
...
100        books -> книг  (Plural 2)
101        books -> книга (Singular)
102 .. 104 books -> книги (Plural 1)
105 .. 120 books -> книг  (Plural 2)
121        books -> книга (Singular)
122 .. 124 books -> книги (Plural 1)
125 .. 130 books -> книг  (Plural 2)
...
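The Russian formula can be tried out by hand. The following is only a sketch in plain Perl: the function name plural_index_ru and the @forms array are invented for this illustration, and the formula is transcribed manually rather than read from a PO header, so this is not how gettext or Locale::TextDomain work internally.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper: the Russian "Plural-Forms" expression,
# transcribed by hand into Perl. Returns the msgstr index 0, 1 or 2.
sub plural_index_ru {
    my ($n) = @_;
    return 0 if $n % 10 == 1 && $n % 100 != 11;
    return 1 if $n % 10 >= 2 && $n % 10 <= 4
             && ( $n % 100 < 10 || $n % 100 >= 20 );
    return 2;
}

# The three forms of "book", in msgstr[0], msgstr[1], msgstr[2] order.
my @forms = ('книга', 'книги', 'книг');

binmode STDOUT, ':encoding(UTF-8)';
for my $n (0, 1, 2, 5, 11, 21, 22, 25, 101, 112) {
    printf "%3d %s\n", $n, $forms[ plural_index_ru($n) ];
}
```

The index returned by the formula is simply the position in the list of translated forms, which is why the number of forms can differ per language without the programmer having to care.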

3 plural forms also exist in e.g. Czech, Lithuanian, Polish, Romanian and Slovak. 4 plural forms exist in e.g. Slovenian and Celtic. So in the EU we can get by with 4 plural forms. Arabic has 6 plural forms.

Because Locale::Maketext ignores "Plural-Forms" in PO/MO files, it can only support languages with 2 plural forms, that is, singular and plural, such as we know from German and English. There is a function "quant" which in principle corresponds to "quant2" (singular + 1st plural), assuming we ignore the zero form. One could define functions "quant3" to "quant6" for Locale::Maketext. But then the programmer would already need to know which text phrases need 2, 3, 4, 5 or 6 plural forms. Because he does not know, he would always have to use "quant6". That's a whole lot of typing.
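The two-form behaviour can be observed with nothing but the core module. This is a minimal sketch under some assumptions: the class names My::L10N and My::L10N::ru and the helper books_phrase are hypothetical, the lexicon uses _AUTO so that the phrase is its own entry, and no language-specific override of "quant" is in place.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical project classes, defined inline to keep this to one file.
package My::L10N;
use parent 'Locale::Maketext';

package My::L10N::ru;
our @ISA     = ('My::L10N');
our %Lexicon = ( _AUTO => 1 );    # treat every phrase as its own entry

package main;

my $lh = My::L10N->get_handle('ru')
    or die 'no language handle';

# Stock "quant" with two forms: the first for exactly 1, the second otherwise.
sub books_phrase {
    my ($n) = @_;
    return $lh->maketext('[quant,_1,книга,книг]', $n);
}

binmode STDOUT, ':encoding(UTF-8)';
print books_phrase($_), "\n" for 1, 2, 5;
```

With only these two forms, the count 2 ends up with the form that belongs to 5 and above, because nothing here consults the "Plural-Forms" header.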

Position of words in a sentence in different languages

The position of the individual words can differ in different languages e.g. in one language it is

I have 2 books.

and in another

2 books I have.

If that is so, then with Locale::Maketext you have to write complete sentences as the plural forms. The English-native programmer cannot know that. The conflict is thus only discovered during translation.

If you want to avoid the conflict, you always write entire sentences.

But even that doesn't always work, because Locale::Maketext always expects "quant" to be followed by "_1" and then implicitly adds a space and then the text.

Yet what's needed is:

[myplural,_1,It is _1 book.,These are _1 books.]

But that is nothing other than what Locale::TextDomain already does.

Comma in plural forms, or the "join and can never split" trap

Because commas are used as separators, texts that contain an enumeration must not contain any commas.

Is there any simple quoting mechanism as in Text::CSV? I know of none.

I need 1 book, computer or notebook to do this.

Here's a dirty workaround using ";".

I need [*,_1,book; computer or notebook,books; computers or notebooks] to do this.

Value and unit may be wrapped

Because value and unit are joined with an ordinary space, a line break may occur between them.

Depending on line length you get

I have
1 book.

or

I have 1
book.

With Locale::TextDomain you can write:

I have {num}\N{NO-BREAK SPACE}book.
I have {num}\N{NO-BREAK SPACE}books.

In Locale::Maketext the space is hard-wired in the module code.

Excerpt from a PO file for Locale::Maketext

# header
msgid ""
msgstr ""
"...\n"
"Plural-Forms: nplurals=2; plural=n != 1;\n"
"..."

msgid  "You can log out here."
msgstr "Sie können sich hier abmelden."

msgid  "He lives in %1, %2."
msgstr "Er wohnt in %1, %2."

msgid  "%quant(%1,person lives,people live) here."
msgstr "%quant(%1,Mensch wohnt,Menschen wohnen) hier."


# a bad workaround (no singular before placeholder)
msgid  "These are %quant(%1,book,books)."
msgstr "Das sind %quant(%1,Buch,Bücher)."

msgid  "%quant(%1,book is,books are) in %quant(%2,shelf,shelves)."
msgstr "%quant(%1,Buch ist,Bücher sind) in %quant(%2,Regal,Regalen)."

Excerpt from a PO file for Locale::TextDomain

# header
msgid ""
msgstr ""
"...\n"
"Plural-Forms: nplurals=2; plural=n != 1;\n"
"..."

msgid        "You can log out here."
msgstr       "Sie können sich hier abmelden."

msgid        "He lives in {town}, {address}."
msgstr       "Er wohnt in {town}, {address}."

msgid        "{num} person lives here."
msgid_plural "{num} people live here."
msgstr[0]    "{num} Mensch wohnt hier."
msgstr[1]    "{num} Menschen wohnen hier."

msgid        "It is {num} book."
msgid_plural "These are {num} books."
msgstr[0]    "Es ist {num} Buch."
msgstr[1]    "Es sind {num} Bücher."

msgid        "He has {num} house in {town}, {address}."
msgid_plural "He has {num} houses in {town}, {address}."
msgstr[0]    "Er hat {num} Haus in {town}, {address}."
msgstr[1]    "Er hat {num} Häuser in {town}, {address}."

msgid        "{num} book is"
msgid_plural "{num} books are"
msgstr[0]    "{num} Buch ist"
msgstr[1]    "{num} Bücher sind"

msgid        " in {num} shelf."
msgid_plural " in {num} shelves."
msgstr[0]    " in {num} Regal."
msgstr[1]    " in {num} Regalen."

PO file for English/Russian translation

for Locale::Maketext

# header
msgid ""
msgstr ""
"...\n"
"Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11"
" ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"
"..."

msgid  "You can log out here."
msgstr "Выход из системы."

# The town name would need to be inflected here:
# Москва -> в Москве
# Киев   -> в Киеве
# Мытищи -> в Мытищах (irregular)
msgid  "He lives in %1, %2."
msgstr "Он живет в %1, %2."

# This is not correctly translatable.
# The plural form for number 2 to 4 (человека живут) is not storable.
msgid  "%quant(%1,person lives,people live) here."
msgstr "%quant(%1,человек живет,человек живут) здесь."

# This is not correctly translatable.
# The plural form for number 2 to 4 (дома) is not storable.
msgid  "He has %quant(%1,house,houses) in %2, %3."
msgstr "У него %quant(%1,дом,домов) в %2, %3."


# This is not correctly translatable.
# The plural form for number 2 to 4 (книги) is not storable.
msgid  "%quant(%1,book is,books are) in %quant(%2,shelf,shelves)."
msgstr "%quant(%1,книга,книг) на %quant(%2,полке,полках)."

for Locale::TextDomain

# header
msgid ""
msgstr ""
"...\n"
"Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11"
" ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"
"..."

msgid        "You can log out here."
msgstr       "Выход из системы."

# The town name would need to be inflected here:
# Москва -> в Москве
# Киев   -> в Киеве
# Мытищи -> в Мытищах (irregular)
msgid        "He lives in {town}, {address}."
msgstr       "Он живет в {town}, {address}."

msgid        "{num} person lives here."
msgid_plural "{num} people live here."
msgstr[0]    "{num} человек живет здесь."
msgstr[1]    "{num} человека живут здесь."
msgstr[2]    "{num} человек живут здесь."

msgid        "It is {num} book."
msgid_plural "These are {num} books."
msgstr[0]    "Это {num} книга."
msgstr[1]    "Это {num} книги."
msgstr[2]    "Это {num} книг."

msgid        "He has {num} house in {town}, {address}."
msgid_plural "He has {num} houses in {town}, {address}."
msgstr[0]    "У него {num} дом в {town}, {address}."
msgstr[1]    "У него {num} дома в {town}, {address}."
msgstr[2]    "У него {num} домов в {town}, {address}."

# Translate this phrase together with the next one.
msgid        "{num} book is"
msgid_plural "{num} books are"
msgstr[0]    "{num} книга"
msgstr[1]    "{num} книги"
msgstr[2]    "{num} книг"

# Translate this phrase together with the previous one.
msgid        " in {num} shelf."
msgid_plural " in {num} shelves."
msgstr[0]    " на {num} полке."
msgstr[1]    " на {num} полках."
msgstr[2]    " на {num} полках."

Inflecting "in {town}"

Berlin    -> Берлин
in Berlin -> в Берлине

If you want this, you also need to translate the placeholder values and only then insert them.

That's doable, but it makes it impossible to automatically translate the phrase into which they are inserted. Moreover, that phrase is then also hard to translate manually, because once again part of the context is lost.

You can only tinker.

neuter / masculine / feminine, singular / plural

Inflection of nouns (German, "doctor"):

masculine singular -> Arzt
feminine singular  -> Ärztin
masculine plural   -> Ärzte
feminine plural    -> Ärztinnen

Inflection of verbs:

Masha went to school. -> Маша пошла в школу.
Petya went to school. -> Петя пошёл в школу.

Context

msgid   "design"
msgstr  "Design"

msgctxt "automobile"
msgid   "design"
msgstr  "Konstruktion"

msgctxt "verb"
msgid   "design"
msgstr  "zeichnen"
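How a context separates otherwise identical msgids can be sketched with a plain hash. The catalog and the lookup helper below are hypothetical and exist only for this illustration. What the sketch relies on is that compiled gettext catalogs join context and msgid with an EOT byte ("\x04") to form the lookup key.

```perl
use strict;
use warnings;

# Hypothetical in-memory catalog built from the three entries above.
# A message with context is stored under "context\x04msgid".
my %catalog = (
    'design'               => 'Design',
    "automobile\x04design" => 'Konstruktion',
    "verb\x04design"       => 'zeichnen',
);

# Hypothetical lookup helper: fall back to the msgid if nothing is found.
sub lookup {
    my ($msgid, $context) = @_;
    my $key = defined $context ? "$context\x04$msgid" : $msgid;
    return exists $catalog{$key} ? $catalog{$key} : $msgid;
}

print lookup('design'), "\n";                  # no context
print lookup('design', 'automobile'), "\n";    # context "automobile"
print lookup('design', 'verb'), "\n";          # context "verb"
```

The same msgid therefore yields three different translations, depending only on the context that the programmer passes along.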

Locale::Maketext::TPJ13 - Article by Sean M. Burke about software localization

He writes:

Since I wrote this article in 1998, I now see that the gettext docs are now trying more to come to terms with plurality. Whether useful conclusions have come from it is another question altogether. --- SMB, May 2001

It is many years later now, and a solution that does everything still does not exist.

Software for translation agencies

In the current case known to me the translation agency uses the software "SDL Trados". Like other similar software it is based on a "translation memory". This works very well for static documents.

Such software seems less suited to the dynamic aspects of software localization that arise from plural forms and context, because it assumes a 1:1 relation between source text and translation. One therefore has to expect that the relatively small portion of text that needs context or plural forms cannot be handled well with software support.

In the current case the POT file had to be converted into XML and the target language had to be filled from the source language. This kind of work would normally be expected from the translation agency.

Recommendation: Have a trial translation done first with a small test file that contains all the typical constructs. Do this per language, because subcontractors may be involved.

Bibliography

  • GNU gettext

wikipedia http://en.wikipedia.org/wiki/Gettext

gettext homepage http://www.gnu.org/software/gettext/gettext.html

  • Singular, Plural, Dual, Trial, Quadral

wikipedia - dual http://en.wikipedia.org/wiki/Dual_%28grammatical_number%29

wikipedia - all forms http://en.wikipedia.org/wiki/Sursurunga_language

sourceforge - which language - which plural form http://translate.sourceforge.net/wiki/l10n/pluralforms

  • CPAN module Locale::Maketext

CPAN http://search.cpan.org/dist/Locale-Maketext/

  • CPAN module Locale::Maketext::Simple

CPAN http://search.cpan.org/dist/Locale-Maketext-Simple/

  • obsolete article by Sean M. Burke about software localization

CPAN http://search.cpan.org/perldoc?Locale::Maketext::TPJ13

  • CPAN module Locale::TextDomain

CPAN http://search.cpan.org/dist/libintl-perl/

  • Thanks for the support, the many ideas, examples and corrections.

Nikolai Prokoschenko http://rassie.org/

Nikolai Prokoschenko - On the state of i18n in Perl http://rassie.org/archives/247

8 Comments

which mentions to some extent the fact that gettext has established workflows as well as supporting software (such as graphical tools) known in many software communities and even outside the open source world, whereas Maketext has… none of that, and is known only to Perl folk.

I followed the slides and got what it says (i think), and don't really understand why that leads to the conclusion of not using Maketext.

Locale::Maketext::Lexicon::Gettext perfectly provides the way you can use the gettext workflow, po/mo files and tools, including the GUI tools that support translation memory, and make it work with perl software that uses Maketext.

We use maketext at work, on typepad.com for translations into Japanese and many european languages: it works really well.

Could you put your recommendation for what SHOULD be used in your before you start the translation

Additional functions can be defined with Locale-Maketext. Also, L-M is easily subclassable and extendable with a common interface. Which means I have a common space to do things like internationalization of numbers, dates and such.

Recently I had to choose between Locale::TextDomain and Locale::Maketext. I've chosen Locale::Maketext, and the reason was that it is not a problem to add plural support to Locale::Maketext, but it is a problem to localise a web application with Locale::TextDomain, as it requires you to set the locale, and that consequently means that you will get messages in error.log in different languages.

About having to set locale: you can always use lower level Locale::Gettext rather than higher level Locale::TextDomain.

About being able to add plural text support: the complaint about Locale::Maketext (be it using gettext as lexicon or not) is that you have to 1) program plural form support 2) do it on the wrong level: as programmer, not as translator.

Locale::gettext still requires you to set the locale, which makes it unsuitable for multiuser server applications. That's IMO the most serious design flaw in gettext. As for plural support in Locale::Maketext::Lexicon, I extract the Plural-Forms header from the .po file, just like gettext does.

I don't agree that translation should be the whole responsibility of the translator. The translator should be supported by the programmer in order to achieve a suitable result. E.g. Russian plural forms do not always fit into the gettext formula, and with gettext you can do nothing about it. Every language has its own special features which should be supported by the localisation library, or the translation will not look natural. Maketext provides you with the ability to add language-specific functions, gettext does not.

Errr... I meant Locale::Messages not non-existent Locale::Gettext. There you have directly from gettext itself

dgettext $textdomain, $msgid;

with explicitly stated domain (translation).

As for plural support in Locale::Maketext::Lexicon, I extract the Plural-Forms header from the .po file, just like gettext does.

Did you make this code available somewhere?

I think there's some misunderstanding. My problem with gettext is as follows, consider the code:


say __"<h1>Hello!</h1>";
open my $fh, "<", "/ets/passwd" or die $!;

I want it to say "Hello!" to English speaking user, "你好!" to Chinese, "Привет!" to Russian, but at the same time it should always write "No such file or directory" to error log, I don't want any language there, but English.

I posted code in question on perlmonks.
