What are your environment settings for Unicode?

How have you set up your environment to work with Unicode? I want to make a cheat sheet for the Perl newbies. There is some information in the googlesphere but it's dispersed and unfocused.

I'm working Unicode into the next edition of Learning Perl. The most frustrating part of this for newbies (Perl, Unicode, or otherwise) is getting all the pieces to cooperate. Even if you get it right inside your Perl program, your terminal might not handle Unicode. If your terminal handles it, you might not have the right fonts. And so on and so on.

I know what I've set up for my system and the programs I use, but that's just for the tools I use. What did you have to do?

  • What environment variables did you set, and with what values? (LESSCHARSET, LANG, LC, LC_ALL)
  • Where did you have to set the environment? For instance, the shells have their special files, but there are also files like ~/.MacOSX/environment.plist or the Windows registry.
  • What did you set in your editor preferences?
  • What did you set in your terminal program (besides the shell settings)?
  • What fonts do you use? Did they magically show up or did you have to hunt them down? Does that font do something special with your language?

I'm especially interested in settings for languages besides the European ones (anyone have bidi settings?).

If you have a favorite resource, tell me about that too. :)

9 Comments

I don't remember setting up anything on Ubuntu 10.10 (and a lot of previous releases). It works out of the box:

claudio@adelaide:/etc$ env |grep ^L[CAE]
LANG=en_US.utf8
LESSOPEN=| /usr/bin/lesspipe %s
LESSCLOSE=/usr/bin/lesspipe %s %s
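
On a system where more than one of these variables is set, it can help to see which one actually wins. The following is only a sketch: the function names are made up for illustration, and it assumes the usual POSIX precedence (LC_ALL overrides LC_CTYPE, which overrides LANG). It inspects the variable names only and does not verify that the locale is installed.

```shell
# Succeed if a locale name such as en_US.utf8 or de_DE.UTF-8 names a UTF-8 codeset.
is_utf8_locale_name() {
    # Lowercase first so UTF-8, utf8, and utf-8 are all treated alike.
    name=$(printf '%s' "${1:-}" | tr '[:upper:]' '[:lower:]')
    case "$name" in
        *.utf8*|*.utf-8*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the value that decides character handling:
# LC_ALL beats LC_CTYPE, which beats LANG. Empty values count as unset.
effective_ctype() {
    printf '%s\n' "${LC_ALL:-${LC_CTYPE:-${LANG:-}}}"
}

# Print a one-line summary of the current setting.
report_locale() {
    current=$(effective_ctype)
    if is_utf8_locale_name "$current"; then
        echo "UTF-8 locale: $current"
    else
        echo "not a UTF-8 locale: ${current:-unset}"
    fi
}
```

Calling report_locale in the shell above would report LANG's value, since neither LC_ALL nor LC_CTYPE is set there.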

Hi brian


Are you going to publish a program to test these settings, or do you know of one?

Cheers
Ron

I'm managing this kind of configuration with my zsh settings. I've got them in a separate git repository and each system has its own branch.


My .zshrc includes .zsh/env on every system. There I've put all the LC_* stuff. Why? - I've got loads of different systems and I don't want to deal with their unique way to set these things.
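
As a rough sketch of what such a file could contain (the values here are examples only, not the actual settings from that repository):

```shell
# Hypothetical contents of ~/.zsh/env; every value is illustrative.
# It could be pulled in from .zshrc with:  [ -r ~/.zsh/env ] && . ~/.zsh/env

# Default locale for all categories that are not set individually.
export LANG=en_US.UTF-8

# Character classification and encoding; this decides how bytes become characters.
export LC_CTYPE=en_US.UTF-8

# Tell less to treat its input as UTF-8 instead of guessing.
export LESSCHARSET=utf-8
```

Keeping the settings in one sourced file means the same lines work on every system, whatever that system's own mechanism for setting the locale is.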


For Mac OS X I've also done two things:
defaults write ~/.MacOSX/environment LC_ALL de_DE.utf-8
defaults write ~/.MacOSX/environment PATH ...
Why? - Because applications started from Finder don't inherit your shell's settings - now they do.

FWIW I wrote http://perlgeek.de/en/article/set-up-a-clean-utf8-environment a while ago, but it probably needs to consider many more aspects and systems.

Hi Brian,
are you going to cover the Win32 console and Unicode issues on Windows as well?

Markus Kuhn has an excellent resource on UTF-8 on POSIX-like systems:

http://www.cl.cam.ac.uk/~mgk25/unicode.html

Included in that demo is

http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt

Which is a handy file to have around to test if your terminal is configured for UTF-8, and/or what fonts you are missing. Simply 'cat UTF-8-demo.txt' and you'll see quickly if your setup is good to go.
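
If you can't download the file, a much smaller check along the same lines can be sketched in the shell. The function name and the choice of sample characters are arbitrary. It writes the UTF-8 bytes as octal escapes so the script itself stays plain ASCII.

```shell
# Print a few characters from different scripts as raw UTF-8 bytes.
# If the terminal decodes UTF-8 and has the fonts, each line ends in one glyph;
# otherwise you will see two or three junk characters, or a box.
utf8_samples() {
    printf 'U+00E9 Latin e with acute: \303\251\n'
    printf 'U+03BB Greek lambda:       \316\273\n'
    printf 'U+05D0 Hebrew alef:        \327\220\n'
    printf 'U+20AC Euro sign:          \342\202\254\n'
    printf 'U+8A9E CJK ideograph:      \350\252\236\n'
}
```

Run utf8_samples and compare what you see with the code points named on each line.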

hi Brian,
can you please confirm that you have received an email from me with a pdf attached ? thanks

About brian d foy

I'm the author of Mastering Perl, and the co-author of Learning Perl (6th Edition), Intermediate Perl, Programming Perl (4th Edition) and Effective Perl Programming (2nd Edition).