Unicode in Python and i18n in Fedora
By Johan on Tuesday, August 10 2010, 13:45 - Now geeking ...
When I was playing with the ports4dotclear system (see the previous post about it), I had to fix a number of Unicode handling errors in both Python and Perl. For those of you who don't know what Unicode is, it's the standard that allows me to write this kind of stuff: «æ€¶ŧ←»® or «你好世界».
I was forced to use Unicode because, well, XML-RPC messages are XML documents, and XML parsers assume UTF-8 (one of the Unicode encodings) unless the document declares another encoding.
All this mess leads me to one conclusion: Unicode sucks in Python 2.x! (And Fedora i18n still has some strange artifacts.)
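Since XML-RPC is what dragged Unicode into the picture, here is a minimal sketch of that round trip using Python 3's standard xmlrpc.client module (called xmlrpclib in Python 2.x). The method name blog.echo is a made-up placeholder, not part of any real blog API; the sketch only shows that a non-ASCII string survives being marshalled to XML, encoded to UTF-8 bytes and parsed back.

```python
import xmlrpc.client

# Hypothetical method name, used only to build a sample request.
METHOD = "blog.echo"
TEXT = "«æ€¶ŧ←»® or «你好世界»"

# dumps() returns the XML request as a str; its default encoding is UTF-8.
payload = xmlrpc.client.dumps((TEXT,), METHOD)

# On the wire the document travels as bytes, here encoded as UTF-8.
wire = payload.encode("utf-8")

# The receiving side parses the bytes back into Python objects.
params, method = xmlrpc.client.loads(wire)
print(method, params[0] == TEXT)
```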
I. Python
There is a well-known Unicode bug in Python 2.x. It's a real shame it wasn't fixed earlier, although I think I read that Python 3k fixes it. Anyway, the bug shows up when you combine POSIX pipes with non-ASCII characters. It's very annoying, because there is no obvious reason why a simple shell redirection of stdout should change Python's internal sys.stdout.encoding, so that appending a simple | cat breaks your script:
bash@localhost:~ $ python unicode.py
preferred encoding = UTF-8
but ...
sys.stdout.encoding = UTF-8
therefore 'print u'unicode string' results in ...
 *** Python, çuckß ! *** 
bash@localhost:~ $ python unicode.py | cat
preferred encoding = UTF-8
but ...
sys.stdout.encoding = None
therefore 'print u'unicode string' results in ...
Traceback (most recent call last):
  File "unicode.py", line 12, in <module>
    print u" *** Python, çuckß ! *** "
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 13: ordinal not in range(128)
Here comes the source of the previous script, so you can enjoy it at home:
#!/usr/bin/python
# -*- coding: utf-8 -*-
# Python 2.x only: shows how sys.stdout.encoding changes when piped
import sys
import locale

print "preferred encoding = {0}".format(locale.getpreferredencoding())
print "but ..."
print "sys.stdout.encoding = {0}".format(sys.stdout.encoding)
print "therefore 'print u'unicode string' results in ..."
# Implicitly encoded with sys.stdout.encoding (ASCII when it is None)
print u" *** Python, çuckß ! *** "
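What happens under the hood: in Python 2.x, when stdout is not a terminal, sys.stdout.encoding is None, and print falls back to the interpreter's default codec, which is ASCII. The sketch below, written for Python 3 so it can still be run today, mimics that implicit ASCII encode with a hypothetical helper, first_unencodable, and then shows one common workaround: wrapping a byte stream in a codecs UTF-8 writer so that the encoding is chosen explicitly. An io.BytesIO object stands in for the pipe.

```python
import codecs
import io

MESSAGE = " *** Python, çuckß ! *** "


def implicit_ascii_encode(text):
    """Mimic Python 2.x printing unicode to a pipe.

    With sys.stdout.encoding set to None, Python 2 falls back to its
    default codec, ASCII, so the first non-ASCII character fails.
    """
    return text.encode("ascii")


def first_unencodable(text):
    """Return the index ASCII chokes on, or None if the text is pure ASCII."""
    try:
        implicit_ascii_encode(text)
    except UnicodeEncodeError as err:
        return err.start
    return None


# Workaround: pick the encoding yourself instead of trusting the stream.
# io.BytesIO stands in for the pipe; in a real program you would wrap
# sys.stdout (Python 2) or sys.stdout.buffer (Python 3).
pipe = io.BytesIO()
writer = codecs.getwriter("utf-8")(pipe)
writer.write(MESSAGE + "\n")

print(first_unencodable(MESSAGE))  # 13: the 'ç', as in the traceback
print(pipe.getvalue())
```

Setting the PYTHONIOENCODING environment variable (honoured since Python 2.6), as in PYTHONIOENCODING=UTF-8 python unicode.py | cat, should give the same result without touching the script.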
II. Fedora
What's wrong with Fedora's internationalization? Well, nothing important, and I am probably just overreacting. Just look at this:
bash@localhost:~ $ LANG="en_GB.UTF-8" yum info icu
Loaded plugins: auto-update-debuginfo, changelog, presto, refresh-packagekit
Available Packages
Name       : icu
Arch       : i686
Version    : 4.2.1
Release    : 8.fc13
Size       : 164 k
Repo       : fedora
Summary    : International Components for Unicode
URL        : http://www.icu-project.org/
License    : MIT
Description: Tools and utilities for developing with icu.
bash@localhost:~ $ LANG="de_DE.UTF-8" yum info icu
Geladene Plugins: auto-update-debuginfo, changelog, presto, refresh-packagekit
Verfügbare Pakete
Name       : icu
Architektur : i686
Version    : 4.2.1
Ausgabe    : 8.fc13
Grösse     : 164 k
Repo       : fedora
Zusammenfassung : International Components for Unicode
URL        : http://www.icu-project.org/
License    : MIT
Beschreibung:Tools and utilities for developing with icu.
bash@localhost:~ $ LANG="fr_FR.UTF-8" yum info icu
Modules complémentaires chargés : auto-update-debuginfo, changelog, presto,
                                : refresh-packagekit
Paquets disponibles
Nom        : icu
Architecture : i686
Version    : 4.2.1
Révision   : 8.fc13
Taille     : 164 k
Dépôt      : fedora
Résumé     : International Components for Unicode
URL        : http://www.icu-project.org/
License    : MIT
Description :Tools and utilities for developing with icu.
Have you noticed the horizontal alignment of the text? The output is perfectly aligned for the English locale, whereas in the French and (especially) the German ones, the key : value lines are misaligned.
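Output like this stays aligned only if the padding is computed from the labels actually being printed, and measured in screen columns rather than bytes (in UTF-8, «Grösse» is 6 characters but 7 bytes). The following Python 3 sketch illustrates that idea with two hypothetical helpers, display_width and format_fields; it is not how yum is implemented, and its width rule (wide East Asian characters count as two columns, combining accents as zero) only approximates what a terminal does.

```python
import unicodedata


def display_width(text):
    """Approximate number of terminal columns `text` occupies.

    Combining accents take no column; East Asian wide/fullwidth
    characters take two; everything else is counted as one.
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def format_fields(fields):
    """Align (label, value) pairs on the colon, whatever the label lengths."""
    # Column width comes from the labels actually printed, so a long
    # translation such as 'Zusammenfassung' widens every line equally.
    width = max(display_width(label) for label, _ in fields)
    lines = []
    for label, value in fields:
        padding = " " * (width - display_width(label))
        lines.append("{0}{1} : {2}".format(label, padding, value))
    return "\n".join(lines)


# Labels taken from the German yum output above.
german = [
    ("Name", "icu"),
    ("Architektur", "i686"),
    ("Grösse", "164 k"),
    ("Zusammenfassung", "International Components for Unicode"),
    ("Beschreibung", "Tools and utilities for developing with icu."),
]
print(format_fields(german))
```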
OK, that's not a killer flaw, and I should probably not complain since Fedora provides all these localisations for free (as in speech). But it will always bother me to see such obvious, basic, dumb mistakes in the user interface.
And rest assured that yum is not the only piece of Fedora software with localisation bugs: I recently came across a similar problem in the German version of the KDE notification program (German sentences tend to be way too long, so the text overflows the edge of the window).
That's one good thing about F/OSS: you may not be able to play the latest video games on these systems, but you will always find enough bugs to fill your days reporting or fixing them.