TWiki:Main/IvanBaktsheev
writes this in his blog at
http://dot-and-thing.blogspot.com/2008/03/twiki-utf8.html
:
two years ago i made (successfully) twiki 4 installation with utf-8 (one source modification required). but last week i couldn't make another utf-8 installation from twiki 4.2.
unsuccessies:
- changing configuration parameters is not enough for unicode twiki installation. in my case, for correct handling utf-8 data, received from user, three .pm files required changes: lib/TWiki/{Save,View,Edit}.pm
- for correct view within TinyMCE editor i had to make another change for correct handling javascript escaped unicode (from module URI::Escape::JavaScript)
[snip] twiki and similar systems should be configured to utf-8 as default, because it allows any people write in any language. [snip]
I appreciate his patch (get it from his blog; I could not attach it here because the upload just hangs); tracked in
Item5438. He also said some not so nice things about TWiki 4.2.
--
TWiki:Main/PeterThoeny
- 13 Mar 2008
From
TWiki:Codev/GeorgetownReleaseMeeting2008x03x17
: This is of urgent nature but should not block 4.2.1 release. We need an owner of
I18N.
--
TWiki:Main.PeterThoeny
- 17 Mar 2008
I have attached the diff.
I wonder why the reporter chose to put this in a blog posting instead of here???
--
TWiki:Main.KennethLavrsen
- 31 Mar 2008
See also
http://twiki.org/cgi-bin/view/Codev/UseUTF8
I haven't been through Ivan's patches in detail, but on a cursory inspection they look correct. One thing to watch out for is the problem with the accept-charset in forms that Harald noted.
Confirmed.
Please note: contrary to the
release meeting minutes
I am
not working on this; I have too much on my plate ATM. I was just trying to be helpful.
CC
Warning: based on my new understanding of encodings, the patch here is seriously incomplete. It does some of what is necessary, but not all.
--
TWiki:Main.CrawfordCurrie
- 25 May 2008
A few weeks ago I upgraded our multi-lingual UTF-8 Cairo installation to 4.2.0. After tweaking some localisation settings I got it to display everything correctly, but unfortunately editing a page destroys all special characters. I've applied Ivan's patches. I needed to set the character set to "UTF-8", because any other setting breaks display in either Firefox or IE. Editing now seems to preserve characters, but as Crawford implied it does not work in all cases. Raw edit breaks characters, and special characters in links are also not preserved in
TinyMCE.
--
TWiki:Main.LevienVanZon
- 10 Jun 2008
A correction to my previous comment: the old site was running Dakar, not Cairo. And while display of many UTF-8 characters now works in
TinyMCE with Ivan's patch, saving a page still destroys special characters. So effectively, this leaves me with no way to edit pages with special characters (e.g. also anything in French, Spanish, Portuguese, etc.). I'm going to revert to the situation before the patch.
--
TWiki:Main.LevienVanZon
- 17 Jun 2008
I've just spent a few hours in an attempt to further analyse the issue. This is what I found so far:
- With the correct setting for locale (en_US.utf8) and character set (utf-8), TWiki 4.2.0 has no problems displaying and raw-editing UTF-8 topics. This is basically the same as with Dakar, except...
- When editing a page using the WysiwygPlugin (i.e. either in TinyMCE or with the raw-edit link from TinyMCE), any multi-byte UTF-8 characters get converted to question-mark icons.
- If I force UTF-8 I/O by changing the top line of the TWiki edit script to "#!/usr/bin/perl -wTCS", UTF-8 characters are correctly displayed in TinyMCE. However, on reloading, switching to raw edit from TinyMCE or saving, UTF-8 characters still get converted to question-mark icons. Moreover, direct raw-edit (using ?nowysiwyg=on) displays special characters incorrectly (they seem to be double-encoded).
- Forcing UTF-8 mode by adding the CS switches to view script also seems to give double-encoding problems. Using Ivan's patches instead (and twiddling with the locale and charset settings a bit) gives more or less the same results on viewing and editing.
- So it seems that most code handles UTF-8 correctly (or at least leaves it untouched), except the TML<->HTML conversions in WysiwygPlugin. Furthermore, it seems that TML->HTML can be made to work by forcing UTF-8, but HTML->TML always seems to break UTF-8 in our case.
- I am somewhat mystified as to why forcing UTF-8 leads to double-encoding problems though. Our hosting provider runs a (Debian-based?) Linux system with Perl 5.8.8. I've checked that the topic-datafiles are really valid UTF-8, and the locale used is present in the "locale -a" listing.
- The urlEncode/urlDecode roundtrip (in TWiki.pm, but called from WysiwygPlugin) correctly escapes UTF-8 characters to %uXXXX, but does not restore them. (Ivan's patch for TWiki.pm already fixes this.)
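To illustrate the last point, here is a minimal sketch of what a decoder has to do to restore %uXXXX escapes. It is written in Python rather than Perl and is not the code from TWiki.pm or from Ivan's patch; the function name decode_js_escapes is made up for this example. It assumes the input follows the JavaScript escape() convention, where %uXXXX is a UTF-16 code unit and %XX is a Latin-1 code point.

```python
import re

# Matches either a %uXXXX escape (group 1) or a plain %XX escape (group 2).
_ESCAPE = re.compile(r"%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})")


def decode_js_escapes(text: str) -> str:
    """Decode JavaScript escape()-style sequences (hypothetical helper).

    Assumption: %uXXXX is a UTF-16 code unit and %XX is a Latin-1 code
    point, as produced by JavaScript's escape().
    """
    def replace(match: re.Match) -> str:
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))

    # Single pass, so a decoded "%" can never start a second, bogus escape.
    decoded = _ESCAPE.sub(replace, text)

    # Characters outside the BMP arrive as two %uXXXX surrogates; a UTF-16
    # round trip joins each pair into one character.
    return decoded.encode("utf-16-le", "surrogatepass").decode("utf-16-le")
```

A decoder that only handles %XX leaves the %uXXXX sequences in the text as literal characters, which would match the symptom described above.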
I'm afraid I'm not much of a Perl wizard. So far I have been unable to locate in which step of the
WysiwygPlugin conversion process the UTF-8 characters get clobbered.
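One way to narrow this down might be to dump the raw bytes after each conversion step and classify them. The sketch below is in Python, not Perl, and is only a heuristic; the names double_encode and classify are invented for this example and are not part of TWiki or WysiwygPlugin. It treats double-encoding as "UTF-8 bytes misread as Latin-1 and encoded to UTF-8 again", which is an assumption about what happens here.

```python
def double_encode(text: str) -> bytes:
    """Simulate the suspected bug: UTF-8 bytes are misread as Latin-1
    and then encoded to UTF-8 a second time."""
    return text.encode("utf-8").decode("latin-1").encode("utf-8")


def classify(raw: bytes) -> str:
    """Heuristically classify bytes captured after a conversion step.

    Returns "invalid-utf8", "replaced", "double-encoded" or "ok".
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Browsers typically render such bytes as question-mark icons.
        return "invalid-utf8"

    if "\ufffd" in text:
        # Some earlier step already substituted the replacement character.
        return "replaced"

    try:
        # If every character fits in Latin-1 and the resulting bytes are
        # again valid UTF-8, the text was probably encoded twice.
        repaired = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return "ok"
    return "double-encoded" if repaired != text else "ok"
```

This can give false positives: text that legitimately contains a sequence such as "Ã©" would also be reported as double-encoded, so the result is a hint rather than proof.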
--
TWiki:Main.LevienVanZon
- 19 Jun 2008
Per
TWiki:Codev.GeorgetownReleaseMeeting2008x07x21
and on the recommendation of Richard Donkin, UTF-8 support requires more work than the 4.2.1 release allows. Deferred to 5.0.
--
TWiki:Main.KennethLavrsen
- 2008-07-22
I re-prioritized this from urgent to normal. Anyone with interest in
I18N can pick this up and fix.
--
TWiki:Main.PeterThoeny
- 2013-11-08