
Item5437: UTF-8 fixes for TWiki 5.0 (was 4.2 but deferred)

Item Form Data

AppliesTo: Engine
Component: I18N
Priority: Normal
CurrentState: Confirmed
WaitingFor:
TargetRelease: n/a
ReleasedIn:


Detail

TWiki:Main/IvanBaktsheev writes this in his blog at http://dot-and-thing.blogspot.com/2008/03/twiki-utf8.html :
two years ago i made (successfully) twiki 4 installation with utf-8 (one source modification required). but last week i couldn't make another utf-8 installation from twiki 4.2.

unsuccessies:

  1. changing configuration parameters is not enough for unicode twiki installation. in my case, for correct handling utf-8 data, received from user, three .pm files required changes: lib/TWiki/{Save,View,Edit}.pm

  2. for correct view within TinyMCE editor i had to make another change for correct handling javascript escaped unicode (from module URI::Escape::JavaScript)

[snip] twiki and similar systems should be configured to utf-8 as default, because it allows any people write in any language. [snip]

I appreciate his patch (get it from his blog; I could not upload it because it just hangs on upload); tracked in Item5438. He also said some not so nice things about TWiki 4.2.

-- TWiki:Main/PeterThoeny - 13 Mar 2008

From TWiki:Codev/GeorgetownReleaseMeeting2008x03x17: This is of urgent nature but should not block 4.2.1 release. We need an owner of I18N.

-- TWiki:Main.PeterThoeny - 17 Mar 2008

I have attached the diff.

I wonder why the reporter chose to put this in a blog posting instead of here???

-- TWiki:Main.KennethLavrsen - 31 Mar 2008

See also http://twiki.org/cgi-bin/view/Codev/UseUTF8

I haven't been through Ivan's patches in detail, but on a cursory inspection they look correct. One thing to watch out for is the problem with the accept-charset in forms that Harald noted.

Confirmed.

Please note: contrary to the release meeting minutes I am not working on this; I have too much on my plate ATM. I was just trying to be helpful.

CC

Warning: based on my new understanding of encodings, the patch here is seriously incomplete. It does some of what is necessary, but not all.

-- TWiki:Main.CrawfordCurrie - 25 May 2008

A few weeks ago I upgraded our multi-lingual UTF-8 Cairo installation to 4.2.0. After tweaking some localisation settings I got it to display everything correctly, but unfortunately editing a page destroys all special characters. I've applied Ivan's patches. I needed to set the character set to "UTF-8", because any other setting breaks display on either Firefox or IE. Editing now seems to preserve characters, but as Crawford implied it does not work in all cases. Raw edit breaks characters, and special characters in links are also not preserved in TinyMCE.

-- TWiki:Main.LevienVanZon - 10 Jun 2008

A correction to my previous comment: the old site was running Dakar, not Cairo. And while display of many UTF-8 characters now works in TinyMCE with Ivan's patch, saving a page still destroys special characters. So effectively, this leaves me with no way to edit pages with special characters (e.g. also anything in French, Spanish, Portuguese, etc.). I'm going to revert to the situation before the patch.

-- TWiki:Main.LevienVanZon - 17 Jun 2008

I've just spent a few hours in an attempt to further analyse the issue. This is what I found so far:

  • With the correct setting for locale (en_US.utf8) and character set (utf-8), TWiki 4.2.0 has no problems displaying and raw-editing UTF-8 topics. This is basically the same as with Dakar, except...
  • When editing a page using the WysiwygPlugin (i.e. either in TinyMCE or with the raw-edit link from TinyMCE), any multi-byte UTF-8 characters get converted to question-mark icons.
  • If I force UTF-8 I/O by changing the top line of the TWiki edit script to "#!/usr/bin/perl -wTCS", UTF-8 characters are correctly displayed in TinyMCE. However, on reloading, switching to raw edit from TinyMCE or saving, UTF-8 characters still get converted to question-mark icons. Moreover, direct raw-edit (using ?nowysiwyg=on) displays special characters incorrectly (they seem to be double-encoded).
  • Forcing UTF-8 mode by adding the CS switches to the view script also seems to give double-encoding problems. Using Ivan's patches instead (and twiddling with the locale and charset settings a bit) gives more or less the same results on viewing and editing.
  • So it seems that most code handles UTF-8 correctly (or at least leaves it untouched), except the TML<->HTML conversions in WysiwygPlugin. Furthermore, it seems that TML->HTML can be made to work by forcing UTF-8, but HTML->TML always seems to break UTF-8 in our case.
  • I am somewhat mystified as to why forcing UTF-8 leads to double-encoding problems though. Our hosting provider runs a (Debian-based?) Linux system with Perl 5.8.8. I've checked that the topic-datafiles are really valid UTF-8, and the locale used is present in the "locale -a" listing.
  • The urlEncode/urlDecode roundtrip (in TWiki.pm, but called from WysiwygPlugin) correctly escapes UTF-8 characters to %uXXXX, but does not restore them. (Ivan's patch for TWiki.pm already fixes this.)
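
To illustrate the %uXXXX round trip described in the last bullet, here is a minimal sketch. It is written in Python rather than TWiki's Perl, and `js_escape` and `js_unescape` are hypothetical helper names, not the functions in TWiki.pm or in Ivan's patch. It assumes all characters lie in the Basic Multilingual Plane (code points up to U+FFFF).

```python
import re

# Matches either a %uXXXX escape or a plain %XX escape.
_ESCAPE = re.compile(r"%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})")


def js_escape(text: str) -> str:
    """Escape non-ASCII characters as %uXXXX and a literal '%' as %25."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == "%":
            out.append("%25")
        elif cp < 128:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append("%u{:04X}".format(cp))
        else:
            # Assumption of this sketch: no characters outside the BMP.
            raise ValueError("character outside the BMP: U+{:X}".format(cp))
    return "".join(out)


def js_unescape(text: str) -> str:
    """Restore characters from %uXXXX and %XX escapes."""
    def restore(match):
        return chr(int(match.group(1) or match.group(2), 16))
    return _ESCAPE.sub(restore, text)


original = "Ünïcödé 100% ж"
escaped = js_escape(original)
restored = js_unescape(escaped)
```

If only the escaping half runs, the topic text keeps the literal `%uXXXX` sequences; the round trip is lossless only when the decoding step handles the `%u` form as well as plain `%XX`.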

I'm afraid I'm not much of a Perl wizard. So far I have been unable to locate in which step of the WysiwygPlugin conversion process the UTF-8 characters get clobbered.
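
One way to narrow down such a step is to inspect the raw bytes before and after each conversion and check whether they are still valid UTF-8 or have been encoded twice. The following is a hedged sketch in Python, not TWiki's Perl; `double_encode` and `classify` are hypothetical names, and the sketch assumes the double encoding comes from UTF-8 bytes being misread as Latin-1 (ISO-8859-1) and then encoded to UTF-8 again, which is a common cause but not confirmed for this installation.

```python
def double_encode(text: str) -> bytes:
    """Simulate the suspected bug: UTF-8 bytes misread as Latin-1, then re-encoded."""
    return text.encode("utf-8").decode("latin-1").encode("utf-8")


def classify(data: bytes) -> str:
    """Heuristically classify a byte string captured at one conversion step."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return "not-utf8"
    if text.isascii():
        return "ascii"
    try:
        # If the decoded text maps back to Latin-1 bytes that are themselves
        # valid UTF-8, the data was most likely encoded twice.
        text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return "utf8"
    return "double-encoded?"


clean = "é".encode("utf-8")      # b'\xc3\xa9'
broken = double_encode("é")      # b'\xc3\x83\xc2\xa9'
```

The check is only a heuristic: genuine text that happens to consist of Latin-1 characters forming a valid UTF-8 sequence (for example a literal "Ã©") would be flagged as well, hence the question mark in the label.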

-- TWiki:Main.LevienVanZon - 19 Jun 2008

Per TWiki:Codev.GeorgetownReleaseMeeting2008x07x21 and on the recommendation of Richard Donkin, the UTF-8 work requires more effort than 4.2.1 allows. Deferred to 5.0.

-- TWiki:Main.KennethLavrsen - 2008-07-22

I re-prioritized this from urgent to normal. Anyone with interest in I18N can pick this up and fix.

-- TWiki:Main.PeterThoeny - 2013-11-08

ItemTemplate
Summary UTF-8 fixes for TWiki 5.0 (was 4.2 but deferred)
ReportedBy TWiki:Main.PeterThoeny
Codebase 4.2.0
SVN Range TWiki-5.0.0, Sun, 09 Mar 2008, build 16496
AppliesTo Engine
Component I18N
Priority Normal
CurrentState Confirmed
WaitingFor

Checkins

TargetRelease n/a
ReleasedIn

Topic attachments
Attachment: twikiutf8.diff
History: r1
Size: 3.5 K
Date: 2008-03-31 - 16:36
Who: UnknownUser
Topic revision: r12 - 2013-11-08 - PeterThoeny