Item4070: UTF-8 character broken in fixed font text

Item Form Data

AppliesTo: Engine
Component: I18N
Priority: Normal
CurrentState: New
WaitingFor:
TargetRelease: n/a
ReleasedIn:

Detail

White space preservation in fixed font fragments (done by sub _fixedFontText in Render.pm) may clobber printable characters depending on the site charset.

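The bug was spotted with the UTF-8 character set, in words ending in Cyrillic small letter ha (х, U+0445). The octet sequence D1 85 20 is changed to D1 26 6E 62 73 70 3B 20, i.e. "\xD1&nbsp;\x20": the second octet of the UTF-8 encoding of х (0x85) is replaced by the entity &nbsp;, leaving a truncated UTF-8 sequence.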
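The corruption can be reproduced outside TWiki with a small Perl sketch. preserve_spaces is a hypothetical, simplified model of the white-space step in _fixedFontText, not the Render.pm code: it assumes each pair of whitespace characters in front of a non-space character becomes '&nbsp; ', which is consistent with the bytes observed above. hex_dump is also a hypothetical helper. The report does not establish how the octets acquire character semantics inside TWiki; utf8::upgrade is used here only to simulate that assumption.

```perl
use strict;
use warnings;

# Simplified model of the _fixedFontText white-space step (hypothetical,
# not the Render.pm code): each run of whitespace pairs in front of a
# non-space character becomes one '&nbsp; ' per pair.
sub preserve_spaces {
    my ($text) = @_;
    $text =~ s{((?:\s{2})+)(\S)}{'&nbsp; ' x (length($1) / 2) . $2}eg;
    return $text;
}

# Hypothetical helper: show a string as space-separated hex values.
sub hex_dump {
    my ($s) = @_;
    return join ' ', map { sprintf '%02X', ord $_ } split //, $s;
}

# "х" (U+0445), a space and "y", as raw UTF-8 octets.
my $octets = "\xD1\x85\x20y";

# Plain octets: Perl's default rules do not count 0x85 as \s, so the
# single real space is not a pair and nothing changes.
print hex_dump(preserve_spaces($octets)), "\n";    # D1 85 20 79

# Assumption: the same octets reach the regex with character semantics.
# utf8::upgrade keeps every value but switches \s to Unicode rules, where
# 0x85 is U+0085 NEXT LINE, so "\x85\x20" now counts as a whitespace pair.
my $upgraded = $octets;
utf8::upgrade($upgraded);
print hex_dump(preserve_spaces($upgraded)), "\n";  # D1 26 6E 62 73 70 3B 20 79
```

Under these assumptions the second line matches the corrupted sequence in the report, with the trailing 79 being the "y" that follows.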

-- TWiki:Main/AlexanderSmishlajev - 15 May 2007

I'm surprised this is happening at all, given that UTF-8 is ASCII-safe. Note that you shouldn't be using UTF-8 with TWiki unless you are using a non-alphabetic character set (e.g. those for Chinese or Japanese), as Unicode is not supported for general use with TWiki. Use KOI8-R instead; it works much better than UTF-8 (see TWiki:Codev.CyrillicSupport for Russian usage).

Having looked at the code, I suspect CGI.pm may be doing something odd. Q for developers: why are we using CPAN:CGI for things like applying bold markup? I know it's convenient, but it seems to add performance overhead compared with simply emitting the relevant HTML with an s/// statement.

For more details on why Unicode should mostly be avoided, see the documentation at TWiki:Codev.InstallationWithI18N.

-- TWiki:Main.RichardDonkin - 22 Jun 2007

The bug is still present in TWiki release 4.3.0.

There is also a bug of the same origin in heading rendering (sub _makeAnchorHeading in Render.pm): $text =~ s/^\s*(.*?)\s*$/$1/ cripples the text of section headings that end in Cyrillic small letter ha encoded in UTF-8.
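The trimming substitution quoted in the paragraph above can be exercised in isolation. In this sketch trim_heading and hex_dump are hypothetical helper names, and utf8::upgrade is an assumption used only to give the octets character semantics; how that happens inside TWiki is not established here. Under that assumption the trailing \s* consumes the final 0x85 octet of х and leaves a lone 0xD1 lead byte, while the same octets without the upgrade pass through intact.

```perl
use strict;
use warnings;

# The trimming substitution quoted from _makeAnchorHeading,
# wrapped in a hypothetical helper.
sub trim_heading {
    my ($text) = @_;
    $text =~ s/^\s*(.*?)\s*$/$1/;
    return $text;
}

# Hypothetical helper: show a string as space-separated hex values.
sub hex_dump {
    my ($s) = @_;
    return join ' ', map { sprintf '%02X', ord $_ } split //, $s;
}

# Heading "abх" (last letter U+0445) as raw UTF-8 octets.
my $octets = "ab\xD1\x85";

# Plain octets: 0x85 is not \s under the default rules, nothing is trimmed.
print hex_dump(trim_heading($octets)), "\n";     # 61 62 D1 85

# Assumption: character semantics, simulated with utf8::upgrade.
# The trailing \s* now swallows 0x85 (U+0085), leaving a lone lead byte.
my $upgraded = $octets;
utf8::upgrade($upgraded);
print hex_dump(trim_heading($upgraded)), "\n";   # 61 62 D1
```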

I work around both problems by calling decode / encode with $TWiki::cfg{Site}{CharSet} in _fixedFontText and _makeAnchorHeading.
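The decode/encode workaround can be sketched as a small wrapper around both substitutions. with_site_charset, trim_heading and preserve_spaces are hypothetical names, $site_charset stands in for $TWiki::cfg{Site}{CharSet} with an assumed value of 'utf-8', and preserve_spaces is a simplified model of the _fixedFontText white-space step rather than the Render.pm code. Encode::decode and Encode::encode come from Perl's core Encode module. This is a sketch of the idea, not the actual patch.

```perl
use strict;
use warnings;
use Encode ();

# Stand-in for $TWiki::cfg{Site}{CharSet} (assumed value).
my $site_charset = 'utf-8';

# Hypothetical helper: decode octets to characters, apply the
# transformation to characters, then encode the result back to octets.
sub with_site_charset {
    my ($transform, $octets) = @_;
    my $chars = Encode::decode($site_charset, $octets);
    return Encode::encode($site_charset, $transform->($chars));
}

# The trimming substitution quoted from _makeAnchorHeading.
sub trim_heading {
    my ($text) = @_;
    $text =~ s/^\s*(.*?)\s*$/$1/;
    return $text;
}

# Simplified model of the _fixedFontText white-space step (hypothetical):
# each whitespace pair before a non-space character becomes '&nbsp; '.
sub preserve_spaces {
    my ($text) = @_;
    $text =~ s{((?:\s{2})+)(\S)}{'&nbsp; ' x (length($1) / 2) . $2}eg;
    return $text;
}

# After decoding, U+0445 is a single character, so none of its octets
# can be mistaken for whitespace.
my $heading = with_site_charset(\&trim_heading,    "ab\xD1\x85 ");   # "ab\xD1\x85"
my $fixed   = with_site_charset(\&preserve_spaces, "\xD1\x85\x20y"); # unchanged
my $pairs   = with_site_charset(\&preserve_spaces, "a  b");          # "a&nbsp; b"
```

One consequence worth keeping in mind: once the text is decoded, the regexes run under Unicode rules, so \s will also match genuine U+0085 or U+00A0 characters in the source text, which differs from the old byte-oriented matching.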

-- TWiki:Main/AlexanderSmishlajev - 05 May 2009

ItemTemplate
Summary UTF-8 character broken in fixed font text
ReportedBy TWiki:Main.AlexanderSmishlajev
Codebase 4.0.5, 4.1.2, 4.3.0
SVN Range TWiki-4.1.2, Sun, 13 May 2007, build 13714
AppliesTo Engine
Component I18N
Priority Normal
CurrentState New
WaitingFor

Checkins

TargetRelease n/a
ReleasedIn

Topic revision: r5 - 2009-05-05 - AlexanderSmishlajev
 
This site is powered by the TWiki collaboration platform. Copyright © 2008-2023 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.