White space preservation in fixed font fragments (done by sub _fixedFontText in Render.pm) may clobber printable characters, depending on the site charset. The bug was spotted with the UTF-8 character set on words ending in Cyrillic small letter ha (х, U+0445). The octet sequence D1 85 20 is changed to D1 26 6E 62 73 70 3B 20, i.e. "\xD1&nbsp;\x20".
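One plausible mechanism, not confirmed anywhere in this report, is that the second octet of х (0x85) is NEXT LINE in Latin-1 and Unicode, so it can match \s when the regex runs on undecoded octets under Unicode or locale rules. The sketch below forces Unicode rules with the /u modifier (Perl 5.14 or later) and uses a simplified stand-in substitution, not the actual code of _fixedFontText, to reproduce the octets quoted above.

```perl
use strict;
use warnings;

# Octets of "х" (U+0445) followed by a space, as UTF-8: D1 85 20.
my $octets = "\xD1\x85\x20";

# Assumption: the whitespace match runs under Unicode rules (forced here
# with /u), where byte 0x85 is NEXT LINE and therefore counts as \s.
# Simplified stand-in for the real substitution: replace a whitespace
# character that is followed by another whitespace character.
(my $clobbered = $octets) =~ s/\s(?=\s)/&nbsp;/gu;

# Dump the result as hex octets.
my $hex = join ' ', map { sprintf '%02X', ord } split //, $clobbered;
print "$hex\n";    # prints: D1 26 6E 62 73 70 3B 20
```

The lead byte D1 survives while 85 is consumed as if it were white space, which matches the corruption described above.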
-- TWiki:Main/AlexanderSmishlajev - 15 May 2007
I'm surprised this is happening at all, given that UTF-8 is ASCII-safe. Note that you shouldn't be using UTF-8 with TWiki unless you are using a non-alphabetic character set (e.g. those for Chinese or Japanese), as Unicode is not supported for general use with TWiki. Use KOI8-R instead; it works much better than UTF-8. See TWiki:Codev.CyrillicSupport for Russian usage.
Having looked at the code, it might be because CGI.pm is doing something odd. Question for developers: why are we using CPAN:CGI for things like applying bold markup? I know it's convenient, but it seems like a performance overhead compared to simply putting in the relevant HTML with an s/// statement.
For more details on why Unicode should generally be avoided, see the documentation at TWiki:Codev.InstallationWithI18N.
-- TWiki:Main.RichardDonkin - 22 Jun 2007
The bug is still present in TWiki release 4.3.0.
There is also a bug of the same origin in heading rendering (sub _makeAnchorHeading in Render.pm): the substitution $text =~ s/^\s*(.*?)\s*$/$1/ corrupts the text of section headings that end with Cyrillic small letter ha encoded in UTF-8.
I work around both problems by calling decode / encode with $TWiki::cfg{Site}{CharSet} in _fixedFontText and _makeAnchorHeading.
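A minimal sketch of that kind of workaround, using the core Encode module. The helper name trim_heading and the hard-coded charset are hypothetical; in TWiki the charset would come from $TWiki::cfg{Site}{CharSet}, and this is not the actual patch.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical stand-in for $TWiki::cfg{Site}{CharSet}.
my $charset = 'UTF-8';

# Hypothetical helper: apply the heading-trimming regex to decoded
# characters instead of raw octets, then re-encode for output.
sub trim_heading {
    my ($octets) = @_;
    my $text = decode($charset, $octets);    # octets -> characters
    $text =~ s/^\s*(.*?)\s*$/$1/;            # \s now sees U+0445, not 0x85
    return encode($charset, $text);          # characters -> octets
}

# "х" padded with spaces on both sides.
my $trimmed = trim_heading("  \xD1\x85  ");
print join(' ', map { sprintf '%02X', ord } split //, $trimmed), "\n";
# prints: D1 85
```

Because the regex sees the single character U+0445 rather than the octets D1 85, the trailing byte can no longer be mistaken for white space.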
-- TWiki:Main/AlexanderSmishlajev - 05 May 2009