
WikiWords with international characters are not autolinked, even though the configuration file is updated and configure shows no CGI warnings.

A Danish WikiWord would look like this: BlåBærGrød.

LocalSite.cfg is updated with these values:

$TWiki::cfg{UseLocale} = 1;
$TWiki::cfg{Site}{Locale} = 'da_DK.ISO-8859-15';
$TWiki::cfg{Site}{CharSet} = 'iso-8859-15';

The configure script is not complaining about missing Perl libraries.

I can switch the user interface to Danish without problems; I just can't use Danish WikiWords.

-- TWiki:Main.SteffenPoulsen - 26-Oct-2005

Try to set the (Lower|Upper)Nationals. -- OK

Setting the regional characters explicitly works as expected. But it shouldn't be necessary, and it has side effects as well (e.g. sorting). Cairo works OK in the same environment, so the locale settings should be all right. -- SP


I personally could never make Locale regexes work, and have always used (Lower|Upper)Nationals. I'm trying to investigate what's wrong with Locale regexes.

AT

Locale regexes should always work if you have a reasonable Unix/Linux platform with working locales, so this is a significant regression. The upper/lower national workaround is only intended for Windows and the rare Unix/Linux that has a specific locale that's broken.

More information about the platform would be useful - output of configure in HTML format as an attachment would be good (assuming configure can do that).

Did this work on the same server with Cairo?

RD

Can't attach at the moment, it seems (it gives me the "attempted hack" error), but yes, Cairo works perfectly with I18N settings on the same server, running in parallel. The test installation is not mod_perl'ed or in any way non-default. It runs apache2 and perl v5.8.4, which puzzles me. Versions from configure:

Required Perl modules
Error 0.15
File::Copy 2.07
File::Spec 0.87
CGI 3.04
CGI::Carp 1.27
Algorithm::Diff 1.1901
FileHandle 2.01

Optional Perl Modules
CGI::Cookie 1.24
Digest::SHA1 2.10
Text::Diff 0.35
CGI::Session 3.95
Net::SMTP 2.26
MIME::Base64 3.04
POSIX 1.08
Digest::MD5 2.33
Locale::Maketext::Lexicon 0.49
Encode 1.99_01

@INC library path
/home/httpd/twikis/twiki-svn-develop-default/lib/CPAN/lib//arch/
/home/httpd/twikis/twiki-svn-develop-default/lib/CPAN/lib//5.8.4/i386-linux-thread-multi/
/home/httpd/twikis/twiki-svn-develop-default/lib/CPAN/lib//5.8.4/
/home/httpd/twikis/twiki-svn-develop-default/lib/CPAN/lib//
/home/httpd/twikis/twiki-svn-develop-default/lib
/etc/perl
/usr/local/lib/perl/5.8.4
/usr/local/share/perl/5.8.4
/usr/lib/perl5
/usr/share/perl5
/usr/lib/perl/5.8
/usr/share/perl/5.8
/usr/local/lib/site_perl
.

This is the Perl library path, used to load TWiki modules, third-party modules used by some plugins, and Perl built-in modules.

SP


I could never make this work, even in Cairo. Looking at the code, I haven't yet been able to find out why it is not working. The pt_BR locale isn't broken: I've tested it with a simpler Perl script that copies the TWiki regexes and checks random text for WikiWords, and that does work.

I suspect that this is caused by some small gotcha in TWiki code. I'm still trying.

AT
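A standalone check of the kind AT describes might look like the sketch below. It is a reconstruction under assumptions, not AT's actual script: the WikiWord pattern is only modelled on the TWiki regex fragments, the helper name is_wikiword is made up for illustration, and the default locale name is taken from the `locale -a` style of naming and may need adjusting.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);
use locale;    # let [[:upper:]] and friends follow LC_CTYPE in this scope

# Assumption: locale name passed on the command line, else a Danish default.
my $wanted = $ARGV[0] || 'da_DK.iso885915';
my $actual = setlocale(LC_CTYPE, $wanted);
warn "Locale '$wanted' is not available; keeping the current locale\n"
    unless defined $actual;

# Fragments in the style of TWiki.pm: bare POSIX class names that are
# meant to be interpolated inside an enclosing [...].
my %regex = (
    upperAlpha => '[:upper:]',
    lowerAlpha => '[:lower:]',
    numeric    => '[:digit:]',
);
$regex{mixedAlphaNum} = '[:alpha:]' . $regex{numeric};

# Illustrative WikiWord pattern: upper, lower, upper, then any alphanumerics.
my $wikiWord = qr/\A[$regex{upperAlpha}]+[$regex{lowerAlpha}]+[$regex{upperAlpha}]+[$regex{mixedAlphaNum}]*\z/;

# Hypothetical helper: returns 1 if the whole string is a WikiWord, else 0.
sub is_wikiword {
    my ($text) = @_;
    return $text =~ $wikiWord ? 1 : 0;
}

# The last sample is "BlåBærGrød" written as ISO-8859-15 bytes.
my @samples = ('WebHome', 'notawikiword', "Bl\xE5B\xE6rGr\xF8d");
for my $sample (@samples) {
    printf "%s => %s\n", $sample, is_wikiword($sample) ? 'WikiWord' : 'no';
}
```

The ASCII samples behave the same in any locale; only the last sample depends on whether the requested locale could actually be set.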

Weird. This change makes the thing work:

=== TWiki.pm
==================================================================
--- TWiki.pm    (revision 11145)
+++ TWiki.pm    (local)
@@ -346,10 +346,10 @@
         $regex{mixedAlpha} = $regex{upperAlpha}.$regex{lowerAlpha};
     } else {
         # Perl 5.006 or higher with working locales
-        $regex{upperAlpha} = '[:upper:]';
-        $regex{lowerAlpha} = '[:lower:]';
-        $regex{numeric}    = '[:digit:]';
-        $regex{mixedAlpha} = '[:alpha:]';
+        $regex{upperAlpha} = '\p{IsUpper}';
+        $regex{lowerAlpha} = '\p{IsLower}';
+        $regex{numeric}    = '\p{IsDigit}';
+        $regex{mixedAlpha} = '\p{IsAlpha}';
     }
     $regex{mixedAlphaNum} = $regex{mixedAlpha}.$regex{numeric};
     $regex{lowerAlphaNum} = $regex{lowerAlpha}.$regex{numeric};

For some reason, it looks like POSIX character classes aren't working. (??)

AT
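One property of these fragments is worth keeping in mind when experimenting: a value such as '[:upper:]' is only a POSIX class when it ends up inside an enclosing bracket expression, which is what the concatenations in the diff context (for example mixedAlphaNum) rely on. The sketch below illustrates that general Perl behaviour; the variable names $inside and $bare are made up, and it makes no claim about where the Dakar regression actually lies.

```perl
use strict;
use warnings;

my %regex = (
    upperAlpha => '[:upper:]',
    lowerAlpha => '[:lower:]',
);

# Intended use: the fragment is interpolated inside [...],
# giving [[:upper:]][[:lower:]]+ after interpolation.
my $inside = qr/\A[$regex{upperAlpha}][$regex{lowerAlpha}]+\z/;

# Misuse: interpolated bare, the text [:upper:] is an ordinary
# character set containing ':', 'u', 'p', 'e' and 'r'.
my $bare;
{
    no warnings 'regexp';    # Perl warns about POSIX syntax outside a class
    $bare = qr/\A$regex{upperAlpha}/;
}

for my $word ('Web', 'pear') {
    printf "%-5s inside=%s bare=%s\n",
        $word,
        ($word =~ $inside ? 'yes' : 'no'),
        ($word =~ $bare   ? 'yes' : 'no');
}
```

The '\p{IsUpper}' style fragments from the patch can be interpolated inside brackets in the same way, so the rest of the regex-building code does not need to change when swapping one style for the other.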

This update doesn't change anything on my installation, still no I18N links. Did you change anything else?

SP

Steffen, my system does not have an available da_DK.iso-8859-15 locale. Does yours?

AT

Yes, but I have tried several variations on the theme: da_DK, da_DK.iso88591, da_DK.iso885915. setlocale() throws an error if the chosen locale is not available:

"Warning: Unable to set locale to 'NA_NA'. The actual locale is 'C' - please test your locale settings. This warning can be ignored if you are not planning to use locales (e.g. your site uses English only) - or you can set {Site}{Locale} to C, which should always work."

SP

Well, currently the change above is the only difference between my local repository and DEVELOP, and it is working.

If that warning is being shown, the desired locale is not available, hence none of the localisation features (including international WikiWords) will work.

Steffen, could you provide some details:

  • include the output of a `locale -a` on your system
  • include the output of a `LANG=da_DK.iso885915 perl -e '1'` on your system
  • report your Perl version and build details

AT
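The same availability check can also be done from inside Perl, since POSIX::setlocale returns undef when a locale cannot be set. The following is a minimal sketch: the function name first_available_locale and the candidate list are assumptions for illustration, and on some C libraries setlocale may accept names that glibc would reject.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Hypothetical helper: returns the first candidate name that setlocale()
# accepts for LC_CTYPE, or undef if none is accepted.
sub first_available_locale {
    my @candidates = @_;
    my $saved = setlocale(LC_CTYPE);    # one-argument form queries the current locale
    my $found;
    for my $name (@candidates) {
        if (defined setlocale(LC_CTYPE, $name)) {
            $found = $name;
            last;
        }
    }
    # Restore whatever was active before probing.
    setlocale(LC_CTYPE, $saved) if defined $saved;
    return $found;
}

# Assumed spellings of the Danish locale, with 'C' as a last resort.
my @candidates = ('da_DK.ISO-8859-15', 'da_DK.iso885915', 'da_DK', 'C');
my $usable = first_available_locale(@candidates);
print defined $usable ? "Usable locale: $usable\n" : "No candidate could be set\n";
```

If only 'C' comes back, none of the Danish spellings is installed, and the {Site}{Locale} setting would need to match one of the names that `locale -a` lists.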

I've just realised that on Perl 5.6 the \p{} constructs require `use utf8`, so this isn't a good solution anyway. It needs more investigation.

AT

Perl is 5.8.4 as reported above - now with details :-)

user@vmware-twiki:~$ perl -V
Summary of my perl5 (revision 5 version 8 subversion 4) configuration:
  Platform:
    osname=linux, osvers=2.4.27-ti1211, archname=i386-linux-thread-multi
    uname='linux kosh 2.4.27-ti1211 #1 sun sep 19 18:17:45 est 2004 i686 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=i386-linux -Dprefix=/usr -Dprivlib=/usr/share/perl/5.8 -Darchlib=/usr/lib/perl/5.8 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.8.4 -Dsitearch=/usr/local/lib/perl/5.8.4 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Uusesfio -Uusenm -Duseshrplib -Dlibperl=libperl.so.5.8.4 -Dd_dosuid -des'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -I/usr/local/include'
    ccversion='', gccversion='3.3.5 (Debian 1:3.3.5-9)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=/lib/libc-2.3.2.so, so=so, useshrplib=true, libperl=libperl.so.5.8.4
    gnulibc_version='2.3.2'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'


Characteristics of this binary (from libperl):
  Compile-time options: MULTIPLICITY USE_ITHREADS USE_LARGE_FILES PERL_IMPLICIT_CONTEXT
  Built under linux
  Compiled at Mar  8 2005 19:51:48
  @INC:
    /etc/perl
    /usr/local/lib/perl/5.8.4
    /usr/local/share/perl/5.8.4
    /usr/lib/perl5
    /usr/share/perl5
    /usr/lib/perl/5.8
    /usr/share/perl/5.8
    /usr/local/lib/site_perl
    .
user@vmware-twiki:~$

user@vmware-twiki:~$ locale -a
C
da_DK
da_DK.iso88591
da_DK.iso885915
da_DK.utf8
danish
dansk
en_DK
en_DK.iso88591
en_DK.utf8
en_US
en_US.iso88591
en_US.iso885915
en_US.utf8
POSIX
user@vmware-twiki:~$

user@vmware-twiki:~$ LANG=da_DK.iso885915 perl -e '1' (no problems)
user@vmware-twiki:~$ LANG=NA_NA.iso885915 perl -e '1' (e.g. NA_NA.iso885915 has problems, as bin/configure also reports)
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = "en_US:en_GB:en",
        LC_ALL = (unset),
        LANG = "NA_NA.iso885915"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").

Hope this helps.

SP

Updating priority to Requirement; IMHO, Dakar shouldn't be released without this issue being fixed.

SP

I didn't forget this issue, I'm investigating. :-)

AT

Haha, no problem. I just realized that items need to be Urgent or higher to block a release, so I had to do something! :-) I'm still rather clueless on the issue, though ..

SP

Please see TWiki:Codev/UsingPerlLocalesTheRightWay. I'll wait some time for feedback and apply that change.

AT

This code did work fine in the Beijing and Cairo releases, and was evolved over some time, so we need to figure out exactly why it doesn't work now in Dakar before making significant changes. It could be that other changes went into Dakar that have broken things.

(Digression) It's worth noting that the locale code needs re-working anyway to cover two cases when we do Unicode, though that's not in scope for Dakar:

  1. Unicode - do a dynamic `use open` to set utf8 mode on all data read and written (this must also cover ModPerl, which, unlike CGI, doesn't use file descriptors to pass data to TWiki scripts). This code path must never do a `use locale` or equivalent, because mixing Unicode and locales breaks things quite comprehensively (a Perl bug-fest, I tried this...)
  2. Non-Unicode - should function as now (assuming this is just a bug)

The hard part is that the switch between (1) and (2) must be dynamic, based on a TWiki.cfg setting. It should NOT be based purely on locale matching /\.utf-?8$/, because some people may validly want to run with a UTF-8 locale and browser character set, but without Unicode mode.

Also, please don't do `use utf8` to implement Unicode - it has an entirely different meaning between Perl 5.6 (where it means 'assume all data processed is UTF-8') and 5.8 (where it means 'variable names, literals, etc in this file can be UTF-8').

UPDATE: I have proposed a much simpler solution (by fixing regression from Cairo code) over on TWiki:Codev/UsingPerlLocalesTheRightWay.

RD
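On the Unicode digression, one way to get character semantics without `use utf8` and without `use locale` is to decode the incoming bytes explicitly with the Encode module (listed among the optional modules in the configure output). This is only a sketch of that idea for Perl 5.8 or later, not a proposal for how TWiki should implement Unicode mode; the WikiWord pattern is illustrative and reuses the \p{} properties from AT's patch.

```perl
use strict;
use warnings;
use Encode qw(decode);

# "BlåBærGrød" as ISO-8859-15 bytes, e.g. as read from a topic file.
my $bytes = "Bl\xE5B\xE6rGr\xF8d";

# Decode the bytes into a Perl character string; no locale is involved.
my $chars = decode('iso-8859-15', $bytes);

# Illustrative WikiWord pattern built from Unicode properties.
my $wikiWord = qr/\A\p{IsUpper}+\p{IsLower}+\p{IsUpper}+[\p{IsAlpha}\p{IsDigit}]*\z/;

printf "%d characters, %s\n",
    length($chars),
    ($chars =~ $wikiWord ? 'WikiWord' : 'not a WikiWord');
```

Because the decoding step names the character set explicitly, the switch between this path and the locale-based path could be driven by a configuration setting rather than by the locale name, which is the separation RD asks for.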

Fixed in SVN 7571, according to Richard's tip on TWiki:Codev/UsingPerlLocalesTheRightWay.

Steffen, could you please close this after testing the fix yourself? (Well, your Danish WikiWord at the top is working ;-) )

AT

I'd also like to note that develop.twiki.org has consistently resisted accented characters, so congratulations!

WN

You guys did it again, now I can have BlåBærGrød pages all over the place in Dakar, too (btw, it's a Danish name for a fictitious gel-like blueberry dish) - works like a charm! :-) Any possibility of an (automatic?) testcase for this, to keep the feature in scope? I presume the testcases are probably not running in I18N mode as is, so it's probably a long shot .. but I have to ask anyway :-)

SP

ItemTemplate
Summary: International WikiWords are not auto-linked - ANTONIO/STEFFEN
ReportedBy: SteffenPoulsen
AppliesTo: Engine
Component:
Priority: Urgent
CurrentState: Closed
WaitingFor:
Checkins: 7571
Topic revision: r24 - 2005-11-21 - SteffenPoulsen
 