
Item5485: Attachments containing special characters (e.g. german umlauts: ä ö ü) can not be opened any more

Item Form Data

AppliesTo: Engine
Component: I18N
Priority: Normal
CurrentState: Confirmed
WaitingFor: TWiki:Main.MarkusUeberall
TargetRelease: n/a
ReleasedIn:


Detail

I migrated a TWiki installation from version 4.0.4 to version 4.2, running in a Windows/Cygwin environment.

Any attachment with special characters in the name (e.g. ä, ö, ü) cannot be opened any more. The response is "Access denied", "error 403".

All attachments not containing those characters can be opened without problems as before.

Version 4.0.4 had no problems in storing and opening attachments containing these special characters.

-- TWiki:Main.MichaelSchmidt - 31 Mar 2008

It took a bit of hacking to create an attachment with those characters in the name, because TWiki filters those characters from attachment names by default. But once I did, I found it opened just fine, with no 403.

You need to provide more detail; a step-by-step description of how you arrived at the error.

-- CrawfordCurrie - 31 Mar 2008

  • Sample attachment without umlaut:
    FehlerGlblRemote3.jpg

  • Sample attachment with an umlaut:
    FehlerGlblRemöte3.jpg

On my TWiki 4.2.0 installation the error can easily be reproduced, simply by attaching a file containing a special character in its name, like the attached JPEG files. When I try to open the attachment with the umlaut, the error appears.
The version of this Bugs web handles this file without problems (but it runs version 5.0.0).

OK, you can say "don't store attachments with umlauts". However, my German users already stored quite a lot of Word documents with umlauts in their names as attachments when we used version 4.0.4, and they want to open some of them again.

-- TWiki:Main.MichaelSchmidt - 31 Mar 2008

I would never say that. But AFAIK there have been no changes to the handling code since 4.2 was released (you may know better). What's more, opening an attachment (by clicking on it) normally has nothing to do with TWiki; it's opening a URL. The "Access denied" is coming from your Apache server, I suspect.

-- TWiki:Main.CrawfordCurrie - 31 Mar 2008

Michael, do you use the same configuration with both versions (4.2 and 4.0.4)? {UseLocale}, {Site}{Locale}, {Site}{CharSet}, {Upper/LowerNational}. Maybe not all of them affect the problem, but it's worth a look. :-)

-- TWiki:Main.OliverKrueger - 31 Mar 2008

Yes, I have thoroughly checked this and it is the same configuration.
However, I observed the following:
In version 4.0.4, when the mouse is over an attachment link, the string in the status line looks as follows: siteurl/bin/viewfile/web/topic?rev=n;filename=file.ext
In version 4.2 the string in the status line looks as follows: siteurl/pub/web/topic/file.ext
In other words: in the new version the attachment is accessed directly via the URL. This is quite a difference from the previous version, where the URL was apparently resolved in the background by the viewfile script.

Does this mean we have to ban the special characters in file names and replace all occurrences of these characters in previously stored attachments with standard ASCII characters? Or is there another solution to this problem?

-- TWiki:Main.MichaelSchmidt - 01 Apr 2008

The use of a direct link instead of viewfile helps performance, and many people requested it because the viewfile URLs were difficult to use with tools like wget.

You need to compare the links that are created in the shown page (look at the page source) with the actual links. Maybe we have some encoding issue.

It is impossible to guess your configuration. Please attach the LocalSite.cfg to this bug report (remove passwords and email addresses you want to keep secret).

Also attach an actual topic (the raw file) so we can see what is in the META data of the topic.

-- TWiki:Main.KennethLavrsen - 01 Apr 2008

It also seems to be an encoding issue to me.

I have attached the current config file, a sample test page and three screenshots to demonstrate the problem.
screen_01.jpg

As you can see in the source of the page, the file name in the "meta" statement contains the special character (umlaut ö) while the link statement generated by TWiki contains the encoding "%f6" for this character. The file cannot be displayed by TWiki. If I duplicate the link statement and replace the encoding by the character, the file can be displayed.

When I click on the file name displayed in the attachment area, TWiki generates a URL containing the encoding for the character, but the webserver is not able to access the file using this URL.
screen_02.jpg

If I modify the URL replacing the encoding with the character, the webserver can access the file.
screen_03.jpg

-- TWiki:Main.MichaelSchmidt - 10 Apr 2008

Assigned to I18N

CC

Michael: You said you "migrated a TWiki installation"--by this, did you mean you copied the contents/configuration into a new environment (new computer and/or new web server)?

I 'played' with this issue on a Linux box which also hosts a Windows VM where I installed the provided Windows Installer (after 'deactivating' TWiki::Sandbox::sanitizeAttachmentName and attaching a file called plügin.gif w/ u-umlaut in both cases).

After that, I had a look at the Apache logs and tried to access the attachment/icon directly, using all three possibilities to encode the umlaut: plügin.gif, pl%FCgin.gif, and pl%C3%BCgin.gif. NB: If you use umlauts directly, they will automatically be encoded first--in all cases ({IE7,Firefox2.0.0.14}/Win->TWiki/{Win,Linux}, {Firefox/Konqueror}/Linux->TWiki/Linux), the resulting GET statement used the third encoding. (By "access directly", I mean that I copied the URLs into the browser's address bar; if you use the WYSIWYG editor, links within a topic will be converted (differently) by the plugin itself, which 'strangely' will result in attachments being displayed correctly there but not in "view" mode.)
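For readers unfamiliar with the two escaped forms, here is a minimal Python sketch (not TWiki or Apache code) showing how the same filename percent-encodes under ISO-8859-1 and under UTF-8. The variable names are illustrative only.

```python
from urllib.parse import quote, unquote

name = "plügin.gif"

# Percent-encode the same filename under two different character sets.
latin1_form = quote(name, encoding="iso-8859-1")  # ü becomes one byte
utf8_form = quote(name, encoding="utf-8")         # ü becomes two bytes

print(latin1_form)
print(utf8_form)

# Decoding only round-trips when the decoder assumes the charset
# that was used for encoding.
assert unquote(latin1_form, encoding="iso-8859-1") == name
assert unquote(utf8_form, encoding="utf-8") == name

# Decoding the UTF-8 form as ISO-8859-1 yields a different (garbled) name,
# which is why a server expecting one form cannot find the file by the other.
mismatched = unquote(utf8_form, encoding="iso-8859-1")
print(mismatched == name)
```

A server that maps the request path to a file on disk has to pick one of these interpretations, which would explain why each Apache instance accepted only one escaped form.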

Interestingly, the installed Apache/2.2.8 (Linux Mandriva) instance was unable to 'map' that third form onto the filename, but was OK with the second, while the Apache/2.2.4 (Win32) instance contained in the aforementioned Windows installer would only accept the third encoding (exactly as described above). Therefore, I guess that during the migration, that very behaviour changed. And while I clearly would call this a (mapping) bug, it's not TWiki's fault but the web server's... since the requests technically 'bypass' TWiki, checking/unifying the form of the escape codes used within TWiki topics can only be regarded as a work-around... :-(

-- TWiki:Main.MarkusUeberall - 12 May 2008

Addendum: If you cannot change the web server's behaviour, you may get away with a rather small set of rewriting rules for "wrong encodings" of German umlauts, but I didn't test this...
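The logic such rewriting rules would need can be sketched in Python. This is not an Apache rewrite configuration and is as untested against a real server as the suggestion above. The function name `normalize_attachment_path` is hypothetical, and the target charset iso-8859-15 is an assumption that must match the charset of the filenames on disk.

```python
from urllib.parse import quote, unquote_to_bytes

def normalize_attachment_path(path, site_charset="iso-8859-15"):
    """Re-encode a request path so its escapes use the site charset.

    Heuristic: if the unescaped bytes are valid UTF-8, treat them as UTF-8,
    otherwise assume they are already in the site charset.
    """
    raw = unquote_to_bytes(path)
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        text = raw.decode(site_charset)
    # quote() leaves "/" unescaped by default, so path separators survive.
    return quote(text, encoding=site_charset)

# All three spellings of the same attachment collapse to one form.
for candidate in ("plügin.gif", "pl%FCgin.gif", "pl%C3%BCgin.gif"):
    print(candidate, "->", normalize_attachment_path(candidate))
```

The UTF-8 check is only a heuristic: a byte sequence in the site charset can in rare cases also be valid UTF-8, so the sketch can misjudge such names.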

-- TWiki:Main.MarkusUeberall - 12 May 2008

For Apache backends, this might be useful to read: https://issues.apache.org/bugzilla/show_bug.cgi?id=24333

-- TWiki:Main.OliverKrueger - 12 May 2008

I just found an easier solution to the problem: use mod_encoding (http://webdav.todo.gr.jp/) :-)

After I added the following lines to my Apache configuration (cf. TWiki.ApacheConfigGenerator), all URI encodings mentioned above worked (you may want to check that the given server encoding matches your setting, though):

<IfModule mod_encoding.c>
    EncodingEngine on
    SetServerEncoding iso-8859-15
</IfModule>

Example:

---++ Attachments with german umlauts
   * Attachment w/ german umlaut in name: <img src="%ATTACHURLPATH%/pl%C3%BCgin.gif" alt="pl%C3%BCgin.gif">
   * Attachment w/ german umlaut in name: <img src="%ATTACHURLPATH%/pl%FCgin.gif" alt="pl%FCgin.gif">
   * Attachment w/ german umlaut in name: <img src="%ATTACHURLPATH%/plügin.gif" alt="plügin.gif">

should be displayed as follows (see the attached snapshot mod_encoding_solution.png):

-- TWiki:Main.MarkusUeberall - 16 May 2008

Re-opening this (with Normal priority) to remind Markus to document this (in Codev internationalisation docs) for the benefit of other TWiki users.

-- TWiki:Main.CrawfordCurrie - 17 May 2008

Attachments with I18N were working fine in Cairo but something broke when things were refactored.

Some things that may be relevant:

  • TWiki:Codev.EncodeURLsWithUTF8 was written specifically to handle this case - for URLs that are served by TWiki, it converts from (URL-encoded) UTF-8 in the URL to the {Site}{CharSet} e.g. ISO-8859-15. For URLs served by Apache, it tries to pre-encode the generated URL in the attachment links that it provides, such that the URL is already in the site charset (because there won't be any chance for TWiki to fix the encoding when the attachment is served directly by Apache.) So this is something of a regression. I did work on this a while back but can't find the bug right now.
  • TWiki:Codev.ApacheTwoBreaksNonUTF8EncodedURLsOnWindows - this is an Apache on Windows bug with non-UTF-8 URLs (e.g. pl%FCgin.gif not pl%C3%BCgin.gif) - I managed to get a patch into Apache 2.0.54 that mostly fixed this, but a TWiki patch may still be necessary on Windows (see topic for patch). That was to do with URL components such as PATH_INFO but it's worth a try with 2.0.54.

-- TWiki:Main.RichardDonkin - 28 Jun 2008

ItemTemplate
Summary Attachments containing special characters (e.g. german umlauts: ä ö ü) can not be opened any more
ReportedBy TWiki:Main.MichaelSchmidt
Codebase 4.2.0
SVN Range TWiki-5.0.0, Sun, 09 Mar 2008, build 16496
AppliesTo Engine
Component I18N
Priority Normal
CurrentState Confirmed
WaitingFor TWiki:Main.MarkusUeberall
Checkins

TargetRelease n/a
ReleasedIn

Topic attachments
| Attachment | History | Size | Date | Who | Comment |
| FehlerGlblRemote3.jpg | r1 | 7.2 K | 2008-03-31 - 16:53 | UnknownUser | Sample attachment without umlaut |
| FehlerGlblRemöte3.jpg | r1 | 7.2 K | 2008-03-31 - 16:54 | UnknownUser | Sample attachment with an umlaut |
| LocalSite.cfg | r1 | 9.8 K | 2008-04-10 - 08:28 | UnknownUser | current config file |
| TestItem5485.txt | r1 | 0.6 K | 2008-04-10 - 08:29 | UnknownUser | sample page with meta-statement |
| mod_encoding_solution.png | r1 | 8.9 K | 2008-05-16 - 15:40 | UnknownUser | snapshot for "mod_encoding" based solution |
| screen_01.jpg | r1 | 47.5 K | 2008-04-10 - 08:30 | UnknownUser | screenshot of sample page |
| screen_02.jpg | r1 | 34.2 K | 2008-04-10 - 08:31 | UnknownUser | screenshot of error message |
| screen_03.jpg | r1 | 7.4 K | 2008-04-10 - 08:32 | UnknownUser | screenshot of modified URL |
Topic revision: r16 - 2008-06-28 - RichardDonkin