Caught the RSS feed failing: the XML output from the feed contained unescaped special characters, which broke the DTD.

Look for the <code> and <pre> elements inside the title and description tags in this snippet:

<item rdf:about="http://develop.twiki.org/~develop/cgi-bin/view/Bugs/Item1978">
  <title>Item1978 - Form.pm fails when the <code>name</code> field is <pre>&#91;&#91;Topic]&#91;fieldname]]</pre> for controls -- Waiting for Release</title>
  <link>http://develop.twiki.org/~develop/cgi-bin/view/Bugs/Item1978?t=2006-03-29T18:37:06Z</link>
  <description>Form.pm fails when the <code>name</code> field is <pre>&#91;&#91;Topic]&#91;fieldname]]</pre> for controls State: Waiting for Release -- last changed by KennethLavrsen</description>
  <dc:date>2006-03-29T18:37:06Z</dc:date>
  <dc:contributor>
    <rdf:Description link="http://develop.twiki.org/~develop/cgi-bin/view?topic=Main.KennethLavrsen">
      <rdf:value>KennethLavrsen</rdf:value>
    </rdf:Description>
  </dc:contributor>
</item> 

One way to solve this would be to alter Render.pm:

Index: lib/TWiki/Render.pm
===================================================================
--- lib/TWiki/Render.pm (revision 9600)
+++ lib/TWiki/Render.pm (working copy)
@@ -1263,6 +1263,16 @@
           defined( $TWiki::cfg{Site}{CharSet} ) &&
             $TWiki::cfg{Site}{CharSet} =~ /^iso-?8859-?1$/i ) {
         $text =~ s/([\x7f-\xff])/"\&\#" . unpack( 'C', $1 ) .';'/ge;
+
+        # if there is an & that is not part of an entity, convert it
+        # to &amp;
+        $text =~ s/&(?!#?[a-zA-Z0-9]+;)/&amp;/g;
+
+        # do the rest of the standard escapes for XML: <, >, ', "
+        $text =~ s/</&lt;/g;
+        $text =~ s/>/&gt;/g;
+        $text =~ s/"/&quot;/g;
+        $text =~ s/'/&apos;/g;
     }

     return $text;

This would solve it for RSS feeds only. It might be more sensible to add the escaping somewhere more generic, so that it covers SEARCH results in general.

-- SP

Item1924 is related to this.

-- SP

Perhaps add a new parameter to %SEARCH%, e.g. escapexmlentities, that escapes the five predefined XML entities, as above (and as documented at http://www.xml.com/pub/a/98/08/xmlqna1.html#INTENT)?

Entity Name   Replacement Text
lt            The less-than sign (<)
gt            The greater-than sign (>)
amp           The ampersand (&)
apos          The single quote or apostrophe (')
quot          The double quote (")

-- SP
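
The escaping discussed above (the Render.pm patch and the five predefined entities) can be sketched as follows. This is a Python illustration of the same regex logic, not TWiki code; the function name `escape_xml` and the sample string are assumptions made for the example.

```python
import re

# Matches "&" only when it does NOT start an entity such as &amp; or &#91;
# (the same negative lookahead as in the proposed Render.pm change).
_BARE_AMP = re.compile(r"&(?!#?[a-zA-Z0-9]+;)")


def escape_xml(text: str) -> str:
    """Escape the five predefined XML entities, leaving existing entities alone."""
    # The ampersand pass must run first; otherwise it would see the
    # entities produced by the replacements below.
    text = _BARE_AMP.sub("&amp;", text)
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    text = text.replace('"', "&quot;")
    text = text.replace("'", "&apos;")
    return text


# Hypothetical sample, loosely based on the Item1978 summary.
summary = "the <code>name</code> field is &#91;&#91;Topic] & more"
print(escape_xml(summary))
```

Note that this escapes every < and >, which is what later collides with CDATA markup in this thread.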

Closed duplicate Item3612.

These five entities are the only ones that an XML parser is required to know (they are predefined). The other problem occurs with all of the other HTML entities. One approach (probably the best?) is to emit a list of HTML entity definitions (defining &eacute; and the rest) at the top of the RSS feed. See http://www.xml.com/pub/a/98/08/xmlqna1.html#INTENT and http://www.w3.org/TR/REC-xml/#sec-entity-decl

-- TWiki:Main.WillNorris - 20 May 2007
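
A minimal sketch of the entity-definition approach, assuming a simplified feed with an `rss` root and a single declared entity (a real feed would use its own root element and declare every HTML entity it can emit). It uses Python's standard `xml.etree.ElementTree` parser only to show the difference; the helper `title_of` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Internal DTD subset declaring one HTML entity. Only eacute is shown;
# the full list of HTML entities would go here in practice.
DOCTYPE = '<!DOCTYPE rss [ <!ENTITY eacute "&#233;"> ]>'

undeclared = "<rss><title>caf&eacute;</title></rss>"
declared = DOCTYPE + undeclared


def title_of(xml_text: str) -> str:
    """Parse the document and return the text of its <title> element."""
    return ET.fromstring(xml_text).findtext("title")


try:
    title_of(undeclared)
    failed = False
except ET.ParseError:
    # &eacute; is not one of the five predefined entities, so the
    # parser rejects the document when no declaration is present.
    failed = True

print("undeclared entity rejected:", failed)
print("declared entity resolves to:", title_of(declared))
```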

It would also be good to add <![CDATA[ sections when XML is generated, or to use them in an example. Currently the < in the tag is converted to &lt;.

-- TWiki:Main.ArthurClemens - 21 May 2007

Is this still a problem? AFAICT all the relevant entities are escaped correctly. Can anyone reproduce a problem? I can't.

CC

The relevant part of WebRss currently looks like this:

%SEARCH{"%URLPARAM{"search" default=".*" }%" web="%WEB%" excludetopic="WebStatistics" regex="on"
  nosearch="on" order="modified" reverse="on" nototal="on" limit="16" format="<item rdf:about=\"%SCRIPTURL{"view"}%/$web/$topic\">$n  
  <title><noautolink>$topic - $formfield(Summary) -- $formfield(CurrentState)</noautolink></title>$n  
  <link>%SCRIPTURL{"view"}%/$web/$topic?t=$isodate</link>$n  <description><noautolink>$formfield(Summary) 
  State: $formfield(CurrentState) -- last changed by <nop>$wikiname</noautolink></description>$n  
  <dc:date>$isodate</dc:date>$n  <dc:contributor>$n    <rdf:Description link=\"%SCRIPTURL{"view"}%?topic=$wikiusername\">$n      
  <rdf:value>$username</rdf:value>$n    </rdf:Description>$n  </dc:contributor>$n</item>"}%

The problem is how to XML-encode the $formfield(Summary) parts, not the search summary itself. The <![CDATA[ suggestion above is one way to go, but many RSS readers don't recognize this construction and fail anyway. In my experience the same is still true of other (non-strict) XML parsers in general.

Currently authorization is required to read the Bugs feed, but normally the feed can be directly validated by visiting feedvalidator.org.

-- TWiki:Main.SteffenPoulsen - 14 Jun 2007

Before we get stuck in a deadlock, could we use CDATA as a partial solution first?

-- TWiki:Main.ArthurClemens - 14 Jun 2007

I have added CDATA sections to the title and description in WebRss to demonstrate that this is not a valid solution. The output becomes:

<title>&lt;![CDATA[Item4298 - <code>secret</code> parameter not working -- Closed]]&gt;</title>
<link>http://develop.twiki.org/~twiki4/cgi-bin/view/Bugs/Item4298?t=2007-06-24T11:16:55Z</link>
<description>&lt;![CDATA[=secret= parameter not working State: Closed -- last changed by CrawfordCurrie]]&gt;</description>

with the < translated to its HTML entity (as Arthur also states above).

Is it correct, per the spec, that the original search string is translated to HTML entities but the output from $formfield is not?

-- TWiki:Main.SteffenPoulsen - 24 Jun 2007
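
To illustrate why the output above is not a working CDATA section, here is a small sketch that parses a shortened version of the title both ways with Python's standard `xml.etree.ElementTree`. The shortened strings are assumptions for the example, not the actual feed output.

```python
import xml.etree.ElementTree as ET

# Intended output: a real CDATA section, so <code> is plain character data.
real = "<title><![CDATA[Item4298 - <code>secret</code> parameter]]></title>"

# Output as reported above: the opening < and closing > were escaped,
# so the parser sees literal text plus a real <code> child element.
escaped = "<title>&lt;![CDATA[Item4298 - <code>secret</code> parameter]]&gt;</title>"

real_el = ET.fromstring(real)
escaped_el = ET.fromstring(escaped)

print(repr(real_el.text), len(real_el))        # whole title as text, no children
print(repr(escaped_el.text), len(escaped_el))  # stray "<![CDATA[" text, one child
```

In the escaped variant the `<![CDATA[` marker ends up as visible text in the title and `<code>` is still parsed as markup, which is the situation the feed was in.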

Yes, I think that is correct. $formfield ought to come out exactly as found in meta.

-- TWiki:Main.CrawfordCurrie - 02 Jul 2007

In the above example, the < to &lt; translation is ruining the CDATA markup - any ideas on how to escape it?

-- TWiki:Main.SteffenPoulsen - 18 Dec 2007

Figured out that this is an error in Render.pm, which mistakenly treats CDATA sections as lone < and > characters.

The Bugs RSS feed with CDATA markup now validates, closing this.

-- TWiki:Main.SteffenPoulsen - 18 Dec 2007
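
The actual change in Render.pm is not shown in this thread. As a rough sketch of the idea only, the following Python function escapes < and > everywhere except inside CDATA sections. The name `escape_outside_cdata` is hypothetical, and escaping every < and > outside CDATA is a simplification of what the renderer does with lone < and >.

```python
import re

# The capturing group makes re.split keep the CDATA sections in its result.
_CDATA = re.compile(r"(<!\[CDATA\[.*?\]\]>)", re.DOTALL)


def escape_outside_cdata(text: str) -> str:
    """Escape < and > in text, passing CDATA sections through untouched."""
    parts = _CDATA.split(text)
    out = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            # Odd indexes hold the captured CDATA sections.
            out.append(part)
        else:
            out.append(part.replace("<", "&lt;").replace(">", "&gt;"))
    return "".join(out)


sample = "a < b <![CDATA[<code>secret</code> parameter]]> c > d"
print(escape_outside_cdata(sample))
```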

ItemTemplate
Summary        Not possible to use CDATA in SEARCH output (prev: RSS feeds chokes on HTML entities in SEARCH results)
ReportedBy     TWiki:Main.SteffenPoulsen
Codebase       4.1.2, 4.2.0, ~twiki4
SVN Range      Mon, 27 Mar 2006 build 9563
AppliesTo      Engine
Component
Priority       Normal
CurrentState   Closed
WaitingFor
Checkins       TWikirev:16042 TWikirev:16043
TargetRelease  minor
ReleasedIn     4.2.0
Topic revision: r19 - 2008-01-22 - KennethLavrsen