• Do not register here on develop.twiki.org, login with your twiki.org account.
• Use View topic Item7848 for generic doc work for TWiki-6.1.1. Use View topic Item7851 for doc work on extensions that are not part of a release. More... Close
• Anything you create or change in standard webs (Main, TWiki, Sandbox etc) will be automatically reverted on every SVN update.
Does this site look broken?. Use the LitterTray web for test cases.

Item5718: SEARCH shows different results depending on limit

Item Form Data

AppliesTo: Component: Priority: CurrentState: WaitingFor: TargetRelease ReleasedIn
Engine   Normal Waiting for Feedback TWiki:Main.SopanShewale major  

Edit Form Data

Summary:
Reported By:
Codebase:
Applies To:
Component:
Priority:
Current State:
Waiting For:
Target Release:
Released In:
 

Detail

When I tell WebChanges to give me more results by clicking 100, 200, 500 etc, it gives me different results. This seems to happen with SEARCH generally, at least when using order and reverse. The simplest way to describe it is that increasing limit increases the number of things it finds. Example:

TWiki web with less than 300 topics. A team made possibly hundreds of edits to many dozens of topics in the space of a day, 18 Jun 2008. Going into WebChanges, they expected to see basically all displayed changes as being from 18 Jun 2008, but there were a few from today (19 Jun 2008), then several from 18 Jun 2008, then the rest were from before the 18th.

So I've done some testing. I searched (CTL+F) for the string " -18 Jun 2008 - " on the WebChanges result (this matched the rev line for each result uniquely). Here's what I got:

Limit Hits
50 19
100 39
200 54
500 66

  • If there were 66 hits in reverse chronological order in a listing of 500, they should also show up in both the 100- and 200-limit searches. (When I did this testing, there were 17 results from today, the 19th.)
  • In every case, results from before 18 Jun 2008 were displayed at the bottom, after the 18 June 2008 hits - even on the 50-limit search, which should have at least tailed out still displaying hits from this date.
  • Also, the order changes. Eg, the first two topics displayed on the 50-limit search are in reverse order on the 500-limited search.

I've marked this "urgent" because it seems to indicate that the validity of search output is somewhat compromised.

If I say "give me the most recent 50 changes", I expect it to look at all the topics and give me the 50 most recently changed. Instead, it seems to be getting the 50 topics that it can grab without having to get out of its chair, and then listing them in reverse chronological order.

-- TWiki:Main/MarcusLeonard - 19 Jun 2008

mmm, I think this is due to the non-obvious way that the search options are applied - I think limit happens first, then the sort. There was once an attempt to change that, but iirc there were some TWiki apps that didn't like the result.

I'm about to make some changes to the way that information is passed to the search and query backends - perhaps this will be helped by it.

damned good find!

-- TWiki:Main.SvenDowideit - 19 Jun 2008

ah yes :/

            # For performance:
            #   * sort by approx time (to get a rough list)
            #   * shorten list to the limit + some slack
            #   * sort by rev date on shortened list to get the accurate list
            # SMELL: Ciaro had efficient two stage handling of modified sort.
            # SMELL: In Dakar this seems to be pointless since latest rev

-- TWiki:Main.SvenDowideit - 19 Jun 2008

This is a known problem all the way back to Cairo.

The problem manifests itself very badly if you have manipulated (edited or copied) topics in and out of a web via the file system.

I almost want to make a bet that this is what happened that day. I would not be surprised if the 100s of topics were manipulated directly on the file system or copied from another web.

TWiki generates the list of last modified by doing a pre-sort using the file time stamp instead of the timestamp in the topic. This gives a huge performance advantage when you have many topics in a web.

Normally WebChanges works pretty well and trying it here on Bugs I get valid results no matter what limit. But we have seen this problem when someone have hacked the files directly here.

In a future release we will have to revisit this. With an improved storage we can improve this a lot by having a simple DB (could be as simple as an additional file of topic and their time stamp in date order) that the modified criteria is taken from.

Marcus - put this back to urgent if I am wrong that the files were manipulated outside TWiki editing.

-- TWiki:Main.KennethLavrsen - 27 Jun 2008

Hi Kenneth,

They were edited in the normal way. As I mentioned above, the team using the web made possibly hundreds of edits to many dozens of topics in the space of a day, so it wasn't done at the filesystem level, it was done by the normal edit process through browsers.

I do sometimes do mass operations in the web at the filesystem level, but this problem was flagged by their team leader when they noticed that their edits weren't showing up as expected. I'm assuming that, even if the txt files had been tinkered with directly, any subsequent "normal" TWiki editing would have "put things right"...?

I'm just thinking about why the system-level edits would make a difference... Even if they were edited at the system level (I tend to use sed to do the bulk stuff), the timestamps should still line up the same way. Let's say I do a bulk edit on 300 topics to change permissions in META:PREFERENCE, and all the txt files get a timestamp within a second or two of each other. If humans come along the next day and do a mass of edits, all those edits they made are still more recent than the system edits. Seems to me that the edit, whether done by a TWiki perl script or by me with sed, is still an edit that changes the file at the given time. So, if TWiki uses the file timestamp instead of reading the serial date inside the file, it should be more accurate in terms of showing what got edited last; any sort of edit should be faithfully noted. Seems like there's something fishy going on here... Sven's comment above does seem to explain it.

Anyway, I'll mark it back to urgent as you advise. Cheers,

-- MarcusLeonard - 05 Jul 2008 (PS: Just noticed the signature block now leaves out the TWiki.org interwiki link - is that right?)

I still cannot reproduce this Marcus.

There is something you do not tell us here. There is an important detail that is left out. Can you create an example and attach the web here in a tar.gz?

Did you modify the WebChanges?

-- TWiki:Main.KennethLavrsen - 13 Jul 2008

Hi,

I tried hard to reproduce the issue in my development environment. Questions for you -TWiki:Main.MarcusLeonard

[1]. Did you modify WebChanges topic? If yes - is it possible to add the content in this page?

[2]. Kindly request you to attach the .tar.gz of the Web which is showing this issue?

Please let me know if any one able to reproduce this issue???


TWiki:Main.SopanShewale

TWiki:Codev.GeorgetownReleaseMeeting2008x07x21 decided to downgrade this to normal as noone can reproduce this and reporter has not given the needed info to reproduce it.

-- TWiki:Main.KennethLavrsen - 22 Jul 2008

Thanks TWiki:Main.MarcusLeonard for sharing the details...

In fact I am able to recreate the bug on my sample instances....

My instance shows following - i have tested this on TWiki

Limit Number of topics* date/time

100 19 22 Jan 2008

50 12 22 Jan 2008

500 92 22 Jan 200

I am working on this.

-- SopanShewale - 01 Aug 2008

Would love a patch for this if you come up with one!

-- TWiki:Main.MarcusLeonard - 02 Aug 2008

First limiting and then sorting is sacrificing correctness for speed. This is a known trade-off in TWiki that has been made rather conscious. If you'd like correct sorting+limitting, have a look at DBCachePlugin, where this tradeoff isn't needed anymore, and rewrite WebChanges using an appropriate DBQUERY statement. That's the only solution we can offer right now without changing SEARCH in the core.

-- TWiki:Main.MichaelDaum - 03 Aug 2008

Yikes. Okay... I'll look at some interim solutions, I guess. But can you tell me what the plan for this problem is in the longer term? It seems pretty serious to me; basically, SEARCH doesn't do what it says it does, or what people will think it does. Limiting-then-searching seems like a "lucky dip", because you won't necessarily know how the limiting will be applied. It reduces confidence in TWiki, which is not a good situation for something that's supposed to be a premier information management tool.

-- TWiki:Main.MarcusLeonard - 04 Aug 2008

ItemTemplate
Summary SEARCH shows different results depending on limit
ReportedBy TWiki:Main.MarcusLeonard
Codebase 4.2.0
SVN Range TWiki-5.0.0, Sun, 01 Jun 2008, build 16865
AppliesTo Engine
Component

Priority Normal
CurrentState Waiting for Feedback
WaitingFor TWiki:Main.SopanShewale
Checkins

TargetRelease major
ReleasedIn

Edit | Attach | Watch | Print version | History: r11 < r10 < r9 < r8 < r7 | Backlinks | Raw View |  Raw edit | More topic actions
Topic revision: r11 - 2008-08-04 - MarcusLeonard
 
This site is powered by the TWiki collaboration platform Powered by PerlCopyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback