
Mailing List Archive: Wikipedia: Wikitech

Reliability of rev_id as order maintaining in enwiki

 

 



halfak at cs

Aug 10, 2009, 9:28 AM

Reliability of rev_id as order maintaining in enwiki

I'm using data from a snapshot of the English Wikipedia and would like
to run a query similar to the following:

SELECT * FROM revision
WHERE rev_id > some_rev_id;

Can I be confident that all revisions returned were saved after some_rev_id?

Thanks!
-Aaron

P.S. I have considered using rev_timestamp and would like to avoid that
if it is possible.

_______________________________________________
Wikitech-l mailing list
Wikitech-l [at] lists
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


brion at wikimedia

Aug 10, 2009, 9:36 AM

Re: Reliability of rev_id as order maintaining in enwiki

On 8/10/09 9:28 AM, Aaron L Halfaker wrote:
> I'm using data from a snapshot of the English Wikipedia and would like
> to run a query similar to the following:
>
> SELECT * FROM revision
> WHERE rev_id > some_rev_id;
>
> Can I be confident that all revisions returned were saved after some_rev_id?

Yes, in the sense that they were added to the database afterwards.

No, in the sense that rev_timestamp may not always show a later date for
a later rev_id:

* Page histories imported from pre-conversion UseModWiki archives
* Anything imported via Special:Import
* Anything undeleted before the ar_rev_id column was added
* Anything saved on a server that had a mis-configured clock

-- brion
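
To see how often the cases listed above occur in a given snapshot, one option is to walk the table in rev_id order and flag every row whose rev_timestamp is earlier than one already seen. What follows is only a minimal sketch, assuming a toy in-memory SQLite table with made-up rows in place of the real enwiki revision table. The helper name find_out_of_order is hypothetical. The comparison relies on rev_timestamp being a fixed-width YYYYMMDDHHMMSS string, so string order matches time order.

```python
import sqlite3


def find_out_of_order(conn):
    """Return rev_ids whose timestamp is earlier than that of some lower rev_id."""
    out_of_order = []
    latest_seen = ""  # running maximum of rev_timestamp over lower rev_ids
    rows = conn.execute(
        "SELECT rev_id, rev_timestamp FROM revision ORDER BY rev_id"
    )
    for rev_id, ts in rows:
        if ts < latest_seen:
            # A lower rev_id already carries a later timestamp.
            out_of_order.append(rev_id)
        else:
            latest_seen = ts
    return out_of_order


# Toy data (assumption): rev_id 3 stands in for an imported old revision.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO revision VALUES (?, ?)",
    [
        (1, "20090810092800"),
        (2, "20090810093600"),
        (3, "20020225000000"),
        (4, "20090810101000"),
    ],
)
flagged = find_out_of_order(conn)
print(flagged)
```

If the flagged list is empty or small enough to tolerate, filtering on rev_id alone is probably good enough for that snapshot.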



Simetrical+wikilist at gmail

Aug 10, 2009, 10:10 AM

Re: Reliability of rev_id as order maintaining in enwiki

On Mon, Aug 10, 2009 at 12:36 PM, Brion Vibber <brion [at] wikimedia> wrote:
> Yes, in the sense that they were added to the database afterwards.
>
> No, in the sense that rev_timestamp may not always show a later date for
> a later rev_id:
>
> * Page histories imported from pre-conversion UseModWiki archives
> * Anything imported via Special:Import
> * Anything undeleted before ar_rev_id column was added
> * Anything saved on a server that had a mis-configured clock

Plus there's a race condition in generating the timestamps. They're
generated slightly before the row is actually inserted, so if two
revisions were saved at almost exactly the same time, it's possible
for one to have a timestamp one second later but a lower id.

This is usually a safe-ish assumption, though, as long as occasional
misordering is acceptable. Don't we generate next/previous revision
links in some places based on rev_id?
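
The race described above (timestamp generated slightly before the row is inserted) can be replayed deterministically. This is a rough sketch, not MediaWiki's actual save path: the table is a toy SQLite stand-in, the timestamps are invented, and the "writer A"/"writer B" labels in the hypothetical note column exist only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revision ("
    "rev_id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "rev_timestamp TEXT, "
    "note TEXT)"
)

# Step 1: each writer generates its timestamp before inserting.
ts_a = "20090810101059"  # writer A reads the clock first
ts_b = "20090810101100"  # writer B reads the clock one second later

# Step 2: writer B happens to reach the database first, so it gets the lower id.
insert = "INSERT INTO revision (rev_timestamp, note) VALUES (?, ?)"
conn.execute(insert, (ts_b, "writer B"))
conn.execute(insert, (ts_a, "writer A"))

# The two orderings now disagree.
by_id = [row[0] for row in conn.execute(
    "SELECT note FROM revision ORDER BY rev_id")]
by_ts = [row[0] for row in conn.execute(
    "SELECT note FROM revision ORDER BY rev_timestamp")]
print(by_id)
print(by_ts)
```

The revision with the later timestamp ends up with the lower rev_id, which is the occasional misordering mentioned above.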



charlottethewebb at gmail

Aug 10, 2009, 11:50 AM

Re: Reliability of rev_id as order maintaining in enwiki

On Mon, Aug 10, 2009 at 12:10 PM, Aryeh Gregor <Simetrical+wikilist [at] gmail> wrote:
> This is usually a safe-ish assumption, though, as long as occasional
> misordering is acceptable.  Don't we generate next/previous revision
> links in some places based on rev_id?

In the diff view, for one. Weird ones like this are very common in old articles:

http://en.wikipedia.org/w/index.php?title=Bill_Clinton&oldid=238014&diff=prev

—C.W.

