
Mailing List Archive: Wikipedia: Wikitech

Wikipedia database

 

 



zh509 at york

Nov 19, 2009, 6:38 AM

Post #1 of 10

Greetings,

May I ask a question about the Wikipedia database? I downloaded the current
Wikipedia revision data and found that some records have exactly the same
rev_id, rev_user and timestamp. What does this mean? Are they the same edit
or different edits?

best,

Zeyi

_______________________________________________
Wikitech-l mailing list
Wikitech-l [at] lists
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


roan.kattouw at gmail

Nov 19, 2009, 8:21 AM

Post #2 of 10
Re: Wikipedia database

2009/11/19 <zh509 [at] york>:
> Greeting,
>
> May I ask the question about wikipedia database. I downloaded the Wikipedia
> revision current data. and found there are some records have the exactly
> same rev_id, rev_user and same timestamp. What does it mean? are they the
> same edit or different?
>
If they belong to the same wiki, they're very likely to be the same
edit. Of course such duplicates should theoretically not occur.

Roan Kattouw (Catrope)



zh509 at york

Nov 19, 2009, 8:48 AM

Post #3 of 10
Re: Wikipedia database

On Nov 19 2009, Roan Kattouw wrote:

>If they belong to the same wiki, they're very likely to be the same
>edit. Of course such duplicates should theoretically not occur.
>
>Roan Kattouw (Catrope)
>

Thanks. I noticed this because I joined the revision table and the page
table. May I ask why there are two identical records in the table for the
same page.page_latest? Is the link between revision and page
rev_id = page.page_latest?

thanks.

Zeyi



Platonides at gmail

Nov 19, 2009, 12:04 PM

Post #4 of 10
Re: Wikipedia database

zh509 [at] york wrote:
> Thanks, I noted that because i add Revision Table and Page table together.
> May I ask why for the same page.page_latest, there are two same records on
> the table? Is that the link between revision and Page is the
> rev_id=page.page_latest?

page.page_latest points to the current revision.rev_id.


However, you shouldn't be able to have several revisions with the same
rev_id. Even if something went horribly wrong at the wiki level, rev_id
is a PRIMARY KEY.
How did you do the import?
I suspect you may have broken something importing or merging.
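
One way to test the suspicion above is to look for repeated rev_id values in the imported table. The following is only a minimal sketch: it uses Python's sqlite3 module with an in-memory database rather than Oracle, the sample rows are made up, and the helper name find_duplicate_rev_ids is hypothetical. The revision table is deliberately created without a PRIMARY KEY, on the assumption that the import dropped that constraint.

```python
import sqlite3


def find_duplicate_rev_ids(conn):
    """Return (rev_id, count) for every rev_id that occurs more than once."""
    return conn.execute(
        "SELECT rev_id, COUNT(*) AS n "
        "FROM revision "
        "GROUP BY rev_id "
        "HAVING COUNT(*) > 1 "
        "ORDER BY rev_id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
# No PRIMARY KEY here, mimicking an import that lost the constraint.
conn.execute(
    "CREATE TABLE revision ("
    "rev_id INTEGER, rev_page INTEGER, rev_user INTEGER, rev_timestamp TEXT)"
)
# Made-up sample rows; rev_id 101 is inserted twice on purpose.
conn.executemany(
    "INSERT INTO revision VALUES (?, ?, ?, ?)",
    [
        (100, 1, 7, "20091119063800"),
        (101, 1, 8, "20091119082100"),
        (101, 1, 8, "20091119082100"),
        (102, 2, 7, "20091119120400"),
    ],
)

duplicates = find_duplicate_rev_ids(conn)
print(duplicates)
```

If this query returns no rows on the base revision table, the duplicates are more likely coming from a later join than from the import itself.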





zh509 at york

Nov 20, 2009, 5:23 AM

Post #5 of 10
Re: Wikipedia database

On Nov 19 2009, Platonides wrote:

>page.page_latest point to the current revision.rev_id
>
>
>However, you shouldn't be able to have several revisions with the same
>rev_id. Even if something went horribly wrong at the wiki level, rev_id
>is a PRIMARY KEY.
>How did you do the import?
>I suspect you may have broken something importing or merging.
>

I took the sub-current data from MediaWiki and imported it into Oracle. I
found two identical page_latest IDs in the page table. When I then tried to
join the revision table and the page table, this produced two rows with the
same rev_id.

May I ask why I have two identical page_latest values in the page table, and
what that means? If I want to join the revision table and the page table,
which column should be the link?

thanks,
Zeyi


Platonides at gmail

Nov 20, 2009, 3:03 PM

Post #6 of 10
Re: Wikipedia database

Zeyi wrote:
> I took the sub-current data from MediaWiki and import them to Oracle.
Which tool did you use for the import?

> I found there are two same page_latest ID in the page table. Then when I
> tried to join Revision table and Page table together, this caused two same
> rev_id.

Which pages are those?


> May I ask why I have two page_latest on page table, what it mean? If I want
> to put Revision table and Page table together, which should be the link
> point?

You shouldn't have that situation.
And why are you merging page and revision, anyway?

> thanks,
> Zeyi




zh509 at york

Nov 21, 2009, 6:58 AM

Post #7 of 10
Re: Wikipedia database

On Nov 20 2009, Platonides wrote:

>Zeyi wrote:
>> I took the sub-current data from MediaWiki and import them to Oracle.
>Which tool did you use for the import?
>
I used the xml2sql tool, which is easy to use.

>> I found there are two same page_latest ID in the page table. Then when
>> I tried to join Revision table and Page table together, this caused two
>> same rev_id.
>
>Which pages are those?
All kinds of pages. Is the page_latest ID unique?
>
>
>> May I ask why I have two page_latest on page table, what it mean? If I
>> want to put Revision table and Page table together, which should be the
>> link point?
>
>You shouldn't have that situation.
>And why are you merging page and revision, anyway?

I need to use rev_user and page_namespace for a cross-analysis. How can I
put them in one table? Thanks again.

>> thanks,
>> Zeyi


roan.kattouw at gmail

Nov 21, 2009, 10:28 AM

Post #8 of 10
Re: Wikipedia database

2009/11/21 <zh509 [at] york>:
> I need use rev_user and page_namespace to do crossing-analysis. How i can
> put them in the one table? thanks again.
>
You don't need to put them in one table, just use a query with a JOIN.

Roan Kattouw (Catrope)
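
A query with a JOIN, as suggested above, could look roughly like the sketch below. It is an illustration under assumptions, not the poster's actual query: it runs on an in-memory SQLite database instead of Oracle, the rows are invented, and the function name edits_by_user_and_namespace is hypothetical. The join condition revision.rev_page = page.page_id follows the standard MediaWiki schema; on a dump that holds only current revisions, joining on rev_id = page_latest should select the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE page (
        page_id INTEGER PRIMARY KEY,
        page_namespace INTEGER,
        page_latest INTEGER
    );
    CREATE TABLE revision (
        rev_id INTEGER PRIMARY KEY,
        rev_page INTEGER,
        rev_user INTEGER
    );
    -- Invented sample data: two pages, three revisions.
    INSERT INTO page VALUES (1, 0, 101), (2, 1, 102);
    INSERT INTO revision VALUES (100, 1, 7), (101, 1, 8), (102, 2, 7);
    """
)


def edits_by_user_and_namespace(conn):
    """Count revisions per (rev_user, page_namespace) without merging tables."""
    return conn.execute(
        """
        SELECT r.rev_user, p.page_namespace, COUNT(*) AS edits
        FROM revision AS r
        JOIN page AS p ON p.page_id = r.rev_page
        GROUP BY r.rev_user, p.page_namespace
        ORDER BY r.rev_user, p.page_namespace
        """
    ).fetchall()


rows = edits_by_user_and_namespace(conn)
for rev_user, page_namespace, edits in rows:
    print(rev_user, page_namespace, edits)
```

The two tables stay separate; the join is computed at query time, so nothing has to be copied into a combined table.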



zh509 at york

Nov 23, 2009, 6:03 AM

Post #9 of 10
Re: Wikipedia database

Thanks, but is page_latest unique in the page table?

On Nov 21 2009, Roan Kattouw wrote:

>2009/11/21 <zh509 [at] york>:
>> I need use rev_user and page_namespace to do crossing-analysis. How i can
>> put them in the one table? thanks again.
>>
>You don't need to put them in one table, just use a query with a JOIN.
>
>Roan Kattouw (Catrope)


roan.kattouw at gmail

Nov 23, 2009, 6:16 AM

Post #10 of 10
Re: Wikipedia database

2009/11/23 <zh509 [at] york>:
> Thanks. but is that page_latest is unique in page table?
>
Yes. Every revision belongs to one page only (rev_page).

Roan Kattouw (Catrope)
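
Since page_latest is expected to be unique, a repeated value in an imported page table would explain the duplicated rev_id seen after joining. The sketch below illustrates that effect under stated assumptions: it uses an in-memory SQLite database rather than Oracle, the rows are invented, and the page table is created without any key so that a duplicated row can be inserted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- No keys, so the duplicated page row below is accepted.
    CREATE TABLE page (page_id INTEGER, page_namespace INTEGER, page_latest INTEGER);
    CREATE TABLE revision (rev_id INTEGER, rev_page INTEGER, rev_user INTEGER);
    -- Invented data: page 1 was imported twice, page 2 once.
    INSERT INTO page VALUES (1, 0, 101), (1, 0, 101), (2, 1, 102);
    INSERT INTO revision VALUES (101, 1, 8), (102, 2, 7);
    """
)

# Step 1: list page_latest values that occur more than once.
dup_latest = conn.execute(
    "SELECT page_latest, COUNT(*) FROM page "
    "GROUP BY page_latest HAVING COUNT(*) > 1"
).fetchall()

# Step 2: each duplicated page row yields one extra joined row,
# so the same rev_id shows up twice in the join output.
joined = conn.execute(
    "SELECT r.rev_id FROM revision AS r "
    "JOIN page AS p ON r.rev_id = p.page_latest "
    "ORDER BY r.rev_id"
).fetchall()

print(dup_latest)
print(joined)
```

In this toy data the revision table itself has no duplicates; the repeated rev_id appears only in the join result, because one page row was loaded twice.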
