Mailing List Archive: Wikipedia: Wikitech

Data Processing

qli at ica

Sep 24, 2009, 10:43 PM

Post #1 of 16 (1753 views)
Data Processing

Hi

I have now set up a local wiki, and I want to add data downloaded from
Wikipedia to it. I tried to read the source code, but I couldn't find the
exact thing (interface) that I want.

So, I want to ask some questions:

When I click the save button after editing an article or adding a new
article, how is the data stored? Which function/class does it call?

Could you describe the process of data storage?

In what form are articles stored in the database?

Thanks

Vanessa



_______________________________________________
Wikitech-l mailing list
Wikitech-l [at] lists
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


petr.kadlec at gmail

Sep 25, 2009, 2:30 AM

Post #2 of 16 (1684 views)
Re: Data Processing

2009/9/25 vanessa lee <qli [at] ica>:
> In what form are articles stored in the database?

Raw wiki text, plus many tables containing metadata. See
http://www.mediawiki.org/wiki/Manual:Database_layout
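To illustrate how the raw wikitext and the metadata tables relate, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a heavily simplified subset of the page, revision and text tables (only a few of their columns, named as on the Manual:Database_layout page); the helper current_wikitext is hypothetical, and a real MediaWiki database has many more columns and tables.

```python
import sqlite3

# Heavily simplified subset of three MediaWiki tables (illustration only).
SCHEMA = """
CREATE TABLE page (
    page_id        INTEGER PRIMARY KEY,
    page_namespace INTEGER NOT NULL,
    page_title     TEXT NOT NULL,
    page_latest    INTEGER NOT NULL      -- rev_id of the current revision
);
CREATE TABLE revision (
    rev_id      INTEGER PRIMARY KEY,
    rev_page    INTEGER NOT NULL,        -- points to page.page_id
    rev_text_id INTEGER NOT NULL         -- points to text.old_id
);
CREATE TABLE text (
    old_id    INTEGER PRIMARY KEY,
    old_text  BLOB NOT NULL,             -- raw wikitext on a simple setup
    old_flags TEXT NOT NULL DEFAULT ''
);
"""

def current_wikitext(conn, namespace, title):
    """Follow page -> revision -> text to get the current wikitext (hypothetical helper)."""
    row = conn.execute(
        """SELECT t.old_text
             FROM page p
             JOIN revision r ON r.rev_id = p.page_latest
             JOIN text t ON t.old_id = r.rev_text_id
            WHERE p.page_namespace = ? AND p.page_title = ?""",
        (namespace, title),
    ).fetchone()
    return None if row is None else row[0]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO text VALUES (1, 'Hello, [[world]]!', 'utf-8')")
conn.execute("INSERT INTO revision VALUES (10, 100, 1)")
conn.execute("INSERT INTO page VALUES (100, 0, 'Example', 10)")
print(current_wikitext(conn, 0, "Example"))  # Hello, [[world]]!
```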

-- [[cs:User:Mormegil | Petr Kadlec]]



qli at ica

Sep 25, 2009, 3:28 AM

Post #3 of 16 (1681 views)
Re: Data Processing

OK, I see. What about my other questions?

If I edit a page, does the page_id change or not?





bryan.tongminh at gmail

Sep 25, 2009, 3:31 AM

Post #4 of 16 (1679 views)
Re: Data Processing

On Fri, Sep 25, 2009 at 12:28 PM, 李琴 <qli [at] ica> wrote:
> OK, I see. What about my other questions?
>
> If I edit a page, does the page_id change or not?
>
page_latest will be updated to point to the new revision's rev_id. The text will most
likely be saved in the text table. Article::doEdit and Revision are
the most likely places in the source to start looking for more
information.
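In outline, saving an edit adds rows rather than overwriting them: a new text row, a new revision row pointing to it, and then page_latest moved to the new rev_id, while page_id stays the same. The sketch below mimics that sequence with sqlite3 on a simplified three-table schema. The save_edit function is hypothetical and skips everything the real Article::doEdit also does (recent changes, link tables, caching, hooks and so on).

```python
import sqlite3

# Simplified subset of the page/revision/text tables (illustration only).
SCHEMA = """
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER,
                   page_title TEXT, page_latest INTEGER);
CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER,
                       rev_text_id INTEGER);
CREATE TABLE text (old_id INTEGER PRIMARY KEY, old_text BLOB, old_flags TEXT);
"""

def save_edit(conn, page_id, new_wikitext):
    """Store a new revision of an existing page and return its rev_id (hypothetical)."""
    with conn:  # one transaction: commit all three writes, or roll back on error
        # 1. The full new text becomes a new row in the text table.
        text_id = conn.execute(
            "INSERT INTO text (old_text, old_flags) VALUES (?, 'utf-8')",
            (new_wikitext,),
        ).lastrowid
        # 2. A new revision row links the page to that text row.
        rev_id = conn.execute(
            "INSERT INTO revision (rev_page, rev_text_id) VALUES (?, ?)",
            (page_id, text_id),
        ).lastrowid
        # 3. The existing page row is updated in place: only page_latest changes.
        conn.execute(
            "UPDATE page SET page_latest = ? WHERE page_id = ?", (rev_id, page_id)
        )
    return rev_id

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# A page (page_id 42) with one existing revision (rev_id 1, text row 1).
conn.execute("INSERT INTO text VALUES (1, 'First version', 'utf-8')")
conn.execute("INSERT INTO revision VALUES (1, 42, 1)")
conn.execute("INSERT INTO page VALUES (42, 0, 'Example', 1)")
conn.commit()

new_rev = save_edit(conn, 42, "Second version")
print(conn.execute("SELECT page_id, page_latest FROM page").fetchall())  # [(42, 2)]
```

The old text row is left untouched, which is what keeps the page history available.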



roan.kattouw at gmail

Sep 25, 2009, 4:07 AM

Post #5 of 16 (1688 views)
Re: Data Processing

2009/9/25 李琴 <qli [at] ica>:
> OK, I see. What about my other questions?
>
> If I edit a page, does the page_id change or not?
>
No, the page_id stays the same when pages are edited, and even when
they're moved (in the latter case, page_namespace and/or page_title
will change, of course).
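Roan's point about moves can be checked on a toy table: a move rewrites page_namespace and/or page_title in the existing row, so page_id survives. This is a simplified sqlite3 sketch with a hypothetical move_page helper; it leaves out the redirect that MediaWiki normally creates at the old title (a separate page with its own page_id) and the log entry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified page table: (namespace, title) is the page's current name,
# page_id is its permanent identifier.
conn.execute("""CREATE TABLE page (
    page_id        INTEGER PRIMARY KEY,
    page_namespace INTEGER NOT NULL,
    page_title     TEXT NOT NULL,
    UNIQUE (page_namespace, page_title))""")

def move_page(conn, page_id, new_namespace, new_title):
    """Rename a page in place (hypothetical; no redirect, no log entry)."""
    with conn:
        conn.execute(
            "UPDATE page SET page_namespace = ?, page_title = ? WHERE page_id = ?",
            (new_namespace, new_title, page_id),
        )

conn.execute("INSERT INTO page VALUES (7, 0, 'Old_name')")
conn.commit()
move_page(conn, 7, 0, "New_name")
print(conn.execute("SELECT * FROM page").fetchall())  # [(7, 0, 'New_name')]
```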

Roan Kattouw (Catrope)



qli at ica

Sep 25, 2009, 4:27 AM

Post #6 of 16 (1681 views)
Re: Data Processing

If I edit just one section of a page, what will be saved in the text
table: the entire page's text, or just the edited section's text?

Why are the text table's fields named old_id, old_text, old_flags? What
does "old" mean?

Thanks




petr.kadlec at gmail

Sep 25, 2009, 5:06 AM

Post #7 of 16 (1681 views)
Re: Data Processing

2009/9/25 李琴 <qli [at] ica>:
> If I edit just one section of a page, what will be saved in the text
> table: the entire page's text, or just the edited section's text?

The entire page text, which is the result of merging the previous page
text with the changes you’ve made to the section.
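The merge step described above can be sketched like this: the edited section replaces the old one, and the complete, reassembled text is what would be stored as a new text row. This is a simplified stand-in written for illustration (split_sections and replace_section are hypothetical names, not MediaWiki functions). It treats every heading line as a section boundary, so unlike MediaWiki it does not include subsections in their parent section, and it ignores headings produced by templates or hidden inside <nowiki>.

```python
import re

# A heading line such as "== History ==" (simplified wikitext heading syntax).
HEADING = re.compile(r"^(={1,6}).+?\1\s*$", re.MULTILINE)

def split_sections(wikitext):
    """Split wikitext into [lead, section 1, section 2, ...] at heading lines."""
    starts = [m.start() for m in HEADING.finditer(wikitext)]
    bounds = [0] + starts + [len(wikitext)]
    return [wikitext[a:b] for a, b in zip(bounds, bounds[1:])]

def replace_section(wikitext, index, new_section):
    """Return the FULL page text with section `index` replaced (hypothetical helper)."""
    sections = split_sections(wikitext)
    sections[index] = new_section
    return "".join(sections)

page = "Intro text.\n== History ==\nOld history.\n== Geography ==\nMountains.\n"
edited = replace_section(page, 1, "== History ==\nNew history.\n")
print(edited)
```

The result still contains the untouched lead and the Geography section, i.e. the whole page, not just the edited section.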

> Why are the text table's fields named old_id, old_text, old_flags? What
> does "old" mean?

See http://www.mediawiki.org/wiki/Manual:Text_table

-- [[cs:User:Mormegil | Petr Kadlec]]



qli at ica

Sep 25, 2009, 6:37 AM

Post #8 of 16 (1683 views)
Re: Data Processing

So the entire page text is stored in the text table. But the Recent changes
page shows only the edited text. How is that text stored?

I want to see the content (BLOB) of the old_text field in the text table.
What should I do?


Thanks




petr.kadlec at gmail

Sep 25, 2009, 7:29 AM

Post #9 of 16 (1678 views)
Re: Data Processing

2009/9/25 李琴 <qli [at] ica>:
> So the entire page text is stored in the text table. But the Recent changes
> page shows only the edited text. How is that text stored?

It is not stored. It is evaluated during every diff view by comparing
(diffing) the two revisions (see phase3/includes/diff/*.php). Note you
can view differences between two non-consecutive versions (indeed, you
can compare two revisions not even belonging to the same page, e.g.
http://en.wikipedia.org/wiki/?diff=12345&oldid=67890&diffonly=1).
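As an illustration of computing a diff on demand instead of storing it, Python's standard difflib module can compare any two stored revision texts line by line, which is also why non-consecutive revisions can be compared. This only mimics the idea; MediaWiki's own diff code (phase3/includes/diff/) produces its own side-by-side table format, and revision_diff is a hypothetical helper name.

```python
import difflib

def revision_diff(old_text, new_text, old_label="old revision", new_label="new revision"):
    """Compute a unified diff between two revision texts on demand (nothing is stored)."""
    return "".join(
        difflib.unified_diff(
            old_text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile=old_label,
            tofile=new_label,
        )
    )

old = "Intro text.\n== History ==\nOld history.\n"
new = "Intro text.\n== History ==\nNew history.\n"
print(revision_diff(old, new))
```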

> I want to see the content (BLOB) of the old_text field in the text table.
> What should I do?

It depends on the configuration of your wiki. The “text” table might
contain the wikitext directly in the “old_text” column (in the source
text form, or as a serialized PHP object, see
phase3/includes/HistoryBlob.php), or the “old_text” column is only a
pointer to where/how the text is really stored, see e.g.
http://www.mediawiki.org/wiki/Manual:External_Storage
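For a wiki that keeps text in the local text table, reading old_text mostly comes down to checking old_flags, a comma-separated list. The sketch below is an illustration under stated assumptions: it assumes the text is UTF-8, handles the 'gzip' flag (text compressed with PHP's gzdeflate, i.e. raw DEFLATE, which MediaWiki writes when $wgCompressRevisions is enabled), and deliberately refuses 'object' (serialized PHP, see HistoryBlob.php) and 'external' (a pointer into external storage). The function name decode_old_text is hypothetical.

```python
import zlib

def decode_old_text(old_text: bytes, old_flags: str) -> str:
    """Turn a raw text.old_text value into wikitext, based on text.old_flags.

    Hypothetical helper: assumes UTF-8 text and supports only the 'gzip'
    flag; 'object' and 'external' rows are refused.
    """
    flags = {f.strip() for f in old_flags.split(",") if f.strip()}
    if "external" in flags:
        raise NotImplementedError("old_text is a pointer into external storage")
    if "object" in flags:
        raise NotImplementedError("old_text is a serialized PHP object (HistoryBlob)")
    data = old_text
    if "gzip" in flags:
        # gzdeflate() output is raw DEFLATE with no zlib/gzip header,
        # hence the negative wbits.
        data = zlib.decompress(data, -zlib.MAX_WBITS)
    return data.decode("utf-8")

# Simulate a compressed row the way gzdeflate would produce it.
compressor = zlib.compressobj(wbits=-zlib.MAX_WBITS)
stored = compressor.compress("Hello, [[world]]!".encode("utf-8")) + compressor.flush()
print(decode_old_text(stored, "utf-8,gzip"))  # Hello, [[world]]!
```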

-- [[cs:User:Mormegil | Petr Kadlec]]



qli at ica

Sep 28, 2009, 5:54 AM

Post #10 of 16 (1636 views)
Re: Data Processing

Why can one page_title have different page_ids? For example, the page_title
'USA' has two page_ids, '98937' and '112696'.

Thanks

vanessa lee


maxsem.wiki at gmail

Sep 28, 2009, 6:01 AM

Post #11 of 16 (1646 views)
Re: Data Processing

On 28.09.2009, 16:54 李琴 wrote:

> Why can one page_title have different page_ids? For example, the page_title
> 'USA' has two page_ids, '98937' and '112696'.
>

Because they have different page_namespace ;)
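In other words, a page is identified by the pair (page_namespace, page_title), not by the title alone, and that pair is unique. The sketch below shows the effect with sqlite3 on a simplified page table. The page_id values are made up for the example; namespace 0 is the main (article) namespace and namespace 1 is Talk, so 'USA' in namespace 1 is the page Talk:USA.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE page (
    page_id        INTEGER PRIMARY KEY,
    page_namespace INTEGER NOT NULL,
    page_title     TEXT NOT NULL,
    UNIQUE (page_namespace, page_title))""")

# Same title, different namespaces: two distinct pages (IDs are made up).
conn.execute("INSERT INTO page VALUES (1, 0, 'USA')")  # article "USA"
conn.execute("INSERT INTO page VALUES (2, 1, 'USA')")  # "Talk:USA"

rows = conn.execute(
    "SELECT page_id, page_namespace FROM page WHERE page_title = 'USA' ORDER BY page_id"
).fetchall()
print(rows)  # [(1, 0), (2, 1)]

# A second 'USA' in the same namespace is rejected by the unique key.
try:
    conn.execute("INSERT INTO page VALUES (3, 0, 'USA')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```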




qli at ica

Oct 10, 2009, 11:26 PM

Post #12 of 16 (1499 views)
Re: Data Processing

Sorry, I can't get the exact meaning of this sentence: 'External storage
stores page text on a separate DB server to the main wiki database.'

What is a "separate DB server"?
Or rather, I just want to know: where can I get the page content?


Thanks
vanessa lee




roan.kattouw at gmail

Oct 11, 2009, 6:18 AM

Post #13 of 16 (1491 views)
Re: Data Processing

2009/10/11 李琴 <qli [at] ica>:
> Sorry, I can't get the exact meaning of this sentence: 'External storage
> stores page text on a separate DB server to the main wiki database.'
>
> What is a "separate DB server"?
> Or rather, I just want to know: where can I get the page content?
>
On Wikipedia, we use one set of DB servers for the wiki database, and
a second set for text; this second set is called the external storage.
You don't need to worry about this too much, as small wikis typically
don't use external storage, but use the text table instead.
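When external storage is in use, the text row's old_flags contains 'external' and old_text holds a URL-like pointer instead of the text; the DB:// form names a storage cluster and a row ID on it, sometimes followed by an item inside a multi-revision blob. The sketch below only parses such a pointer to decide which server to ask. The cluster name, the host mapping and the helper names are hypothetical, and actually fetching the blob (which may itself be compressed or a concatenated HistoryBlob) is left out.

```python
from typing import NamedTuple, Optional

class ExternalPointer(NamedTuple):
    protocol: str          # e.g. "DB"
    cluster: str           # storage cluster name
    blob_id: str           # row ID within that cluster
    item: Optional[str]    # optional item inside a multi-revision blob

def parse_external_pointer(old_text: str) -> ExternalPointer:
    """Split a pointer such as 'DB://cluster1/12345' into its parts (hypothetical)."""
    protocol, sep, rest = old_text.partition("://")
    if not sep:
        raise ValueError(f"not an external storage pointer: {old_text!r}")
    parts = rest.split("/")
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected pointer layout: {old_text!r}")
    item = parts[2] if len(parts) == 3 else None
    return ExternalPointer(protocol, parts[0], parts[1], item)

# Hypothetical mapping from cluster name to database host; a real wiki
# defines this in its own configuration.
CLUSTER_HOSTS = {"cluster1": "es-db1.example.org"}

ptr = parse_external_pointer("DB://cluster1/12345")
print(ptr, CLUSTER_HOSTS[ptr.cluster])
```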

The difference view in recent changes is produced by comparing two
texts using an external program like diff; it's not stored anywhere
(other than possibly in the objectcache table, for caching purposes).

Roan Kattouw (Catrope)



ChristensenC at BATTELLE

Oct 12, 2009, 10:53 AM

Post #14 of 16 (1476 views)
Re: Data Processing

2009/10/11 Roan:
> On Wikipedia, we use one set of DB servers for the wiki database, and
> a second set for text; this second set is called the external storage.
> You don't need to worry about this too much, as small wikis typically
> don't use external storage, but use the text table instead.
>
> The difference view in recent changes is produced by comparing two
> texts using an external program like diff; it's not stored anywhere
> (other than possibly in the objectcache table, for caching purposes).
>
> Roan Kattouw (Catrope)


Is there a generally accepted size at which it becomes advantageous to move your text out of the standard text table?

Thanks,
-Courtney



roan.kattouw at gmail

Oct 12, 2009, 11:16 AM

Post #15 of 16 (1485 views)
Re: Data Processing

2009/10/12 Christensen, Courtney <ChristensenC [at] battelle>:
> Is there a generally accepted size at which it becomes advantageous to move your text out of the standard text table?
>
It's not so much about size as it is about load. Wikipedia uses external
storage (ES) because it's a heavy-load wiki. More specific directions about
when ES makes sense and when it doesn't are probably better given by people
more familiar with databases than me.

Roan Kattouw (Catrope)



midom.lists at gmail

Oct 12, 2009, 12:30 PM

Post #16 of 16 (1478 views)
Re: Data Processing

Hi!
> Is there a generally accepted size at which it becomes advantageous to
> move your text out of the standard text table?

Once it doesn't fit on a single server anymore? :)

Domas

