
Mailing List Archive: Wikipedia: Wikitech

Topology of Wikipedia

 

 



taw at users

Jul 22, 2002, 2:44 PM

Post #1 of 4 (176 views)
Topology of Wikipedia

The topology of Wikipedias is very interesting.

The first question is: what is the distribution of the number of hops
needed to reach an article from the Main page?

The attached script gives an approximate answer to this question.
It requires the database used by the PHP wiki software, and libmysql-ruby.

Data for Polish Wikipedia:

Hops  Pages  Share
  -1    602  12.76%
   0      1   0.02%
   1    113   2.40%
   2    886  18.78%
   3   2367  50.17%
   4    600  12.72%
   5    126   2.67%
   6     16   0.34%
   7      5   0.11%
   8      2   0.04%
Total  4718
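
A hop distribution like the one above can be produced with a breadth-first search from the Main page. The following is only a minimal sketch, not the attached script: it works on a small in-memory link table instead of the wiki database, and the names SAMPLE_LINKS, hop_counts and distribution are made up for illustration. It also assumes that -1 stands for pages that are never reached.

```ruby
# Hypothetical link table: page title => titles it links to.
SAMPLE_LINKS = {
  "Main page" => ["Physics", "History"],
  "Physics"   => ["Energy", "History"],
  "History"   => ["Main page"],
  "Energy"    => [],
  "Island A"  => ["Island B"],  # closed group, not linked from outside
  "Island B"  => ["Island A"]
}

# Breadth-first search from root. Returns page => number of hops,
# with -1 (assumed meaning: unreachable) for pages never visited.
def hop_counts(links, root)
  hops = { root => 0 }
  queue = [root]
  until queue.empty?
    page = queue.shift
    links.fetch(page, []).each do |target|
      next if hops.key?(target)  # already reached in as few or fewer hops
      hops[target] = hops[page] + 1
      queue << target
    end
  end
  links.each_key { |page| hops[page] = -1 unless hops.key?(page) }
  hops
end

# Counts how many pages sit at each hop distance.
def distribution(hops)
  counts = Hash.new(0)
  hops.each_value { |h| counts[h] += 1 }
  counts.sort.to_h
end

hops  = hop_counts(SAMPLE_LINKS, "Main page")
total = hops.size
distribution(hops).each do |h, count|
  puts format("%2d %d (%.2f%%)", h, count, 100.0 * count / total)
end
puts "Total #{total}"
```

Breadth-first order guarantees that the first time a page is reached is also the shortest route to it, so each page needs to be visited only once.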


Results:
* It has yet to be determined how much the results are affected by
treating empty pages, redirects, user pages and talk pages like normal
pages.
* The number of pages that are not reachable at all is very high.
* If a page is reachable, it is usually reachable in just a few hops.
* Adding more links to pages linked from the Main page seems to be the
best way of improving the results.
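
The "just a few hops" observation can be checked against the posted figures by adding up the page counts cumulatively. This is a small sketch under the assumption that the -1 row means "not reachable"; HOP_COUNTS and cumulative_reachable are illustrative names, and the numbers are copied from the Polish Wikipedia data above.

```ruby
# Hop distance => number of pages, as posted for the Polish Wikipedia.
# The -1 row is read here as "not reachable" (an assumption).
HOP_COUNTS = { -1 => 602, 0 => 1, 1 => 113, 2 => 886, 3 => 2367,
               4 => 600, 5 => 126, 6 => 16, 7 => 5, 8 => 2 }

# For each hop distance, the number of pages reachable within that
# many hops (the unreachable row is left out).
def cumulative_reachable(counts)
  running = 0
  counts.reject { |hops, _| hops < 0 }.sort.map do |hops, pages|
    running += pages
    [hops, running]
  end.to_h
end

total     = HOP_COUNTS.values.sum
reachable = total - HOP_COUNTS[-1]

cumulative_reachable(HOP_COUNTS).each do |hops, pages|
  puts format("within %d hops: %4d pages (%.1f%% of reachable)",
              hops, pages, 100.0 * pages / reachable)
end
```

With these figures, 3367 of the 4116 reachable pages (about 82%) are within three hops of the Main page.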

I'm especially interested in results from the English (which is the
biggest) and Esperanto (which has a different linking philosophy)
Wikipedias. Information from the Spanish, German and other Wikipedias
would also be interesting, but I don't expect it to differ a lot from
the Polish results.
Attachments: tolopogy.rb (1.05 KB)


neil.harris at mediachannel

Jul 22, 2002, 3:56 PM

Post #2 of 4 (169 views)
Re: Topology of Wikipedia [In reply to]

Interesting. The English-language Wikipedia claims only 313 orphans (<
1%) out of 34457 articles, not counting redirects or non-comma articles.
Maybe there is a 'closure' effect as the encyclopedia gets bigger? Or
maybe 'real' articles are more likely to be linked?

Neil


taw at users

Jul 22, 2002, 4:10 PM

Post #3 of 4 (167 views)
Re: Topology of Wikipedia [In reply to]

On Mon, Jul 22, 2002 at 11:56:58PM +0100, Neil Harris wrote:
> Interesting. The English-language Wikipedia claims only 313 orphans (<
> 1%) out of 34457 articles, not counting redirects or non-comma articles.
> Maybe there is a 'closure' effect as the encyclopedia gets bigger? Or
> maybe 'real' articles are more likely to be linked?

The orphan count is a different measure. It is 175 on the Polish
Wikipedia.

The orphan count doesn't include redirects, empty pages, user pages or
talk pages. That's good.

But if a group of articles link to each other and are not linked from
any article outside the group, the orphan count doesn't include them.
They're also not accessible, though, so it should.
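
The difference between the two counts can be shown on a toy graph. The sketch below is only an illustration, with a made-up link table (ISLAND_LINKS) and made-up helper names; it treats an orphan as a page that no other page links to, which is an assumption about how the orphan count works.

```ruby
require "set"

# Hypothetical link table: page title => titles it links to.
ISLAND_LINKS = {
  "Main page" => ["Physics"],
  "Physics"   => ["Main page"],
  "Island A"  => ["Island B"],  # A and B link only to each other
  "Island B"  => ["Island A"],
  "Lonely"    => []             # nothing links here at all
}

# Orphans: pages with no incoming link from any other page
# (the root itself is not counted).
def orphans(links, root)
  linked = Set.new
  links.each do |page, targets|
    targets.each { |t| linked << t unless t == page }
  end
  links.keys.reject { |page| page == root || linked.include?(page) }
end

# Unreachable: pages that cannot be reached from root by following links.
def unreachable(links, root)
  seen  = Set[root]
  queue = [root]
  until queue.empty?
    links.fetch(queue.shift, []).each do |t|
      queue << t if seen.add?(t)  # add? returns nil if t was already seen
    end
  end
  links.keys.reject { |page| seen.include?(page) }
end

puts "orphans:     #{orphans(ISLAND_LINKS, 'Main page').inspect}"
puts "unreachable: #{unreachable(ISLAND_LINKS, 'Main page').inspect}"
```

Here "Island A" and "Island B" each have an incoming link, so they are not orphans, yet neither can be reached from the Main page.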


neil.harris at mediachannel

Jul 22, 2002, 4:15 PM

Post #4 of 4 (172 views)
Re: Topology of Wikipedia [In reply to]

Tomasz Wegrzanowski wrote:

>Orphans count is different. Orphans count is 175 on Polish Wikipedia.
>
>Orphans count doesn't include redirects, empty, user and talk pages.
>That's good.
>
>But if some group of articles link to each other but are not linked
>from any article outside of the group, then orphan count doesn't
>include them. But they're also not accessible, so it should.
>

Oh, I see. They're disconnected sub-graphs not reachable from the root.
That's interesting.

I wonder what the equivalent figures are for the English-language Wikipedia?

Neil
