
Mailing List Archive: Wikipedia: Wikitech

Wikistats

 

 



wikipedia at shaihome

Feb 16, 2004, 4:30 AM

Post #1 of 15 (1329 views)
Wikistats

Wikistats are back at http://www.wikipedia.org/wikistats/EN/Sitemap.htm, generated from the last dump.

There's a little bug on the size distribution page. Erik, do you have any idea? I took the Perl script from your website to generate the stats. The CVS files seem OK.

Shaihulud


e.p.zachte at chello

Feb 16, 2004, 8:06 PM

Post #2 of 15 (1278 views)
Wikistats [In reply to]

> There's a little bug on size distribution page, Erik, if you have any idea?

Yes, the decimal point got lost, so 99.9% reads as 999%.
On request I introduced a language-dependent decimal point (dot or comma),
with this glitch as a result. I fixed it a while ago, apparently not in the
online scripts.
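
The glitch described here is easy to reproduce. The following is a minimal sketch in Python, not the Perl of the actual Wikistats scripts. Both function names are hypothetical, and the empty-separator failure mode is an assumption about how the decimal point could get lost, not a confirmed diagnosis:

```python
def format_percent(value, decimal_sep="."):
    """Format a percentage with one decimal and a language-dependent separator."""
    text = f"{value:.1f}"  # Python always formats with a dot, e.g. "99.9"
    return text.replace(".", decimal_sep) + "%"


def format_percent_buggy(value, decimal_sep=""):
    """Assumed failure mode: the separator lookup yields an empty string."""
    return f"{value:.1f}".replace(".", decimal_sep) + "%"


print(format_percent(99.9))        # dot, e.g. for en:
print(format_percent(99.9, ","))   # comma, e.g. for pl: or nl:
print(format_percent_buggy(99.9))  # the decimal point is lost
```

Defaulting the separator to a dot, rather than to an empty string, keeps the output readable even when a language has no entry.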

----

The Polish Wikipedia tripled its number of internal links in only three
months; other pl: indicators are more in line with previous figures.
Since wp's are ordered by number of internal links on my pages, pl: now ranks
third, before fr: and ja:, which is rather counterintuitive.

Does anyone have any idea why pl: links increased so dramatically?
Huge number of link tables?

Erik Zachte


wikipedia at shaihome

Feb 17, 2004, 1:53 AM

Post #3 of 15 (1277 views)
Re: Wikistats [In reply to]

On Tue, 17 Feb 2004 04:06:30 +0100
"Erik Zachte" <e.p.zachte [at] chello> wrote:

> > There's a little bug on size distribution page, Erik, if you have any idea?
>
> Yes, the decimal point got lost, so 99.9 % reads as 999%.
> On request I introduced language dependent decimal point (dot or comma),
> with this glitch as result. Fixed it a while ago, apparently not in online
> scripts.

Where can I get the latest revision?

> The Polish wikipedia tripled its number of internal links in only three
> months; other pl: indicators are more in line with previous figures.
> Since wp's are ordered by number of internal links on my pages, pl: now ranks
> third, before fr: and ja:, which is rather counterintuitive.
>
> Does anyone have any idea why pl: links increased so dramatically?
> Huge number of link tables?

The database size grew to 81M, and the real size seems close to that. But bzipped, it takes fewer MB than fr:
backup fr: 19M, for ~81M
backup pl: 13M, for ~100M

Maybe a bot put some stuff on pl. Is anyone from pl here?

At least it doesn't seem to be a bug.

Shaihulud


engelsAG at t-online

Feb 17, 2004, 4:13 AM

Post #4 of 15 (1271 views)
Re: Wikistats [In reply to]

"Erik Zachte" <e.p.zachte [at] chello> schrieb:

> The Polish wikipedia tripled its number of internal links in only three
> months,
> other pl: indicators are more in line with previous figures.

Looking at it, I see the database size grows even faster (250% in December,
which is the month when the anomaly seems to have happened, against about
240% for the internal links). The number of words is also growing very fast
(almost 50% in the given month). The number of articles also grew faster
than normal, but not as explosively (double the amount); instead, the
average article size more than doubled.

Links to other languages grew fast in January rather than December, so
it must have another reason.

> Does anyone have any idea why pl: links increased so dramatically?
> Huge number of link tables?

Given the above, and a normal, moderate growth in the number of active
Wikipedians, it looks like some kind of bot. Checking, what I find is
that the year pages on pl: have recently been augmented with a nice
table giving the ruler at that time in various countries. I doubt that
this is the reason for your findings, though.

Andre Engels


puglisi at arcetri

Feb 17, 2004, 4:43 AM

Post #5 of 15 (1273 views)
Re: Wikistats [In reply to]

On Tue, 17 Feb 2004, Andre Engels wrote:

>Given the above, and a normal, moderate growth in the number of active
>Wikipedians, it looks like some kind of bot. Checking, what I find is
>that the year pages on pl: have recently been augmented with a nice
>table giving the ruler at that time in various countries. Whether it is
>the reason for your finds, I doubt though.

The table+calendar for each year is around 18Kbytes and >100 links, so if
it has been added to 2000+ years, that's easily enough to account for the
increase in database size, and I suspect in words and links.
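
The estimate can be checked with a few lines of arithmetic. This is a sketch using only the figures stated in this message. Both the per-page values and the number of year pages are lower bounds or approximations ("around", ">100", "2000+"), so the results are rough, and using 1000 KB per MB is an assumption:

```python
KB_PER_YEAR_PAGE = 18       # "around 18Kbytes" per table+calendar
LINKS_PER_YEAR_PAGE = 100   # ">100 links", so a lower bound
YEAR_PAGES = 2000           # "2000+ years", also a lower bound

added_mb = KB_PER_YEAR_PAGE * YEAR_PAGES / 1000   # decimal megabytes
added_links = LINKS_PER_YEAR_PAGE * YEAR_PAGES

print(f"about {added_mb:.0f} MB of extra wikitext")
print(f"at least {added_links:,} extra internal links")
```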

Alfio


e.p.zachte at chello

Feb 17, 2004, 7:59 PM

Post #6 of 15 (1271 views)
Wikistats [In reply to]

Alfio wrote:

> The table+calendar for each year is around 18Kbytes and >100 links,
> so if it has been added to 2000+ years, that's easily enough to
> account for the increase in database size, and I suspect in words and
> links.

Right on the spot. I checked a random year:
http://pl.wikipedia.org/wiki/1880
The 7 weekday links occur 12 times each on the page, and other duplicates occur as well.

A bit wasteful in my view, but with the new server who cares about performance :)
Perhaps some other bot will tidy things up sometime.

Anyway, I adapted the stats script: each link will only be counted once per
article.

pl: number of internal links dropped from 598K to 373K (62% of the old count)
for comparison
nl: dropped from 312K to 292K (94% of the old count)
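
Counting each link once per article could look roughly like the sketch below. It is written in Python rather than the Perl of the real script, and the regular expression, the function name `count_links`, and the sample page are all assumptions made for illustration (the sample is not copied from the pl: 1880 page):

```python
import re

# Matches [[Target]] and [[Target|label]]; captures only the link target.
LINK_RE = re.compile(r"\[\[([^\]|#]+)")


def count_links(wikitext, unique=True):
    """Count internal links, optionally counting each target only once."""
    targets = [t.strip() for t in LINK_RE.findall(wikitext)]
    return len(set(targets)) if unique else len(targets)


# Hypothetical year page: two weekday links repeated 12 times, plus two others.
page = "[[poniedziałek|Pn]] [[wtorek|Wt]] " * 12 + "[[1879]] [[1881]]"
print(count_links(page, unique=False))  # every occurrence
print(count_links(page, unique=True))   # each target once

# Share of links remaining after the change, from the figures above.
pl_share = round(100 * 373 / 598)
nl_share = round(100 * 292 / 312)
print(pl_share, nl_share)
```

A wiki whose pages repeat the same targets many times loses far more links under this rule than one that does not, which matches the gap between the pl: and nl: figures.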

New Perl scripts will be ready for upload probably tomorrow.
They will also add some data on most active contributors:
edits in last 30 days, ranking now and 30 days ago.

---------

Camille/Shaihulud,

Did you produce the stats on your own PC, from downloaded dumps?
Brion used to do just that until a few months ago (I have trouble
downloading the largest dumps intact myself),
but in recent months he ran them directly on the server
(which is why they were not updated in recent weeks, by the way, with all
that server shuffling).

Downloading all dumps each week seems quite a hassle. But if Brion gave you
server access, that's fine with me:
one less monkey on his back. I just need to know whom to address for
occasional updates.

Erik Zachte


wikipedia at shaihome

Feb 18, 2004, 3:51 AM

Post #7 of 15 (1279 views)
Re: Wikistats [In reply to]

On Wed, 18 Feb 2004 03:59:40 +0100
"Erik Zachte" <e.p.zachte [at] chello> wrote:

> Alfio wrote:
>
> > The table+calendar for each year is around 18Kbytes and >100 links,
> > so if it has been added to 2000+ years, that's easily enough to
> > account for the increase in database size, and I suspect in words and
> > links.
>
> Right on the spot. I checked a random year:
> http://pl.wikipedia.org/wiki/1880
> 7 links for weekdays occur 12 times each on the page, other doubles occur.
>
> A bit wasteful in my view, but with new server who cares about performance
> :)
> Perhaps some other bot will tidy things up sometime.
>
> Anyway, I adapted the stats script: each link will only be counted once per
> article.
>
> pl: number of internal links dropped from 598K to 373K = 62%
> for comparison
> nl: dropped from 312K to 292K = 94%
>
> New Perl scripts will be ready for upload probably tomorrow.
> They will also add some data on most active contributors:
> edits in last 30 days, ranking now and 30 days ago.

cool

> ---------
>
> Camille/Shaihulud,
>
> Did you produce the stats on your own PC, from downloaded dumps?
> Brion used to do just that until a few months ago (I have trouble
> downloading the largest dumps intact myself)
> but in recent months ran them directly on the server,
> (which is why they were not updated in recent weeks, by the way, with all
> that server shuffling).

I run it directly on the NFS server, which is idle most of the time.

> Downloading all dumps each week seems quite a hassle. But if Brion gave you
> server access, fine with me,
> one less monkey on his back. I just need to know whom to address for
> occasional updates.

Yeah, send the new script to me; less work for Brion now :)

Thanks

Shaihulud


midom.lists at gmail

Aug 31, 2009, 2:03 AM

Post #8 of 15 (1276 views)
Re: Wikistats [In reply to]

Hello Anthony,

I'm back at my lair (phew, finally ;-)

> Regarding the files at http://dammit.lt/wikistats/ :
> What are "en.b", "en.d", "en2", etc?

suffixes indicate projects - from http://svn.wikimedia.org/viewvc/mediawiki/trunk/webstatscollector/filter.c?revision=34989&view=markup
:

projects[] = {
{"wikipedia","",NULL},
{"wiktionary",".d",NULL},
{"wikinews",".n",NULL},
{"wikimedia",".m",check_wikimedia},
{"wikibooks",".b",NULL},
{"wikisource",".s",NULL},
{"mediawiki",".w",NULL},
{"wikiversity",".v",NULL},
{"wikiquote",".q",NULL},
NULL
},

en2 is, um, http://en2.wikipedia.org/ ;-) it used to exist once upon a
time, and apparently there're some referrals.
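
The suffix table quoted above can be turned into a small lookup. This is a minimal sketch in Python (the collector itself is C). `SUFFIX_TO_PROJECT` and `split_project_code` are hypothetical names, and treating everything before the first dot as the language or subdomain prefix is an assumption, not confirmed behaviour of filter.c:

```python
# Suffixes copied from the projects[] table quoted above (leading dot omitted).
SUFFIX_TO_PROJECT = {
    "": "wikipedia",
    "d": "wiktionary",
    "n": "wikinews",
    "m": "wikimedia",
    "b": "wikibooks",
    "s": "wikisource",
    "w": "mediawiki",
    "v": "wikiversity",
    "q": "wikiquote",
}


def split_project_code(code):
    """Split a code such as "en.b" into (prefix, project name)."""
    prefix, _, suffix = code.partition(".")
    if suffix not in SUFFIX_TO_PROJECT:
        raise ValueError(f"unknown project suffix in {code!r}")
    return prefix, SUFFIX_TO_PROJECT[suffix]


for code in ("en", "en.b", "en.d", "en2"):
    print(code, "->", split_project_code(code))
```

Under this reading, "en2" has no suffix at all, so it is simply the Wikipedia project with the prefix "en2".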

> Are edits included, or only views?

That is views only - though you can find actual logic in above file,
it is mostly this pattern:

http://*.*.org/wiki/*

which is what we have for special pages and views.
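
As a rough illustration of that pattern, here is a sketch that treats it as a shell-style glob using Python's `fnmatch`. This is only an approximation: the real matching logic lives in filter.c and is not reproduced here, and `VIEW_PATTERN` and `looks_like_view` are hypothetical names:

```python
from fnmatch import fnmatchcase

# The pattern from the message above, interpreted as a shell-style glob.
VIEW_PATTERN = "http://*.*.org/wiki/*"


def looks_like_view(url):
    """Return True if the URL matches the /wiki/ view pattern."""
    return fnmatchcase(url, VIEW_PATTERN)


print(looks_like_view("http://en.wikipedia.org/wiki/Main_Page"))
print(looks_like_view("http://en.wikipedia.org/wiki/Special:Random"))
print(looks_like_view("http://en.wikipedia.org/w/index.php?title=Main_Page&action=edit"))
```

Note that a glob `*` also matches slashes and dots, so this check is looser than a proper URL parser would be.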

> Are the hit counts actual, or 1/10th sampled, or something else?

They are actual, with duplicates removed (that is, we don't count in
cache-to-cache traffic, only end-user-to-cache).

> pagecounts-20090501-200000.gz <http://dammit.lt/wikistats/pagecounts-20090501-200000.gz>
> is the hour *beginning* 20:00:00?

ending, I think. let me check, yes, end time. logic is in
produceDump() at http://svn.wikimedia.org/viewvc/mediawiki/trunk/webstatscollector/collector.c?revision=30113&view=markup
:)
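
Given that the timestamp in the file name is the end time, the interval a file covers can be derived as sketched below. `covered_interval` is a hypothetical helper, the one-hour window is an assumption based on the question being about "the hour", and no time zone is attached because none is stated in this thread:

```python
from datetime import datetime, timedelta


def covered_interval(filename):
    """Return (start, end) for a name like pagecounts-YYYYMMDD-HHMMSS.gz."""
    stem = filename[:-3] if filename.endswith(".gz") else filename
    _, date_part, time_part = stem.split("-")
    end = datetime.strptime(date_part + time_part, "%Y%m%d%H%M%S")
    # Assumption: each file covers exactly the hour before its timestamp.
    return end - timedelta(hours=1), end


start, end = covered_interval("pagecounts-20090501-200000.gz")
print(start, "to", end)
```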

I think I may end up documenting this somewhat more, but I need to do
some promised and long overdue development on this project.

Domas

_______________________________________________
Wikitech-l mailing list
Wikitech-l [at] lists
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


smolensk at eunet

Aug 31, 2009, 2:24 AM

Post #9 of 15 (1273 views)
Re: Wikistats [In reply to]

Domas Mituzas wrote:
> Hello Anthony,
>
> I'm back at my lair (phew, finally ;-)
>
>> Regarding the files at http://dammit.lt/wikistats/ :
>> What are "en.b", "en.d", "en2", etc?
>
> suffixes indicate projects - from http://svn.wikimedia.org/viewvc/mediawiki/trunk/webstatscollector/filter.c?revision=34989&view=markup
> :
>
> projects[] = {
> {"wikipedia","",NULL},
> {"wiktionary",".d",NULL},
> {"wikinews",".n",NULL},
> {"wikimedia",".m",check_wikimedia},
> {"wikibooks",".b",NULL},
> {"wikisource",".s",NULL},
> {"mediawiki",".w",NULL},
> {"wikiversity",".v",NULL},
> {"wikiquote",".q",NULL},
> NULL
> },
>
> en2 is, um, http://en2.wikipedia.org/ ;-) it used to exist once upon a
> time, and apparently there're some referrals.
>
>> Are edits included, or only views?
>
> That is views only - though you can find actual logic in above file,
> it is mostly this pattern:
>
> http://*.*.org/wiki/*
>
> which is what we have for special pages and views.
>
>> Are the hit counts actual, or 1/10th sampled, or something else?
>
> They are actual, with duplicates removed (that is, we don't count in
> cache-to-cache traffic, only end-user-to-cache).
>
>> pagecounts-20090501-200000.gz <http://dammit.lt/wikistats/pagecounts-20090501-200000.gz>
>> is the hour *beginning* 20:00:00?
>
> ending, I think. let me check, yes, end time. logic is in
> produceDump() at http://svn.wikimedia.org/viewvc/mediawiki/trunk/webstatscollector/collector.c?revision=30113&view=markup
> :)
>
> I think I may end up documenting this somewhat more, but I need to do
> some promised and long overdue development on this project.

If no one minds, I think I will copy this email to the toolserver wiki :)



Platonides at gmail

Aug 31, 2009, 3:06 AM

Post #10 of 15 (1276 views)
Re: Wikistats [In reply to]

Domas Mituzas wrote:
>> Are edits included, or only views?
>
> That is views only - though you can find actual logic in above file,
> it is mostly this pattern:
>
> http://*.*.org/wiki/*
>
> which is what we have for special pages and views.

However, note that after saving an edit, the editor will be sent to a view.




midom.lists at gmail

Aug 31, 2009, 3:13 AM

Post #11 of 15 (1273 views)
Re: Wikistats [In reply to]

Hi,
> However, note that after saving an edit, the editor will be sent to
> a view.

yes, you're absolutely right, but no differentiation is done on that.
technically, you're not editing, you're viewing :)

Domas



Platonides at gmail

Aug 31, 2009, 3:39 AM

Post #12 of 15 (1273 views)
Re: Wikistats [In reply to]

Domas Mituzas wrote:
> Hi,
>> However, note that after saving an edit, the editor will be sent to
>> a view.
>
> yes, you're absolutely right, but no differentiation is done on that.
> technically, you're not editing, you're viewing :)
>
> Domas

I know, but it's worth remembering for people who might want to do
some kind of edit differentiation.




andreengels at gmail

Aug 31, 2009, 3:51 AM

Post #13 of 15 (1272 views)
Re: Wikistats [In reply to]

On Mon, Aug 31, 2009 at 11:03 AM, Domas Mituzas<midom.lists [at] gmail> wrote:

> en2 is, um, http://en2.wikipedia.org/ ;-) it used to exist once upon a
> time, and apparently there're some referrals.

Wikimedia news, October 2003:
--
A portion of traffic to "www.wikipedia.org" will be diverted to
"en2.wikipedia.org", while most of it will go to "en.wikipedia.org",
where all logins will be directed. Until the server configuration is
more stable and transparent load-sharing is set up, this should help
share some of the traffic without burdening the other wikis too
greatly.
--

I think the reason that en got the lion's share is that en2 was on one
machine with the other languages whereas en was on a machine on its
own. At that time apparently en: still had significantly more traffic
than all other languages taken together.

--
André Engels, andreengels [at] gmail



midom.lists at gmail

Aug 31, 2009, 3:57 AM

Post #14 of 15 (1284 views)
Re: Wikistats [In reply to]

Andre,

> Wikimedia news, October 2003:

Thanks for that! Awesome artifact ;-)

Domas



brion at wikimedia

Aug 31, 2009, 4:12 AM

Post #15 of 15 (1270 views)
Re: Wikistats [In reply to]

On 8/31/09 7:51 AM, Andre Engels wrote:
> On Mon, Aug 31, 2009 at 11:03 AM, Domas Mituzas<midom.lists [at] gmail> wrote:
>
>> en2 is, um, http://en2.wikipedia.org/ ;-) it used to exist once upon a
>> time, and apparently there're some referrals.
>
> Wikimedia news, October 2003:
> --
> A portion of traffic to "www.wikipedia.org" will be diverted to
> "en2.wikipedia.org", while most of it will go to "en.wikipedia.org",
> where all logins will be directed. Until the server configuration is
> more stable and transparent load-sharing is set up, this should help
> share some of the traffic without burdening the other wikis too
> greatly.
> --
>
> I think the reason that en got the lion's share is that en2 was on one
> machine with the other languages whereas en was on a machine on its
> own. At that time apparently en: still had significantly more traffic
> than all other languages taken together.

Ah, the good old days! Sure glad we figured out Squid soon after that... ;)

-- brion

