
Mailing List Archive: Wikipedia: Wikitech

New statistics stuff

 

 



midom.lists at gmail

May 17, 2008, 4:29 AM

New statistics stuff

Helloes,

There are a few new things at http://dammit.lt/wikistats/

1) All projects are included. Non-Wikipedia projects will have a suffix
in the raw data. Suffixes are pretty much self-explanatory (haha).

wiktionary: .d
wikinews: .n
wikimedia: .m (meta, commons et al)
wikibooks: .b
wikisource: .s
mediawiki: .w
wikiversity: .v
wikiquote: .q
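
A minimal sketch of how a consumer of the raw data might decode these
suffixes. It assumes project codes look like "en" or "en.d" (prefix, dot,
suffix) and that a code without a suffix means Wikipedia; the function
name split_project is made up for illustration.

```python
# Suffix -> project family, copied from the list above.
SUFFIXES = {
    "d": "wiktionary",
    "n": "wikinews",
    "m": "wikimedia",
    "b": "wikibooks",
    "s": "wikisource",
    "w": "mediawiki",
    "v": "wikiversity",
    "q": "wikiquote",
}


def split_project(code):
    """Split a project code such as 'en.d' into (prefix, project family).

    Assumption: a code without a dot (e.g. 'en') is a Wikipedia.
    """
    prefix, dot, suffix = code.partition(".")
    if not dot:
        return prefix, "wikipedia"
    # Unknown suffixes are reported as such rather than guessed.
    return prefix, SUFFIXES.get(suffix, "unknown")
```

For example, split_project("de.d") gives ("de", "wiktionary").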

2) For lazy people there will be daily packages, which will:
- Come as a single .tgz archive with per-project files inside (no more
splitting!)
- Um, be aggregated daily instead of hourly
- Leave out pages with a low number of reads (a page needs at least
10 daily visits to be included)
- Generally be much, much smaller (enwiki daily compressed
filtered data is just 5MB)
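
The daily aggregation and the 10-visit cutoff can be sketched roughly as
below. This is not the actual build script. It assumes each hourly line is
"<project> <title> <count> <bytes>" separated by whitespace, and the name
aggregate_daily is hypothetical.

```python
from collections import Counter

MIN_DAILY_VIEWS = 10  # cutoff stated in the post


def aggregate_daily(hourly_lines, min_views=MIN_DAILY_VIEWS):
    """Sum hourly counts per (project, title) and drop rarely read pages.

    Assumed line format: "<project> <title> <count> <bytes>".
    """
    views = Counter()
    for line in hourly_lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip malformed lines
        project, title, count, _bytes = parts
        try:
            views[(project, title)] += int(count)
        except ValueError:
            continue  # skip lines whose count is not a number
    # Keep only pages that reached the daily threshold.
    return {key: n for key, n in views.items() if n >= min_views}
```

Feeding it the lines of all 24 hourly files for one day would give the
per-page daily totals that pass the filter.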

For now the build process will go back just a week, but the archive
may grow over time.
This will also reduce the hourly data retention (unless archive.org
or someone else wishes to archive everything).

I'll also be upgrading my box (or maybe moving to the new
shiny stats server we may get some day :) - because it takes an hour to
actually process the data on my 3-year-old flake :)

3) The second number is now actually bytes, in case anyone is interested :)
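
A small sketch of reading both numbers from one line, assuming the layout
"<project> <title> <views> <bytes>". The field names and the sample values
in the usage note are made up.

```python
from typing import NamedTuple


class PageCount(NamedTuple):
    project: str
    title: str
    views: int       # first number: request count
    bytes_sent: int  # second number: bytes, per the post


def parse_line(line):
    """Parse one whitespace-separated stats line (assumed 4 fields)."""
    project, title, views, bytes_sent = line.split()
    return PageCount(project, title, int(views), int(bytes_sent))
```

With an invented line such as "en Main_Page 42 123456", the result has
views 42 and bytes_sent 123456.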

I've been getting various feedback lately from the non-wiki world, where
people use this data for popularity ranking of various bits.


BR,
--
Domas Mituzas -- http://dammit.lt/ -- [[user:midom]]

_______________________________________________
Wikitech-l mailing list
Wikitech-l[at]lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


bryan.tongminh at gmail

May 17, 2008, 6:59 AM

Re: New statistics stuff [In reply to]

On Sat, May 17, 2008 at 1:29 PM, Domas Mituzas <midom.lists[at]gmail.com> wrote:
> (...)

Nice to see them all! Search statistics would also be nice.
Currently only Special:Search/* can be found, whereas most
searches go through index.php?title=Special:Search&search=xyz or
Special:Search?search=xyz.

Bryan



Platonides at gmail

May 17, 2008, 2:28 PM

Re: New statistics stuff [In reply to]

Domas Mituzas wrote:
> Helloes,
>
> There're few new things at http://dammit.lt/wikistats/

> (...)

Good :)

> 3) Second number is now actually bytes, in case anyone is interested :)
I don't think so. The second number is always the same as the first one,
so something is wrong there. It's not your script, though, as the numbers
are the same in the pagecounts files too.



Wikistats highlights some rather bizarre IE activity: grep for
IE60Fixes.css or shared.css.
I didn't manage to reproduce it, though.


Also, some article names end in &action=edit, &action=print,
&action=history, &redlink=1...
I guess the filterer has some problems extracting titles from queries to
index.php




midom.lists at gmail

May 17, 2008, 3:28 PM

Re: New statistics stuff [In reply to]

Hi!

> I don't think so. The second number is always the same as the first one,
> so something is wrong there. Although not on your script, as the numbers
> are the same on the pagecounts files too.

Oh, the numbers are different now. My script was being lazy, but
the new one has them all.

> Wikistats highlight some rather bizarre IE activity: grep for
> IE60Fixes.css or shared.css
> Didn't manage reproduce when it happens, though.

It's just buggered JavaScript somewhere; we never reproduced it, but
it is constant :)

> Also, some article names end in &action=edit, &action=print,
> &action=history, &redlink=1...
> I guess the filterer has some problems extracting titles from queries to
> index.php

No, it doesn't have any problems: a question mark terminates the title,
and only /wiki/Blah titles are accepted.
If anything wrong shows up there, it is because the request already
arrives at our servers wrong. :)
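
The rule described here can be sketched as follows. This is an
illustration of the stated behaviour, not the real filter code; the
function name extract_title is made up.

```python
def extract_title(request_path):
    """Return the title from a /wiki/Title request path, or None.

    Rules taken from the message: only /wiki/... paths are accepted,
    and a question mark terminates the title.
    """
    prefix = "/wiki/"
    if not request_path.startswith(prefix):
        return None  # e.g. /w/index.php?title=... is not counted
    title = request_path[len(prefix):].split("?", 1)[0]
    return title or None
```

Under these rules a request for /wiki/Blah?action=edit counts as Blah,
while a malformed request for /wiki/Blah&action=edit (no question mark)
is counted under the title Blah&action=edit, which would explain the odd
names in the data.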

--
Domas Mituzas -- http://dammit.lt/ -- [[user:midom]]


