
Mailing List Archive: Wikipedia: Wikitech

[Toolserver-l] Archive of visitor stats

 

 



erikzachte at infodisiac

Sep 18, 2009, 7:45 PM


> Making the script aware of namespace names would be quite easy.

Yes, it is more a matter of priority than feasibility.

I already use localized namespace names in wikistats, obviously.
Without those the dumps can't be interpreted.
Each full XML archive dump starts with a list of localized namespace names.
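
As an illustration of reading that list, here is a minimal Python sketch that
pulls the namespace names out of the header of a dump. It assumes the
namespaces appear as <namespace key="..."> elements inside <siteinfo>; the
function name read_namespaces and the miniature SAMPLE header (including its
schema URI) are made up for the example, and this is not the code wikistats
itself uses.

```python
import io
import xml.etree.ElementTree as ET

def read_namespaces(stream):
    """Return {namespace key: localized name} from the <siteinfo> header."""
    namespaces = {}
    for _event, elem in ET.iterparse(stream, events=("end",)):
        # Drop the XML namespace prefix ("{uri}tag" -> "tag"), if present.
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag == "namespace":
            # The main namespace (key 0) has no name, so its text is empty.
            namespaces[int(elem.get("key"))] = elem.text or ""
        elif tag == "siteinfo":
            break  # header is complete; no need to read any pages
    return namespaces

# Hypothetical, heavily shortened dump header (Swedish namespace names).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <siteinfo>
    <sitename>Wikipedia</sitename>
    <namespaces>
      <namespace key="-1">Special</namespace>
      <namespace key="0" />
      <namespace key="1">Diskussion</namespace>
      <namespace key="10">Mall</namespace>
      <namespace key="14">Kategori</namespace>
    </namespaces>
  </siteinfo>
</mediawiki>"""

print(read_namespaces(io.BytesIO(SAMPLE)))
```

Because the loop stops at the end of <siteinfo>, only the first few kilobytes
of even a very large dump need to be read.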

I also parse PHP files for localizations of reserved words like #REDIRECT,
parse other PHP files for language name translations,
and extract many more language name translations from wp:en interwiki links
via the API.

But every such action takes time, needs safeguards (files can be moved or
be temporarily inaccessible, and formats change, maybe not in XML, but in
PHP for sure) and requires occasional attention for maintenance.

So for a housekeeping job that almost no one seemed to care about at the
time, I just chose to keep it simple (this particular optimization can
always be retrofitted).

If we find a better place to store them than on the wikistats server, we
might be able to store them unfiltered, but still condensed into one daily
file, as this speeds up processing greatly, or maybe repackaged into a
monthly file per wiki.

Erik Zachte




_______________________________________________
Wikitech-l mailing list
Wikitech-l[at]lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


lars at aronsson

Sep 19, 2009, 6:38 AM

Re: [Toolserver-l] Archive of visitor stats

Earlier, I wrote:

> Are visitor stats (as produced by Domas) safely archived
> somewhere...?

As an experiment, I uploaded the files for December 2007 to the
Internet Archive,
http://www.archive.org/details/wikipedia_visitor_stats_200712

It was the first time I uploaded something to IA, and since this
was not sound or movies, it was put under "opensource books".
Even though I have a 100 Mbit/s connection, the FTP upload only
got 2.5 Mbit/s (317 kB/s) and the entire upload took 12 hours.

Even though the pagecounts files (each covering one hour) are
compressed, each one contains the same dictionary (article titles)
and I think the total could be more efficiently compressed
(without loss of any information) if they were unpacked and
organized differently. I don't really have the time and energy to
investigate this.
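
One possible reorganization, sketched in Python under stated assumptions: the
hourly files are taken to hold lines of the form "project title count bytes",
and the merged layout (one line per title, with hour:count:bytes entries) is
invented for this example. Each title is then stored once per batch instead of
once per hour. Whether this actually compresses better has not been measured
here.

```python
from collections import defaultdict

def merge_hourly(hourly_lines):
    """Merge hourly lines into one record per (project, title).

    hourly_lines maps an hour number to an iterable of lines in the assumed
    form 'project title count bytes'.
    Returns {(project, title): {hour: (count, bytes)}}.
    """
    merged = defaultdict(dict)
    for hour, lines in hourly_lines.items():
        for line in lines:
            parts = line.split()
            if len(parts) != 4:
                continue  # skip lines that do not match the assumed format
            project, title, count, nbytes = parts
            merged[(project, title)][hour] = (int(count), int(nbytes))
    return merged

def format_merged(merged):
    """Emit one line per title: 'project title hour:count:bytes,...'."""
    out = []
    for (project, title), per_hour in sorted(merged.items()):
        packed = ",".join(
            f"{hour}:{count}:{nbytes}"
            for hour, (count, nbytes) in sorted(per_hour.items())
        )
        out.append(f"{project} {title} {packed}")
    return out

# Hypothetical sample data for two hours.
sample = {
    0: ["en Main_Page 120 4096000", "sv Huvudsida 7 90000"],
    1: ["en Main_Page 95 3300000"],
}
for line in format_merged(merge_hourly(sample)):
    print(line)
```

Both the count and the byte column are kept, so the original hourly lines can
be regenerated from the merged form. Only the order of lines within each
hourly file is not preserved.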

Now I would feel less frustrated if these were removed from my
disk.

Should I continue to do this for the files for 2008, one batch per
month? Or do you have any better ideas?


--
Lars Aronsson (lars[at]aronsson.se)
Aronsson Datateknik - http://aronsson.se

