
Mailing List Archive: Wikipedia: Wikitech
Re: Page views

lars at aronsson

Apr 11, 2012, 3:31 AM



On 04/11/2012 01:45 AM, Erik Zachte wrote:
> Here are some numbers on total bot burden:
>
> 1)
> http://stats.wikimedia.org/wikimedia/squids/SquidReportCrawlers.htm states
> for March 2012:
>
> In total 69.5 M page requests (mime type text/html only!) per day are
> considered crawler requests, out of 696 M page requests (10.0%) or 469 M
> external page requests (14.8%). About half (35.1 M) of crawler requests come
> from Google.
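The percentages in the quoted report follow directly from its daily totals. The Python check below uses only the March 2012 figures quoted above, in millions of text/html requests per day. The helper name `share` is just for illustration.

```python
def share(part: float, whole: float) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


# Daily March 2012 figures quoted above, in millions of text/html requests.
crawler = 69.5    # requests classified as crawler requests
total = 696.0     # all page requests
external = 469.0  # external page requests
google = 35.1     # crawler requests coming from Google

print(f"crawler share of all requests:      {share(crawler, total):.1f}%")
print(f"crawler share of external requests: {share(crawler, external):.1f}%")
print(f"Google share of crawler requests:   {share(google, crawler):.1f}%")
```

It prints 10.0%, 14.8% and 50.5%, which matches the report's "10.0%", "14.8%" and "about half".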

The fraction will be larger than average (larger than 10%) for
a) sites with many small pages (Wiktionary) and
b) sites in languages with a smaller audience (Swedish sites).
Bots will index these pages as they are found, but each
of these pages can expect fewer search hits and less human
traffic than long articles (Wikipedia) in languages with many
speakers (English). The bot traffic is like a constant
background noise, and the human traffic is the signal on top.
Sites with many small pages and a small audience will have
a lower signal-to-noise ratio. The long tail of seldom-visited
pages is drowning in that noise.
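The background-noise argument can be written as a toy model. It assumes that crawlers request every page at roughly the same rate, so bot views grow with page count, while human views depend on audience size. The function names, crawl rate and page-view numbers below are hypothetical and only illustrate the shape of the argument. They are not measurements.

```python
def bot_share(pages: int, crawls_per_page: float, human_views: float) -> float:
    """Fraction of a site's daily page views that come from crawlers.

    Hypothetical model: crawlers request each page at a roughly constant
    rate (the background noise), so bot views grow with the page count,
    while human views (the signal) depend on audience size.
    """
    bot_views = pages * crawls_per_page
    return bot_views / (bot_views + human_views)


def signal_to_noise(pages: int, crawls_per_page: float, human_views: float) -> float:
    """Ratio of human page views to crawler page views under the same model."""
    return human_views / (pages * crawls_per_page)


# Illustrative numbers only: same page count and crawl rate,
# but audiences that differ by a factor of 100.
large_audience = bot_share(pages=1_000_000, crawls_per_page=0.1, human_views=10_000_000)
small_audience = bot_share(pages=1_000_000, crawls_per_page=0.1, human_views=100_000)
print(f"large audience: {large_audience:.1%} bots")
print(f"small audience: {small_audience:.1%} bots")
```

Under these assumptions, the site with the small audience sees half of its page views come from bots. Adding more small pages for the same audience pushes that share higher still.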

I should disclose that I "work for the competition". I tried
to add books to Wikisource, but its complexity slowed me down,
so I'm now focusing on my own Scandinavian book scanning
website Project Runeberg, http://runeberg.org/

It has 700,000 scanned book pages, the same size as the
English Wikisource, which is a large number of pages for
a small language audience (mostly Swedish). Yesterday,
April 10, its Apache access log had 291,000 hits. Of these,
116,000 were HTML pages, and 71,000 of those matched
bot/spider/crawler, leaving only 45,000 human page views.
Swedish Wikisource is 1/20 that size. I'd be surprised if it
got 10-13 thousand human page views per day, or 1/4 of that
web traffic. It is more likely that bots account for a
similar share, 71/116 = 61%.
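A count like this can be reproduced from an access log. The Python sketch below assumes Apache's "combined" log format and treats a request as bot traffic when its User-Agent contains "bot", "spider" or "crawler". The case-insensitive match is an assumption. The `is_html_page` rule, which counts directory URLs and .html/.htm files as pages, is a hypothetical heuristic, and the sample log lines are made up for illustration.

```python
import re
from collections import Counter

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)
# Assumption: a case-insensitive substring match on the User-Agent field.
BOT_RE = re.compile(r"bot|spider|crawler", re.IGNORECASE)


def is_html_page(path: str) -> bool:
    """Hypothetical heuristic: directory URLs and .html/.htm files are pages."""
    path = path.split("?", 1)[0]
    return path.endswith("/") or path.endswith((".html", ".htm"))


def classify(lines):
    """Count all hits, HTML page hits, and bot vs human HTML page hits."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            counts["unparsed"] += 1
            continue
        counts["hits"] += 1
        parts = m["request"].split()  # e.g. ["GET", "/path", "HTTP/1.1"]
        if len(parts) < 2 or not is_html_page(parts[1]):
            continue  # images, stylesheets, malformed requests, ...
        counts["html"] += 1
        if BOT_RE.search(m["agent"]):
            counts["bot_html"] += 1
        else:
            counts["human_html"] += 1
    return counts


# Made-up sample lines (documentation IP ranges, hypothetical paths).
sample = [
    '198.51.100.1 - - [10/Apr/2012:06:25:24 +0200] "GET /book/ HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.7 - - [10/Apr/2012:06:25:30 +0200] "GET /book/0001.html HTTP/1.1" 200 4096 '
    '"-" "Mozilla/5.0 (Windows NT 6.1; rv:11.0) Gecko/20100101 Firefox/11.0"',
    '192.0.2.7 - - [10/Apr/2012:06:25:31 +0200] "GET /img/logo.gif HTTP/1.1" 200 812 '
    '"-" "Mozilla/5.0 (Windows NT 6.1; rv:11.0) Gecko/20100101 Firefox/11.0"',
]

counts = classify(sample)
print(dict(counts))
print(f"bot share of HTML pages: {counts['bot_html'] / counts['html']:.0%}")
```

To run it on a real log, pass `open("access.log")` instead of `sample`. The bot share is then `bot_html / html`, the same ratio as 71/116 above.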

(Are we competitors? Really not. We're both liberating
content. Swedish Wikipedia has more external links
to runeberg.org than to any other website.)


--
Lars Aronsson (lars [at] aronsson)
Aronsson Datateknik - http://aronsson.se

Project Runeberg - free Nordic literature - http://runeberg.org/


_______________________________________________
Wikitech-l mailing list
Wikitech-l [at] lists
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
