
Mailing List Archive: Wikipedia: Wikitech

Squid status codes, please advise

 

 



erikzachte at infodisiac

Oct 11, 2009, 12:28 PM

Post #1 of 6
Squid status codes, please advise

The idea is to select edit and submit calls that are relevant to the
usability project and track the edit/save ratio of the filtered calls over
time. Bots will be filtered out, "action=edit&redlink=1,.." calls will be
discarded (95% of them are inadvertent edit calls), and a few more
categories will be excluded.
I would appreciate help in decoding the most frequent Squid/HTTP statuses:

Here are the relevant HTTP status codes from the Squid FAQ:
http://wiki.squid-cache.org/SquidFaq/SquidLogs
See also http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html

000 Used mostly with UDP traffic.
200 OK
206 Partial Content
301 Moved Permanently
302 Moved Temporarily
400 Bad Request
403 Forbidden
404 Not Found
[417 Expectation Failed]
500 Internal Server Error
502 Bad Gateway
503 Service Unavailable
504 Gateway Timeout
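To decode the combined fields in the frequency counts below, a small lookup can split a value such as TCP_MISS/302 into the Squid result and the HTTP code. This is a minimal Python sketch: the table simply copies the codes listed above, and the function name `decode_status` is hypothetical.

```python
from typing import Tuple

# HTTP status codes as listed above (Squid FAQ / RFC 2616).
HTTP_REASONS = {
    0: "No HTTP status (Squid logs 000 mostly for UDP traffic)",
    200: "OK",
    206: "Partial Content",
    301: "Moved Permanently",
    302: "Moved Temporarily",
    400: "Bad Request",
    403: "Forbidden",
    404: "Not Found",
    417: "Expectation Failed",
    500: "Internal Server Error",
    502: "Bad Gateway",
    503: "Service Unavailable",
    504: "Gateway Timeout",
}


def decode_status(field: str) -> Tuple[str, int, str]:
    """Split a Squid log field such as 'TCP_MISS/302' into
    (squid_result, http_code, reason)."""
    squid_result, _, code_text = field.partition("/")
    code = int(code_text)  # "000" becomes 0
    return squid_result, code, HTTP_REASONS.get(code, "unknown")


if __name__ == "__main__":
    for field in ("TCP_DENIED/403", "TCP_MISS/302", "TCP_MISS/000"):
        print(field, "->", decode_status(field))
```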

The following are the frequencies of index.php result codes in the
1:1000 sampled Squid logs covering just over 6 months:

TCP_DENIED/403,action=edit 321390
TCP_DENIED/403,action=submit 33

TCP_MISS/000,action=edit 7352
TCP_MISS/000,action=submit 1186

TCP_MISS/200,action=edit 800200
TCP_MISS/200,action=submit 75768

TCP_MISS/206,action=edit 20
TCP_MISS/206,action=submit 269

TCP_MISS/301,action=edit 662

TCP_MISS/302,action=edit 184217
TCP_MISS/302,action=submit 116141

TCP_MISS/400,action=edit 6
TCP_MISS/403,action=edit 2746
TCP_MISS/404,action=edit 119
TCP_MISS/404,action=submit 206
TCP_MISS/417,action=edit 53
TCP_MISS/417,action=submit 716

TCP_MISS/500,action=edit 362
TCP_MISS/500,action=submit 81
TCP_MISS/502,action=submit 87
TCP_MISS/503,action=edit 7
TCP_MISS/503,action=submit 5878
TCP_MISS/504,action=edit 53
TCP_MISS/504,action=submit 91
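Counts in the form above can be produced with a simple tally. The post does not show the raw log format, so this Python sketch assumes the Squid status field and the request URL have already been extracted into (status, URL) pairs; `tally` and that pair format are hypothetical.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def tally(records):
    """Count 'STATUS,action=ACTION' keys for index.php edit/submit requests.

    `records` is an iterable of (squid_status, url) pairs, e.g.
    ("TCP_MISS/200", "http://en.wikipedia.org/w/index.php?title=X&action=edit").
    """
    counts = Counter()
    for status, url in records:
        parts = urlsplit(url)
        if not parts.path.endswith("/index.php"):
            continue  # skip page views and other entry points
        action = parse_qs(parts.query).get("action", [None])[0]
        if action in ("edit", "submit"):
            counts[f"{status},action={action}"] += 1
    return counts
```

Because the logs are sampled 1:1000, each counted line stands for roughly 1,000 real requests.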

Of these, the most significant, given their range and/or frequency, are:

TCP_DENIED/403,action=edit 321390
TCP_MISS/000,action=edit 7352
TCP_MISS/200,action=edit 800200
TCP_MISS/302,action=edit 184217

TCP_MISS/000,action=submit 1186
TCP_MISS/200,action=submit 75768
TCP_MISS/302,action=submit 116141
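Assuming, as question B below suggests, that TCP_MISS/200 on action=edit is a loaded edit form and TCP_MISS/302 on action=submit is a save, the edit/save ratio from these numbers is a single division. A Python sketch; the function name is hypothetical, and bots and redlink=1 calls have not been removed from these counts yet.

```python
# Unfiltered 1:1000 sampled counts copied from the list above.
counts = {
    "TCP_MISS/200,action=edit": 800200,
    "TCP_MISS/302,action=submit": 116141,
}


def edit_save_ratio(counts):
    """Edit-form loads per save, under the assumption stated above."""
    edits = counts.get("TCP_MISS/200,action=edit", 0)
    saves = counts.get("TCP_MISS/302,action=submit", 0)
    return edits / saves if saves else float("nan")


print(f"{edit_save_ratio(counts):.2f}")  # prints 6.89
```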

Specific questions:

A
Any idea why there are so many TCP_DENIED/403 responses? Are these really failures?

B
For action=submit, the difference between preview and save shows up in the
result codes, right?
I understood earlier that TCP_MISS/302 is a successful save, right?
Does that mean TCP_MISS/200 is a preview?

C
For action=edit, how should /200 vs. /302 be interpreted?

D (minor)
Are the TCP_MISS/000 entries indeed (invalid) UDP messages?

Erik Zachte

BTW, for all Squid status codes from Wikimedia servers, see
http://stats.wikimedia.org/wikimedia/squids/SquidReportMethods.htm





_______________________________________________
Wikitech-l mailing list
Wikitech-l [at] lists
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


rarohde at gmail

Oct 11, 2009, 5:12 PM

Post #2 of 6
Re: Squid status codes, please advise

On Sun, Oct 11, 2009 at 12:28 PM, Erik Zachte <erikzachte [at] infodisiac> wrote:
<snip>

> A
> Any idea why there are so many TCP_DENIED/403, are these really failures ?

TCP_DENIED is usually used for requests that the Squid is configured
to reject at the ACL level without even attempting to contact upstream
servers.

I'm not sure where the squid configuration files for Wikimedia
actually live. Hopefully someone who does know will be able to give
you a precise answer to your question. However, a logical guess would
be that the Squid is configured to reject action=edit requests from
search engine spiders and similar non-human processes. Since such
things are not easily incorporated into robots.txt, blocking at the
squid layer would be a good option for stopping such traffic from
hitting the main servers. That would be my guess. I suspect others
can give a more concrete answer.

> B
> For action=submit the difference between preview and save is in the result
> codes right ?
> I understood earlier that TCP_MISS/302 is a successful save, right ?

Typically.

> Does that mean TCP_MISS/200 is preview ?

Preview, show changes, and aborted saves (e.g., saves stopped by edit
conflicts and similar problems).

> C
> For action=edit how to interpret /200 vs /302 ?

I don't know when action=edit would give a 302. It is obviously very
common, but my attempts to guess where it would come up have failed.
If you can grab some examples of URLs generating the 302 response it
might become clear quickly.

> D (minor)
> Are TCP/000 indeed (invalid) UDP messages ?

No idea.

-Robert Rohde



Simetrical+wikilist at gmail

Oct 11, 2009, 6:03 PM

Post #3 of 6
Re: Squid status codes, please advise

On Sun, Oct 11, 2009 at 3:28 PM, Erik Zachte <erikzachte [at] infodisiac> wrote:
> A
> Any idea why there are so many TCP_DENIED/403, are these really failures ?

Certain types of requests are blocked at the Squid level for various
reasons. For instance, try fetching Wikipedia with wget; you'll get a
403 because the default User-Agent (UA) headers of such tools are blocked. (You're
supposed to use a custom UA header, preferably with contact info, to
make your script distinctive and easily blockable by itself if there's
a problem.) Similarly, try something like this:

http://en.wikipedia.org/&amp;

I assume this kind of thing is what causes those responses.
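Setting a descriptive User-Agent in a script takes one header. A minimal Python sketch using the standard library's urllib; the tool name and contact address are placeholders, and whether a particular default UA is actually blocked is not asserted here.

```python
from urllib.request import Request

# Placeholder tool name and contact details: replace with your own.
USER_AGENT = "ExampleStatsScript/0.1 (https://example.org/contact; someone@example.org)"


def build_request(url: str) -> Request:
    """Build a request that identifies the script with a custom
    User-Agent instead of the library default ('Python-urllib/3.x')."""
    return Request(url, headers={"User-Agent": USER_AGENT})


if __name__ == "__main__":
    req = build_request("http://en.wikipedia.org/wiki/Main_Page")
    # urllib stores header names via str.capitalize(), hence "User-agent".
    print(req.get_header("User-agent"))
```

Pass the returned request to urllib.request.urlopen() to perform the actual fetch.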

On Sun, Oct 11, 2009 at 8:12 PM, Robert Rohde <rarohde [at] gmail> wrote:
> However, a logical guess would
> be if the Squid is configured to reject action=edit requests from
> search engine spiders and similar non-human processes. Since such
> things are not easily incorporated into robots.txt, blocking at the
> squid layer would be a good option for stopping such traffic from
> hitting the main servers. That would be my guess. I suspect others
> can give a more concrete answer.

Those URLs are all blocked in robots.txt:

User-agent: *
Disallow: /w/

That's part of why we use long URLs for everything but page views, so
that they can be neatly blocked from spiders.
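Python's standard urllib.robotparser can confirm how the two quoted robots.txt lines treat short page-view URLs versus long /w/ URLs. This sketch uses only those two lines, not the full Wikimedia robots.txt, and only well-behaved spiders honor the result.

```python
from urllib.robotparser import RobotFileParser

# Only the two robots.txt lines quoted above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /w/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Prints True for the /wiki/ page view and False for the /w/ edit URL.
for url in (
    "http://en.wikipedia.org/wiki/Example",
    "http://en.wikipedia.org/w/index.php?title=Example&action=edit",
):
    print(parser.can_fetch("*", url), url)
```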



rarohde at gmail

Oct 11, 2009, 6:11 PM

Post #4 of 6
Re: Squid status codes, please advise

On Sun, Oct 11, 2009 at 6:03 PM, Aryeh Gregor
<Simetrical+wikilist [at] gmail> wrote:
> On Sun, Oct 11, 2009 at 3:28 PM, Erik Zachte <erikzachte [at] infodisiac> wrote:
>> A
>> Any idea why there are so many TCP_DENIED/403, are these really failures ?
>
> Certain types of requests are blocked at the Squid level for various
> reasons.  For instance, try wgetting Wikipedia; you'll get a 403
> because the default UA headers for such things are blocked.  (You're
> supposed to use a custom UA header, preferably with contact info, to
> make your script distinctive and easily blockable by itself if there's
> a problem.)  Similarly, try something like this:
>
> http://en.wikipedia.org/&amp;
>
> I assume this kind of thing is what causes those responses.

Actually, wget isn't blocked for either page views or action=edit,
based on a test I ran a minute ago.

> On Sun, Oct 11, 2009 at 8:12 PM, Robert Rohde <rarohde [at] gmail> wrote:
>> However, a logical guess would
>> be if the Squid is configured to reject action=edit requests from
>> search engine spiders and similar non-human processes.  Since such
>> things are not easily incorporated into robots.txt, blocking at the
>> squid layer would be a good option for stopping such traffic from
>> hitting the main servers.  That would be my guess.  I suspect others
>> can give a more concrete answer.
>
> Those things are all blocked in robots.txt:
>
> User-agent: *
> Disallow: /w/
>
> That's part of why we use long URLs for everything but page views, so
> that they can be neatly blocked from spiders.

Excellent point, though I wouldn't be surprised to find that some
disrespectful spiders and bots are also blocked at the squid level.

-Robert Rohde



roan.kattouw at gmail

Oct 12, 2009, 3:30 AM

Post #5 of 6
Re: Squid status codes, please advise

2009/10/12 Robert Rohde <rarohde [at] gmail>:
>> B
>> For action=submit the difference between preview and save is in the result
>> codes right ?
>> I understood earlier that TCP_MISS/302 is a successful save, right ?
>
Upon a successful save, action=submit uses a 302 to redirect to the
page view of the newly updated/created article. So typically, a
successful request to /w/index.php?title=Example&action=submit will
redirect to /wiki/Example using a 302.
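The mapping described here, from a submit URL to the page view that a successful save redirects to, can be sketched in Python. This assumes the /wiki/ article path from the example and uses a simplified title encoding (spaces to underscores, then percent-encoding) that may differ from MediaWiki's own for some characters; `expected_redirect` is a hypothetical helper.

```python
from urllib.parse import parse_qs, quote, urlsplit


def expected_redirect(submit_url: str) -> str:
    """Return the page-view path that a successful action=submit
    request is expected to 302-redirect to (simplified encoding)."""
    query = parse_qs(urlsplit(submit_url).query)
    title = query["title"][0].replace(" ", "_")  # parse_qs turns '+' into ' '
    return "/wiki/" + quote(title, safe="/:")


print(expected_redirect("/w/index.php?title=Example&action=submit"))
```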

>> For action=edit how to interpret /200 vs /302 ?
>
> I don't know when action=edit would give a 302.  It is obviously very
> common, but my attempts to guess where it would come up have failed.
> If you can grab some examples of URLs generating the 302 response it
> might become clear quickly.
>
URLs with &redlink=1 redirect to the page view with a 302.
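Since &redlink=1 accounts for the 302s on action=edit, the planned discarding of those calls can be done with a small URL filter. A Python sketch; `is_redlink_edit` is a hypothetical name.

```python
from urllib.parse import parse_qs, urlsplit


def is_redlink_edit(url: str) -> bool:
    """True for action=edit requests that carry redlink=1."""
    query = parse_qs(urlsplit(url).query)
    return (query.get("action", [""])[0] == "edit"
            and query.get("redlink", [""])[0] == "1")


urls = [
    "/w/index.php?title=Foo&action=edit&redlink=1",
    "/w/index.php?title=Foo&action=edit",
]
# Keep only edit calls that did not come from a red link.
print([u for u in urls if not is_redlink_edit(u)])
```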

Roan Kattouw (Catrope)



midom.lists at gmail

Oct 12, 2009, 6:15 AM

Post #6 of 6
Re: Squid status codes, please advise

Hi!

> Any idea why there are so many TCP_DENIED/403, are these really
> failures ?

99% of TCP_DENIED requests for action=edit have &amp; in the URL
(broken clients).
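To check this against your own sample, count how many TCP_DENIED URLs contain a literal &amp;. A minimal Python sketch with hypothetical helper names.

```python
def has_escaped_ampersand(url: str) -> bool:
    """True if the URL contains a literal '&amp;', i.e. an HTML-escaped
    ampersand that the client failed to unescape."""
    return "&amp;" in url


def share_with_escaped_ampersand(urls) -> float:
    """Fraction of the given URLs that contain a literal '&amp;'."""
    urls = list(urls)
    if not urls:
        return 0.0
    return sum(has_escaped_ampersand(u) for u in urls) / len(urls)
```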

> B
> For action=submit the difference between preview and save is in the
> result
> codes right ?
> I understood earlier that TCP_MISS/302 is a successful save, right ?
> Does that mean TCP_MISS/200 is preview ?

TCP_MISS/200 can be for previews, edit conflicts, filter actions,
anything that adds more steps to a save, etc.
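Taken together, the replies suggest a rough classification for action=submit results. A Python sketch, assuming the status field looks like TCP_MISS/302 as in the counts earlier in the thread; the function name and labels are hypothetical, and a 302 is only typically a save.

```python
def classify_submit(status_field: str) -> str:
    """Roughly interpret a Squid status field for an action=submit request."""
    code = int(status_field.partition("/")[2])
    if code == 302:
        # Typically a successful save redirecting to the page view.
        return "save"
    if code == 200:
        # Preview, show changes, edit conflict, filter action, or anything
        # else that adds a step before the save.
        return "not saved"
    # Errors (403, 5xx, ...) and 000 entries.
    return "other"


for field in ("TCP_MISS/302", "TCP_MISS/200", "TCP_MISS/503"):
    print(field, classify_submit(field))
```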

> C
> For action=edit how to interpret /200 vs /302 ?

&redlink=1

> D (minor)
> Are TCP/000 indeed (invalid) UDP messages ?

Nope, must be something else.

Domas

