
Mailing List Archive: Wikipedia: Wikitech

Possible live mirroring, stealing of bandwidth

 

 



cbrown1023 at comcast

Jul 17, 2007, 10:33 AM

Post #1 of 7
Possible live mirroring, stealing of bandwidth

We recently received an OTRS tip
<https://secure.wikimedia.org/otrs/index.pl?Action=AgentTicketZoom&TicketID=984033>
that the site music.musictnt.com:

"...is likely querying WP in real-time to serve-up WP's content +/- as-is, if
not for the lack of a proper GFDL notice. Aside from the copyright
issues, this way of "Mirroring" is a drain on WP's bandwidth and server
resources."

The user also mentioned that there was no clear place to contact about this
type of problem. He now knows to contact Wikitech-l, but we may wish to
advertise that more. (But that is a separate discussion entirely.)

I have not investigated this matter at all; I am just relaying the message.

Thanks,
Casey Brown
Cbrown1023
_______________________________________________
Wikitech-l mailing list
Wikitech-l [at] lists
http://lists.wikimedia.org/mailman/listinfo/wikitech-l


en.wp.st47 at gmail

Jul 17, 2007, 10:43 AM

Post #2 of 7
Re: Possible live mirroring, stealing of bandwidth

I've seen a great many live mirrors, and the devs generally tell me
that whenever they block one, it springs up again from another IP, so
they don't bother. There is/was a page on meta
(http://meta.wikimedia.org/wiki/Live_mirrors) which is very inactive
and on which I don't see any comments from devs; besides, reporting en
masse is a hassle.

If there is another more active page, I'd love to know about it,
otherwise I can post here. An old list is at
http://en.wikipedia.org/wiki/User:ST47/Mirrors



--
ST47
Editor, en.wikipedia



thomas.dalton at gmail

Jul 17, 2007, 10:44 AM

Post #3 of 7
Re: Possible live mirroring, stealing of bandwidth

> The user also mentioned that there was no clear place to contact about this
> type of problem. He now knows to contact Wikitech-l, but we may wish to
> advertise that more. (But that is a separate discussion entirely.)

I thought the procedure was to report it to
http://meta.wikimedia.org/wiki/Live_mirrors



thomas.dalton at gmail

Jul 17, 2007, 10:45 AM

Post #4 of 7
Re: Possible live mirroring, stealing of bandwidth

> I thought the procedure was to report it to
> http://meta.wikimedia.org/wiki/Live_mirrors

PS In fact, that page explicitly says *not* to report them to this
mailing list. We definitely need to get our message straight.



mjvqsl at gmail

Jul 17, 2007, 9:14 PM

Post #5 of 7
Re: Possible live mirroring, stealing of bandwidth

Thomas Dalton <thomas.dalton@...> writes:

>
> > I thought the procedure was to report it to
> > http://meta.wikimedia.org/wiki/Live_mirrors
>
> PS In fact, that page explicitly says *not* to report them to this
> mailing list. We definitely need to get our message straight.
>

I am the user who initiated this thread; it was kindly posted by Casey B, as I
was unsure as to what to do.
Apparently we have two distinct issues to deal with:
#1: Improve the FAQ and self-help messages so that folks who wish to report or
act upon these "live mirrors" would know what to do (and not add noise to this
group).
#2: Figure out if some form of IP-based filtering or other deterrent should be
used against this particular site, and/or "live mirrors" in general.

To address #2 first:
>> ... and generally the devs tell me that whenever they block one,
>> it will spring up from another IP, and that they don't bother ...
This indicates that WP tech folks are generally discouraged about implementing
any IP filtering as the sites tend to work around such measures.
That's a fair position: "Let's not do anything unless it becomes too much of a
resource drain".
As an occasional contributor, I certainly won't try to tell more dedicated or
permanent folks what to do. My only suggestion is to mine the web usage
logs/stats to identify the worst offenders, and possibly to target those above
a particular threshold for action (GFDL emails / suggesting off-line mirroring
to them / filtering to deny service or to return "bogus" pages).
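As a rough illustration of the log-mining suggestion above, here is a minimal
Python sketch. It assumes an access log whose first whitespace-separated field
is the client IP (as in the common log format). The function name, the sample
lines, and the threshold are all hypothetical; nothing here describes how
Wikimedia's logs are actually stored or analyzed.

```python
from collections import Counter


def worst_offenders(log_lines, threshold):
    """Return (ip, hits) pairs with more than `threshold` requests, busiest first."""
    hits = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:                # skip blank lines
            hits[fields[0]] += 1  # assumption: client IP is the first field
    return [(ip, n) for ip, n in hits.most_common() if n > threshold]


# Hypothetical sample lines using documentation-only IP addresses.
sample = [
    '192.0.2.10 - - [17/Jul/2007:10:33:00 +0000] "GET /wiki/Foo HTTP/1.1" 200 1234',
    '192.0.2.10 - - [17/Jul/2007:10:33:01 +0000] "GET /wiki/Bar HTTP/1.1" 200 2345',
    '198.51.100.7 - - [17/Jul/2007:10:33:02 +0000] "GET /wiki/Foo HTTP/1.1" 200 1234',
    '192.0.2.10 - - [17/Jul/2007:10:33:03 +0000] "GET /wiki/Baz HTTP/1.1" 200 3456',
]

for ip, n in worst_offenders(sample, threshold=2):
    print(ip, n)
```

A raw request count is only a starting point: a busy address may be a shared
proxy used by legitimate readers, so anything above the threshold would still
need a human look before action is taken.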

Of course, if this type of abuse eventually becomes too much of a nuisance, one
could introduce a semi-automated way to red-tag the offending IPs;
discussing ideas about how to achieve this is obviously beyond the scope of this
thread, and indeed probably a topic for a more private forum, lest we help the
would-be offenders by offering too much transparency.

Now, _because_ of this potentially lax enforcement, issue #1 should be handled
with particular caution and with the following goals:
-be clear and easily located in the appropriate help / FAQ / Wizards
-provide a [simple] procedure of sorts that would be satisfactory to WP users
who try to report this type of abuse
-include some language that may discourage potential implementers of "live
mirrors", or at the least not hint in any way at the fact that WP currently
doesn't do anything about this issue.


In the spirit of moving forward, here's a draft for something that may serve the
above.
[Attention: IANAL, and I am quite the newbie with regard to WP's policies. What
follows certainly requires review by more qualified people.]

Live "mirror" sites:
===================
Some sites query WP behind the scenes and integrate the content of WP's pages,
verbatim or somewhat modified, within their own web pages.
This practice is illegal, _even_ if the resulting page includes the proper GFDL
notice and WP credit.
One should make sure that such sites are actually live "mirrors" rather than
off-line (legal) mirrors. For example, one can check that recently modified
pages, such as those listed at http://en.wikipedia.org/wiki/Special:Recentchanges,
are served by the suspected site in their latest version.
Such sites should be reported on http://meta.wikimedia.org/wiki/Live_mirrors so
that they can be blocked and/or legal action can be taken if appropriate.
Site managers who wish to provide a regular (legal) mirror of WP can do so by
following the instructions at http://en.wikipedia.org/wiki/Wikipedia:Forking_FAQ.
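
The "recently modified pages" check in the draft above could be sketched
roughly as follows. This is only an illustration under stated assumptions: the
caller has already collected, by whatever means, a distinctive snippet from the
latest revision of each recently changed page and the text that the suspected
site serves for the same title. The function name and the sample data are
hypothetical, and no network access is performed.

```python
def live_mirror_score(latest_snippets, mirror_pages):
    """Fraction of checked titles whose newest wording already appears on the mirror.

    latest_snippets: {title: text taken from the latest WP revision}
    mirror_pages:    {title: page text served by the suspected site}
    """
    checked = matched = 0
    for title, snippet in latest_snippets.items():
        page = mirror_pages.get(title)
        if page is None:  # the suspected site has no copy of this page
            continue
        checked += 1
        if snippet in page:
            matched += 1
    return matched / checked if checked else 0.0


# Hypothetical data: one page is fresh on the suspected site, one is stale.
latest = {
    "Foo": "sentence added a few minutes ago",
    "Bar": "another very recent addition",
}
served = {
    "Foo": "Intro text. sentence added a few minutes ago. More text.",
    "Bar": "Intro text from an older revision.",
}

print(live_mirror_score(latest, served))
```

A score near 1.0 over many very recent edits would suggest live querying,
while an off-line mirror built from dumps would be expected to score near 0.0.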





Simetrical+wikilist at gmail

Jul 17, 2007, 9:39 PM

Post #6 of 7
Re: Possible live mirroring, stealing of bandwidth

The basic fact of the matter is, Wikipedia is a top-ten website. The
number of websites that are large enough to cause any noticeable
effect on server performance by live mirroring is probably in the
hundreds. Google could literally (I once did some quick calculations)
hotlink a Wikimedia image on their front page without much slowing
down the image servers. The only reasons Wikimedia has to discourage
unapproved live mirroring are 1) it can and does get money from
commercially-operated sites for that privilege and 2) we don't, in
principle, want people using Wikipedia content without proper GFDL
compliance. #2 is a pretty weak reason to spend developer-hours on
whack-a-mole, and in the case of #1, practically all of the mirrors
would either just stop using Wikipedia content or make use of dumps
instead, gaining nothing for the Foundation. So if someone wants to
make a script that will find and block these things, okay, but it's
not a very high priority.

Or at least that's my two cents, as a non-sysadmin. By all means
update the docs, scaring people is good. ;) It's a wiki, feel free.
Your text looks okay (although I'm not clear on whether it's actually
illegal to hotlink content without permission).



Platonides at gmail

Jul 18, 2007, 5:03 AM

Post #7 of 7
Re: Possible live mirroring, stealing of bandwidth

Marc Veillet wrote:
> Thomas Dalton <thomas.dalton@...> writes:
>
>>> I thought the procedure was to report it to
>>> http://meta.wikimedia.org/wiki/Live_mirrors
>> PS In fact, that page explicitly says *not* to report them to this
>> mailing list. We definitely need to get our message straight.
>>
>
> I am the user who initiated this thread; it was kindly posted by Casey B, as I
> was unsure as to what to do.
> Apparently we have two distinct issues to deal with:
> #1: Improve the FAQ and self-help messages so that folks who wish to report or
> act upon these "live mirrors" would know what to do (and not add noise to this
> group).
> #2: Figure out if some form of IP-based filtering or other deterrent should be
> used against this particular site, and/or "live mirrors" in general.

There IS such filtering, and I've seen live mirrors get blocked this way.
My understanding was that we still filter them.


> To address #2 first:
> >> ... and generally the devs tell me that whenever they block one,
> >> it will spring up from another IP, and that they don't bother ...
> This indicates that WP tech folks are generally discouraged about implementing
> any IP filtering as the sites tend to work around such measures.
> That's a fair position: "Let's not do anything unless it becomes too much of a
> resource drain".
There was a discussion about their workarounds, regarding a site
mirroring Wikipedia by proxy. We can deny access to Wikipedia for any
proxy they use. The problem is that this also affects proxies used by
legitimate readers.


> As an occasional contributor, I certainly won't try and tell more dedicated or
> permanent folks what to do. My only suggestion is
> to maybe mine the web usage logs/stats with the goal of identifying the worst
> offenders and possibly target these above a particular threshold for action
> (GFDL emails / propose them off-line mirroring / filter to deny service or to
> return "bogus" pages)

Even if we don't filter them, keeping a list of them to compare usage
against could be useful.

We might serve them pages with a notice, or advertisements (as was
proposed some time ago) but the mirrors will simply strip them.


If filtering is so much trouble for sysadmins (is it?), it could be
done by stewards/meta-admins, who would add entries to a list that is
synchronized at some regular interval.
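
A minimal sketch of how such a shared list might be consumed, assuming a
plain-text format with one IP address or CIDR range per line and "#" comments.
The format, the function names, and the sample entries are hypothetical and do
not describe any existing Wikimedia mechanism; the periodic synchronization
itself is left out.

```python
import ipaddress


def parse_blocklist(text):
    """Parse one address or CIDR range per line; '#' starts a comment."""
    networks = []
    for raw in text.splitlines():
        entry = raw.split("#", 1)[0].strip()
        if entry:
            # strict=False accepts ranges written with host bits set
            networks.append(ipaddress.ip_network(entry, strict=False))
    return networks


def is_blocked(client_ip, networks):
    """True if client_ip falls inside any listed address or range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in networks)


# Hypothetical list contents using documentation-only address ranges.
BLOCKLIST = """
# live mirrors reported on meta
192.0.2.10        # single host
198.51.100.0/24   # whole range
"""

nets = parse_blocklist(BLOCKLIST)
print(is_blocked("198.51.100.7", nets))
print(is_blocked("203.0.113.5", nets))
```

Blocking whole ranges runs into the caveat raised earlier in this message: a
listed range may also cover proxies used by legitimate readers.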



PS: the image section of http://meta.wikimedia.org/wiki/Live_mirrors
should be clearer about whether the live mirrors 'hotlink' or 'proxy' the images.


