
Mailing List Archive: Wikipedia: Foundation

Wikipedia tracks user behaviour via third party companies

 

 


scream at nonvocalscream

Jun 4, 2009, 10:52 AM

Post #26 of 48 (1941 views)
Permalink
Re: Wikipedia tracks user behaviour via third party companies [In reply to]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Aryeh Gregor wrote:
> On Thu, Jun 4, 2009 at 6:01 AM, Neil Harris<usenet [at] tonal> wrote:
>> Surely this is something which should be possible to block at the
>> MediaWiki level, by suppressing the generation of any HTML that loads
>> any indirect resources (scripts, iframes, images, etc.) whatsoever other
>> than from a clearly defined whitelist of Wikimedia-Foundation-controlled
>> domains?
>
> Not possible as long as we allow JS to be added. See [[halting problem]].
>
> On Thu, Jun 4, 2009 at 6:20 AM, John at Darkstar<vacuum [at] jeb> wrote:
>> User privacy on Wikipedia is close to a public hoax; pages are
>> transferred unencrypted and with user names in clear text. Anyone with
>> access to a public hub is able to intercept and identify users, in
>> addition to _all_ websites that are referenced during an edit on
>> Wikipedia through correlation of logs.
>
> This only works for getting info on totally random Wikipedia users,
> who happen to edit using your router. This isn't a serious compromise
> of privacy for practical purposes due to the resources required to get
> info on a large number of users, or to target a specific user. Users
> who are concerned about this, however, can use secure.wikimedia.org.
>
> Note that if you make edits, it should be pretty easy for a MITM to
> figure out your IP address even if you're using SSL: 1) Watch all
> traffic going to Wikimedia IP addresses. 2) Guess which traffic
> streams correspond to edits by looking at the amount of data the
> client is sending. 3) Correlate suspected edits with RecentChanges
> over a period of time. Once they know your IP address, if they're a
> MITM, they can still figure out what sites you're accessing, just not
> the exact pages (or exact domain in the case of virtual hosting).
>
> So if you want real privacy against MITMs, you still need to use
> something like Tor, as usual.
>
> On Thu, Jun 4, 2009 at 12:53 PM, Robert Rohde<rarohde [at] gmail> wrote:
>> One idea is the proposal to install the AbuseFilter in a global mode,
>> i.e. rules loaded at Meta that apply everywhere. If that were done
>> (and there are some arguments about whether it is a good idea), then
>> it could be used to block these types of URLs from being installed,
>> even by admins.
>
> No, it wouldn't.
>
> document.write('<script' + ' src="' + 'http://www.go' + 'ogle-an' +
> 'alytics.com/urc' + 'hin.js" type="text/javascript"></script>');
>
> Obviously more complicated obfuscation is possible. JavaScript is
> Turing-complete. You can't reliably figure out whether it will output
> a specific string.
>
> However, perhaps a default AbuseFilter could be installed telling
> admins that installing Analytics is a violation of Foundation policy
> and that they'll get desysopped if they continue. That wouldn't stop
> them from doing it if they were determined, but it might be able to
> trigger an alert to get the appropriate parties to make sure they
> didn't try evading it. Maybe the filter could be installed on Meta
> and local violations could go to Meta logs so stewards will see? Are
> global filters possible right now?
>
> At a bare minimum, such a warning would reduce inadvertent errors.
>
Has apache/proxy level filtering been considered?

Jon
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkooCegACgkQR7/9CWL6/5iiLwCgpHiWeKHr4tEoqpO5KY6lQGey
YjwAn2ocnj2zE7Gl8TTs/qCGw2fhYPw8
=I9kq
-----END PGP SIGNATURE-----


_______________________________________________
foundation-l mailing list
foundation-l [at] lists
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l


thomas.dalton at gmail

Jun 4, 2009, 10:59 AM

Post #27 of 48 (1934 views)
Permalink
Re: Wikipedia tracks user behaviour via third party companies [In reply to]

2009/6/4 Jon <scream [at] nonvocalscream>:
> Has apache/proxy level filtering been considered?

Filtering for what? JavaScript is executed client-side, i.e. after the
page has gone through the Apache servers/proxies.


rarohde at gmail

Jun 4, 2009, 11:02 AM

Post #28 of 48 (1937 views)
Permalink
Re: Wikipedia tracks user behaviour via third party companies [In reply to]

On Thu, Jun 4, 2009 at 10:44 AM, Aryeh Gregor
<Simetrical+wikilist [at] gmail> wrote:
> On Thu, Jun 4, 2009 at 12:53 PM, Robert Rohde<rarohde [at] gmail> wrote:
>> One idea is the proposal to install the AbuseFilter in a global mode,
>> i.e. rules loaded at Meta that apply everywhere.  If that were done
>> (and there are some arguments about whether it is a good idea), then
>> it could be used to block these types of URLs from being installed,
>> even by admins.
>
> No, it wouldn't.
>
> document.write('<script' + ' src="' + 'http://www.go' + 'ogle-an' +
> 'alytics.com/urc' + 'hin.js" type="text/javascript"></script>');
>
> Obviously more complicated obfuscation is possible.  JavaScript is
> Turing-complete.  You can't reliably figure out whether it will output
> a specific string.
>
> However, perhaps a default AbuseFilter could be installed telling
> admins that installing Analytics is a violation of Foundation policy
> and that they'll get desysopped if they continue.  That wouldn't stop
> them from doing it if they were determined, but it might be able to
> trigger an alert to get the appropriate parties to make sure they
> didn't try evading it.  Maybe the filter could be installed on Meta
> and local violations could go to Meta logs so stewards will see?  Are
> global filters possible right now?
>
> At a bare minimum, such a warning would reduce inadvertent errors.

Yeah, I meant it could detect and block the inadvertent uses by admins
who think they are doing something cool / clever. Yes, if someone
wants to intentionally ignore the warning and install an obfuscated
URL anyway, they still could; however, doing that is probably grounds
for summary desysop.

Global filters would run from Meta. Logs are intended to be both
global and local. My impression is that global filters have been
technically possible since April, but that there is "social"
resistance to installing them over questions like: who should control
them? when should they be used? how do you ensure that you aren't
blocking good edits to project W when confronting vandalism at X, Y,
and Z? You should talk to Andrew for more details on current status.

-Robert Rohde


dgerard at gmail

Jun 4, 2009, 1:12 PM

Post #29 of 48 (1929 views)
Permalink
Re: Wikipedia tracks user behaviour via third party companies [In reply to]

2009/6/4 Robert Rohde <rarohde [at] gmail>:
> On Thu, Jun 4, 2009 at 10:44 AM, Aryeh Gregor
> <Simetrical+wikilist [at] gmail> wrote:

>> However, perhaps a default AbuseFilter could be installed telling
>> admins that installing Analytics is a violation of Foundation policy
>> and that they'll get desysopped if they continue.  That wouldn't stop

> Yeah, I meant it could detect and block the inadvertent uses by admins
> who think they are doing something cool / clever.


Yeah, the actual problem is not malicious admins - it's admins trying
to do a good and useful thing in good faith, that just happens to be a
massive privacy policy violation.


- d.


andrew.gray at dunelm

Jun 4, 2009, 1:45 PM

Post #30 of 48 (1932 views)
Permalink
Re: Wikipedia tracks user behaviour via third party companies [In reply to]

2009/6/4 Unionhawk <unionhawk.sitemod [at] gmail>:

> So how do you propose we enforce this? I'm thinking we need to prevent this
> from happening in the first place. Analytics like this could pretty much
> give checkuser powers to anybody!

There aren't that many places where this sort of thing could be
implemented. Would it be too impractical to just regularly run a
script to check those for things like Google Analytics links, and
remove them with a polite note when found?
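
A rough sketch of what such a periodic check might look like, in JavaScript.
Everything here is an assumption for illustration: the host list, the page
titles, and the sample contents are made up, and fetching the real pages is
left out entirely. It only reports plain-text references, so it says nothing
about obfuscated ones.

```javascript
// Hostnames to look for; this list is an illustrative assumption.
const TRACKER_HOSTS = ["google-analytics.com"];

// pages: object mapping a page title to its JavaScript source text.
// Returns one entry per offending line: { title, line, host }.
function scanPages(pages) {
  const hits = [];
  for (const [title, source] of Object.entries(pages)) {
    source.split("\n").forEach((text, index) => {
      for (const host of TRACKER_HOSTS) {
        if (text.toLowerCase().includes(host)) {
          hits.push({ title, line: index + 1, host });
        }
      }
    });
  }
  return hits;
}

// Made-up page contents standing in for whatever a real script would fetch.
const samplePages = {
  "MediaWiki:Common.js": [
    "var a = 1;",
    "document.write('<script src=\"http://www.google-analytics.com/urchin.js\"></script>');",
  ].join("\n"),
  "MediaWiki:Monobook.js": "// nothing external here",
};

for (const hit of scanPages(samplePages)) {
  console.log(`${hit.title}, line ${hit.line}: reference to ${hit.host}`);
}
```

Reporting the page title and line number is meant to make the "polite note"
part easy; actually removing anything would still be a human decision.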

--
- Andrew Gray
andrew.gray [at] dunelm


saintonge at telus

Jun 4, 2009, 3:19 PM

Post #31 of 48 (1927 views)
Permalink
Re: Wikipedia tracks user behaviour via third party companies [In reply to]

David Gerard wrote:
> 2009/6/4 Robert Rohde <rarohde [at] gmail>:
>
>> Aryeh Gregor wrote:
>>
>>> However, perhaps a default AbuseFilter could be installed telling
>>> admins that installing Analytics is a violation of Foundation policy
>>> and that they'll get desysopped if they continue. That wouldn't stop
>>>
>> Yeah, I meant it could detect and block the inadvertent uses by admins
>> who think they are doing something cool / clever.
>>
> Yeah, the actual problem is not malicious admins - it's admins trying
> to do a good and useful thing in good faith, that just happens to be a
> massive privacy policy violation.
>
>
That said, talking to them may bear richer fruit than the blocking and
desysopping by a trigger-happy few.

Ec


dgerard at gmail

Jun 4, 2009, 4:35 PM

Post #32 of 48 (1929 views)
Permalink
Re: Wikipedia tracks user behaviour via third party companies [In reply to]

2009/6/4 Erik Zachte <erikzachte [at] infodisiac>:

> Considering web bugs: comScore also proposed such a scheme to us.
> Apart from the question how much it would bring us that we don't or can't
> figure out ourselves an overriding concern is privacy.


So if we ran our own internal web bug mechanism, with due attention to
privacy, etc - would it do anything for what you do?


- d.


usenet at tonal

Jun 4, 2009, 6:25 PM

Post #33 of 48 (1924 views)
Permalink
Re: Wikipedia tracks user behaviour via third party companies [In reply to]

Thomas Dalton wrote:
> 2009/6/4 Jon <scream [at] nonvocalscream>:
>
>> Has apache/proxy level filtering been considered?
>>
>
> Filtering for what? Javascript is executed client-side, ie. after the
> page has gone through the apache servers/proxies.
>

Filtering to remove _all_ JavaScript, other than references to
static JavaScript files maintained by MediaWiki's developers.
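
As a very rough sketch of that kind of filter (JavaScript here only because
it is the language of the thread; the trusted prefix and the sample page are
assumptions, not how any Wikimedia proxy actually works): drop every script
element unless it is an empty external reference under a trusted path.

```javascript
// Prefix under which trusted scripts are assumed to live (illustrative only).
const TRUSTED_PREFIX = "/skins/";

// Removes every <script> element unless it is an external reference whose
// src starts with TRUSTED_PREFIX and whose body is empty.
function stripUntrustedScripts(html) {
  return html.replace(
    /<script\b([^>]*)>([\s\S]*?)<\/script\s*>/gi,
    (match, attrs, body) => {
      const src = /\bsrc\s*=\s*["']([^"']*)["']/i.exec(attrs);
      const trusted =
        src !== null && src[1].startsWith(TRUSTED_PREFIX) && body.trim() === "";
      return trusted ? match : "";
    }
  );
}

// Made-up page fragment: one trusted reference, one inline script,
// one third-party reference.
const samplePage =
  "<p>Hello</p>" +
  '<script src="/skins/common/wikibits.js"></script>' +
  '<script>document.write("tracking");</script>' +
  '<script src="http://www.google-analytics.com/urchin.js"></script>';

console.log(stripUntrustedScripts(samplePage));
```

A regular expression is a fragile way to process HTML, and this sketch
ignores inline event handlers and javascript: URLs, so a real filter would
need a proper parser.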

-- Neil



thomas.dalton at gmail

Jun 4, 2009, 6:29 PM

Post #34 of 48 (1925 views)
Permalink
Re: Wikipedia tracks user behaviour via third party companies [In reply to]

2009/6/5 Neil Harris <usenet [at] tonal>:
> Thomas Dalton wrote:
>> 2009/6/4 Jon <scream [at] nonvocalscream>:
>>
>>> Has apache/proxy level filtering been considered?
>>>
>>
>> Filtering for what? Javascript is executed client-side, ie. after the
>> page has gone through the apache servers/proxies.
>>
>
> Filtering to remove _all_ Javascript, other than references to
> statically maintained Javascript files maintained by Mediawiki's developers.

Well, that's certainly possible, but there are a large number of
legitimate and worthwhile uses of custom javascript. Things like
Twinkle are done with custom javascript and many members of the
community find such tools extremely useful.


cimonavaro at gmail

Jun 4, 2009, 8:08 PM

Post #35 of 48 (1932 views)
Permalink
Re: Wikipedia tracks user behaviour via third party companies [In reply to]

Neil Harris wrote:
> John at Darkstar wrote:
>
>> The interesting thing is "who has an interest in which user's identity".
>> Let's take an example: some organization sets up a site with a honeypot
>> and logs all visitors. Then they correlate that with RC logs from
>> Wikipedia and check who adds external links back to
>> themselves. They do not need direct access to Wikipedia logs or the raw
>> traffic.
>>
>> There is only one valid reason as I see it to avoid certain stat
>> engines, and that is to block advertising companies from getting
>> information about the readers. The writers do not have any real
>> anonymity at all.
>>
>> John
>>
>>
>
> Indeed they could. But even so, they would still have great difficulty
> in getting more than a small fraction of Wikipedia's readers to both
> visit the honeypot and make an edit that links to it, and the vast
> majority of unaffected users will still avoid being bitten by this
> attack. And even then, they will still only have obtained a mapping
> between the user's current IP and their Wikipedia account, and will
> still have to correlate this back to a personal identity, which is often
> harder than it might seem to be in theory.
>
> The world is a dangerous place, but the fact that privacy and security
> can never be absolute is not a reason not to make good-faith efforts to
> preserve as much of both as is reasonably possible within the limits of
> the time and resources available.
>
> Just because a door can be knocked down with a sledgehammer (or a wall
> demolished with a pneumatic hammer) is not a reason not to have a lock
> on it, or a door there in the first place.
>
> -- Neil
>
>
>

The Finnish folk saying has it that locks are there against
honest folk, not against thieves.

That is, any lock can be compromised by a sufficiently determined
intruder, but a lock is still a significant signal that what
is on the other side is not meant for every passerby.


Yours,

Jussi-Ville Heiskanen




vacuum at jeb

Jun 4, 2009, 8:27 PM

Post #36 of 48 (1922 views)
Permalink
Re: Wikipedia tracks user behaviour via third party companies [In reply to]

> Not to mention, as
> far as I know the program is proprietary.

This is an example of what the real problem is here: it's not the security
issues but the users' political issues.

> I'm not convinced that
> we need to be tracking user behavior at this point in time, or that
> the tradeoffs for doing so are worth any benefits, or that doing so is
> in furtherance of our mission.

One example of a very important use is identifying missing links
between articles. Articles without parents are a special case, as are
articles without children. The general case is a recurring gap where a
link between two articles is missing, but this cannot be solved without
referrer logging or, better, logging to an external server after a JS
function has identified the logging as necessary. Unfortunately such
logging can't be done today, so we must stick to the two
less-than-optimal special cases.

John


vacuum at jeb

Jun 4, 2009, 8:51 PM

Post #37 of 48 (1924 views)
Permalink
Re: Wikipedia tracks user behaviour via third party companies [In reply to]

> One idea is the proposal to install the AbuseFilter in a global mode,
> i.e. rules loaded at Meta that apply everywhere. If that were done
> (and there are some arguments about whether it is a good idea), then
> it could be used to block these types of URLs from being installed,
> even by admins.

Identifying client-side generated URLs from the server side opens up a whole
lot of problems of its own. Basically you need a script that runs in a
hostile environment and reports back to a server when a whole series of
URLs are injected from code loaded from some sources (MediaWiki space)
but not from other sources (user space); still, code loaded from user
space through a call to MediaWiki space should be allowed. Add to this
that your URL-identifying code has to run after a script has generated
the URL and before it does any cleanup. The URL verification can't just
say that a URL is hostile; it has to check it somehow, and that leads to
reporting of the URL, if the reporting code still executes at that
moment. Urk...

John


swatjester at gmail

Jun 4, 2009, 9:00 PM

Post #38 of 48 (1924 views)
Permalink
Re: Wikipedia tracks user behaviour via third party companies [In reply to]

On Jun 4, 2009, at 11:27 PM, John at Darkstar wrote:

>> Not to mention, as
>> far as I know the program is proprietary.
>
> This is an example of whats the real problem here; its not the
> security
> issues but the users political issues.

I fail to see what that has to do with anything. I'm just about as far
from the open-source politics as you can get. Proprietary code can't
be modified to suit our needs.




vacuum at jeb

Jun 4, 2009, 9:02 PM

Post #39 of 48 (1916 views)
Permalink
Re: Wikipedia tracks user behaviour via third party companies [In reply to]

> On Thu, Jun 4, 2009 at 6:20 AM, John at Darkstar<vacuum [at] jeb> wrote:
>> User privacy on Wikipedia is close to a public hoax; pages are
>> transferred unencrypted and with user names in clear text. Anyone with
>> access to a public hub is able to intercept and identify users, in
>> addition to _all_ websites that are referenced during an edit on
>> Wikipedia through correlation of logs.
>
> This only works for getting info on totally random Wikipedia users,
> who happen to edit using your router. This isn't a serious compromise
> of privacy for practical purposes due to the resources required to get
> info on a large number of users, or to target a specific user. Users
> who are concerned about this, however, can use secure.wikimedia.org.

Either you have privacy for _all_ users or you have none. If you accept
lesser privacy for some users, at random, several stat aggregation
schemes are possible. Downside is that you have to decide that some
users in fact have less privacy from time to time.

> So if you want real privacy against MITMs, you still need to use
> something like Tor, as usual.

Attacks on Tor are way outside the scope of this discussion, but they are
possible for this kind of site.

> On Thu, Jun 4, 2009 at 12:53 PM, Robert Rohde<rarohde [at] gmail> wrote:
>> One idea is the proposal to install the AbuseFilter in a global mode,
>> i.e. rules loaded at Meta that apply everywhere. If that were done
>> (and there are some arguments about whether it is a good idea), then
>> it could be used to block these types of URLs from being installed,
>> even by admins.
>
> No, it wouldn't.
>
> document.write('<script' + ' src="' + 'http://www.go' + 'ogle-an' +
> 'alytics.com/urc' + 'hin.js" type="text/javascript"></script>');
>
> Obviously more complicated obfuscation is possible. JavaScript is
> Turing-complete. You can't reliably figure out whether it will output
> a specific string.

You can run a script to inspect the DOM tree for external URLs and
report back if something suspicious is found, but it is highly
error-prone and can easily be defeated.
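
To make that concrete, a minimal sketch of the checking half, under stated
assumptions: the allowed host suffixes and the sample URLs are made up, and
the list of URLs is passed in rather than read from a live page so the
function can run outside a browser.

```javascript
// Hosts considered internal; the suffix list is an assumption for illustration.
const ALLOWED_SUFFIXES = ["wikipedia.org", "wikimedia.org"];

function isAllowedHost(hostname) {
  return ALLOWED_SUFFIXES.some(
    (suffix) => hostname === suffix || hostname.endsWith("." + suffix)
  );
}

// urls: the src/href values collected from the page, e.g. in a browser via
// Array.from(document.querySelectorAll("[src]"), (el) => el.src).
// baseUrl is used to resolve relative references.
function findExternalUrls(urls, baseUrl) {
  return urls.filter((raw) => {
    try {
      return !isAllowedHost(new URL(raw, baseUrl).hostname);
    } catch (err) {
      return true; // unparseable values are reported rather than ignored
    }
  });
}

// Made-up inputs: two internal references and two external ones.
const externalFound = findExternalUrls(
  [
    "/skins/common/wikibits.js",
    "http://upload.wikimedia.org/logo.png",
    "http://www.google-analytics.com/urchin.js",
    "http://evilwikipedia.org/x.gif",
  ],
  "http://en.wikipedia.org/wiki/Main_Page"
);
console.log(externalFound);
```

The weak points are the ones already named: it only sees elements that are
still in the tree when it runs, and a script that removes its own element,
or runs after the check, goes unnoticed.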

John


mrzmanwiki at gmail

Jun 4, 2009, 9:15 PM

Post #40 of 48 (1922 views)
Permalink
Re: Wikipedia tracks user behaviour via third party companies [In reply to]

John at Darkstar wrote:
>> One idea is the proposal to install the AbuseFilter in a global mode,
>> i.e. rules loaded at Meta that apply everywhere. If that were done
>> (and there are some arguments about whether it is a good idea), then
>> it could be used to block these types of URLs from being installed,
>> even by admins.
>
> Identifying client side generated urls from server side opens up a whole
> lot of problems of its own. Basically you need a script that runs in a
> hostile environment and reports back to a server when a whole series of
> urls are injected from code loaded from some sources (mediawiki-space)
> but not from other sources user space), still code loaded from user
> space through call to mediawiki space should be allowed. Add to this
> that your url identifying code has to run after a script has generated
> the url and before it do any cleanup. The url verification can't just
> say that a url is hostile, it has to check it somehow, and that leads to
> reporting of the url - if the reporting code still executes at that
> moment. Urk...
>

Hmm? There's no reason to do anything like that. The AbuseFilter would
just prevent sitewide JS pages from being saved with the particular URLs
or a particular code block in them. It'll stop the well-meaning but
misguided admins. Short of restricting site JS to the point of
uselessness, you'll never be able to stop determined abusers.
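
A small sketch of that trade-off, written as plain JavaScript rather than
AbuseFilter rule syntax (the patterns and function name are hypothetical):
a simple pattern check on the saved text catches the straightforward paste
and misses the split-up version quoted earlier in the thread.

```javascript
// Hypothetical patterns; a real filter would be configured on-wiki.
const BLOCKED_PATTERNS = [/google-analytics\.com/i, /urchin\.js/i];

// Returns the patterns that match the submitted JavaScript source text.
function findBlockedPatterns(source) {
  return BLOCKED_PATTERNS.filter((pattern) => pattern.test(source));
}

// The straightforward paste that a well-meaning admin might save.
const plainSnippet =
  "document.write('<script src=\"http://www.google-analytics.com/urchin.js\"></script>');";

// The same script tag, split up as in the obfuscated example earlier in the thread.
const splitSnippet =
  "document.write('<script' + ' src=\"' + 'http://www.go' + 'ogle-an' + " +
  "'alytics.com/urc' + 'hin.js\" type=\"text/javascript\"></script>');";

console.log(findBlockedPatterns(plainSnippet).length); // 2: both patterns match
console.log(findBlockedPatterns(splitSnippet).length); // 0: the pieces never form the full string
```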

--
Alex (wikipedia:en:User:Mr.Z-man)


vacuum at jeb

Jun 4, 2009, 9:28 PM

Post #41 of 48 (1924 views)
Permalink
Re: Wikipedia tracks user behaviour via third party companies [In reply to]

>
> Is this enough? Of course not, there is so much more to learn.
>

>
> Erik Zachte
>

There are a few very important items missing at the moment:
* Number of unique visitors
* Number of page visits per visitor

All should be analyzed by user role, possibly also over different time
spans (hour, day, week) and by the likelihood of the user being a real
person or a bot. The overall numbers can then be used for analyzing the
squid logs. Something like this will make it possible to make valid
comparisons with several stat aggregators.
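
Just to illustrate the arithmetic, a sketch with an invented record format
(the `visitor` and `role` fields and the sample data are assumptions, not
anything from the real logs): count unique visitors and page views per
visitor, broken down by role.

```javascript
// Each record is a hypothetical, already-parsed log entry: { visitor, role }.
// Returns, per role, the number of unique visitors and views per visitor.
function summarizeByRole(records) {
  const byRole = new Map();
  for (const { visitor, role } of records) {
    if (!byRole.has(role)) byRole.set(role, new Map());
    const counts = byRole.get(role);
    counts.set(visitor, (counts.get(visitor) || 0) + 1);
  }
  const summary = {};
  for (const [role, counts] of byRole) {
    let views = 0;
    for (const n of counts.values()) views += n;
    summary[role] = {
      uniqueVisitors: counts.size,
      viewsPerVisitor: views / counts.size,
    };
  }
  return summary;
}

// Invented sample data: opaque visitor ids, no real users.
const sampleRecords = [
  { visitor: "a", role: "anon" },
  { visitor: "a", role: "anon" },
  { visitor: "b", role: "anon" },
  { visitor: "c", role: "registered" },
  { visitor: "c", role: "registered" },
  { visitor: "c", role: "registered" },
  { visitor: "d", role: "bot" },
];

console.log(summarizeByRole(sampleRecords));
```

A time bucket (hour, day, week) could be added as a second grouping key in
the same way; how visitors are identified at all is the hard, privacy-laden
part and is not addressed here.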

John


vacuum at jeb

Jun 4, 2009, 9:49 PM

Post #42 of 48 (1927 views)
Permalink
Re: Wikipedia tracks user behaviour via third party companies [In reply to]

>
> Hmm? There's no reason to do anything like that. The AbuseFilter would
> just prevent sitewide JS pages from being saved with the particular URLs
> or a particular code block in them. It'll stop the well-meaning but
> misguided admins. Short of restricting site JS to the point of
> uselessness, you'll never be able to stop determined abusers.
>

A very typical code fragment to make a stat url is something like

document.write('<img src="' + server + digest + '">');

- server is some kind of external url
- digest is just some random garbage to bypass caching

This kind of code exists in so many variants that it is very difficult
to say anything about how it may be implemented. Often it will not use
document.write on systems like Wikipedia but instead use createElement().
Very often someone claims that the definition of "server" will be
complete and may be used to identify the external server sufficiently.
That is not a valid claim, as many such sites can be referenced for other
purposes. Note also that the number of URLs will be huge, as this type of
service is very popular, not to mention that anyone who wants to can set
up a special stat aggregator on an otherwise unknown domain.

Basically, simple regexps are not sufficient for detecting this kind of
code.

Otherwise, take a look at Simetrical's earlier post.

John


mrzmanwiki at gmail

Jun 4, 2009, 10:52 PM

Post #43 of 48 (1915 views)
Permalink
Re: Wikipedia tracks user behaviour via third party companies [In reply to]

John at Darkstar wrote:
>> Hmm? There's no reason to do anything like that. The AbuseFilter would
>> just prevent sitewide JS pages from being saved with the particular URLs
>> or a particular code block in them. It'll stop the well-meaning but
>> misguided admins. Short of restricting site JS to the point of
>> uselessness, you'll never be able to stop determined abusers.
>>
>
> A very typical code fragment to make a stat url is something like
>
> document.write('<img scr="' + server + digest + '">');
>
> - server is some kind of external url
> - digest is just some random garbage to bypass caching
>
> This kind of code exists in so many variants that it is very difficult
> to say anything about how it may be implemented. Often it will not use a
> document.write on systems like Wikipedia but instead use createElement()
> Very often someone claims that the definition of "server" will be
> complete and may be used to identify the external server sufficiently.
> That is not a valid claim as many such sites can be referred for other
> purposes.

Other purposes that have valid uses loading 3rd party content on a
Wikimedia wiki? Like what?

> Note also that the number of urls will be huge as this type of
> service is very popular, not to say that anyone that want may set up a
> special stat aggregator on an otherwise unknown domain.
>
> Basically, simple regexps are not sufficient for detecting this kind of
> code.

I don't think I said it would be perfect; the idea isn't to prevent it
100%, just to try to stop the most obvious cases like Google Analytics.

> Otherwise, take a look at Simetricals earlier post.
>
> John
>

--
Alex (wikipedia:en:User:Mr.Z-man)


vacuum at jeb

Jun 4, 2009, 11:14 PM

Post #44 of 48 (1927 views)
Permalink
Re: Wikipedia tracks user behaviour via third party companies [In reply to]

Alex wrote:
> John at Darkstar wrote:
>>> Hmm? There's no reason to do anything like that. The AbuseFilter would
>>> just prevent sitewide JS pages from being saved with the particular URLs
>>> or a particular code block in them. It'll stop the well-meaning but
>>> misguided admins. Short of restricting site JS to the point of
>>> uselessness, you'll never be able to stop determined abusers.
>>>
>> A very typical code fragment to make a stat url is something like
>>
>> document.write('<img scr="' + server + digest + '">');
>>
>> - server is some kind of external url
>> - digest is just some random garbage to bypass caching
>>
>> This kind of code exists in so many variants that it is very difficult
>> to say anything about how it may be implemented. Often it will not use a
>> document.write on systems like Wikipedia but instead use createElement()
>> Very often someone claims that the definition of "server" will be
>> complete and may be used to identify the external server sufficiently.
>> That is not a valid claim as many such sites can be referred for other
>> purposes.
>
> Other purposes that have valid uses loading 3rd party content on a
> Wikimedia wiki? Like what?

If you don't trust other sites, you also have to accept that you can't
trust any kind of «toolserver» where you don't have complete control.
That opens up a lot of problems.

>> Note also that the number of urls will be huge as this type of
>> service is very popular, not to say that anyone that want may set up a
>> special stat aggregator on an otherwise unknown domain.
>>
>> Basically, simple regexps are not sufficient for detecting this kind of
>> code.
>
> I don't think I said it would be perfect, the idea isn't to 100% prevent
> it, just to try to stop the most obvious cases like Google analytics.

It's not that it won't be perfect; it simply will not work.

John


mrzmanwiki at gmail

Jun 4, 2009, 11:32 PM

Post #45 of 48 (1915 views)
Permalink
Re: Wikipedia tracks user behaviour via third party companies [In reply to]

John at Darkstar wrote:
>
> Alex wrote:
>> John at Darkstar wrote:
>>>> Hmm? There's no reason to do anything like that. The AbuseFilter would
>>>> just prevent sitewide JS pages from being saved with the particular URLs
>>>> or a particular code block in them. It'll stop the well-meaning but
>>>> misguided admins. Short of restricting site JS to the point of
>>>> uselessness, you'll never be able to stop determined abusers.
>>>>
>>> A very typical code fragment to make a stat url is something like
>>>
>>> document.write('<img scr="' + server + digest + '">');
>>>
>>> - server is some kind of external url
>>> - digest is just some random garbage to bypass caching
>>>
>>> This kind of code exists in so many variants that it is very difficult
>>> to say anything about how it may be implemented. Often it will not use a
>>> document.write on systems like Wikipedia but instead use createElement()
>>> Very often someone claims that the definition of "server" will be
>>> complete and may be used to identify the external server sufficiently.
>>> That is not a valid claim as many such sites can be referred for other
>>> purposes.
>> Other purposes that have valid uses loading 3rd party content on a
>> Wikimedia wiki? Like what?
>
> If you don't trust other sites you also has to accept that you can't
> trust ant kind of «toolserver» where you don't have complete control.
> That opens a lot of problems

It's not just a matter of trust, it's a matter of use. Why would people be
loading content from or linking to servers used to collect website stats
in the sitewide JS on a Wikimedia wiki?

>>> Note also that the number of urls will be huge as this type of
>>> service is very popular, not to say that anyone that want may set up a
>>> special stat aggregator on an otherwise unknown domain.
>>>
>>> Basically, simple regexps are not sufficient for detecting this kind of
>>> code.
>> I don't think I said it would be perfect, the idea isn't to 100% prevent
>> it, just to try to stop the most obvious cases like Google analytics.
>
> Its not that it won't be perfect, it simply will not work.

And anything more complex would likely be too complicated and/or too
inefficient to be worthwhile.

> John
>


--
Alex (wikipedia:en:User:Mr.Z-man)



usenet at tonal

Jun 5, 2009, 3:20 AM

Post #46 of 48 (1911 views)
Permalink
Re: Wikipedia tracks user behaviour via third party companies [In reply to]

Thomas Dalton wrote:
> 2009/6/5 Neil Harris <usenet [at] tonal>:
>
>> Thomas Dalton wrote:
>>
>>> 2009/6/4 Jon <scream [at] nonvocalscream>:
>>>
>>>
>>>> Has apache/proxy level filtering been considered?
>>>>
>>>>
>>> Filtering for what? JavaScript is executed client-side, i.e. after the
>>> page has gone through the Apache servers/proxies.
>>>
>>>
>> Filtering to remove _all_ JavaScript, other than references to
>> static JavaScript files maintained by MediaWiki's developers.
>>
>
> Well, that's certainly possible, but there are a large number of
> legitimate and worthwhile uses of custom javascript. Things like
> Twinkle are done with custom javascript and many members of the
> community find such tools extremely useful.
>
Which is why Twinkle's code should be hosted by WMF in the same way as
the Mediawiki code, and the Twinkle developers given commit access to
that code in the usual way.

Javascript is software, and should be managed like software, not like
wiki content. We don't give every admin commit access to the code
repository, nor should we do so for Javascript.

-- Neil



Simetrical+wikilist at gmail

Jun 5, 2009, 6:33 AM

Post #47 of 48 (1909 views)
Permalink
Re: Wikipedia tracks user behaviour via third party companies [In reply to]

On Fri, Jun 5, 2009 at 2:14 AM, John at Darkstar<vacuum [at] jeb> wrote:
> It's not that it won't be perfect, it simply will not work.

It will in most cases if you don't mind some false positives. False
positives would be acceptable if it's just a warning page that the
admin could click through. Check for anything that looks like a URL
that doesn't go to a Wikimedia domain, and if one is being inserted
into MediaWiki:*.js (or MediaWiki:*.css), politely notify the adder
that it's against Wikimedia's privacy policy to include content from
third-party domains, and anyone who adds it may be desysopped. That
would stop well-intended additions, provided the sysop knows English
and/or the message can be translated. And every such warning could be
logged so that stewards or whoever could keep an eye on that site's
CSS/JS for a while to make sure there are no evasion attempts. That
would be quite effective.
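
The check described above could look roughly like the following. This is a sketch under stated assumptions, not an existing MediaWiki feature: the allowlist contents, the function names (`isAllowedHost`, `findThirdPartyHosts`, `warningFor`), and the warning text are all hypothetical.

```javascript
// Assumed allowlist; the real set of Wikimedia-controlled domains would differ.
const allowedSuffixes = ["wikimedia.org", "wikipedia.org", "mediawiki.org"];

// Anything that looks like an absolute or protocol-relative URL with a dotted host.
const urlPattern = /(?:https?:)?\/\/([a-z0-9-]+(?:\.[a-z0-9-]+)+)/gi;

// True when the host is an allowed domain or a subdomain of one.
function isAllowedHost(host) {
  const h = host.toLowerCase();
  return allowedSuffixes.some((s) => h === s || h.endsWith("." + s));
}

// Collects every distinct host in the text that is not on the allowlist.
function findThirdPartyHosts(source) {
  const hits = new Set();
  for (const match of source.matchAll(urlPattern)) {
    if (!isAllowedHost(match[1])) hits.add(match[1].toLowerCase());
  }
  return [...hits];
}

// Returns a warning for edits to MediaWiki:*.js or MediaWiki:*.css, else null.
// The caller would show it as a click-through page and log it for later review.
function warningFor(title, source) {
  if (!/^MediaWiki:.*\.(?:js|css)$/.test(title)) return null;
  const hosts = findThirdPartyHosts(source);
  if (hosts.length === 0) return null;
  return "This edit references third-party domains: " + hosts.join(", ");
}
```

Because the result is only a warning, a false positive (say, a URL that appears in a comment) costs the admin one extra click, which is the trade-off proposed above. A host name assembled at run time would still slip through, which is why the logging step matters.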


midom.lists at gmail

Jun 7, 2009, 11:02 AM

Post #48 of 48 (1850 views)
Permalink
Re: Wikipedia tracks user behaviour via third party companies [In reply to]

Hello,

> If I were to compile a wishlist of stats things:
> 1. stats.grok.se data for non-Wikipedia projects

The raw data is available; anyone can build something like that, as
long as they have the resources. I've suggested to Henrik that he
open-source his software, but it probably still suffers from "not nice
enough to show".

> 3. Pageview stats at <http://dammit.lt/wikistats/> in files based on
> projects. It would be a lot easier for people at the West Flemish
> Wikipedia to analyze statistics themselves if they didn't have to
> download tons of data they don't need.

I'm considering some kind of API, but I have to rethink the process
(some people want more data, like country tagging, instead of less
data, hehe ;-). Apparently, though, the people who cry for stats the
most are also the ones bashing my actions and attacking
'volunteer developers', so...

On the other hand, 'tons of data' is just 50MB an hour. :-)

Cheers,
Domas

