Mailing List Archive: Wikipedia: Mediawiki

jibberish

 

 


2007 at gmask

Oct 11, 2007, 10:03 AM

Post #1 of 20
jibberish

There is a bot that keeps posting random words or alphanumeric
sequences to the beginning of pages on my wiki.

Is it possible to warn users who post fewer than a certain number of characters to
the beginning of a page?

I don't want to enforce a captcha on every edit.

-Adrian

_______________________________________________
MediaWiki-l mailing list
MediaWiki-l [at] lists
http://lists.wikimedia.org/mailman/listinfo/mediawiki-l


bhorst at mac

Oct 11, 2007, 10:35 AM

Post #2 of 20
Re: jibberish

I was experiencing the same types of "spam," or whatever it is.

I started to enforce a captcha on every edit, which bothers me, but it stopped the junk. I hope there's a better solution out there!

Thanks,
Ben

----
OpenOffice and open source blog:
http://www.solidoffice.com/

Wiki business directory:
http://www.wikipages.com/



chuck at mutualaid

Oct 12, 2007, 8:36 AM

Post #3 of 20
Re: jibberish

Benjamin Horst wrote:
> I was experiencing the same types of "spam," or whatever it is.
>
> I started to enforce a captcha on every edit, which bothers me, but stopped the junk. I hope there's a better solution out there!
>
> Thanks,
> Ben

We're having the same problem with our wikis. It seems that this could
be solved if there is a switch in MediaWiki that really mandates that
changes be made by registered users.

We are planning to implement the other spam measures mentioned on this list.

Thanks for the advice from last week about how to stop DIV spam.

Chuck


2007 at gmask

Oct 12, 2007, 10:19 AM

Post #4 of 20
Re: jibberish

Yeah, I don't want to stop anonymous users, but it seems like that might
be necessary. Or it would be great if you could captcha new posts
from either new users or unfamiliar IPs.

-Adrian



chuck at mutualaid

Oct 15, 2007, 8:16 PM

Post #5 of 20
Re: jibberish

Has anybody found a solution for the gibberish spam short of installing
captcha extensions?

Chuck

--

--------------------------
Bread and Roses Web Design
serving small businesses, non-profits, artists and activists
http://www.breadandrosesweb.com/


dan.bolser at gmail

Oct 16, 2007, 12:31 AM

Post #6 of 20
Re: jibberish

On 16/10/2007, Chuck <chuck [at] mutualaid> wrote:
> Has anybody found a solution for the gibberish spam short of installing
> captcha extensions?

Not here. I set anonymous edits to false and installed reCaptcha.

Installing reCaptcha doesn't really constitute a big change to the
wiki (it's quite unobtrusive really), but disallowing anonymous edits
is a pain. This is especially true of pre-1.11.0 MW versions, where
'view source' doesn't seem to work 'out of the box'.

In theory there should be a simple SQL query to detect these kinds of
spam (one nonsense word at the start of a page). However, it seems
better to code a general solution that highlights potential spam for
review. It's keeping track of the potentially spammed pages that I find
most difficult.

Anyone handy with Bayesian filters? If we could rank edits by
'spaminess' using a Bayesian filter, and be given the option to review
the top n most spammy revisions (with feedback training) ... well...
that would be great!

Send all your edits to a gmail account and only allow those that get
forwarded back?


--
hello


ek79501 at yahoo

Oct 16, 2007, 4:56 AM

Post #7 of 20
Re: jibberish

I think making an IP do a captcha on its first edit only would help. The captcha would keep a record of the most recent IPs in a table; if no edit has been recorded from an IP, give it a captcha, otherwise let it pass. This may mean the IP table would grow large (it may not be fast for large wikis with lots of editing), but it could be purged after a certain period (15 days, etc.). A captcha has to be there one way or the other, and this is the least irritating option.
There is definitely no way to check if an edit is spam or not, except for a captcha.
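
A minimal sketch of the first-edit idea above, written in Python for illustration rather than as a MediaWiki extension. The `RecentEditors` class, its method names, and the in-memory dictionary are all hypothetical; a real wiki would presumably keep this in a database table. The 15-day lifetime is the figure suggested in the post.

```python
import time


class RecentEditors:
    """Remembers which IPs have edited recently (hypothetical helper)."""

    def __init__(self, ttl_seconds=15 * 24 * 3600):
        self.ttl = ttl_seconds
        self.last_seen = {}  # ip -> timestamp of the last accepted edit

    def needs_captcha(self, ip, now=None):
        """True if this IP has no accepted edit within the lifetime."""
        now = time.time() if now is None else now
        seen = self.last_seen.get(ip)
        return seen is None or now - seen > self.ttl

    def record_edit(self, ip, now=None):
        """Call after an edit is accepted (captcha solved or not required)."""
        now = time.time() if now is None else now
        self.last_seen[ip] = now

    def purge(self, now=None):
        """Drop entries older than the lifetime so the table stays small."""
        now = time.time() if now is None else now
        cutoff = now - self.ttl
        self.last_seen = {
            ip: t for ip, t in self.last_seen.items() if t >= cutoff
        }
```

Running `purge` from a periodic job keeps the table bounded by the number of distinct IPs seen in the last 15 days. Note that a bot rotating through proxies would still hit the captcha on every new address, which is the point of the scheme.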



dan.bolser at gmail

Oct 16, 2007, 5:37 AM

Post #8 of 20
Re: jibberish

On 16/10/2007, Eric K <ek79501 [at] yahoo> wrote:
> I think making an IP do a captcha on its first edit only would help. The captcha would keep a record of the most recent IPs in a table and if an edit hasn't been recorded from them, give them a captcha, otherwise pass. This may mean the IP table would grow large (may not be fast for large wikis with lots of editing), but purge it after a certain period (15 days etc). Captcha has to be there one way or the other. This is least irritating.

I like the idea.


> There is definitely no way to check if an edit is spam or not, except for captcha.

Not currently. I think a 'review for spam' feature would work very
well for most small sites.



--
hello


chuck at mutualaid

Oct 16, 2007, 8:04 AM

Post #9 of 20
Re: jibberish

This gibberish spam doesn't make much sense, pardon the pun. The spambot
isn't inserting any actual links. My wikis are getting spammed with
short text strings like "copasnotra" and "romonboel". Based on my
limited understanding of spambots, it seems like the bots are making
these changes as a prelude to doing something else.

After some further investigation, some interesting clues emerge. This
"gibberish spambot" is evidently generating fake user accounts. I
deleted hundreds of fake accounts last night from the four wiki
databases that we run. The spambot is surprisingly doing something that
should make it easy to stop them: all of their fake user accounts
include an email address from the ".ru" domain. The user names are all
different, but the spambot only uses a limited number of fake email
addresses from the .ru domain. Would it be possible to reject user
registrations with code that rejects anything from a certain domain?
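
As a rough illustration of the domain check asked about above, here is a small Python sketch. It is not MediaWiki code: the function names and the `BLOCKED_SUFFIXES` list are hypothetical, and how such a check would be wired into account creation is left open.

```python
# Hypothetical block list: reject any address under these domain suffixes.
BLOCKED_SUFFIXES = ("ru",)


def email_domain(address):
    """Return the lower-cased domain part of an address, or '' if absent."""
    _, sep, domain = address.rpartition("@")
    return domain.strip().lower().rstrip(".") if sep else ""


def is_blocked(address, blocked=BLOCKED_SUFFIXES):
    """True if the address's domain equals or ends with a blocked suffix."""
    domain = email_domain(address)
    if not domain:
        # Leave malformed addresses to the normal validation step.
        return False
    return any(
        domain == suffix or domain.endswith("." + suffix)
        for suffix in blocked
    )
```

Matching on `"." + suffix` rather than a bare `endswith("ru")` avoids rejecting unrelated domains that merely end in those letters. Blocking a whole top-level domain also locks out legitimate users from it, so a list of the specific addresses the bot reuses may be the gentler option.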

Another facet of this problem is that this spambot is using proxy ISPs
or rotating fake IP addresses. In my experience, this is a common method
that spambots use to defeat easy anti-spam measures like server level IP
blocking.

Now that I think about it, I may have thwarted the final stage of this
bot's activities by implementing that spam hack that stops hidden DIV
spam. But our wikis are still getting hit hard by the "gibberish spam".
It's unclear if the hidden DIV spam and the gibberish spam are part of
the same spambot's suite of attacks.

Chuck


robchur at gmail

Oct 16, 2007, 8:48 AM

Post #10 of 20
Re: jibberish

On 16/10/2007, Eric K <ek79501 [at] yahoo> wrote:
> There is definitely no way to check if an edit is spam or not, except for captcha.

I have to point out the flaw in that statement, tenuous though it is:
a CAPTCHA does *not* constitute an anti-spam acid test. All it does
is confirm, to the best of the test's ability (which might not
count for anything), that we are dealing with a human being rather
than an automated program.

A human could quite well post spam to his/her heart's content, and
would be able to pass a CAPTCHA (we hope). The default configuration
settings for ConfirmEdit, which CAPTCHA extensions are based upon,
allow registered users to skip these tests, so in theory one could
set up a spam bot with a few minutes of initial human assistance.
That is why we supplement such things with throttles and "heuristics"
(regular expressions aren't that great in terms of configurability,
but I cling to the hope that one day we'll have decent spam-edit
detection heuristics, even if just for the basics).
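
To make the throttle idea concrete, here is a small sliding-window sketch in Python. It is only an illustration of the technique, not how MediaWiki implements its own limits; the `EditThrottle` class and the default of 5 edits per 60 seconds are assumptions chosen for the example.

```python
from collections import defaultdict, deque


class EditThrottle:
    """Allows at most max_edits per key within window_seconds (hypothetical)."""

    def __init__(self, max_edits=5, window_seconds=60):
        self.max_edits = max_edits
        self.window = window_seconds
        self.history = defaultdict(deque)  # key -> timestamps of recent edits

    def allow(self, key, now):
        """Return True and record the edit if the key is under its limit."""
        recent = self.history[key]
        # Forget edits that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_edits:
            return False
        recent.append(now)
        return True
```

The key can be a user name or an IP address. A throttle of this kind limits the damage from an account that passed the CAPTCHA once with human help and was then handed to a bot.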


Rob Church


2007 at gmask

Oct 16, 2007, 9:56 AM

Post #11 of 20
Re: jibberish

--- Chuck <chuck [at] mutualaid> wrote:

> This gibberish spam doesn't make much sense, pardon the pun. The
> spambot
> isn't inserting any actual links. My wikis are getting spammed with
> short text strings like "copasnotra" and "romonboel". Based on my
> limited understanding of spambots, it seems like the bots are making
> these changes as a prelude to doing something else.

This is what is happening to me as well, but the inserted words are
always at the beginning of the page, which gives me hope of blocking
these types of bot edits with a regex.


-adrian



ek79501 at yahoo

Oct 16, 2007, 11:00 AM

Post #12 of 20
Re: jibberish

I agree, you're right. To be more accurate, a captcha only makes certain that a human is editing the page (and, to get more technical, complex bots can solve the captcha). Throttling is also necessary: anything to prevent bots from doing the things they do well.




chuck at mutualaid

Oct 16, 2007, 11:11 AM

Post #13 of 20
Re: jibberish

2007 [at] gmask wrote:

> This is what is happening to me as well.. but the inserted words are
> always at the beginning of the page which gives me hope in blocking
> these types of bot edits with a regex.

Right. This is the same bot we're having problems with.

Chuck


2007 at gmask

Oct 16, 2007, 11:34 AM

Post #14 of 20
Re: jibberish REGEX syntax help

So what would the syntax be to match something that begins at the start
of the page?

Sort of what I'm thinking is to try and match anonymous users who post
under a certain number of characters to the beginning of a page.

But it seems like regex is limited to matching the beginning of a line.
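
On the anchoring question: in both Python's `re` module and the PCRE syntax behind PHP's `preg_*` functions, `\A` matches only at the very start of the text, whereas `^` also matches at the start of every line when multiline mode is on. The Python sketch below uses that to flag an anonymous edit that only prepends one short alphanumeric token. The function names and the 20-character limit are assumptions for illustration.

```python
import re

# \A and \Z anchor to the start and end of the whole string, never to
# individual lines, so this matches a prefix that is one short token.
SHORT_TOKEN = re.compile(r"\A\s*[A-Za-z0-9]{1,20}\s*\Z")


def prepended_text(old, new):
    """Return the text added in front of old, or None if not a pure prepend."""
    if len(new) > len(old) and new.endswith(old):
        return new[: len(new) - len(old)]
    return None


def looks_like_gibberish_prepend(old, new, is_anonymous):
    """True for an anonymous edit that only prepends one short token."""
    if not is_anonymous:
        return False
    prefix = prepended_text(old, new)
    return prefix is not None and SHORT_TOKEN.match(prefix) is not None
```

Comparing the old and new revisions, rather than matching the new text alone, avoids flagging pages that legitimately begin with a short word. A brand-new page consisting of a single word would still be flagged, which may or may not be wanted.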

-Adrian


michaeldaly at kayakwiki

Oct 16, 2007, 12:31 PM

Post #15 of 20
Re: jibberish

2007 [at] gmask wrote:

> This is what is happening to me as well.. but the inserted words are
> always at the beginning of the page which gives me hope in blocking
> these types of bot edits with a regex.

I was thinking that this could be checked against a dictionary. If the
first "word" inserted is not in the dictionary (for the page's
language), require the user to confirm the save. A bot won't confirm.

This would have to be smart enough to skip wikitext (e.g. don't worry
about "[[Image:"). Similarly, it would choke on obscure acronyms, but a
real person would not likely complain too much.

This could be a hook into the "save" code and would only need to check the
first word. However, the bot writer can switch to posting at the end of
the article... Possibly, a scan of the entire page to reject
exceptionally bad spelling might suffice, but it will put off some
contributors (and annoy US vs Canadian vs British spellers if the bad
spelling algorithm isn't smart enough to recognize that honour vs honor isn't
that bad).
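
A rough Python sketch of the first-word dictionary check described above. The tiny `KNOWN_WORDS` set stands in for a real dictionary, and the markup-skipping pattern is a simplified assumption that handles only leading links, templates, and a few formatting characters, not full wikitext.

```python
import re

# Stand-in for a real word list for the page's language.
KNOWN_WORDS = {"the", "a", "kayak", "paddle", "honour", "honor"}

# Leading whitespace, [[...]] links, {{...}} templates, and single
# heading/list/quote characters are skipped before looking for a word.
LEADING_MARKUP = re.compile(
    r"\A(?:\s|\[\[[^\]]*\]\]|\{\{[^}]*\}\}|[=*#:;'])+"
)


def first_word(wikitext):
    """Return the first plain word of the page in lower case, or None."""
    text = LEADING_MARKUP.sub("", wikitext, count=1)
    match = re.match(r"[A-Za-z]+", text)
    return match.group(0).lower() if match else None


def needs_confirmation(wikitext, dictionary=KNOWN_WORDS):
    """True if the page starts with a word that is not in the dictionary."""
    word = first_word(wikitext)
    return word is not None and word not in dictionary
```

For example, `needs_confirmation("copasnotra\n== Kayaks ==")` asks for confirmation, while a page starting with `[[Image:Boat.jpg]] The kayak` passes. As the post notes, obscure acronyms and names would also trigger the prompt, so this suits a confirm step better than an outright rejection.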

Mike





desrod at gnu-designs

Oct 17, 2007, 5:06 AM

Post #16 of 20
Re: jibberish

On Tue, 2007-10-16 at 10:04 -0500, Chuck wrote:
> My wikis are getting spammed with short text strings like "copasnotra"
> and "romonboel". Based on my limited understanding of spambots, it
> seems like the bots are making these changes as a prelude to doing
> something else

What they're doing is polluting the database of heuristics by inserting
either common or nonsense words. For example, if (prior to this tactic)
"spammy" words (Viagra, etc.) made up 80% of the total number of words
in the table, they fill the database with common or nonsense
words to lower the quality of the filter enough to let the
spammy words back through, by pushing them down below that threshold.

I've seen this used for years while using dspam, but thankfully for us,
dspam has kept us 100% spam-free for years. Not a single spam email or
other garbage in any user's mailbox going on years, with only very
minimal false-positives.

Perhaps a look at their methods, and rolling those in to mediawiki's
anti-spammy comment approach might be worthwhile?


--
David A. Desrosiers
desrod [at] gnu-designs
setuid [at] gmail
http://projects.plkr.org/
Skype...: 860-967-3820


dan.bolser at gmail

Oct 17, 2007, 5:18 AM

Post #17 of 20
Re: jibberish

So: 1) We are all seeing the same kind of spam. 2) We need something
that looks at the whole edit, and isn't based on some trivial aspect
of the particular spam attack (that could easily be changed). 3) We
need something that goes beyond an 'are you a human' captcha, because
such tests are either too infrequent to be useful or too common to be
tenable.

4) What is wrong with a Bayesian (email style) spam filter?

Each edit gets certain attributes set - username and email or IP
address, number of good edits from this user, edit frequency of this
user, edit diff text, etc. - and then the Bayesian filter flags the
edit with a 'level of spamminess'. Depending on configuration spammy
edits can be flat out rejected with multiple spams leading to
automatic bans. Or potential spam can be queued in a special list of
edits for review (the review process being key to learning the
patterns of spam). Such a filter could equally be applied to
vandalism... Also (while I am at it) sysops will have the option to
'mark edit as spam', providing more data for the training algorithm.

So there is only one problem... Where should we start?

Some Googling for PHP code to nick looks promising...

http://www.phpclasses.org/browse/file/9319.html Guestbook Example with
SpamFilter
http://www.squirrelmail.org/plugin_view.php?id=115 uses a Bayesian
algorithm to determine what you consider to be spam.
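
As a starting point for the Bayesian idea in this post, here is a small naive Bayes sketch in Python. It is an illustration only: it scores just the words of the edit text and ignores the other attributes mentioned (user, IP, edit frequency), and the `EditFilter` class and its method names are hypothetical. Sysops marking an edit as spam would correspond to calling `train`.

```python
import math
import re
from collections import Counter


class EditFilter:
    """Multinomial naive Bayes over edit text, with Laplace smoothing."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.tokens = {label: Counter() for label in self.LABELS}
        self.edits = {label: 0 for label in self.LABELS}

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, label, text):
        """Record a reviewed edit as 'spam' or 'ham'."""
        self.tokens[label].update(self.tokenize(text))
        self.edits[label] += 1

    def spamminess(self, text):
        """Return an estimate of P(spam | text) between 0 and 1."""
        vocab = len(set(self.tokens["spam"]) | set(self.tokens["ham"])) or 1
        total_edits = sum(self.edits.values())
        words = self.tokenize(text)
        log_score = {}
        for label in self.LABELS:
            # Smoothed prior, then one smoothed likelihood per word.
            score = math.log((self.edits[label] + 1) / (total_edits + 2))
            denom = sum(self.tokens[label].values()) + vocab
            for word in words:
                score += math.log((self.tokens[label][word] + 1) / denom)
            log_score[label] = score
        # Clamp the difference so math.exp cannot overflow on long edits.
        diff = max(-700.0, min(700.0, log_score["ham"] - log_score["spam"]))
        return 1.0 / (1.0 + math.exp(diff))
```

Edits scoring above a configured threshold could be queued for review rather than rejected outright, which also produces the training data the filter needs. An untrained filter returns 0.5 for everything.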


karl at xtronics

Oct 17, 2007, 11:05 AM

Post #18 of 20
Re: jibberish

Dan Bolser wrote:

>
> 4) What is wrong with a Bayesian (email style) spam filter?
Look at bogofilter. There is no reason you couldn't pipe all changes through it,
creating HAM and SPAM files for sorting and training. I use it for email with very good results.

----------------------------------------------------------------
Karl Schmidt EMail Karl [at] xtronics
Transtronics, Inc. WEB http://xtronics.com
3209 West 9th Street Ph (785) 841-3089
Lawrence, KS 66049 FAX (785) 841-0434

Why are so many spending time watching dark movies about
hopelessness, the macabre, and perversion; why are they reading
books about unfaithfulness and self destruction? Why is nothing
uplifting, also considered 'cool' or entertaining? -kps

----------------------------------------------------------------


ChristensenC at BATTELLE

Oct 19, 2007, 1:53 PM

Post #19 of 20
Re: jibberish REGEX syntax help

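How about something similar to this?

    // $gibberishRegex is assumed to be defined elsewhere.
    // Split the revision text into lines and test only the first one.
    $lines = explode("\n", $revision->getText());
    if (preg_match($gibberishRegex, $lines[0])) {
        return "bad user";
    } else {
        return "ok";
    }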


dan.bolser at gmail

Oct 20, 2007, 12:30 AM

Post #20 of 20
Re: jibberish REGEX syntax help

What will you do when the pattern of spam immediately changes?


--
hello
