
Mailing List Archive: Wikipedia: Wikitech

user email validation ready

 

 



hashar+wmf at free

Jan 24, 2011, 1:55 PM

Post #1 of 14 (1374 views)
user email validation ready

Hi,

We got the email validation stuff sorted out properly tonight. We even
have javascript tests (thanks Krinkle)!

Revisions got reviewed by Brion and bugs 959 & 22449 are now fixed.

I opened bug https://bugzilla.wikimedia.org/26910 as a merge request for
Roan.

Thanks everyone!

--
Ashar Voultoiz


_______________________________________________
Wikitech-l mailing list
Wikitech-l [at] lists
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


conrad.irwin at gmail

Jan 24, 2011, 2:08 PM

Post #2 of 14 (1349 views)
Re: user email validation ready [In reply to]

Out of interest, do you know what percentage of emails in the database
don't validate under the new scheme?

Conrad



brion at pobox

Jan 24, 2011, 2:23 PM

Post #3 of 14 (1340 views)
Re: user email validation ready [In reply to]

On Mon, Jan 24, 2011 at 2:08 PM, Conrad Irwin <conrad.irwin [at] gmail> wrote:

> Out of interest, do you know what percentage of emails in the database
> don't validate under the new scheme?
>

That's actually a wise thing to check -- most fails will probably be
legitimately bogus entries, but if we can find any that don't validate but
*do* work (eg they've been confirmed as functional) that's info we need to
report upstream as well -- the new code is using the specs for HTML 5's
client-side form validation, which is starting to go into the latest
generation of browsers.

In theory the validation rules should be pretty liberal, and you should need
to do something very esoteric to not pass. (The old validation regexes from
~2004-2005 got kicked out for failing to deal with things like '+' which
turned out to be more common than we thought.)

Folks actually already pushed a fix upstream to the whatwg spec page to
allow single-part domains like 'localhost', needed for local-network testing
and perhaps some weird intranet setups.
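
As a rough illustration of how liberal the HTML 5 rule is, here is a minimal
JavaScript sketch built on the pattern given in the present-day WHATWG spec
for type=email. It rests on assumptions: the 2011 spec revision and the
actual User::isValidEmailAddr code may differ in detail, and isValidEmail is
a hypothetical name used only for this example.

```javascript
// Pattern from the current WHATWG HTML spec for <input type=email>.
// Assumption: the 2011 revision and MediaWiki's own check may differ in detail.
var HTML5_EMAIL = /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;

// Hypothetical helper name, not a MediaWiki function.
function isValidEmail(addr) {
  return HTML5_EMAIL.test(addr);
}

console.log(isValidEmail('user+tag@example.org')); // '+' in the local part
console.log(isValidEmail('foo@localhost'));        // single-part domain
console.log(isValidEmail('foo@example.org.'));     // trailing dot in the domain
console.log(isValidEmail('no-at-sign'));           // no '@' at all
```

Under this pattern the first two addresses pass and the last two fail, which
matches the cases discussed in this thread.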

-- brion


billinghurst at gmail

Jan 24, 2011, 3:50 PM

Post #4 of 14 (1343 views)
Re: user email validation ready [In reply to]

It would seem that bug
https://bugzilla.wikimedia.org/show_bug.cgi?id=23710
would fall under that category; note that it is still marked as
new. Can it be tied to this process?

Regards, Andrew




Platonides at gmail

Jan 24, 2011, 4:02 PM

Post #5 of 14 (1340 views)
Re: user email validation ready [In reply to]

Brion Vibber wrote:
> On Mon, Jan 24, 2011 at 2:08 PM, Conrad Irwin <conrad.irwin [at] gmail> wrote:
>
>> Out of interest, do you know what percentage of emails in the database
>> don't validate under the new scheme?
>>
>
> That's actually a wise thing to check -- most fails will probably be
> legitimately bogus entries, but if we can find any that don't validate but
> *do* work (eg they've been confirmed as functional) that's info we need to
> report upstream as well -- the new code is using the specs for HTML 5's
> client-side form validation, which is starting to go into the latest
> generation of browsers.
>
> In theory the validation rules should be pretty liberal, and you should need
> to do something very esoteric to not pass. (The old validation regexes from
> ~2004-2005 got kicked out for failing to deal with things like '+' which
> turned out to be more common than we thought.)
>
> Folks actually already pushed a fix upstream to the whatwg spec page to
> allow single-part domains like 'localhost', needed for local-network testing
> and perhaps some weird intranet setups.
>
> -- brion

The original spec had feedback based precisely on enwiki numbers.
http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-August/022220.html

So about 100? Note that there are invalid addresses marked as confirmed
in wikipedia.




brion at pobox

Jan 24, 2011, 5:09 PM

Post #6 of 14 (1335 views)
Re: user email validation ready [In reply to]

On Mon, Jan 24, 2011 at 3:50 PM, Billinghurst <billinghurst [at] gmail> wrote:

> It would seem that bug
> https://bugzilla.wikimedia.org/show_bug.cgi?id=23710
> would fall under that category; note that it is still marked as
> new. Can it be tied to this process?
>

That's an issue about clickable links in the body of outgoing mails
generated by the system, and is not related to the format or validation of
email addresses.

It should be addressed (either by ensuring that links inserted into email
are escaped clearly, or that they're arranged nicely in brackets that email
clients commonly understand as delimiters, or by supplementing the plaintext
emails with HTML emails that can mark their links explicitly) but is an
entirely separate issue.

-- brion


brion at pobox

Jan 24, 2011, 5:51 PM

Post #7 of 14 (1338 views)
Re: user email validation ready [In reply to]

On Mon, Jan 24, 2011 at 4:02 PM, Platonides <Platonides [at] gmail> wrote:

> The original spec had feedback based precisely on enwiki numbers.
> http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-August/022220.html
>
> So about 100? Note that there are invalid addresses marked as confirmed
> in wikipedia.
>

Ok so from the breakdown at
http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-August/022237.html
with 202 email address records that were marked as confirmed, but failed
the proposed validation check at the time and couldn't be corrected by
stripping whitespace:


> The breakdown of the 202 is as follows.

Reordered into:

Now allowed by the current revision of the HTML 5 spec as implemented in
User::isValidEmailAddr:
> * Single trailing dot in local part: 40 (prohibited by RFC but plausibly
>   deliverable)
> * Multiple consecutive dots: 20 (prohibited by RFC but plausibly
>   deliverable)

Easily correctable by the user removing the extra bits upon being prompted,
as doing so would not change the actual delivery:
> * Single trailing dot in domain part: 100 (prohibited by RFC but plausibly
>   deliverable)
> * Valid address in angle brackets (with other junk around it): 21
>   (permitted by RFC, kind of, and plausibly deliverable)
> * Comment: 3 (permitted by RFC and plausibly deliverable)

v---- LINE OF DOOM ---v

Clearly wrong in typical context, should indeed be rejected (or changed to
@localhost for legit cases):
> * No @: 9 (unlikely to be deliverable)

Not quite sure what's going on but most look like stray chars that would be
ignored or else invalid and possibly bogusly marked as confirmed:
> * Miscellaneous: 9 (one containing [NO]@[SPAM], two with trailing >,
> one in "quotes", one with single leading dot in local part, two with
> single leading comma in local part, one with leading ": ", one with
> leading "\")


So from the August 2009 survey on English Wikipedia, that leaves 18 email
addresses out of over 3 million listed as confirmed, of which a few *might*
be deliverable addresses that could not be fixed by the user tweaking them
during input (ie, they actually rely on those extra chars being there in
order to be delivered to the right person).
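
The arithmetic behind those figures can be checked with a short JavaScript
sketch. The counts are copied from the breakdown quoted in this message; the
property names are made up for the example, and the 3,000,000 figure is only
the lower bound implied by "over 3 million", so the resulting share is an
upper bound.

```javascript
// Counts copied from the August 2009 breakdown; property names are illustrative.
var breakdown = {
  trailingDotLocal: 40,
  consecutiveDots: 20,
  trailingDotDomain: 100,
  angleBrackets: 21,
  comment: 3,
  noAt: 9,
  miscellaneous: 9
};

var total = Object.keys(breakdown).reduce(function (sum, key) {
  return sum + breakdown[key];
}, 0);

// The three groups used in the reordering above.
var nowAllowed = breakdown.trailingDotLocal + breakdown.consecutiveDots;
var userFixable = breakdown.trailingDotDomain + breakdown.angleBrackets + breakdown.comment;
var belowTheLine = breakdown.noAt + breakdown.miscellaneous;

// Assumption: exactly 3,000,000 confirmed addresses ("over 3 million" in the text).
var confirmedLowerBound = 3000000;
var share = belowTheLine / confirmedLowerBound;

console.log(total, nowAllowed, userFixable, belowTheLine);
console.log((share * 100).toFixed(4) + '% of confirmed addresses at most');
```

The seven categories sum to 202, and the two groups below the line account
for the 18 addresses mentioned.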

To me it sounds like we're pretty good with this; it wouldn't hurt to make
sure that existing addresses that are stored funny (eg with extra whitespace
or trailing dots on the domain name) continue to work as long as they've
been working previously.

Also wouldn't hurt to do a current survey, and to include some other
language sites.


Of interest -- gmail's validation rules were also posted in that thread:
http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-August/022268.html

-- brion


egil at wp

Jan 25, 2011, 2:37 PM

Post #8 of 14 (1325 views)
Re: user email validation ready [In reply to]

Brion Vibber (2011-01-25 02:51):
> On Mon, Jan 24, 2011 at 4:02 PM, Platonides <Platonides [at] gmail> wrote:
>
>> The original spec had feedback based precisely on enwiki numbers.
>> http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-August/022220.html
>>
>> So about 100? Note that there are invalid addresses marked as confirmed
>> in wikipedia.
>>
> Ok so from the breakdown at
> http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-August/022237.html
> with 202 email address records that were marked as confirmed, but failed
> the proposed validation check at the time and couldn't be corrected by stripping
> whitespace:
> [...]

Could you check for validated addresses containing commas in the user name
part? The RegExp from mediawiki.util.js did/does allow them.
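
A check of that kind could look roughly like the JavaScript sketch below.
It is only an illustration: hasCommaInLocalPart is a hypothetical helper,
the sample addresses are made up, and a real survey would run against the
stored addresses rather than an in-memory array.

```javascript
// Hypothetical helper: true when the part before the last '@' contains a comma.
function hasCommaInLocalPart(addr) {
  var at = addr.lastIndexOf('@');
  if (at === -1) {
    return false; // no '@' at all, so there is no local part to inspect
  }
  return addr.slice(0, at).indexOf(',') !== -1;
}

// Made-up sample data standing in for stored addresses.
var sample = ['a,b@example.org', ',lead@example.org', 'plain@example.org', 'no-at-sign'];
console.log(sample.filter(hasCommaInLocalPart));
```

Only the first two sample entries are reported, since they are the ones with
a comma before the '@'.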

Regards,
Nux.



hashar+wmf at free

Jan 25, 2011, 11:43 PM

Post #9 of 14 (1326 views)
Re: user email validation ready [In reply to]

On 25/01/11 23:37, Maciej Jaros wrote:
<snip>
> Could you check for validated addresses containing commas in the user name
> part? The RegExp from mediawiki.util.js did/does allow them.
>
> Regards,
> Nux.

Nux opened bug 26948 for the comma issue (assigned to myself).

https://bugzilla.wikimedia.org/26948

--
Ashar Voultoiz




Simetrical+wikilist at gmail

Jan 26, 2011, 12:09 PM

Post #10 of 14 (1315 views)
Re: user email validation ready [In reply to]

On Mon, Jan 24, 2011 at 8:51 PM, Brion Vibber <brion [at] pobox> wrote:
> So from the August 2009 survey on English Wikipedia, that leaves 18 email
> addresses out of over 3 million listed as confirmed, of which a few *might*
> be deliverable addresses that could not be fixed by the user tweaking them
> during input (ie, they actually rely on those extra chars being there in
> order to be delivered to the right person).

But note that I counted only the worsening of false negatives, not the
improvement in false positives -- I only looked at confirmed
addresses. It seems likely that a vastly larger number of
undeliverable addresses would be rejected at an early stage with the
more restrictive check, allowing users to correct them before
submitting so that the e-mail doesn't get lost in the ether. So it
seems like a clear improvement overall.

However, the code should probably at least strip whitespace (including
internal whitespace, not just trailing/leading)
onkeypress/oninput/onsubmit or such. This came up in something like
0.1% of the sample. Non-techy users would probably get pretty
confused by a trailing space messing the whole thing up.
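
A minimal JavaScript sketch of the whitespace stripping described above
follows. The function name stripAllWhitespace is an assumption for the
example, and the event wiring (oninput, onsubmit and so on) is left out
because it depends on the form in question.

```javascript
// Hypothetical helper: remove every whitespace character, including internal ones.
function stripAllWhitespace(value) {
  return value.replace(/\s+/g, '');
}

// JSON.stringify makes any leftover whitespace visible in the output.
console.log(JSON.stringify(stripAllWhitespace(' user @example.org \n')));
```

In a browser this could be applied to the field value before it is
validated, so that a stray space never reaches the check.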


I suggest that for Firefox 4, we shortcut the JavaScript logic and
just use type=email. It's pretty nicely designed. Here's a test case
(this is a URL, just paste all the lines in Firefox 4's URL bar):

data:text/html,<form><input type=email name=email placeholder=E-mail>
<br><input placeholder="Some other input"><br><input type=submit></form>

Firefox 4b9 seems not to implement the latest spec update, so
foo@localhost doesn't work, but that should be an easy fix. If
someone is willing to poke at their source code, I imagine it'd be
possible to get the updated check in the final release.
Unfortunately, WebKit's form constraint validation implementation is
broken all over the place, and Opera's is pretty ugly (like only
checking validity onsubmit), so we should probably blacklist them if
we use any form validation features.
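
One common way to decide whether to rely on type=email is feature detection,
sketched below in JavaScript. The function names are hypothetical, and the
document is passed in as a parameter so the logic can be exercised outside a
browser. Note the limitation: this only detects whether the input type is
recognised, not whether the implementation is any good, so a blacklist for
the broken cases mentioned above would still be a separate step.

```javascript
// Returns true when <input> elements created by `doc` understand type=email.
// Browsers without support report the type as "text" instead.
function supportsEmailInput(doc) {
  var input = doc.createElement('input');
  input.setAttribute('type', 'email');
  return input.type === 'email';
}

// Hypothetical wiring: return null when the browser's own check can be used,
// otherwise hand back the script-based validator.
function chooseValidator(doc, scriptValidator) {
  return supportsEmailInput(doc) ? null : scriptValidator;
}
```

In a page this would be called as chooseValidator(document, someValidator),
where someValidator is whatever script check is already in place.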



Simetrical+wikilist at gmail

Jan 26, 2011, 12:22 PM

Post #11 of 14 (1311 views)
Re: user email validation ready [In reply to]

On Wed, Jan 26, 2011 at 3:09 PM, Aryeh Gregor
<Simetrical+wikilist [at] gmail> wrote:
> Firefox 4b9 seems not to implement the latest spec update, so
> foo@localhost doesn't work, but that should be an easy fix.  If
> someone is willing to poke at their source code, I imagine it'd be
> possible to get the updated check in the final release.

Was already done, actually:

https://bugzilla.mozilla.org/show_bug.cgi?id=627657



questpc at rambler

Jan 26, 2011, 10:58 PM

Post #12 of 14 (1317 views)
Re: user email validation ready [In reply to]

* Aryeh Gregor <Simetrical+wikilist [at] gmail> [Wed, 26 Jan 2011
15:09:17 -0500]:

> However, the code should probably at least strip whitespace (including
> internal whitespace, not just trailing/leading)
> onkeypress/oninput/onsubmit or such. This came up in something like
> 0.1% of the sample. Non-techy users would probably get pretty
> confused by a trailing space messing the whole thing up.
>
Surely it should. In a very similar manner, I've had trouble with a
local MediaWiki installation (old 1.14, haven't checked with newer
ones): when I created user accounts and sent the credentials via email,
people were unable to log in, because when you select a text line using
the mouse, Thunderbird sometimes copies a line feed character into the
clipboard, so it was pasted into the password field and the
password didn't match. Users were frustrated. I explained to them that
a line feed is being placed into the clipboard, which is visible when you
paste it into a text editor. I am unsure which browser they were
using; maybe some browsers strip 13 / 10 from text inputs, maybe not.
Dmitriy



Simetrical+wikilist at gmail

Jan 27, 2011, 11:27 AM

Post #13 of 14 (1315 views)
Re: user email validation ready [In reply to]

On Thu, Jan 27, 2011 at 1:58 AM, Dmitriy Sintsov <questpc [at] rambler> wrote:
> Surely it should. In a very similar manner, I've had trouble with a
> local MediaWiki installation (old 1.14, haven't checked with newer
> ones): when I created user accounts and sent the credentials via email,
> people were unable to log in, because when you select a text line using
> the mouse, Thunderbird sometimes copies a line feed character into the
> clipboard, so it was pasted into the password field and the
> password didn't match. Users were frustrated. I explained to them that
> a line feed is being placed into the clipboard, which is visible when you
> paste it into a text editor. I am unsure which browser they were
> using; maybe some browsers strip 13 / 10 from text inputs, maybe not.

HTML5 specifies that they should, for passwords:

"User agents must not allow users to insert U+000A LINE FEED (LF) or
U+000D CARRIAGE RETURN (CR) characters into the value."
http://www.whatwg.org/specs/web-apps/current-work/multipage/states-of-the-type-attribute.html#password-state

The value sanitization algorithm also makes sure this holds for
default values and script-inserted values.



questpc at rambler

Jan 29, 2011, 12:32 PM

Post #14 of 14 (1298 views)
Re: user email validation ready [In reply to]

* Aryeh Gregor <Simetrical+wikilist [at] gmail> [Thu, 27 Jan 2011
14:27:21 -0500]:
> HTML5 specifies that they should, for passwords:
>
> "User agents must not allow users to insert U+000A LINE FEED (LF) or
> U+000D CARRIAGE RETURN (CR) characters into the value."
> http://www.whatwg.org/specs/web-apps/current-work/multipage/states-of-the-type-attribute.html#password-state
>
> The value sanitization algorithm also makes sure this holds for
> default values and script-inserted values.
>
Oops.. My mistake - it seems that Thunderbird appends an extra space
character (32) to the end of the selection in the clipboard instead (when
the password is on a separate text line and one selects the
complete line using the mouse), not CR / LF. However, as the password field
input value is hidden, users cannot see why they cannot log in
when copying / pasting the password from TB mail. It would be more
user-friendly if trim() were used.
Dmitriy
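
The difference between the two behaviours discussed in this exchange can be
seen in a small JavaScript sketch. stripLineBreaks is a hypothetical helper
that removes only LF and CR, in the spirit of the HTML5 rule quoted above,
and the sample password is made up.

```javascript
// Hypothetical helper: drop LF (U+000A) and CR (U+000D) only, nothing else.
function stripLineBreaks(value) {
  return value.replace(/[\r\n]/g, '');
}

// Made-up password with the trailing space (32) picked up from the clipboard.
var pasted = 'secret ';

console.log(JSON.stringify(stripLineBreaks(pasted))); // the trailing space survives
console.log(JSON.stringify(pasted.trim()));           // trim() removes it
```

One trade-off to keep in mind: trimming would also alter a password that
legitimately begins or ends with a space, so it would have to be applied
consistently when the password is set and when it is checked.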

