Mailing List Archive: Python: Dev

Re: cpython (2.7): #14538: HTMLParser can now parse correctly start tags that contain a bare /.

 

 



g.brandl at gmx

Apr 24, 2012, 10:13 AM

Post #1 of 7 (166 views)
Re: cpython (2.7): #14538: HTMLParser can now parse correctly start tags that contain a bare /.

On 19.04.2012 03:36, ezio.melotti wrote:
> http://hg.python.org/cpython/rev/36c901fcfcda
> changeset: 76413:36c901fcfcda
> branch: 2.7
> user: Ezio Melotti <ezio.melotti [at] gmail>
> date: Wed Apr 18 19:08:41 2012 -0600
> summary:
> #14538: HTMLParser can now parse correctly start tags that contain a bare /.

> diff --git a/Misc/NEWS b/Misc/NEWS
> --- a/Misc/NEWS
> +++ b/Misc/NEWS
> @@ -50,6 +50,9 @@
> Library
> -------
>
> +- Issue #14538: HTMLParser can now parse correctly start tags that contain
> + a bare '/'.
> +

I think that's misleading: there's no way to "correctly" parse malformed HTML.

Georg

_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


benjamin at python

Apr 24, 2012, 11:34 AM

Post #2 of 7 (158 views)
Re: cpython (2.7): #14538: HTMLParser can now parse correctly start tags that contain a bare /. [In reply to]

2012/4/24 Georg Brandl <g.brandl [at] gmx>:
> On 19.04.2012 03:36, ezio.melotti wrote:
>> http://hg.python.org/cpython/rev/36c901fcfcda
>> changeset:   76413:36c901fcfcda
>> branch:      2.7
>> user:        Ezio Melotti <ezio.melotti [at] gmail>
>> date:        Wed Apr 18 19:08:41 2012 -0600
>> summary:
>>   #14538: HTMLParser can now parse correctly start tags that contain a bare /.
>
>> diff --git a/Misc/NEWS b/Misc/NEWS
>> --- a/Misc/NEWS
>> +++ b/Misc/NEWS
>> @@ -50,6 +50,9 @@
>>  Library
>>  -------
>>
>> +- Issue #14538: HTMLParser can now parse correctly start tags that contain
>> +  a bare '/'.
>> +
>
> I think that's misleading: there's no way to "correctly" parse malformed HTML.

There is in the since that you can follow the HTML5 algorithm, which
can "parse" any junk you throw at it.
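
A minimal sketch of what "parsing any junk" looks like in practice, assuming Python 3, where the module is html.parser (on 2.7 it is HTMLParser). EventCollector and collect_events are hypothetical helper names introduced only for this illustration. The input contains a start tag with a bare "/" next to an ordinary self-closing tag; the parser reports events for both instead of raising.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical helper: record tag events instead of acting on them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("starttag", tag, attrs))

    def handle_startendtag(self, tag, attrs):
        # Called for tags that end in "/>".
        self.events.append(("startendtag", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("endtag", tag))


def collect_events(markup):
    """Feed markup to a fresh parser and return the recorded events."""
    parser = EventCollector()
    parser.feed(markup)
    parser.close()
    return parser.events


if __name__ == "__main__":
    # "<meta / >" has a bare slash that is not part of "/>".
    for event in collect_events("<meta / ><meta/>"):
        print(event)
```

Whether a given tag is reported as a start tag or a self-closing tag may depend on the Python version, so it is worth checking against the release you actually run.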



--
Regards,
Benjamin


fdrake at acm

Apr 24, 2012, 12:00 PM

Post #3 of 7 (158 views)
Re: cpython (2.7): #14538: HTMLParser can now parse correctly start tags that contain a bare /. [In reply to]

On Tue, Apr 24, 2012 at 2:34 PM, Benjamin Peterson <benjamin [at] python> wrote:
> There is in the since that you can follow the HTML5 algorithm, which
> can "parse" any junk you throw at it.

This whole can of worms is why I gave up on HTML years ago (well, one
reason among many).

There are markup languages, and there's soup.


-Fred

--
Fred L. Drake, Jr.    <fdrake at acm.org>
"A person who won't read has no advantage over one who can't read."
   --Samuel Langhorne Clemens


g.brandl at gmx

Apr 24, 2012, 12:02 PM

Post #4 of 7 (157 views)
Re: cpython (2.7): #14538: HTMLParser can now parse correctly start tags that contain a bare /. [In reply to]

On 24.04.2012 20:34, Benjamin Peterson wrote:
> 2012/4/24 Georg Brandl <g.brandl [at] gmx>:
>> On 19.04.2012 03:36, ezio.melotti wrote:
>>> http://hg.python.org/cpython/rev/36c901fcfcda
>>> changeset: 76413:36c901fcfcda
>>> branch: 2.7
>>> user: Ezio Melotti <ezio.melotti [at] gmail>
>>> date: Wed Apr 18 19:08:41 2012 -0600
>>> summary:
>>> #14538: HTMLParser can now parse correctly start tags that contain a bare /.
>>
>>> diff --git a/Misc/NEWS b/Misc/NEWS
>>> --- a/Misc/NEWS
>>> +++ b/Misc/NEWS
>>> @@ -50,6 +50,9 @@
>>> Library
>>> -------
>>>
>>> +- Issue #14538: HTMLParser can now parse correctly start tags that contain
>>> + a bare '/'.
>>> +
>>
>> I think that's misleading: there's no way to "correctly" parse malformed HTML.
>
> There is in the since that you can follow the HTML5 algorithm, which
> can "parse" any junk you throw at it.

Ah, good. Then I hope we are following the algorithm here (and are slowly
coming to use it for htmllib in general).

Georg



benjamin at python

Apr 24, 2012, 12:05 PM

Post #5 of 7 (156 views)
Re: cpython (2.7): #14538: HTMLParser can now parse correctly start tags that contain a bare /. [In reply to]

2012/4/24 Benjamin Peterson <benjamin [at] python>:
> There is in the since

This is confusing, since I meant "sense".


--
Regards,
Benjamin


merwok at netwok

Apr 24, 2012, 12:34 PM

Post #6 of 7 (151 views)
Re: cpython (2.7): #14538: HTMLParser can now parse correctly start tags that contain a bare /. [In reply to]

On 24/04/2012 15:02, Georg Brandl wrote:
> On 24.04.2012 20:34, Benjamin Peterson wrote:
>> 2012/4/24 Georg Brandl<g.brandl [at] gmx>:
>>> I think that's misleading: there's no way to "correctly" parse malformed HTML.
>> There is in the since that you can follow the HTML5 algorithm, which
>> can "parse" any junk you throw at it.
> Ah, good. Then I hope we are following the algorithm here (and are slowly
> coming to use it for htmllib in general).

Yes, Ezio’s commits on html.parser/HTMLParser over the last few months
have been following the HTML5 spec. Ezio, RDM and I discussed this on
several bug reports, on IRC and in private mail, and agreed to do the
useful thing: follow HTML5 and not pretend that the stdlib parser is
strict or validating.

Ezio was thinking about a blog.python.org post to advertise this.
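
To illustrate "not strict or validating", here is a small sketch, again assuming Python 3's html.parser. TagLogger and log_tags are hypothetical names used only for this example. The markup closes an element that was never opened and leaves another one unclosed; the parser simply reports the tags in the order it sees them and leaves any nesting checks to the caller.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Hypothetical helper: log start and end tags in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append("<" + tag + ">")

    def handle_endtag(self, tag):
        self.tags.append("</" + tag + ">")


def log_tags(markup):
    """Return the tags seen in markup, without checking that they nest."""
    logger = TagLogger()
    logger.feed(markup)
    logger.close()
    return logger.tags


if __name__ == "__main__":
    # <b> is never closed and </i> was never opened; no error is raised.
    print(log_tags("<p>unclosed <b>bold</i></p>"))
```

Code that needs well-formedness guarantees would have to track open elements itself or use a different tool.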

Regards


brian at python

Apr 24, 2012, 12:41 PM

Post #7 of 7 (151 views)
Re: cpython (2.7): #14538: HTMLParser can now parse correctly start tags that contain a bare /. [In reply to]

On Tue, Apr 24, 2012 at 14:34, Éric Araujo <merwok [at] netwok> wrote:
> On 24/04/2012 15:02, Georg Brandl wrote:
>>
>> On 24.04.2012 20:34, Benjamin Peterson wrote:
>>>
>>> 2012/4/24 Georg Brandl<g.brandl [at] gmx>:
>>>>
>>>> I think that's misleading: there's no way to "correctly" parse malformed
>>>> HTML.
>>>
>>> There is in the since that you can follow the HTML5 algorithm, which
>>> can "parse" any junk you throw at it.
>>
>> Ah, good. Then I hope we are following the algorithm here (and are slowly
>> coming to use it for htmllib in general).
>
>
> Yes, Ezio’s commits on html.parser/HTMLParser over the last few months have
> been following the HTML5 spec.  Ezio, RDM and I discussed this on several
> bug reports, on IRC and in private mail, and agreed to do the useful thing:
> follow HTML5 and not pretend that the stdlib parser is strict or
> validating.
>
> Ezio was thinking about a blog.python.org post to advertise this.

Please do this, and I welcome anyone else who wants to write about
their work on the blog to do so. Contact me for info.
