
Mailing List Archive: Python: Bugs

[issue755660] allow HTMLParser to continue after a parse error

 

 



report at bugs

Nov 10, 2009, 4:17 AM

Post #1 of 2 (204 views)
[issue755660] allow HTMLParser to continue after a parse error

Francesco Frassinelli <fraph24 [at] gmail> added the comment:

I'm using Python 3.1.1 and the patch (patch.txt, provided by smroid)
works very well. It's useful, and I really need it, thanks :)
Without this patch, I can't parse http://ftp.vim.org/pub/vim/ (due to a
fake tag, like "<user [at] mail>") or many other websites.

I hope this patch will be merged in Python 3.2 :)

----------
nosy: +frafra

_______________________________________
Python tracker <report [at] bugs>
<http://bugs.python.org/issue755660>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com


report at bugs

Nov 10, 2009, 4:47 AM

Post #2 of 2 (176 views)
[issue755660] allow HTMLParser to continue after a parse error [In reply to]

Francesco Frassinelli <fraph24 [at] gmail> added the comment:

Site: http://ftp.vim.org/pub/vim/unstable/patches/

Output without the customized error function:
[...]
  File "./takeit.py", line 54, in inspect
    parser.feed(data.read().decode())
  File "/home/frafra/Scrivania/takeit/html/parser.py", line 107, in feed
    self.goahead(0)
  File "/home/frafra/Scrivania/takeit/html/parser.py", line 163, in goahead
    k = self.parse_declaration(i)
  File "/usr/local/lib/python3.1/_markupbase.py", line 97, in parse_declaration
    decltype, j = self._scan_name(j, i)
  File "/usr/local/lib/python3.1/_markupbase.py", line 387, in _scan_name
    % rawdata[declstartpos:declstartpos+20])
  File "/home/frafra/Scrivania/takeit/html/parser.py", line 122, in error
    raise HTMLParseError(message, self.getpos())
html.parser.HTMLParseError: expected name token at '<! gives an error me', at line 153, column 48

Output with the customized error function:
[...]
  File "./takeit.py", line 55, in inspect
    parser.feed(data.read().decode())
  File "/home/frafra/Scrivania/takeit/html/parser.py", line 107, in feed
    self.goahead(0)
  File "/home/frafra/Scrivania/takeit/html/parser.py", line 163, in goahead
    k = self.parse_declaration(i)
  File "/usr/local/lib/python3.1/_markupbase.py", line 97, in parse_declaration
    decltype, j = self._scan_name(j, i)
TypeError: 'NoneType' object is not iterable
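
The second traceback is probably what happens when error() is overridden
so that it returns instead of raising: the caller still unpacks the result
of _scan_name(), which is then None. Below is a minimal sketch of that
pattern, not the real _markupbase code. ToyScanner, LenientScanner and
demo() are hypothetical names made up for illustration, and the toy raises
ValueError where the real parser raises HTMLParseError.

```python
class ToyScanner:
    """Hypothetical stand-in for the parser base class (illustration only)."""

    def error(self, message):
        # Default behaviour: abort parsing by raising.
        raise ValueError(message)

    def _scan_name(self, text, start):
        # Scan a run of letters and return (name, end_index).
        end = start
        while end < len(text) and text[end].isalpha():
            end += 1
        if end > start:
            return text[start:end], end
        self.error("expected name token at %r" % text[start:start + 20])
        # If error() returns instead of raising, execution falls off the
        # end of the function here and the caller receives None.

    def parse_declaration(self, text, start):
        # Unpacks the result, so a None return raises TypeError here.
        name, end = self._scan_name(text, start)
        return name, end


class LenientScanner(ToyScanner):
    def error(self, message):
        # "Customized" error function: report the problem and carry on.
        print("ignored:", message)


def demo():
    """Return the exception type each scanner raises on bad input."""
    bad_input = " gives an error message"
    raised = []
    for scanner in (ToyScanner(), LenientScanner()):
        try:
            scanner.parse_declaration(bad_input, 0)
        except Exception as exc:
            raised.append(type(exc).__name__)
    return raised


if __name__ == "__main__":
    print(demo())
```

If this is indeed the cause, overriding error() alone is not enough:
every caller that relies on error() raising also has to cope with the
missing return value.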

----------

