python+libxml2+scrapy AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'

 

 



dmarsentev at gmail

Aug 15, 2012, 5:49 AM


Hello.

Has anybody already met a problem like this?
AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'

When I run scrapy, I get

  File "/usr/local/lib/python2.7/site-packages/scrapy/selector/factories.py", line 14, in <module>
    libxml2.HTML_PARSE_NOERROR + \
AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'


When I run
python -c 'import libxml2; libxml2.HTML_PARSE_RECOVER'

I get
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'

How can I cure it?

Python 2.7
libxml2-python 2.6.9
2.6.11-gentoo-r6


I will be grateful for any help.

DETAILS:

scrapy crawl lgz -o items.json -t json
Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 4, in <module>
    execute()
  File "/usr/local/lib/python2.7/site-packages/scrapy/cmdline.py", line 112, in execute
    cmds = _get_commands_dict(inproject)
  File "/usr/local/lib/python2.7/site-packages/scrapy/cmdline.py", line 37, in _get_commands_dict
    cmds = _get_commands_from_module('scrapy.commands', inproject)
  File "/usr/local/lib/python2.7/site-packages/scrapy/cmdline.py", line 30, in _get_commands_from_module
    for cmd in _iter_command_classes(module):
  File "/usr/local/lib/python2.7/site-packages/scrapy/cmdline.py", line 21, in _iter_command_classes
    for module in walk_modules(module_name):
  File "/usr/local/lib/python2.7/site-packages/scrapy/utils/misc.py", line 65, in walk_modules
    submod = __import__(fullpath, {}, {}, [''])
  File "/usr/local/lib/python2.7/site-packages/scrapy/commands/shell.py", line 8, in <module>
    from scrapy.shell import Shell
  File "/usr/local/lib/python2.7/site-packages/scrapy/shell.py", line 14, in <module>
    from scrapy.selector import XPathSelector, XmlXPathSelector, HtmlXPathSelector
  File "/usr/local/lib/python2.7/site-packages/scrapy/selector/__init__.py", line 30, in <module>
    from scrapy.selector.libxml2sel import *
  File "/usr/local/lib/python2.7/site-packages/scrapy/selector/libxml2sel.py", line 12, in <module>
    from .factories import xmlDoc_from_html, xmlDoc_from_xml
  File "/usr/local/lib/python2.7/site-packages/scrapy/selector/factories.py", line 14, in <module>
    libxml2.HTML_PARSE_NOERROR + \
AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'


--
http://mail.python.org/mailman/listinfo/python-list


dieter at handshake

Aug 15, 2012, 10:19 PM

Re: python+libxml2+scrapy AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'

Dmitry Arsentiev <dmarsentev [at] gmail> writes:

> Has anybody already meet the problem like this? -
> AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'
>
> When I run scrapy, I get
>
> File "/usr/local/lib/python2.7/site-packages/scrapy/selector/factories.py",
> line 14, in <module>
> libxml2.HTML_PARSE_NOERROR + \
> AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'

Apparently, your versions of "scrapy" and "libxml2" are not compatible.

Check which "libxml2" versions your "scrapy" version works with,
and then install one of them.
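
A quick way to see which of the constants from the traceback are missing is sketched below. This is only an illustration: the names missing_constants and check_libxml2 are made up for this example, and the list of constants covers just the two that appear in the traceback above, not necessarily everything scrapy needs.

```python
import importlib

# Only the two constants visible in the traceback; scrapy may use more.
REQUIRED = ("HTML_PARSE_RECOVER", "HTML_PARSE_NOERROR")


def missing_constants(module, names=REQUIRED):
    """Return the names from `names` that `module` does not define."""
    return [name for name in names if not hasattr(module, name)]


def check_libxml2():
    """Return the missing constants, or None if libxml2 cannot be imported."""
    try:
        libxml2 = importlib.import_module("libxml2")
    except ImportError:
        return None
    return missing_constants(libxml2)


if __name__ == "__main__":
    result = check_libxml2()
    if result is None:
        print("libxml2 Python bindings are not installed")
    elif result:
        print("libxml2 bindings lack: " + ", ".join(result))
    else:
        print("all checked constants are present")
```

If the script reports missing constants, the installed bindings are probably older than what this scrapy release was written against.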



personificator at gmail

Aug 16, 2012, 6:57 PM

Re: python+libxml2+scrapy AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'

I believe ftp://xmlsoft.org/libxml2/libxml2-2.8.0.tar.gz is what you're looking for. Submit a ticket to get the docs updated if you're feeling generous.




stefan_ml at behnel

Aug 18, 2012, 10:56 AM

Re: python+libxml2+scrapy AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'

Dmitry Arsentiev, 15.08.2012 14:49:
> Has anybody already meet the problem like this? -
> AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'
>
> When I run scrapy, I get
>
> File "/usr/local/lib/python2.7/site-packages/scrapy/selector/factories.py",
> line 14, in <module>
> libxml2.HTML_PARSE_NOERROR + \
> AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'
>
>
> When I run
> python -c 'import libxml2; libxml2.HTML_PARSE_RECOVER'
>
> I get
> Traceback (most recent call last):
> File "<string>", line 1, in <module>
> AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'
>
> How can I cure it?
>
> Python 2.7
> libxml2-python 2.6.9
> 2.6.11-gentoo-r6

That version of libxml2 is way too old and doesn't support parsing
real-world HTML. IIRC, that support started with 2.6.21 and was improved
a bit in later releases.

Get a 2.8.0 installation, as someone pointed out already.
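
When comparing the installed version against a minimum, compare the numeric parts rather than the raw strings, otherwise "2.6.9" looks newer than "2.6.21". A rough sketch follows; the helper names are made up for this example, and the 2.6.21 threshold is only the from-memory figure given above, not a verified one.

```python
def parse_version(text):
    """Turn a dotted version such as '2.6.21' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_at_least(installed, minimum):
    """True if `installed` is the same as or newer than `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    # 2.6.21 is the recalled threshold from this thread, an assumption.
    for version in ("2.6.9", "2.8.0"):
        print(version, is_at_least(version, "2.6.21"))
```

The tuple comparison treats 9 and 21 as numbers, which is why the reported 2.6.9 falls below the threshold while 2.8.0 passes.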

Stefan


