Mailing List Archive: Python: Python

HTTP request error with urlopen

 

 


spandanagella at gmail

Jul 2, 2008, 11:52 PM

Post #1 of 3 (56 views)
HTTP request error with urlopen

Hello,

I have written some code to fetch the page source of a Google search
results page. It works for other URLs, but I have a problem with this one:

import re
from urllib2 import urlopen
string='http://www.google.com/search?num=20&hl=en&q=ipod&btnG=Search'
file_source=file("google_source.txt",'w')
file_source.write(urlopen(string).read())
page_content=file_source.readlines()

Traceback (most recent call last):
  File "C:/Python25/google.py", line 5, in <module>
    file_source.write(urlopen(string).read())
  File "C:\Python25\lib\urllib2.py", line 124, in urlopen
    return _opener.open(url, data)
  File "C:\Python25\lib\urllib2.py", line 387, in open
    response = meth(req, response)
  File "C:\Python25\lib\urllib2.py", line 498, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python25\lib\urllib2.py", line 425, in error
    return self._call_chain(*args)
  File "C:\Python25\lib\urllib2.py", line 360, in _call_chain
    result = func(*args)
  File "C:\Python25\lib\urllib2.py", line 506, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 403: Forbidden

urlopen works for the Google Labs Sets page but not for google.com, and
I have the same problem with Wikipedia. Please let me know if anyone has
any idea about this.

Thank You,
Spandana.


jonas at codeazur

Jul 3, 2008, 8:34 PM

Post #3 of 3 (41 views)
Re: HTTP request error with urlopen [In reply to]

Try:

import re
import urllib2

url = 'http://www.google.com/search?num=20&hl=en&q=ipod&btnG=Search'

# Send a browser-like User-Agent instead of urllib2's default one.
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
headers = {'User-Agent': user_agent}
req = urllib2.Request(url, None, headers)

# Write the response body to disk, then close the file.
file_source = open("google_source.txt", 'w')
file_source.write(urllib2.urlopen(req).read())
file_source.close()

I think Google blocks the default User-Agent that urllib2 sends
(Python-urllib/x.y), which is why the server answers with 403 Forbidden.
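
For anyone reading this on Python 3, where urllib2 was split into
urllib.request and urllib.error, the same idea might look roughly like the
sketch below. It is only an illustration under a few assumptions: the helper
names build_request, fetch and save_page are made up for this example, and
returning an empty body on an HTTP error is just one possible way to handle
the 403. It also splits the downloaded bytes directly rather than calling
readlines() on a file opened with 'w', which the original script does and
which would presumably fail even once the 403 is gone.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Same browser-like string as in the snippet above; any similar value could be used.
BROWSER_UA = "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request that carries an explicit User-Agent header."""
    return Request(url, data=None, headers={"User-Agent": user_agent})


def fetch(url, opener=urlopen):
    """Return (status, body). An HTTP error status yields (code, b"")."""
    try:
        with opener(build_request(url)) as response:
            return response.status, response.read()
    except HTTPError as err:
        # 403 Forbidden and other error statuses are raised as HTTPError.
        return err.code, b""


def save_page(url, path="google_source.txt"):
    """Write the page to path and return its lines, or [] if the fetch failed."""
    status, body = fetch(url)
    if status != 200:
        return []
    with open(path, "wb") as file_source:
        file_source.write(body)
    # Split the bytes already in memory instead of re-reading a write-only file.
    return body.splitlines()
```

Calling save_page(url) with the search URL from the snippet above would then
write google_source.txt and hand back the page as a list of lines. The opener
argument of fetch exists only so the function can be tried without touching
the network.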

--Jonas Galvez, http://jonasgalvez.com.br/log

--
http://mail.python.org/mailman/listinfo/python-list
