Mailing List Archive: Python: Python

[no subject]

 

 



jaxnnxot at gmail

Aug 21, 2008, 5:13 PM

Post #1 of 5
[no subject]

Is there a way to make a fake mouse or something like that? I'm
planning to create a frame with a web page open and then run a macro on it
using that fake mouse. I want a fake mouse so I can keep using my computer
while the fake mouse is doing its job.
--
http://mail.python.org/mailman/listinfo/python-list


bedouglas at earthlink

Aug 29, 2008, 8:16 AM

Post #2 of 5
[no subject] [In reply to]

Hi.

I'm using mechanize to parse a page/site that uses a meta http-equiv tag
to perform a refresh/redirect of the page. I've tried a number of
settings, and read different posts on various threads, but I seem to be
missing something.

The test.html page below is what the URL returns. However, I was
expecting test.py to go ahead and perform the redirect/refresh
automatically.

Does the page (test.html) need to be completely valid HTML?

Any thoughts on what's wrong here?


thanks

----------------------------------------------------

test.py
--------
import re
import libxml2dom
import urllib
import urllib2
import sys, string
from mechanize import Browser
import mechanize
#import tidy
import os.path
import cookielib
from libxml2dom import Node
from libxml2dom import NodeList
import subprocess
import time

########################
#
# Parse pricegrabber.com
########################
cj = "p"
COOKIEFILE = 'cookies.lwp'
#cookielib = 1


urlopen = urllib2.urlopen
#cj = urllib2.cookielib.LWPCookieJar()
cj = cookielib.LWPCookieJar()
Request = urllib2.Request
br = Browser()
br2 = Browser()

if cj != None:
    print "sss"
    # install the CookieJar for the default CookieProcessor
    if os.path.isfile(COOKIEFILE):
        cj.load(COOKIEFILE)
        print "foo\n"
    if cookielib:
        opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
        urllib2.install_opener(opener)
        print "foo2\n"

user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
values1 = {'name' : 'Michael Foord',
           'location' : 'Northampton',
           'language' : 'Python' }
headers = { 'User-Agent' : user_agent }

url = "http://schedule.psu.edu/"
#=======================================


if __name__ == "__main__":
    # main app

    txdata = None

    #----------------------------

    ##br.set_cookiejar(cj)
    br.set_handle_redirect(True)
    br.set_handle_referer(True)
    br.set_handle_robots(False)
    br.set_handle_refresh(True)
    br.addheaders = [('User-Agent', 'Firefox')]

    #url=str(url)+str("act_main_search.cfm")+"?"
    #url=url+"Semester=FALL%202008%20%20%20&"
    #url=url+"CrseLoc=OZ%3A%3AAbington%20Campus&"
    #url=url+"CECrseLoc=AllOZ%3A%3AAbington%20Campus&"
    #url=url+"CourseAbbrev=ACCTG&CourseNum=&CrseAlpha=&Search=View+schedule"

    #url="http://schedule.psu.edu/act_main_search.cfm?Semester=FALL%202008%20%20%20%20&CrseLoc=OZ%3A%3AAbington%20Campus&CECrseLoc=AllOZ%3A%3AAbington%20Campus&CourseAbbrev=ACCTG&CourseNum=&CrseAlpha="

    url = ("http://schedule.psu.edu/act_main_search.cfm?"
           "Semester=FALL%202008%20%20%20%20&"
           "CrseLoc=OZ%3A%3AAbington%20Campus&"
           "CECrseLoc=AllOZ%3A%3AAbington%20Campus&"
           "CourseAbbrev=ACCTG&CourseNum=&CrseAlpha=&"
           "CFID=543143&CFTOKEN=71842529")

    print "url =", url
    br.open(url)
    #cj.save(COOKIEFILE) # resave cookies

    res = br.response() # this is a copy of response
    s = res.read()
    print "slen=", len(s)
    print s

    sys.exit()

=========================================
test.html
<html>
<head>
<TITLE></TITLE>
</head>

<BODY BGCOLOR="#FFFFFF">

<TD NOWRAP WIDTH="45" VALIGN="top"><A
HREF="javascript:openAWindow('http://www.registrar.psu.edu/faculty_staff/enroll_services/clsrooms.html#C','Intent',625,425,1)"><FONT FACE="Arial, Helvetica, sans-serif" SIZE="2"><strong>Tech Type</strong></FONT></A></TD>


<META HTTP-EQUIV="Refresh" CONTENT="0;url=/soc/fall/Alloz/a-c/acctg.html#">

---------------------------------------------------------



jjl at pobox

Aug 29, 2008, 12:33 PM

Post #3 of 5
Re: [wwwsearch-general] (no subject) [In reply to]

On Fri, 29 Aug 2008, bruce wrote:
[...]
> does the page (test.html) need to be completely valid html?

No, but there are certainly (poorly-defined) limitations.

I haven't tried to understand your script or the HTML, but did you try
this:

br = mechanize.Browser(mechanize.RobustFactory())
...


John



bedouglas at earthlink

Aug 29, 2008, 1:43 PM

Post #4 of 5
RE: [wwwsearch-general] (no subject) [In reply to]

Hi John.

Thanks for your reply. I tried your suggestion of using RobustFactory, and I
still get badly malformed HTML back. The HTML is listed below. I would
have thought that mechanize would interpret the http-equiv="refresh" tag.
Unfortunately, mechanize apparently isn't able to handle a
<meta http-equiv="refresh" content="0;url=/foo/..."> when it's inside the
<body> of the HTML...

test.html
------------------------------------------------------------------
<html>
<head>
<TITLE></TITLE>
</head>

<BODY BGCOLOR="#FFFFFF">

<TD NOWRAP WIDTH="45" VALIGN="top"><A
HREF="javascript:openAWindow('http://www.registrar.psu.edu/faculty_staff/enroll_services/clsrooms.html#C','Intent',625,425,1)"><FONT FACE="Arial, Helvetica, sans-serif" SIZE="2"><strong>Tech Type</strong></FONT></A></TD>

<META HTTP-EQUIV="Refresh" CONTENT="0;url=/soc/fall/Alloz/a-c/acctg.html#">

---------------------------------------------------------------------------

As you can see, there are no closing </body></html> tags...
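A quick way to confirm that the refresh tag is still recoverable from this page: the sketch below uses Python 3's standard-library html.parser (the scripts in this thread are Python 2), and the class name RefreshFinder is hypothetical, not part of mechanize. It feeds a trimmed copy of test.html to the parser, which tolerates the META tag sitting inside <BODY> as well as the missing closing tags.

```python
from html.parser import HTMLParser


class RefreshFinder(HTMLParser):
    """Collect the CONTENT of every <meta http-equiv="refresh"> tag,
    wherever it appears in the document (head or body)."""

    def __init__(self):
        super().__init__()
        self.refreshes = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling us.
        if tag != "meta":
            return
        attrs = dict(attrs)
        # A valueless attribute comes through as None, hence the "or".
        if (attrs.get("http-equiv") or "").lower() == "refresh":
            self.refreshes.append(attrs.get("content", ""))


# Trimmed copy of test.html: META inside BODY, no closing tags.
PAGE = """<html>
<head>
</head>

<BODY BGCOLOR="#FFFFFF">

<META HTTP-EQUIV="Refresh" CONTENT="0;url=/soc/fall/Alloz/a-c/acctg.html#">
"""

finder = RefreshFinder()
finder.feed(PAGE)
finder.close()
print(finder.refreshes)  # ['0;url=/soc/fall/Alloz/a-c/acctg.html#']
```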

thanks


Stripped-down test code:
----------------------------------------
import sys
from mechanize import Browser
import mechanize
br = Browser()

br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
br.set_handle_refresh(True)
br.addheaders = [('User-Agent', 'Firefox')]

url = ("http://schedule.psu.edu/act_main_search.cfm?"
       "Semester=FALL%202008%20%20%20%20&"
       "CrseLoc=OZ%3A%3AAbington%20Campus&"
       "CECrseLoc=AllOZ%3A%3AAbington%20Campus&"
       "CourseAbbrev=ACCTG&CourseNum=&CrseAlpha=")

br.open(url)
res = br.response() # this is a copy of response
s = res.read()
print "slen=", len(s)
print s

sys.exit()
----------------------------------




jjl at pobox

Aug 31, 2008, 4:05 AM

Post #5 of 5
RE: [wwwsearch-general] (no subject) [In reply to]

On Fri, 29 Aug 2008, bruce wrote:

> Hi John.
>
> Thanks for your reply. I tried your suggestion of using RobustFactory, and I
> still get badly malformed HTML back. The HTML is listed below. I would

That's expected -- this affects the parsing of the HTML. It does not
modify the HTML.


> have thought that mechanize would interpret the http-equiv="refresh" tag.
> Unfortunately, mechanize apparently isn't able to handle a
> <meta http-equiv="refresh" content="0;url=/foo/..."> when it's inside the
> <body> of the HTML...

Yes, only the head element is read (albeit with a slightly fuzzy
definition of "head element").
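To make the distinction concrete, here is a toy model of a scanner that only honours refresh tags seen before the first <body> tag. It is an illustrative assumption written against Python 3's html.parser, not mechanize's actual code (whose notion of the head element is fuzzier), and the names HeadOnlyRefreshFinder and head_refreshes are hypothetical. It finds the tag when it is in the head and misses it when it is in the body, which matches the behaviour reported above.

```python
from html.parser import HTMLParser


class HeadOnlyRefreshFinder(HTMLParser):
    """Toy model: record <meta http-equiv="refresh"> only until the
    first <body> start tag, and ignore everything after it."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.refreshes = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        if self.in_body or tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("http-equiv") or "").lower() == "refresh":
            self.refreshes.append(attrs.get("content", ""))


def head_refreshes(html):
    """Return the refresh CONTENT values found before <body>."""
    finder = HeadOnlyRefreshFinder()
    finder.feed(html)
    finder.close()
    return finder.refreshes


IN_HEAD = ('<html><head><meta http-equiv="refresh" content="0;url=/next">'
           '</head><body></body></html>')
IN_BODY = '<html><head></head><body><meta http-equiv="refresh" content="0;url=/next">'

print(head_refreshes(IN_HEAD))  # ['0;url=/next']
print(head_refreshes(IN_BODY))  # []
```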

In a theoretical future unstable branch, that might change, but currently
mechanize doesn't try all that hard to work well with bad HTML.

Currently, you have to work around this kind of issue. You can perform
the refresh manually, or modify the HTML and call .set_response(), or
replace the HTTPEquivProcessor with your own (you could use
HTTPEquivProcessor itself -- you can pass a parser factory function to its
constructor).
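One way to do the manual refresh: a sketch, assuming Python 3's standard library, that turns a Refresh CONTENT value such as the one in test.html into a delay and an absolute URL, which could then be passed to br.open(). The regular expression and the helper name parse_refresh are illustrative assumptions, not part of mechanize, and they only cover the common "N;url=..." forms.

```python
import re
from urllib.parse import urldefrag, urljoin

# Accepts values like "0;url=/a.html", "5; URL='b.html'" or just "5".
REFRESH_RE = re.compile(
    r"""^\s*(\d+(?:\.\d*)?)\s*(?:[;,]\s*(?:url\s*=\s*)?['"]?([^'"]*)['"]?)?\s*$""",
    re.IGNORECASE,
)


def parse_refresh(content, base_url):
    """Return (delay_seconds, absolute_url) for a Refresh CONTENT value,
    or None if the value is not recognised."""
    m = REFRESH_RE.match(content)
    if m is None:
        return None
    delay = float(m.group(1))
    target = (m.group(2) or "").strip()
    # An empty target means "reload the current page".
    url = urljoin(base_url, target) if target else base_url
    # Fragments are never sent to the server, so drop any "#...".
    return delay, urldefrag(url)[0]


base = "http://schedule.psu.edu/act_main_search.cfm"
delay, target = parse_refresh("0;url=/soc/fall/Alloz/a-c/acctg.html#", base)
print(delay, target)  # 0.0 http://schedule.psu.edu/soc/fall/Alloz/a-c/acctg.html
# With mechanize, following the refresh would then be: br.open(target)
```

Resolving against the URL of the page that contained the tag matters here because the CONTENT value in test.html is a site-relative path.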


John
