
Mailing List Archive: Python: Python

Accessing a Web server --- how?

 

 



vs at it

Nov 16, 2009, 2:14 AM

Post #1 of 4 (193 views)
Accessing a Web server --- how?

If one goes to the following URL:
http://www.nordea.se/Privat/Spara%2boch%2bplacera/Strukturerade%2bprodukter/Aktieobligation%2bNr%2b99%2bEuropa%2bAlfa/973822.html

it contains a link (click on "Current courses NBD AT99 3113A") to:
http://service.nordea.com/nordea-openpages/six.action?target=/nordea.public/bond/nordeabond.page&magic=%28cc+%28detail+%28tsid+310746%29%29%29&

and if you now click on the tab labeled "history and compare" this will
take you to:
http://service.nordea.com/nordea-openpages/six.action?target=/nordea.public/bond/nordeabond.page&magic=%28cc+%28detail+%28tsid+310746%29+%28view+hist%29%29%29&

Finally...This is where I would like to "connect to" the data on a daily
basis or to gather data over different time intervals. I believe that if
I can get some help on this, then I will be able to customize the code
as needed for my own purposes.

It should be clear that this is financial data on a fund managed by
Nordea Bank AB. Nordea is one of the largest banks in Scandinavia.

Note that I do have some experience with Python (2.6 mainly) and find
it a very useful and powerful language. However, I have no experience
with it in the area of Web services. Any suggestions/comments on how to
set up this financial data service project would be greatly appreciated,
and I would be glad to share this project with any interested parties.

Note: I posted a similar message to the pywebsvcs list but received no responses.

-- V. Stokes


--
http://mail.python.org/mailman/listinfo/python-list


clp2 at rebertia

Nov 17, 2009, 1:08 AM

Post #2 of 4 (170 views)
Re: Accessing a Web server --- how? [In reply to]

On Mon, Nov 16, 2009 at 2:14 AM, Virgil Stokes <vs [at] it> wrote:
> If one goes to the following URL:
> http://www.nordea.se/Privat/Spara%2boch%2bplacera/Strukturerade%2bprodukter/Aktieobligation%2bNr%2b99%2bEuropa%2bAlfa/973822.html
>
> it contains a link (click on "Current courses NBD AT99 3113A") to:
> http://service.nordea.com/nordea-openpages/six.action?target=/nordea.public/bond/nordeabond.page&magic=%28cc+%28detail+%28tsid+310746%29%29%29&
>
> and if you now click on the tab labeled "history and compare" this will take
> you to:
> http://service.nordea.com/nordea-openpages/six.action?target=/nordea.public/bond/nordeabond.page&magic=%28cc+%28detail+%28tsid+310746%29+%28view+hist%29%29%29&
>
> Finally...This is where I would like to "connect to" the data on a daily
> basis or to gather data over different time intervals. I believe that if I
> can get some help on this, then I will be able to customize the code as
> needed for my own purposes.

HTML parsing: http://www.crummy.com/software/BeautifulSoup/
Downloading webpages: http://docs.python.org/library/urllib.html#urllib.urlopen

BeautifulSoup is excellently documented:
http://www.crummy.com/software/BeautifulSoup/documentation.html
You'll probably be interested in the sections on searching and
navigating the parse tree.
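BeautifulSoup is a third-party package. If installing it is not an option, the standard library's HTML parser can do the same kind of table walking. The sketch below is Python 3 (in 2.6 the module is called HTMLParser), and it assumes the prices sit in a <table> whose class is "tableb3", as on the page quoted later in this thread. The class and function names are illustrative, not part of any library.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect cell text, row by row, from <table> elements carrying a given class."""

    def __init__(self, table_class):
        super().__init__()  # convert_charrefs=True: "&nbsp;" arrives as "\xa0"
        self.table_class = table_class
        self.in_table = False
        self.in_cell = False
        self.rows = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "table" and self.table_class in classes:
            self.in_table = True
        elif self.in_table and tag == "tr":
            self.rows.append([])
        elif self.in_table and tag in ("th", "td"):
            if not self.rows:  # tolerate a cell that appears before any <tr>
                self.rows.append([])
            self.rows[-1].append("")
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self.in_cell = False
        elif tag == "table":
            self.in_table = False

    def handle_data(self, data):
        # Text outside cells (e.g. the <caption>) is ignored.
        if self.in_cell:
            self.rows[-1][-1] += data


def table_rows(html, table_class="tableb3"):
    """Return the table's rows as lists of stripped cell strings."""
    parser = TableCellParser(table_class)
    parser.feed(html)
    parser.close()
    return [[cell.strip() for cell in row] for row in parser.rows]
```

Unlike an XML parser, html.parser does not require well-formed markup, which makes it forgiving of typical web pages.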

Cheers,
Chris
--
IANAL and do not condone violating website TOSes
http://blog.rebertia.com


davea at ieee

Nov 17, 2009, 7:02 AM

Post #3 of 4 (163 views)
Re: Accessing a Web server --- how? [In reply to]

Virgil Stokes wrote:
> If one goes to the following URL:
> http://www.nordea.se/Privat/Spara%2boch%2bplacera/Strukturerade%2bprodukter/Aktieobligation%2bNr%2b99%2bEuropa%2bAlfa/973822.html
>
>
> it contains a link (click on "Current courses NBD AT99 3113A") to:
> http://service.nordea.com/nordea-openpages/six.action?target=/nordea.public/bond/nordeabond.page&magic=%28cc+%28detail+%28tsid+310746%29%29%29&
>
>
> and if you now click on the tab labeled "history and compare" this
> will take you to:
> http://service.nordea.com/nordea-openpages/six.action?target=/nordea.public/bond/nordeabond.page&magic=%28cc+%28detail+%28tsid+310746%29+%28view+hist%29%29%29&
>
>
> Finally...This is where I would like to "connect to" the data on a
> daily basis or to gather data over different time intervals. I believe
> that if I can get some help on this, then I will be able to customize
> the code as needed for my own purposes.
>
> It should be clear that this is financial data on a fund managed by
> Nordea Bank AB. Nordea is one of the largest banks in Scandinavia.
>
> Note that I do have some experience with Python (2.6 mainly) and
> find it a very useful and powerful language. However, I have no
> experience with it in the area of Web services. Any
> suggestions/comments on how to set up this financial data service
> project would be greatly appreciated, and I would be glad to share
> this project with any interested parties.
>
> Note: I posted a similar message to the pywebsvcs list but received
> no responses.
>
> -- V. Stokes
>
>
I still say you should contact the bank and see if they have any API or
interface defined, so you don't have to do web-scraping.

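The following text is on the XHTML page for that last link (the Swedish
column headers mean: Börskod = exchange code, Köp = buy, Sälj = sell,
Senast = latest, Förfallodag = maturity date, Tid = time):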

<table class="tableb3" summary="Kurser för en obligation.">
<caption class="hide">Nordea Bank Finland Abp utfärdad av Nordea Bank Finland Abp</caption>
<thead>
<tr>

<th class="alignleft" scope="col">Börskod</th>
<th class="alignright" scope="col">Köp</th>
<th class="alignright" scope="col">Sälj</th>
<th class="alignright" scope="col">Senast</th>
<th class="alignright" scope="col">Förfallodag</th>
<th class="alignright" scope="col">Tid</th>
</tr>
</thead>
<tbody>
<tr>
<td class="alignleft">&nbsp;NBF AT99 3113A</td>

<td class="nowrap alignright">&nbsp;95,69</td>
<td class="nowrap alignright">&nbsp;97,69</td>
<td class="nowrap alignright">&nbsp;95,69</td>
<td class="alignright">&nbsp;2011-06-03</td>
<td class="alignright">&nbsp;12:33</td>
</tr>
</tbody>
</table>
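The cells in this table use Swedish number formatting (a decimal comma) and start with &nbsp;, so they need a little cleanup before any arithmetic. A minimal sketch, assuming the cell text has already been extracted as strings; Decimal is used rather than float so prices keep exactly the digits shown. The helper names are hypothetical.

```python
import datetime
from decimal import Decimal


def clean_cell(cell):
    """Drop &nbsp; (literal, or already decoded as U+00A0) and surrounding whitespace."""
    return cell.replace("&nbsp;", "").replace("\xa0", " ").strip()


def parse_price(cell):
    """'&nbsp;95,69' -> Decimal('95.69'); spaces used as thousands separators are removed."""
    return Decimal(clean_cell(cell).replace(" ", "").replace(",", "."))


def parse_day(cell):
    """'&nbsp;2011-06-03' -> datetime.date(2011, 6, 3)."""
    return datetime.datetime.strptime(clean_cell(cell), "%Y-%m-%d").date()
```

For example, parse_price("&nbsp;97,69") - parse_price("&nbsp;95,69") gives Decimal('2.00'), the spread between the sell and buy prices above.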



I didn't try it, but you could presumably use urllib2 to download that
URL (probably to a file, so you can repeat the test often without loading
the server). One caution: the site did ask to store a cookie, and I know
nothing about cookie handling in Python.
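In Python 2.6 the relevant modules are urllib2 and cookielib; in Python 3 they became urllib.request and http.cookiejar. A minimal Python 3 sketch of "download to a file, with cookies" follows. It keeps one opener, and therefore one cookie jar, for the whole session, so a cookie set by an earlier page is sent back on later requests. Whether the Nordea pages actually require that cookie is untested, and the function names are hypothetical.

```python
import http.cookiejar
import urllib.request
from pathlib import Path


def make_opener():
    """Build an opener whose CookieJar remembers cookies between requests."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def download_to_file(opener, url, path, timeout=30):
    """Fetch url and save the raw bytes to path, so parsing can be re-run offline."""
    with opener.open(url, timeout=timeout) as response:
        data = response.read()
    Path(path).write_bytes(data)
    return len(data)
```

Call make_opener() once, then pass the same opener to every download_to_file() call; saving the raw bytes leaves the character-set question for the parsing step.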

Several cautions: I don't know how target= and magic= were derived, or
whether they'll remain stable for more than a day or so. So you can
download this file and figure out how to parse it, but you'll probably need
to also parse the earlier pages, and that could be easier or harder.
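One way to see how target= and magic= are built is simply to decode them. The sketch below is Python 3 (in 2.6 the same functions live in the urlparse module): it splits the query string, undoes the %28/%29/+ escaping, and pulls out the tsid number. The regular expression assumes the decoded magic value keeps the "(tsid NNN)" shape seen in these URLs; the function name is hypothetical.

```python
import re
from urllib.parse import parse_qs, urlsplit


def decode_query(url):
    """Return ({name: value}, tsid) for a six.action URL; tsid is None if not found."""
    # parse_qs turns '+' into a space and %28/%29 into parentheses;
    # the empty field left by a trailing '&' is ignored.
    params = {name: values[0] for name, values in parse_qs(urlsplit(url).query).items()}
    match = re.search(r"\(tsid (\d+)\)", params.get("magic", ""))
    return params, (int(match.group(1)) if match else None)
```

For the "history and compare" link, the decoded magic value is (cc (detail (tsid 310746) (view hist))) and the tsid is 310746.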

This page format is very straightforward. If you know you're looking
for NBF AT99, you could look for that particular line, then just parse
all the <td> cells until the next </tr>. No XML logic needed. If you don't
know the NBF string, you could look for Börskod instead.
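A sketch of that "no XML logic" approach, using only string searching and two regular expressions (Python 3). It assumes the markup looks like the fragment above (<td> cells closed by </td>, rows wrapped in <tr>...</tr>), but it does not depend on line breaks. Given either the bond code or the header text Börskod, it returns the cells of the first row at or after the marker that contains <td> cells. The function name is hypothetical.

```python
import re

ROW_RE = re.compile(r"<tr\b[^>]*>(.*?)</tr>", re.S)
CELL_RE = re.compile(r"<td\b[^>]*>(.*?)</td>", re.S)


def row_after(html, marker):
    """Return the <td> texts of the first data row at or after marker, or None."""
    pos = html.find(marker)
    if pos == -1:
        return None
    start = html.rfind("<tr", 0, pos)  # back up to the row containing the marker
    if start == -1:
        start = pos
    for row in ROW_RE.findall(html, start):
        cells = CELL_RE.findall(row)
        if cells:  # a header row has only <th> cells, so it is skipped
            return [cell.replace("&nbsp;", " ").strip() for cell in cells]
    return None
```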

But the big risk you run is that the bank could easily change this format
quite drastically, at any time. Those td elements don't have to be on
separate lines; the browser doesn't care. And the class attribute could
change if the CSS also changes correspondingly. Or they could come up with
an entirely different way to display the data. All they care about is
whether it's readable by the human looking at the browser page.

Using xml.etree.ElementTree would be a good start: you could build the
tree, look for the table of class 'tableb3', and go in from there.
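A sketch of the ElementTree route (xml.etree.ElementTree has been in the standard library since Python 2.5; the code below is Python 3). ElementTree is an XML parser, so this only works if the real page is well-formed XHTML, which is worth checking. XML also knows only five named entities, so &nbsp; is rewritten to its numeric form &#160; first; any other HTML entities on the page would need the same treatment. Tag names are compared without their namespace, so the code works whether or not the page declares the XHTML namespace. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """'{http://www.w3.org/1999/xhtml}td' -> 'td'; a plain 'td' is returned unchanged."""
    return tag.rsplit("}", 1)[-1]


def rows_of_table(xhtml, table_class="tableb3"):
    """Return the rows of the first <table> with the given class, or None if absent."""
    root = ET.fromstring(xhtml.replace("&nbsp;", "&#160;"))
    for table in root.iter():
        if local_name(table.tag) != "table":
            continue
        if table_class not in table.get("class", "").split():
            continue
        rows = []
        for tr in table.iter():
            if local_name(tr.tag) == "tr":
                # itertext() gathers text from nested markup; strip() removes the U+00A0.
                rows.append(["".join(cell.itertext()).strip()
                             for cell in tr if local_name(cell.tag) in ("th", "td")])
        return rows
    return None
```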

But you still run the risk of them changing things. The class name, for
example, is just a link to the CSS page which describes how that class
of element should be displayed. If the name is changed at both ends,
nothing changes on screen, but your script breaks.
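Because the layout can change without notice, it helps to have the script fail loudly instead of silently storing wrong numbers. A minimal sketch, assuming the rows have already been extracted as lists of strings and that the header row is the one shown above; the exception and function names are hypothetical.

```python
EXPECTED_HEADERS = ["Börskod", "Köp", "Sälj", "Senast", "Förfallodag", "Tid"]


class PageFormatChanged(Exception):
    """The scraped table no longer matches what the parser assumes."""


def checked_data_rows(rows, expected=EXPECTED_HEADERS):
    """Return the data rows, or raise PageFormatChanged if the table looks different."""
    if not rows:
        raise PageFormatChanged("price table not found")
    header, data = rows[0], rows[1:]
    if header != expected:
        raise PageFormatChanged("unexpected headers: %r" % (header,))
    if not data or any(len(row) != len(expected) for row in data):
        raise PageFormatChanged("no data rows, or rows of the wrong width")
    return data
```

Checking the Swedish header text catches a renamed or reordered column, even when the class names still match.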


At this point, you need to experiment. But build a sloppy skeleton
first, so you don't invest too much time in any one aspect of the
problem. Make sure you can cover the corner cases, then fill in the
tough parts.

I'd say roughly this order:
1. Write code that downloads the page to a file, given an exact URL.
For now, keep that code separate, as it'll probably end up
being much more complex, walking through other pages.

2. Parse that page, using a simple for loop that looks for some of the
key strings mentioned above.

3. Repeat that for a few different URLs, presumably one per bond fund.

4. Make sure the URLs don't go stale over a few days. If they do,
you'll have to back up to an earlier link (URL), and parse forward from
there.
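Steps 1 to 4 might be sketched like this (Python 3, standard library only). Each step is a separate function so it can be replaced on its own. The parser in step 2 is deliberately sloppy: it assumes one <td> per line, exactly as in the fragment above, and reads the page as UTF-8; both assumptions need checking against the real page. Treating "marker not found" as a stale URL is also an assumption. All names are hypothetical.

```python
from pathlib import Path
from urllib.request import urlopen


def download(url, path, timeout=30):
    """Step 1: save one page, given an exact URL, to a local file."""
    with urlopen(url, timeout=timeout) as response:
        Path(path).write_bytes(response.read())


def parse(path, marker="Börskod", encoding="utf-8"):
    """Step 2: return the <td> texts of the first row after marker, or None."""
    cells, seen = [], False
    for line in Path(path).read_text(encoding=encoding).splitlines():
        line = line.strip()
        if marker in line:
            seen = True
        elif seen and line.startswith("<td") and line.endswith("</td>"):
            text = line[line.index(">") + 1:-len("</td>")]
            cells.append(text.replace("&nbsp;", "").strip())
        elif seen and line == "</tr>" and cells:
            return cells
    return None


def collect(urls, workdir):
    """Steps 3 and 4: fetch every URL; report names whose page lacks the table."""
    results, stale = {}, []
    for name, url in urls.items():
        path = Path(workdir) / (name + ".html")
        download(url, path)
        cells = parse(path)
        if cells is None:
            stale.append(name)  # the URL may have gone stale
        else:
            results[name] = cells
    return results, stale
```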


Keep the various pieces in different modules, so that when an assumption
breaks, you can recode that assumption pretty much independent of the
others.


HTH
DaveA



nick at stinemates

Nov 17, 2009, 10:51 AM

Post #4 of 4 (149 views)
Re: Accessing a Web server --- how? [In reply to]

This is what the History and Compare URL translates to:

http://service.nordea.com/nordea-openpages/six.action?target=/nordea.public/bond/nordeabond.page&magic=(cc+(detail+(tsid+310746)+(view+hist)))&

Some questions:
Do you have an idea what the tsid is? It looks like it's a unique
identifier for the chart and the only thing you'll need to change.

Are you only interested in this chart or others?

How would you like to extract the data? Is saving the chart enough, or would you like to extract values from the chart?
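If the tsid really is the only part that varies, as suggested above, the URLs can be generated rather than clicked through. A Python 3 sketch, assuming the target value and the "(cc (detail (tsid N) ...))" pattern stay as they appear in this thread, and that the server does not need the trailing "&" of the original links (untested). urlencode with safe="/" reproduces the original escaping, including %28/%29 for the parentheses. The function name is hypothetical.

```python
from urllib.parse import urlencode

BASE = "http://service.nordea.com/nordea-openpages/six.action"
TARGET = "/nordea.public/bond/nordeabond.page"


def bond_url(tsid, view="hist"):
    """Build the detail URL for tsid; view="hist" gives the "history and compare" tab."""
    detail = "(tsid %d)" % tsid
    if view:
        detail += " (view %s)" % view
    magic = "(cc (detail %s))" % detail
    # quote_plus (urlencode's default) turns spaces into '+' and ( ) into %28 %29.
    return BASE + "?" + urlencode([("target", TARGET), ("magic", magic)], safe="/")
```

bond_url(310746) reproduces the "history and compare" link, and bond_url(310746, view=None) the plain detail link, each without the trailing "&".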



