TPCServer and xdrlib


gandalf at shopzeus

May 16, 2008, 3:16 AM

Post #1 of 12
TPCServer and xdrlib

Hi All,

I'm trying to write a multithreaded TCP server. I have used xmlrpc
before for many purposes, but in this case this would not be efficient:

- I have to send larger amounts of data, the overhead of converting to
XML and parsing XML back would be too much pain
- I have no clue how to do keep-alive with SimpleXMLRPCServer, and it is
slow to open a new connection for each RPC
- I would like to do session management (authentication, then store
session info on the server side), which is also hard with xmlrpc.
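The session-management requirement in the last point could look roughly like this on the server side. This is a minimal sketch, not a recommendation: the `SessionStore` class, the `(salt, hash)` user table, the PBKDF2 iteration count and the one-hour expiry are all illustrative assumptions. Only the standard library is used (`hashlib`, `hmac`, `secrets`), and a lock guards the table because the server is meant to be multithreaded.

```python
import hashlib
import hmac
import secrets
import threading
import time


class SessionStore:
    """Hypothetical server-side session table; clients only hold an opaque token."""

    def __init__(self, users, ttl=3600.0):
        # users maps username -> (salt, PBKDF2 hash); this storage format is an assumption
        self._users = users
        self._ttl = ttl
        self._sessions = {}  # token -> (username, expiry time on the monotonic clock)
        self._lock = threading.Lock()

    @staticmethod
    def hash_password(password, salt):
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

    def login(self, username, password):
        """Return a new session token, or None if the credentials are wrong."""
        record = self._users.get(username)
        if record is None:
            return None
        salt, expected = record
        # compare_digest is designed to resist timing attacks on the comparison
        if not hmac.compare_digest(self.hash_password(password, salt), expected):
            return None
        token = secrets.token_urlsafe(32)
        with self._lock:
            self._sessions[token] = (username, time.monotonic() + self._ttl)
        return token

    def user_for(self, token):
        """Return the username bound to a live token, or None."""
        with self._lock:
            entry = self._sessions.get(token)
            if entry is None:
                return None
            if entry[1] < time.monotonic():
                del self._sessions[token]  # expired session
                return None
            return entry[0]
```

Every request after login would then carry the token, and the server looks up the user with `user_for()` instead of trusting anything else the client says about itself.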

I have looked at various solutions including:

- PyOrbit - too heavy weight
- Pyro - uses pickle, I do not trust it

BTW I do not care about the clients - they must trust the server side.
In contrast, the server should not receive anything from the clients
that is dangerous. I would like to use something that is fast, and can
only transfer data, not code. For this reason I think I cannot use the
marshal module because it is able to marshal code objects. I think I'm
going to implement my own "pickler" over xdrlib, that will only
pack/unpack data, NOT code. (It would also have the advantage that
others could write clients in different languages.)
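Such a data-only "pickler" can be sketched in a few dozen lines. Note that `xdrlib` was deprecated in Python 3.11 and removed in 3.13 (PEP 594), so this sketch uses `struct` directly while following XDR's rules: big-endian integers, and variable-length data prefixed by a 4-byte length and zero-padded to a multiple of 4 bytes. The type tags, the nesting limit and the `dumps`/`loads` names are hypothetical additions (plain XDR is not self-describing). Anything other than None, bool, int, float, str, bytes, list/tuple and dict is refused, so the decoder can never build a code object.

```python
import struct

# Hypothetical type tags (plain XDR has none; the schema is normally agreed in advance).
T_NONE, T_FALSE, T_TRUE, T_INT, T_FLOAT, T_STR, T_BYTES, T_LIST, T_DICT = range(9)
MAX_DEPTH = 32  # assumption: cap nesting so hostile input cannot exhaust the stack


def _pad(n):
    """Zero bytes that round a length of n up to a multiple of 4, as XDR requires."""
    return b"\0" * ((4 - n % 4) % 4)


def dumps(obj):
    out = []

    def enc(o):
        if o is None:
            out.append(struct.pack(">I", T_NONE))
        elif o is True or o is False:  # test bool before int: bool subclasses int
            out.append(struct.pack(">I", T_TRUE if o else T_FALSE))
        elif isinstance(o, int):
            out.append(struct.pack(">Iq", T_INT, o))  # signed 64-bit ("hyper")
        elif isinstance(o, float):
            out.append(struct.pack(">Id", T_FLOAT, o))  # IEEE double
        elif isinstance(o, (str, bytes)):
            tag = T_STR if isinstance(o, str) else T_BYTES
            data = o.encode("utf-8") if tag == T_STR else o
            out.append(struct.pack(">II", tag, len(data)) + data + _pad(len(data)))
        elif isinstance(o, (list, tuple)):
            out.append(struct.pack(">II", T_LIST, len(o)))
            for item in o:
                enc(item)
        elif isinstance(o, dict):
            out.append(struct.pack(">II", T_DICT, len(o)))
            for key, value in o.items():
                enc(key)
                enc(value)
        else:
            raise TypeError(f"refusing to serialize {type(o).__name__}")

    enc(obj)
    return b"".join(out)


def loads(buf):
    pos = 0

    def take(fmt):
        # struct.error is raised if buf is too short
        nonlocal pos
        values = struct.unpack_from(fmt, buf, pos)
        pos += struct.calcsize(fmt)
        return values

    def dec(depth):
        nonlocal pos
        if depth > MAX_DEPTH:
            raise ValueError("nesting too deep")
        (tag,) = take(">I")
        if tag == T_NONE:
            return None
        if tag in (T_FALSE, T_TRUE):
            return tag == T_TRUE
        if tag == T_INT:
            return take(">q")[0]
        if tag == T_FLOAT:
            return take(">d")[0]
        if tag in (T_STR, T_BYTES):
            (n,) = take(">I")
            if pos + n > len(buf):
                raise ValueError("truncated string")
            data = bytes(buf[pos:pos + n])
            pos += n + len(_pad(n))
            return data.decode("utf-8") if tag == T_STR else data
        if tag == T_LIST:
            (n,) = take(">I")
            return [dec(depth + 1) for _ in range(n)]
        if tag == T_DICT:
            (n,) = take(">I")
            result = {}
            for _ in range(n):
                key = dec(depth + 1)
                result[key] = dec(depth + 1)
            return result
        raise ValueError(f"unknown type tag {tag}")

    value = dec(0)
    if pos != len(buf):
        raise ValueError("trailing or missing padding bytes")
    return value
```

Tuples come back as lists, and binary data such as images travels as raw bytes with at most 3 bytes of padding.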

Before I start re-inventing the wheel:

- Is there another (already existing) higher level framework that I can
try? It should be safe and fast, that is the only restriction.
- Do you think that it is a good idea to use xdrlib? I haven't seen
projects using it directly. For me it is like the rotor module was - it
was there but almost nobody used it. There might be a better lower level
library which I don't know of.

Thank you,

Laszlo

--
http://mail.python.org/mailman/listinfo/python-list


hdante at gmail

May 16, 2008, 5:20 AM

Post #2 of 12
Re: TPCServer and xdrlib

On May 16, 7:16 am, Laszlo Nagy <gand...@shopzeus.com> wrote:
>   Hi All,

Hello, :-)

>
> I'm trying to write a multithreaded TCP server. I have used xmlrpc

How exactly did you come to the conclusion that your server must be
multi threaded ?


> - I have to send larger amounts of data, the overhead of converting to
> XML and parsing XML back would be too much pain

- What's the expected amount of data you have to transfer ?
- What's the expected network bandwidth ?
- What's the expected acceptable transfer time ?
- How many users are expected to be transferring data at the same
time ?

> I have looked at various solutions including:
>
> - PyOrbit - too heavy weight
> - Pyro - uses pickle, I do not trust it

Did you consider gzipping your XML (or YAML) packets ? Would the
transfer time be acceptable in this case ?
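To make the gzip question concrete, the following sketch compresses an XML-RPC response with the standard library and checks that it still round-trips. The record list is made-up sample data; real gains depend on the payload, and data that is already compressed (JPEG images, for instance) will shrink little.

```python
import gzip
import xmlrpc.client

# Made-up sample data: 1000 small records, typical of an RPC result
records = [{"id": i, "name": f"item-{i}", "price": i * 1.5} for i in range(1000)]

xml = xmlrpc.client.dumps((records,), methodresponse=True).encode("utf-8")
packed = gzip.compress(xml)
print(f"XML: {len(xml)} bytes, gzipped: {len(packed)} bytes")

# The receiver decompresses first, then parses as usual
params, _ = xmlrpc.client.loads(gzip.decompress(packed))
```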

>
> BTW I do not care about the clients - they must trust the server side.

Oh, he said he _doesn't care about the clients_ ! ;-)

> In contrast, the server should not receive anything from the clients
> that is dangerous. I would like to use something that is fast, and can
> only transfer data, not code. For this reason I think I cannot use the
> marshal module because it is able to marshal code objects. I think I'm
> going to implement my own "pickler" over xdrlib, that will only
> pack/unpack data, NOT code. (It would also have the advantage that
> others could write clients in different languages.)

In general I would avoid that. Try to better estimate the speed
requirements to see whether you really need to do this.

>
> Before I start re-inventing the wheel:
>
> - Is there another (already existing) higher level framework that I can
> try? It should be safe and fast, that is the only restriction.

There's "Twisted".
http://twistedmatrix.com/projects/core/documentation/howto/servers.html

> - Do you think that it is a good idea to use xdrlib? I haven't seen
> projects using it directly. For me it is like the rotor module was - it

It's probably the best way to send binary stuff over the network.
But, again, I would avoid doing so.

> was there but almost nobody used it. There might be a better lower level
> library which I don't know of.
>
> Thank you,
>
>    Laszlo



deets at nospam

May 16, 2008, 5:26 AM

Post #3 of 12
Re: TPCServer and xdrlib

>
> Did you consider gzipping your XML (or YAML) packets ? Would the
> transfer time be acceptable in this case ?

That would add even more to the overhead of transcoding the
transport layer. Switching from XML-RPC to a JSON-based protocol reduced
the overhead 10- to 20-fold in a project of mine, mainly because of
the reduced size and parsing effort.
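The size half of that claim is easy to check for a given message. This sketch encodes the same (hypothetical) structure with `xmlrpc.client.dumps` and `json.dumps` and compares byte counts; it does not measure parsing time and does not reproduce the numbers from Diez's project.

```python
import json
import xmlrpc.client

# Hypothetical message; only its shape matters for the comparison
message = {"user": "laszlo", "ids": list(range(500)), "ok": True}

as_xmlrpc = xmlrpc.client.dumps((message,)).encode("utf-8")
as_json = json.dumps(message, separators=(",", ":")).encode("utf-8")
print(f"XML-RPC: {len(as_xmlrpc)} bytes, JSON: {len(as_json)} bytes")
```

Most of the difference comes from XML-RPC wrapping every scalar in tags such as `<value><int>...</int></value>`.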

Diez


nick at craig-wood

May 16, 2008, 6:30 AM

Post #4 of 12
Re: TPCServer and xdrlib

Laszlo Nagy <gandalf[at]shopzeus.com> wrote:
> I'm trying to write a multithreaded TCP server. I have used xmlrpc
> before for many purposes, but in this case this would not be efficient:
>
> - I have to send larger amounts of data, the overhead of converting to
> XML and parsing XML back would be too much pain
> - I have no clue how to do keep-alive with SimpleXMLRPCServer, and it is
> slow to open a new connection for each RPC
> - I would like to do session management (authentication, then store
> session info on the server side), which is also hard with xmlrpc.
>
> I have looked at various solutions including:
>
> - PyOrbit - too heavy weight
> - Pyro - uses pickle, I do not trust it

It is possible to change the serialization used by Pyro

http://pyro.sourceforge.net/manual/9-security.html#pickle

to the 'gnosis' XML Pickler.

--
Nick Craig-Wood <nick[at]craig-wood.com> -- http://www.craig-wood.com/nick


hdante at gmail

May 16, 2008, 7:25 AM

Post #5 of 12
Re: TPCServer and xdrlib

On May 16, 9:26 am, "Diez B. Roggisch" <de...@nospam.web.de> wrote:
> >  Did you consider gzipping your XML (or YAML) packets ? Would the
> > transfer time be acceptable in this case ?
>
> That would add even more to the overhead of transcoding the
> transport layer. Switching from XML-RPC to a JSON-based protocol reduced

Yes, that's why I suggested YAML.

> the overhead 10- to 20-fold in a project of mine, mainly because of
> the reduced size and parsing effort.

I don't think so. It's probably just the reduced size (check whether the
JSON file is around 10 times smaller).

I believe the server will be mostly I/O-bound, i.e., most overhead will
be in the data link/physical layers. The compression/parsing time (a
few microseconds) should be a small fraction of the total transfer
time (a few milliseconds). Even if the service is not I/O-bound
(considering the YouTube example), if there's significant traffic on
the server, the database access time should be the most significant.

I have used compression for SOAP messages over a GPRS (~20 kbps) link
and got similar performance improvements (the web server was set to
automatically compress the data).
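The I/O-bound argument comes down to simple arithmetic: transfer time is at least payload size divided by bandwidth, so on a slow link every byte saved counts, while on a fast link the transfer may be short enough that encoding and parsing time matter more. A back-of-envelope helper, with made-up sizes (a 100 kB message compressing to 10 kB) and ignoring latency and protocol overhead:

```python
def transfer_seconds(size_bytes, bits_per_second):
    """Lower bound on transfer time: payload bits / link rate (ignores latency, headers)."""
    return size_bytes * 8 / bits_per_second


raw = transfer_seconds(100_000, 20_000)         # 40.0 s over ~20 kbit/s GPRS
compressed = transfer_seconds(10_000, 20_000)   # 4.0 s if compression gives 10x
lan = transfer_seconds(100_000, 100_000_000)    # 0.008 s (8 ms) on 100 Mbit/s Ethernet
```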

>
> Diez



deets at nospam

May 16, 2008, 8:28 AM

Post #6 of 12
Re: TPCServer and xdrlib

Henrique Dante de Almeida wrote:
> On May 16, 9:26 am, "Diez B. Roggisch" <de...@nospam.web.de> wrote:
>>> Did you consider gzipping your XML (or YAML) packets ? Would the
>>> transfer time be acceptable in this case ?
>> That would add even more to the overhead of transcoding the
>> transport layer. Switching from XML-RPC to a JSON-based protocol reduced
>
> Yes, that's why I suggested YAML.
>
>> the overhead 10- to 20-fold in a project of mine, mainly because of
>> the reduced size and parsing effort.
>
> I don't think so. It's probably just the reduced size (check whether the
> JSON file is around 10 times smaller).
>
> I believe the server will be mostly I/O-bound, i.e., most overhead will
> be in the data link/physical layers. The compression/parsing time (a
> few microseconds) should be a small fraction of the total transfer
> time (a few milliseconds). Even if the service is not I/O-bound
> (considering the YouTube example), if there's significant traffic on
> the server, the database access time should be the most significant.
>
> I have used compression for SOAP messages over a GPRS (~20 kbps) link
> and got similar performance improvements (the web server was set to
> automatically compress the data).

I'm sorry, yes - I forgot that the main problem was the pure message
size due to some quadratic behaviour, which made things CPU-bound.

Still, XML parsing is much more expensive, and packing/unpacking will of
course add to that.

Diez


gandalf at shopzeus

May 19, 2008, 2:06 AM

Post #7 of 12
Re: TPCServer and xdrlib

> It is possible to change the serialization used by Pyro
>
> http://pyro.sourceforge.net/manual/9-security.html#pickle
>
> to the 'gnosis' XML Pickler.
>
As I said earlier, I would not use XML. Just one example: I need to be
able to transfer image files and Word and Excel documents. How silly it
would be to base64-encode a binary file and then put it into an XML document.
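The cost of that detour is easy to quantify: base64 turns every 3 input bytes into 4 output characters, a fixed one-third size increase before any XML markup is added. A quick check with the standard library, using random bytes as a stand-in for a document:

```python
import base64
import os

payload = os.urandom(30_000)  # stand-in for an image or Excel file
encoded = base64.b64encode(payload)
print(len(payload), len(encoded))  # 30000 40000: 4 output bytes per 3 input bytes
```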

L



gandalf at shopzeus

May 19, 2008, 6:28 AM

Post #8 of 12
Re: TPCServer and xdrlib

>> I'm trying to write a multithreaded TCP server. I have used xmlrpc
>>
>
> How exactly did you come to the conclusion that your server must be
> multi threaded ?
>
I don't think that it is important. But if you are interested:

- yes, the server will probably be I/O-bound, not CPU-bound
- I have experience with thread programming, but not with Twisted

>> - I have to send larger amounts of data, the overhead of converting to
>> XML and parsing XML back would be too much pain
>>
>
> - What's the expected amount of data you have to transfer ?
>
I cannot predict it. But I will be transferring image files, which would be
silly to do with XML.
> - What's the expected network bandwidth ?
>
It cannot be determined in advance.
> - What's the expected acceptable transfer time ?
>
Not known.
> - How many users are expected to be transferring data at the same time ?
>
The server should be scalable up to hundreds of users. (I'm just trying
to answer your questions, if that helps to answer mine.)
> Did you consider gzipping your XML (or YAML) packets ? Would the
> transfer time be acceptable in this case ?
>
No. "Image binary data -> base64encode -> XML -> gzip" - looks very
silly. It cannot be efficient. Do you have better ideas?
>> BTW I do not care about the clients - they must trust the server side.
>>
>
> Oh, he said he _doesn't care about the clients_ ! ;-)
>
I meant *safety* here: clients are going to download program updates
from the server. So if they do not trust the server then they should not
use it. The server is different: it must be safe against external
attacks. Maybe it was my bad English? Sorry for the misunderstanding.
> In general I would avoid that. Try to better estimate the speed
> requirements to see whether you really need to do this.
>
I cannot predict "acceptable speed" requirements, but I can tell that
there will be some clients downloading 100MB report files from the
server, so I presume that I will need a progress bar. I think that I
need to develop my own protocol for this, and probably the underlying
layer should use binary representation.
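A progress bar only needs the receiver to know the total size up front. One simple way, sketched here under the assumption of an 8-byte big-endian length header and 64 KiB chunks (both arbitrary choices; `send_file` and `recv_file` are hypothetical names), is to send the length first and then stream the raw bytes, with no base64 step. With a real socket, `sock.makefile("rb")` and `sock.makefile("wb")` provide the file-like objects (flush the writer when done).

```python
import struct

CHUNK = 64 * 1024  # assumption: 64 KiB per read/write


def send_file(src, dst, size):
    """Write an 8-byte big-endian length header, then copy `size` bytes from src."""
    dst.write(struct.pack(">Q", size))
    remaining = size
    while remaining:
        chunk = src.read(min(CHUNK, remaining))
        if not chunk:
            raise EOFError("source ended before `size` bytes were sent")
        dst.write(chunk)
        remaining -= len(chunk)


def recv_file(src, dst, progress=None):
    """Read the header, copy exactly that many bytes to dst, report progress per chunk."""
    header = src.read(8)
    if len(header) != 8:
        raise EOFError("missing length header")
    (size,) = struct.unpack(">Q", header)
    received = 0
    while received < size:
        chunk = src.read(min(CHUNK, size - received))
        if not chunk:
            raise EOFError("connection closed mid-transfer")
        dst.write(chunk)
        received += len(chunk)
        if progress is not None:
            progress(received, size)  # e.g. update a progress bar
    return size
```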
>> Before I start re-inventing the wheel:
>>
>> - Is there another (already existing) higher level framework that I can
>> try? It should be safe and fast, that is the only restriction.
>>
>
> There's "Twisted".
> http://twistedmatrix.com/projects/core/documentation/howto/servers.html
>
Yes, I tried Twisted before and I did not like it. It forced me to do
things that I did not want to do. (I cannot remember what they were; it
was two years ago.)
>> - Do you think that it is a good idea to use xdrlib? I haven't seen
>> projects using it directly. For me it is like the rotor module was - it
>>
>
> It's probably the best way to send binary stuff over the network.
> But, again, I would avoid doing so.
>
It is NOT the best way. Just to give one example: big-endian / little-endian
integers. Definitely I need some encoding.

(But if you are right and this is the best way, why would you avoid?)
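On the byte-order point: XDR defines all integers as big-endian (RFC 4506), which is what `struct`'s `>` prefix produces, so both sides decode the same value regardless of their native CPU order. A small illustration:

```python
import struct
import sys

value = 0x01020304
big = struct.pack(">I", value)     # XDR / network order: b'\x01\x02\x03\x04'
little = struct.pack("<I", value)  # b'\x04\x03\x02\x01'
native = struct.pack("=I", value)  # whichever order this CPU uses

# Decoding with the same explicit prefix always recovers the value
assert struct.unpack(">I", big)[0] == value
print(sys.byteorder, native == big)
```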

L




hdante at gmail

May 19, 2008, 8:21 AM

Post #9 of 12
Re: TPCServer and xdrlib

On May 19, 10:28 am, Laszlo Nagy <gand...@shopzeus.com> wrote:
>
> I don't think that it is important. But if you are interested:
>
> - yes, the server will probably be I/O-bound, not CPU-bound
> - I have experience with thread programming, but not with Twisted

That part was just to show you that being multithreaded is not really
a requirement. The server could be single-threaded, for example.
Surely, if you are comfortable with writing a threaded server, there's
no problem with that.
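For reference, a threaded server that keeps one connection open for many requests (the keep-alive behaviour asked about earlier) can be built from Python 3's `socketserver` (the module was named `SocketServer` in 2008). This is a sketch under assumptions: the 4-byte length-prefix framing, the 16 MiB frame limit and the class names are illustrative, and upper-casing the body stands in for real business logic.

```python
import socketserver
import struct

MAX_MESSAGE = 16 * 1024 * 1024  # assumption: reject frames larger than 16 MiB


class MessageHandler(socketserver.StreamRequestHandler):
    """Handles many length-prefixed requests over one TCP connection (keep-alive)."""

    def read_exact(self, n):
        data = self.rfile.read(n)
        if len(data) != n:
            raise EOFError("peer closed the connection")
        return data

    def handle(self):
        while True:
            try:
                (length,) = struct.unpack(">I", self.read_exact(4))
                if length > MAX_MESSAGE:
                    return  # drop clients that announce oversized frames
                body = self.read_exact(length)
            except EOFError:
                return  # normal end of the session
            reply = body.upper()  # placeholder for the real business logic
            self.wfile.write(struct.pack(">I", len(reply)) + reply)


class MessageServer(socketserver.ThreadingTCPServer):
    daemon_threads = True  # one thread per connection; don't block interpreter exit
    allow_reuse_address = True
```

Swapping the base class for `socketserver.TCPServer` makes it single-threaded, but because `handle()` loops until the client disconnects, that version would serve only one connection at a time; a single-threaded server with persistent connections would instead use something like `selectors` or `asyncio`.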

>
> No. "Image binary data -> base64encode -> XML -> gzip" - looks very
> silly. It cannot be efficient. Do you have better ideas?

Okay, that would be silly. The questions above assumed that you
would be sending typical unserialized objects that would be serialized
to XML, not pre-serialized binary data.

> use it. The server is different: it must be safe against external
> attacks. Maybe it was my bad English? Sorry for the misunderstanding.

That part was a joke. You didn't have to answer that. :-P

> I cannot predict "acceptable speed" requirements, but I can tell that
> there will be some clients downloading 100MB report files from the
> server, so I presume that I will need a progress bar. I think that I
> need to develop my own protocol for this, and probably the underlying

Okay, so you need to wrap large binary files in some kind of message,
without pre processing them. I think developing your own protocol
using XDR is a safe bet.

> layer should use binary representation.

If you want to avoid reinventing the wheel, there are a
couple of solutions I can think of. None of them seems to fully support
your security and session management requirements, so you should
estimate the required project/development time for them.

- You may create a standard web application for that (with Django ?).
Binary transfers simply use HTTP and have trivial overhead. You have
to implement security and session management (cookies ?) on top of
that.

- Subclass BaseHTTPServer to implement a stateful and secure
protocol. Again, binary transfers have trivial overhead.

- If the goal of the project is to provide versioned file support,
you could use a dpkg/apt (or RPM?) based installation system (it uses
HTTP and FTP for file transfers). Write trivial front-ends in the
client and the server and choose a secure HTTP server.

- Finally, if you want to create your own protocol but don't want to
use XDR, you could do a similar thing using MIME-based messages (the
first message part is an XML message that references the binary
attachments).
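On the keep-alive problem with the HTTP-based options: in Python 3 (where `BaseHTTPServer` became `http.server`), `BaseHTTPRequestHandler` speaks HTTP/1.0 and closes the connection after each response unless `protocol_version` is set to `"HTTP/1.1"`, in which case every response must carry an accurate `Content-Length`. A minimal sketch; the handler name and the port-echo endpoint are hypothetical, chosen so a client can see that two requests shared one connection:

```python
import http.server


class KeepAliveHandler(http.server.BaseHTTPRequestHandler):
    # The default is "HTTP/1.0", which closes the connection after every response.
    # With HTTP/1.1 the connection stays open, provided Content-Length is always sent.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        # Hypothetical endpoint: reports the client's source port, so a client
        # can confirm that consecutive requests reused one TCP connection.
        body = str(self.client_address[1]).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet
```

Serving it with `http.server.ThreadingHTTPServer` (Python 3.7+) gives one thread per connection, similar to the `socketserver` approach.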

> It is NOT the best way. Just to tell one example: big endian / little
> endian integers. Definitely I need some encoding.

Huh ? XDR does exactly that.

>
> (But if you are right and this is the best way, why would you avoid?)

I would avoid packing/unpacking objects by hand.



irmen.NOSPAM at xs4all

May 19, 2008, 11:25 AM

Post #10 of 12
Re: TPCServer and xdrlib

Laszlo Nagy wrote:
>
>> It is possible to change the serialization used by Pyro
>>
>> http://pyro.sourceforge.net/manual/9-security.html#pickle
>>
>> to the 'gnosis' XML Pickler.
>>
> As I said earlier, I would not use XML. Just an example - I need to be
> able to transfer image files, word and excel documents. How silly it
> would be to base64encode a binary file, then put it into an XML.
>
> L
>

Fair enough.

In that case, here are five suggestions:

- use simple file copying from a mounted network drive
- use http (web server)
- use ftp
- use scp
- use rsync

Why wouldn't one of these work for you? Did I miss something in your original
requirements? All of the above high level protocols are very efficient in concurrently
transferring files from a server to multiple clients.

--irmen


nick at craig-wood

May 20, 2008, 1:30 AM

Post #11 of 12
Re: TPCServer and xdrlib

Henrique Dante de Almeida <hdante[at]gmail.com> wrote:
> On May 19, 10:28 am, Laszlo Nagy <gand...@shopzeus.com> wrote:
> > I cannot predict "acceptable speed" requirements, but I can tell that
> > there will be some clients downloading 100MB report files from the
> > server, so I presume that I will need a progress bar. I think that I
> > need to develop my own protocol for this, and probably the underlying
>
> Okay, so you need to wrap large binary files in some kind of message,
> without pre processing them. I think developing your own protocol
> using XDR is a safe bet.

You might want to consider using netstrings rather than XDR

http://cr.yp.to/proto/netstrings.txt

They are very simple and would be minimal overhead if all you are
passing is a file and a bit of metadata.

You'll find several modules for python with a bit of searching. Also
I believe twisted supports them directly or you could easily roll your
own.
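Netstrings are simple enough to roll by hand. A minimal sketch of the format from the spec linked above (`<decimal length>:<bytes>,`, with no leading zeros in the length); the function names and the 10-digit cap on the length prefix are assumptions:

```python
MAX_LENGTH_DIGITS = 10  # assumption: refuse absurd length prefixes


def netstring_encode(data):
    """Encode bytes as b"<decimal length>:<data>,"."""
    return str(len(data)).encode("ascii") + b":" + data + b","


def netstring_decode(buf):
    """Parse one netstring from the start of buf; return (payload, remaining bytes)."""
    colon = buf.find(b":")
    if colon < 0:
        raise ValueError("no length prefix (incomplete or invalid data)")
    digits = buf[:colon]
    if not digits.isdigit() or len(digits) > MAX_LENGTH_DIGITS:
        raise ValueError("bad length prefix")
    if len(digits) > 1 and digits.startswith(b"0"):
        raise ValueError("leading zeros are not allowed")
    end = colon + 1 + int(digits)
    if buf[end:end + 1] != b",":
        raise ValueError("missing trailing comma or truncated data")
    return buf[colon + 1:end], buf[end + 1:]
```

On a socket, a receiver would keep appending incoming bytes to a buffer until a complete netstring can be parsed; for a 100 MB file it would be better to read the length prefix and then stream the body, as with any length-prefixed framing.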

--
Nick Craig-Wood <nick[at]craig-wood.com> -- http://www.craig-wood.com/nick


gandalf at shopzeus

May 20, 2008, 7:22 AM

Post #12 of 12
Re: TPCServer and xdrlib

>
> - use simple file copying from a mounted network drive
Untrusted clients should not mount anything from my server. (Also,
it is not a protocol. I need to communicate with a real program, not
just copy files.)
> - use http (web server)
I mentioned this before: I don't know how to do keep-alive with
SimpleHTTPServer. Similar solutions, e.g. Apache + mod_python, are too
heavyweight. Too many dependencies etc.
> - use ftp
> - use scp
> - use rsync
>
> Why wouldn't one of these work for you? Did I miss something in your
> original requirements?
Yes. I also need business logic on the server, not just file copying. It
happens that some of the messages will contain images.


Thank you for all your efforts. I think I'll go with TCPServer + xdrlib.


Laszlo


