
Mailing List Archive: Python: Python

creating pipelines in python

 

 



perfreem at gmail

Nov 22, 2009, 2:49 PM

Post #1 of 7
creating pipelines in python

Hi all,

I am looking for a Python package that makes it easier to create a
"pipeline" of scripts (all in Python). What I do right now is have a
set of scripts that produce certain files as output, and a
"master" script that checks at each stage whether the output of the
previous script exists, using functions from the os module. This has
several flaws, and I am sure someone has thought of nice abstractions
for making these kinds of wrappers easier to write.

Does anyone have any recommendations for Python packages that can do
this?

Thanks.
--
http://mail.python.org/mailman/listinfo/python-list


lie.1296 at gmail

Nov 22, 2009, 6:28 PM

Post #2 of 7
Re: creating pipelines in python [In reply to]

per wrote:
> hi all,
>
> i am looking for a python package to make it easier to create a
> "pipeline" of scripts (all in python). what i do right now is have a
> set of scripts that produce certain files as output, and i simply have
> a "master" script that checks at each stage whether the output of the
> previous script exists, using functions from the os module. this has
> several flaws and i am sure someone has thought of nice abstractions
> for making these kind of wrappers easier to write.
>
> does anyone have any recommendations for python packages that can do
> this?
>
> thanks.

You're currently implementing a pseudo-pipeline:
http://en.wikipedia.org/wiki/Pipeline_%28software%29#Pseudo-pipelines

If you want to create a Unix-style, byte-stream-oriented pipeline, have
all scripts write output to stdout and read from stdin (i.e. read with
raw_input and write with print). Since a Unix pipeline is byte-oriented,
you will need to parse the input and format the output according to a
format agreed between the scripts. A more general approach could use
more than two streams; you can use file-like objects to represent the streams.
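A minimal sketch of one stage in such a byte-stream pipeline, written for Python 3 (where iterating over `sys.stdin` takes the place of Python 2's `raw_input`). The agreed format, one `name<TAB>count` record per line, and the function names are hypothetical. Because the stage only sees file-like objects, the same function can be wired to `sys.stdin`/`sys.stdout` in a shell pipeline or fed `io.StringIO` objects when trying it out.

```python
import io
import sys


def double_counts(infile, outfile):
    """Hypothetical stage: read 'name<TAB>count' lines, write them with count doubled."""
    for line in infile:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        # Parse the agreed text format on the way in...
        name, count = line.split("\t")
        # ...and format it the same way on the way out, for the next stage.
        outfile.write(f"{name}\t{int(count) * 2}\n")


def main():
    # As a script stage:  python produce.py | python double_counts.py | python report.py
    double_counts(sys.stdin, sys.stdout)


def run_in_memory(text):
    """Try the stage without a shell by wrapping strings in file-like objects."""
    out = io.StringIO()
    double_counts(io.StringIO(text), out)
    return out.getvalue()
```

To use it as a real pipeline stage, the script would call `main()` under an `if __name__ == "__main__":` guard and be chained in the shell.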

For a more Pythonic pipeline, you can rewrite your scripts as
generators and use generator expressions or list comprehensions that read
objects from a FIFO queue and write objects to another FIFO queue (a queue
can be implemented using a list, but take a look at Queue.Queue in the
standard library). Basically, an object pipeline:
http://en.wikipedia.org/wiki/Pipeline_%28software%29#Object_pipelines
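A minimal sketch of such an object pipeline in Python 3, where the `Queue.Queue` class mentioned above lives in the `queue` module. Running each stage in its own thread, the `DONE` sentinel, and the helper names are assumptions made for illustration; stages living in separate processes would need a cross-process queue instead.

```python
import queue
import threading

DONE = object()  # sentinel object that marks the end of the stream


def drain(inbox):
    """Yield objects from a FIFO queue until the DONE sentinel arrives."""
    while True:
        item = inbox.get()
        if item is DONE:
            return
        yield item


def stage(func, inbox, outbox):
    """One pipeline step: apply func to every object and pass it downstream."""
    for result in (func(item) for item in drain(inbox)):
        outbox.put(result)
    outbox.put(DONE)  # tell the next stage that no more objects are coming


def run_pipeline(items, funcs):
    """Connect one thread per function with FIFO queues and collect the output."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(funcs)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(DONE)
    results = list(drain(queues[-1]))
    for t in threads:
        t.join()
    return results
```

For example, `run_pipeline(range(5), [lambda x: x + 1, lambda x: x * 10])` gives `[10, 20, 30, 40, 50]`. The sketch has no error handling: an exception inside one stage would leave the downstream stages waiting forever.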

For a Unix-style pipeline, shell/batch scripts are the best tool,
though you can also use the subprocess module and redirect each process's
stdin and stdout. For an object pipeline, nothing is simpler than
passing an input queue and an output queue to each script.
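The subprocess route might look like this minimal sketch, which reproduces `cmd1 | cmd2` by handing one process's stdout to the next process's stdin. The two stages are hypothetical one-liners passed with `-c` so the example needs no extra files; in practice they would be your own scripts.

```python
import subprocess
import sys

# Hypothetical stages, inlined with -c so the example needs no extra files;
# in practice these would be commands such as ["python", "stage1.py"].
PRODUCER = "for i in range(5): print(i)"
SQUARER = "import sys\nfor line in sys.stdin: print(int(line) ** 2)"
COMMANDS = [[sys.executable, "-c", PRODUCER], [sys.executable, "-c", SQUARER]]


def chain(commands):
    """Behave like `cmd1 | cmd2 | ...`: each stdout feeds the next stdin."""
    procs = []
    prev_stdout = None
    for cmd in commands:
        proc = subprocess.Popen(cmd, stdin=prev_stdout, stdout=subprocess.PIPE, text=True)
        if prev_stdout is not None:
            # Close our copy so (on POSIX) the upstream process gets SIGPIPE
            # if a later stage exits early.
            prev_stdout.close()
        prev_stdout = proc.stdout
        procs.append(proc)
    output, _ = procs[-1].communicate()  # read the last stage's output to EOF
    for proc in procs[:-1]:
        proc.wait()
    for proc, cmd in zip(procs, commands):
        if proc.returncode != 0:
            raise subprocess.CalledProcessError(proc.returncode, cmd)
    return output
```

Here `chain(COMMANDS)` returns the squares `0` through `16`, one per line.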

For in-script pipelines (cf. inter-script pipelines), you can use
generator expressions, list comprehensions and iterators. There are indeed
several modules intended to provide slightly neater syntax than
comprehensions, e.g. http://code.google.com/p/python-pipeline/ (though I
personally prefer comprehensions).
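A minimal sketch of an in-script pipeline built only from generator expressions: each step lazily pulls items from the previous one, so a large file would be processed one line at a time rather than loaded whole. The function name and the filtering steps are illustrative; the python-pipeline project linked above is not used here.

```python
def total_of_even_numbers(lines):
    """Sum the even integers found in an iterable of text lines."""
    # Each step is a lazy generator expression; nothing runs until sum() pulls.
    stripped = (line.strip() for line in lines)
    numbers = (int(s) for s in stripped if s.isdigit())
    evens = (n for n in numbers if n % 2 == 0)
    return sum(evens)
```

Passing an open file object instead of a list, e.g. `total_of_even_numbers(open(path))`, keeps the whole chain streaming.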


robert.kern at gmail

Nov 23, 2009, 12:39 AM

Post #3 of 7
Re: creating pipelines in python [In reply to]

per wrote:
> hi all,
>
> i am looking for a python package to make it easier to create a
> "pipeline" of scripts (all in python). what i do right now is have a
> set of scripts that produce certain files as output, and i simply have
> a "master" script that checks at each stage whether the output of the
> previous script exists, using functions from the os module. this has
> several flaws and i am sure someone has thought of nice abstractions
> for making these kind of wrappers easier to write.
>
> does anyone have any recommendations for python packages that can do
> this?

You may want to try joblib or ruffus. I haven't had a chance to evaluate either
one, though.

http://pypi.python.org/pypi/joblib/
http://pypi.python.org/pypi/ruffus/

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco



paul.nospam at rudin

Nov 23, 2009, 1:02 AM

Post #4 of 7
Re: creating pipelines in python [In reply to]

per <perfreem [at] gmail> writes:

> hi all,
>
> i am looking for a python package to make it easier to create a
> "pipeline" of scripts (all in python). what i do right now is have a
> set of scripts that produce certain files as output, and i simply have
> a "master" script that checks at each stage whether the output of the
> previous script exists, using functions from the os module. this has
> several flaws and i am sure someone has thought of nice abstractions
> for making these kind of wrappers easier to write.
>
> does anyone have any recommendations for python packages that can do
> this?
>

Not entirely what you're looking for, but the subprocess module is
easier to work with for this sort of thing than os. See e.g. <http://docs.python.org/library/subprocess.html#replacing-shell-pipeline>
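For the original question, a master script built on subprocess might look like this minimal sketch (Python 3.5+ for `subprocess.run`; on Python 2.6, `subprocess.check_call` plays a similar role). The stage commands and file names are hypothetical stand-ins for your own scripts. Instead of only testing whether a file exists, it relies on each script's exit status.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_stages(stages, workdir):
    """Run (command, expected_output) pairs in order; stop at the first failure.

    check=True turns a non-zero exit status into CalledProcessError, so a
    script that crashed after writing half a file is not mistaken for a
    finished stage, as it could be when only the file's existence is checked.
    """
    for cmd, expected in stages:
        subprocess.run(cmd, cwd=workdir, check=True)
        if not (Path(workdir) / expected).exists():
            raise RuntimeError(f"{cmd!r} exited cleanly but did not create {expected}")


# Two hypothetical stages: the second one consumes the first one's output file.
STAGES = [
    ([sys.executable, "-c", "import pathlib; pathlib.Path('a.txt').write_text('hello')"],
     "a.txt"),
    ([sys.executable, "-c",
      "import pathlib; p = pathlib.Path; p('b.txt').write_text(p('a.txt').read_text().upper())"],
     "b.txt"),
]


def demo():
    """Run both stages in a scratch directory and return the final output."""
    with tempfile.TemporaryDirectory() as d:
        run_stages(STAGES, d)
        return (Path(d) / "b.txt").read_text()
```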


wentland at cl

Nov 23, 2009, 1:20 AM

Post #5 of 7
Re: creating pipelines in python [In reply to]

On Sun, Nov 22, 2009 at 14:49 -0800, per wrote:
> i am looking for a python package to make it easier to create a
> "pipeline" of scripts (all in python). what i do right now is have a
> set of scripts that produce certain files as output, and i simply have
> a "master" script that checks at each stage whether the output of the
> previous script exists, using functions from the os module. this has
> several flaws and i am sure someone has thought of nice abstractions
> for making these kind of wrappers easier to write.
> does anyone have any recommendations for python packages that can do
> this?

There are various possibilities. I would suggest you have a look at [1],
which details how to build pipelines out of generators within *one*
program. If you want to chain different programs together, you can use
the subprocess module from the Python 2.6 standard library.

[1] http://www.dabeaz.com/generators/
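The tutorial at [1] chains small generator functions into processing pipelines. Here is a minimal sketch in that spirit (not code copied from the tutorial), assuming a hypothetical space-separated log format `METHOD PATH STATUS BYTES`.

```python
import re


def grep(pattern, lines):
    """Pass through only the lines that match a regular expression."""
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            yield line


def field(index, lines):
    """Yield one whitespace-separated column from each line."""
    for line in lines:
        yield line.split()[index]


def to_int(values):
    """Convert each value to an int."""
    for value in values:
        yield int(value)


SAMPLE_LOG = [
    "GET /a 200 512",
    "GET /b 404 0",
    "GET /c 200 128",
]

# Total bytes sent in successful (status 200) responses. Nothing is computed
# until sum() starts pulling items through the chain, one line at a time.
ok_bytes = sum(to_int(field(3, grep(r" 200 ", SAMPLE_LOG))))
```

With a real file, `open(path)` can replace `SAMPLE_LOG`, since a file object already yields its lines lazily.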
--
.''`. Wolodja Wentland <wentland [at] cl>
: :' :
`. `'` 4096R/CAF14EFC
`- 081C B7CD FF04 2BA9 94EA 36B2 8B7F 7D30 CAF1 4EFC


perfreem at gmail

Nov 25, 2009, 8:42 AM

Post #6 of 7
Re: creating pipelines in python [In reply to]

Thanks to all for your replies. I want to clarify what I mean by a
pipeline. A major feature I am looking for is the ability to chain
functions or scripts together, where the output of one script (which
is usually a file) is required for another script to run, so one
script has to wait for the other. I would like to do this over a
cluster, where some of the scripts are distributed as separate jobs on
the cluster but the results are then collected together. So the ideal
library would have easy facilities for expressing things like this:
scripts X and Y run independently, but script Z depends on the output
of X and Y (which is such-and-such file or file flag).

Is there a way to do this? I prefer not to use a framework that
requires control of the cluster, like Disco, but something that's
lightweight and simple. Right now ruffus seems most relevant, but I am
not sure. Are there other candidates?

Thank you.

On Nov 23, 4:02 am, Paul Rudin <paul.nos...@rudin.co.uk> wrote:
> per <perfr...@gmail.com> writes:
> > hi all,
>
> > i am looking for a python package to make it easier to create a
> > "pipeline" of scripts (all in python). what i do right now is have a
> > set of scripts that produce certain files as output, and i simply have
> > a "master" script that checks at each stage whether the output of the
> > previous script exists, using functions from the os module. this has
> > several flaws and i am sure someone has thought of nice abstractions
> > for making these kind of wrappers easier to write.
>
> > does anyone have any recommendations for python packages that can do
> > this?
>
> Not entirely what you're looking for, but the subprocess module is
> easier to work with for this sort of thing than os. See e.g. <http://docs.python.org/library/subprocess.html#replacing-shell-pipeline>

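The X/Y/Z requirement above is essentially what make does. A minimal sketch of that idea in plain Python 3, with hypothetical task names, file names and functions: it runs tasks in waves (X and Y together, then Z) on a local thread pool and skips any task whose output is newer than its inputs. This is not the ruffus API; on a cluster, the `pool.submit` call would have to be replaced by whatever job-submission mechanism is available, which is an assumption here.

```python
import concurrent.futures
import tempfile
from pathlib import Path


# Hypothetical task functions; each writes its output file from its input files.
def make_x(inputs, output):
    output.write_text("x\n")


def make_y(inputs, output):
    output.write_text("y\n")


def make_z(inputs, output):
    output.write_text("".join(path.read_text() for path in inputs))


# name -> (names of tasks it depends on, output file name, function)
TASKS = {
    "X": ([], "x.txt", make_x),
    "Y": ([], "y.txt", make_y),
    "Z": (["X", "Y"], "z.txt", make_z),
}


def is_stale(inputs, output):
    """Make-style check: rebuild if the output is missing or older than an input."""
    if not output.exists():
        return True
    return any(path.stat().st_mtime > output.stat().st_mtime for path in inputs)


def run(tasks, workdir):
    """Run tasks in waves; every task whose dependencies are done runs in parallel."""
    outputs = {name: Path(workdir) / out for name, (_, out, _) in tasks.items()}
    done, ran = set(), []
    with concurrent.futures.ThreadPoolExecutor() as pool:
        while len(done) < len(tasks):
            ready = [name for name, (deps, _, _) in tasks.items()
                     if name not in done and all(dep in done for dep in deps)]
            if not ready:
                raise RuntimeError("dependency cycle or unknown dependency")
            futures = {}
            for name in ready:
                deps, _, func = tasks[name]
                inputs = [outputs[dep] for dep in deps]
                if is_stale(inputs, outputs[name]):
                    futures[name] = pool.submit(func, inputs, outputs[name])
            for name, future in futures.items():
                future.result()  # wait; re-raises an exception from the task
                ran.append(name)
            done.update(ready)
    return ran


def demo():
    """Build twice in a scratch directory; the second run finds nothing stale."""
    with tempfile.TemporaryDirectory() as d:
        first = run(TASKS, d)
        second = run(TASKS, d)
        return first, second, (Path(d) / "z.txt").read_text()
```

Expressing dependencies as task names rather than raw file paths keeps the graph explicit; the timestamp check is what lets a rerun skip work that is already up to date.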


stefan_ml at behnel

Nov 25, 2009, 1:46 PM

Post #7 of 7
Re: creating pipelines in python [In reply to]

per, 25.11.2009 17:42:
> Thanks to all for your replies. i want to clarify what i mean by a
> pipeline. a major feature i am looking for is the ability to chain
> functions or scripts together, where the output of one script -- which
> is usually a file -- is required for another script to run. so one
> script has to wait for the other. i would like to do this over a
> cluster, where some of the scripts are distributed as separate jobs on
> a cluster but the results are then collected together. so the ideal
> library would have easily facilities for expressing this things:
> script X and Y run independently, but script Z depends on the output
> of X and Y (which is such and such file or file flag).
>
> is there a way to do this? i prefer not to use a framework that
> requires control of the clusters etc. like Disco, but something that's
> light weight and simple. right now ruffus seems most relevant but i am
> not sure -- are there other candidates?

As others have pointed out, a Unix pipe approach might be helpful if you
want the processes to run in parallel. You can send the output of one
process to stdout, a network socket, an HTTP channel or whatever, and have
the next process read it and work on it while it's being generated by the
first process.

Looking into generators is still a good idea, even if you go for a pipe
approach. See the link posted by Wolodja Wentland.
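Combining the two suggestions, a process's output can be consumed by generators while the process is still producing it. A minimal sketch with a hypothetical producer one-liner (Python 3.2+ for `Popen` as a context manager); downstream stages are ordinary generator expressions.

```python
import subprocess
import sys

# Hypothetical producer stage; -u keeps its output unbuffered so each line is
# written to the pipe as soon as it is printed.
PRODUCER = [sys.executable, "-u", "-c", "for i in range(1, 6): print(i)"]


def lines_from(cmd):
    """Yield a command's output lines as they arrive, without waiting for it to exit."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            yield line.rstrip("\n")
    # Leaving the with-block waits for the process, so returncode is set here.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)


# A downstream stage: square each number while the producer is still running.
squares = (int(line) ** 2 for line in lines_from(PRODUCER))
```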

Stefan
