
Mailing List Archive: ModPerl: ModPerl

go crazy with me

 

 



jt at plainblack

Dec 18, 2005, 7:18 PM

Post #1 of 32 (1352 views)
Permalink
go crazy with me

Forget for a second what Apache is. Think outside the box with me. Go a little crazy.

Apache 2 (especially with mod_perl) opens up a whole new world of possibilities. There
are people turning Apache into an FTP server, a chat server, a mail server, a version
control system, etc. I want to turn it into a workflow system. If you think about it,
workflow is nothing but a set of transactional tasks (nothing new) with two additional
components (here's where it gets weird). The two additional components are cron
(scheduling) and queue (a task executor).

So the question is this: What would it take to add these two components to apache?

Let's think of this another way. What mechanism could we use, to handle what would
normally be considered non-transactional, offline maintenance functions?

Apache is the ultimate event handler. It's listening for socket events. Why couldn't we
change it just a bit to listen to timer events and thusly kick off an execution once per
minute to check a cron tab? The reading of cron tabs is the easy part
(DateTime::Cron::Simple for example). What would it take to just get Apache to
handle events other than a socket request? Is it possible? Of course it is, presumably
it already knows how to handle signals. If it couldn't, there wouldn't be a way for us
to issue a SIGHUP to do a soft restart. So, how do we get it to also handle timer
events?

Any ideas?


JT ~ Plain Black
ph: 703-286-2525 ext. 810
fax: 312-264-5382
http://www.plainblack.com

I reject your reality, and substitute my own. ~ Adam Savage


perrin at elem

Dec 18, 2005, 9:28 PM

Post #2 of 32 (1351 views)
Permalink
Re: go crazy with me [In reply to]

On Sun, 2005-12-18 at 21:18 -0600, JT Smith wrote:
> I want to turn it into a workflow system. If you think about it,
> workflow is nothing but a set of transactional tasks (nothing new) with two additional
> components (here's where it gets weird). The two additional components are cron
> (scheduling) and queue (a task executor).

Earl Cahill and I talked about how to use apache for a queue system on
the list a while back. In the end, we both decided it was a bad idea.
Apache is a very flexible network server, but this task is very unlike a
network server. I ended up writing a simple forking daemon with
Parallel::ForkManager that stores the queue in a database, and I think
Earl ended up with something similar.

- Perrin


jt at plainblack

Dec 18, 2005, 9:48 PM

Post #3 of 32 (1324 views)
Permalink
Re: go crazy with me [In reply to]

Yup, I've actually already done it that way with both Parallel::ForkManager in one
instance and Proc::Queue as an alternative. I added in event handling with both Event
and Event::Lib as separate trials. All those implementations were relatively easy to do.
But the question becomes, why? If everything else is running in Apache, why start a
separate service to run these tasks? And again, I said I want to go crazy. Let's not
figure out how else we could do that (I already know that), but how could we do it using
Apache?

However, you're right, I should look back at the list archives and see what conclusions
other people asking similar questions came to. I guess I hadn't considered that this
question would have been asked before.




jon at 2xlp

Dec 19, 2005, 7:31 AM

Post #4 of 32 (1330 views)
Permalink
Re: go crazy with me [In reply to]

You could look into the Twisted framework for Python:
http://twistedmatrix.com/

It's a really solid networking framework, but it's Python-based (not
Perl).




jt at plainblack

Dec 19, 2005, 7:46 AM

Post #5 of 32 (1330 views)
Permalink
Re: go crazy with me [In reply to]

Please, I specifically asked not to tell me how else to do it. I want to know how, if at
all, it's possible to do it under Apache/modperl. I know I can do it 1,000,000 other
ways that I'm totally not interested in. I just want everyone to focus on what's
possible with Apache/modperl, and nothing outside of Apache/modperl.

If it's not possible so be it. I'm not trying to solve a problem here. I've already
solved the problem. What I'm trying to do is see if I can solve it better with
Apache/modperl.

Sorry if my words sound harsh. They aren't meant to. They're simply meant to stress that
solutions outside of Apache/modperl aren't what I'm interested in.





perrin at elem

Dec 19, 2005, 11:13 AM

Post #6 of 32 (1337 views)
Permalink
Re: go crazy with me [In reply to]

On Sun, 2005-12-18 at 23:48 -0600, JT Smith wrote:
> I added in event handling with both Event
> and Event::Lib as separate trials.

I just used a short sleep with Time::HiRes between polling the database
for new jobs.
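
A polling loop of this kind might look like the minimal sketch below. It is not Perrin's daemon: the in-memory array stands in for the database table, and `fetch_next_job`, `run_worker`, the job names, and the 0.05-second interval are all hypothetical.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep);

# Hypothetical stand-in for the jobs table; a real daemon would query the database here.
my @job_table = ('resize-image', 'send-report');

sub fetch_next_job { return shift @job_table }

# Poll for work, sleeping briefly whenever the queue is empty.
# $max_idle_polls bounds the loop so the sketch terminates; a daemon would loop forever.
sub run_worker {
    my ($poll_interval, $max_idle_polls) = @_;
    my @done;
    my $idle = 0;
    while ($idle < $max_idle_polls) {
        if (defined(my $job = fetch_next_job())) {
            push @done, $job;        # "process" the job
            $idle = 0;
        }
        else {
            sleep($poll_interval);   # fractional seconds via Time::HiRes
            $idle++;
        }
    }
    return @done;
}

my @processed = run_worker(0.05, 3);
print "processed: @processed\n";
```

The short sleep is the whole trade-off: a smaller interval picks up new jobs sooner but hits the database more often.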

> If everything else is running in Apache, why start a
> separate service to run these tasks?

Because everything else sounds much harder and less reliable. That was
my reason.

> And again, I said I want to go crazy. Let's not
> figure out how else we could do that (I already know that), but how
> could we do it using
> Apache?

I came up with about a half dozen possible ways of doing our queue
system. Only one of them used apache as the only daemon. I looked into
custom protocol handlers and the rest of the mod_perl 2 API and there's
nothing I can see that would make a time-based system possible. It
would require rewriting some C code and probably changing things that
are not part of the module API.

The idea I had for handling events without all that goes like this:

Run a mod_perl server that the job submitters contact via HTTP. When a
process gets a request, it checks to see if there are enough listener
processes free for accepting jobs (as opposed to processing jobs). If
there are not, it adds the job to the queue and goes back to listening
for requests. If there are, it processes the job. This ensures that
processing jobs does not starve the ability to accept new ones.

A process which has started working on a job will loop (keeping the
current request alive), pulling jobs off the queue and working on them
until the queue is empty again, when it will allow the request to finish
and go back to sleep. In other words, new requests will start child
processes working, and all processes that get started will stay working
until the queue is empty again.
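
The accept-or-enqueue decision described above can be sketched in a single process as follows. This is only an illustration under stated assumptions: `%state`, `idle_listeners`, `min_listeners`, and `handle_submission` are hypothetical names, and real Apache children would need shared state (the post suggests something like Cache::FastMmap) instead of a plain hash.

```perl
use strict;
use warnings;

# Hypothetical state that real children would have to share between processes.
my %state = (
    idle_listeners => 0,    # other children currently free to accept requests
    min_listeners  => 2,    # how many must stay free for accepting new jobs
    queue          => [],
);

sub process_job { my ($job, $log) = @_; push @$log, $job }

# Called when a child receives a job submission over HTTP.
sub handle_submission {
    my ($st, $job, $log) = @_;

    # Too few listeners left: enqueue and go back to listening.
    if ($st->{idle_listeners} < $st->{min_listeners}) {
        push @{ $st->{queue} }, $job;
        return 'queued';
    }

    # Enough listeners: work on this job, then keep the request alive
    # and drain the queue before finishing.
    process_job($job, $log);
    while (defined(my $next = shift @{ $st->{queue} })) {
        process_job($next, $log);
    }
    return 'processed';
}

my @log;
$state{idle_listeners} = 1;                      # server is busy
my @results = map { handle_submission(\%state, $_, \@log) } qw(A B);
$state{idle_listeners} = 3;                      # capacity frees up
push @results, handle_submission(\%state, 'C', \@log);
print "@results | @log\n";
```

Jobs A and B wait in the queue until a request arrives while enough listeners are free; that request's child then processes its own job and drains the backlog.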

Pros
* No polling except by processes that have just finished a job and
are deciding whether or not to exit. (This may still be more
than the polling done by a simple perl daemon.)
* Quick pickup of new jobs.
Cons
* Clustering would require some kind of custom load balancer that
would know which machines were actually busiest. This might
involve reading the scoreboard, or something more complex. Much
harder than other approaches.
* No obvious way to tell how many processes are working vs.
listening. Would probably need to use something like
Cache::FastMmap to track this.
* The whole idea is fairly hard to explain, which probably means
it's too complex and will be hard to build and debug.

Anyway, feel free to expand on this idea or try it out. As complex as
it is, it avoids having to delve into the guts of the httpd code.

- Perrin


stas at stason

Dec 19, 2005, 11:33 AM

Post #7 of 32 (1346 views)
Permalink
Re: go crazy with me [In reply to]

JT Smith wrote:
> Yup, I've actually already done it that way with both
> Parallel::ForkManager in one instance and Proc::Queue as an alternative.
> I added in event handling with both Event and Event::Lib as separate
> trials. All those implementations were relatively easy to do. But the
> question becomes, why? If everything else is running in Apache, why
> start a separate service to run these tasks? And again, I said I want to
> go crazy. Let's not figure out how else we could do that (I already know
> that), but how could we do it using Apache?

Here at mailchannels.com we first used mp2 to handle the email
traffic shaping entirely inside mod_perl2, but the nature of our product
is so different from serving HTTP that it just wouldn't scale (mostly
memory-wise, but also too many processes). We have now switched to having
Event::Lib (over libevent) doing all the non-blocking IO and using mp2's
protocol handler to do blocking IO (like network-bound operations). The
performance is just amazing, hardly any memory used and we can easily
handle a thousand concurrent connections on very low-end hardware.

Switching to event based flow was a challenge, since you no longer have
the normal logic flow. But we have written a few abstraction layers and
now it's almost easy. We are planning to release our AsyncIO abstraction
module on CPAN once we have some spare resources.

I highly recommend Event::Lib, at least for its wonderful maintainer,
Tassilo von Parseval, who's a great Perl/C/XS expert and who resolves
any problems with Event::Lib almost as soon as we post the bug
reports. I wish more CPAN authors were as responsive as Tassilo is :)

--
_____________________________________________________________
Stas Bekman mailto:stas [at] stason http://stason.org/
MailChannels: Assured Messaging(TM) http://mailchannels.com/
The "Practical mod_perl" book http://modperlbook.org/
http://perl.apache.org/ http://perl.org/ http://logilune.com/


chase.venters at clientec

Dec 19, 2005, 11:43 AM

Post #8 of 32 (1322 views)
Permalink
Re: go crazy with me [In reply to]

On Mon, 19 Dec 2005, Perrin Harkins wrote:
> processes free for accepting jobs (as opposed to processing jobs). If
> there are not, it adds the job to the queue and goes back to listening
> for requests. If there are, it processes the job. This ensures that
> processing jobs does not starve the ability to accept new ones.

Be careful with this approach because this idea, while sweet and peachy on
paper, is often fundamentally wrong in practice. My place of employment
implemented complex call processing software based on queues precisely
because "processing jobs does not starve the ability to accept new ones".

I believed that the implementation would have terrible consequences for
performance and behavior under overload, and my coworkers strongly
disagreed. So I set up a test:

(call generator) -> (our software) -> (call receiver)

At 20 CPS (calls per second), our software ate 20% of the CPU and 20% of the
memory. Thus it wouldn't be all that unreasonable to expect 30 CPS to be
possible. As it turned out, lock contention was harsh enough that attempting
that rate on the call generator meant that the call receiver would only get
5 call attempts per second.

But that failure was much more about lock contention than it was about
application workflow. The real demonstration was yet to come.

I fetched one of our developers back to my desk to point out the
contention issues. Rather than restarting all the software, I told the
call generator to "pause", i.e., stop making new phone calls.

Stunningly, the moment I did this, the call receiver went from 5 call
attempts per second to 10, and kept receiving call attempts for several
minutes (even though they were all invalid and the generator had been
stopped long ago). What was happening? The application had been taking
messages into the queue, promising the call generator to handle them. Thus
the queue kept growing, and growing, and growing...

Now, a queue is of course applicable in some places. The application
mentioned in this thread is one of them. But IMNSHO, you should never ever
think that a queue's ability to allow you to accept new work while you're
busy processing current work is a good thing unless this trait is (a)
necessary for performance and (b) carefully constrained to keep things
under control (e.g., the kernel's network buffers).
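
One way to keep a queue "carefully constrained" is to give it a hard limit and refuse work beyond it, so that the submitter sees the overload instead of the queue silently growing. The sketch below is an assumption about how that could look, not anything from the software described here; `make_bounded_queue` and its `offer`/`take`/`size` closures are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical bounded queue: refuses new work once $limit items are waiting.
sub make_bounded_queue {
    my ($limit) = @_;
    my @items;
    return {
        offer => sub {
            my ($item) = @_;
            return 0 if @items >= $limit;   # full: tell the caller to back off
            push @items, $item;
            return 1;
        },
        take => sub { return shift @items },
        size => sub { return scalar @items },
    };
}

my $q = make_bounded_queue(3);
my @accepted = grep { $q->{offer}->($_) } 1 .. 5;
print "accepted: @accepted, waiting: ", $q->{size}->(), "\n";
```

With a limit of 3, the fourth and fifth offers are rejected immediately, which is the back-pressure the call generator never received in the test above.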


Cheers,
Chase


perrin at elem

Dec 19, 2005, 11:57 AM

Post #9 of 32 (1341 views)
Permalink
Re: go crazy with me [In reply to]

On Mon, 2005-12-19 at 13:43 -0600, Chase Venters wrote:
> What was happening? The application had been taking
> messages into the queue, promising the call generator to handle them. Thus
> the queue kept growing, and growing, and growing...

That is what a queue is supposed to do when the demand exceeds the
capacity.

> Now, a queue is of course applicable in some places. The application
> mentioned in this thread is one of them. But IMNSHO, you should never ever
> think that a queue's ability to allow you to accept new work while you're
> busy processing current work is a good thing unless this trait is (a)
> necessary for performance and (b) carefully constrained to keep things
> under control (e.g., the kernel's network buffers).

A queue is just a method for handling bursty demand in applications
where it would be too expensive to always provide enough throughput to
handle all requests immediately. You have to build the system with
enough throughput to handle the common load level without backing up.
If you don't need to handle bursts that exceed capacity, or you don't
mind telling clients to go away when capacity is full, there's no good
reason to use a queue.

- Perrin


chase.venters at clientec

Dec 19, 2005, 12:19 PM

Post #10 of 32 (1333 views)
Permalink
Re: go crazy with me [In reply to]

On Mon, 19 Dec 2005, Perrin Harkins wrote:

> On Mon, 2005-12-19 at 13:43 -0600, Chase Venters wrote:
>> What was happening? The application had been taking
>> messages into the queue, promising the call generator to handle them. Thus
>> the queue kept growing, and growing, and growing...
>
> That is what a queue is supposed to do when the demand exceeds the
> capacity.

Indeed. And that's why using a queue is sometimes wrong.

> A queue is just a method for handling bursty demand in applications
> where it would be too expensive to always provide enough throughput to
> handle all requests immediately. You have to build the system with
> enough throughput to handle the common load level without backing up.
> If you don't need to handle bursts that exceed capacity, or you don't
> mind telling clients to go away when capacity is full, there's no good
> reason to use a queue.

To be fair, it depends a bit on what your application is. In handling
calls, it's particularly nasty because a user expects a call to complete
rather quickly and if they're simply put on some line tens of thousands of
calls long, to be addressed several minutes from now, they'll just hang
up and complain.

An example application for a queue in a web application would be to set up
a request to send out an e-mail. (Simple apps simply invoke sendmail
directly, but the MTA can elect to queue the outgoing message, as qmail
does. [Sendmail might too, I've just never been a fan or a user :P])

The advantage here is interactive response - the user doesn't have to wait
for the SMTP client to look up the MX records, initiate a session, send
the message...

So if your server can handle 10 mails per second and you never *ask* it to
do more, you have no problems. But let's suppose you start asking it for
15 mails per second. This spike lasts five seconds. You've now built your
load up by 25 messages (more if you factor in that spending the time
accepting the extra five takes time away from sending the 10).

If you were to continue from that point forward with 10 mails per second,
you'd be permanently 25 messages behind, until your activity dips low
enough to catch back up. In this case, the queue has done its job.
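
The arithmetic above can be checked with a small simulation. This is a sketch that ignores the extra overhead of accepting messages mentioned in the parenthetical; `backlog_after` is a hypothetical helper, and the final dip to 5 mails per second is an assumed figure added to show the catch-up.

```perl
use strict;
use warnings;

# Track the backlog second by second for a fixed sending capacity.
sub backlog_after {
    my ($capacity, @arrivals) = @_;
    my $backlog = 0;
    my @history;
    for my $arriving (@arrivals) {
        $backlog += $arriving - $capacity;
        $backlog = 0 if $backlog < 0;    # a queue cannot go below empty
        push @history, $backlog;
    }
    return @history;
}

# Five seconds at 15/s, five at 10/s, then five at 5/s, against a capacity of 10/s.
my @history = backlog_after(10, (15) x 5, (10) x 5, (5) x 5);
print "@history\n";
```

The backlog climbs by 5 per second to 25 during the spike, stays at 25 while demand equals capacity, and only drains once demand drops below capacity.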

I don't want to imply that queues are bad or wrong... just that you have
to be careful when you consider accepting work while busy a good property
to have. If you spend much of your time close to capacity, queues can bite
you in the ass. And if you do happen to spend much time over your capacity
limits, the problem will at best be nasty and at worst be a damn nightmare,
depending on what it is you're using the queues for.


Cheers,
Chase


valdez at linuxasylum

Dec 19, 2005, 1:42 PM

Post #11 of 32 (1319 views)
Permalink
Re: go crazy with me [In reply to]

Hello,

On Monday 19 December 2005 04:18, JT Smith wrote:
> Apache is the ultimate event handler. It's listening for socket events. Why
> couldn't we change it just a bit to listen to timer events and thusly kick
> off an execution once per minute to check a cron tab? The reading of cron
> tabs is the easy part (DateTime::Cron::Simple for example). What would it
> take to just get Apache to handle events other than a socket request?
> Is it possible? Of course it is, presumably it already knows how to handle
> signals. If it couldn't, there wouldn't be a way for us to issue a SIGHUP
> to do a soft restart. So, how do we get it to also handle timer events?
>
> Any ideas?

I investigated the signal approach with MP1 for a client of mine, and it is
feasible. The idea was to build a queue system that was able to accept
messages via HTTP and used signals to wake up a child dedicated to delivery.

Here is a simple piece of code that keeps an Apache child stuck in a loop and
lets you control it via signals (USR1 and USR2). We used it as a PerlLogHandler.

    package Test::ChildLoop;

    use strict;

    use Apache::Constants qw(OK DECLINED);

    # Number of USR1 signals this child has received.
    my $usr1 = 0;

    sub catch_usr1 {
        my $signame = shift;
        $usr1++;
        warn "Somebody sent me a SIG$signame";
        return;
    }

    $SIG{USR1} = \&catch_usr1;

    # Set by USR2 to tell the loop in handler() to finish.
    my $exit = 0;

    sub catch_usr2 {
        my $signame = shift;
        $exit++;
        warn "Somebody sent me a SIG$signame";
        return;
    }

    $SIG{USR2} = \&catch_usr2;

    sub handler {
        my $r = shift;
        my $count = 0;

        # Hold this child in the loop until a USR2 arrives.
        while (not $exit) {
            warn "child $$ waited for $count seconds and got $usr1 usr1 signals\n";
            sleep 5;
            $count += 5;
            warn "hey, we got a shutdown\n" if $exit;
        }

        return OK;
    }

    1;

They ended up using a modified version of Postfix, so I didn't investigate
further, but I'm still interested in this approach.

HTH, Valerio




jhfoo at nexlabs

Dec 19, 2005, 7:21 PM

Post #13 of 32 (1336 views)
Permalink
Re: go crazy with me [In reply to]

Just went to your company web site and read that you got the White Camel
award. Congrats, both on the award and your new career!

We're talking to the Director of Development here guys... :)

----- Original Message -----
From: "Stas Bekman" <stas [at] stason>
To: "JT Smith" <jt [at] plainblack>
Cc: <modperl [at] perl>
Sent: Tuesday, December 20, 2005 3:33 AM
Subject: Re: go crazy with me


> JT Smith wrote:
> > Yup, I've actually already done it that way with both
> > Parallel::ForkManager in one instance and Proc::Queue as an alternative.
> > I added in event handling with both Event and Event::Lib as seperate
> > trials. All those implementations were relatively easy to do. But the
> > question becomes, why? If everything else is running in Apache, why
> > start a seperate service to run these tasks? And again, I said I want to
> > go crazy. Let's not figure out how else we could do that (I already know
> > that), but how could we do it using Apache?
>
> Here at mailchannels.com we have first used mp2 to handle the email
> traffic shaping entirely inside mod_perl2, but the nature of our product
> is so different from serving HTTP, it just won't scale (mostly
> memory-wise, but also too many processes). We have now switched to having
> Event::Lib (over libevent) doing all the non-blocking IO and using mp2's
> protocol handler to do blocking IO (like network-bound operations). The
> performance is just amazing, hardly any memory used and we can easily
> handle a thousand concurrent connections on very low-end hardware.
>
> Switching to event based flow was a challenge, since you no longer have
> the normal logic flow. But we have written a few abstraction layers and
> now it's almost easy. We are planning to release our AsyncIO abstraction
> module on CPAN once we have some spare resources.
>
> I highly recommend Event::Lib, at least for its wonderful maintainer:
> Tassilo von Parseval, who's a great perl/C/XS expert and who is resolving
> any problems with Event::Lib almost as soon as we are posting the bug
> reports. I wish more CPAN authors were as responsive as Tassilo is :)
>
> --
> _____________________________________________________________
> Stas Bekman mailto:stas [at] stason http://stason.org/
> MailChannels: Assured Messaging(TM) http://mailchannels.com/
> The "Practical mod_perl" book http://modperlbook.org/
> http://perl.apache.org/ http://perl.org/ http://logilune.com/


stas at stason

Dec 19, 2005, 10:04 PM

Post #14 of 32 (1331 views)
Permalink
Re: go crazy with me [In reply to]

Foo Ji-Haw wrote:
> Just went to your company web site and read that you got the White Camel
> award. Congrats, both on the award and your new career!

Thanks for the kind words, Foo!

> We're talking to the Director of Development here guys... :)

Hehe, don't let titles mislead you :)

BTW, we are always looking for bright people to join our team. So if you
want to work in a fun team, have challenging projects and like Vancouver,
BC drop me a note!



matt at sergeant

Dec 20, 2005, 6:40 AM

Post #15 of 32 (1323 views)
Permalink
Re: go crazy with me [In reply to]

On 19 Dec 2005, at 14:33, Stas Bekman wrote:

> JT Smith wrote:
>> Yup, I've actually already done it that way with both
>> Parallel::ForkManager in one instance and Proc::Queue as an
>> alternative. I added in event handling with both Event and Event::Lib
>> as seperate trials. All those implementations were relatively easy to
>> do. But the question becomes, why? If everything else is running in
>> Apache, why start a seperate service to run these tasks? And again, I
>> said I want to go crazy. Let's not figure out how else we could do
>> that (I already know that), but how could we do it using Apache?
>
> Here at mailchannels.com we have first used mp2 to handle the email
> traffic shaping entirely inside mod_perl2, but the nature of our
> product is so different from serving HTTP, it just won't scale (mostly
> memory-wise, but also too many processes).

Not sure what you mean by it not scaling - can you elaborate? Sure it
uses more ram than multiplexing, but even for a high traffic mail
server like apache.org's the mail-in-apache2 model works well
(apache.org uses Apache::Qpsmtpd for email).

I'm curious as to how you've mixed things up though - if the details
aren't private IP I'd love to know more.

Matt.


ksimpson at mailchannels

Dec 20, 2005, 9:00 AM

Post #16 of 32 (1314 views)
Permalink
Re: go crazy with me [In reply to]

> Not sure what you mean by it not scaling - can you elaborate? Sure it
> uses more ram than multiplexing, but even for a high traffic mail
> server like apache.org's the mail-in-apache2 model works well
> (apache.org uses Apache::Qpsmtpd for email).
>
> I'm curious as to how you've mixed things up though - if the details
> aren't private IP I'd love to know more.

I'll cut in so that Stas can save his keyboarding wrist.

Our application requires establishing lots of very long running SMTP
connections -- in busy sites, thousands. Most SMTP applications handle
all connections in a short period of time, meaning the average process
load is manageable. Our application selectively slows down
connections, causing the connection load to go up even though the CPU
isn't doing any more work.

Since it's not possible within reasonable memory constraints to have
thousands of Apache mod_perl processes resident, our application needs
to be designed using an approach that can allow multiple SMTP
connections per Perl process.

Instead of having one multiprocessing-oriented Apache mod_perl process
per SMTP connection, we use a single event-driven process to handle
all SMTP connections. To avoid blocking in this single process, we
dispatch any CPU-intensive tasks to a pool of Apache mod_perl
processes via a simple TCP protocol. We also dispatch out any tasks
which are difficult to re-program using asynchronous IO calls --
including calls out to libraries written by third parties (such as
spam scanning engines).
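
A minimal sketch of that split, in Python rather than Perl and not the MailChannels code: one event loop holds every connection, and blocking work is handed to a small worker pool. Everything here is an assumption made for illustration. The names `scan_message`, `handle_connection` and `serve` are hypothetical, the slowed connection is simulated with a timer, and a thread pool stands in for the pool of mod_perl processes reached over TCP.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def scan_message(body):
    # Hypothetical blocking call, standing in for a third-party spam scanner.
    return "spam" if "lottery" in body.lower() else "ok"


async def handle_connection(conn_id, body, delay, pool):
    # A deliberately slowed connection: the wait ties up no process or thread.
    await asyncio.sleep(delay)
    loop = asyncio.get_running_loop()
    # Blocking work goes to the pool so the event loop keeps serving others.
    verdict = await loop.run_in_executor(pool, scan_message, body)
    return conn_id, verdict


async def serve(messages, delay=0.05, workers=2):
    # All "connections" run concurrently inside this single process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        tasks = [handle_connection(i, body, delay, pool)
                 for i, body in enumerate(messages)]
        return await asyncio.gather(*tasks)


def run_demo():
    return asyncio.run(serve(["Hello Bob", "You won the LOTTERY", "Lunch?"]))
```

Calling `run_demo()` returns one `(connection id, verdict)` pair per message. The number of open connections is bounded by the memory of one process, while the pool size bounds how much blocking work runs at once.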

We are considering open sourcing the asynchronous API we built on top
of Event::Lib when we have time to refactor the application to
separate it from the application's proprietary parts.

TTUL
Ken

--
MailChannels: Assured Messaging (TM) | http://mailchannels.com

--
Suite 203, 910 Richards St.
Vancouver, BC, V6B 3C1, Canada
Direct: +1-604-729-1741


andreas.koenig.gmwojprw at franz

Dec 20, 2005, 2:25 PM

Post #17 of 32 (1316 views)
Permalink
Re: go crazy with me [In reply to]

>>>>> On Tue, 20 Dec 2005 09:40:43 -0500, Matt Sergeant <matt [at] sergeant> said:

> I'm curious as to how you've mixed things up though - if the details
> aren't private IP I'd love to know more.

Me too:-)

I'd also like to hear what people are doing when the apache model has
scaling problems. We have one problematic project here: we're a
gateway and must serve a high number of very slow customers to a high
number of very slow feeds. Ideally we would run this in an event loop
or in coroutines/continuations style, but we have not yet tried that
out, mainly because so much of our infrastructure relies on everything
being apache. Is there something in apache2 that would make our lives
easier? (we have not yet switched to apache2 at all)

--
andreas


merlyn at stonehenge

Dec 20, 2005, 4:21 PM

Post #18 of 32 (1323 views)
Permalink
Re: go crazy with me [In reply to]

>>>>> "Andreas" == Andreas J Koenig <andreas.koenig.gmwojprw [at] franz> writes:

Andreas> I'd also like to hear what people are doing when the apache model has
Andreas> scaling problems. We have one problematic project here: we're a
Andreas> gateway and must serve a high number of very slow customers to a high
Andreas> number of very slow feeds. Ideally we would run this in an event loop
Andreas> or in coroutines/continuations style, but we have not yet tried that
Andreas> out, mainly because so much of our infrastructure relies on everything
Andreas> being apache. Is there something in apache2 that would make our lives
Andreas> easier? (we have not yet switched to apache2 at all)

Are you already using a reverse-proxy? Make sure the front lightweight
servers *do* use cache and *don't* use keep-alive to the backend...
your heavy backend will spit the entire response, and go free to service
the next request... your thin front-end will then deliver that response
slowly as needed.

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn [at] stonehenge> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!


jhfoo at nexlabs

Dec 20, 2005, 9:21 PM

Post #19 of 32 (1320 views)
Permalink
Re: go crazy with me [In reply to]

I usually go for the easy way out: buy more machines!

----- Original Message -----
From: "Andreas J. Koenig" <andreas.koenig.gmwojprw [at] franz>
To: "Matt Sergeant" <matt [at] sergeant>
Cc: "Stas Bekman" <stas [at] stason>; "JT Smith" <jt [at] plainblack>;
<modperl [at] perl>
Sent: Wednesday, December 21, 2005 6:25 AM
Subject: Re: go crazy with me


> >>>>> On Tue, 20 Dec 2005 09:40:43 -0500, Matt Sergeant <matt [at] sergeant> said:
>
> > I'm curious as to how you've mixed things up though - if the details
> > aren't private IP I'd love to know more.
>
> Me too:-)
>
> I'd also like to hear what people are doing when the apache model has
> scaling problems. We have one problematic project here: we're a
> gateway and must serve a high number of very slow customers to a high
> number of very slow feeds. Ideally we would run this in an event loop
> or in coroutines/continuations style, but we have not yet tried that
> out, mainly because so much of our infrastructure relies on everything
> being apache. Is there something in apache2 that would make our lives
> easier? (we have not yet switched to apache2 at all)
>
> --
> andreas


andreas.koenig.gmwojprw at franz

Dec 20, 2005, 11:09 PM

Post #20 of 32 (1322 views)
Permalink
Re: go crazy with me [In reply to]

>>>>> On 20 Dec 2005 16:21:42 -0800, merlyn [at] stonehenge (Randal L. Schwartz) said:

> Are you already using a reverse-proxy? Make sure the front lightweight
> servers *do* use cache and *don't* use keep-alive to the backend...
> your heavy backend will spit the entire response, and go free to service
> the next request... your thin front-end will then deliver that response
> slowly as needed.

I should not have mentioned that the customers are slow as well.
Currently our main concern is that our processes have to wait for
several data sources, then compute the answer and that our valuable
memory is wasted during the wait.

--
andreas


tagore at tagoresmith

Dec 21, 2005, 1:30 AM

Post #21 of 32 (1316 views)
Permalink
Re: go crazy with me [In reply to]

Andreas J. Koenig wrote:
> >>>>> On 20 Dec 2005 16:21:42 -0800, merlyn [at] stonehenge (Randal L. Schwartz) said:
>
>
> > Are you already using a reverse-proxy? Make sure the front lightweight
> > servers *do* use cache and *don't* use keep-alive to the backend...
> > your heavy backend will spit the entire response, and go free to service
> > the next request... your thin front-end will then deliver that response
> > slowly as needed.
>
> I should not have mentioned that the customers are slow as well.
> Currently our main concern is that our processes have to wait for
> several data sources, then compute the answer and that our valuable
> memory is wasted during the wait.

That is a very different issue, and one that may not be solved by
fooling around with the front end (and I wonder why you brought
continuations/asynchronous handling into it). You need to find your
bottleneck, fix it, and then see if the performance is acceptable at
that point. If your bottleneck is in generating results, rather than
serving them, you need to start there. In particular, you need to find
out whether you are IO bound or CPU bound before you even begin to think
about fixing things (that is easy to discover). It's generally wise
to build efficient systems, but it makes no sense to optimize things
that aren't your bottleneck when you are actually having problems
serving your traffic in a production system. At least, that is my judgement.

T


matt at sergeant

Dec 21, 2005, 7:07 AM

Post #22 of 32 (1324 views)
Permalink
Re: go crazy with me [In reply to]

On 20 Dec 2005, at 12:00, Ken Simpson wrote:

>> Not sure what you mean by it not scaling - can you elaborate? Sure it
>> uses more ram than multiplexing, but even for a high traffic mail
>> server like apache.org's the mail-in-apache2 model works well
>> (apache.org uses Apache::Qpsmtpd for email).
>>
>> I'm curious as to how you've mixed things up though - if the details
>> aren't private IP I'd love to know more.
>
> I'll cut in so that Stas can save his keyboarding wrist.
>
[snip]

Thanks for that - that's what we found too, but not many places needed
the scalability. Anyhow if anyone's interested in an open source
version of what you've done they can check out the current svn trunk of
Qpsmtpd - http://smtpd.develooper.com/ - it uses Danga::Socket instead
of Event::Lib (similar concepts though) to multiplex all connections.

Matt.


perrin at elem

Dec 21, 2005, 9:16 AM

Post #23 of 32 (1322 views)
Permalink
Re: go crazy with me [In reply to]

On Wed, 2005-12-21 at 08:09 +0100, Andreas J. Koenig wrote:
> I should not have mentioned that the customers are slow as well.
> Currently our main concern is that our processes have to wait for
> several data sources, then compute the answer and that our valuable
> memory is wasted during the wait.

That problem is typically solved by a queue and some sort of
"working..." page that reloads until done. The multiplexing approach
being discussed here could work too, but you'd have to write
multiplexing client code for all of these data sources as well as the
server code.
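
For anyone who has not built one, the queue plus "working..." page pattern could be sketched roughly as follows. This is Python, not mod_perl, and every name in it (`submit`, `status_page`, `slow_lookup`, the in-memory `results` dict) is made up for illustration. A real setup would more likely keep the queue and results in a database or on disk so that separate server processes can share them.

```python
import queue
import threading
import time
import uuid

jobs = queue.Queue()          # pending work
results = {}                  # job id -> finished answer
lock = threading.Lock()


def slow_lookup(query):
    # Hypothetical stand-in for waiting on several slow data sources.
    time.sleep(0.05)
    return query.upper()


def worker():
    # Background executor: the only place that waits on the slow sources.
    while True:
        job_id, query = jobs.get()
        answer = slow_lookup(query)
        with lock:
            results[job_id] = answer
        jobs.task_done()


def submit(query):
    # First request: enqueue the job and return a ticket immediately.
    job_id = uuid.uuid4().hex
    jobs.put((job_id, query))
    return job_id


def status_page(job_id):
    # Every reload of the "working..." page calls this and returns at once.
    with lock:
        if job_id in results:
            return "done: " + results[job_id]
    return "working..."


threading.Thread(target=worker, daemon=True).start()
```

A client would call `submit("feed data")` once and then request `status_page(ticket)` on each reload until the text changes from "working..." to "done: ...". No request handler sits in memory waiting on the data sources; only the worker does.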

- Perrin


perrin at elem

Dec 21, 2005, 9:27 AM

Post #24 of 32 (1308 views)
Permalink
Re: go crazy with me [In reply to]

On Tue, 2005-12-20 at 09:00 -0800, Ken Simpson wrote:
> Instead of having one multiprocessing-oriented Apache mod_perl process
> per SMTP connection, we use a single event-driven process to handle
> all SMTP connections. To avoid blocking in this single process, we
> dispatch any CPU-intensive tasks to a pool of Apache mod_perl
> processes via a simple TCP protocol. We also dispatch out any tasks
> which are difficult to re-program using asynchronous IO calls --
> including calls out to libraries written by third parties (such as
> spam scanning engines).

Okay, so you basically run two daemons -- mod_perl, and a separate
multiplexing one -- to handle this? Did you investigate what would be
involved in changing apache to support multiplexing as an MPM?

- Perrin


ksimpson at mailchannels

Dec 21, 2005, 10:44 AM

Post #25 of 32 (1309 views)
Permalink
Re: go crazy with me [In reply to]

> Okay, so you basically run two daemons -- mod_perl, and a separate
> multiplexing one -- to handle this? Did you investigate what would be
> involved in changing apache to support multiplexing as an MPM?

Yes -- we did look at that. The problem is that the MPM model assumes
that the handler runs until it's finished and then dies -- i.e. in a
single context of execution. To support an event driven model you
would need to provide a way within the Apache API to allow the handler
to post new event requests and return from handling without destroying
the connection.

i.e. the current Apache MPM model is:

1. Answer connection.
2. Call handler.
3. Handler runs.
4. Handler returns.
5. Disconnect.

The event driven model would need to look something like this:

1. Answer connection.
2. Call handler.
3. Handler registers event callbacks on socket and returns.
4. MPM potentially answers other connections.
   ... time passes ...
5. Event occurs on socket.
6. Handler callback called. Handler may register another callback.
   ... time passes ...
7. Event occurs on socket.
8. Handler callback called. Handler decides the connection is over,
   and so disconnects from socket.
9. MPM goes on with life.
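
The event-driven steps above can be sketched outside Apache with Python's standard `selectors` module. This is only an illustration of the control flow, not an Apache API: `make_handler`, `run_once` and `demo` are hypothetical names, and a local socket pair stands in for an accepted connection.

```python
import selectors
import socket


def make_handler(sel, log):
    def on_readable(conn):
        # Steps 6 and 8: called only when the socket has data, then returns.
        data = conn.recv(1024)
        if data and data.strip() != b"QUIT":
            log.append(data.strip().decode())
            return                    # stay registered, wait for next event
        sel.unregister(conn)          # handler decides the connection is over
        conn.close()
        log.append("closed")

    def handler(conn):
        # Steps 2 and 3: register a callback on the socket and return at once.
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ, on_readable)

    return handler


def run_once(sel, timeout=1.0):
    # The "MPM" side: wait for socket events and call the callbacks that are due.
    for key, _ in sel.select(timeout):
        key.data(key.fileobj)


def demo():
    sel = selectors.DefaultSelector()
    log = []
    handler = make_handler(sel, log)
    server_side, client_side = socket.socketpair()
    handler(server_side)              # returns immediately; connection stays open
    for line in (b"HELO example\n", b"QUIT\n"):
        client_side.sendall(line)
        run_once(sel)
    client_side.close()
    sel.close()
    return log
```

The point of the sketch is that `handler` returns while the connection is still open, which is the step the run-to-completion model in the first list does not allow. Between calls to `run_once` the same loop is free to service any other registered socket.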

TTUL
Ken

--
MailChannels: Assured Messaging (TM) | http://mailchannels.com

--
Suite 203, 910 Richards St.
Vancouver, BC, V6B 3C1, Canada
Direct: +1-604-729-1741
