graham.dumpleton at gmail
Nov 5, 2009, 4:30 PM
Post #9 of 69
2009/11/5 Graham Leggett <minfrin [at] sharp>:
> Jim Jagielski wrote:
>> Let's get 2.4 out. And then let's rip it to shreds and drop
>> buckets/brigades and fold in serf.
> I think we should decide on exactly what problem we're trying to solve,
> before we start thinking about how it is to be solved.
> I'm keen to teach httpd v3.0 to work asynchronously throughout - still
> maintaining the prefork behaviour as a sensible default, but being
> asynchronous and non blocking throughout.
> The fact that dodgy module code can leak, crash and be otherwise
> unsociable, and yet the server remains functional, is one of the key
> reasons why httpd still endures.
Sorry, long post, but it was inevitable that I was going to air all
this at some point. Now seems as good a time as any.
I'd like to see a more radical architecture change, one that
recognises that it isn't just about serving static files any more and
provides much better builtin support for safe hosting of content
generating web applications constructed using alternate languages.
Before anyone jumps to the conclusion that I want to start seeing even
more heavy weight applications being run direct in the Apache server
child processes that accept initial requests, know that I don't want
that and that I actually want to promote a model which is the opposite
and which would encourage people not to do that.
As a first step, like Jim I would like to see the current Apache server
child processes (workers) being asynchronous. In addition to that
though, I would like to see as part of core Apache, and running in
the parent process, a means for spawning and monitoring distinct
processes outside of the set of worker processes.
There is currently support in APR and in part in Apache for 'other'
processes via 'apr_proc_other_child_???()' functions, but this is
quite basic and you still, to a large degree, need to roll your own
management routines around that for (re)spawning etc. As a result, you
see modules such as mod_cgid, mod_fastcgi, mod_fcgid and mod_wsgi all
having their own process management code for managing their
daemon processes and/or manager process.
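To make the duplicated bookkeeping concrete, here is a minimal sketch of the (re)spawn and shutdown logic that each of those modules ends up writing for itself. It is in Python rather than C for brevity. `ManagedProcess`, `check()` and `grace_seconds` are hypothetical names chosen for illustration, not part of APR or httpd, and the real 'other child' API works differently.

```python
import subprocess


class ManagedProcess:
    """One 'other' process that a parent keeps alive (hypothetical helper)."""

    def __init__(self, argv, grace_seconds=3.0):
        self.argv = argv                    # command line of the daemon process
        self.grace_seconds = grace_seconds  # how long to wait before killing
        self.proc = None
        self.spawn_count = 0

    def spawn(self):
        """Start (or restart) the process and count how often that happened."""
        self.proc = subprocess.Popen(self.argv)
        self.spawn_count += 1

    def check(self):
        """Respawn the process if it is not running; return True if respawned."""
        if self.proc is None or self.proc.poll() is not None:
            self.spawn()
            return True
        return False

    def shutdown(self):
        """Ask the process to exit, then kill it once the grace period expires."""
        if self.proc is None or self.proc.poll() is not None:
            return
        self.proc.terminate()
        try:
            self.proc.wait(timeout=self.grace_seconds)
        except subprocess.TimeoutExpired:
            self.proc.kill()
            self.proc.wait()
```

A parent would call `check()` from its maintenance loop and `shutdown()` on restart. Extending graceful restart to such processes would amount to making the grace period and the shutdown signal part of a shared contract rather than a fixed few seconds.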
Technically one could implement this as a distinct module called
mod_procd which had an API which could be utilised by other modules
to stop duplication of all this stuff, but perhaps it needs to go a step
further than that as far as being integrated into core. This is
because at present any 'other' processes are dealt with rather harshly
on graceful restarts: they are simply killed off after a
few seconds if they don't shut down. Being able to extend graceful
restart semantics into other processes may be worthwhile for some
The next thing I want to see is for the whole FASTCGI type ecosystem to be
revisited, and for a better version of this concept for hosting web
applications in disparate languages to be developed which modernises it
and brings it in as a core feature of Apache. The intent here is to
simplify the task for implementers as well as those who wish to deploy
An important part of this would be to switch away from the interface
being a socket protocol. Instead, let the web server control both
halves of the communication channel between the Apache worker process and
the application daemon process. What would replace the socket protocol
as the interface would be a C API and, instead of the application having to
implement the socket protocol as a foreign process, specific language
support would be provided by way of a dynamically loaded plugin. That
plugin would then use embedding to access support for a particular
language and just execute code in the file that the enclosing code of
the web server system told it to execute.
By way of example, imagine languages such as Python, Perl or Ruby
which in turn now have simplified web server interfaces in the form of
WSGI, PSGI and Rack, or even PHP. In the Apache configuration one
would simply say that a specific file extension is implemented by a
specific named language plugin. One would also indicate that a
separate manager process should be started up for managing processes
for handling any requests for that language.
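As a rough illustration of that mapping, the sketch below ties a file extension to a named language plugin and to the manager process group that would serve it. It is Python standing in for what would really be Apache configuration plus a C plugin API. `LanguagePlugin`, `PluginRegistry` and the `.wsgi`/`.psgi` extensions are assumptions made for the example only.

```python
import os
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class LanguagePlugin:
    """Hypothetical description of one language plugin."""
    name: str           # e.g. "python"
    manager_group: str  # manager process responsible for this language


class PluginRegistry:
    """Maps file extensions to language plugins (illustrative only)."""

    def __init__(self) -> None:
        self._by_extension: Dict[str, LanguagePlugin] = {}

    def register(self, extension: str, plugin: LanguagePlugin) -> None:
        """Declare that files with this extension belong to the plugin."""
        self._by_extension[extension.lower()] = plugin

    def lookup(self, path: str) -> Optional[LanguagePlugin]:
        """Return the plugin for a target file, or None for static content."""
        _, extension = os.path.splitext(path)
        return self._by_extension.get(extension.lower())


registry = PluginRegistry()
registry.register(".wsgi", LanguagePlugin("python", "python-manager"))
registry.register(".psgi", LanguagePlugin("perl", "perl-manager"))
```

A front end worker would call `lookup()` on the resolved file. A result of `None` means the file is served directly, and anything else names the manager process group the request should be handed to.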
Only after that separate manager process had been spawned, be it by
just a straight fork or preferably fork/exec, would the specific language
plugin be loaded. This eliminates the problems caused by complex
language modules being preloaded into the Apache parent process and
causing conflicts with other languages. The existing mod_php module is
a good example of this, causing lots of problems because it drags in
libraries which aren't multithread safe.
That manager process would then spawn its own language specific worker
processes as configured for handling actual requests. When one of the main
asynchronous Apache worker processes receives a request and determines
that the target resource file is related to a specific language, it
then determines how to connect to those language specific worker
processes and proxies the request to them for handling.
On the language worker process side, the web server part of the code in
that process receives the proxied request and then calls into the
plugin code to have the request handled against the target file.
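The request flow just described can be sketched as follows. This is only an illustration under several assumptions: a thread stands in for the separate language worker process, the channel is a `socket.socketpair()`, and the length-prefixed JSON framing is invented for the example, since the post deliberately leaves the private protocol open. All function names are hypothetical.

```python
import json
import socket
import struct
import threading


def send_frame(sock, message):
    """Send one message as a 4-byte big-endian length followed by JSON."""
    payload = json.dumps(message).encode("utf-8")
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock, count):
    """Read exactly count bytes or raise if the peer goes away."""
    chunks = []
    while count > 0:
        chunk = sock.recv(count)
        if not chunk:
            raise ConnectionError("peer closed the channel")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


def recv_frame(sock):
    """Receive one length-prefixed JSON message."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return json.loads(recv_exact(sock, length).decode("utf-8"))


def language_worker(sock, plugin):
    """Worker side: read the proxied request, call into the plugin, reply."""
    request = recv_frame(sock)
    body = plugin(request["script"], request["path_info"])
    send_frame(sock, {"status": 200, "body": body})
    sock.close()


def proxy_request(script, path_info, plugin):
    """Front end side: hand the request to a worker and wait for the reply."""
    front, back = socket.socketpair()
    worker = threading.Thread(target=language_worker, args=(back, plugin))
    worker.start()
    try:
        send_frame(front, {"script": script, "path_info": path_info})
        return recv_frame(front)
    finally:
        front.close()
        worker.join()
```

Because the web server owns both `proxy_request()` and `language_worker()`, the framing between them can change freely. The plugin only ever sees the call made inside `language_worker()`.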
Because most language solutions for web applications aren't
asynchronous, these language specific worker processes would still use
traditional threading techniques, or could even be single threaded
where the language or extension modules for that language aren't thread
safe, as is the case for PHP.
In the bigger scheme of things what we would have is a set of front
end Apache worker processes which are asynchronous and which handle
static file requests, but where a request relates to a resource which
needs to be implemented by a specific complex language, it would be
proxied internally to other processes managed within the
sphere of the web server. The language specific worker processes could
be single threaded, multithreaded, or could also still provide their
own asynchronous API.
The important thing is we aren't loading support for these languages
into the Apache parent process or the main Apache worker processes.
The separation isn't as great as for FASTCGI and is still quite
tightly integrated with the web server code effectively controlling
both ends of the communication channel used when proxying as well as
all the process management.
The actual socket protocol used where the proxying occurs is at this
point not important, as it is a private protocol within the web server
and is not intended to be exposed publicly. In other words, the
protocol is only used to communicate with the web server's own processes
on the same server. It is not intended to be used to communicate with
processes on other servers, as can be done with FASTCGI. For the latter,
traditional HTTP proxying techniques can be used.
Because the protocol is private, it can be changed and updated as
needed based on the requirements of the web server. You aren't
beholden to some external community and don't get stuck with a protocol
that never gets updated and which over time just becomes a poor solution
for the problem needing to be solved, as in some respects has
occurred with FASTCGI, where there have never been updates to
make it more modern and usable.
The protocol might be packet based like FASTCGI or AJP, but also
might be more like HTTP or something simplified like WAKA. Use of
a packet based protocol wouldn't be strictly necessary, as we probably want
to avoid trying to multiplex multiple requests over a single proxied
channel. The important thing though is to be able to handle end to end
100-continue as necessary, something which FASTCGI can't currently do.
Use of distinct manager processes for each language also has other
benefits. First is that you could have multiple instances which are
configured differently. In other words, multiple process groups to
which requests for that language could be delegated. These could be
configured differently in respect of the number of processes they create
for handling requests or the number of threads in those processes, whether
worker processes are precreated or created only on demand, how often
they are recycled, whether they are killed off when idle, or what language
modules are preloaded into the manager process before the language
specific worker processes are forked.
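One way to picture such per-group settings is as a small record per process group. The sketch below is purely illustrative: the field names (`processes`, `threads`, `precreate`, `max_requests`, `idle_timeout`, `preload_modules`) are hypothetical and only mirror the knobs listed above. They are not an existing Apache configuration syntax.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ProcessGroupConfig:
    """Hypothetical settings for one language process group."""
    name: str
    language: str
    processes: int = 1                    # worker processes in the group
    threads: int = 1                      # threads per worker process
    precreate: bool = True                # False means spawn only on demand
    max_requests: Optional[int] = None    # recycle after this many requests
    idle_timeout: Optional[float] = None  # seconds before idle workers exit
    preload_modules: Tuple[str, ...] = () # loaded in the manager before fork

    def __post_init__(self):
        # Reject settings that could never serve a request.
        if self.processes < 1 or self.threads < 1:
            raise ValueError("processes and threads must both be at least 1")

    def capacity(self) -> int:
        """Maximum number of requests the group can handle concurrently."""
        return self.processes * self.threads


# Two differently tuned groups for the same language.
bulk = ProcessGroupConfig("bulk", "python", processes=4, threads=15)
legacy = ProcessGroupConfig("legacy", "python", processes=8, threads=1,
                            precreate=False, max_requests=1000)
```

Here `bulk` suits a thread safe application, while `legacy` runs single threaded workers that are created on demand and recycled regularly.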
Secondly, process groups could also drop privileges down to different
users rather than the Apache user, making it much simpler to run
different applications or different users' code as different users,
thereby avoiding the whole mess that is suexec.
A third benefit is that the Apache configuration files themselves
would really only have details of how to map URLs to those managed
processes for each language. The configuration for each language or
process group could be distinct. This would allow configuration for
process groups to be changed and a process group restarted without
having to restart the whole of Apache.
As to the language specific worker processes, because the web server
code would control the main loop in those as well, you open up
a better ability to control those processes. This includes better
management of process recycling when a set number of requests is reached,
when processes are idle or when memory usage bloats out. It also opens
up the option of instrumenting code with hooks for collecting statistics
about how efficiently processes are handling requests, whether they
are getting overloaded, and whether the number of processes/threads in
a process group needs to be tuned further to cope with load or even
cut back if under utilised.
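The recycling triggers mentioned here (request count, idleness, memory growth) can be expressed as a simple check made by the server-controlled main loop after each request or on a timer. This is a sketch under assumptions: `WorkerStats`, `RecyclePolicy` and the thresholds are hypothetical, and how the statistics are actually gathered is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkerStats:
    """Figures a worker's main loop could track about itself (hypothetical)."""
    requests_handled: int
    idle_seconds: float
    rss_bytes: int


@dataclass
class RecyclePolicy:
    """Limits after which a worker process should be replaced (hypothetical)."""
    max_requests: Optional[int] = None
    max_idle_seconds: Optional[float] = None
    max_rss_bytes: Optional[int] = None

    def reason_to_recycle(self, stats: WorkerStats) -> Optional[str]:
        """Return why the worker should be recycled, or None to keep it."""
        if (self.max_requests is not None
                and stats.requests_handled >= self.max_requests):
            return "request limit reached"
        if (self.max_idle_seconds is not None
                and stats.idle_seconds >= self.max_idle_seconds):
            return "idle timeout"
        if (self.max_rss_bytes is not None
                and stats.rss_bytes >= self.max_rss_bytes):
            return "memory limit exceeded"
        return None
```

The same `WorkerStats` values could feed the statistics hooks, so that tuning decisions and recycling decisions are based on one set of measurements.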
Finally we may even get a chance to improve how error logging is done
for such hosted language applications. The current FASTCGI method of
proxying error messages back via the request channel has its problems,
including the fact that technically there is no error channel during
process startup or between requests. If looking at error logging, maybe
we can come up with a better system for handling error logging across a
large number of virtual hosts. The current limitation of needing to
use VirtualHost or other complex systems for separate live error logs
for lots of virtual hosts can be a pain, and VirtualHost doesn't
scale to a large number of virtual hosts and isn't particularly dynamic
because of the large cost of restarting Apache to add new hosts. So, solve
dynamic configuration of virtual hosts where each can have separate
application error logging and you are definitely on to a winner.
Anyway, I hope some can see an inkling of what I am suggesting. I have
left out many details based on my opinions and previous thinking about
this, and it certainly would be much easier to describe this on a white
board where I can draw pictures.
I guess overall I just want to see if we can come up with a more
modern web server that is better for complex dynamic web application
hosting as well as static file serving. I don't want to see us just go
asynchronous, becoming just another static file serving web server
like nginx, lighttpd or cherokee, ignoring the problem of dynamic web
applications and punting that on to a less than capable FASTCGI eco