
Mailing List Archive: SpamAssassin: devel

spamC/D on Win32

 

 



SpamAssassin at evanscorp

Jan 7, 2004, 8:21 AM

Post #1 of 6 (137 views)
Permalink
spamC/D on Win32

Hello all. I notice that there is a bit of work going on in the vicinity of
getting a Win32 port of spamc.

I have been working on precisely this, sort of. I have taken a slightly
different tack, however, and re-implemented it in C++ using Windows API
calls; I am calling it spamcpp. I did this mainly because, although spamc
can be made to work on the Windows platform, there are some
platform-specific Windows bits that can be used to make the Windows code a
lot simpler. I have an alpha version running already and it is working
well, but it requires further refinement, which I will get to when I finish
with spamd.

I think that, for the Windows community to pick up SpamAssassin, we need to
have a Windows distribution that will "just work" with minimal effort. It
needs to run as a Windows service, maybe also as a COM object. Ideally it
needs to run without cygwin. To that end, I am putting together a spamd
engine in C++ (spamdpp) that embeds ActivePerl (5.8) for execution of spamd.
I have read the work by Mike Bell at
http://www.openhandhome.com/howtosa260.html and I think the main issues that
he outlines can be addressed by moving some of the socket/thread handling
code out of Perl.

Right now my spamdpp can:
  - Start as a Windows service (or run interactively)
  - Load an instance of the Perl interpreter
  - Load a script into the interpreter
  - Listen for and accept spamc connection requests
  - Decode spamc requests and call a Perl routine, passing the message
    for analysis (this is similar behaviour to spamd)
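
As an illustration of the "decode spamc requests" step, here is a minimal C++ sketch. It assumes the usual spamc/spamd framing: a request line such as "CHECK SPAMC/1.2", then header lines such as "Content-length:" and "User:", then a blank line before the message body. SpamcRequest and parse_spamc_header are hypothetical names for this example, not taken from spamcpp or spamdpp.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <exception>
#include <sstream>
#include <string>

// Hypothetical holder for the parts of a request the host cares about.
struct SpamcRequest {
    std::string command;             // e.g. "CHECK" or "PROCESS"
    std::string version;             // e.g. "1.2"
    std::size_t content_length = 0;  // size of the message body in bytes
    std::string user;                // empty if no User: header was sent
};

// Remove a trailing carriage return left over from CRLF line endings.
void strip_cr(std::string& line) {
    if (!line.empty() && line.back() == '\r') line.pop_back();
}

// Parse the header block (everything up to the blank line).
// Returns false if the request line or a header line is malformed.
bool parse_spamc_header(const std::string& header, SpamcRequest& out) {
    std::istringstream in(header);
    std::string line;
    if (!std::getline(in, line)) return false;
    strip_cr(line);

    // Request line: "<COMMAND> SPAMC/<version>"
    const std::string tag = " SPAMC/";
    const std::string::size_type pos = line.find(tag);
    if (pos == std::string::npos || pos == 0) return false;

    SpamcRequest req;
    req.command = line.substr(0, pos);
    req.version = line.substr(pos + tag.size());

    // Header lines: "Name: value", terminated by an empty line.
    while (std::getline(in, line)) {
        strip_cr(line);
        if (line.empty()) break;
        const std::string::size_type colon = line.find(':');
        if (colon == std::string::npos) return false;
        std::string name = line.substr(0, colon);
        std::string value = line.substr(colon + 1);
        value.erase(0, value.find_first_not_of(" \t"));
        for (char& c : name)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        if (name == "content-length") {
            try { req.content_length = std::stoul(value); }
            catch (const std::exception&) { return false; }
        } else if (name == "user") {
            req.user = value;
        }
    }
    out = req;
    return true;
}
```

The host would read content_length bytes after the blank line and hand them to the Perl routine; unknown headers are simply ignored in this sketch.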

This is all pretty cool (IMO) and there's some more work to be done but I'm
comfortable with most of that. I do have two problems that I'm hoping you
guys can help me with.

Problem 1 is that I don't know Perl. The script that spamdpp needs will be
a cut-down version of spamd. Basically it needs to set up the environment
in a similar way to what spamd does, but without any socket play. It also
needs a sub similar to spamd's check(), but again without any socket play.
I think it would be possible to package spamd such that the routines used
by spamdpp could be shared between spamdpp and spamd if spamd is
restructured slightly.

Problem 2 could be related to problem 1. The problem is that my spamdpp
engine is multi-threaded, but when I have multiple requests being processed
at the same time in the same Perl interpreter, the process crashes at
random locations within the Perl engine. This could be because my script
isn't doing everything it needs to do, or it could be that there's a problem
with the way I've embedded Perl. E.g. perhaps the Perl interpreter itself
isn't thread-safe, although I thought it would have to be for Perl's fork()
to work?
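
One possible explanation is that a single interpreter instance is being entered by several threads at once; as far as the perlfork documentation describes it, the Windows fork() emulation gives each pseudo-process its own cloned interpreter rather than sharing one. A minimal sketch of the simplest workaround, serializing calls with a mutex, follows. FakeInterpreter is a stand-in written for this example and is not the Perl embedding API; checkspam here only mimics the shape of the call.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Stand-in for the embedded interpreter (NOT the real Perl API).
// It keeps unsynchronized internal state, so concurrent calls would race.
class FakeInterpreter {
public:
    // Hypothetical equivalent of invoking the Perl-side checkspam() sub.
    int checkspam(const std::string& message) {
        ++calls_;                                       // unprotected state
        return static_cast<int>(message.size() % 10);   // dummy "score"
    }
    int calls() const { return calls_; }
private:
    int calls_ = 0;
};

// Wrapper that lets only one thread into the interpreter at a time.
class SerializedInterpreter {
public:
    int checkspam(const std::string& message) {
        std::lock_guard<std::mutex> lock(mutex_);
        return interp_.checkspam(message);
    }
    int calls() {
        std::lock_guard<std::mutex> lock(mutex_);
        return interp_.calls();
    }
private:
    std::mutex mutex_;
    FakeInterpreter interp_;
};

// Simulate `threads` connection threads, each submitting `per_thread` messages.
// Returns the total number of calls the interpreter saw.
int run_workers(SerializedInterpreter& host, int threads, int per_thread) {
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&host, per_thread] {
            for (int i = 0; i < per_thread; ++i)
                host.checkspam("test message");
        });
    }
    for (auto& th : pool) th.join();
    return host.calls();
}
```

The obvious cost is that message scanning itself no longer runs in parallel; only the socket handling does. It may still be a useful experiment, since it would show whether concurrent entry is what causes the crashes.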


What I am looking for is:
1. Feedback from those more knowledgeable than I about Perl and SA
about how crazy I am and/or how feasible or useful this work may be.
2. Someone who knows Perl well enough to write me a spamdpp script.
3. Someone who has embedded Perl into multi-threaded C/C++ programs
before.

I have already had some correspondence with Michael Bell and Fred, and Fred
is keen to help test, but more testers are always better. If my problems can
be solved we'll work out how we can test it more widely.

Sorry for the long-winded e-mail. Any comments would be appreciated.

Regards,

Phil.

PS: I believe my spamcpp and spamdpp can be cross-platform without too much
hassle. If I can get them running on Windows I will port to Linux (at
least).


sidney at sidney

Jan 7, 2004, 12:51 PM

Post #2 of 6 (135 views)
Permalink
Re: spamC/D on Win32 [In reply to]

Phillip Evans wrote:

> I have been working on precisely this, sorta. I have taken a slightly
> different tack, however, and re-implemented in C++ using Windows API calls
> and am calling it spamcpp.

Not to minimize your work on this, but the Windows port of spamc is
working fine and is just about all checked in to the source tree. So
give this a lower priority than getting spamd working. There may be some
awkward bits in the code and a bit too many #ifdefs, but it does build
and run.

Spamd would be useful. Here are the issues: The four things used by
spamd that are not supported in the Windows Perl are forking of
processes, signals, Unix domain sockets, and syslog. The Unix domain
sockets are no big deal -- it would be fine to have the Windows version
of spamd use only TCP/IP sockets. Syslog is easy to work around, ideally
by writing a log routine that can use syslog on Unix and the event log
under Windows. ActivePerl does have an interface for using the event log.
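
For a C/C++ service wrapper, the same idea (one log routine, two back ends) might look like the sketch below. It is only an illustration: the real spamd logging is Perl, the names LogLevel, native_level and log_message are made up for this example, and "spamd" as the event source name is an assumption. The Windows branch uses RegisterEventSourceA/ReportEventA; without a registered message file, Event Viewer shows the text with a generic description.

```cpp
#include <cassert>
#include <string>

#ifdef _WIN32
#include <windows.h>
#else
#include <syslog.h>
#endif

// Portable severity levels used by the rest of the program.
enum class LogLevel { Info, Warning, Error };

#ifdef _WIN32
// Map the portable level to a Windows event type.
WORD native_level(LogLevel level) {
    switch (level) {
        case LogLevel::Error:   return EVENTLOG_ERROR_TYPE;
        case LogLevel::Warning: return EVENTLOG_WARNING_TYPE;
        default:                return EVENTLOG_INFORMATION_TYPE;
    }
}

// Write one entry to the Application event log.
void log_message(LogLevel level, const std::string& text) {
    HANDLE source = RegisterEventSourceA(NULL, "spamd");  // assumed name
    if (source == NULL) return;
    LPCSTR strings[1] = { text.c_str() };
    ReportEventA(source, native_level(level), 0, 0, NULL, 1, 0, strings, NULL);
    DeregisterEventSource(source);
}
#else
// Map the portable level to a syslog priority.
int native_level(LogLevel level) {
    switch (level) {
        case LogLevel::Error:   return LOG_ERR;
        case LogLevel::Warning: return LOG_WARNING;
        default:                return LOG_INFO;
    }
}

// Write one entry to syslog under the mail facility.
void log_message(LogLevel level, const std::string& text) {
    syslog(LOG_MAIL | native_level(level), "%s", text.c_str());
}
#endif
```

Callers only ever see log_message(LogLevel::Warning, "...") and never touch either platform API directly.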

The big problem is the way spamd uses child processes and signals. The
whole idea of spamd is to run one instance of the perl
interpreter/runtime that instantiates SpamAssassin and to spin off a
separate process or thread to process individual messages.

The documentation at
http://aspn.activestate.com/ASPN/docs/ActivePerl/lib/Pod/perlfork.html
gets into some of the deficiencies in the Windows implementation of perl
fork(). It may be that it can all be made to work with the latest 5.8.x
versions of ActivePerl.

But writing a C wrapper for spamd doesn't solve the big problem, which
is getting a single instance of the Perl interpreter running
SpamAssassin and forking off smaller children to process individual
messages. And if you can solve that part, there really isn't any need
for the C wrapper, since the part of spamd that starts up an instance
and listens should be easy to make portable.

If spamd can be made to work under Windows, a wrapper would be needed to
make it into a service and have it use the event log instead of syslog.
That would be really handy.

I don't mean to be discouraging -- I think this is a good idea, but we
need someone with some expertise on getting the fork stuff to work in
ActivePerl. I have read up some on the problems, but I'm not a perl
expert (yet).

-- sidney


SpamAssassin at evanscorp

Jan 7, 2004, 3:45 PM

Post #3 of 6 (135 views)
Permalink
RE: spamC/D on Win32 [In reply to]

Yep, the threading issue was the one I picked as being the show-stopper.
Unix domain sockets are no big deal (just to be obtuse we could implement
Windows named pipes instead), syslog is no big deal, and someone who writes
Perl can fix the logging routines.

I know what spamd is about and I thought it would be easier to write a
wrapper that hosts a (single) Perl interpreter which is invoked from
multiple threads controlled by the host. I think this must be done in the
Perl ISAPI plugin somehow.

The design for spamdpp *does* have only one Perl interpreter loaded, but
it's when it is called from multiple threads that I get problems. It could
be something to do with the Perl stack not using TLS or similar - I just
don't know. Perhaps I'll have to go hunting through the ActivePerl
source.... In any case, the idea was to maintain the architecture/design of
spamd and just move the stuff that ActivePerl doesn't do well (i.e. fork)
into an environment that does do that well (i.e. C++).

Phil.







sidney at sidney

Jan 7, 2004, 4:52 PM

Post #4 of 6 (135 views)
Permalink
Re: spamC/D on Win32 [In reply to]

Phillip Evans wrote:
> Perhaps I'll have to go hunting through the ActivePerl source.... In
> any case, the idea was to maintain the architecture/design of spamd just
> move the stuff that ActivePerl doesn't do well (ie: fork) into an
> environment that does do that well (ie: C++).

What happens when spamd clones itself into a fork without having to load
a whole new interpreter and passes the current object environment to the
child? How do you simulate that with something that wraps around the
entire Perl process? These are real questions, not criticism; I'm
wondering how you do it, or whether those are the right questions.

One thing which might be worth trying: The documentation I linked to in
an earlier message in this thread says that ActivePerl 5.8.0 and newer
is much better at implementing fork() than earlier versions. It might be
worth dealing with the syslog and signal stuff and then see if spamd
simply works in ActivePerl 5.8.x.

-- sidney


spamassassin at evanscorp

Jan 7, 2004, 5:37 PM

Post #5 of 6 (135 views)
Permalink
RE: spamC/D on Win32 [In reply to]

In spamdpp the Perl code never forks and there's only one interpreter.
There is a "global" address space which is configured when the thing starts
up. At the moment my Perl script does most of the processing spamd does
prior to listening (remember the Perl doesn't do any of the thread or socket
work). The interpreter is then idle - it doesn't have any threads of its
own.

Meanwhile, the C++ host sets up and listens for connection attempts. When
it receives a connection it's the C++ code that starts a new thread, decodes
the request, and passes the message to a Perl routine (checkspam). The Perl
runs, classifies the message, and returns the results to the C++ code that
then packages it up and ships it back to the caller.

Let's see if I can draw an ASCII sequence diagram....

C++                        Perl                    Mail::SpamAssassin

Start-|
  <---|

registerService-|
  <-------------|

Instantiate Perl --------->new
  ------------------------>loadScript()
  ------------------------>runScript() -|
                             <----------|
                           Instantiate SA -------->new

startListening-|
  <------------|

.... Time passes .....

acceptNewConnection*-|
  <------------------|
  beginthread
    parseRequest
    ---------------------->checkspam(mail)
                             --------------------->check(mail)
    sendResponse
  endthread


The "acceptNewConnection" bit loops forever.


As you can see (I hope) there is no forking going on in the Perl side of the
fence. In this example the global environment is shared between all
invocations. Naturally, this would require that Mail::SpamAssassin is
thread safe. If it's not (this is where I need Perl help) possibly a global
one could be cloned(?) at the beginning of the check(mail) Perl bit?
Initialising a new Mail::SpamAssassin on each invocation is a performance
killer (and sorta defeats the purpose to some extent).
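
The "clone a global one" idea can be sketched in C++ terms as follows: build one configured prototype at startup, give each worker thread its own private copy, and reuse that copy for many messages so the copy cost is paid per thread rather than per message. Scanner is a made-up stand-in for a configured Mail::SpamAssassin object; whether the real Perl object can be copied this cheaply is exactly the open question, so treat this purely as an illustration of the pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a configured Mail::SpamAssassin instance.
struct Scanner {
    std::vector<std::string> rules;  // loaded once; costly in real life
    int scanned = 0;                 // mutable state private to each copy

    // Dummy check: counts how many rule strings occur in the message.
    int check(const std::string& mail) {
        ++scanned;
        int hits = 0;
        for (const auto& rule : rules)
            if (mail.find(rule) != std::string::npos) ++hits;
        return hits;
    }
};

// Each worker copies the prototype once, then scans its share of the mails
// with that private copy. Results land in distinct slots, so no locking
// is needed for the output vector.
std::vector<int> scan_in_workers(const Scanner& prototype,
                                 const std::vector<std::string>& mails,
                                 int workers) {
    std::vector<int> hits(mails.size(), 0);
    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            Scanner local = prototype;  // the "clone" step, once per thread
            for (std::size_t i = static_cast<std::size_t>(w);
                 i < mails.size();
                 i += static_cast<std::size_t>(workers)) {
                hits[i] = local.check(mails[i]);
            }
        });
    }
    for (auto& t : pool) t.join();
    return hits;
}
```

Because the prototype is only ever read after startup, it needs no locking, and nothing a scan modifies is visible to another thread.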

I have written a load test Perl script that doesn't do very much (loops from
1 to 100000 writing to stdout) but I can quite happily invoke it on multiple
threads using this engine without difficulty. NB: It doesn't share memory
between invocations.


I haven't tried running spamd under ActivePerl 5.8 because Michael Bell has
done the experiment and it doesn't work. To quote Michael:
"Eh I dunno if it crashes, most of the time it just doesn't go
anywhere <g>"
That experiment also requires some level of Perl knowledge but my (Perl)
skill level doesn't extend beyond cutting and pasting interesting bits from
scripts that already do something pretty close to what I want (and even then
I'm not that competent <g>). Someone with more Perl expertise might be
interested in giving it a go, though, and I'd be most interested in the
results.

Phil.




sidney at sidney

Jan 7, 2004, 6:22 PM

Post #6 of 6 (135 views)
Permalink
Re: spamC/D on Win32 [In reply to]

Maybe someone who has worked on spamd can chime in about what it assumes
is global and what it instantiates anew for each child process. I can
see how it would be easy to get that wrong and end up with something
that is not threadsafe calling the spamassassin process from outside.

Regarding running spamd under ActivePerl, I can see right off that it is
doing things that the doc
(http://www.xav.com/perl/faq/Windows/ActivePerl-Winfaq5.html) says
don't work under Windows: signal handlers in Windows are not allowed
to die or exit. Maybe the fix is as simple as catching the signals in
Thread::Signal, setting a flag, and checking it in some appropriate
places in a main loop that exits when it sees the flag? The doc actually
says "signals are unsupported under Windows", but that doesn't explain
Thread::Signal, which is in the Windows version.
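
The "set a flag in the handler, check it in the main loop" pattern, sketched in C++ rather than Perl. The handler does nothing except record that a stop was requested; the loop decides when it is safe to exit. The function and variable names are invented for this example, and std::raise is used only to simulate a signal arriving from outside.

```cpp
#include <cassert>
#include <csignal>

// Set by the handler, polled by the main loop.
volatile std::sig_atomic_t g_stop_requested = 0;

// The handler never exits or throws; it only records the request.
void on_stop_signal(int) {
    g_stop_requested = 1;
}

// Runs up to max_iterations units of "work" and returns how many completed.
// If raise_at is positive, a SIGTERM is simulated after that many units.
int serve_until_signalled(int max_iterations, int raise_at) {
    g_stop_requested = 0;
    std::signal(SIGTERM, on_stop_signal);

    int done = 0;
    while (!g_stop_requested && done < max_iterations) {
        ++done;  // stand-in for accepting and handling one connection
        if (done == raise_at)
            std::raise(SIGTERM);  // simulate an external stop request
    }

    std::signal(SIGTERM, SIG_DFL);  // restore default handling
    return done;
}
```

For example, serve_until_signalled(100, 3) stops after the third unit of work, while serve_until_signalled(5, 0) runs all five because no signal is ever raised.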

-- sidney
