
felicity at apache
Jan 20, 2004, 3:33 PM
svn commit: rev 6247 - in incubator/spamassassin/trunk: . lib/Mail lib/Mail/SpamAssassin lib/Mail/SpamAssassin/MIME masses spamd
Author: felicity
Date: Tue Jan 20 14:33:37 2004
New Revision: 6247

Removed:
   incubator/spamassassin/trunk/lib/Mail/SpamAssassin/AuditMessage.pm
   incubator/spamassassin/trunk/lib/Mail/SpamAssassin/EncappedMIME.pm
   incubator/spamassassin/trunk/lib/Mail/SpamAssassin/EncappedMessage.pm
   incubator/spamassassin/trunk/lib/Mail/SpamAssassin/Message.pm
   incubator/spamassassin/trunk/lib/Mail/SpamAssassin/PhraseFreqs.pm
Modified:
   incubator/spamassassin/trunk/INSTALL
   incubator/spamassassin/trunk/MANIFEST
   incubator/spamassassin/trunk/README
   incubator/spamassassin/trunk/USAGE
   incubator/spamassassin/trunk/lib/Mail/SpamAssassin.pm
   incubator/spamassassin/trunk/lib/Mail/SpamAssassin/EvalTests.pm
   incubator/spamassassin/trunk/lib/Mail/SpamAssassin/MIME.pm
   incubator/spamassassin/trunk/lib/Mail/SpamAssassin/MIME/Parser.pm
   incubator/spamassassin/trunk/lib/Mail/SpamAssassin/NoMailAudit.pm
   incubator/spamassassin/trunk/lib/Mail/SpamAssassin/PerMsgStatus.pm
   incubator/spamassassin/trunk/lib/Mail/SpamAssassin/Received.pm
   incubator/spamassassin/trunk/masses/mass-check
   incubator/spamassassin/trunk/spamassassin.raw
   incubator/spamassassin/trunk/spamd/spamd.raw
Log:
bug 2939: initial work to remove Mail::Audit code

Modified: incubator/spamassassin/trunk/INSTALL
==============================================================================
--- incubator/spamassassin/trunk/INSTALL (original)
+++ incubator/spamassassin/trunk/INSTALL Tue Jan 20 14:33:37 2004
@@ -46,8 +46,8 @@
 spamassassin to filter your mail and then something else wrote it into a
 folder for you, then you should be fine.

-Support for versions of the optional Mail::Audit module before 1.9 is no
-longer included.
+Support for versions of the optional Mail::Audit module is no longer
+included.

 The default mode of tagging (which used to be ***SPAM*** in the subject
 line) no longer takes place. Instead the message is rewritten.
@@ -387,28 +387,6 @@
   Note that MIMEDefang users may need to set the 'pyzor_path'
   configuration setting, since MIMEDefang does not set a PATH by
   default.
-
-
-  - Mail::Audit, Mail::Internet, Net::SMTP (from CPAN)
-
-  If you want to use SpamAssassin with Mail::Audit, you will (obviously)
-  require the Mail::Audit module, and any modules it requires (there's
-  lots of them, unfortunately).
-
-  Additionally, Mail::Internet is required if you wish to use the
-  "-r/-w" options of the spamassassin program (reporting and replying,
-  for spam-trap mail accounts).
-
-  If you use procmail, KMail, 'spamassassin', or you plan to use
-  'spamd', you will *not* need these.
-
-  Here's how to install them using CPAN.pm:
-
-        perl -MCPAN -e shell
-        o conf prerequisites_policy ask
-        install Mail::Audit
-        quit
-


   - Net::Ident (from CPAN)

Modified: incubator/spamassassin/trunk/MANIFEST
==============================================================================
--- incubator/spamassassin/trunk/MANIFEST (original)
+++ incubator/spamassassin/trunk/MANIFEST Tue Jan 20 14:33:37 2004
@@ -26,7 +26,6 @@
 configure
 lib/Mail/SpamAssassin.pm
 lib/Mail/SpamAssassin/ArchiveIterator.pm
-lib/Mail/SpamAssassin/AuditMessage.pm
 lib/Mail/SpamAssassin/AutoWhitelist.pm
 lib/Mail/SpamAssassin/Bayes.pm
 lib/Mail/SpamAssassin/BayesStore.pm
@@ -35,8 +34,6 @@
 lib/Mail/SpamAssassin/ConfSourceSQL.pm
 lib/Mail/SpamAssassin/DBBasedAddrList.pm
 lib/Mail/SpamAssassin/Dns.pm
-lib/Mail/SpamAssassin/EncappedMIME.pm
-lib/Mail/SpamAssassin/EncappedMessage.pm
 lib/Mail/SpamAssassin/EvalTests.pm
 lib/Mail/SpamAssassin/HTML.pm
 lib/Mail/SpamAssassin/Locales.pm
@@ -44,13 +41,11 @@
 lib/Mail/SpamAssassin/MIME.pm
 lib/Mail/SpamAssassin/MIME/Parser.pm
 lib/Mail/SpamAssassin/MailingList.pm
-lib/Mail/SpamAssassin/Message.pm
 lib/Mail/SpamAssassin/NetSet.pm
 lib/Mail/SpamAssassin/NoMailAudit.pm
 lib/Mail/SpamAssassin/PerMsgLearner.pm
 lib/Mail/SpamAssassin/PerMsgStatus.pm
 lib/Mail/SpamAssassin/PersistentAddrList.pm
-lib/Mail/SpamAssassin/PhraseFreqs.pm
 lib/Mail/SpamAssassin/Received.pm
 lib/Mail/SpamAssassin/Reporter.pm
 lib/Mail/SpamAssassin/SHA1.pm

Modified: incubator/spamassassin/trunk/README
==============================================================================
--- incubator/spamassassin/trunk/README (original)
+++ incubator/spamassassin/trunk/README Tue Jan 20 14:33:37 2004
@@ -69,10 +69,9 @@
 [1]: http://razor.sourceforge.net/

 The distribution provides "spamassassin", a command line tool to perform
-filtering, along with "Mail::SpamAssassin", a set of perl modules which
-implement a Mail::Audit plugin, allowing SpamAssassin to be used in a
-Mail::Audit filter, spam-protection proxy SMTP or POP/IMAP server, or a
-variety of different spam-blocking scenarios.
+filtering, along with the "Mail::SpamAssassin" module set which allows
+SpamAssassin to be used in spam-protection proxy SMTP or POP/IMAP server,
+or a variety of different spam-blocking scenarios.

 In addition, Craig Hughes has contributed "spamd", a daemonized version of
 SpamAssassin, which runs persistently. Using "spamc", a lightweight C

Modified: incubator/spamassassin/trunk/USAGE
==============================================================================
--- incubator/spamassassin/trunk/USAGE (original)
+++ incubator/spamassassin/trunk/USAGE Tue Jan 20 14:33:37 2004
@@ -37,17 +37,6 @@



-If you use Mail::Audit already:
-
-  - run "perldoc Mail::SpamAssassin" and take a look at the synopsis, it
-    outlines what you need to add to your audit script.
-
-  - Copy the configuration files (see CUSTOMISING, below) to a known
-    location, so your script can set the appropriate options for the
-    Mail::SpamAssassin constructor to load them.
-
-
-
 If you use KMail:

   - http://kmail.kde.org/tools.html mentions:

Modified: incubator/spamassassin/trunk/lib/Mail/SpamAssassin.pm
==============================================================================
--- incubator/spamassassin/trunk/lib/Mail/SpamAssassin.pm (original)
+++ incubator/spamassassin/trunk/lib/Mail/SpamAssassin.pm Tue Jan 20 14:33:37 2004
@@ -59,7 +59,7 @@

 =head1 NAME

-Mail::SpamAssassin - Mail::Audit spam detector plugin
+Mail::SpamAssassin - Spam detector and markup engine

 =head1 SYNOPSIS

@@ -69,7 +69,7 @@
   my $status = $spamtest->check ($mail);

   if ($status->is_spam ()) {
-    $status->rewrite_mail ();
+    $mail = $status->rewrite_mail ();
   } else {
     ...
   }
@@ -78,23 +78,19 @@

 =head1 DESCRIPTION

-Mail::SpamAssassin is a module to identify spam using text analysis and several
-internet-based realtime blacklists.
+Mail::SpamAssassin is a module to identify spam using several methods
+including text analysis, internet-based realtime blacklists, statistical
+analysis, and internet-based hashing algorithms.

 Using its rule base, it uses a wide range of heuristic tests on mail headers
-and body text to identify "spam", also known as unsolicited commercial email.
+and body text to identify "spam", also known as unsolicited bulk email.

-Once identified, the mail can then be optionally tagged as spam for later
-filtering using the user's own mail user-agent application.
+Once identified, the mail can then be tagged as spam for later filtering
+using the user's own mail user-agent application or at the mail transfer
+agent.

-This module also implements a Mail::Audit plugin, allowing SpamAssassin to be
-used in a Mail::Audit filter. If you wish to use a command-line filter tool,
-try the C<spamassassin> or C<spamd> tools provided.
-
-Note that, if you're using Mail::Audit, the constructor for the Mail::Audit
-object must use the C<nomime> option, like so:
-
-  my $ma = new Mail::Audit ( nomime => 1 );
+If you wish to use a command-line filter tool, try the C<spamassassin>
+or C<spamd> tools provided.

 SpamAssassin also includes support for reporting spam messages to collaborative
 filtering databases, such as Vipul's Razor ( http://razor.sourceforge.net/ ).
@@ -417,8 +413,8 @@

 =item $status = $f->check ($mail)

-Check a mail, encapsulated in a C<Mail::Audit> or
-C<Mail::SpamAssassin::Message> object, to determine if it is spam or not.
+Check a mail, encapsulated in a C<Mail::SpamAssassin::Message> object,
+to determine if it is spam or not.

 Returns a C<Mail::SpamAssassin::PerMsgStatus> object which can be used to test
 or manipulate the mail message.
@@ -435,8 +431,7 @@
   local ($_);

   $self->init(1);
-  my $mail = $self->encapsulate_mail_object ($mail_obj);
-  my $msg = Mail::SpamAssassin::PerMsgStatus->new($self, $mail);
+  my $msg = Mail::SpamAssassin::PerMsgStatus->new($self, $mail_obj);
   # Message-Id is used for a filename on disk, so we can't have '/' in it.
   $msg->check();
   $msg;
@@ -446,8 +441,7 @@

 =item $status = $f->learn ($mail, $id, $isspam, $forget)

-Learn from a mail, encapsulated in a C<Mail::Audit> or
-C<Mail::SpamAssassin::Message> object.
+Learn from a mail, encapsulated in a C<Mail::SpamAssassin::Message> object.

 If C<$isspam> is set, the mail is assumed to be spam, otherwise it will
 be learnt as non-spam.
@@ -478,8 +472,7 @@
   require Mail::SpamAssassin::PerMsgLearner;
   $self->init(1);

-  my $mail = $self->encapsulate_mail_object ($mail_obj);
-  my $msg = Mail::SpamAssassin::PerMsgLearner->new($self, $mail);
+  my $msg = Mail::SpamAssassin::PerMsgLearner->new($self, $mail_obj);

   if ($forget) {
     $msg->forget($id);
@@ -651,7 +644,7 @@

 =item $f->report_as_spam ($mail, $options)

-Report a mail, encapsulated in a C<Mail::Audit> object, as human-verified spam.
+Report a mail, encapsulated in a C<Mail::SpamAssassin::Message> object, as human-verified spam.
 This will submit the mail message to live, collaborative, spam-blocker
 databases, allowing other users to block this message.

@@ -691,8 +684,6 @@
   my @msg = split (/^/m, $self->remove_spamassassin_markup($mail));
   $mail = Mail::SpamAssassin::NoMailAudit->new ('data' => \@msg);
-  $mail = $self->encapsulate_mail_object ($mail);
-
-
   # learn as spam if enabled
   if ( $self->{conf}->{bayes_learn_during_report} ) {
     $self->learn ($mail, undef, 1, 0);
@@ -707,7 +698,7 @@

 =item $f->revoke_as_spam ($mail, $options)

-Revoke a mail, encapsulated in a C<Mail::Audit> object, as human-verified ham
+Revoke a mail, encapsulated in a C<Mail::SpamAssassin::Message> object, as human-verified ham
 (non-spam). This will revoke the mail message from live, collaborative,
 spam-blocker databases, allowing other users to block this message.

@@ -737,8 +728,6 @@
   my @msg = split (/^/m, $self->remove_spamassassin_markup($mail));
   $mail = Mail::SpamAssassin::NoMailAudit->new ('data' => \@msg);
-  $mail = $self->encapsulate_mail_object ($mail);
-
-
   # learn as nonspam
   $self->learn ($mail, undef, 0, 0);

@@ -855,10 +844,9 @@
   my $list = Mail::SpamAssassin::AutoWhitelist->new($self);

   $self->init(1);
-  my $mail = $self->encapsulate_mail_object ($mail_obj);

   my @addrlist = ();
-  my @hdrs = $mail->get_header ('From');
+  my @hdrs = $mail_obj->get_header ('From');
   if ($#hdrs >= 0) {
     push (@addrlist, $self->find_all_addrs_in_line (join (" ", @hdrs)));
   }
@@ -949,8 +937,7 @@
     }
   }

-  my $mail = $self->encapsulate_mail_object ($mail_obj);
-  my $hdrs = $mail->get_all_headers();
+  my $hdrs = $mail_obj->get_all_headers();

   # remove DOS line endings
   $hdrs =~ s/\r//gs;
@@ -1001,7 +988,7 @@
   my @newbody = ();
   my $inreport = 0;

-  foreach $_ (@{$mail->get_body()})
+  foreach $_ (@{$mail_obj->get_body()})
   {
     s/\r?$//; # DOS line endings

@@ -1130,8 +1117,7 @@
   $self->init($use_user_prefs);

   my $mail = Mail::SpamAssassin::NoMailAudit->new(data => \@testmsg);
-  my $encapped = $self->encapsulate_mail_object ($mail);
-  my $status = Mail::SpamAssassin::PerMsgStatus->new($self, $encapped,
+  my $status = Mail::SpamAssassin::PerMsgStatus->new($self, $mail,
         { disable_auto_learning => 1 } );
   $status->word_is_in_dictionary("aba"); # load triplets.txt into memory
   $status->check();
@@ -1174,8 +1160,7 @@
   $self->{syntax_errors} += $self->{conf}->{errors};

   my $mail = Mail::SpamAssassin::NoMailAudit->new(data => \@testmsg);
-  my $encapped = $self->encapsulate_mail_object ($mail);
-  my $status = Mail::SpamAssassin::PerMsgStatus->new($self, $encapped,
+  my $status = Mail::SpamAssassin::PerMsgStatus->new($self, $mail,
         { disable_auto_learning => 1 } );
   $status->check();

@@ -1472,56 +1457,22 @@

 ###########################################################################

-sub encapsulate_mail_object {
-  my ($self, $mail_obj) = @_;
-
-  # first, check to see if this is not actually a Mail::Audit object;
-  # it could also be an already-encapsulated Mail::Audit wrapped inside
-  # a Mail::SpamAssassin::Message.
-  if ($mail_obj->{is_spamassassin_wrapper_object}) {
-    return $mail_obj;
-  }
-
-  if ($self->{use_my_mail_class}) {
-    my $class = $self->{use_my_mail_class};
-    (my $file = $class) =~ s/::/\//g;
-    require "$file.pm";
-    return $class->new($mail_obj);
-  }
-
-  # new versions of Mail::Audit can have one of 2 different base classes. URGH.
-  # we can tell which class, by querying the is_mime() method. Support for
-  # MIME::Entity contributed by Andrew Wilson <andrew [at] rivendale>.
-  #
-  my $ismime = 0;
-  if ($mail_obj->can ("is_mime")) { $ismime = $mail_obj->is_mime(); }
-
-  if ($ismime) {
-    require Mail::SpamAssassin::EncappedMIME;
-    return Mail::SpamAssassin::EncappedMIME->new($mail_obj);
-  } else {
-    require Mail::SpamAssassin::EncappedMessage;
-    return Mail::SpamAssassin::EncappedMessage->new($mail_obj);
-  }
-}
-
 sub find_all_addrs_in_mail {
   my ($self, $mail_obj) = @_;

   $self->init(1);
-  my $mail = $self->encapsulate_mail_object ($mail_obj);

   my @addrlist = ();
   foreach my $header (qw(To From Cc Reply-To Sender
                          Errors-To Mail-Followup-To))
   {
-    my @hdrs = $mail->get_header ($header);
+    my @hdrs = $mail_obj->get_header ($header);
     if ($#hdrs < 0) { next; }
     push (@addrlist, $self->find_all_addrs_in_line (join (" ", @hdrs)));
   }

   # find addrs in body, too
-  foreach my $line (@{$mail->get_body()}) {
+  foreach my $line (@{$mail_obj->get_body()}) {
     push (@addrlist, $self->find_all_addrs_in_line ($line));
   }

@@ -1602,12 +1553,8 @@

 =head1 PREREQUISITES

-C<Mail::Audit>
-C<Mail::Internet>
-
-=head1 COREQUISITES
-
-C<Net::DNS>
+C<HTML::Parser>
+C<Sys::Syslog>

 =head1 MORE DOCUMENTATION


Modified: incubator/spamassassin/trunk/lib/Mail/SpamAssassin/EvalTests.pm
==============================================================================
--- incubator/spamassassin/trunk/lib/Mail/SpamAssassin/EvalTests.pm (original)
+++ incubator/spamassassin/trunk/lib/Mail/SpamAssassin/EvalTests.pm Tue Jan 20 14:33:37 2004
@@ -331,7 +331,8 @@
   $self->{mta_added_message_id_later} = 0;
   $self->{mta_added_message_id_backup} = 0;

-  my @received = grep(/\S/, split(/\n/, $self->get('Received')));
+  # We may get headers with continuations in them, so deal with it ...
+  my @received = grep(/\S/, map { s/\r?\n\s+/ /g; $_; } $self->get('Received'));
   my $id = $self->get('Resent-Message-ID') || $self->get('Message-ID');
   return unless defined($id) && $id;
   my $local = 1;

Modified: incubator/spamassassin/trunk/lib/Mail/SpamAssassin/MIME.pm
==============================================================================
--- incubator/spamassassin/trunk/lib/Mail/SpamAssassin/MIME.pm (original)
+++ incubator/spamassassin/trunk/lib/Mail/SpamAssassin/MIME.pm Tue Jan 20 14:33:37 2004
@@ -63,6 +63,9 @@
 use strict;
 use MIME::Base64;
 use Mail::SpamAssassin;
+use Mail::SpamAssassin::HTML;
+use MIME::Base64;
+use MIME::QuotedPrint;

 # M::SA::MIME is an object method used to encapsulate a message's MIME part
 #
@@ -121,30 +124,26 @@
   $key =~ s/\s+$//;

   if (@_) {
-    my ( $decoded_value, $raw_value ) = @_;
-    $raw_value = $decoded_value unless defined $raw_value;
+    my $raw_value = shift;
     push @{ $self->{'header_order'} }, $rawkey;
-    if ( exists $self->{'headers'}{$key} ) {
-      push @{ $self->{'headers'}{$key} }, $decoded_value;
-      push @{ $self->{'raw_headers'}{$key} }, $raw_value;
+    if ( !exists $self->{'headers'}->{$key} ) {
+      $self->{'headers'}->{$key} = [];
+      $self->{'raw_headers'}->{$key} = [];
     }
-    else {
-      $self->{'headers'}{$key} = [$decoded_value];
-      $self->{'raw_headers'}{$key} = [$raw_value];
-    }
-    return $self->{'headers'}{$key}[-1];
+
+    push @{ $self->{'headers'}->{$key} }, _decode_header($raw_value);
+    push @{ $self->{'raw_headers'}->{$key} }, $raw_value;
+
+    return $self->{'headers'}->{$key}->[-1];
   }

-  my $want = wantarray;
-  if ( defined($want) ) {
-    if ($want) {
-      return unless exists $self->{'headers'}{$key};
-      return @{ $self->{'headers'}{$key} };
-    }
-    else {
-      return '' unless exists $self->{'headers'}{$key};
-      return $self->{'headers'}{$key}[-1];
-    }
+  if (wantarray) {
+    return unless exists $self->{'headers'}->{$key};
+    return @{ $self->{'headers'}->{$key} };
+  }
+  else {
+    return '' unless exists $self->{'headers'}->{$key};
+    return $self->{'headers'}->{$key}->[-1];
   }
 }

@@ -159,12 +158,12 @@
   $key =~ s/\s+$//;

   if (wantarray) {
-    return unless exists $self->{'raw_headers'}{$key};
-    return @{ $self->{'raw_headers'}{$key} };
+    return unless exists $self->{'raw_headers'}->{$key};
+    return @{ $self->{'raw_headers'}->{$key} };
   }
   else {
-    return '' unless exists $self->{'raw_headers'}{$key};
-    return $self->{'raw_headers'}{$key}[-1];
+    return '' unless exists $self->{'raw_headers'}->{$key};
+    return $self->{'raw_headers'}->{$key}->[-1];
   }
 }

@@ -316,6 +315,58 @@
     return $self->{'type'};
   }
 }
+
+sub delete_header {
+  my($self, $hdr) = @_;
+
+  foreach ( grep(/^${hdr}$/i, keys %{$self->{'headers'}}) ) {
+    delete $self->{'headers'}->{$_};
+    delete $self->{'raw_headers'}->{$_};
+  }
+
+  my @neworder = grep(!/^${hdr}$/i, @{$self->{'header_order'}});
+  $self->{'header_order'} = \@neworder;
+}
+
+sub __decode_header {
+  my ( $encoding, $cte, $data ) = @_;
+
+  if ( $cte eq 'B' ) {
+    # base 64 encoded
+    return Mail::SpamAssassin::Util::base64_decode($data);
+  }
+  elsif ( $cte eq 'Q' ) {
+    # quoted printable
+    return Mail::SpamAssassin::Util::qp_decode($data);
+  }
+  else {
+    die "Unknown encoding type '$cte' in RFC2047 header";
+  }
+}
+
+=item _decode_header()
+
+Decode base64 and quoted-printable in headers according to RFC2047.
+
+=cut
+
+sub _decode_header {
+  my($header) = @_;
+
+  return '' unless $header;
+
+  # deal with folding and cream the newlines and such
+  $header =~ s/\n[ \t]+/\n /g;
+  $header =~ s/\r?\n//g;
+
+  return $header unless $header =~ /=\?/;
+
+  $header =~
+    s/=\?([\w_-]+)\?([bqBQ])\?(.*?)\?=/__decode_header($1, uc($2), $3)/ge;
+
+  return $header;
+}
+

 sub dbg { Mail::SpamAssassin::dbg (@_); }


Modified: incubator/spamassassin/trunk/lib/Mail/SpamAssassin/MIME/Parser.pm
==============================================================================
--- incubator/spamassassin/trunk/lib/Mail/SpamAssassin/MIME/Parser.pm (original)
+++ incubator/spamassassin/trunk/lib/Mail/SpamAssassin/MIME/Parser.pm Tue Jan 20 14:33:37 2004
@@ -22,9 +22,6 @@

 use Mail::SpamAssassin;
 use Mail::SpamAssassin::MIME;
-use Mail::SpamAssassin::HTML;
-use MIME::Base64;
-use MIME::QuotedPrint;

 =item parse()

@@ -70,28 +67,34 @@
   my $msg = Mail::SpamAssassin::MIME->new();
   my $header = '';

+  # Go through all the headers of the message
   while ( my $last = shift @message ) {
+    # Store the non-modified headers in a scalar
     $msg->{'pristine_headers'} .= $last;
-    $last =~ s/\r?\n//;

     # NB: Really need to figure out special folding rules here!
-    if ( $last =~ s/^[ \t]+// ) { # if its a continuation
-      $header .= " $last"; # fold continuations
+    if ( $last =~ /^[ \t]+/ ) { # if its a continuation
+      $header .= $last; # fold continuations
       next;
     }

+    # Ok, there's a header here, let's go ahead and add it in.
     if ($header) {
       my ( $key, $value ) = split ( /:\s*/, $header, 2 );
-      $msg->header( $key, $self->_decode_header($value), $value );
+      $msg->header( $key, $value );
     }

     # not a continuation...
     $header = $last;

-    last if ( $last =~ /^$/m );
+    # Ok, we found the header/body blank line ...
+    last if ( $last =~ /^\r?$/m );
   }

-  #$msg->{'pristine_body'} = \@message;
+  # Store the pristine body for later -- store as a copy since @message will get modified below
+  $msg->{'pristine_body'} = join('', @message);
+
+  # Figure out the boundary
   my ($boundary);
   ($msg->{'type'}, $boundary) = Mail::SpamAssassin::Util::parse_content_type($msg->header('content-type'));
   dbg("main message type: ".$msg->{'type'});
@@ -277,39 +280,6 @@
   # BTW: please leave this after add_body_parts() since it'll add it back.
   #
   delete $part_msg->{body_parts};
-}
-
-sub __decode_header {
-  my ( $encoding, $cte, $data ) = @_;
-
-  if ( $cte eq 'B' ) {
-    # base 64 encoded
-    return Mail::SpamAssassin::Util::base64_decode($data);
-  }
-  elsif ( $cte eq 'Q' ) {
-    # quoted printable
-    return Mail::SpamAssassin::Util::qp_decode($data);
-  }
-  else {
-    die "Unknown encoding type '$cte' in RFC2047 header";
-  }
-}
-
-=item _decode_header()
-
-Decode base64 and quoted-printable in headers according to RFC2047.
-
-=cut
-
-sub _decode_header {
-  my($self, $header) = @_;
-
-  return '' unless $header;
-  return $header unless $header =~ /=\?/;
-
-  $header =~
-    s/=\?([\w_-]+)\?([bqBQ])\?(.*?)\?=/__decode_header($1, uc($2), $3)/ge;
-  return $header;
 }

 sub dbg { Mail::SpamAssassin::dbg (@_); }

Modified: incubator/spamassassin/trunk/lib/Mail/SpamAssassin/NoMailAudit.pm
==============================================================================
--- incubator/spamassassin/trunk/lib/Mail/SpamAssassin/NoMailAudit.pm (original)
+++ incubator/spamassassin/trunk/lib/Mail/SpamAssassin/NoMailAudit.pm Tue Jan 20 14:33:37 2004
@@ -71,233 +71,69 @@
 use strict;
 use bytes;

-use Fcntl qw(:DEFAULT :flock);
-use Mail::SpamAssassin::Message;
 use Mail::SpamAssassin::MIME;
 use Mail::SpamAssassin::MIME::Parser;

-@Mail::SpamAssassin::NoMailAudit::ISA = (
-        'Mail::SpamAssassin::Message'
-);
-
 # ---------------------------------------------------------------------------

 sub new {
   my $class = shift;
   my %opts = @_;

-  my $self = $class->SUPER::new();
-
-  $self->{is_spamassassin_wrapper_object} = 1;
-  $self->{has_spamassassin_methods} = 1;
-  $self->{headers} = { };
-  $self->{header_order} = [ ];
+  my $self = {
+    mime_parts => Mail::SpamAssassin::MIME::Parser->parse($opts{'data'} || \*STDIN),
+  };

   bless ($self, $class);
-
-  # data may be filehandle (default stdin) or arrayref
-  my $data = $opts{data} || \*STDIN;
-
-  if (ref $data eq 'ARRAY') {
-    $self->{textarray} = $data;
-  } elsif (ref $data eq 'GLOB') {
-    if (defined fileno $data) {
-      $self->{textarray} = [ <$data> ];
-    }
-  }
-
-  # Parse the message for MIME parts
-  $self->{mime_parts} = Mail::SpamAssassin::MIME::Parser->parse($self->{textarray});
-
-  # Parse the message to get header information
-  $self->parse_headers();
   return $self;
 }

 # ---------------------------------------------------------------------------

-sub parse_headers {
-  my ($self) = @_;
-  local ($_);
-
-  $self->{headers} = { };
-  $self->{header_order} = [ ];
-  my ($prevhdr, $hdr, $val, $entry);
-
-  while (defined ($_ = shift @{$self->{textarray}})) {
-    # warn "parse_headers $_";
-    if (/^\r*$/) { last; }
-
-    $entry = $hdr = $val = undef;
-
-    if (/^\s/) {
-      if (defined $prevhdr) {
-        $hdr = $prevhdr; $val = $_;
-        $val =~ s/\r+\n/\n/gs; # trim CRs, we don't want them
-        $entry = $self->{headers}->{$hdr};
-        $entry->{$entry->{count} - 1} .= $val;
-        next;
-
-      } else {
-        $hdr = "X-Mail-Format-Warning";
-        $val = "No previous line for continuation: $_";
-        $entry = $self->_get_or_create_header_object ($hdr);
-        $entry->{added} = 1;
-      }
-
-    } elsif (/^From /) {
-      $self->{from_line} = $_;
-      next;
-
-    } elsif (/^([\x21-\x39\x3B-\x7E]+):\s*(.*)$/s) {
-      # format of a header, as defined by RFC 2822 section 3.6.8;
-      # 'Any character except controls, SP, and ":".'
-      $hdr = $1; $val = $2;
-      $val =~ s/\r+//gs; # trim CRs, we don't want them
-      $entry = $self->_get_or_create_header_object ($hdr);
-      $entry->{original} = 1;
-
-    } else {
-      $hdr = "X-Mail-Format-Warning";
-      $val = "Bad RFC2822 header formatting in $_";
-      $entry = $self->_get_or_create_header_object ($hdr);
-      $entry->{added} = 1;
-    }
-
-    $self->_add_header_to_entry ($entry, $hdr, $val);
-    $prevhdr = $hdr;
-  }
-}
-
-sub _add_header_to_entry {
-  my ($self, $entry, $hdr, $line, $order) = @_;
-
-  # Do a normal push if no specific order # is set.
-  $order ||= @{$self->{header_order}};
-
-  # ensure we have line endings
-  if ($line !~ /\n$/s) { $line .= "\n"; }
-
-  # Store this header
-  $entry->{$entry->{count}} = $line;
-
-  # Push the header and which count it is in header_order
-  splice @{$self->{header_order}}, $order, 0, $hdr.":".$entry->{count};
-
-  # Increase the count of this header type
-  $entry->{count}++;
-}
-
-sub _get_or_create_header_object {
-  my ($self, $hdr) = @_;
-
-  if (!defined $self->{headers}->{$hdr}) {
-    $self->{headers}->{$hdr} = {
-      'count' => 0,
-      'added' => 0,
-      'original' => 0
-    };
-  }
-  return $self->{headers}->{$hdr};
-}
-
-# ---------------------------------------------------------------------------
-
-sub _get_header_list {
-  my ($self, $hdr, $header_name_only) = @_;
-
-  # OK, we want to do a case-insensitive match here on the header name
-  # So, first I'm going to pick up an array of the actual capitalizations used:
-  my $lchdr = lc $hdr;
-  my @cap_hdrs = grep(lc($_) eq $lchdr, keys(%{$self->{headers}}));
-
-  # If the request is just for the list of headers names that matched only ...
-  if ( defined $header_name_only && $header_name_only ) {
-    return @cap_hdrs;
-  }
-  else {
-    # return the values in each of the headers
-    return map($self->{headers}->{$_},@cap_hdrs);
-  }
-}
-
 sub get_pristine_header {
   my ($self, $hdr) = @_;
+
+  return $self->{mime_parts}->{pristine_headers} unless $hdr;
   my(@ret) = $self->{mime_parts}->{pristine_headers} =~ /^(?:$hdr:[ ]+(.*\n(?:\s+\S.*\n)*))/mig;
   if (@ret) {
-    return wantarray ? @ret : $ret[0];
+    return wantarray ? @ret : $ret[-1];
   }
   else {
     return $self->get_header($hdr);
   }
 }

+#sub get { shift->get_header(@_); }
 sub get_header {
   my ($self, $hdr) = @_;

   # And now pick up all the entries into a list
-  my @entries = $self->_get_header_list($hdr);
-
-  if (!wantarray) {
-    # If there is no header like that, return undef
-    if (scalar(@entries) < 1 ) { return undef; }
-    foreach my $entry (@entries) {
-      if($entry->{count} > 0) {
-        my $ret = $entry->{0};
-        $ret =~ s/^\s+//;
-        $ret =~ s/\n\s+/ /g;
-        return $ret;
-      }
-    }
-    return undef;
-
-  } else {
-
-    if(scalar(@entries) < 1) { return ( ); }
-
-    my @ret = ();
-    # loop through each entry and collect all the individual matching lines
-    foreach my $entry (@entries)
-    {
-      foreach my $i (0 .. ($entry->{count}-1)) {
-        my $ret = $entry->{$i};
-        $ret =~ s/^\s+//;
-        $ret =~ s/\n\s+/ /g;
-        push (@ret, $ret);
-      }
-    }
-
-    return @ret;
+  # This is assumed to include a newline at the end ...
+  # This is also assumed to have removed continuation bits ...
+  my @hdrs;
+  foreach ( $self->{'mime_parts'}->raw_header($hdr) ) {
+    s/\r?\n\s+/ /g;
+    push(@hdrs, $_);
   }
-}

-sub put_header {
-  my ($self, $hdr, $text, $order) = @_;
-
-  my $entry = $self->_get_or_create_header_object ($hdr);
-  $self->_add_header_to_entry ($entry, $hdr, $text, $order);
-  if (!$entry->{original}) { $entry->{added} = 1; }
+  if (wantarray) {
+    return @hdrs;
+  }
+  else {
+    return $hdrs[-1];
+  }
 }

+#sub header { shift->get_all_headers(@_); }
 sub get_all_headers {
   my ($self) = @_;

+  my %cache = ();
   my @lines = ();

-  # warn "JMD".join (' ', caller);
-  push(@lines, $self->{from_line}) if ( defined $self->{from_line} );
-  foreach my $hdrcode (@{$self->{header_order}}) {
-    $hdrcode =~ /^([^:]+):(\d+)$/ or next;
-
-    my $hdr = $1;
-    my $num = $2;
-    my $entry = $self->{headers}->{$hdr};
-    next unless defined($entry);
-
-    my $text = $hdr.": ".$entry->{$num};
-    if ($text !~ /\n$/s) { $text .= "\n"; }
-    push (@lines, $text);
+  foreach ( @{$self->{mime_parts}->{header_order}} ) {
+    push(@lines, "$_: ".($self->get_header($_))[$cache{$_}++]);
   }

   if (wantarray) {
@@ -307,118 +143,38 @@
   }
 }

-sub replace_header {
-  my ($self, $hdr, $text) = @_;
-
-  # Figure out where the first case insensitive header of this name is stored.
-  # We'll use this to add the new header with the same case and in the order.
-  my($casehdr,$order) = ($hdr,undef);
-  my $lchdr = lc "$hdr:0"; # just lc it once
-
-  # Now find the header
-  for ( my $count = 0; $count <= @{$self->{header_order}}; $count++ ) {
-    next unless (lc $self->{header_order}->[$count] eq $lchdr);
-
-    # Remember where in the order the header is, and the case of said header.
-    $order = $count;
-    ($casehdr = $self->{header_order}->[$count]) =~ s/:\d+$//;
-
-    last;
-  }
-
-  # Remove all instances of this header
-  $self->delete_header ($hdr);
-
-  # Add the new header with correctly cased header and in the right place
-  return $self->put_header($casehdr, $text, $order);
-}
-
 sub delete_header {
   my ($self, $hdr) = @_;
-
-  # Delete all versions of the header, case insensitively
-  foreach my $dhdr ( $self->_get_header_list($hdr,1) ) {
-    @{$self->{header_order}} = grep( rindex($_,"$dhdr:",0) != 0, @{$self->{header_order}} );
-    delete $self->{headers}->{$dhdr};
-  }
+  $self->{mime_parts}->delete_header($hdr);
 }

+#sub body { return shift->get_body(@_); }
 sub get_body {
   my ($self) = @_;
-  return $self->{textarray};
-}
-
-sub replace_body {
-  my ($self, $aryref) = @_;
-  $self->{textarray} = $aryref;
+  my @ret = split(/^/m, $self->get_pristine_body());
+  return \@ret;
 }

 # ---------------------------------------------------------------------------

 sub get_pristine {
   my ($self) = @_;
-  return join ('', $self->{mime_parts}->{pristine_headers}, @{ $self->{textarray} });
+  return $self->{mime_parts}->{pristine_headers} . $self->{mime_parts}->{pristine_body};
 }

 sub get_pristine_body {
   my ($self) = @_;
-  return join ('', @{ $self->{textarray} });
+  return $self->{mime_parts}->{pristine_body};
 }

 sub as_string {
   my ($self) = @_;
-  return join ('', $self->get_all_headers(), "\n",
-                @{$self->get_body()});
-}
-
-sub replace_original_message {
-  my ($self, $data) = @_;
-
-  if (ref $data eq 'ARRAY') {
-    $self->{textarray} = $data;
-  } elsif (ref $data eq 'GLOB') {
-    if (defined fileno $data) {
-      $self->{textarray} = [ <$data> ];
-    }
-  }
-
-  $self->parse_headers();
-}
-
-# ---------------------------------------------------------------------------
-# Mail::Audit emulation methods.
-
-sub get { shift->get_header(@_); }
-sub header { shift->get_all_headers(@_); }
-
-sub body {
-  my ($self) = shift;
-  my $replacement = shift;
-
-  if (defined $replacement) {
-    $self->replace_body ($replacement);
-  } else {
-    return $self->get_body();
-  }
+  return $self->get_all_headers() . "\n" . $self->{mime_parts}->{pristine_body};
 }

 sub ignore {
   my ($self) = @_;
   exit (0) unless $self->{noexit};
-}
-
-# ---------------------------------------------------------------------------
-
-# does not need to be called it seems. still, keep it here in case of
-# emergency.
-sub finish {
-  my $self = shift;
-  delete $self->{textarray};
-  foreach my $key (keys %{$self->{headers}}) {
-    delete $self->{headers}->{$key};
-  }
-  delete $self->{headers};
-  delete $self->{mail_object};
 }

 1;

Modified: incubator/spamassassin/trunk/lib/Mail/SpamAssassin/PerMsgStatus.pm
==============================================================================
--- incubator/spamassassin/trunk/lib/Mail/SpamAssassin/PerMsgStatus.pm (original)
+++ incubator/spamassassin/trunk/lib/Mail/SpamAssassin/PerMsgStatus.pm Tue Jan 20 14:33:37 2004
@@ -164,7 +164,7 @@
   # TODO: change this to do whitelist/blacklists first? probably a plan
   # NOTE: definitely need AWL stuff last, for regression-to-mean of score

-  $self->clean_spamassassin_headers();
+  $self->{msg}->delete_header('X-Spam-.*');
   $self->{learned_hits} = 0;
   $self->{body_only_hits} = 0;
   $self->{head_only_hits} = 0;
@@ -586,14 +586,11 @@
   my ($self) = @_;

   if ($self->{is_spam} && $self->{conf}->{report_safe}) {
-    $self->rewrite_as_spam();
+    return $self->rewrite_as_spam();
   }
   else {
-    $self->rewrite_headers();
+    return $self->rewrite_headers();
   }
-
-  # invalidate the header cache, we've changed some of them.
-  $self->{hdr_cache} = { };
 }

 # rewrite the entire message as spam (headers and body)
@@ -753,44 +750,49 @@
 EOM

   my @lines = split (/^/m, $newmsg);
-  $self->{msg}->replace_original_message(\@lines);
+  return Mail::SpamAssassin::NoMailAudit->new(data => \@lines);
 }

 sub rewrite_headers {
   my ($self) = @_;

-  if($self->{is_spam}) {
+  # put the pristine headers into an array
+  my(@pristine_headers) = $self->{msg}->get_pristine_header() =~ /^([^:]+:[ ]+(?:.*\n(?:\s+\S.*\n)*))/mig;
+  my $addition = 'headers_ham';

+  if($self->{is_spam}) {
     # Deal with header rewriting
-    foreach my $header (keys %{$self->{conf}->{rewrite_header}}) {
-      $_ = $self->{msg}->get_header($header);
-      my $tag = $self->_replace_tags($self->{conf}->{rewrite_header}->{$header});
+    while ( my($header, $value) = each %{$self->{conf}->{rewrite_header}}) {
+      unless ( $header =~ /^(?:Subject|From|To)$/ ) {
+        dbg("rewrite: ignoring $header = $value");
+        next;
+      }
+
+      # Figure out the rewrite piece
+      my $tag = $self->_replace_tags($value);
       $tag =~ s/\n/ /gs;

-      if ($header eq 'Subject') {
-        s/^(?:\Q${tag}\E |)/${tag} /;
-      }
-      elsif ($header =~ /From|To/) {
-        s/(?:\t\Q(${tag})\E|)$/\t(${tag})/;
-      }
-      $self->{msg}->replace_header($header,$_);
-    }
-    # Deal with header adding
-    foreach my $header (keys %{$self->{conf}->{headers_spam}} ) {
-      my $data = $self->{conf}->{headers_spam}->{$header};
-      my $line = $self->_process_header($header,$data) || "";
-      $self->{msg}->put_header ("X-Spam-$header", $line);
-    }
+      # The tag should be a comment for this header ...
+      $tag = "($tag)" if ( $header =~ /^(?:From|To)$/ );
 
-  } else {
-
-    foreach my $header (keys %{$self->{conf}->{headers_ham}} ) {
-      my $data = $self->{conf}->{headers_ham}->{$header};
-      my $line = $self->_process_header($header,$data) || "";
-      $self->{msg}->put_header ("X-Spam-$header", $line);
+      # Go ahead and markup the headers
+      foreach ( @pristine_headers ) {
+        # skip non-correct-header or headers that are already tagged
+        next if ( !/^${header}:/i );
+        s/^([^:]+:[ ]*)(?:\Q${tag}\E )?/$1${tag} /i;
+      }
     }
 
+    $addition = 'headers_spam';
   }
+
+  while ( my($header, $data) = each %{$self->{conf}->{$addition}} ) {
+    my $line = $self->_process_header($header,$data) || "";
+    push(@pristine_headers, "X-Spam-$header: $line\n");
+  }
+
+  push(@pristine_headers, "\n", split (/^/m, $self->{msg}->get_pristine_body()));
+  return Mail::SpamAssassin::NoMailAudit->new(data => \@pristine_headers);
 }
 
 sub _process_header {
@@ -907,28 +909,6 @@
 
 ###########################################################################
 
-=item $messagestring = $status->get_full_message_as_text ()
-
-Returns the mail message as a string, including headers and raw body text.
-
-If the message has been rewritten using C<rewrite_mail()>, these changes
-will be reflected in the string.
-
-Note: this is simply a helper method which calls methods on the mail message
-object. It is provided because Mail::Audit uses an unusual (ie. not quite
-intuitive) interface to do this, and it has been a common stumbling block for
-authors of scripts which use SpamAssassin.
-
-=cut
-
-sub get_full_message_as_text {
-  my ($self) = @_;
-  return join ("", $self->{msg}->get_all_headers(), "\n",
-                        @{$self->{msg}->get_body()});
-}
-
-###########################################################################
-
 =item $status->finish ()
 
 Indicate that this C<$status> object is finished with, and can be destroyed.
@@ -1355,7 +1335,7 @@
   else {
     my @hdrs = $self->{msg}->get_header ($hdrname);
     if ($#hdrs >= 0) {
-      $_ = join ("\n", @hdrs);
+      $_ = join ('', @hdrs);
     }
     else {
       $_ = undef;
@@ -2422,34 +2402,6 @@
 
 sub dbg { Mail::SpamAssassin::dbg (@_); }
 sub sa_die { Mail::SpamAssassin::sa_die (@_); }
-
-###########################################################################
-
-sub clean_spamassassin_headers {
-  my ($self) = @_;
-
-  # attempt to restore original headers
-  for my $hdr (('Content-Transfer-Encoding', 'Content-Type', 'Return-Receipt-To')) {
-    my $prev = $self->{msg}->get_header ("X-Spam-Prev-$hdr");
-    if (defined $prev && $prev ne '') {
-      $self->{msg}->replace_header ($hdr, $prev);
-    }
-  }
-  # delete the SpamAssassin-added headers
-  $self->{msg}->delete_header ("X-Spam-Checker-Version");
-  $self->{msg}->delete_header ("X-Spam-Flag");
-  $self->{msg}->delete_header ("X-Spam-Level");
-  $self->{msg}->delete_header ("X-Spam-Prev-Content-Transfer-Encoding");
-  $self->{msg}->delete_header ("X-Spam-Prev-Content-Type");
-  $self->{msg}->delete_header ("X-Spam-Report");
-  $self->{msg}->delete_header ("X-Spam-Status");
-  foreach my $header (keys %{$self->{conf}->{headers_spam}} ) {
-    $self->{msg}->delete_header ("X-Spam-$header");
-  }
-  foreach my $header (keys %{$self->{conf}->{headers_ham}} ) {
-    $self->{msg}->delete_header ("X-Spam-$header");
-  }
-}
 
 ###########################################################################
 
Modified: incubator/spamassassin/trunk/lib/Mail/SpamAssassin/Received.pm
==============================================================================
--- incubator/spamassassin/trunk/lib/Mail/SpamAssassin/Received.pm (original)
+++ incubator/spamassassin/trunk/lib/Mail/SpamAssassin/Received.pm Tue Jan 20 14:33:37 2004
@@ -342,17 +342,17 @@
   # so protect against that here. These will not appear in the final
   # message; they're just used internally.
 
-  if ($self->{msg}->can ("delete_header")) {
-    $self->{msg}->delete_header ("X-Spam-Relays-Trusted");
-    $self->{msg}->delete_header ("X-Spam-Relays-Untrusted");
-
-    if ($self->{msg}->can ("put_metadata")) {
-      $self->{msg}->put_metadata ("X-Spam-Relays-Trusted",
-                        $self->{relays_trusted_str});
-      $self->{msg}->put_metadata ("X-Spam-Relays-Untrusted",
-                        $self->{relays_untrusted_str});
-    }
-  }
+#  if ($self->{msg}->can ("delete_header")) {
+#    $self->{msg}->delete_header ("X-Spam-Relays-Trusted");
+#    $self->{msg}->delete_header ("X-Spam-Relays-Untrusted");
+#
+#    if ($self->{msg}->can ("put_metadata")) {
+#      $self->{msg}->put_metadata ("X-Spam-Relays-Trusted",
+#                        $self->{relays_trusted_str});
+#      $self->{msg}->put_metadata ("X-Spam-Relays-Untrusted",
+#                        $self->{relays_untrusted_str});
+#    }
+#  }
 
   $self->{tag_data}->{RELAYSTRUSTED} = $self->{relays_trusted_str};
   $self->{tag_data}->{RELAYSUNTRUSTED} = $self->{relays_untrusted_str};
Modified: incubator/spamassassin/trunk/masses/mass-check
==============================================================================
--- incubator/spamassassin/trunk/masses/mass-check (original)
+++ incubator/spamassassin/trunk/masses/mass-check Tue Jan 20 14:33:37 2004
@@ -290,7 +290,7 @@
   $ma->{noexit} = 1;
 
   # remove SpamAssassin markup, if present and the mail was spam
-  $_ = $ma->get ("X-Spam-Status");
+  $_ = $ma->get_header ("X-Spam-Status");
   if (defined($_) && /^Yes, hits=/) {
     my $newtext = $spamtest->remove_spamassassin_markup($ma);
     my @newtext = split (/^/m, $newtext);
@@ -328,12 +328,12 @@
   my $tests = join(",", sort(grep(length,$status->get_names_of_tests_hit(),$status->get_names_of_subtests_hit())));
   my $extra = join(",", @extra);
 
-  if (defined $opt_rewrite) {
-    $status->rewrite_mail();
-    open(REWRITE, "> " . ($opt_rewrite ? $opt_rewrite : "/tmp/out"));
-    print REWRITE $status->get_full_message_as_text();
-    close(REWRITE);
-  }
+#  if (defined $opt_rewrite) {
+#    $status->rewrite_mail();
+#    open(REWRITE, "> " . ($opt_rewrite ? $opt_rewrite : "/tmp/out"));
+#    print REWRITE $status->get_full_message_as_text();
+#    close(REWRITE);
+#  }
 
   $id =~ s/\s/_/g;
 
Modified: incubator/spamassassin/trunk/spamassassin.raw
==============================================================================
--- incubator/spamassassin/trunk/spamassassin.raw (original)
+++ incubator/spamassassin/trunk/spamassassin.raw Tue Jan 20 14:33:37 2004
@@ -208,20 +208,17 @@
   $mail->ignore(); # will exit
 }
 
-# not reporting? OK, do checks instead. Create a status object which
-# holds details of the message's spam/not-spam status.
+  # not reporting? OK, do checks instead. Create a status object which
+  # holds details of the message's spam/not-spam status.
   my $status = $spamtest->check ($mail);
-  $status->rewrite_mail ();
+  $mail = $status->rewrite_mail ();
+
+  print $mail->get_pristine();
 
   if ($opt{'test-mode'}) {
-    # add the spam report to the end of the body as well, if testing.
-    my $lines = $mail->body();
-    push (@{$lines}, split (/$/, $status->get_report()));
-    $mail->body ($lines);
+    print $status->get_report();
   }
 
-# if we're piping it, deliver it to stdout.
-  print $mail->header(), "\n", join ('', @{$mail->body()});
   if (defined $opt{'error-code'} && $status->is_spam ()) { exit ($opt{'error-code'} || 5) ; }
   exit;
 
@@ -642,7 +639,6 @@
 sa-learn(1)
 Mail::SpamAssassin(3)
 Mail::SpamAssassin::Conf(3)
-Mail::Audit(3)
 Razor(3)
 
 =head1 BUGS
@@ -652,15 +648,6 @@
 =head1 AUTHOR
 
 Justin Mason E<lt>jm /at/ jmason.orgE<gt>
-
-=head1 PREREQUISITES
-
-C<Mail::Audit>
-
-=head1 COREQUISITES
-
-C<Net::DNS>
-C<Razor>
 
 =cut
 
Modified: incubator/spamassassin/trunk/spamd/spamd.raw
==============================================================================
--- incubator/spamassassin/trunk/spamd/spamd.raw (original)
+++ incubator/spamassassin/trunk/spamd/spamd.raw Tue Jan 20 14:33:37 2004
@@ -760,10 +760,10 @@
   my $spamhdr = "Spam: $response_spam_status ; $msg_score / $msg_threshold";
 
   if ($method eq 'PROCESS') {
-    $status->rewrite_mail; #if $status->is_spam;
+    $mail = $status->rewrite_mail; #if $status->is_spam;
 
     # Build the message to send back and measure it
-    my $msg_resp = join '',$mail->header,"\n",@{$mail->body};
+    my $msg_resp = $mail->as_string();
     my $msg_resp_length = length($msg_resp);
     if($version >= 1.3) # Spamc protocol 1.3 means multi hdrs are OK
     {