
roy at gnomon
Feb 28, 2004, 7:51 AM
Post #1 of 17
Rejecting spam from <>: SES, message-id caches and cookies (long)
I've been thinking a fair bit over the past few months about the correct way to deal with unwanted mail from the <> address, be it spam explicitly sent from that address, or bounces from joe jobs. There are two main approaches to this that I've seen discussed, and I've seen references to both solutions being deployed (though how widespread they are is difficult to determine).

The first is to use a timestamped, cryptographically signed envelope sender, as is proposed in SES. In fact, I currently do something similar using TMDA's dated address feature. The second approach is to record the message-ids of all messages that you emit, and then scan incoming mail from the <> address for a message-id you recognise. I'm also going to propose a third class of solution for discussion: place a cryptographically signed cookie in an extension header.

With the first approach (signed envelope sender), there are a couple of points to note with regard to how it interacts with other anti-spam technology already in widespread use. The first issue is Sender Address Verification (SAV). A number of MTAs have the option to verify all envelope senders, by issuing the following sequence of commands to the primary MX associated with the sender's domain:

    MAIL FROM:<>
    RCPT TO:<address-to-test>
    RSET

The idea is to test whether the originating domain thinks the address is a valid address. If the RCPT TO command gives a hard error, then an MTA using SAV will return a hard error for the transaction, too. Therefore, as Shevek notes, any SES scheme needs to postpone rejecting the SMTP transaction until after the DATA command. Rejecting unsigned addresses at the RCPT TO command will cause MTAs that use SAV to fail to validate your address, and hence reject mail from you. Since SAV is a standard feature of the latest versions of a number of popular MTAs (including exim and postfix), it's reasonable to expect it to be widely deployed, so any SES scheme needs to play nice with it.
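As a rough illustration of a signed envelope sender that stays compatible with SAV, here is a minimal Python sketch. The address layout (`local+ses=DAY=TAG@domain`), the key, the seven-day lifetime and the function names are all assumptions made for this example; they are not the SES or TMDA wire formats. The point it shows is that an unsigned recipient of null-sender mail is still accepted at RCPT TO and only rejected once DATA has been seen.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"example-secret"  # hypothetical key, for illustration only
MAX_AGE_DAYS = 7                # assumed lifetime of a signed address


def _tag(local, day_text):
    """Short keyed hash over the local part and the day stamp."""
    payload = f"{local}={day_text}".encode()
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()[:8]


def sign_sender(local, domain, now=None):
    """Build a timestamped, signed envelope sender (made-up layout)."""
    day = int((time.time() if now is None else now) // 86400)
    return f"{local}+ses={day}={_tag(local, str(day))}@{domain}"


def is_signed_recipient(address, now=None):
    """True if the address carries a valid, unexpired signature."""
    localpart, _, _domain = address.rpartition("@")
    base, sep, token = localpart.partition("+ses=")
    if not sep:
        return False
    day_text, sep, tag = token.partition("=")
    if not sep or not (day_text.isascii() and day_text.isdigit()):
        return False
    if not hmac.compare_digest(tag.encode(), _tag(base, day_text).encode()):
        return False
    today = int((time.time() if now is None else now) // 86400)
    return 0 <= today - int(day_text) <= MAX_AGE_DAYS


def smtp_reply(phase, mail_from, rcpt_to, now=None):
    """Return an SMTP reply code for phase 'RCPT' or 'DATA'."""
    if mail_from != "":          # only null-sender mail is checked
        return 250
    if is_signed_recipient(rcpt_to, now):
        return 250
    # Unsigned recipient of a bounce: accept at RCPT TO so that SAV
    # callouts (which stop at RSET) succeed, and reject after DATA.
    return 250 if phase == "RCPT" else 550
```

A real deployment would hook `smtp_reply` into the MTA's own policy mechanism; the two-phase decision is the only part taken from the discussion above.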
The second potential problem with SES and similar schemes is how they interact with challenge/response systems, such as TMDA. Challenge/response systems send a message to the sender (typically the envelope sender) the first time they receive a message from that sender -- this is the challenge. The sender is required to respond to the challenge in some way (typically by replying to it or by clicking on a URL contained in the challenge) in order to verify that they really received the challenge. Some systems incorporate a CAPTCHA into the challenge to verify that the sender is human, too. Once the sender has successfully responded, they are whitelisted, and won't have to go through this process again. With an SES-like scheme, every message will come from a new, unknown envelope sender, and hence will trigger a challenge and require a response before the message can be delivered.

TMDA has a solution to this; since it implements both challenge/response and signed envelope senders, it has to. The solution is that any message using a signed sender should also contain an X-Primary-Address header, containing the primary (unsigned) address of the sender. Provided the value of the header passes a rudimentary sanity check (the domain matches that in the envelope sender), this address is whitelisted, rather than the envelope sender address. Subsequent whitelist checks test both the envelope sender and the X-Primary-Address header, and let the message through if either matches. I would therefore suggest that any SES scheme should add an X-Primary-Address header, since if nothing else this will ensure it plays nice with TMDA.

An alternative solution would be for challenge/response systems to try to parse the signed envelope sender and guess the underlying real sender, but the problem with that is that there are probably too many different encoding schemes in use (eg TMDA's dated addresses are completely unlike SRS addresses).
Perhaps a challenge/response system could fall back to this if there are known widely deployed SES systems that don't add an X-Primary-Address header.

The third (related) problem with SES and similar schemes is that they interact badly with greylisting. Greylisting is a technique where mail from a new, unrecognised envelope sender is delayed for a while (typically a couple of hours) by returning a 4xx code from the SMTP transaction. The idea is generally to see whether the originating IP address identifies itself as a spammer during this period (eg by hitting a spam trap), in which case the IP address will be blacklisted and subsequent delivery attempts will be rejected with a 5xx code. If nothing untoward happens during the holding period, then the envelope sender address is whitelisted, and subsequent delivery attempts from this envelope sender will succeed without delay.

The problem with an SES sender talking to a greylisting receiver is that _every_ message will be delayed for a couple of hours, since it will always be from a new, unknown sender address that hasn't been whitelisted. The problem here is much the same as for challenge/response, and the potential solutions would appear to be the same: either whitelist the address indicated in the X-Primary-Address header, or attempt to intuit the correct address yourself by parsing the signed sender address. There's an added wrinkle, though, with using X-Primary-Address. The greylisting system can no longer reject with 4xx after RCPT TO; it needs to see the entire message in order to check whether the address in the X-Primary-Address header has been whitelisted, so will have to reject after the DATA phase.

Message-ID systems are immune to these problems since they don't involve modifying the envelope sender. All bounce messages should contain the full headers of the original message, so matching on the Message-ID should be reliable.
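The message-id approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory dictionary stands in for whatever store an MUA or MTA would really use, and the class name, function names and 14-day retention period are invented for the example. It scans the whole incoming message (top-level headers and any quoted original headers) for Message-ID, In-Reply-To and References lines and accepts the message if any quoted id is one we recorded.

```python
import re
import time

ID_HEADERS = {"message-id", "in-reply-to", "references"}
ID_RE = re.compile(r"<[^<>\s]+@[^<>\s]+>")


class SentMessageIds:
    """Remembers the message-ids of mail we emitted (in memory only)."""

    def __init__(self, ttl_seconds=14 * 86400):  # assumed retention
        self.ttl = ttl_seconds
        self._sent = {}  # message-id -> time it was recorded

    def record(self, message_id, now=None):
        self._sent[message_id] = time.time() if now is None else now

    def is_known(self, message_id, now=None):
        now = time.time() if now is None else now
        sent = self._sent.get(message_id)
        return sent is not None and now - sent <= self.ttl


def candidate_ids(raw_message):
    """Collect ids from id-bearing header lines anywhere in the text."""
    ids = []
    in_id_header = False
    for line in raw_message.splitlines():
        if line[:1] not in (" ", "\t"):
            # A new (possibly quoted) header line or ordinary body text.
            name = line.split(":", 1)[0].strip().lower()
            in_id_header = ":" in line and name in ID_HEADERS
        # Lines starting with whitespace continue the previous header.
        if in_id_header:
            ids.extend(ID_RE.findall(line))
    return ids


def looks_like_our_bounce(raw_message, cache, now=None):
    """True if the message quotes a message-id we recorded earlier."""
    return any(cache.is_known(mid, now) for mid in candidate_ids(raw_message))
```

Bounces that indent the quoted original headers would be missed by this simple continuation-line handling; a real implementation would need to be more forgiving about how the original headers are embedded.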
Because Message-ID systems tend to play nicer with other anti-spam technologies, I'm inclined to favour them over SES schemes. However, although they're ideal for implementing in an MUA, they cause scalability problems if you want to implement them in a large MTA cluster: each MTA needs access to the complete list of message-ids of all messages sent by all MTAs in the cluster, in order to be able to validate bounces.

As an aside, there's a subcategory of message-id schemes which use a cryptographically signed message-id, rather than maintaining a list of all message-ids. Unfortunately this is only implementable in an MUA, since by the time an MTA receives the message it might already have a message-id, preventing the MTA from making its own choice here.

So I'd like to propose a third class of solution to this problem: the cryptographically signed cookie (CSC). The idea is that you add an additional header (X-CSC-Cookie, say) to all outgoing mail. For the sake of argument, this could contain a timestamp, and a keyed hash of the timestamp and message-id of the message. When you receive a bounce, you just need to extract the Message-ID and X-CSC-Cookie fields to validate it. This avoids the problems of SES, and is completely scalable in large MTA clusters, since the MTA only needs to know the key in order to perform verification.

There is another problem which all these solutions face, which is worth analysing, and that is how they handle automatically generated messages that are _not_ bounces, eg vacation messages, challenges from challenge/response systems, etc.

I'll dismiss the challenge/response case very quickly by observing that all challenges sent by a default TMDA install are syntactically identical to bounces. They are sent from the <> address to the envelope sender of the original message, and include the full headers of the original message.
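A minimal Python sketch of the X-CSC-Cookie idea follows. Only the header name and the ingredients (a timestamp plus a keyed hash of the timestamp and message-id) come from the proposal; the `timestamp:hexdigest` layout, the choice of HMAC-SHA256, the key and the 14-day lifetime are assumptions made for illustration. Note that verification needs nothing but the shared key, which is what makes the scheme attractive for a cluster.

```python
import hashlib
import hmac
import time

CLUSTER_KEY = b"shared-cluster-key"  # hypothetical key shared by all MTAs
MAX_AGE_SECONDS = 14 * 86400         # assumed lifetime of a cookie


def _mac(timestamp_text, message_id):
    """Keyed hash over the timestamp and the message-id."""
    payload = f"{timestamp_text}:{message_id}".encode()
    return hmac.new(CLUSTER_KEY, payload, hashlib.sha256).hexdigest()


def make_cookie(message_id, now=None):
    """Value for the X-CSC-Cookie header on an outgoing message."""
    timestamp_text = str(int(time.time() if now is None else now))
    return f"{timestamp_text}:{_mac(timestamp_text, message_id)}"


def verify_cookie(message_id, cookie, now=None):
    """Check a Message-ID / X-CSC-Cookie pair extracted from a bounce."""
    timestamp_text, sep, mac = cookie.partition(":")
    if not sep or not (timestamp_text.isascii() and timestamp_text.isdigit()):
        return False
    expected = _mac(timestamp_text, message_id)
    if not hmac.compare_digest(mac.encode(), expected.encode()):
        return False
    current = time.time() if now is None else now
    return 0 <= current - int(timestamp_text) <= MAX_AGE_SECONDS
```

For example, `verify_cookie(mid, make_cookie(mid))` holds for a freshly issued cookie, while the same cookie paired with a different message-id is rejected.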
So any of the above schemes (and indeed any other scheme that correctly handles bounce messages) will necessarily do the right thing for TMDA challenges, too. Of course, this may not be true of other challenge/response systems.

So, what happens when an autoresponder sends a message from the <> sender address? Well, if the autoresponder sends a message to the envelope sender, then SES is fine. If it sends it to some other address (header sender, address already on file) then SES will lose. Message-ID schemes will work either if the message includes full headers (typically not the case in vacation messages) or if it quotes the Message-ID in an In-Reply-To or References header. The cookie scheme fares somewhat worse than the Message-ID scheme here, though, since it needs the full headers of the message. And of course, if the autogenerated message was some kind of notification that was not generated in response to a message, all the above schemes lose.

Just some random thoughts,
-roy