Greylisting using Kamaelia

September 19, 2007 at 10:49 PM | categories: python, oldblog

I've written a greylisting server using Kamaelia, and it's turned my mail back into something usable. I've been running this server for 52 hours now, and it's processed over 5000 mails. 94% of those have been rejected as spam, leaving a handful of spams coming through from mailing lists. It's a spectacular change for me.

How does it work? Well, at its core, when someone connects, a mail handler is created, which is managed by this main loop:
def main(self):
    brokenClient = False
    self.handleConnect()
    self.gettingdata = False
    self.client_connected = True
    self.breakConnection = False

    while (not self.gettingdata) and (not self.breakConnection):
        yield WaitComplete(self.getline(), tag="_getline1")
        try:
            command = self.line.split()
        except AttributeError:
            brokenClient = True
            break
        self.handleCommand(command)
    if not brokenClient:
        if (not self.breakConnection):
            EndOfMessage = False
            self.netPrint('354 Enter message, ending with "." on a line by itself')
            while not EndOfMessage:
                yield WaitComplete(self.getline(), tag="getline2")
                if self.lastline():
                    EndOfMessage = self.endOfMessage()
            self.netPrint("250 OK id-deferred")

    self.send(producerFinished(),"signal")
    if not brokenClient:
        yield WaitComplete(self.handleDisconnect(),tag="_handleDisconnect")
    self.logResult()
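
The two phases of that loop (commands first, then message lines until a lone ".") can be tried out without Kamaelia. This is only a sketch: run_session is a made-up name, it reads from a plain list instead of yielding WaitComplete(self.getline()), it's written for Python 3, and it assumes that a DATA command is what switches the handler into the message phase.

```python
def run_session(lines):
    """Split received lines into (commands, body_lines).

    Hypothetical simplification of the main loop: every line is a command
    until a DATA command arrives, then every line is message body until a
    line holding only "." ends the message.
    """
    commands, body = [], []
    received = iter(lines)

    # Phase 1: command lines, split on whitespace as in main().
    for line in received:
        command = line.split()
        commands.append(command)
        if command and command[0] == "DATA":
            break

    # Phase 2: message lines, ending with "." on a line by itself.
    for line in received:
        if line.rstrip("\r\n") == ".":
            break
        body.append(line)

    return commands, body
```

Feeding it a HELO, MAIL, RCPT, DATA exchange followed by a short message gives back four commands and the message lines, with the terminating "." dropped.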

handleCommand then dispatches each SMTP command to its own handler:
def handleCommand(self,command):
    if len(command) < 1:
        self.netPrint("500 Sorry we don't like broken mailers")
        self.breakConnection = True
        return
    if command[0] == "HELO": return self.handleHelo(command) # RFC 2821 4.5.1 required
    if command[0] == "EHLO": return self.handleEhlo(command) # RFC 2821 4.5.1 required
    if command[0] == "MAIL": return self.handleMail(command) # RFC 2821 4.5.1 required
    if command[0] == "RCPT": return self.handleRcpt(command) # RFC 2821 4.5.1 required
    if command[0] == "DATA": return self.handleData(command) # RFC 2821 4.5.1 required
    if command[0] == "QUIT": return self.handleQuit(command) # RFC 2821 4.5.1 required
    if command[0] == "RSET": return self.handleRset(command) # RFC 2821 4.5.1 required
    if command[0] == "NOOP": return self.handleNoop(command) # RFC 2821 4.5.1 required
    if command[0] == "VRFY": return self.handleVrfy(command) # RFC 2821 4.5.1 required
    if command[0] == "HELP": return self.handleHelp(command)
    self.netPrint("500 Sorry we don't like broken mailers")
    self.breakConnection = True
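
The same dispatch could also be written as a lookup table. The following is a sketch of that alternative, not the code in the repository: CommandDispatcher is a hypothetical stand-in for MailHandler, only two handlers are filled in, netPrint just records replies in a list, the "250 OK" and "221 Bye" replies are my own choices, and it's written for Python 3.

```python
class CommandDispatcher:
    """Hypothetical stand-in for MailHandler; only dispatch is modelled."""

    # SMTP verb -> handler method name. The real table would list all
    # ten verbs from the if-chain; two are enough for a sketch.
    HANDLERS = {
        "NOOP": "handleNoop",
        "QUIT": "handleQuit",
    }

    def __init__(self):
        self.replies = []            # what netPrint would send to the client
        self.breakConnection = False

    def netPrint(self, line):
        self.replies.append(line)

    def handleCommand(self, command):
        # Empty lines and unknown verbs both end up with no handler.
        name = self.HANDLERS.get(command[0]) if command else None
        handler = getattr(self, name, None) if name else None
        if handler is None:
            self.netPrint("500 Sorry we don't like broken mailers")
            self.breakConnection = True
            return
        return handler(command)

    def handleNoop(self, command):
        self.netPrint("250 OK")

    def handleQuit(self, command):
        self.netPrint("221 Bye")
        self.breakConnection = True
```

Like the if-chain, this matches verbs exactly as written, so a lower-case "noop" is treated as a broken mailer.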

In practical terms, MailHandler is subclassed by a ConcreteMailHandler that effectively enforces the normal SMTP command sequence. Its key hook, however, comes into play when we receive the DATA command:
def handleData(self, command):
    if not self.seenRcpt:
        self.error("503 valid RCPT command must precede DATA")
        return

    if self.shouldWeAcceptMail():
        self.acceptMail()
    else:
        self.deferMail()
Clearly the main hook here is "shouldWeAcceptMail" which defaults in ConcreteMailHandler to returning False.

In the actual class we instantiate to handle connections - GreyListingPolicy which subclasses ConcreteMailHandler - we customise shouldWeAcceptMail as follows:
def shouldWeAcceptMail(self):
    if self.sentFromAllowedIPAddress():
        return True           # Allowed hosts can always send to anywhere through us
    if self.sentFromAllowedNetwork():
        return True           # People on trusted networks can always do the same
    if self.sentToADomainWeForwardFor():
        try:
            for recipient in self.recipients:
                if self.whiteListed(recipient):
                    return True
                if not self.isGreylisted(recipient):
                    return False
        except Exception, e:
            print "Whoops", e
        return True # Anyone can always send to hosts we own

    # print "NOT ALLOWED TO SEND, no valid forwarding"
    return False
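
The hook arrangement (a base class that defers everything, and a subclass that overrides one method to set policy) can be shown in miniature. This is a sketch under assumptions: BaseHandler and TrustLocalhost are made-up classes, acceptMail and deferMail are replaced by log entries, and it's written for Python 3.

```python
class BaseHandler:
    """Hypothetical, cut-down analogue of ConcreteMailHandler."""

    def __init__(self):
        self.seenRcpt = False
        self.log = []   # records what happened, in place of real SMTP replies

    def shouldWeAcceptMail(self):
        # Default policy: accept nothing, so everything is deferred.
        return False

    def handleData(self):
        if not self.seenRcpt:
            self.log.append("503")   # RCPT must precede DATA
            return
        if self.shouldWeAcceptMail():
            self.log.append("accept")
        else:
            self.log.append("defer")


class TrustLocalhost(BaseHandler):
    """Hypothetical policy: only mail from the loopback address is accepted."""

    def __init__(self, peer):
        super().__init__()
        self.peer = peer

    def shouldWeAcceptMail(self):
        return self.peer == "127.0.0.1"
```

The point of the design is that handleData never changes: a new policy is just another subclass with its own shouldWeAcceptMail.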

Finally the actual core code for handling greylisting looks like this:
def isGreylisted(self, recipient):
    max_grey = 3000000
    too_soon = 180
    min_defer_time = 3600
    max_defer_time = 25000

    IP = self.peer
    sender = self.sender
    def _isGreylisted(greylist, seen, IP,sender,recipient):
        # If greylisted, and not been there too long, allow through
        if greylist.get(triplet,None) is not None:
            greytime = float(greylist[triplet])
            if (time.time() - greytime) > max_grey:
                del greylist[triplet]
                try:
                    del seen[triplet]
                except KeyError:
                    # We don't care if it's already gone
                    pass
                print "REFUSED: grey too long"
            else:
                print "ACCEPTED: already grey (have reset greytime)" ,
                greylist[triplet] = str(time.time())
                return True
        # If not seen this triplet before, defer and note triplet
        if seen.get( triplet, None) is None:
            seen[triplet] = str(time.time())
            print "REFUSED: Not seen before" ,
            return False

        # If triplet retrying waaay too soon, reset their timer & defer
        last_tried = float(seen[triplet])
        if (time.time() - last_tried) < too_soon:
            seen[triplet] = str(time.time())
            print "REFUSED: Retrying waaay too soon so resetting you!" ,
            return False
   
        # If triplet retrying too soon generally speaking just defer
        if (time.time() - last_tried) < min_defer_time :
            print "REFUSED: Retrying too soon, deferring" ,
            return False
   
        # If triplet hasn't been seen in aaaages, defer
        if (time.time() - last_tried) > max_defer_time :
            seen[triplet] = str(time.time())
            print "REFUSED: Retrying too late, sorry - resetting you!" ,
            return False
   
        # Otherwise, allow through & greylist them
        print "ACCEPTED: Now added to greylist!" ,
        greylist[triplet] = str(time.time())
        return True

    greylist = anydbm.open("greylisted.dbm","c")
    seen = anydbm.open("attempters.dbm","c")
    triplet = repr((IP,sender,recipient))
    result = _isGreylisted(greylist, seen, IP,sender,recipient)
    seen.close()
    greylist.close()
    return result

All of which is pretty compact, and I suspect is pretty OK for people to follow. The rest of the code in the file is really about dealing with errors and abuse of the SMTP code. (The reaction to which is to disconnect, telling the sender to retry later.)
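
To see how the timing windows in isGreylisted interact, here is a sketch of the same decision sequence with the dbm files swapped for plain dicts and the clock passed in as an argument. greylist_decision and its reason strings are names I've made up for illustration, the thresholds are the ones from the code above, and it's written for Python 3. As in the original, "accepted" means the mail is let through.

```python
# Thresholds copied from isGreylisted, in seconds.
MAX_GREY = 3000000
TOO_SOON = 180
MIN_DEFER_TIME = 3600
MAX_DEFER_TIME = 25000


def greylist_decision(greylist, seen, triplet, now):
    """Return (accepted, reason) for one delivery attempt.

    greylist and seen are plain dicts mapping triplet -> timestamp,
    standing in for the two dbm files. now is the current time in seconds.
    """
    if triplet in greylist:
        if now - greylist[triplet] > MAX_GREY:
            # Grey entry has expired: forget the triplet and start over.
            del greylist[triplet]
            seen.pop(triplet, None)
        else:
            greylist[triplet] = now
            return True, "already grey"

    if triplet not in seen:
        seen[triplet] = now
        return False, "not seen before"

    elapsed = now - seen[triplet]
    if elapsed < TOO_SOON:
        seen[triplet] = now          # retried far too quickly: reset the timer
        return False, "way too soon"
    if elapsed < MIN_DEFER_TIME:
        return False, "too soon"     # timer left running
    if elapsed > MAX_DEFER_TIME:
        seen[triplet] = now          # retried too late: reset the timer
        return False, "too late"

    greylist[triplet] = now
    return True, "now greylisted"
```

With these thresholds, a first attempt is always refused, and a retry is only let through if it arrives between 3600 and 25000 seconds after the last recorded attempt. A retry inside the first 180 seconds moves that recorded time forward, so an impatient sender waits longer.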

At present I'm ironing out some remaining issues (some people simply don't disconnect and need booting), and the code also depends on versions of Axon & Kamaelia that are sitting on my Scratch branch. All that said, you can check out the code (link is to web svn) here using this command line:
svn co https://kamaelia.svn.sourceforge.net/svnroot/kamaelia/trunk/Sketches/MPS/Grey Grey
You can get the Axon & Kamaelia versions you need from this command line:
svn co https://kamaelia.svn.sourceforge.net/svnroot/kamaelia/branches/private_MPS_Scratch Kamaelia
Install the contents of the Axon directory, then the Kamaelia directory by doing "python setup.py install" in each.

You can then configure the greylisting code, by changing the class GreylistServer, which for me looks like this:
class GreylistServer(MoreComplexServer):
    socketOptions=(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    port = 25
    class protocol(GreyListingPolicy):
        servername = "mail.cerenity.org"  # Server name we greet the world with
        serverid = "MPS-SMTP 1.0"         # Server type we declare ourselves to be
        smtp_ip = "192.168.2.9"  # SMTP server we forward to
        smtp_port = 8025         # SMTP server port we forward to
        allowed_senders = ["127.0.0.1"]
        allowed_sender_nets = ["192.168.2"] # Yes, only class C network style
        allowed_domains = [ "private.thwackety.com",
                            "thwackety.com",
                            "yeoldeclue.com",
                            # ... other domains snipped ...
                            "kamaelia.org",
                            "owiki.org",
                            "cerenity.org"
        ]
        whitelisted_triples = [
             # IP, claimed sender (MAIL FROM:), recipient from "RCPT TO:"
             ( "213.38.186.202", "<post@mx1.redcats.co.uk>", "<...email censored...>"),
        ]
        whitelisted_nonstandard_triples = [
             # claimed hostname, IP prefix (can be full IP), recipient from "RCPT TO:"
             ("listmail.artsfb.org.uk", "62.73.155.19", "<...email censored...>"),
             ("domainwithborkedmailer.com", "204.15.20", "<...email censored...>"),
             ("adomainwithborkedmailer.com", "204.15.20", "<...email censored...>"),
             ("yetanotherdomainwithborkedmailer.com", "204.15.20", "<...email censored...>"),
             ("andanotherdomainwithborkedmailer.com", "204.15.20", "<...email censored...>"),
        ]

I've blanked out the email addresses, since there's no point in encouraging more spam... :-)
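
This post doesn't show how those whitelist entries are matched, so the following is only a sketch of one way the "IP prefix (can be full IP)" idea might be checked. ip_in_prefix and matches_nonstandard_triple are hypothetical names, not functions from the repository, and it's written for Python 3. The sketch compares on dotted boundaries, because a bare string prefix test on "192.168.2" would also match "192.168.20.5".

```python
def ip_in_prefix(ip, prefix):
    """True if ip equals prefix, or sits under it on a dotted-octet boundary."""
    return ip == prefix or ip.startswith(prefix.rstrip(".") + ".")


def matches_nonstandard_triple(triples, helo_name, ip, recipient):
    """Check (claimed hostname, IP prefix, recipient) entries.

    Assumes the hostname and recipient must match exactly and the
    connecting IP must fall under the listed prefix.
    """
    return any(
        helo_name == name and ip_in_prefix(ip, prefix) and recipient == rcpt
        for name, prefix, rcpt in triples
    )
```

The same ip_in_prefix check would work for entries like allowed_sender_nets = ["192.168.2"].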

I'll be packaging this up properly at some point when I'm happy with the code. In the meantime if anyone grabs it and uses it from SVN, I'd be interested in hearing how you get on :-)