A simple agent for filtering spam

Posted 2011-08-19 11:53 under spam, python, linux

I run my own Linux server, with Postfix for handling mail. I get a lot of spam, and I have used a number of filters that can be tacked on to Postfix, but I have always found them to be a pain. Here's a solution I developed that takes a different approach and works really well for me.

There are two problems. First of all, filters such as SpamAssassin require an external program such as fetchmail to separate spam from good mail. Fetchmail's rules are bizarre, and its configuration is rather complex, in my view.

The second problem is that the filtering techniques themselves (Bayesian filtering, blacklists and whitelists, etc.) are more complex than they need to be. I have noticed that the vast majority of spam is very easy to identify by looking at the To, From, and Subject headers. A simple keyword search on these headers catches almost all of my spam.

The solution I developed is a Python program that checks these headers and marks as spam any message in which one of the three contains a keyword or phrase from a list. The keyword list is currently about 250 entries long, and includes laughable words such as "enlarg, penis, viagra, bigger, hottie" and many unmentionables that have nothing to do with my business or with the topics I discuss by e-mail. Believe it or not, this works as well as the Bayesian filters, as long as I can update the list easily (see below). And it produces almost no false positives.
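A minimal sketch of the header check, assuming case-insensitive substring matching (which the stem "enlarg" in the list suggests) and a keyword list already loaded into a Python list. The names `is_spam` and `header_text` are hypothetical, not taken from zapspam.py. Encoded headers such as `=?utf-8?q?...?=` are decoded first, since spam often hides its subject that way.

```python
import email
from email.header import decode_header, make_header

# The three headers the post says are inspected.
CHECKED_HEADERS = ("To", "From", "Subject")


def header_text(msg, name):
    """Return one header as lower-case text, decoding MIME encoded-words."""
    raw = msg.get(name)
    if raw is None:
        return ""
    try:
        return str(make_header(decode_header(str(raw)))).lower()
    except Exception:
        # Malformed encodings are common in spam; fall back to the raw value.
        return str(raw).lower()


def is_spam(raw_headers, keywords):
    """Return True if any keyword or phrase occurs in To, From or Subject.

    Assumption: matching is case-insensitive substring matching, so a
    stem such as "enlarg" also catches "enlarge" and "enlargement".
    """
    msg = email.message_from_string(raw_headers)
    texts = [header_text(msg, h) for h in CHECKED_HEADERS]
    return any(k.lower() in t for t in texts for k in keywords)
```

The trade-off of substring matching is that a short keyword can also match inside an unrelated longer word, so stems need to be chosen with some care.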

To solve the Postfix integration problem, my spam filter sidesteps it entirely: there is no integration with Postfix at all. Instead, it runs on its own every few minutes, logs into my mailbox as me, and checks the headers of all my messages. Any message that looks like spam, it (a) copies to a separate folder (which I can check once a day or so) and (b) deletes from my main inbox.
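The post doesn't say how the script logs in. Assuming IMAP over SSL via Python's standard `imaplib`, the copy-then-delete sweep might look roughly like this. `sweep_inbox`, `main` and the folder name "Spam" are hypothetical; the header check is passed in as a function so this sketch doesn't depend on any particular keyword logic. Headers are fetched with `BODY.PEEK` so that checking a message doesn't mark it as read.

```python
import imaplib

# Fetch only the three headers; PEEK leaves the \Seen flag untouched.
HEADER_PART = "(BODY.PEEK[HEADER.FIELDS (TO FROM SUBJECT)])"


def sweep_inbox(imap, looks_like_spam, spam_folder="Spam"):
    """Move messages whose headers look like spam out of the selected mailbox.

    `imap` is a logged-in imaplib connection with INBOX selected;
    `looks_like_spam` takes raw header text and returns a bool.
    Returns the UIDs (as strings) that were moved.
    """
    typ, data = imap.uid("SEARCH", None, "ALL")
    if typ != "OK":
        return []
    moved = []
    for uid in data[0].split():
        typ, parts = imap.uid("FETCH", uid, HEADER_PART)
        if typ != "OK" or not parts or not isinstance(parts[0], tuple):
            continue
        headers = parts[0][1].decode("utf-8", errors="replace")
        if not looks_like_spam(headers):
            continue
        # (a) copy to the spam folder, then (b) flag for deletion,
        # but only if the copy succeeded.
        typ, _ = imap.uid("COPY", uid, spam_folder)
        if typ == "OK":
            imap.uid("STORE", uid, "+FLAGS", r"(\Deleted)")
            moved.append(uid.decode())
    if moved:
        imap.expunge()  # permanently remove the flagged messages
    return moved


def main(host, user, password, looks_like_spam):
    """Hypothetical entry point a cron job could call."""
    imap = imaplib.IMAP4_SSL(host)
    try:
        imap.login(user, password)
        imap.select("INBOX")
        return sweep_inbox(imap, looks_like_spam)
    finally:
        imap.logout()
```

Copying before deleting means a failed copy leaves the message in the inbox rather than losing it.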

I have set up a cron job to run the script every 5 minutes or so, and leave Postfix running with minimal spam filtering enabled.

To add words to the spam keyword list, just run the script by hand with the new words as arguments (phrases have to be in quotes so that each phrase is passed as one entry rather than as several words):

zapspam.py biotic hardon "get rid of"
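A rough sketch of the add-keywords mode, assuming the list is a plain text file with one keyword or phrase per line. The file location and the names `load_keywords` and `add_keywords` are hypothetical, and the mail-checking path (running with no arguments) is omitted. The shell has already grouped each quoted phrase into a single argument, so every item in `sys.argv[1:]` becomes one entry.

```python
import sys
from pathlib import Path

# Hypothetical location; the post doesn't say where the keyword list lives.
KEYWORD_FILE = Path.home() / ".zapspam_keywords"


def load_keywords(path):
    """Read one keyword or phrase per line, ignoring blank lines."""
    if not path.exists():
        return []
    return [line.strip() for line in path.read_text().splitlines() if line.strip()]


def add_keywords(path, new_words):
    """Add entries not already present (case-insensitively); return those added."""
    current = load_keywords(path)
    seen = {k.lower() for k in current}
    added = []
    for word in new_words:
        word = word.strip()
        if word and word.lower() not in seen:
            seen.add(word.lower())
            added.append(word)
    if added:
        # Rewrite the whole file so it always ends with a newline.
        path.write_text("\n".join(current + added) + "\n")
    return added


if __name__ == "__main__":
    # e.g. zapspam.py biotic hardon "get rid of"
    if len(sys.argv) > 1:
        for w in add_keywords(KEYWORD_FILE, sys.argv[1:]):
            print("added:", w)
```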

You can download the script here, and a sample keyword list here.
