Mail-in-a-Box: tuning SpamAssassin

14th May 2025 sys-admin bash mail-servers Guides

Improve spam processing on your MiaB

What is Mail-in-a-Box?

Mail-in-a-Box (MiaB) is a collection of tightly coupled scripts that deploy and manage a Postfix/Dovecot/Postgrey/SpamAssassin mail server. Put simply, MiaB simplifies the job of deploying and managing an email server. It does a great job of correctly configuring and hardening Postfix and Dovecot. However, I’ve found MiaB's SpamAssassin configuration to be a bit lacking out of the box.

To be fair, this is likely because the developers err on the side of caution: SpamAssassin tuning can be tricky, and a configuration that is too aggressive risks deliverability issues. This post walks through the steps I take when tuning a client’s MiaB instance for better spam filtering.

An Overview of How SpamAssassin Works

Before we get into tuning SpamAssassin (SA), it’s helpful to understand the mechanisms SA uses to assign a spam score.

In simple terms, SA uses a scoring system. When Postfix hands an email off to SA, it is scanned against a variety of rules. Each rule has a score. If a rule matches during the scan, its score is added to the total. If the total exceeds a set threshold, the email is classified as spam.
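The scoring idea above can be sketched in a few lines of shell. This is only a toy model: the `classify` function and the scores passed to it are made up for illustration, and real SA evaluates its rules in Perl rather than bash.

```shell
#!/bin/bash
# Toy model of SpamAssassin's scoring: add up the scores of the rules that
# matched and compare the total with the threshold.
# The function name and the example scores are illustrative, not SA's own.

# classify THRESHOLD SCORE...  -> prints "spam <total>" or "ham <total>"
classify() {
  local threshold="$1"; shift
  # awk does the decimal arithmetic that bash cannot do natively
  printf '%s\n' "$@" | awk -v t="$threshold" '
    { total += $1 }
    END {
      verdict = (total >= t) ? "spam" : "ham"
      printf "%s %.1f\n", verdict, total
    }'
}

# Three hypothetical rule hits: 1.5 + 2.0 + 0.8 = 4.3, above a 3.0 threshold
classify 3.0 1.5 2.0 0.8    # prints: spam 4.3
```

With a threshold of 5.0 the same three hits would stay under the line and the message would be treated as ham.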

SA uses three primary methods to determine spam:

Blacklists
Regex rules
Bayesian learning

Blacklists flag mail from known-bad senders, domains, and networks. They are typically defined by the administrator. In SA a blacklist match adds a large score to the message, which in practice pushes it over the spam threshold.

Regex rules scan the raw content of an email. If a rule matches, its score is added. You can write your own rules to detect common spam or phishing patterns. (Custom rule creation will be covered in a future post.)

Bayesian classification is machine learning–based. Since spam can look different across users and organizations, the Bayes system learns over time. The classifier is trained when a user moves an email to their spam or inbox folder. When an incoming message is scanned, it’s compared to known spam/ham and assigned a probability. That probability is then converted to a score.
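To make the last step concrete: SA turns the Bayes probability into a score by firing one of a family of BAYES_* rules, each covering a probability range and carrying its own score. The sketch below maps a probability to a rule name. The `bayes_bucket` function is hypothetical, and the bucket edges reflect my reading of the stock ruleset, so check the 23_bayes.cf shipped with your SA version before relying on them.

```shell
#!/bin/bash
# bayes_bucket PROBABILITY -> prints the BAYES_* rule covering that probability.
# Assumption: bucket edges as I understand the stock rules; verify locally.
bayes_bucket() {
  awk -v p="$1" 'BEGIN {
    p += 0   # force numeric comparison
    if      (p < 0.01) r = "BAYES_00"
    else if (p < 0.05) r = "BAYES_05"
    else if (p < 0.20) r = "BAYES_20"
    else if (p < 0.40) r = "BAYES_40"
    else if (p < 0.60) r = "BAYES_50"
    else if (p < 0.80) r = "BAYES_60"
    else if (p < 0.95) r = "BAYES_80"
    else if (p < 0.99) r = "BAYES_95"
    else               r = "BAYES_99"
    print r
  }'
}

bayes_bucket 0.995   # prints: BAYES_99
bayes_bucket 0.5     # prints: BAYES_50
```

The low buckets carry negative scores (evidence of ham) and the high ones positive scores, which is why a well-trained classifier can pull mail in both directions.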

End-User Training

For Bayesian classification to work well, users must understand that spam should be moved to the Spam folder, not the trash. Dovecot (in MiaB’s setup) is configured to trigger SpamAssassin’s learning when an email is moved to Spam. Emails deleted or trashed are not learned from.

Likewise, false positives should be moved back to the inbox so SA can learn they are ham.

Note on mailing lists:
Many people advise clicking “unsubscribe” on unwanted mail. This is very bad advice today. Attackers frequently use fake unsubscribe links to launch phishing attacks. Train your users to put unsolicited or suspicious mail in the Spam folder instead.

Tuning

Now that we understand how SA classifies spam, let’s improve its behavior on MiaB.

1. Enable Bayes

Open the SpamAssassin config:

sudo vi /etc/spamassassin/local.cf

Add or uncomment:

use_bayes 1
bayes_auto_learn 1
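After editing, it is worth confirming that the options are really active and not still commented out. A minimal sketch follows, using a hypothetical `conf_value` helper that prints the last uncommented value of a key; it runs against a throwaway file here so it can be tried safely.

```shell
#!/bin/bash
# conf_value FILE KEY -> prints the value of the last uncommented "KEY value"
# line in FILE, or nothing if the key is not set.
# (Hypothetical helper for illustration; not part of SpamAssassin or MiaB.)
conf_value() {
  local file="$1" key="$2"
  awk -v k="$key" '$1 == k { v = $2 } END { if (v != "") print v }' "$file"
}

# Demo against a temporary file rather than the live config
demo_cf="$(mktemp)"
printf '%s\n' '# use_bayes 0' 'use_bayes 1' 'bayes_auto_learn 1' > "$demo_cf"
conf_value "$demo_cf" use_bayes          # prints: 1
conf_value "$demo_cf" required_score     # prints nothing: not set
```

On the server itself you would point it at `/etc/spamassassin/local.cf`. SpamAssassin's own `spamassassin --lint` is also useful at this point, since it reports syntax errors in the configuration.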

2. Enable Shortcircuiting for Bayes

In the same file (`/etc/spamassassin/local.cf`), add:

shortcircuit BAYES_99 spam

A shortcircuit rule tells SpamAssassin to stop further rule processing and treat the email as spam as soon as this rule matches.
BAYES_99 fires when the classifier estimates the probability of spam at 99% or higher.

Adjust the required score:

required_score 3.0

I default to 3.0 for stricter spam detection (SpamAssassin's stock default is 5.0). You can tweak this based on your tolerance. A lower threshold is more aggressive but may increase false positives, so monitor closely after changing it.
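One way to monitor the effect of a new threshold is to read the X-Spam-Status header that SA adds to scanned mail. The sketch below pulls the verdict, score, and threshold out of a message file. The `spam_status` helper is hypothetical and assumes the usual `Yes, score=… required=…` header layout; the sample message is made up.

```shell
#!/bin/bash
# spam_status FILE -> prints "<verdict> <score> <required>" taken from the
# first X-Spam-Status header in FILE (assumes SA's usual header layout).
spam_status() {
  sed -n 's/^X-Spam-Status: \([A-Za-z]*\), score=\(-\{0,1\}[0-9.]*\) required=\([0-9.]*\).*/\1 \2 \3/p' "$1" | head -n 1
}

# Demo on a made-up message; a real one would come from a Maildir folder
demo_msg="$(mktemp)"
cat > "$demo_msg" <<'EOF'
From: sender@example.com
X-Spam-Status: Yes, score=7.3 required=3.0 tests=BAYES_99,HTML_MESSAGE
	autolearn=no
Subject: hello

body text
EOF
spam_status "$demo_msg"    # prints: Yes 7.3 3.0
```

Looping this over a user's Spam and inbox folders for a few days gives a rough picture of how close legitimate mail is scoring to the threshold. The Maildir location depends on your install.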

3. Ensure the Shortcircuit Plugin Is Enabled

Check if it’s loaded:

grep -i shortcircuit /etc/spamassassin/v320.pre

If the line is commented out, open the file:

sudo vi /etc/spamassassin/v320.pre

and remove the leading # so that the line reads:

loadplugin Mail::SpamAssassin::Plugin::Shortcircuit
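The grep above shows the raw line, but it is easy to misread a commented line as an active one. A small sketch that reports the state explicitly is shown below; `plugin_state` is a hypothetical helper, demonstrated on a temporary file rather than the live v320.pre.

```shell
#!/bin/bash
# plugin_state FILE PLUGIN -> prints "enabled", "commented", or "missing"
# depending on how the loadplugin line for PLUGIN appears in FILE.
# (Hypothetical helper for illustration.)
plugin_state() {
  local file="$1" plugin="$2"
  if grep -Eq "^[[:space:]]*loadplugin[[:space:]]+${plugin}([[:space:]]|\$)" "$file"; then
    echo enabled
  elif grep -Eq "^[[:space:]]*#[[:space:]]*loadplugin[[:space:]]+${plugin}([[:space:]]|\$)" "$file"; then
    echo commented
  else
    echo missing
  fi
}

# Demo: a file where the plugin line is still commented out
demo_pre="$(mktemp)"
printf '%s\n' '# loadplugin Mail::SpamAssassin::Plugin::Shortcircuit' > "$demo_pre"
plugin_state "$demo_pre" 'Mail::SpamAssassin::Plugin::Shortcircuit'   # prints: commented
```

Note that the check tolerates whitespace between the # and loadplugin, since commented lines can be written either way.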

4. Restart SpamAssassin

sudo systemctl restart spamassassin spampd

Post-Update Script

MiaB has a gotcha when it comes to custom changes: whenever you perform an update/upgrade, they are wiped out. To make re-implementing them easier, we'll put all of the above into a post-update script:

#!/bin/bash
# Re-apply custom SpamAssassin settings after a Mail-in-a-Box upgrade.
# Run as root (for example via sudo).

# Paths to the SpamAssassin config files
LOCAL_CF="/etc/spamassassin/local.cf"
PLUGIN_FILE="/etc/spamassassin/v320.pre"

# Back up the current configs, using one shared timestamp for both files
STAMP="$(date +%F-%H%M%S)"
cp "$LOCAL_CF" "$LOCAL_CF.bak.$STAMP"
cp "$PLUGIN_FILE" "$PLUGIN_FILE.bak.$STAMP"

echo "Re-applying SpamAssassin configuration..."

# set_option KEY VALUE: rewrite the KEY line if an active one exists
# (so a stale value such as "use_bayes 0" is corrected), otherwise append it
set_option() {
  if grep -q "^$1[[:space:]]" "$LOCAL_CF"; then
    sed -i "s|^$1[[:space:]].*|$1 $2|" "$LOCAL_CF"
  else
    echo "$1 $2" >> "$LOCAL_CF"
  fi
}

set_option use_bayes 1
set_option bayes_auto_learn 1
set_option "shortcircuit BAYES_99" spam
set_option required_score 3.0

# Enable the Shortcircuit plugin. The line may be commented out as
# "#loadplugin ..." or "# loadplugin ...", so allow whitespace after the "#"
sed -i 's/^#[[:space:]]*\(loadplugin Mail::SpamAssassin::Plugin::Shortcircuit\)/\1/' "$PLUGIN_FILE"

echo "Restarting SpamAssassin services..."
systemctl restart spamassassin spampd

echo "Done"

Final Notes

Remember, the Bayesian classifier must be trained before it becomes accurate. It usually needs to see a fair number of examples of a given type of spam before it catches that type reliably.
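To see how far training has come, `sa-learn --dump magic` prints the Bayes database counters, including the number of spam and ham messages learned. By default SA does not apply Bayes results until it has learned at least 200 of each. The sketch below summarizes those counters. `bayes_counts` is a hypothetical helper, the sample lines are made up, and the column layout is my assumption of the usual `sa-learn` output, so compare it with what your server prints.

```shell
#!/bin/bash
# bayes_counts: reads `sa-learn --dump magic` output on stdin and prints
# "nspam=<n> nham=<n> ready=<yes|no>".
# ready=yes means both counts have reached SA's default minimum of 200.
# Assumption: the count is the third whitespace-separated column.
bayes_counts() {
  awk '
    /non-token data: nspam/ { nspam = $3 }
    /non-token data: nham/  { nham  = $3 }
    END {
      ready = (nspam >= 200 && nham >= 200) ? "yes" : "no"
      printf "nspam=%d nham=%d ready=%s\n", nspam, nham, ready
    }'
}

# Demo with made-up counters instead of a live database
bayes_counts <<'EOF'
0.000          0          3          0  non-token data: bayes db version
0.000          0        150          0  non-token data: nspam
0.000          0        420          0  non-token data: nham
EOF
# prints: nspam=150 nham=420 ready=no
```

On a live system the input would come from `sa-learn --dump magic | bayes_counts`, run as whichever user owns the Bayes database on your install.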

In an upcoming post, I’ll cover how to write custom regex rules to further improve phishing and spam detection.