SpamAssassin is very configurable. Almost every setting can be configured on a system-wide or user-specific basis.
Reasons to customize
If SpamAssassin is so good, then why configure it? Well, there are several reasons why it’s worth improving spam filtering with SpamAssassin.
- SpamAssassin by default (that is, when installed but not customized) typically manages to detect over 80% of spam. After adding a few customizations, the detection rate can be greater than 95%.
- Everyone’s spam is different and one user’s spam might look like another user’s ham. By trying to be general, SpamAssassin may fail to filter spam for every user.
- Some of the features of SpamAssassin are disabled by default. By enabling them, the spam recognition rate is increased.
The following configuration options are discussed in this article:
- Altering the scores for rules: This allows rules to be disabled, poor rules to be given less weight, and better rules to be given a higher weight.
- Obtaining and using new rules: This can improve spam detection.
- Adding e-mail addresses to whitelists and blacklists: This allows e-mail from specified senders to always be treated as ham (whitelist) or always as spam (blacklist), no matter what the content is.
- Enabling SpamAssassin’s Bayesian filter: This can increase filtering accuracy from 80% to 95% or more.
Rules and scores
The configuration files for standard, sitewide, and user-specific settings are saved in different directories as follows:
- Standard configuration settings are stored in /usr/share/spamassassin.
- Site-wide customizations and settings are stored in /etc/mail/spamassassin/. All files matching *.cf are examined by SpamAssassin.
- User-specific settings are stored in ~/.spamassassin/user_prefs.
The bulk of the standard configuration files is devoted to simple rules and their scores.
A rule typically matches a sequence of letters, numbers, or other printable characters. Rules are written as regular expressions, or regex for short, which is a compact notation for specifying the combinations of characters that will trigger the rule. A rule might try to detect a particular word, such as “Rolex”, or it might look for particular words in a certain order, such as “buy Rolex online”. The rules are stored in text files.
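As a rough illustration of how such patterns behave, the sketch below uses Python's re module as a stand-in for the Perl regular expressions that SpamAssassin itself uses. The two patterns and the fired() helper are hypothetical and are not taken from any shipped ruleset.

```python
import re

# Hypothetical patterns, written only to illustrate the idea of a rule.
# \b marks a word boundary; re.IGNORECASE makes the match case-insensitive.
single_word = re.compile(r"\brolex\b", re.IGNORECASE)
phrase = re.compile(r"\bbuy\s+rolex\s+online\b", re.IGNORECASE)


def fired(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None


print(fired(single_word, "Cheap ROLEX watches"))
print(fired(phrase, "Buy  Rolex online today"))
print(fired(phrase, "Rolex buy online"))
```

The first pattern fires on the word alone, whatever its case, while the second only fires when the three words appear in that order.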
Default files are stored in /usr/share/spamassassin. These are files that are shipped with SpamAssassin and may change with each release. It’s best not to modify these files or place new files in this directory, as an upgrade to SpamAssassin will overwrite these files. Most of the rules that SpamAssassin uses, and the scores applied to each rule, are defined within files in this directory.
The defaults can be overwritten by sitewide configuration files. These are placed in /etc/mail/spamassassin. SpamAssassin will read all files matching *.cf in this directory. Settings made here can overrule those in the default files. They can include defining new rules and new rule scores.
User-specific customizations can be placed in the ~/.spamassassin/user_prefs file. Settings made here can override sitewide settings defined in /etc/mail/spamassassin, and default settings in /usr/share/spamassassin/. New rules may be defined here, and scores for existing rules can be overridden.
SpamAssassin first reads all the files in /usr/share/spamassassin in alphanumeric order, so 10_misc.cf is read before 23_bayes.cf. SpamAssassin then reads all the .cf files in /etc/mail/spamassassin/, again in alphanumeric order. Finally, SpamAssassin reads ~user/.spamassassin/user_prefs. If a rule or score is defined in more than one file, the setting in the last file read is used. This allows the administrator to override the defaults and a user to override the sitewide settings.
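The "last file read wins" behavior can be sketched in a few lines of Python. This is a simplified model, not SpamAssassin's own parser: it only understands plain numeric score lines, and the three configuration strings are made-up stand-ins for a default file, a sitewide file, and a user's preferences.

```python
import re

# Matches lines such as "score RULE_NAME 0 0 1.665 2.599".
SCORE_LINE = re.compile(r"^\s*score\s+(\S+)\s+(.+?)\s*$")


def effective_scores(config_texts):
    """Read configuration texts in order; a later definition replaces an earlier one."""
    scores = {}
    for text in config_texts:
        for line in text.splitlines():
            line = line.split("#", 1)[0]  # ignore comments
            match = SCORE_LINE.match(line)
            if match:
                values = [float(v) for v in match.group(2).split()]
                scores[match.group(1)] = values
    return scores


# Illustrative contents only, listed in the order SpamAssassin reads them.
default_cf = "score RULE_NAME 0 0 1.665 2.599\n"
site_cf = "# site override\nscore RULE_NAME 4.34\n"
user_prefs = "score RULE_NAME 1.2\n"

print(effective_scores([default_cf, site_cf, user_prefs]))
```

Passing only the first one or two texts shows the score that would apply without the later overrides.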
Each line in a rules file can be blank or contain a comment or a command. The hash or pound (#) symbol is used for comments. Rules generally have three parts: the rule definition, a textual description, and the score or series of scores. By convention, the scores for all rules provided by SpamAssassin are kept together in a separate file. That file is /usr/share/spamassassin/50_scores.cf.
Altering rule scores
The simplest configuration change is to change a rule score. There are two reasons why this might be done:
- A rule is very good at detecting spam, but the rule has a low score. E-mails that fire the rule are not being detected as spam.
- A rule is firing on ham. As a result, e-mails that fire the rule are wrongly being detected as spam.
The rules that give a positive result when SpamAssassin is run are listed in the X-Spam-Status: header of the e-mail:
X-Spam-Status: Yes, score=5.8 required=5.0 tests=BAYES_05,HTML_00_10,
The rules applied to the e-mail are listed after tests=. If one continually appears in e-mail that should be marked as spam, but isn’t, then the score for the rule should be increased. If a rule often fires in e-mail that is wrongly classified as spam, the score should be decreased.
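When many messages need to be checked, the header can be read programmatically. The following is a minimal sketch, assuming the header value has the shape shown above; parse_spam_status() is a hypothetical helper and not part of SpamAssassin.

```python
import re


def parse_spam_status(header_value):
    """Split an X-Spam-Status value into verdict, score, threshold, and rule names."""
    verdict, _, rest = header_value.partition(",")
    # Long headers are folded across lines; rejoin a test list split after a comma.
    rest = re.sub(r",\s+", ",", rest)
    fields = {}
    for token in rest.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    tests = [name for name in fields.get("tests", "").split(",") if name]
    return {
        "is_spam": verdict.strip().lower() == "yes",
        "score": float(fields["score"]),
        "required": float(fields["required"]),
        "tests": tests,
    }


status = parse_spam_status("Yes, score=5.8 required=5.0 tests=BAYES_05,HTML_00_10,")
print(status["tests"])
```

Counting how often each rule name appears across misclassified messages gives a shortlist of scores worth adjusting.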
To find the current score, use the grep utility in all the locations where a score can be defined.
grep 'score.*RULE_NAME' /usr/share/spamassassin/* /etc/mail/spamassassin/*
/usr/share/spamassassin/50_scores.cf:score RULE_NAME 0 0 1.665 2.599
/etc/mail/spamassassin/local_scores.cf:score RULE_NAME 4.34
In the previous example, the rule has a default score that is overridden in /etc/mail/spamassassin/local_scores.cf.
The original score for the rule had four values. SpamAssassin changes the scores it uses, depending on whether network tests (for example, those that test open relays) are in use and whether the Bayesian Filter is in use. Four scores are listed, which are used in the following circumstances:
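- First score: used when neither the Bayesian filter nor network tests are in use.
- Second score: used when network tests are in use but the Bayesian filter is not.
- Third score: used when the Bayesian filter is in use but network tests are not.
- Fourth score: used when both the Bayesian filter and network tests are in use.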
If only one score is given, as overridden in /etc/mail/spamassassin/local_scores.cf, it is used in all circumstances.
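The choice between the four values can be sketched as a small lookup. The ordering used here (neither feature, network tests only, Bayesian filter only, both) follows the SpamAssassin configuration documentation; select_score() is a hypothetical helper written for illustration.

```python
def select_score(scores, bayes_enabled, network_tests_enabled):
    """Pick the score that applies for the features currently in use."""
    if len(scores) == 1:
        # A single value is used in all circumstances.
        return scores[0]
    if len(scores) != 4:
        raise ValueError("expected one or four scores")
    # Index 0: neither feature, 1: network tests only,
    # 2: Bayesian filter only, 3: both features.
    index = (2 if bayes_enabled else 0) + (1 if network_tests_enabled else 0)
    return scores[index]


default_scores = [0, 0, 1.665, 2.599]
print(select_score(default_scores, bayes_enabled=True, network_tests_enabled=False))
print(select_score([4.34], bayes_enabled=True, network_tests_enabled=True))
```

With the four default values shown earlier, the rule contributes nothing unless the Bayesian filter is enabled.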
In the previous example, the system administrator has overridden the default score with a single value in /etc/mail/spamassassin/local_scores.cf. To change this value for a particular user, their ~/.spamassassin/user_prefs might read:
score RULE_NAME 1.2
This changes the score used from 4.34, set in /etc/mail/spamassassin/local_scores.cf, to 1.2. To disable the rule entirely, the score can be set to zero.
score RULE_NAME 0
Endless hours can be spent configuring rule scores. SpamAssassin includes tools that recalculate optimal rule scores by examining existing e-mails, both spam and ham. They are covered in detail in the book SpamAssassin published by Packt.
Using other rulesets
SpamAssassin has a large following, and the design of SpamAssassin has made it easy to add new rulesets, which are sets of rules and default scores for those rules. There are many different rulesets available. Most are based on a particular theme, for example finding the names of drugs often sold with spam or telephone numbers found in spam e-mails. Most custom rulesets are listed on the Custom Rulesets page of the SpamAssassin Wiki at http://wiki.apache.org/spamassassin/CustomRulesets.
Because the battle against spam moves so quickly, some rulesets are updated as often as daily. SpamAssassin supports this with the sa-update utility. You can run sa-update on a regular basis, download a particular ruleset once and keep it, or manually update the rulesets that you choose. To obtain the best results in filtering spam, use of sa-update is recommended.
If you wish to install rulesets manually, the Wiki page gives a general description of each ruleset and a URL to download it. Once a ruleset has been chosen, we install it as follows:
- In a browser, follow the link on the SpamAssassin Wiki page. In most cases, the link will be to a file with a name matching *.cf, and a browser will open it as a text file.
- Save the file using the browser (normally, the File menu has a Save as option).
- Copy the file to /etc/mail/spamassassin—the rules will be automatically run if the file is placed in this location.
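- Check that the file has scores in it; a rule without an explicit score falls back to SpamAssassin's default score, which is unlikely to be the weight the ruleset author intended.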
- Monitor spam performance to ensure that legitimate e-mail is not being detected as spam.
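The score check in the steps above can be partly automated. The sketch below is an assumption-laden helper rather than a SpamAssassin tool: it recognizes only the common rule types (header, body, rawbody, uri, full, meta), skips sub-rules whose names begin with a double underscore, and uses a made-up sample ruleset.

```python
import re

RULE_DEF = re.compile(r"^\s*(?:header|body|rawbody|uri|full|meta)\s+(\w+)")
SCORE_DEF = re.compile(r"^\s*score\s+(\w+)\s+\S")


def rules_without_scores(cf_text):
    """Return the names of rules that are defined but never given a score line."""
    defined = []
    scored = set()
    for raw in cf_text.splitlines():
        line = raw.split("#", 1)[0]  # ignore comments
        match = RULE_DEF.match(line)
        if match:
            # Sub-rules (names starting with "__") are not scored on their own.
            if not match.group(1).startswith("__"):
                defined.append(match.group(1))
            continue
        match = SCORE_DEF.match(line)
        if match:
            scored.add(match.group(1))
    return [name for name in defined if name not in scored]


# Hypothetical ruleset contents, for illustration only.
sample = """\
body     LOCAL_ROLEX   /buy rolex online/i
describe LOCAL_ROLEX   Mentions buying a Rolex online
score    LOCAL_ROLEX   2.0
body     LOCAL_PILLS   /cheap pills/i
"""

print(rules_without_scores(sample))
```

An empty result suggests every rule in the file has been given an explicit score.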
Adding rules to SpamAssassin will increase the memory used by SpamAssassin, and the time that it takes to process e-mails. It is best to be cautious and add new rulesets gradually, to ensure that the effect on the machine is understood.
You may manually monitor the ruleset and update it on your system using the same process.
If you choose to use sa-update, you should plan your use of it. sa-update can use several channels, which are basically sources of rulesets. By default, the channel updates.spamassassin.org is used; another popular channel is the OpenProtect channel, called saupdates.openprotect.com.
To be effective, sa-update must be run regularly, for example via cron. Add a cron entry to your system that calls a command such as the following to update the base rulesets:
sa-update
If you use an additional channel, the command might look like:
sa-update --channel saupdates.openprotect.com
To protect against DNS poisoning and impersonation, SpamAssassin allows digital signing of rulesets. To use a signed ruleset, pass the --gpgkey parameter to sa-update. The correct value to use with the --gpgkey parameter is given on the SpamAssassin wiki page for the ruleset.
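If the update is driven from a script rather than typed by hand, the command line can be assembled from a list of channels and keys. The following is only a sketch: sa_update_command() is a hypothetical helper, and KEY_ID_FROM_WIKI is a placeholder for the key value published on the ruleset's wiki page, not a real key.

```python
import shlex


def sa_update_command(channels=(), gpg_keys=()):
    """Build the argument list for an sa-update run without executing it."""
    argv = ["sa-update"]
    for key in gpg_keys:
        argv += ["--gpgkey", key]
    for channel in channels:
        argv += ["--channel", channel]
    return argv


# Placeholder key value; substitute the one documented for the channel.
command = sa_update_command(
    channels=["saupdates.openprotect.com"],
    gpg_keys=["KEY_ID_FROM_WIKI"],
)
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run() from a scheduled job, or printed and pasted into a cron entry.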