SpamAssassin is very configurable. Almost every setting can be configured on a system-wide or user-specific basis.
If SpamAssassin is so good, then why configure it? Well, there are several reasons why it’s worth improving spam filtering with SpamAssassin.
The following configuration options are discussed in this article:
The configuration files for standard, sitewide, and user-specific settings are saved in different directories as follows:
The bulk of the standard configuration files is devoted to simple rules and their scores.
A rule is typically a match for letters, numbers, or other printing characters. Rules are written using a technique called regular expressions, or regex for short. This is a shorthand method of specifying that certain combinations of characters will trigger the rule. A rule might try to detect a particular word, such as “Rolex”, or it might look for particular words in certain orders, such as “buy Rolex online”. The rules are stored in text files.
Default files are stored in /usr/share/spamassassin. These are files that are shipped with SpamAssassin and may change with each release. It’s best not to modify these files or place new files in this directory, as an upgrade to SpamAssassin will overwrite these files. Most of the rules that SpamAssassin uses, and the scores applied to each rule, are defined within files in this directory.
The defaults can be overwritten by sitewide configuration files. These are placed in /etc/mail/spamassassin. SpamAssassin will read all files matching *.cf in this directory. Settings made here can overrule those in the default files. They can include defining new rules and new rule scores.
User-specific customizations can be placed in the ~/.spamassassin/local.cf file. Settings made here can override sitewide settings defined in /etc/mail/spamassassin, and default settings in /usr/share/spamassassin/. New rules may be defined here, and scores for existing rules can be overridden.
SpamAssassin first reads all the files in /usr/share/spamassassin in alphanumerical order; 10_misc.cf will be read before 23_bayes.cf. SpamAssassin then reads all the .cf files in /etc/mail/spamassassin/, again in alphanumeric order. Finally, SpamAssassin reads ~user/.spamassassin/user_prefs. If a rule or score is defined in two files, the setting in the last file read is used. This allows the administrator to override the defaults and a user to override the sitewide settings.
Each line in a rules file can be blank or contain a comment or a command. The hash or pound (#) symbol is used for comments. Rules generally have three parts, the rule definition, a textual description, and the score or series of scores. Convention dictates that all rule scores for rules provided by SpamAssassin should be located together in a separate file. That file is /usr/share/spamassassin/50_scores.cf.
The simplest configuration change is to change a rule score. There are two reasons why this might be done:
The rules that give a positive result when SpamAssassin is run are listed in the X-Spam-Status: header of the e-mail:
X-Spam-Status: Yes, score=5.8 required=5.0 tests=BAYES_05,HTML_00_10,
HTML_MESSAGE,MPART_ALT_DIFF autolearn=no
version=3.1.0-r54722
The rules applied to the e-mail are listed after tests=. If one continually appears in e-mail that should be marked as spam, but isn’t, then the score for the rule should be increased. If a rule often fires in e-mail that is wrongly classified as spam, the score should be decreased.
To find the current score, use the grep utility in all the locations where a score can be defined.
grep score.*BAYES /usr/share/spamassassin/* /etc/mail/spamassassin/*
~/.spamassassin/local.cf
/etc/mail/spamassassin/local_scores.cf:score RULE_NAME 0 0 1.665 2.599
/etc/mail/spamassassin/local_scores.cf: 4.34
In the previous example, the rule has a default score that is overridden in
/etc/mail/spamassassin/local_scores.cf.
The original score for the rule had four values. SpamAssassin changes the scores it uses, depending on whether network tests (for example, those that test open relays) are in use and whether the Bayesian Filter is in use. Four scores are listed, which are used in the following circumstances:
//===INSERT TABLE 02===
If only one score is given, as overridden in /etc/mail/spamassassin/local_scores.cf, it is used in all circumstances.
In the previous example, the system administrator has overridden the default score in /etc/mail/spamassassin/local_scores.cf with a single value in /etc/mail/spamassassin/local_scores.cf. To change this value for a particular user, their ~/.spamassassin/local.cf might read:
score RULE_NAME 1.2
This changes the score used from 4.34, set in /etc/mail/spamassassin/local_scores.cf, to 1.2. To disable the rule entirely, the score can be set to zero.
score RULE_NAME 0
Endless hours can be spent configuring rule scores. SpamAssassin includes tools to recalculate optimal rule scores, by examining existing e-mails, both spam and non spam. They are covered in detail in the book SpamAssassin published by Packt.
SpamAssassin has a large following, and the design of SpamAssassin has made it easy to add new rulesets, which are sets of rules and default scores for those rules. There are many different rulesets available. Most are based on a particular theme, for example finding the names of drugs often sold with spam or telephone numbers found in spam e-mails. Most custom rulesets are listed on the Custom Rulesets page of the SpamAssassin Wiki at http://wiki.apache.org/spamassassin/CustomRulesets.
As the battle against spam is so aggressive, rulesets have been developed that may possibly be uploaded daily. SpamAssassin provides this ability with the sa-update utility. You can choose to use sa-update on a regular basis, or to download a particular ruleset and keep it, or to manually update the rulesets that you choose. To obtain the best results in filtering spam, use of sa-update is recommended.
If you wish to install rulesets manually, the Wiki page gives a general description of each ruleset and a URL to download it. Once a ruleset has been chosen, we install it as follows:
Adding rules to SpamAssassin will increase the memory used by SpamAssassin, and the time that it takes to process e-mails. It is best to be cautious and add new rulesets gradually, to ensure that the effect on the machine is understood.
You may manually monitor the ruleset and update it on your system using the same process.
If you choose to use sa-update, you should plan your use of it. sa-update can use several channels, which are basically sources of rulesets. By default, the channel updates.spamassassin.org is used; another popular channel is the OpenProtect channel, called saupdates.openprotect.com.
To enable sa-update, it must be run regularly, for example via cron. Add a cron entry to your system calling the following commands, to update the base rulesets:
sa-update
If you use an additional channel, the command might look like:
sa-update –channel saupdates.openprotect.com
To protect against DNS poisoning and impersonation, SpamAssassin allows digital signing of rulesets. To use a signed ruleset, use the –gpgkey parameter to sa-update. The correct value to use with the –gpgkey parameter will be described in the SpamAssassin wiki page for the ruleset.
I remember deciding to pursue my first IT certification, the CompTIA A+. I had signed…
Key takeaways The transformer architecture has proved to be revolutionary in outperforming the classical RNN…
Once we learn how to deploy an Ubuntu server, how to manage users, and how…
Key-takeaways: Clean code isn’t just a nice thing to have or a luxury in software projects; it's a necessity. If we…
While developing a web application, or setting dynamic pages and meta tags we need to deal with…
Software architecture is one of the most discussed topics in the software industry today, and…