Anti-Spam Techniques
As the techniques to deliver spam have become more sophisticated, so have the techniques to detect and filter spam from legitimate email. The main techniques are described in the following sections. These techniques can be used on the email server by a system administrator, or an anti-spam service can be purchased from an external vendor.
Keyword Filters
Keyword filters match common words or phrases found in the body of spam emails, for example 'buy', 'last chance', and 'Viagra'. SpamAssassin includes a variety of keyword filters and allows new rules to be added easily.
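The idea of a keyword filter can be illustrated with a minimal Python sketch. The rule table, the scores, and the function name are hypothetical and are not taken from SpamAssassin's rule set; the sketch only shows how matched phrases might contribute to a total score.

```python
import re

# Hypothetical rule table: phrase -> score added when the phrase appears.
KEYWORD_SCORES = {
    "buy": 0.5,
    "last chance": 1.5,
    "viagra": 3.0,
}

def keyword_score(body, rules=KEYWORD_SCORES):
    """Return the total score and the list of phrases that matched."""
    text = body.lower()
    total = 0.0
    matched = []
    for phrase, score in rules.items():
        # Word boundaries keep "buy" from matching inside "buyer".
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            total += score
            matched.append(phrase)
    return total, matched

score, hits = keyword_score("LAST CHANCE to buy cheap Viagra!")
print(score, hits)
```

Matching on whole words rather than substrings is a design choice in this sketch: it reduces false matches on ordinary words that happen to contain a keyword.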
Open Relay Blacklists (ORBLs)
Open relay blacklists (ORBLs) list mail servers that have been reported as open relays and confirmed by testing before being added. Anti-spam tools can query these blacklists and filter out emails that originate from a listed source. SpamAssassin can integrate with several open relay blacklists.
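Blacklists of this kind are commonly queried over DNS: the octets of the sending IP address are reversed and prepended to the blacklist's zone name, and the address is treated as listed if that name resolves. The following Python sketch assumes this convention; the zone relays.example.org is a placeholder, not a real blacklist, and a stub resolver is used so the example runs without network access.

```python
import ipaddress
import socket

def dnsbl_query_name(ip, zone):
    """Build the DNS name used to look up an IPv4 address in a blacklist zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

def is_listed(ip, zone, resolve=socket.gethostbyname):
    """Return True if the blacklist zone publishes an address record for the IP."""
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # The name did not resolve. This sketch treats any lookup
        # failure, including a network error, as "not listed".
        return False

# Stub resolver standing in for a real DNS lookup (hypothetical data).
def stub_resolver(name):
    if name == "25.2.0.192.relays.example.org":
        return "127.0.0.2"
    raise socket.gaierror("name not found")

print(is_listed("192.0.2.25", "relays.example.org", resolve=stub_resolver))
```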
ISP Complaints
It has always been possible to complain to an ISP about a spammer. Some ISPs take complaints seriously: they give a single warning and terminate the offender's account after a further complaint. Other ISPs take a less active approach that rarely stops a spammer. Spammers naturally gravitate towards the more lenient ISPs.
ISP complaints remain a manually managed technique, because effort is wasted whenever an automatic report is wrong and the email reported is not spam. The website http://www.spamcop.net can examine an email, determine where ISP reports should be directed, and send appropriate messages of complaint to the corresponding ISPs.
Statistical Filters
Statistical filters learn which words are common in spam and which are common in ham. The data collected is then used to examine new emails and determine whether they are spam or ham. These filters are often based on Bayesian analysis, a branch of probability theory. A statistical filter must be trained by passing both ham and spam emails through it, which enables the filter to learn the difference between the two. Ideally, a statistical filter should be retrained regularly, and some anti-spam tools allow this training to happen automatically.
SpamAssassin includes a Bayesian filter, along with utilities to train it. SpamAssassin's Bayesian filter can also be configured to automatically learn from incoming spam and ham email.
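The training and classification steps can be sketched with a small naive Bayes classifier in Python. This is a simplified illustration and not SpamAssassin's implementation; the class name, the tokenizer, and the training messages are all assumptions made for the example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayesFilter:
    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record the words of one message known to be 'spam' or 'ham'."""
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def spam_log_odds(self, text):
        """Return log P(spam | text) - log P(ham | text); positive means spam."""
        if 0 in self.message_counts.values():
            raise ValueError("train with at least one spam and one ham message")
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        scores = {}
        for label in ("spam", "ham"):
            total_words = sum(self.word_counts[label].values())
            # Prior: fraction of training messages with this label.
            score = math.log(self.message_counts[label] / total_messages)
            for word in tokenize(text):
                # Add-one smoothing so unseen words do not zero the product.
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocabulary)))
            scores[label] = score
        return scores["spam"] - scores["ham"]

bayes = NaiveBayesFilter()
bayes.train("buy cheap pills now", "spam")
bayes.train("last chance to buy", "spam")
bayes.train("meeting agenda for monday", "ham")
bayes.train("lunch on monday", "ham")
print(bayes.spam_log_odds("buy pills") > 0)
```

Working with sums of logarithms instead of products of probabilities avoids numeric underflow on long messages.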
Email Header Analysis
The software that spammers use often generates unusual headers in the emails produced. Anti-spam tools can detect these unusual headers and use them to separate spam from ham. SpamAssassin includes many email header tests.
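A header test can be sketched in Python with the standard email package. The three checks below are illustrative assumptions chosen for the example, not a description of SpamAssassin's own header tests, and the sample message is invented.

```python
from email import message_from_string

def header_anomalies(raw_message):
    """Return a list of unusual header properties found in a raw message."""
    msg = message_from_string(raw_message)
    findings = []
    if msg["Message-ID"] is None:
        findings.append("missing Message-ID")
    if msg["Date"] is None:
        findings.append("missing Date")
    subject = msg["Subject"] or ""
    letters = [c for c in subject if c.isalpha()]
    # Flag subjects whose letters are all upper case.
    if letters and all(c.isupper() for c in letters):
        findings.append("subject is all capitals")
    return findings

# Invented sample message with no Message-ID or Date header.
SAMPLE = (
    "From: offers@example.com\n"
    "To: user@example.org\n"
    "Subject: FREE OFFER\n"
    "\n"
    "Body text.\n"
)

print(header_anomalies(SAMPLE))
```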
Non-Spam Content Tests
Ham emails can inadvertently trigger some anti-spam tests. For example, a legitimate email may happen to be routed through a blacklisted open relay. Non-spam content tests counter this by indicating that an email is not spam. They are usually created specifically for an individual or organization.
Non-spam content tests are rarely shared in public. They are specific to an industry or company, and spammers who obtained them could craft emails that exploit them.
SpamAssassin allows users to create rules that subtract from the score of an email if it contains certain content. An email administrator might add negative rules for the names of products sold by the company or for industry-related jargon.
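The effect of a negative rule on the total score can be sketched in Python. The rules, the scores, the product name "WidgetPro", and the threshold value are hypothetical; the sketch only shows how a negative score can pull an otherwise suspicious email back below the spam threshold.

```python
# Hypothetical rules: (description, substring to look for, score).
RULES = [
    ("uses a pressure phrase", "last chance", 2.5),
    ("mentions discount pricing", "discount", 3.0),
    ("mentions our product name", "widgetpro", -4.0),  # negative rule
]

# Assumed threshold: totals at or above this value are treated as spam.
THRESHOLD = 5.0

def total_score(body, rules=RULES):
    """Sum the scores of every rule whose substring occurs in the body."""
    text = body.lower()
    return sum(score for _, needle, score in rules if needle in text)

def is_spam(body, threshold=THRESHOLD):
    return total_score(body) >= threshold

print(is_spam("Last chance for a discount"))
print(is_spam("Last chance for a discount on WidgetPro"))
```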
Whitelists
Whitelists are the opposite of blacklists: they are lists of email senders who are trusted to send ham and not spam. Email from someone listed on a whitelist will normally not be marked as spam, regardless of its content.
SpamAssassin allows system administrators or users to create a whitelist for senders whose content may resemble spam; for example, mailing lists that discuss spam. SpamAssassin also allows the use of a blacklist. In addition, it can build whitelists and blacklists automatically, based on previous emails received from trusted and non-trusted senders.
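A manual whitelist and blacklist lookup can be sketched in Python as follows. The addresses, the use of wildcard patterns, and the rule that the whitelist is consulted first are assumptions made for this example.

```python
from fnmatch import fnmatchcase

# Hypothetical lists; "*" matches any run of characters.
WHITELIST = ["*@lists.example.org", "alice@example.com"]
BLACKLIST = ["*@spam.example.net"]

def classify_sender(address, whitelist=WHITELIST, blacklist=BLACKLIST):
    """Return 'whitelisted', 'blacklisted', or 'unknown' for a sender address."""
    addr = address.lower()
    # The whitelist is checked first, so it wins if both lists match.
    if any(fnmatchcase(addr, pattern) for pattern in whitelist):
        return "whitelisted"
    if any(fnmatchcase(addr, pattern) for pattern in blacklist):
        return "blacklisted"
    return "unknown"

print(classify_sender("Digest@Lists.Example.org"))
```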
Email Content Databases
Email content databases store the content of spam emails. These work because the same spam email will often be sent to hundreds or thousands of recipients. Email content databases store these emails and compare the content of new emails to that contained in the database. A single person reporting a spam email to such a database will assist all other users of the service.
SpamAssassin can integrate with several email content databases automatically.
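The principle behind a content database can be sketched in Python by storing a digest of each reported email body and comparing new emails against the stored digests. This is a deliberate simplification: the exact-match SHA-256 digest, the normalization step, and the class name are assumptions for the example, and a real service would need a comparison that tolerates small variations between copies of the same spam.

```python
import hashlib
import re

def body_digest(body):
    """Hash a message body after collapsing whitespace and case differences."""
    normalized = re.sub(r"\s+", " ", body).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

class ContentDatabase:
    """In-memory stand-in for a shared database of reported spam."""

    def __init__(self):
        self._digests = set()

    def report_spam(self, body):
        self._digests.add(body_digest(body))

    def is_known_spam(self, body):
        return body_digest(body) in self._digests

db = ContentDatabase()
# One user reports the email...
db.report_spam("Buy   now!\nLimited offer.")
# ...and every other user benefits when the same content arrives.
print(db.is_known_spam("buy now! limited OFFER."))
```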
Sender Validation Systems
A slightly different approach to spam is taken by sender validation systems. In these systems, when an email is received from an unknown source, the source is sent a challenge email. If a valid response is received to such an email, then the sender is added to a whitelist, the original email is delivered to the recipient, and the sender is never sent a challenge again.
This is effective because spammers generally use forged sender and reply-to addresses and do not receive replies to the spam they send out. Consequently, the challenge never reaches them. In addition, spammers send email in such volume that they do not have the time to respond to validation requests.
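The challenge and response flow described above can be sketched in Python as a small state holder. The class and method names are hypothetical, nothing is actually sent, and the sketch makes the simplifying assumption that each held message has its own challenge token.

```python
import secrets

class ChallengeResponseGate:
    """Hold mail from unknown senders until they answer a challenge."""

    def __init__(self):
        self.whitelist = set()
        self.pending = {}              # token -> (sender, held message)
        self.inbox = []                # messages delivered to the recipient
        self.outgoing_challenges = []  # (sender, token) pairs that would be emailed

    def receive(self, sender, message):
        if sender in self.whitelist:
            self.inbox.append(message)
            return "delivered"
        # Unknown sender: hold the message and issue a challenge.
        token = secrets.token_hex(8)
        self.pending[token] = (sender, message)
        self.outgoing_challenges.append((sender, token))
        return "challenged"

    def confirm(self, token):
        """Handle a reply to a challenge; return True if the token was valid."""
        entry = self.pending.pop(token, None)
        if entry is None:
            return False
        sender, message = entry
        self.whitelist.add(sender)     # never challenged again
        self.inbox.append(message)     # release the original email
        return True

gate = ChallengeResponseGate()
print(gate.receive("newcontact@example.com", "Hello"))
```

A spammer using a forged address never sees the token, so the held message stays in the pending store and is never delivered.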
Some systems cleverly integrate with the user's outgoing email and address book to automatically add known contacts to a whitelist. Sender validation systems are proprietary and may involve annual licensing costs or large initial fees.
Sender validation systems are inconvenient when subscribing to mailing lists. Few email list administrators will respond to a challenge, so the user might end up not receiving emails from the list. With most systems, it is possible to manually add addresses to the whitelist so that no challenge is issued, but in the case of mailing lists, the address that emails are sent from may not be known until the first emails arrive. SpamAssassin does not include sender validation features.
Sender Policy Framework (SPF)
The Sender Policy Framework (SPF) can be used to check that an email comes from a valid source. It verifies that the machine sending email for a particular address is permitted to send email for that address's domain. At the time of writing, SPF is a recent development and is being introduced relatively quickly. It uses additional Domain Name System (DNS) records, published by the owner of a domain, to state which machines can send email for that domain. SpamAssassin uses the current draft standards for SPF.
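How a receiving server might evaluate such a record can be sketched in Python. The sketch assumes the record text has already been fetched from DNS and handles only the ip4, ip6, and all mechanisms of the "v=spf1" syntax; mechanisms that need further DNS lookups are skipped. The sample record and addresses are invented for the example.

```python
import ipaddress

def spf_allows(record, sender_ip):
    """Evaluate the ip4, ip6, and all mechanisms of an SPF record.

    Returns 'pass', 'fail', 'softfail', or 'neutral'.
    """
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF version 1 record")
    ip = ipaddress.ip_address(sender_ip)
    results = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}
    for term in terms[1:]:
        qualifier = "+"  # a mechanism without a qualifier means pass
        if term[0] in results:
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        name = name.lower()
        if name in ("ip4", "ip6"):
            network = ipaddress.ip_network(value, strict=False)
            if ip in network:
                return results[qualifier]
        elif name == "all":
            return results[qualifier]
        # Other mechanisms (a, mx, include, ...) require DNS lookups
        # and are ignored in this sketch.
    return "neutral"

# Invented record: two permitted sources, everything else rejected.
RECORD = "v=spf1 ip4:192.0.2.0/24 ip4:198.51.100.17 -all"
print(spf_allows(RECORD, "192.0.2.44"))
print(spf_allows(RECORD, "203.0.113.9"))
```

Mechanisms are evaluated left to right and the first match decides the result, which is why the catch-all term is placed last in the sample record.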