SpamAssassin: A practical guide to integration and configuration
Websites

Many small organizations now have websites and provide an email address for customers to contact them. A simple HTML link of the form mailto:user@domain.com is easy to implement (all popular HTML editors allow you to create this), and the results are easy to retrieve—they arrive in the user's mailbox.

The alternative to a mailto: link in a web page is to have a web form where the customer enters an email address and a message, and then submits the form. The data is processed by the web server and forwarded to the recipient. This is less flexible than an email—for example, attachments cannot be added. Additionally, the web form relies on the customer to enter their email address correctly. If this is typed incorrectly, then the customer contact will be lost.

From an early time in the history of the Internet, automated computer programs have tried to download web pages and follow links to other web pages. Typically, these spiders walk the Web to generate indexes for search engines such as Google and AltaVista. This technique has been adopted by spammers to capture email addresses. It is currently the most common method for harvesting email addresses.

Once a spammer's spider has discovered a company website, the email addresses listed on it will start to receive spam. Organizations can carry on using the mailto: links, or they can implement other methods to capture customer input, such as a web form. These other options may incur additional expense.

Two techniques can be used in simple web pages to render an email address invisible to a spammer's web spider and yet allow a user to click on the link. These are described in the following sections.

Alternative Character Representations

In HTML, characters can be represented in several ways. Usually, they are typed in as normal keyboard characters but other representations are also possible.

The following are all representations of the character 'a':

  • a
  • &#97;
  • &#x61;

The character 'a' is represented as 97 in decimal format and as x61 in hexadecimal format; 98 or x62 represents 'b', and so on. Every letter, number, and other symbol has similar alternative representations. For email addresses, '@' is represented by &#x40; or &#64;, and the period is represented by &#x2e; or &#46;. The numbers are the ASCII codes of the characters, and HTML calls this written form a numeric character reference. Other encoding techniques used on the Web include UTF-8 and UTF-16.
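As a minimal sketch of this mapping, the JavaScript below derives both numeric forms for a single character. The helper name charRefs is hypothetical and is not part of the original text.

```javascript
// Hypothetical helper: return the decimal and hexadecimal HTML
// numeric character references for the first character of ch.
function charRefs(ch) {
  const code = ch.charCodeAt(0); // ASCII code, e.g. 97 for 'a'
  return {
    decimal: "&#" + code + ";",
    hex: "&#x" + code.toString(16) + ";"
  };
}

const refForA = charRefs("a");   // decimal "&#97;", hex "&#x61;"
const refForAt = charRefs("@");  // decimal "&#64;", hex "&#x40;"
const refForDot = charRefs("."); // decimal "&#46;", hex "&#x2e;"
```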

A mailto: link can be generated using these characters instead of the normal characters. An alternative representation of mailto:user@domain.com would appear as follows:

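&#109;&#97;&#105;&#108;&#116;&#111;&#58;&#117;&#115;&#101;&#114;&#64;&#100;&#111;&#109;&#97;&#105;&#110;&#46;&#99;&#111;&#109;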

The user's browser will convert this to a normal link and allow the user to click on it. This will invoke their email client as with a normal link. There is a website that performs this conversion automatically: http://www.zapyon.de/spam-me-not/index.html.
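The conversion can also be done locally. The following is a rough sketch, assuming the link is generated once (for example, with Node.js) and the result pasted into the page. The function names encodeAsEntities and obfuscatedMailto are hypothetical.

```javascript
// Hypothetical helper: replace every character of text with its
// decimal numeric character reference.
function encodeAsEntities(text) {
  return Array.from(text)
    .map((ch) => "&#" + ch.codePointAt(0) + ";")
    .join("");
}

// Build an anchor whose href and visible text contain no literal "@".
function obfuscatedMailto(address) {
  const href = encodeAsEntities("mailto:" + address);
  const label = encodeAsEntities(address);
  return '<a href="' + href + '">' + label + "</a>";
}

// The resulting HTML can be pasted into a static page.
const linkHtml = obfuscatedMailto("user@domain.com");
```

Encoding the visible link text as well as the href matters here, because a plain-text address between the tags would still be readable by a spider.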

Note

This is a relatively simple conversion for a computer program to make, and spider technology may overcome this method for blocking spiders in the future.
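To illustrate how little work the reverse conversion takes, here is a minimal sketch of a decoder for numeric character references. It assumes well-formed references only, and the name decodeNumericRefs is hypothetical.

```javascript
// Hypothetical helper: turn decimal (&#64;) and hexadecimal (&#x40;)
// numeric character references back into plain characters.
function decodeNumericRefs(html) {
  return html.replace(/&#([xX][0-9a-fA-F]+|[0-9]+);/g, (match, ref) => {
    const isHex = ref[0] === "x" || ref[0] === "X";
    const code = isHex ? parseInt(ref.slice(1), 16) : parseInt(ref, 10);
    return String.fromCodePoint(code);
  });
}

const decoded = decodeNumericRefs("user&#64;domain&#x2e;com");
```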

JavaScript

This method relies on the user's browser having JavaScript enabled. All modern browsers have JavaScript capability enabled by default, although some users disable it for security reasons. Spammers' spiders search for simple email addresses of the form mailto:user@domain.com. It is reasonable to assume that the spammer's web spider reads a web page and copies or stores all the parts that have the @ character in them, such as user@domain.com. The techniques described here remove the @ symbol, making the web spider miss the email addresses.

JavaScript can write text into a web page while the page is being loaded by the browser. It can also be used to manipulate the positioning of text. It would be unwise to simply write a script that outputs the email address as user@domain.com. A spider could easily detect this, despite the fact that the email address is in a block of JavaScript.

Fortunately, JavaScript allows more complex manipulation using variables. Consider a section of script code that builds up an email address in a variable and then displays it:

message = "mailto:user";
message += "@";
message += "domain.com";
document.write(message);

This is equivalent to the following code:

message = "mailto:user@domain.com";
document.write(message);

In the first version, the @ sign, the fingerprint of an email address, is isolated from the other parts of the email address, which prevents the email address from being detected by a spammer's spider.
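The difference between the two scripts can be checked with a small sketch. It assumes, as the text does, that a spider simply scans the page source for anything shaped like an address. The function naiveHarvest and its pattern are illustrative assumptions, not a description of any real spider.

```javascript
// Assumed spider behaviour: collect anything that looks like name@host.tld.
function naiveHarvest(pageSource) {
  const pattern = /[A-Za-z0-9._-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
  return pageSource.match(pattern) || [];
}

// The address written as one string, and the same address built in pieces.
const plainScript =
  'message = "mailto:user@domain.com"; document.write(message);';
const splitScript =
  'message = "mailto:user"; message += "@"; message += "domain.com"; ' +
  "document.write(message);";

const foundPlain = naiveHarvest(plainScript); // finds user@domain.com
const foundSplit = naiveHarvest(splitScript); // finds nothing
```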

The same email address can be written in a more complex way as follows:

a = "@";
b = "mailto:";
c = "user";
d = "domain";
e = ".com";
document.write ("<A HREF=\"" + b+c+a+d+e + "\">contact us</A>");
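The same idea can be wrapped in a reusable function. The sketch below is one possible variant, not the original author's code: the name buildContactLink and the element id "contact" are assumptions, and String.fromCharCode(64) is used so that the script contains no literal @ at all.

```javascript
// Hypothetical helper: assemble the link from fragments so that the
// page source never contains the complete address.
function buildContactLink(user, domain, tld, label) {
  const at = String.fromCharCode(64); // "@" without typing it literally
  const href = "mailto:" + user + at + domain + "." + tld;
  return '<a href="' + href + '">' + label + "</a>";
}

const contactHtml = buildContactLink("user", "domain", "com", "contact us");

// In a browser, place the link into a placeholder element
// (the id "contact" is an assumption made for this sketch).
if (typeof document !== "undefined") {
  const slot = document.getElementById("contact");
  if (slot) slot.innerHTML = contactHtml;
}
```

Writing into a placeholder element instead of calling document.write is a design choice of this sketch; it lets the script run after the page has loaded.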

Spammers may start to include simple JavaScript parsing in their web spiders, although the amount of effort required makes this unlikely.