How Spammers Fool Bayesian Filters - And How to Stop Them
by: Paul Judge, CTO, CipherTrust, Inc.
Effectively stopping spam over the long term requires much more than blocking individual IP addresses and creating rules based on keywords that spammers typically use. The increasing sophistication of spam tools, coupled with the growing number of spammers in the wild, has created a hyper-evolution in the variety and volume of spam. The old ways of blocking the bad guys just don’t work anymore.

Examining spam and spam-blocking technology can illuminate how this evolution is taking place and what can be done to combat spam and reclaim e-mail as the efficient, effective communication tool it was intended to be.

One method used to combat spam is Bayesian filtering. Named after Thomas Bayes, an English mathematician, Bayesian logic is used in decision making and inferential statistics. Bayesian filters maintain a database of known spam and ham, or legitimate email. Once the database is large enough, the system ranks the words according to the probability that they will appear in a spam message.

Words more likely to appear in spam are given a high score (between 51 and 100), and words likely to appear in legitimate email are given a low score (between 1 and 50). For example, the words “free” and “sex” generally have values between 95 and 98, whereas the words “emphasis” or “disadvantage” may have a score between 1 and 4. Commonly used words such as “the” and “that”, and words new to the Bayesian filter, are given a neutral score between 40 and 50 and are not used in the system’s algorithm.
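
The scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular product: the function name `token_scores`, the toy messages, and the choice to score a word by the share of spam versus ham messages that contain it are all assumptions made for the example.

```python
from collections import Counter

def token_scores(spam_msgs, ham_msgs):
    """Score each word from 1 to 100 by how strongly it leans toward spam.

    Assumption: a word's score is the fraction of spam messages containing it,
    relative to the combined fraction of spam and ham messages containing it.
    """
    # Count each word at most once per message.
    spam_counts = Counter(w for m in spam_msgs for w in set(m.lower().split()))
    ham_counts = Counter(w for m in ham_msgs for w in set(m.lower().split()))

    scores = {}
    for word in set(spam_counts) | set(ham_counts):
        spam_rate = spam_counts[word] / len(spam_msgs)
        ham_rate = ham_counts[word] / len(ham_msgs)
        p = spam_rate / (spam_rate + ham_rate)
        # Keep scores on the 1-100 scale used in the article.
        scores[word] = max(1, round(100 * p))
    return scores

# Toy training data (hypothetical).
spam_examples = ["free sex now", "free offer the"]
ham_examples = ["the emphasis here", "meeting the team"]
scores = token_scores(spam_examples, ham_examples)
```

With this toy data, "free" lands at the top of the scale, "emphasis" at the bottom, and "the" in between, because it appears in both kinds of message.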

When the system receives an email, it breaks the message down into tokens, or words with values assigned to them. The system uses the tokens with scores at the high and low ends of the range to develop a score for the email as a whole. If the email has more spam tokens than ham tokens, the email will have a high spam score. The email administrator sets a threshold score that the system uses to decide which email may pass through to users.
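
As a rough sketch of that process, the Python below tokenizes a message, drops tokens in the neutral band, and combines the rest into one score. The word scores, the score of 45 for unknown words, the threshold of 90, and the combining rule (the product of the token probabilities divided by the sum of that product and the product of their complements, a rule commonly used in Bayesian spam filters) are assumptions for illustration; the article does not specify a formula.

```python
import math
import re

NEUTRAL_LOW, NEUTRAL_HIGH = 40, 50  # neutral band described in the article
UNKNOWN_SCORE = 45                  # assumed score for words not yet seen
THRESHOLD = 90                      # hypothetical administrator setting

def tokenize(text):
    """Split a message into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def message_score(text, scores):
    """Return a 0-100 spam score built from the non-neutral tokens."""
    probs = []
    for word in set(tokenize(text)):
        s = scores.get(word, UNKNOWN_SCORE)
        if NEUTRAL_LOW <= s <= NEUTRAL_HIGH:
            continue  # neutral tokens are ignored
        # Convert the 1-100 score to a probability, avoiding exact 0 or 1.
        probs.append(min(99, max(1, s)) / 100)
    if not probs:
        return 50.0  # assumption: no evidence either way
    spam_like = math.prod(probs)
    ham_like = math.prod(1 - p for p in probs)
    return 100 * spam_like / (spam_like + ham_like)

# Hypothetical word scores in the ranges the article mentions.
scores = {"free": 97, "sex": 96, "emphasis": 2, "disadvantage": 3,
          "the": 45, "that": 44}

spam_result = message_score("Free sex, that is the offer", scores)
ham_result = message_score("The emphasis is on that disadvantage", scores)
blocked = spam_result >= THRESHOLD
```

Here the first message scores well above the threshold and the second scores close to zero, while a message made up only of neutral or unknown words stays at the midpoint.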

Bayesian filters are effective at filtering spam and minimizing false positives. Because they adapt and learn based on user feedback, Bayesian filters produce better results as they are used within an organization over time. They are not, however, foolproof. Spammers have learned which words Bayesian filters consider spammy and have developed ways to insert non-spammy words into emails to lower the message’s overall spam score. By adding in paragraphs of text from novels or news stories, spammers can dilute the effects of high-ranking words. Text insertion has also caused normally legitimate words that are found in novels or news stories to have an inflated spam score. This may potentially render Bayesian filters less effective over time.
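
Both effects can be reproduced with a toy filter. The sketch below is illustrative only: the tiny corpora, the helper names `train` and `spam_score`, and the combining rule are assumptions, and a real filter would be trained on far more mail.

```python
import math
from collections import Counter

def train(spam_msgs, ham_msgs):
    """Map each word to an estimated probability (0.01-0.99) of indicating spam."""
    spam = Counter(w for m in spam_msgs for w in set(m.lower().split()))
    ham = Counter(w for m in ham_msgs for w in set(m.lower().split()))
    probs = {}
    for w in set(spam) | set(ham):
        s = spam[w] / len(spam_msgs)
        h = ham[w] / len(ham_msgs)
        probs[w] = min(0.99, max(0.01, s / (s + h)))
    return probs

def spam_score(text, probs):
    """Combine the known words of a message into a 0-100 spam score."""
    ps = [probs[w] for w in set(text.lower().split()) if w in probs]
    if not ps:
        return 50.0  # assumption: no evidence either way
    spam_like = math.prod(ps)
    ham_like = math.prod(1 - p for p in ps)
    return 100 * spam_like / (spam_like + ham_like)

spam_corpus = ["free pills now", "free offer now"]
ham_corpus = ["quarterly report attached", "meeting agenda attached",
              "report on the meeting"]
probs = train(spam_corpus, ham_corpus)

# Dilution: the same spam, padded with words the filter trusts.
plain = "free pills now"
padded = plain + " quarterly report meeting agenda attached"
plain_score = spam_score(plain, probs)
padded_score = spam_score(padded, probs)

# Inflation: once the padded message is reported as spam and learned,
# the legitimate words it carried start to look spammy.
retrained = train(spam_corpus + [padded], ham_corpus)
```

In this example the padding pulls the message from a near-certain spam score to a near-certain ham score, and after retraining the word "report" carries a noticeably higher spam probability than it did before.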

Another approach spammers use to fool Bayesian filters is to create less spammy emails. For example, a spammer may send an email containing only the phrase, “Here’s the link…”. This approach can neutralize the spam score and entice users to click on a link to a Web site containing the spammer’s message. To block this type of spam, the filter would have to be designed to follow the link and scan the content of the Web site users are asked to visit. This type of filtering is not currently employed by Bayesian filters because it would be prohibitively expensive in terms of server resources and could potentially be used as a method of launching denial of service attacks against commercial servers.

As with all single-method spam filtering methodologies, Bayesian filters are effective against certain techniques spammers use to fool spam filters, but they are not a magic bullet for solving the spam problem. Bayesian filters are most effective when combined with other methods of spam detection.
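
One way to picture such a combination is a simple weighted score that adds other signals, such as a blocked sender IP address or a keyword rule, to the Bayesian result. The sketch below is purely hypothetical: the function name, the weights, the threshold, and the example addresses (taken from the ranges reserved for documentation) are assumptions, not a description of any real product.

```python
def combined_verdict(bayes_score, sender_ip, body, blocked_ips, keywords,
                     threshold=50.0):
    """Return True if the combined evidence marks the message as spam.

    All weights and the threshold are illustrative assumptions.
    """
    total = 0.6 * bayes_score              # Bayesian score on a 0-100 scale
    if sender_ip in blocked_ips:
        total += 25.0                      # sender is on an IP blocklist
    if any(k in body.lower() for k in keywords):
        total += 15.0                      # a keyword rule matched
    return total >= threshold

blocked_ips = {"192.0.2.10"}
keywords = {"click here"}
link_only = "Here's the link..."

# A link-only message with a neutral Bayesian score of 50:
from_blocked_sender = combined_verdict(50.0, "192.0.2.10", link_only,
                                       blocked_ips, keywords)
from_unknown_sender = combined_verdict(50.0, "198.51.100.7", link_only,
                                       blocked_ips, keywords)
```

With these assumed weights, the neutral-scoring link-only message is caught when it comes from a blocked address and passes otherwise, which illustrates how a second signal can cover a case the Bayesian score alone misses.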

The Solution
When used individually, each anti-spam technique has been systematically overcome by spammers. Grandiose plans to rid the world of spam, such as charging a penny for each e-mail received or forcing servers to solve mathematical problems before delivering e-mail, have been proposed with few results. These schemes are not realistic and would require a large percentage of the population to adopt the same anti-spam method in order to be effective. You can learn more about the fight against spam by visiting our website at www.ciphertrust.com and downloading our whitepapers.


About the author:

Dr. Paul Judge is a noted scholar and entrepreneur. He is Chief Technology Officer at CipherTrust, the industry's largest provider of enterprise email security. The company’s flagship product, IronMail, provides a best-of-breed enterprise anti-spam solution designed to stop spam, phishing attacks and other email-based threats. Learn more by visiting www.ciphertrust.com/products/spam_and_fraud_protection today.

©2005 - All Rights Reserved