What is a Bayesian Filter?
 
Bayesian spam filters are content-based filters specifically trained to recognize the individual email user's spam and good mail, making them highly effective and difficult for spammers to adapt to.
 
These filters calculate the probability of a message being spam based on its contents. Unlike simple content-based filters, Bayesian spam filtering learns from spam and from good mail, resulting in a robust, versatile, and efficient anti-spam approach that returns very few false positives.
 
 
Those of us plagued by the onslaught of tens—if not hundreds—of unwanted emails greeting us as we open up our email accounts have some hope for respite in the form of Bayesian spam filters. For years, spammers have been able to remain one step ahead of spam blockers simply because of their creativity and ability to adjust and evade blocking each time a new spam filter was developed.
 
As a result, anti-spam software developers were certain of the task before them: to develop software that could continually learn from the new and creative techniques of spammers, and as a result never fall behind in the spam blocking game. Think about how you detect spam. A quick glance is often enough. You know what spam looks like, and you know what good mail looks like. The probability of spam looking like good mail is around... zero.
 
The SecPoint® Protector (http://www.secpoint.com/secpoint-protector.html) comes fully loaded with this feature to give the customers the best anti-spam solution.
 
Scoring Content-Based Filters Do Not Adapt
 
Wouldn't it be great if automatic spam filters worked like that too? Scoring content-based spam filters try to. They look for words and other characteristics typical of spam. Each characteristic is assigned a score, and a spam score for the whole message is computed from the individual scores. Some scoring filters also look for characteristics of legitimate mail, which lower the overall score.
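To make the idea concrete, here is a minimal sketch of a scoring filter. The word list, the weights, and the threshold are invented for illustration and are not taken from any real product; the point is only that every value is fixed by hand.

```python
# Hypothetical hand-picked weights: positive values suggest spam,
# negative values suggest legitimate mail.
SCORES = {"toner": 2.5, "free": 1.0, "viagra": 3.0, "meeting": -1.5}
THRESHOLD = 3.0  # assumed cut-off; messages scoring at or above it count as spam


def spam_score(message: str) -> float:
    """Add up the fixed scores of all known words found in the message."""
    words = set(message.lower().split())
    return sum(SCORES.get(word, 0.0) for word in words)


def is_spam(message: str) -> bool:
    return spam_score(message) >= THRESHOLD
```

With these made-up numbers, "free toner" scores 3.5 and is flagged, while "toner for the meeting" scores 1.0 and passes. Nothing in the table changes unless someone edits it.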
 
The scoring-filter approach works, but it has several problems. The list of characteristics is built from the spam (and the good mail) that the filter maker receives. To get a good grasp of the typical spam anybody might get, mail must be collected at hundreds of email addresses. Even then the filters are less effective than they could be, because the characteristics of good mail differ from person to person, and this is not taken into account.
 
The characteristics to look for are more or less set in stone. If the spammers make the effort to adapt (and make their spam look like good mail to the filters), the filtering characteristics have to be tweaked manually, which is an even bigger effort.
 
The score assigned to each word is probably based on a good estimate, but it is still arbitrary. And like the list of characteristics, it adapts neither to the changing world of spam in general nor to an individual user's needs.
 
Bayesian Spam Filters Tweak Themselves, Getting Better and Better
 
Bayesian spam filters are a kind of scoring content-based filter as well. However, this approach does away with the problems of simple scoring spam filters, and it does so radically. Since the weakness of scoring filters lies in the manually built list of characteristics and their scores, this list is eliminated.
 
Instead, Bayesian spam filters build the list themselves. Ideally, you start with a (big) bunch of emails that you have classified as spam, and another bunch of good mail. The filter analyzes both sets to calculate the probability of various characteristics appearing in spam and in good mail.
 
The characteristics used by a Bayesian spam filter can be the words in the body of the message and in its headers (senders and message paths). They can also be other aspects such as HTML code (like colors), or even word pairs, phrases, and meta information (where a particular phrase appears).
 
If a word—"Cartesian", for example—never appears in spam but often in your legitimate mail, the probability of "Cartesian" indicating spam is near zero. "Toner", on the other hand, appears exclusively, and often, in spam. "Toner" has a very high probability of being found in spam, not much below 1 (100%).
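As a rough sketch of how such per-word probabilities could be derived from two piles of classified mail: the function names, the whitespace tokenizer, the add-one smoothing, and the assumption that spam and good mail are equally likely beforehand are all illustrative choices, not details given in this article.

```python
from collections import Counter


def tokenize(text):
    """Very naive tokenizer: lower-case, split on whitespace, drop duplicates."""
    return set(text.lower().split())


def train(spam_msgs, ham_msgs):
    """Count in how many spam and good messages each word appears."""
    spam_counts = Counter(w for m in spam_msgs for w in tokenize(m))
    ham_counts = Counter(w for m in ham_msgs for w in tokenize(m))
    return spam_counts, ham_counts, len(spam_msgs), len(ham_msgs)


def word_spam_probability(word, spam_counts, ham_counts, n_spam, n_ham):
    """Estimate the probability that a message containing `word` is spam.

    Add-one smoothing (an assumption of this sketch) keeps the estimate
    away from exactly 0 or 1 when a word was seen in only one pile.
    """
    p_word_in_spam = (spam_counts[word] + 1) / (n_spam + 2)
    p_word_in_ham = (ham_counts[word] + 1) / (n_ham + 2)
    return p_word_in_spam / (p_word_in_spam + p_word_in_ham)
```

Trained on three tiny spam messages that all mention "toner" and three good messages, two of which mention "cartesian", this gives "toner" a probability of 0.8 and "cartesian" 0.25. A word the filter has never seen comes out at a neutral 0.5. With a realistic amount of training mail the values move much closer to 1 and 0.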
 
When a new message arrives, it is analyzed by the Bayesian spam filter, and the probability of the complete message being spam is calculated using the individual characteristics. Let's say a message contains both "Cartesian" and "toner". From these words alone, it's not yet clear whether we have spam or legit mail. But other characteristics will (most probably) indicate a probability that allows the filter to classify the message as either spam or good mail.
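One common way to combine the individual probabilities is the naive Bayes rule, which treats the characteristics as independent. The sketch below assumes that rule and equal prior odds for spam and good mail; the article does not say which combination formula a particular product uses, and the `combine` name and the clamping constant are made up for the example.

```python
import math


def combine(word_probabilities):
    """Combine per-word spam probabilities into one message probability.

    Works on log-odds so that many words do not cause underflow.
    """
    log_odds = 0.0
    for p in word_probabilities:
        p = min(max(p, 1e-6), 1 - 1e-6)  # keep log() away from 0 and 1
        log_odds += math.log(p) - math.log(1 - p)
    # Numerically stable logistic function to turn log-odds into a probability.
    if log_odds >= 0:
        return 1 / (1 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1 + e)
```

For a message whose only known words are "Cartesian" (say 0.01) and "toner" (say 0.99), `combine([0.01, 0.99])` comes out at about 0.5, which is exactly the undecided case described above. Adding two more spam-leaning characteristics, as in `combine([0.01, 0.99, 0.9, 0.8])`, pushes the result to roughly 0.97.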
 
Bayesian Spam Filters Can Adapt Automatically
 
Now that we have a classification, the message can be used to train the filter further. In this case, either the probability of "Cartesian" indicating good mail is lowered (if the message containing both "Cartesian" and "toner" is found to be spam), or the probability of "toner" indicating spam is lowered (if the message is found to be good mail).
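The retraining step can be sketched as a pair of running counts that are updated every time a message is classified. The `WordStats` class, its method names, and the add-one smoothing are hypothetical and chosen only for this illustration.

```python
from collections import Counter


class WordStats:
    """Running word counts that can be updated one message at a time."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()
        self.n_spam = 0
        self.n_ham = 0

    def learn(self, message, is_spam):
        """Fold one classified message into the counts."""
        words = set(message.lower().split())
        if is_spam:
            self.spam.update(words)
            self.n_spam += 1
        else:
            self.ham.update(words)
            self.n_ham += 1

    def spam_probability(self, word):
        """Smoothed estimate that a message containing `word` is spam."""
        p_spam = (self.spam[word] + 1) / (self.n_spam + 2)
        p_ham = (self.ham[word] + 1) / (self.n_ham + 2)
        return p_spam / (p_spam + p_ham)


stats = WordStats()
for msg in ["cartesian coordinates", "cartesian product"]:
    stats.learn(msg, is_spam=False)
for msg in ["cheap toner", "toner sale"]:
    stats.learn(msg, is_spam=True)

before = stats.spam_probability("cartesian")
stats.learn("cartesian toner offer", is_spam=True)  # message judged to be spam
after = stats.spam_probability("cartesian")
```

In this toy run the spam probability of "cartesian" rises from 0.25 to about 0.35 after a single spam message containing it is learned, so the word becomes a weaker indicator of good mail.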
 
Using this auto-adaptive technique, Bayesian filters can learn from both their own decisions and the user's (if the user manually corrects a misjudgment by the filter). The adaptability of Bayesian filtering also makes sure the filters are most effective for the individual email user. While most people's spam may have similar characteristics, legitimate mail is characteristically different for everybody.
 
How Can Spammers Get Past Bayesian Filters?
 
The characteristics of legitimate mail are just as important for the Bayesian spam filtering process as the spam is. If the filters are trained specifically for every user, spammers will have an even harder time working around everybody's or even most people's spam filters, and the filters can adapt to almost everything spammers try.
 
Spammers will only make it past well-trained Bayesian filters if they make their spam messages look exactly like the ordinary email everybody gets. They could do that today too. Spammers do not usually send such ordinary emails, I presume, because they don't work. So chances are they won't start doing it even when ordinary, boring emails are the only way to make it past the anti-spam filters.
 
However, if spammers do switch to mostly normal-looking emails, we will then see a lot of spam in our inboxes again, and email may become as frustrating as it was in pre-Bayesian days (or even worse). It will also ruin the market for most kinds of spam, though, and thus won't last for long.
 
There is one exception that spammers can exploit to get through Bayesian filters even with their usual content. It's in the nature of Bayesian statistics that one word that appears very frequently in good mail can be so significant that it turns a message that otherwise looks like spam into one the filter rates as good mail.
 
If spammers find a way to determine your surefire good-mail words (by using HTML return receipts to see which messages you opened, for example), they can include one of them in a junk message and reach you even through a well-trained Bayesian filter.
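A small numeric sketch shows how strong this effect can be. It assumes the naive Bayes combination with equal prior odds, and all the probabilities are invented for the example.

```python
from math import prod


def spam_probability(word_probs):
    """Naive Bayes combination; every value must lie strictly between 0 and 1."""
    odds = prod(p / (1 - p) for p in word_probs)
    return odds / (1 + odds)


usual_spam_words = [0.9, 0.9, 0.8]  # made-up probabilities for typical spam words
good_mail_word = 0.001              # made-up value for a "surefire" good-mail word

without_good_word = spam_probability(usual_spam_words)
with_good_word = spam_probability(usual_spam_words + [good_mail_word])
```

With these numbers the message is rated at about 0.997 without the good-mail word and at about 0.24 with it, so a single word is enough to move it from "clearly spam" to "probably good mail".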
 
 
© Copyright 1999-2010: SecPoint®
SecPoint ApS - Lergravsvej 53 - 2300 Copenhagen S - Phone +45 70 235 245