Email Security
Bogofilter
Bogofilter is a Bayesian spam filter. In its normal mode of operation, it takes an email message or other text on standard input, does a statistical check against lists of "good" and "bad" words, and returns a status code indicating whether or not the message is spam. Bogofilter is built around a fast algorithm, uses Berkeley DB for fast startup and lookups, is coded directly in C, and is tuned for speed, so it can be used in production by sites that process a lot of mail.
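As an illustration of the standard-input and status-code interface described above, here is a minimal Python sketch of a wrapper. It assumes the exit codes given in the bogofilter man page (0 for spam, 1 for non-spam, 2 for unsure, anything else an error) and that a `bogofilter` binary is on the PATH. The names `interpret_exit_code` and `classify` are made up for this example and are not part of Bogofilter.

```python
import subprocess

# Exit statuses assumed from the bogofilter man page:
# 0 = spam, 1 = non-spam (ham), 2 = unsure; other values are treated as errors.
EXIT_MEANING = {0: "spam", 1: "ham", 2: "unsure"}


def interpret_exit_code(code):
    """Map a bogofilter exit status to a label (hypothetical helper)."""
    return EXIT_MEANING.get(code, "error")


def classify(message_bytes, run=subprocess.run):
    """Pipe one raw message to bogofilter on stdin and interpret its status.

    `run` is injectable so the wrapper can be exercised without the binary.
    """
    result = run(["bogofilter"], input=message_bytes)
    return interpret_exit_code(result.returncode)
```

A mail delivery script could call `classify(raw_message)` and file the message according to the returned label.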
Bogofilter treats its input as a bag of tokens. Each token is checked against a wordlist, which maintains counts of how many times the token has occurred in non-spam and spam mails. These counts are used to compute an estimate of the probability that a message in which the token occurs is spam. The per-token estimates are then combined into a single score that indicates whether the message is spam or ham.
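The bag-of-tokens idea can be sketched in a few lines of Python. This is a simplified naive-Bayes-style illustration, not Bogofilter's actual scoring algorithm: the add-one smoothing, the log-odds combination, and the function names are all assumptions made for the example. The wordlist is modeled as a plain dict mapping each token to a `(spam_count, ham_count)` pair.

```python
import math


def token_spam_probability(spam_count, ham_count, n_spam, n_ham):
    """Estimate P(spam | token) from wordlist counts, with add-one smoothing."""
    spam_rate = (spam_count + 1) / (n_spam + 2)
    ham_rate = (ham_count + 1) / (n_ham + 2)
    return spam_rate / (spam_rate + ham_rate)


def combine(probabilities):
    """Combine per-token estimates by summing their log-odds."""
    log_odds = sum(math.log(p) - math.log(1 - p) for p in probabilities)
    return 1 / (1 + math.exp(-log_odds))


def score_message(text, wordlist, n_spam, n_ham):
    """Score a message as a bag of tokens; word order and repeats are ignored."""
    tokens = set(text.lower().split())
    probs = [
        token_spam_probability(*wordlist.get(tok, (0, 0)), n_spam, n_ham)
        for tok in tokens
    ]
    return combine(probs)
```

With equally sized spam and ham corpora, a token that has never been seen contributes a neutral 0.5, so only tokens with a skewed history move the score.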
While this method sounds crude compared to the more usual pattern-matching approach, it turns out to be extremely effective.
Bogofilter does proper MIME decoding and reasonable HTML parsing. Special kinds of tokens, such as hostnames and IP addresses, are retained whole as recognition features rather than broken up. Various kinds of MTA cruft, such as dates and message-IDs, are ignored so as not to bloat the wordlist. Tokens found in header fields are marked with the field they came from.
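A rough Python sketch of these tokenizing rules follows. It is not Bogofilter's lexer: the regular expression, the set of ignored headers, and the `fieldname:` prefix used to mark header tokens are all hypothetical choices for illustration, and MIME decoding and HTML parsing are left out entirely.

```python
import re

# Try IPv4 addresses and hostnames before plain words so they stay whole.
TOKEN_RE = re.compile(
    r"\d{1,3}(?:\.\d{1,3}){3}"              # dotted-quad IP address
    r"|[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"   # hostname with at least one dot
    r"|[A-Za-z0-9'-]+"                      # ordinary word
)

# Header fields treated as cruft and skipped (assumed list for this sketch).
IGNORED_HEADERS = {"date", "message-id"}


def tokenize(header_fields, body):
    """Return tokens for a message given (name, value) header pairs and a body."""
    tokens = []
    for name, value in header_fields:
        field = name.lower()
        if field in IGNORED_HEADERS:
            continue
        # Mark header tokens with a hypothetical "fieldname:" prefix.
        tokens += [f"{field}:{tok.lower()}" for tok in TOKEN_RE.findall(value)]
    tokens += [tok.lower() for tok in TOKEN_RE.findall(body)]
    return tokens
```

Marking header tokens keeps, for example, "free" in a subject line distinct from "free" in the body, so the two can accumulate separate counts in the wordlist.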