Indiana University


Computer Science








May 2002 - Spam/Virus Email Filtering

The CS Department has installed the ActiveState PureMessage email spam/virus filtering package to help users manage email more effectively by filtering out spam and virus-laden messages. Spam and virus email are currently handled as described below:

Email Containing a Virus

Incoming email messages are scanned for known viruses based on a virus profile that is updated three times daily. If an email is identified as having a virus-laden payload, the email is placed in "quarantine" and you will be notified via email that the message has been quarantined. In most circumstances you will have no need for the original email, so no action is required. However, if you do want to receive the original email, you need only follow the instructions in the notification email and the message will be forwarded to you intact (INCLUDING THE VIRUS!).

Spam Email

First, note that the spam measures described in this section are applied only to messages sent from outside the department. For example, if you send email to a local user from a local CS/Extreme/OSL machine, none of the spam scans will be performed. If you want to test things, be sure to send your test messages from a system outside the department (such as a UITS machine or any non-IU system).

One of the problems with spam is that it is extremely difficult to determine whether a given message is spam or not. The PureMessage software uses a set of rules and heuristics to assign each message a probability of being spam, and thresholds can be set to either tag or block messages whose probability exceeds them. We currently have the software running in what is called "Training Mode", which means that no messages are actually being blocked. However, some useful information is added to each message's headers, which the end user can use to reject or file messages. Here is the message rewriting that is done:

  • X-Perlmx-Spam: header - The X-Perlmx-Spam: header is added to the message. This header contains three pieces of information:

    1. Gauge - This is the probability that a message is spam. The gauge is listed with an X for each 10% and an I for each 1%. For example, if the probability is 32%, the gauge will be XXXII.
    2. Probability - This is the same as the gauge field but is just listed as a numeric probability.
    3. Report - The report field lists the spam rules this email message matched. You can get a description of each rule from the PureMessage Spam Filtering Rule Descriptions Page. The rules are divided between rules that come with PureMessage and local rules we have added.

    Here is a sample X-Perlmx-Spam: header:

    X-Perlmx-Spam: Gauge=XXXXII, Probability=42%, Report=ADVERT_CODE, SUBJ_ALL_CAPS

    From the Rule Description Page, we see this message matched the following rules:

    ADVERT_CODE     Subject: contains advertising tag
    SUBJ_ALL_CAPS   Subject is all capitals

    This resulted in a 42% probability that the message was spam.

  • [SPAM: ] Subject - When the probability of a message being spam reaches a configurable threshold (currently set to 50%), the Subject: line of the message is rewritten to include "[SPAM: ## NN%]". The number of hash marks (#) indicates how far above the threshold the message was: one # for 0-9% above the threshold, two for 10-19%, and so on. The numeric probability is also listed. So, for example, if you receive an email message with the subject line of:

    Subject: [SPAM:### 73%] ADV: AMAZING

    you will know that this message was 20-29% above the threshold for being flagged as spam (###) and the actual numeric probability was 73%. The hash marks are useful when automatically filtering email, as you will see below.
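For anyone who wants to work with these headers in a script, here is a minimal Python sketch of how the value of the X-Perlmx-Spam: header described above could be pulled apart. It assumes the header always follows the Gauge, Probability, Report layout of the sample; the names SpamHeader, parse_spam_header, and gauge_for are illustrative only and are not part of PureMessage.

```python
import re
from typing import List, NamedTuple


class SpamHeader(NamedTuple):
    gauge: str          # e.g. "XXXXII"
    probability: int    # e.g. 42
    rules: List[str]    # e.g. ["ADVERT_CODE", "SUBJ_ALL_CAPS"]


# Assumed layout, taken from the sample header in this notice:
#   Gauge=XXXXII, Probability=42%, Report=RULE_A, RULE_B
_HEADER = re.compile(
    r"\s*Gauge=([XI]*),\s*Probability=(\d+)%(?:,\s*Report=(.*))?\s*$",
    re.DOTALL,
)


def parse_spam_header(value: str) -> SpamHeader:
    """Parse the text that follows 'X-Perlmx-Spam:' in a message header."""
    match = _HEADER.match(value)
    if match is None:
        raise ValueError(f"unrecognized X-Perlmx-Spam value: {value!r}")
    gauge, probability, report = match.groups()
    # Rule names are comma-separated; drop surrounding whitespace and blanks.
    rules = [name.strip() for name in (report or "").split(",") if name.strip()]
    return SpamHeader(gauge, int(probability), rules)


def gauge_for(probability: int) -> str:
    """Rebuild the gauge: one X per 10% plus one I per remaining 1%."""
    tens, ones = divmod(probability, 10)
    return "X" * tens + "I" * ones


if __name__ == "__main__":
    sample = "Gauge=XXXXII, Probability=42%, Report=ADVERT_CODE, SUBJ_ALL_CAPS"
    print(parse_spam_header(sample))
```

Comparing gauge_for(header.probability) with header.gauge is a cheap sanity check that a header was parsed the way this sketch expects.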

Filtering Spam

Since we are currently not rejecting any of the spam that is being identified, it is up to the end user to block it using something like procmail or the built-in filtering options of mailers such as Pine, Netscape, Eudora, and Outlook.

Please see the Spam Filtering FAQ Entry for information about how to filter out spam from your email.
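To illustrate the kind of decision such a filter makes, here is a rough Python sketch that reads the "[SPAM:### NN%]" tag from a Subject: line and picks a folder based on the number of hash marks. This is not the procedure from the FAQ entry; the folder names ("inbox", "review", "spam"), the min_hashes cutoff, and the function names are all assumptions made for the example, and the pattern accepts the tag with or without a space after the colon.

```python
import re

# Matches a tag such as "[SPAM:### 73%]" or "[SPAM: ## 65%]" at the
# start of the subject; whitespace after the colon is optional.
SPAM_TAG = re.compile(r"^\[SPAM:\s*(#+)\s*(\d+)%\]")


def parse_spam_tag(subject):
    """Return (hash_count, probability) for a tagged subject, else None."""
    match = SPAM_TAG.match(subject.strip())
    if match is None:
        return None
    return len(match.group(1)), int(match.group(2))


def expected_hashes(probability, threshold=50):
    """One # for 0-9% above the threshold, two for 10-19%, and so on."""
    if probability < threshold:
        return 0
    return (probability - threshold) // 10 + 1


def choose_folder(subject, min_hashes=3):
    """Pick a (hypothetical) folder name for a message from its subject."""
    tag = parse_spam_tag(subject)
    if tag is None:
        return "inbox"      # not tagged as spam at all
    hashes, _probability = tag
    # Heavily tagged mail is filed away; borderline mail is kept for review.
    return "spam" if hashes >= min_hashes else "review"


if __name__ == "__main__":
    print(choose_folder("[SPAM:### 73%] ADV: AMAZING"))
```

Keying on the hash marks rather than the number means a plain substring match (for example, a subject containing "[SPAM:###") is enough in mailers whose filters cannot compare numbers.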

What Can You Do To Help?

I'm glad you asked. We need to get some feedback on how well the current rules are identifying spam. In particular, we are interested in the following:

  1. False Positives - If you receive an email message that had the Subject line rewritten to indicate it was spam but it wasn't, we would like to know. If the message doesn't contain any personal information, the best thing you can do is forward the entire mail message (including the full headers) to me (robh). If the message does contain personal information and you don't want to send it all to me, then you can send just the X-Perlmx-Spam: header from the message and, optionally, the Subject: line.

  2. Missed Spam - I expect there will be a fair amount of spam email that slips by under the 50% probability limit we currently have set. However, I would like to get some idea of how much. To do this, you can keep track of the X-Perlmx-Spam: headers for all the spam you get. One easy way to do this is to just manually file all the spam you receive into a folder. Then, after a week or so of getting spam, you could run something like:

    % grep X-Perlmx-Spam: spamfolder | mailx -s spamstats robh

    to send me the statistics.
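If you would like to look at the numbers yourself before sending them, the following Python sketch summarizes the same folder. It assumes the folder is stored in mbox format (an assumption; check how your mailer stores folders) and uses the standard mailbox module. The function names and the 10% banding are choices made for this example only.

```python
import mailbox
import re
from collections import Counter

PROBABILITY = re.compile(r"Probability=(\d+)%")


def spam_probabilities(mbox_path):
    """Collect the numeric probability from each message's X-Perlmx-Spam: header."""
    probabilities = []
    # create=False: fail instead of silently creating an empty folder.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            header = message.get("X-Perlmx-Spam")
            if header is None:
                continue  # message was not scanned (e.g. sent locally)
            match = PROBABILITY.search(str(header))
            if match:
                probabilities.append(int(match.group(1)))
    finally:
        box.close()
    return probabilities


def histogram(probabilities):
    """Count messages per 10% band, keyed by the band's lower bound."""
    return Counter((p // 10) * 10 for p in probabilities)


if __name__ == "__main__":
    counts = histogram(spam_probabilities("spamfolder"))
    for band in sorted(counts):
        print(f"{band:3d}-{band + 9}%: {counts[band]}")
```

Bands below 50% show how much spam is slipping under the current threshold.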

Any and all comments on this subject can be sent to me (robh). Thanks, and death to spam!
