Indiana University


Computer Science







CSG FAQ

Q: How can I filter out SPAM email I don't want to see?

This FAQ entry goes into considerable detail. If all you want to do is set up spam filtering for your CS email, log into any CS system and run killspam. The script guides you through the setup and lets you configure the filtering in various ways. For more information, read on...

The CS department mail server uses a product called PureMessage to identify spam. When it finds a message that it thinks is spam (based on an extensive set of rules and tests), it adds "[SPAM: #" to the Subject: line of the message and also encodes this information in the X-Perlmx-Spam: header. This makes it easy for user mail filters (such as procmail and those built into many mail programs like Thunderbird) to filter this spam.

Here is a description of these features:

  • X-Perlmx-Spam: header - The X-Perlmx-Spam: header is added to the message. This header contains three pieces of information:

    1. Gauge - This is the probability that a message is spam. The gauge is listed with an X for each 10% and an I for each 1%. For example, if the probability is 32%, the gauge will be XXXII.
    2. Probability - This is the same as the gauge field but is just listed as a numeric probability.
    3. Report - The report field lists the rules this email message matched in the spam rules.

    Here is a sample X-Perlmx-Spam: header:

    X-Perlmx-Spam: Gauge=XXXXII, Probability=42%, Report='ADVERT_CODE, SUBJ_ALL_CAPS'

    This message matched the ADVERT_CODE and SUBJ_ALL_CAPS rules and resulted in a 42% probability that the message was spam.

  • [SPAM: ] Subject - When the probability of a message being spam reaches a configurable threshold (currently set to 60%), the Subject: line of the message is rewritten to include "[SPAM: ## NN%]". The number of hash marks (#) indicates how far above the threshold the message was, with one # for each 10%, and the numeric probability is also listed. So, for example, if you receive an email message with the subject line of:

    Subject: [SPAM:### 73%] ADV: AMAZING

    you will know that this message was 20-29% above the threshold for being flagged as spam (###) and the actual numeric probability was 73%.

    The hash marks can be used to filter email automatically. However, we recommend that you filter on the X-Perlmx-Spam: header instead of the Subject: line, because the X-Perlmx-Spam: header includes an absolute probability whereas the Subject: line contains a probability relative to the current threshold. If the threshold changes, the behavior of a Subject:-based filter will also change. This is not the case when using the X-Perlmx-Spam: header.
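
As a rough illustration of the header layout described above, the following Python sketch splits an X-Perlmx-Spam: value into its three fields. The function name parse_perlmx_spam and the regular expression are assumptions made for this example; they are modeled only on the sample header shown in this FAQ and are not part of PureMessage, so real headers may need a more forgiving pattern.

```python
import re

# Pattern modeled on the sample header in this FAQ (an assumption, not an
# official PureMessage grammar).
_HEADER_RE = re.compile(
    r"Gauge=(?P<gauge>[XI]*),\s*"
    r"Probability=(?P<prob>\d+)%,\s*"
    r"Report='(?P<report>[^']*)'"
)


def parse_perlmx_spam(value):
    """Return (gauge, probability, rules), or None if the value does not match."""
    match = _HEADER_RE.search(value)
    if match is None:
        return None
    # The report is a comma-separated list of rule names.
    rules = [r.strip() for r in match.group("report").split(",") if r.strip()]
    return match.group("gauge"), int(match.group("prob")), rules


sample = "Gauge=XXXXII, Probability=42%, Report='ADVERT_CODE, SUBJ_ALL_CAPS'"
gauge, probability, rules = parse_perlmx_spam(sample)
print(gauge, probability, rules)
```

Because the Probability field is an absolute number, a filter built on it keeps the same meaning even if the Subject: tagging threshold is changed.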

If you don't want the CS mail filters to add the SPAM tag to any of your email messages, please see the associated FAQ entry for instructions.

Below are several ways you can configure your account or mail program to filter these mail messages. But, before we begin, it is important to note that, like all other such tools, PureMessage is not perfect. We have found it to be extremely good at identifying spam but it may still miss some spam and falsely identify other non-spam email as spam. You should monitor the performance of this filtering before you start throwing email away completely. For example, you may want to automatically filter such email into a folder that you scan periodically for messages that really weren't spam.
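
One way to do that periodic scan is sketched below in Python. It lists the messages in an mbox-format folder with the least spam-like ones first, since those are the likeliest to be legitimate mail. The function names spam_summary and review_folder are hypothetical, and the sketch assumes the folder is a plain mbox file (for example a spamfolder file in ~/Mail); adjust it if your mail program stores folders differently.

```python
import email
import mailbox
import re

_PROB_RE = re.compile(r"Probability=(\d+)%")


def spam_summary(message):
    """Return (probability, sender, subject) for one message.

    probability is None when the X-Perlmx-Spam: header is missing or
    has no Probability field.
    """
    header = str(message.get("X-Perlmx-Spam", ""))
    match = _PROB_RE.search(header)
    probability = int(match.group(1)) if match else None
    return probability, str(message.get("From", "")), str(message.get("Subject", ""))


def review_folder(path):
    """Print one line per message in an mbox file, least spam-like first."""
    rows = [spam_summary(msg) for msg in mailbox.mbox(path, create=False)]
    # Messages without a probability sort first so they are not overlooked.
    rows.sort(key=lambda row: -1 if row[0] is None else row[0])
    for probability, sender, subject in rows:
        print(probability, sender, subject, sep="\t")


# A made-up message used only to demonstrate spam_summary().
example = email.message_from_string(
    "From: somefriend@gooddomain.com\n"
    "Subject: lunch?\n"
    "X-Perlmx-Spam: Gauge=XXXXXIIIII, Probability=55%, Report='SUBJ_ALL_CAPS'\n"
    "\n"
    "Are you free at noon?\n"
)
print(spam_summary(example))
```

To check a real folder you would call review_folder with the path to the mbox file; the call is left out here because the path depends on your setup.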

There are instructions below on how to do this filtering using the killspam script and procmail, as well as using the filtering tools built into the pine, Thunderbird, and Outlook mail programs.

  • killspam - There is a script on the CS systems called killspam that you can run to configure your account to use procmail to automatically filter your email. Simply run killspam and it will guide you through the configuration and ask for confirmation before anything is changed with your account. If you would prefer to set up procmail manually, see the next section.

  • procmail - You can configure your account manually to use procmail to filter your email. Procmail is extremely powerful and a treatise on using procmail is certainly beyond the scope of this FAQ. However, here is a cookbook example of how to drop all incoming email from spam.com and all email classified as spam by PureMessage with a 70% probability or higher using the X-Perlmx-Spam header into a file called spamfolder in your ~/Mail directory.

    First, you must create a .procmailrc file in your home directory that contains a set of rules you wish to apply to your incoming mail. For this example, we create a file that contains the following:

    MAILDIR=$HOME/Mail
    LOGFILE=$MAILDIR/procmail.log

    # File anything from spam.com into spamfolder
    :0:
    * ^From.*@spam\.com
    spamfolder

    # File messages with 70+% spam probability into spamfolder
    :0:
    * ^X-Perlmx-Spam:.*Gauge=XXXXXXX
    spamfolder

    The first rule (rules are usually called recipes in procmail) tells procmail to file any email from spam.com into a file called spamfolder. The second rule does the same for email whose X-Perlmx-Spam header shows a spam probability of 70% or greater. If you want to delete the email instead of saving it to a file, simply replace "spamfolder" with "/dev/null" in the above example.

    Next, create a .forward file in your home directory that invokes procmail. Simply create a .forward file using your favorite editor that contains the following line:

    "|IFS=' '&&p=/usr/local/bin/procmail&&test -f $p&&exec $p -Yf-||exit 75 #username"

    where you replace "username" with your username.

    If you found that email from your friend at somefriend@gooddomain.com was incorrectly being identified as spam, you could add the following as the first rule to your .procmailrc to send it to your normal mailspool before it is matched against the other rules.

    # Deliver email from somefriend@gooddomain.com to the inbox
    :0:
    * ^From.*somefriend@gooddomain\.com
    /var/mail/username

    where you would replace username with your username.

    If you want to forward all non-spam email to a non-CS account instead of delivering it locally, you can do this by adding the following rule to the end of your .procmailrc.

    # Forward email off to somename@somedomain.com
    :0
    * !^FROM_MAILER
    ! somename@somedomain.com

    where you would replace somename@somedomain.com with the email address to which you wanted your email forwarded. The test that the email is not from the mailer daemon (!^FROM_MAILER) is critical to prevent mail loops in the event the forwarding address fails.

    You can also key on "[SPAM: #" that PureMessage adds to the Subject line. For example, if you wanted to automatically refile any message that had this spam tag in the Subject: line, you could use:

    # Send anything with [SPAM: in the Subject: to spamfolder
    :0:
    * ^Subject:.*\[SPAM:
    spamfolder

    Currently PureMessage only adds the SPAM tag to the subject line of messages with a spam probability of 60% or higher. If you also want to add the SPAM tag to messages in the 50-59% probability range, you can do this with the following procmail recipe:

    # Add the SPAM tag to the Subject: for messages with spam probability between 50-59%
    :0:
    * ^X-Perlmx-Spam: Gauge=XXXXX
    * !^Subject:.*SPAM:
    {
    SUBJ=`formail -c -xSubject:`

    :0 fhw
    | formail -c -I"Subject: [SPAM:# 50%] $SUBJ"
    }

    Another common request is to reject spam outright (i.e., bounce it) instead of silently filing it into a folder or throwing it away. In most cases the spammer will never see the bounce, but it can be useful when a real, non-spam message gets tagged as spam, since the originator will get a bounce message saying the email couldn't be delivered. To do this with procmail, you set a non-zero exit code, which causes sendmail to sense a delivery failure and generate the bounce. Only a limited number of exit codes and associated messages are available; 69 (service unavailable), 67 (addressee unknown), and 65 (data format error) are a few of the commonly used ones. Here is an example that uses the exit code to bounce spam email with 'service unavailable' but also saves a copy (unbeknownst to the spammer!) into your spamfolder.

    # Bounce email that has 70+% spam probability, but also save a copy in spamfolder
    :0
    * ^X-Perlmx-Spam:.*Gauge=XXXXXXX
    {
    EXITCODE=69
    LOG="SPAM rejection - "

    :0:
    spamfolder
    }

    For more details, see the procmail, procmailrc, and procmailex man pages.

  • Filtering Mail Using Pine - Here's a quick synopsis of how to set up a filter natively in pine using the X-Perlmx-Spam header:

    • From the main menu, type "S" for setup.
    • Type "R" for rules.
    • Type "F" for filters.
    • You're now at a list of your currently defined filters (which is probably empty).
    • Type "A" to add a new filter.
    • You'll now be at a setup menu for the filter.
      • It's a good idea to set a name for the filter. The first field is "Nickname" -- hit Enter to edit it. Backspace out the default "Filter rule" text and type "IU/CS spam filtering".
      • Use the down arrow to scroll to "Add Extra Headers".
      • Hit Enter and you will be prompted to enter the name of the header. Just enter X-Perlmx-Spam and hit return.
      • You should now see the "X-Perlmx-Spam pat =" option. Just hit Enter to edit the rule corresponding to this header. The X-Perlmx-Spam header contains the Gauge, which has an X for each 10% probability of spam. Just select the probability you want. For example, to filter on a 70% or greater probability, use a value of "Gauge=XXXXXXX".
      • Use the down arrow to scroll to "Filter action".
        • You can choose "Delete" to delete the matched mails, or
        • You can choose "Move" and specify a folder to move matched mails into (this is probably best while getting confidence in the system).
    • Type "E" to exit setup and type "Y" to confirm.

    [Thanks to Jeff Squyres for providing this synopsis!]

  • Filtering Mail Using Mozilla Thunderbird - In Mozilla Thunderbird, you can do the following to set up filters:

    • Select Message Filters... from the Tools menu.
    • Make sure Filters for: is set to your CS mail account.
    • Select New... to create a new filter rule.
    • In the Filter Rules window give the filter a descriptive name (such as 'SPAM') and set the Filter Criteria to Subject Contains [SPAM: #. If you want to filter based on the X-Perlmx-Spam: header, select Customize in the first pulldown menu under the filter criteria and add this new header.
    • Under Filter Action you can choose from a variety of actions, which includes deleting the message or automatically moving the message to a folder.
    • You can fill in the Description: field if you like.
    • When you have the Filter Rule as you want it, just hit OK.

  • Outlook - Please consult the Outlook Filtering Instructions. These instructions are for Outlook 2000. If you are running a different version of Outlook you will probably find the procedure to be quite similar. Note that the examples are keying on [PMX: in the subject. For the local CS installation, you will have to use [SPAM: instead.
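
The "Gauge=XXXXXXX" pattern used in the procmail and pine instructions above works because the gauge always starts with one X per full 10%, so a run of seven X's appears exactly when the probability is 70% or more. The Python sketch below illustrates this; gauge_for and matches_cutoff are hypothetical helper names, and the gauge construction simply follows the rule stated in this FAQ (one X per 10%, one I per 1%) rather than any PureMessage code.

```python
def gauge_for(probability):
    """Build the gauge string: one X per 10%, one I per remaining 1%."""
    if not 0 <= probability <= 100:
        raise ValueError("probability must be between 0 and 100")
    tens, ones = divmod(probability, 10)
    return "X" * tens + "I" * ones


def matches_cutoff(probability, cutoff):
    """Mimic a filter that looks for 'Gauge=' followed by cutoff // 10 X's."""
    pattern = "Gauge=" + "X" * (cutoff // 10)
    return pattern in "Gauge=" + gauge_for(probability)


# Show which probabilities a 'Gauge=XXXXXXX' (70%) filter would catch.
for probability in (42, 69, 70, 73, 100):
    print(probability, gauge_for(probability), matches_cutoff(probability, 70))
```

Note that a pattern made only of X's can express cutoffs in steps of 10%. For a finer cutoff you would have to match on the I's or on the numeric Probability field instead.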


