SpamAssassin with OCR for Image Attachments Using FuzzyOCR

Posted on February 13th, 2007 by mark

After receiving ridiculous amounts of stock, Viagra and other image spam, we tested and deployed FuzzyOcr on some of our work servers. The aim is to run every incoming message that has a GIF or JPG attachment through optical character recognition (OCR) software and cross-check the recognised words against a word list we maintain ourselves.

This is how we installed FuzzyOCR and the applications it requires on FreeBSD, for use with Matt Simerson's Mail Toaster and SpamAssassin.

  1. Update the ports tree

    cd /usr/ports
    make update
    pkgdb -F
  2. If you aren't running the latest SpamAssassin (3.1.7 at the time of writing), we suggest upgrading it, along with Razor:

    portupgrade -f `pkg_info | grep razor-agents | cut -d" " -f1`
    portupgrade -f `pkg_info | grep p5-Mail-SpamAssassin | cut -d" " -f1`

  3. Install the required packages (this can take a while):

    portinstall -m WITHOUT_X11=yes graphics/netpbm graphics/ImageMagick graphics/gocr devel/p5-String-Approx security/p5-Digest-MD5 graphics/libungif

  4. Download FuzzyOCR

    mkdir -p /usr/local/src
    cd /usr/local/src
    fetch http://users.own-hero.net/~decoder/fuzzyocr/fuzzyocr-latest.tar.gz
    tar zxf fuzzyocr-latest.tar.gz
    # the directory name matches the release you downloaded (2.3b at the time of writing)
    cd FuzzyOcr-2.3b
    cp FuzzyOcr.cf /usr/local/etc/mail/spamassassin
    cp FuzzyOcr.pm /usr/local/etc/mail/spamassassin
    cp FuzzyOcr.words.sample /usr/local/etc/mail/spamassassin/FuzzyOcr.words

  5. Edit the word list as you see fit and add any words you need.
    Words can be matched loosely or strictly, depending on your requirements; matching too loosely causes false positives.
  6. Add the following lines to /usr/local/etc/mail/spamassassin/v310.pre:

    # FuzzyOCR - performs fuzzy Optical Character Recognition on spam images
    #
    loadplugin FuzzyOcr /usr/local/etc/mail/spamassassin/FuzzyOcr.pm
    loadplugin Mail::SpamAssassin::Timeout

  7. Edit /usr/local/etc/mail/spamassassin/FuzzyOcr.cf: change every /usr/bin/ path to /usr/local/bin/ (FreeBSD ports install their binaries under /usr/local) and adjust the rest of the file to suit your requirements. Alternatively, download our copy:

    fetch -o /usr/local/etc/mail/spamassassin/FuzzyOcr.cf http://www.rsaweb.co.za/rbl/FuzzyOcr.cf

  8. Restart SpamAssassin:

    /usr/local/etc/rc.d/sa-spamd.sh restart

  9. Check that SpamAssassin is running:

    ps ax | grep spam

  10. Check the logs to make sure SpamAssassin reports no errors:

    tail -f /var/log/maillog

  11. Check the FuzzyOcr log for errors. First, enable debugging by setting "focr_verbose 2" in FuzzyOcr.cf (remember to set it back to 1 after testing):

    ee /usr/local/etc/mail/spamassassin/FuzzyOcr.cf

    focr_verbose 2

    tail -f /var/log/mail/fuzzyocr.log

  12. That's it. Check that your mail is being scanned, and add any words that aren't being detected to the word list.
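Step 2 asks whether you are on the latest SpamAssassin. Below is a minimal POSIX sh sketch of that check. It assumes `spamassassin --version` prints a line of the form "SpamAssassin version X.Y.Z"; the helper name `version_ge` is hypothetical, and it only compares purely numeric dotted versions.

```shell
#!/bin/sh
# Hypothetical helper: succeed if dotted version $1 >= $2, comparing numeric fields left to right.
# Assumes purely numeric fields (e.g. 3.1.7); suffixes such as "_1" are not handled.
version_ge() {
    _a=$1
    _b=$2
    while [ -n "$_a$_b" ]; do
        # Take the leading field of each version; a missing field counts as 0.
        _x=${_a%%.*}
        _y=${_b%%.*}
        _x=${_x:-0}
        _y=${_y:-0}
        if [ "$_x" -gt "$_y" ]; then return 0; fi
        if [ "$_x" -lt "$_y" ]; then return 1; fi
        # Drop the field just compared.
        case $_a in *.*) _a=${_a#*.} ;; *) _a= ;; esac
        case $_b in *.*) _b=${_b#*.} ;; *) _b= ;; esac
    done
    return 0
}

WANTED=3.1.7   # latest release at the time of writing

# Only run the check where SpamAssassin is actually installed.
if command -v spamassassin >/dev/null 2>&1; then
    # Assumed output format: "SpamAssassin version 3.1.7"
    installed=$(spamassassin --version | awk '/SpamAssassin version/ {print $3; exit}')
    if version_ge "$installed" "$WANTED"; then
        echo "SpamAssassin $installed is current"
    else
        echo "SpamAssassin ${installed:-unknown} is older than $WANTED: upgrade it"
    fi
fi
```

Comparing field by field rather than as strings means 3.10.0 correctly counts as newer than 3.1.7.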
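After step 3, a quick way to confirm the OCR and image tools are on your PATH is a loop over `command -v`. The binary names below (`gocr` from graphics/gocr, `giftopnm` and `jpegtopnm` from graphics/netpbm, `convert` from graphics/ImageMagick) are our assumption about what those ports install; adjust the list to whatever your FuzzyOcr.cf actually calls. `missing_tools` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print each missing command and return the number missing.
missing_tools() {
    _missing=0
    for _tool in "$@"; do
        if ! command -v "$_tool" >/dev/null 2>&1; then
            echo "missing: $_tool"
            _missing=$((_missing + 1))
        fi
    done
    return "$_missing"
}

# Assumed binary names for the ports installed in step 3.
if missing_tools gocr giftopnm jpegtopnm convert; then
    echo "all OCR and image tools found"
fi
```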
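To confirm that step 4 copied everything, a small check can look for the three FuzzyOcr files in the SpamAssassin configuration directory used throughout this post. `check_fuzzyocr_files` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: verify the three FuzzyOcr files from step 4 exist in directory $1.
check_fuzzyocr_files() {
    _dir=$1
    _ok=0
    for _f in FuzzyOcr.cf FuzzyOcr.pm FuzzyOcr.words; do
        if [ ! -f "$_dir/$_f" ]; then
            echo "not found: $_dir/$_f"
            _ok=1
        fi
    done
    return "$_ok"
}

SA_DIR=/usr/local/etc/mail/spamassassin
if [ -d "$SA_DIR" ] && check_fuzzyocr_files "$SA_DIR"; then
    echo "FuzzyOcr files are in place"
fi
```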
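Step 6 can also be scripted so that running it a second time does not add the lines twice. This sketch assumes v310.pre lives in /usr/local/etc/mail/spamassassin alongside the FuzzyOcr files; `add_fuzzyocr_loadplugin` is a hypothetical name, and the appended lines are exactly the ones from step 6.

```shell
#!/bin/sh
# Hypothetical helper: append the step 6 loadplugin lines to file $1 unless already present.
add_fuzzyocr_loadplugin() {
    _pre=$1
    if grep -q '^loadplugin FuzzyOcr ' "$_pre" 2>/dev/null; then
        return 0   # already configured; nothing to do
    fi
    cat >> "$_pre" <<'EOF'
# FuzzyOCR - performs fuzzy Optical Character Recognition on spam images
#
loadplugin FuzzyOcr /usr/local/etc/mail/spamassassin/FuzzyOcr.pm
loadplugin Mail::SpamAssassin::Timeout
EOF
}

PRE=/usr/local/etc/mail/spamassassin/v310.pre   # assumed location
if [ -f "$PRE" ]; then
    add_fuzzyocr_loadplugin "$PRE"
fi
```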
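The path change in step 7 can be done with sed instead of by hand. In this sketch, `-i.bak` edits the file in place and keeps a backup copy; that form is accepted by both FreeBSD and GNU sed. `fix_fuzzyocr_paths` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: rewrite /usr/bin/ to /usr/local/bin/ in file $1, keeping $1.bak.
fix_fuzzyocr_paths() {
    # "|" as the delimiter avoids escaping the slashes; /usr/local/bin/ does not
    # contain /usr/bin/, so running this twice changes nothing further.
    sed -i.bak 's|/usr/bin/|/usr/local/bin/|g' "$1"
}

CF=/usr/local/etc/mail/spamassassin/FuzzyOcr.cf
if [ -f "$CF" ]; then
    fix_fuzzyocr_paths "$CF"
    grep -n '/usr/local/bin/' "$CF"   # show the lines that now point at /usr/local/bin/
fi
```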
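In step 9, `ps ax | grep spam` also lists the grep process itself, so it always prints at least one line. A common shell idiom avoids this by wrapping one letter of the pattern in brackets: the regex `[s]pamd` still matches "spamd", but grep's own command line now contains "[s]pamd", which the regex does not match. A sketch, with `spamd_running` as a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: succeed if a spamd process appears in the process list.
spamd_running() {
    # The bracket keeps this grep from matching its own command line.
    ps ax | grep -q '[s]pamd'
}

if spamd_running; then
    echo "spamd is running"
else
    echo "spamd is NOT running: check /var/log/maillog"
fi
```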
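Alongside the log check in step 10, SpamAssassin's own `--lint` option checks the configuration and exits non-zero on errors, and adding `-D` prints debug output (on stderr) that shows which plugins were loaded. This sketch assumes the debug line for the plugin contains "plugin: loading FuzzyOcr"; that wording is an assumption, so loosen the pattern if it does not match on your system. `fuzzyocr_loaded` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read SpamAssassin debug output on stdin and
# succeed if it shows the FuzzyOcr plugin being loaded (assumed wording).
fuzzyocr_loaded() {
    grep -q 'plugin: loading FuzzyOcr'
}

if command -v spamassassin >/dev/null 2>&1; then
    # --lint exits non-zero if the configuration has errors.
    if spamassassin --lint; then
        echo "configuration lints cleanly"
    else
        echo "lint reported errors: see above"
    fi
    # -D writes debug output to stderr, so merge it into the pipe.
    if spamassassin -D --lint 2>&1 | fuzzyocr_loaded; then
        echo "FuzzyOcr plugin loaded"
    else
        echo "FuzzyOcr plugin not seen in lint output"
    fi
fi
```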
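Instead of opening FuzzyOcr.cf in ee each time, the verbosity in step 11 can be switched with sed. This sketch assumes the setting sits on its own line as `focr_verbose N`, possibly commented out with a leading #; if no such line exists, nothing is changed. `set_focr_verbose` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: set "focr_verbose" in file $1 to level $2 (e.g. 2 for debugging, 1 afterwards).
# Matches the line whether or not it is commented out; keeps a .bak copy.
set_focr_verbose() {
    sed -i.bak -E "s/^#?[[:space:]]*focr_verbose[[:space:]]+[0-9]+/focr_verbose $2/" "$1"
}

CF=/usr/local/etc/mail/spamassassin/FuzzyOcr.cf
if [ -f "$CF" ]; then
    set_focr_verbose "$CF" 2      # enable debugging
    grep '^focr_verbose' "$CF"    # confirm the new setting
fi
```

After testing, `set_focr_verbose "$CF" 1` puts the level back.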
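For step 12, one way to check scanning without waiting for new spam is to feed a saved image-spam message through `spamassassin -t` (test mode) and look at which rules fired. This sketch assumes the FuzzyOcr rule names start with FUZZY_OCR (check FuzzyOcr.cf for the actual names) and that the message is saved as sample-spam.eml, a hypothetical file name; `fuzzyocr_hit` is likewise hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read a scanned message on stdin and succeed if
# a rule whose name contains FUZZY_OCR (assumed prefix) fired.
fuzzyocr_hit() {
    grep -q 'FUZZY_OCR'
}

MSG=sample-spam.eml   # hypothetical: a saved image-spam message
if [ -f "$MSG" ] && command -v spamassassin >/dev/null 2>&1; then
    if spamassassin -t < "$MSG" | fuzzyocr_hit; then
        echo "FuzzyOcr matched words in $MSG"
    else
        echo "no FuzzyOcr hit: check the word list against the text in the image"
    fi
fi
```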