SpamAssassin with OCR for image attachments using FuzzyOcr
After receiving ridiculous amounts of stock, Viagra and various other image spam, we tested and implemented FuzzyOcr on some of our work servers. The aim is to run optical character recognition on every incoming message that has a GIF or JPG attachment and cross-check the recognised words against a word list that we maintain.
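To make the idea concrete, here is a minimal sketch of the scan-and-compare flow as a shell pipeline. It is not how FuzzyOcr works internally: the plugin does approximate matching, while this sketch only does a plain case-insensitive substring match. The function name `match_words`, the file name spam.gif and the assumption that wordlist comments start with `#` are all made up for illustration.

```sh
#!/bin/sh
# match_words WORDLIST
# Reads OCR text on stdin and prints each wordlist entry that occurs in it.
# Exact, case-insensitive substring matching only (hypothetical helper;
# FuzzyOcr's real matching is approximate and more forgiving).
match_words() {
    wordlist=$1
    text=$(cat)
    while IFS= read -r word || [ -n "$word" ]; do
        # Skip blank lines and lines assumed to be comments
        case $word in
            ''|'#'*) continue ;;
        esac
        if printf '%s\n' "$text" | grep -qiF -- "$word"; then
            printf '%s\n' "$word"
        fi
    done < "$wordlist"
}

# Example use, assuming netpbm and gocr are installed and spam.gif exists:
#   giftopnm spam.gif | gocr -i - | match_words FuzzyOcr.words
```

With exact matching like this, a spammer only has to distort one letter to slip through, which is why the fuzzy matching in the plugin matters.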
This is how we went about installing FuzzyOcr and the applications it requires on FreeBSD with Matt Simerson's Mail Toaster and SpamAssassin.
- Update the ports tree
cd /usr/ports
make update
pkgdb -F
- If you aren't running the latest SpamAssassin (3.1.7 at the time of writing), I suggest you upgrade
portupgrade -f `pkg_info | grep razor-agents | cut -d" " -f1`
portupgrade -f `pkg_info | grep p5-Mail-SpamAssassin | cut -d" " -f1`
- Install the required packages (this can take a while)
portinstall -m WITHOUT_X11=yes graphics/netpbm graphics/ImageMagick graphics/gocr devel/p5-String-Approx security/p5-Digest-MD5 graphics/libungif
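Once the ports have finished building, it can be worth confirming that the helper programs are actually on the PATH before going further. The sketch below is one way to do that; `missing_tools` is a made-up helper name, and the tool list in the example is an assumption about what FuzzyOcr.cf refers to, so adjust it to match your copy of the file.

```sh
#!/bin/sh
# missing_tools NAME...
# Prints the name of every command that cannot be found on PATH.
# Prints nothing when all of them are present.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example use; this tool list is an assumption, adjust it to whatever
# your FuzzyOcr.cf actually references:
#   missing_tools gocr giftopnm jpegtopnm convert giffix giftext
```

Anything it prints is a program that still needs installing, or one that lives outside your PATH.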
- Download FuzzyOCR
mkdir -p /usr/local/src
cd /usr/local/src
fetch http://users.own-hero.net/~decoder/fuzzyocr/fuzzyocr-latest.tar.gz
tar zxf fuzzyocr-latest.tar.gz
cd FuzzyOcr-2.3b
cp FuzzyOcr.cf /usr/local/etc/mail/spamassassin
cp FuzzyOcr.pm /usr/local/etc/mail/spamassassin
cp FuzzyOcr.words.sample /usr/local/etc/mail/spamassassin/FuzzyOcr.words
- Edit the wordlist as you please and add any words you require.
Words can be matched loosely or strictly depending on your requirements. Matching too loosely causes false positives.
- Add the following lines to v310.pre
# FuzzyOCR - performs fuzzy Optical Character Recognition on spam images
#
loadplugin FuzzyOcr /usr/local/etc/mail/spamassassin/FuzzyOcr.pm
loadplugin Mail::SpamAssassin::Timeout
- Edit your /usr/local/etc/mail/spamassassin/FuzzyOcr.cf
- change all /usr/bin/ to /usr/local/bin/
- adjust the rest of the file to match your requirements. Alternatively, download a ready-made copy:
fetch -o /usr/local/etc/mail/spamassassin/FuzzyOcr.cf http://www.rsaweb.co.za/rbl/FuzzyOcr.cf
- Restart SpamAssassin:
/usr/local/etc/rc.d/sa-spamd.sh restart
- Check that SpamAssassin is running:
ps ax | grep spam
- Check the logs to make sure there are no errors from SpamAssassin:
tail -f /var/log/maillog
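Besides watching the mail log, SpamAssassin can check its own configuration with `spamassassin --lint`, and adding `-D` prints debug output. The sketch below adds a small filter for picking plugin-related lines out of that output. `focr_lines` is a made-up helper name, and whether and how the plugin shows up in the debug output is an assumption that may vary between versions.

```sh
#!/bin/sh
# focr_lines
# Filters stdin down to lines that mention FuzzyOcr (case-insensitive).
# Hypothetical helper; prints nothing if there are no such lines.
focr_lines() {
    grep -i 'fuzzyocr' || true
}

# Example use:
#   spamassassin --lint                       # exits non-zero on config errors
#   spamassassin -D --lint 2>&1 | focr_lines  # plugin-related debug lines
#   grep -i spamd /var/log/maillog | focr_lines
```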
- Check the FuzzyOcr logs to make sure there are no errors. First enable debugging by setting "focr_verbose 2"
- remember to set it back to 1 after testing
ee /usr/local/etc/mail/spamassassin/FuzzyOcr.cf
# focr_verbose 2
tail -f /var/log/mail/fuzzyocr.log
- That's it - check that your mail is getting scanned. Add any words that aren't being detected and that should be it!
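If you set up more than one server, the two manual edits to FuzzyOcr.cf described above (the /usr/bin/ to /usr/local/bin/ change and the focr_verbose toggle) can be scripted. This is only a sketch: `fix_paths` and `set_verbose` are made-up helper names, and it assumes the focr_verbose line is not commented out and starts at the beginning of a line.

```sh
#!/bin/sh
# fix_paths: rewrite /usr/bin/ to /usr/local/bin/ (stdin to stdout).
# Paths that already use /usr/local/bin/ are left unchanged.
fix_paths() {
    sed 's|/usr/bin/|/usr/local/bin/|g'
}

# set_verbose LEVEL: set the focr_verbose value (stdin to stdout).
# Only touches lines that start with "focr_verbose"; commented-out
# lines are left alone.
set_verbose() {
    sed "s/^focr_verbose[[:space:]][[:space:]]*[0-9][0-9]*/focr_verbose $1/"
}

# Example use (keeps a backup, then writes the edited copy back):
#   cf=/usr/local/etc/mail/spamassassin/FuzzyOcr.cf
#   cp "$cf" "$cf.bak"
#   fix_paths < "$cf.bak" | set_verbose 2 > "$cf"
```

Running the same pipeline again with `set_verbose 1` puts the log level back after testing.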