* [#overview Overview]
* [#wiki_as_email Wiki as Email]
** [#headers Headers]
** [#content Content]
* [#filtering Filtering]
** [#bayesian_filtering Bayesian Filtering]
** [#_spamassassin !SpamAssassin]
----
! Overview
It's rather simple: rewrite each edit as an email, and use existing spam tools to classify the edit. Bayesian filters should work fantastically well for this application, though I can't think of any good reason why more traditional rule-based filters such as (http://www.spamassassin.org/ SpamAssassin) wouldn't work too.
----
! Wiki as Email
!! Headers
Most of the headers are meaningless in this application, but well-formed headers will help the filters parse each message and work their magic correctly.
!! Content
The content is a little tricky. Do we simply supply the raw wiki text, or do we render it into HTML? And do we include the whole page or just the diff? Initially I think the raw diff text should be enough; rendering into HTML is probably a good idea at a later date.
----
! Filtering
!! Bayesian Filtering
The usual benefits of Bayesian filtering over other methods should apply just as well to a wiki as to email. As with any Bayesian filter, the system needs to be trained, so the training interface will probably be the most cumbersome part of our anti-wiki-spam code.
!! !SpamAssassin
!SpamAssassin's default rules would need to be tweaked with a custom config file, as various tests (e.g. MIME_HTML_ONLY) are useless in this context. Setting a test's score to 0 in that file disables it.
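As a rough illustration of the "Wiki as Email" step, here is a minimal Python sketch. It assumes the raw-diff approach from the Content section: each edit becomes a plain-text message whose body is a unified diff of the old and new page text. The function name edit_to_email, the placeholder addresses under wiki.invalid and the X-Wiki-Page header are all hypothetical choices, not part of any existing wiki engine. The only goal is a well-formed message that a Bayesian filter or !SpamAssassin could read.

```python
import difflib
from datetime import datetime, timezone
from email.headerregistry import Address
from email.message import EmailMessage
from email.utils import format_datetime


def edit_to_email(page, author, old_text, new_text, when=None):
    """Wrap one wiki edit as a well-formed plain-text email message.

    Hypothetical sketch: the addresses and the X-Wiki-Page header are
    placeholders chosen only so that the message parses cleanly.
    """
    when = when or datetime.now(timezone.utc)

    # Body: the raw unified diff of the page text (no HTML rendering).
    # lineterm="" plus "\n".join copes with text lacking a final newline.
    diff = "\n".join(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=f"{page} (old)",
            tofile=f"{page} (new)",
            lineterm="",
        )
    )

    msg = EmailMessage()
    # Address() quotes the display name when it contains specials,
    # e.g. an anonymous editor identified only by an IP address.
    msg["From"] = Address(display_name=author, addr_spec="editor@wiki.invalid")
    msg["To"] = "wiki@wiki.invalid"
    msg["Subject"] = f"Edit to {page}"
    msg["Date"] = format_datetime(when)
    msg["X-Wiki-Page"] = page
    msg.set_content(diff)  # text/plain body
    return msg


if __name__ == "__main__":
    m = edit_to_email(
        "SandBox",
        "Alice",
        "Hello\n",
        "Hello\nBuy cheap pills at http://spam.example/\n",
    )
    print(m.as_string())
```

The string from as_string() could then be handed to a filter, for example piped to !SpamAssassin's spamc client; that hookup is not shown here.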