Akismet, centralised spam combating solution by Matt
Matt Mullenweg has announced a new spam-combating service: Akismet. If you are a personal blogger, or a pro-blogger wannabe who makes no more than $500 a month, Akismet is free to use. If yours is a commercial site, or you are making big bucks from your blogs, you need a commercial licence, starting from $5/month.
So, how does Akismet catch spam? How does it reduce false positives? What sort of algorithm does it use? Well, hmmm. We don’t know. Akismet is a centralised spam-classifying service. Every comment your blog receives is sent to a central server through a REST-based API. If the big brain on that server doesn’t like it, it yells back “Spam!!!” and the comment is marked accordingly.
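To make that round trip concrete, here is a minimal Python sketch of what the blog-side half of such a check might look like. The endpoint URL, the form field names and the plain-text "true"/"false" reply are all assumptions made for illustration, not details given in the announcement, so treat the Akismet API documentation as the real contract.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint: a placeholder, not the real service URL.
CHECK_URL = "https://spam-check.example.invalid/comment-check"


def build_check_request(blog_url, author_ip, content, url=CHECK_URL):
    """Build (but do not send) a form-encoded POST describing one comment."""
    # Field names are illustrative assumptions.
    body = urlencode({
        "blog": blog_url,
        "user_ip": author_ip,
        "comment_content": content,
    }).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def parse_verdict(response_text):
    """Map the assumed plain-text reply ("true" means spam) onto a label."""
    return "spam" if response_text.strip().lower() == "true" else "ham"


def check_comment(blog_url, author_ip, content, timeout=5.0):
    """Send the comment to the central server and return 'spam' or 'ham'."""
    request = build_check_request(blog_url, author_ip, content)
    with urlopen(request, timeout=timeout) as response:
        return parse_verdict(response.read().decode("utf-8"))
```

Splitting the request building and the reply parsing away from the network call keeps the two pure functions easy to test without a live server.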
So, how does this centralised server determine whether a comment is ham or spam? According to the FAQ,
When a new comment, trackback, or pingback comes to your blog it is submitted to the Akismet web service which runs hundreds of tests on the comment and returns a thumbs up or thumbs down.
Hmm. Probably something like SpamAssassin, but for blog comments. According to Michael Hampton, it will “entirely replace plugins such as wp-hashcash, Spam Karma 2, AuthImage, etc”, so I guess they must have studied some of those implementations. Further on, he mentions “integrating CJD’s Spam Nuker”. So we can probably get some idea of what kind of backend it has.
It also lets users manually classify comments as spam or ham. That suggests it might have some kind of Bayesian classifier that can be trained, which makes it useful to report all the false positives and false negatives.
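For readers unfamiliar with the idea, here is a toy trainable Bayesian classifier in Python. It is only a sketch of the general technique (naive Bayes with add-one smoothing); whether Akismet actually works this way is pure guesswork, and the class and method names are made up for this example.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy naive Bayes text classifier with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        return text.lower().split()

    def train(self, text, label):
        """Record one hand-labelled comment, e.g. a reported false positive."""
        self.word_counts[label].update(self.tokenize(text))
        self.doc_counts[label] += 1

    def log_score(self, text, label):
        """Unnormalised log P(label) plus the sum of log P(word | label)."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_words = sum(self.word_counts[label].values())
        total_docs = sum(self.doc_counts.values())
        # Smooth the prior as well, so an untrained label never yields log(0).
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        for word in self.tokenize(text):
            count = self.word_counts[label][word]
            # The extra +1 reserves probability mass for never-seen words.
            score += math.log((count + 1) / (total_words + len(vocab) + 1))
        return score

    def classify(self, text):
        spam = self.log_score(text, "spam")
        ham = self.log_score(text, "ham")
        return "spam" if spam > ham else "ham"
```

Each manual "this is spam" or "this is ham" report would map onto one `train()` call, which is why reporting mistakes in both directions helps a classifier of this kind.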
So, what’s good about Akismet?
- A large sample of comment spam allows its Bayesian classifier to be thoroughly trained.
- A centralised service, so Matt and co. can do all the fine-tuning without touching your site. No more plugin updates for algorithm changes.
- A nice API that can easily be integrated into other blog tools. There might even be command-line tools that can submit spam/ham in bulk.
But why would I probably not use it?
- A centralised server. I hate latency, especially as my blogs are hosted halfway around the world from Akismet’s central server.
- A centralised service. Just imagine millions of WordPress blogs downloading this plugin and deploying it today, then sending millions of comments to this potentially CPU-intensive classifying job…
- A centralised user-trained classification service. Although the FAQ says the classifier is unlikely to be poisoned (probably some kind of jail at the per-API-key level), it just doesn’t feel right to have some anonymous blogger moderating my comments.
- I don’t earn US$500 a month blogging, but I hope one day I will. (Current projection: about when un*x time(2) wraps around…)
- But most importantly, I don’t get spam. Well, rarely: so rarely that it has never bothered me, since I require all first-time commenters to be moderated, which should be a default option in WordPress.
Still, I applaud this great product. It is not perfect, but it is probably the next best thing to that red button labelled “Kill All Spammers”.
Update: Since moving the site to DreamHost, I have actually started using Akismet, and was surprised by the result: it is quite good. The centralisation still concerns me. Something like a Ping-o-matic outage can really stall blog posting, but fortunately the Akismet plugin has a good built-in timeout, so it gives up if the server is not responsive.
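The timeout behaviour is worth a small illustration. The Python sketch below shows the general "give up and carry on" pattern, not the plugin's actual code: `fetch_verdict` is a hypothetical callable that performs the HTTP request, and what to do after giving up (here, leaving the comment for manual moderation) is my assumption.

```python
def classify_with_timeout(fetch_verdict, comment, timeout=3.0, fallback="pending"):
    """Ask the remote service, but never let an outage block the comment form.

    `fetch_verdict` is a hypothetical callable (comment, timeout) -> "spam" or
    "ham" that does the real network call and raises on failure.
    """
    try:
        return fetch_verdict(comment, timeout)
    except OSError:
        # TimeoutError, socket.timeout, ConnectionError and urllib's URLError
        # are all OSError subclasses: the server is slow or down, so give up
        # and fall back to local moderation (an assumed policy).
        return fallback
```

With a short timeout, a dead central server costs each commenter a few seconds at most instead of stalling the post indefinitely.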
2 Links to This Article
- 2005-10-26 17:43 boakes.org
Akismet - Comment Spam Killer
This website - like any website that allows readers to submit comments - receives comment-spam, usually advertising medicines, gambling, or other vices.
I’ve been trialling a new anti-comment-spam plugin since mid September. It’s called …
- 2006-05-19 20:11 Getting WordPress.com Account, Not Blog | Scott Yang’s Playground
[…] WordPress.com has opened registration so you can get an account without blog attached. Useful if all you want is an API key so you can use products like Akismet, a centralised comment/trackback spam filter (which I previously reviewed here). […]
4 Comments
True, it’s centralised. The idea is, as the service expands, Matt will be able to add additional servers to handle the additional loads, perhaps even servers located outside the U.S.
By the way, the integration of CJD’s Spam Nuker is in the plugin itself, not in the server. It’s basically the screen you get when you click Manage » Akismet Spam.
I agree that adding hardware “sometimes” solves scalability issues, but not always. I guess applications like content classification should be CPU-bound and can easily run in clusters.
And thanks for the clarification on how Spam Nuker has been integrated.
Clustering is one thing, and that’ll certainly make things faster, but the big problem for you guys is network latency. I’d like to see you get an answer in, say, 60ms, as opposed to 600ms. A (future) Australian mirror of the whole thing would help in that regard.
Interactive applications like blog commenting hate latency (maybe one reason people avoid MT is its page rebuilds). Latency can come from the network, as when routing an HTTP request from Australia to the States. However, the time taken to process the request also adds latency, and that is what I worry about as the number of users grows, especially since Akismet is probably going to be very popular amongst bloggers.
Making the service more scalable and just adding more hardware is one way to fix that.
Another way I can think of is to make some operations asynchronous. Instead of waiting for the Akismet server(s) to decide the fate of the comment before sending the result back to the browser, it might feel more responsive to temporarily mark the comment as “Waiting for Response” and send back the HTML straight away. The plugin can then pull the result in a separate thread, or register an async callback, to mark the comment as “spam” or “ham”.
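A rough Python sketch of that idea, using a worker thread pool. Everything here is hypothetical (the class, the status strings, the `classify` callable standing in for the remote check); it only shows the shape of "store as pending, answer the browser, resolve later".

```python
import threading
from concurrent.futures import ThreadPoolExecutor

PENDING = "waiting for response"


class CommentQueue:
    """Store comments as pending, then resolve them on a worker thread."""

    def __init__(self, classify, workers=2):
        self._classify = classify  # hypothetical callable(text) -> "spam" | "ham"
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._lock = threading.Lock()
        self.status = {}  # comment id -> current status

    def submit(self, comment_id, text):
        """Mark the comment as pending and return at once with a future."""
        with self._lock:
            self.status[comment_id] = PENDING
        return self._pool.submit(self._job, comment_id, text)

    def _job(self, comment_id, text):
        # Runs on a worker thread. If the check raises, the comment simply
        # stays pending and can be moderated by hand.
        verdict = self._classify(text)
        with self._lock:
            self.status[comment_id] = verdict
        return verdict

    def close(self):
        """Wait for outstanding checks and stop the worker threads."""
        self._pool.shutdown(wait=True)
```

The request handler would call `submit()` and render the page immediately; the status flips to "spam" or "ham" whenever the slow remote check finally returns.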