A Spammy Year in Review

It’s that time of the year again.

Time for family members to joyfully gather for the holidays. Time to work on those ill-fated New Year’s resolutions. Time to relax and reflect on the past year and lessons learned.

Here at Akismet, we proudly work year round to protect millions of sites from comment spam. To date, in fact, we have eliminated over 65 billion (yes, with a ‘b’) spam comments, and we saw many interesting — and nasty — things in 2012. Make no mistake about it — spam levels are certainly on the rise.

Akismet saved the web from over 25 billion pieces of spam content this past year alone (and December is not over yet!). Toward the end of the year, we began seeing alarmingly high levels of spam. Starting in early December, our daily totals began topping 120 million spam comments per day, a trend that appears to be continuing into the new year. To put that in perspective, these numbers are roughly double what we were seeing in previous months. We also topped the 3 billion spam mark in November:

Akismet Spam Totals By Month, 2012

The chart above contains both current (C) and projected (P) December figures.

More troubling in terms of trends, there has been an unfortunate increase in compromised sites, wikis, and forums. We come across these after a hacker takes over a site, sets up their payload, and proceeds to spam a great number of sites with their malicious links. It is clear that spammers are increasingly willing to use illegal methods, such as hacking and exploiting these vulnerable websites. We have even seen these tactics used to advertise otherwise-reputable and well-known websites, a trend that suggests some marketing firms are outsourcing work to black-hat spammers.

Further noticeable increases in spam include elevated traffic from China, as well as the promotion of Chinese knockoff fashion sites. If you happen to regularly check your spam queue, we’re sure that you’ve seen at least one offer to purchase discount Christian Louboutin shoes or “authentic” Michael Kors purses. Also steadily on the rise has been spam content promoting TV streaming, payday loans, and “Get rich by working from home!” sites and programs. Sure, we have all seen this garbage before, but its recent increase is still worth keeping in mind. Here are some example sites, all of which were created on free blog hosts:

Spammers have also taken a liking to the abuse of reputable affiliate and referral programs. Their goal is simple: set up a free blog or site (example shown below), publish oodles of product listings including specific affiliate URLs, and spam the rest of the internet. You may notice that many of these spammers plant their payloads on free site hosts. Luckily, on WordPress.com, Akismet is actively working to combat the creation of these spam sites. We would love to see such integration on other hosts.


A spammer abusing Amazon’s affiliate program. A prominent form of spam in 2012.

Recent streams of human-generated spam are also worth mentioning. These campaigns tend to focus on more difficult targets, such as forums, third-party commenting platforms, and social networks. Such spam can be more difficult to systematically neutralize, which is why Akismet continues to develop and employ refined tactics against it.

Rest assured, we are always hard at work to make sure that any global increase in spam does not mean an unfortunate rise in unwanted comments getting through to your posts or moderation queues. Because we know that you have far more important things to do than sift through trash.

As always, our resolution for the new year is to continue making Akismet better, faster, and more accurate. As spam evolves, so will Akismet. We sincerely thank each and every one of our users for trusting us to defend their sites against the web’s underbelly.

Happy Holidays to you and yours,
Team Akismet

Akismet WordPress Plugin 2.5.7

Version 2.5.7 of the Akismet plugin for WordPress is now available. This is a maintenance release that fixes various minor bugs and includes some proactive security improvements. Changes include:

  • Fix a bug displaying the Stats page in some versions of Firefox
  • Fix mshots previews when using https
  • Add .htaccess to block direct access to files
  • Prevent some PHP notices
  • Fix Check For Spam return location when referrer is empty
  • Fix Settings links for network admins
  • Prevent some prepare() warnings in WordPress 3.5

To upgrade, visit the Plugins page of your WordPress wp-admin dashboard and follow the instructions. If you need to download the zip file directly, links to all versions are available in the plugins directory.

Please note that Akismet 2.5.7 requires WordPress 3.0 or higher. We recommend that all users of older WordPress versions upgrade as soon as possible.

Over 60 billion spams squashed

It works while you work and while you sleep, through your vacations and weekends, and it never takes a day off. Akismet, the best way to protect your online properties from spam, recently hit an incredible milestone we’re delighted to share: over 60 billion spam comments, forum comments, blog posts, pingbacks, trackbacks, and tweets squashed on sight. Boom!


Just how many is 60 billion? Well, let’s say we equate one spam to one mile. 60 billion miles would take you to the sun 645 times. Akismet continues to squash more and more spam by the minute. In fact, back when we hit our 50 billion spam milestone, we were catching about 700 spams per second. In November 2012 alone, we caught three billion bits of spam, which is 100 million spams per day. Per. Day. That’s roughly 1,200 spams per second. Blink: Akismet just nabbed another 1,200. Incredible, yes?
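For readers who want to check the arithmetic, the November figures can be reproduced in a few lines of Python. This is only a back-of-the-envelope sketch: it assumes a 30-day month and a mean Earth-Sun distance of about 93 million miles, neither of which is stated in the post.

```python
# Back-of-the-envelope check of the November 2012 figures.
# Assumptions (not stated in the post): a 30-day month and a mean
# Earth-Sun distance of roughly 93 million miles.
SECONDS_PER_DAY = 24 * 60 * 60
EARTH_SUN_MILES = 93_000_000

november_total = 3_000_000_000
days_in_november = 30

per_day = november_total / days_in_november                          # 100 million
per_second = november_total / (days_in_november * SECONDS_PER_DAY)  # about 1,157
trips_to_sun = 60_000_000_000 / EARTH_SUN_MILES                      # about 645

print(f"{per_day:,.0f} per day")
print(f"{per_second:,.0f} per second")
print(f"{trips_to_sun:.0f} trips to the sun")
```

On those assumptions the rate is about 1,157 spams per second, which the post rounds to 1,200.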

Akismet works with WordPress and many other platforms. If you don’t see your platform on the list, all you need do is grab an API key and get coding.

Akismet is 100% free as in air, free as in birds for individual users’ personal use. Check out our professional / business plans: sign up and say adios to spam.

Pro Tip: Testing, testing

If you’re developing a new implementation for the Akismet API, or integrating an existing library with your own application, you will of course need to test it. Often we see developers get ahead of themselves, making a few trivial API calls with minimal values and drawing the wrong conclusions about Akismet’s accuracy. Here are a few tips on what and how to test, and an outline of what you should and should not expect.

Use a test API key

If you’re developing your own code, please contact us and ask about creating an API key for testing purposes. We like to keep in contact with developers so we can help make sure you get the most out of Akismet.

For automated testing, include the parameter is_test=1 in your tests. That will tell Akismet not to change its behaviour based on those API calls – they will have no training effect. That means your tests will be somewhat repeatable, in the sense that one test won’t influence subsequent calls. (Be aware however that Akismet is non-deterministic, so you can expect to see results that change over time. See below for ways of forcing a specific response when you need a predictable test.)
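A simple way to make sure no automated test forgets the flag is to add it in one helper instead of in every test. This Python sketch is illustrative: the helper name and the placeholder field values are hypothetical, and only the is_test=1 parameter comes from the text.

```python
from urllib.parse import parse_qs, urlencode


def with_test_flag(params: dict[str, str]) -> dict[str, str]:
    """Return a copy of an Akismet payload marked as a test call.

    is_test=1 tells Akismet not to learn from the call, so one automated
    test won't influence the next. The caller's dict is left unchanged.
    """
    flagged = dict(params)
    flagged["is_test"] = "1"
    return flagged


# Placeholder payload for illustration; use realistic values in your tests.
base = {
    "blog": "https://example.com",
    "user_ip": "203.0.113.7",
    "user_agent": "Mozilla/5.0",
    "comment_type": "comment",
    "comment_content": "Thanks for the write-up!",
}
body = urlencode(with_test_flag(base))  # form-encoded request body
print(parse_qs(body)["is_test"])
```

Because Akismet is non-deterministic, tests built this way should still avoid asserting exact verdicts for ordinary comments; strict assertions belong with the forced spam and ham cases.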

There are no separate sandboxes or test servers. You needn’t worry about your tests having an effect on anyone else or on Akismet as a whole – we maintain careful isolation between API keys and users in order to make sure no one can adversely influence Akismet, accidentally or otherwise.

Test your API calls

Akismet works by examining all the available information combined. It is not enough to provide just the content of a message; you need to provide as many independent pieces of information as you can in each call. So before you can test Akismet’s accuracy, you need to make sure you’re sending complete and correct information.

To simulate a positive (spam) result, make a comment-check API call with the comment_author set to viagra-test-123, and all other required fields populated with typical values. The Akismet API will always return a true response to a valid request with that value. If you receive anything else, something is wrong in your client, data, or communications.

To simulate a negative (not spam) result, make a comment-check API call with the user_role set to administrator, and all other required fields populated with typical values. The Akismet API will always return a false response. Any other response indicates a data or communication problem.
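The two forced-response cases translate naturally into a pair of fixed test fixtures. In this Python sketch only comment_author=viagra-test-123, user_role=administrator and the expected true/false bodies come from the text; the other field values, the function names and the stub transport are assumptions for illustration.

```python
from typing import Callable
from urllib.parse import urlencode

# Typical-looking placeholder values (assumptions); replace them with
# values representative of your own application.
TYPICAL = {
    "blog": "https://example.com",
    "user_ip": "203.0.113.7",
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "referrer": "https://example.com/some-post/",
    "permalink": "https://example.com/some-post/",
    "comment_type": "comment",
    "comment_author": "Jane Reader",
    "comment_author_email": "jane@example.com",
    "comment_content": "Great post, thanks for sharing.",
    "is_test": "1",
}


def forced_spam_case() -> dict[str, str]:
    # A valid request with this author should always return "true".
    return {**TYPICAL, "comment_author": "viagra-test-123"}


def forced_ham_case() -> dict[str, str]:
    # A valid request from an administrator should always return "false".
    return {**TYPICAL, "user_role": "administrator"}


def check_known_answer(send: Callable[[str], str],
                       params: dict[str, str], expected: str) -> bool:
    """send is your own (hypothetical) function that POSTs a form-encoded
    body to comment-check and returns the response text."""
    return send(urlencode(params)).strip() == expected


# Stub transport standing in for the real HTTP call.
def stub_send(body: str) -> str:
    if "viagra-test-123" in body:
        return "true"
    if "user_role=administrator" in body:
        return "false"
    return "invalid"


print(check_known_answer(stub_send, forced_spam_case(), "true"))
print(check_known_answer(stub_send, forced_ham_case(), "false"))
```

In a real test suite, stub_send would be replaced by your actual client, and a False result points to a problem in the client, the data or the connection.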

Also, make sure your client will handle an unexpected response. Don’t assume that the comment-check API will always return either true or false. An invalid request may result in an error response. Additional information will usually be available in HTTP headers in this case. And of course a connectivity problem may result in no response at all. It’s important not to misinterpret an invalid response as meaning spam or ham.
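A client can make the three possible outcomes explicit, so that an error can never be mistaken for a verdict. This sketch assumes you already have the HTTP status, body and headers from your own transport code (or None for status and body when no response arrived); the header name X-akismet-debug-help is an assumption to check against the current API documentation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Mapping, Optional


class Verdict(Enum):
    SPAM = "spam"
    HAM = "ham"
    ERROR = "error"  # must never be treated as spam or ham


@dataclass
class CheckResult:
    verdict: Verdict
    detail: str = ""


def interpret_comment_check(status: Optional[int], body: Optional[str],
                            headers: Mapping[str, str]) -> CheckResult:
    """Map a raw comment-check response to spam, ham or error."""
    # No response at all (timeout, connection refused, DNS failure, ...).
    if status is None or body is None:
        return CheckResult(Verdict.ERROR, "no response")
    if status != 200:
        return CheckResult(Verdict.ERROR, f"HTTP {status}")
    text = body.strip()
    if text == "true":
        return CheckResult(Verdict.SPAM)
    if text == "false":
        return CheckResult(Verdict.HAM)
    # Any other body means the request was not valid. The explanation is
    # usually in a header; the name X-akismet-debug-help is an assumption.
    lowered = {name.lower(): value for name, value in headers.items()}
    hint = lowered.get("x-akismet-debug-help", "")
    return CheckResult(Verdict.ERROR, hint or f"unexpected body: {text!r}")


print(interpret_comment_check(200, "true", {}).verdict)
print(interpret_comment_check(None, None, {}).verdict)
```

Callers can then decide separately what to do on ERROR, for example holding the comment for moderation, rather than silently approving or discarding it.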

Test your data

Akismet is highly dependent on the quality and completeness of the data you provide. It’s important to provide as many parameters as possible, and to make sure they contain correct values. If you can’t populate a particular field because that information is unavailable or irrelevant, use an empty string – a missing value is better than an incorrect or made up one.
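Filling in unavailable fields with empty strings, rather than inventing values, can be done in one place before each call. The field list below is a selection of commonly used comment-check parameters and is illustrative rather than complete; the function name is hypothetical.

```python
from typing import Mapping, Optional

# Commonly used comment-check fields; illustrative rather than exhaustive.
FIELDS = ("blog", "user_ip", "user_agent", "referrer", "permalink",
          "comment_type", "comment_author", "comment_author_email",
          "comment_author_url", "comment_content")


def complete_payload(known: Mapping[str, Optional[str]]) -> dict[str, str]:
    """Send every field, using "" for anything genuinely unavailable.

    Unknown values become empty strings rather than guesses or made-up
    defaults; extra fields the caller supplies are passed through.
    """
    payload = {field: "" for field in FIELDS}
    payload.update({name: value or "" for name, value in known.items()})
    return payload


example = complete_payload({
    "blog": "https://example.com",
    "user_ip": "203.0.113.7",
    "comment_author_url": None,  # e.g. the form has no URL field
})
print(example["comment_author_url"] == "", len(example))
```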

We recommend capturing a few of your API calls in order to make sure they really do contain the intended values. It’s quite common for a bug to cause the user_ip or user_agent value to be incorrect, for example – make sure they come from the remote browser that posted the comment, and not from your server.
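In a Python web application the commenter's details normally come from the incoming request. This sketch assumes a WSGI or CGI-style environ dictionary (REMOTE_ADDR, HTTP_USER_AGENT and HTTP_REFERER are standard keys there); the function names and the sanity check for captured calls are hypothetical.

```python
from typing import Mapping


def client_fields(environ: Mapping[str, str]) -> dict[str, str]:
    """Take user_ip, user_agent and referrer from the commenter's request.

    environ is the WSGI/CGI-style dict for the HTTP request that submitted
    the comment, not anything describing your own server.
    """
    return {
        # REMOTE_ADDR is the connecting peer. Behind a reverse proxy that is
        # the proxy itself; read the forwarded address your proxy sets
        # instead (and only if you trust that header).
        "user_ip": environ.get("REMOTE_ADDR", ""),
        "user_agent": environ.get("HTTP_USER_AGENT", ""),
        "referrer": environ.get("HTTP_REFERER", ""),
    }


def suspicious_values(fields: Mapping[str, str], server_ips: set[str]) -> list[str]:
    """Cheap check for captured calls: flag values that look like they
    describe your own infrastructure rather than the remote browser."""
    problems = []
    ip = fields.get("user_ip", "")
    if ip in server_ips or ip.startswith("127.") or ip == "::1":
        problems.append("user_ip looks like a server or loopback address")
    if not fields.get("user_agent"):
        problems.append("user_agent is empty")
    return problems


env = {"REMOTE_ADDR": "203.0.113.7", "HTTP_USER_AGENT": "Mozilla/5.0",
       "SERVER_ADDR": "10.0.0.5"}
fields = client_fields(env)
print(fields, suspicious_values(fields, {"10.0.0.5"}))
```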

Also make sure that the values in your submit-spam and submit-ham API calls match those in your comment-check API calls as closely as possible. In order to learn from its mistakes, Akismet needs to match your missed spam and false positive reports to the original comment-check API call, made when the comment was first posted. It’s normal for less information to be available for submit-spam/ham calls, since most comment systems and forums won’t store all metadata. But you should make sure that the values you do send match the originals. (A common bug is for clients to mistakenly send the moderator’s IP address or user agent instead of the comment poster’s when reporting a comment as spam.)
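One way to keep reports consistent is to store the original parameters alongside the comment and replay them later, instead of rebuilding them from the moderator's request. In this Python sketch the choice of fields, the JSON storage format and the function names are all assumptions.

```python
import json
from typing import Mapping

# Values to keep from the original comment-check call so a later report
# can repeat them. This selection is an assumption; keep more if you can.
REPLAY_FIELDS = ("blog", "user_ip", "user_agent", "referrer", "permalink",
                 "comment_type", "comment_author", "comment_author_email",
                 "comment_author_url", "comment_content")


def snapshot_for_feedback(check_params: Mapping[str, str]) -> str:
    """Serialize the poster's values at comment-check time for storage."""
    return json.dumps({k: check_params.get(k, "") for k in REPLAY_FIELDS})


def feedback_params(stored: str) -> dict[str, str]:
    """Rebuild submit-spam / submit-ham parameters from the stored snapshot.

    The moderator's current request is deliberately not consulted, so the
    IP address and user agent remain the original commenter's.
    """
    return json.loads(stored)


original = {"blog": "https://example.com", "user_ip": "203.0.113.7",
            "user_agent": "Mozilla/5.0", "comment_content": "Buy cheap shoes"}
stored = snapshot_for_feedback(original)  # e.g. saved as comment metadata
report = feedback_params(stored)          # later, when a moderator flags it
print(report["user_ip"], report["referrer"] == "")
```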

Finally, try to send unmodified data if you can. Most applications will transform content with formatting and markup. It’s best to send Akismet the original content, prior to formatting, if you can.
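The idea is to hand Akismet the text exactly as the user typed it, and format a separate copy for display. In this Python sketch render() is a hypothetical stand-in for whatever formatter your application uses.

```python
import html


def render(raw: str) -> str:
    """Hypothetical stand-in for your application's formatter: escapes HTML
    and wraps blank-line-separated paragraphs in <p> tags."""
    paragraphs = [p.strip() for p in raw.split("\n\n") if p.strip()]
    return "".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)


def prepare_comment(raw_comment: str) -> tuple[str, dict[str, str]]:
    """Return (HTML for display, fields for Akismet)."""
    akismet_fields = {"comment_content": raw_comment}  # unmodified, as typed
    display_html = render(raw_comment)                 # formatted copy only
    return display_html, akismet_fields


display, fields = prepare_comment("Nice post!\n\nVisit <b>my site</b>")
print(fields["comment_content"] == "Nice post!\n\nVisit <b>my site</b>")
print(display)
```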

Test with live comments

It’s important to test with a significant amount of real live data if you want to draw any conclusions about accuracy. We often hear from developers who have made a handful of API calls using imitation comments they’ve written themselves, and who aren’t seeing the results they expect. This is because Akismet works by comparing comments to genuine spam activity that is happening right now (and it does so based on more than just the content). An artificially constructed test spam comment probably won’t have much in common with real spam, so Akismet correctly returns a negative response.

The best way to measure Akismet’s accuracy is with a feed of live data from your production servers. Don’t act on its responses yet, just log the results to a file or store them in your metadata for analysis. Examine the results, or compare them with another filtering method, and decide if they are acceptable. Systematic errors usually indicate a data issue, so if you notice any oddities then please tell us – we can probably suggest ways of improving accuracy.
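A shadow run like this can be as simple as appending one line of JSON per comment. This Python sketch assumes a hypothetical check() wrapper around comment-check that returns True, False or None, and compares its verdict with whatever your current filter decided; all names are illustrative.

```python
import io
import json
import time
from typing import Callable, Optional, TextIO


def shadow_check(comment_id: str, check: Callable[[], Optional[bool]],
                 existing_verdict: Optional[bool], log: TextIO) -> None:
    """Record Akismet's verdict next to your current filter's, without acting.

    check is your own (hypothetical) comment-check wrapper returning True
    for spam, False for ham, or None on an error or missing response.
    """
    record = {"ts": time.time(), "id": comment_id,
              "akismet": check(), "existing": existing_verdict}
    log.write(json.dumps(record) + "\n")  # one JSON object per line


def disagreement_rate(lines: list[str]) -> float:
    """Fraction of comments, among those with both verdicts, where they differ."""
    records = [json.loads(line) for line in lines]
    both = [r for r in records
            if r["akismet"] is not None and r["existing"] is not None]
    if not both:
        return 0.0
    return sum(r["akismet"] != r["existing"] for r in both) / len(both)


buf = io.StringIO()  # a real deployment would append to a file instead
shadow_check("c1", lambda: True, True, buf)
shadow_check("c2", lambda: True, False, buf)
shadow_check("c3", lambda: None, False, buf)  # errors are excluded
print(disagreement_rate(buf.getvalue().splitlines()))  # 0.5
```

Comments where Akismet errored (None) are left out of the comparison, since an error is neither spam nor ham; reviewing the disagreements by hand is where systematic data problems tend to show up.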

Erroneous claims of vulnerabilities in the Akismet plugin

Recently we were alerted to several claims of security flaws in the Akismet 2.5.6 plugin for WordPress.

We tested the claims of vulnerabilities in the current version of the Akismet plugin, and found them to be baseless. There was a minor exploit possible in version 2.5.3, but this had already been fixed in a routine security audit in December 2011. That fix was included in the 2.5.4 release in January 2012, prior to the publication of the advisory.

Several of the claims refer to Akismet 2.5.6 running in WordPress 2.0, an incompatible combination – Akismet 2.5 requires WordPress 3.0 or higher.

There was a minor exploit possible in Akismet 2.4.0, which is the legacy branch maintained only for versions of WordPress 2.9 and earlier. This has been fixed in the 2.4.1 release.

In short, the claims of a vulnerability in 2.5.6 are false. They were published without any attempt to contact Akismet.com or Automattic. Any security alerts about the Akismet plugin should be made here.

Of course it’s always a good idea to keep WordPress and its plugins up to date. If you haven’t done so already, we recommend taking the time to update to WordPress 3.4 and the current version of the Akismet plugin.

Legacy plugin 2.4.1 is now available

Version 2.4.1 of the legacy Akismet plugin is now available. The 2.4 branch of Akismet is only for old versions of WordPress: WP 2.9 and earlier.

This is a security update. 2.4.1 fixes an XSS vulnerability.

Anyone still using an old version of WordPress should update to Akismet 2.4.1:

akismet-2.4.1.zip (svn)

Users of WordPress 3.0 and higher can ignore this release. Akismet 2.5.6 is the current plugin version for WordPress 3.x.

We’d like to remind all users of old versions of WordPress that the latest stable version includes many security updates and improvements to WordPress itself.

Pro Tip: tell us your comment_type

This is the first in an irregular series of tips for developers interacting with the Akismet API. Akismet is very heavily dependent on the quality of the data included in API calls. Whether you’re developing a custom implementation, or maintaining an Akismet extension for a CMS or forum application, we’d like to help you get the best results possible. Our API docs outline the basics. This series will expand on that with some simple suggestions for developers.

Our first recommendation:

Use an appropriate comment_type value.

Akismet works with almost any kind of user-submitted web content: blog comments, forum posts, blog posts, contact forms and so on. The characteristics of spam tend to vary across those types; comment spam is quite different from forum spam. So it’s important to give us some context by telling us what type of messages you’re asking Akismet to check. That’s what the comment_type value is for.

The API will accept an arbitrary string. It’s best if you use a meaningful symbolic name. We recommend the following values for common types of web-based content, which are mostly self-explanatory:

  • comment: For blog comment forms and replies to forum posts.
  • pingback, trackback: Pingbacks and trackbacks respectively.
  • forum-post: New top-level forum posts.
  • blog-post: Blog posts.
  • contact-form: Contact forms, inquiry forms and the like.
  • tweet: Twitter messages.

That’s not an exhaustive list. If you need to check messages that don’t fit one of those categories, it’s best to use a different comment_type value. It’s especially important not to default to comment for messages that are fundamentally different from blog comments – if you do that, you can expect to see mixed results. It’s better to be too specific than too ambiguous.
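Applications often have their own internal names for content types, so a small mapping is a convenient place to choose comment_type values. The internal kind names on the left and the fallback rule are assumptions for illustration; the values on the right are the ones recommended above.

```python
# Hypothetical internal content kinds mapped to the recommended comment_type
# values. The keys are assumptions about your own application.
COMMENT_TYPES = {
    "blog_comment": "comment",
    "forum_reply": "comment",
    "pingback": "pingback",
    "trackback": "trackback",
    "forum_topic": "forum-post",
    "blog_post": "blog-post",
    "contact": "contact-form",
    "tweet": "tweet",
}


def comment_type_for(kind: str) -> str:
    """Pick a specific comment_type; never silently fall back to "comment".

    The API accepts an arbitrary string, so unknown kinds get their own
    descriptive value, e.g. "product_review" -> "product-review".
    """
    if kind in COMMENT_TYPES:
        return COMMENT_TYPES[kind]
    return kind.strip().lower().replace("_", "-").replace(" ", "-")


for kind in ("forum_topic", "forum_reply", "product_review", "Wiki Edit"):
    print(kind, "->", comment_type_for(kind))
```

The fallback turns an unknown kind into its own descriptive value rather than defaulting to comment, in line with the advice to be too specific rather than too ambiguous.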

There’s no need to check with us first before using a different comment_type value – use your judgement and identify your messages as best you can. To help make sure we’re interpreting your types correctly, please drop us a line – we’d love to hear from you.

50 Billion Little Pieces

Akismet passed another milestone: we caught our 50 billionth piece of spam yesterday. TechCrunch has the details:

In April, Akismet blocked 1.8 billion spam messages, or 60 million pieces of spam per day, 2.5 million per hour, or 700 per second.

Whoa, that’s a lot of spam.

Akismet, those with long memories will recall, was the first product Automattic ever launched, arriving on October 25th, 2005 – a month before WordPress.com. WordPress sites now attract over 600 million unique visitors each month, according to Quantcast, and WordPress powers 1 in 2 blogs today (including yours truly). 50,000 to 100,000 new blogs launch on WordPress daily, giving spammers a seemingly never-ending network to target.

Of course Akismet runs on many more platforms than just WordPress, and is the standard anti-spam tool used by many of the most popular forum and CMS applications. Those 700 spams per second include not just comments, but forum and blog posts, pingbacks, trackbacks, tweets and more. (Ironically it doesn’t include the Facebook comments you’ll see on that TechCrunch post; Facebook has its own proprietary anti-spam system.)

About 92% of all the items checked by Akismet are spam. That varies considerably depending on the content type: less than half of the forum posts we check are spam, but more than 99.5% of all trackbacks are spam.