Quantcast’s Stochastic Statistics

I noticed that on some visits, Digg includes a JavaScript file hosted by Quantcast, which describes itself as “a new media measurement service that enables advertisers to view audience reports for millions of sites and services to build their brands with confidence.”

In other words, they track your usage of a site and try to correlate it with other users’ behavior. Even better: say that after reading the front page of Digg, I switch over to Maxim to check out its Today’s Girl section. Since Maxim also uses Quantcast, they now know that I read both Digg and Maxim, and this will presumably be shared with both sites so that they can get a better idea of their audience.
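The cross-site part works because the tracking cookie is issued by Quantcast’s server, not by Digg or Maxim, so the same browser presents the same ID on every site that embeds the script. Here is a toy sketch of what that makes possible: grouping a visit log by cookie ID, then scoring how much two sites’ audiences overlap. The log, the cookie IDs, and the choice of Jaccard similarity are my own assumptions for illustration, not Quantcast’s actual method.

```python
from collections import defaultdict

# Hypothetical tracker log: each page embedding the tracker reports
# (cookie_id, site). The cookie lives on the tracker's domain, so one
# browser carries the same ID across every participating site.
visits = [
    ("cookie-123", "digg.com"),
    ("cookie-123", "maxim.com"),
    ("cookie-456", "digg.com"),
    ("cookie-789", "maxim.com"),
]

def sites_by_visitor(log):
    """Group the set of sites each tracking cookie has been seen on."""
    seen = defaultdict(set)
    for cookie_id, site in log:
        seen[cookie_id].add(site)
    return dict(seen)

def audience_overlap(log, site_a, site_b):
    """Jaccard similarity of two sites' audiences, measured in cookie IDs."""
    audience = defaultdict(set)
    for cookie_id, site in log:
        audience[site].add(cookie_id)
    a, b = audience[site_a], audience[site_b]
    union = a | b
    return len(a & b) / len(union) if union else 0.0

print(audience_overlap(visits, "digg.com", "maxim.com"))  # prints 0.3333333333333333
```

One visitor out of three distinct cookies was seen on both sites, which is exactly the “Digg reader and Maxim reader” inference described above.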

To their credit, they seem to be fairly open about their methodology (see Interpreting Panel Based Analysis, About Our Estimates, and their blog prior to September 2007). I’m not thrilled with the fact that you can’t opt-out of all this, though.

Useless Data

If you visit Quantcast’s homepage, you can look up stats on virtually any site that springs to mind. In this regard it’s somewhat like Amazon’s Alexa, but there’s one difference: If you go to Alexa and ask for a list of sites similar to Digg, you get this one, which includes Slashdot and Boing Boing, along with some less sensible results (Vietnam Business Portal, anyone?).

Now go to Quantcast and do the same query. Granted, Alexa is suggesting related sites and Quantcast is finding sites with similar audiences, but here are some that it comes up with:

2. Hot-babes.name
This site reaches approximately 12,597 U.S. monthly uniques. The site appeals to a more affluent, more male crowd. The typical visitor reads thehollywoodgossip.com, subscribes to Playboy, and watches heavy.com.

5. ahotporn.com
This site reaches approximately 12,718 U.S. monthly uniques. The site appeals to a slightly more female than male, youth/child centric audience. The typical visitor subscribes to Playboy, reads thehollywoodgossip.com, and visits gamespy.com.

9. vidshadow.com
This site reaches over 393K monthly uniques, of which 124K (32%) are in the U.S. The site caters to a more youthful audience. The typical visitor reads anncoulter.com and visits freeweblayouts.com.

So according to Quantcast, Digg users are rich, porn-loving children who have gender identity issues and obsess over celebrities like Ann Coulter. (As an aside, the 7th result is a Romanian site named “RaPeRbOy,” though it’s just some kid’s blog. I can’t find a working Romanian-to-English translator.)

I should reiterate that Digg is “Quantified,” meaning it actively runs Quantcast’s tracking code on its pages. Compare the results with those for Slashdot or RPI: they are far more reasonable, and neither site uses the tracking code.

Finally, their demographics info for Digg is amusing, as it claims that no visitors are under 18.

Oh Yeah, JavaScript

If you’re curious about what that JavaScript is doing, it shouldn’t be too hard to decipher. First, the server issues a tracking cookie when you load the file (unless you already had one):

The server also dynamically inserts this identifier into the script as the value of the dc variable. When the code executes, it gathers some information about the session (e.g., tracking identifier, time, URL, referrer URL, display resolution and color depth) and packs it into a URL pointing to an image file (a 1-by-1 pixel GIF called a “measurement pixel”) on one of Quantcast’s servers. This URL is inserted into the src attribute of an img tag that is inserted into the document. The browser subsequently requests the file and in doing so sends the information to Quantcast.
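The flow above can be sketched in a few lines. This is a Python model of the two halves (the real client side is JavaScript running in the browser), and the cookie name `tracking_id`, the query parameter names, and the pixel host are all hypothetical; only the `dc` variable and the list of collected fields come from the description above.

```python
import secrets
from urllib.parse import urlencode

PIXEL_ENDPOINT = "https://pixel.example.com/pixel.gif"  # hypothetical host/path

def serve_script(request_cookies: dict) -> tuple:
    """Server side: reuse the visitor's tracking ID, or mint a new one.

    Returns the ID (to be sent back in a Set-Cookie header) and the script
    text with the ID spliced in as the value of `dc`.
    """
    tracking_id = request_cookies.get("tracking_id") or secrets.token_hex(8)
    script = f'var dc = "{tracking_id}";\n/* measurement code follows */\n'
    return tracking_id, script

def pixel_url(dc, timestamp, page_url, referrer, width, height, color_depth):
    """Client-side logic, modeled in Python: pack session details into the
    query string of a 1x1 GIF URL. Loading that image sends them along."""
    params = {
        "dc": dc,
        "t": timestamp,
        "url": page_url,
        "ref": referrer,
        "res": f"{width}x{height}",
        "cd": color_depth,
    }
    return f"{PIXEL_ENDPOINT}?{urlencode(params)}"
```

In the browser, the equivalent of the last step is creating an `img` element whose `src` is that URL; the GIF itself is irrelevant, since the request is the payload.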

· · ·

How to Close a Swiss Army Knife

I was looking through the manual for my new sapphire Victorinox Cybertool 34 and found this to be enlightening:

Danger Zone

It’s a pretty neat tool, though it is a bit much to carry around in my pocket. Luckily, my Wenger Synergy backpack has a special pocket specifically for Swiss Army knives (but I feel like a traitor putting a Victorinox product in there).

The coolest part of all, perhaps, is that it has a T10 Torx bit. Around this time last year, I had a hellish time locating such a bit (thanks in part to misinformation on the Web) so that I could install a new hard drive in my iMac G5.

· · ·

Flaming: A White Paper

I was reading a worst-case scenario guide on how to expunge a nasty e-mail and got to the part where it mentions automatic flame detection:

One e-mail program offers a “Mood Watch” function that monitors your typing and alerts you if a message is approaching “flame” status.

I vaguely recalled this as a feature of an e-mail client I used, so I did some Googling and found that MoodWatch is a feature of the now open source Eudora client, which I used until switching to Mac OS X.

As it turns out, they’ve got a white paper on the algorithm. Essentially, they dug through alt.flame and categorized “words and phrases that are commonly considered offensive, dictatorial, aggressive, insulting and rude.” The authors stop just short of proposing a Bayesian flame filter (a spam filter, but for flames), though I imagine that’s how it’s implemented in Eudora (I didn’t trudge through the source).
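For the curious, this is the posterior a naive Bayes flame filter would compute for a message containing the words w_1 through w_n. It is the textbook naive Bayes formula, not something taken from the white paper or Eudora’s source, and it assumes each word is independent of the others given the class:

```latex
P(\text{flame} \mid w_1, \dots, w_n) =
  \frac{P(\text{flame}) \prod_{i=1}^{n} P(w_i \mid \text{flame})}
       {P(\text{flame}) \prod_{i=1}^{n} P(w_i \mid \text{flame})
        + P(\neg\text{flame}) \prod_{i=1}^{n} P(w_i \mid \neg\text{flame})}
```

A message would be flagged once this probability crosses some threshold; where to set it is a tuning choice, not a value from the paper.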

It does get me wondering, though, why we couldn’t use Bayes’ theorem to sort all kinds of e-mail. We don’t have to make a black-and-white distinction between spam and non-spam. To be sure, 419 frauds, phishing e-mails, and unsolicited stock advice are spam through and through. But that Amazon.com sale might be useful to me; I just don’t want it in my inbox.
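Extending the two-way spam filter to several folders is mostly a matter of keeping word counts per category and picking the category with the highest posterior. Below is a minimal multinomial naive Bayes sketch with add-one smoothing; the category names and training messages are made up, and a real filter would need proper tokenization and far more data.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesSorter:
    """Multinomial naive Bayes over word counts, with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word frequencies
        self.doc_counts = Counter()              # category -> number of messages
        self.vocab = set()

    @staticmethod
    def tokenize(text):
        return text.lower().split()

    def train(self, text, category):
        words = self.tokenize(text)
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocab.update(words)

    def classify(self, text):
        """Return the category with the highest log posterior."""
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            # log P(category) + sum of log P(word | category)
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[category].values())
            for word in self.tokenize(text):
                count = self.word_counts[category][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best, best_score = category, score
        return best

sorter = NaiveBayesSorter()
sorter.train("wire transfer from nigerian prince urgent", "spam")
sorter.train("verify your bank password now", "spam")
sorter.train("amazon sale discount on books today", "promotions")
sorter.train("lunch tomorrow with the team", "inbox")
sorter.train("meeting notes from today", "inbox")
print(sorter.classify("big discount sale on books"))  # prints promotions
```

Working in log space avoids the underflow you would get from multiplying many small probabilities together.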

· · ·

Dodd drops out, I go directory digging

After presidential candidate Chris Dodd came out of the Iowa caucus with approximately 0% of the votes, he abandoned his bid for the presidency.

His website, chrisdodd.com, is now displaying the following banner:

Dodd A

The URL for the above is http://chrisdodd.com/i/10wa/13A.jpg; note the cleverly obfuscated “iowa” directory name. The 13 presumably refers to the date: 1/3, or January 3rd. What happens if we change that 13A to a 13B, though?
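This kind of directory digging is easy to automate: take the one URL you know about and generate its siblings by swapping the trailing letter. The helper below is hypothetical; it only builds the candidate list, and you would still have to request each one and see which come back with something other than a 404.

```python
from string import ascii_uppercase

def sibling_urls(base: str, stem: str, ext: str = ".jpg", letters: str = ascii_uppercase):
    """Hypothetical helper: build candidate URLs like 13A.jpg, 13B.jpg, ...
    by appending each letter in `letters` to `stem`."""
    return [f"{base}{stem}{letter}{ext}" for letter in letters]

candidates = sibling_urls("http://chrisdodd.com/i/10wa/", "13", letters="ABC")
# ['http://chrisdodd.com/i/10wa/13A.jpg',
#  'http://chrisdodd.com/i/10wa/13B.jpg',
#  'http://chrisdodd.com/i/10wa/13C.jpg']
```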

Dodd B

Looks like Dodd had hope yet, but that was clearly “Plan B.” There’s also a third image, 13C, though it just reads “Thank You” and has a donation link.

Time will tell

If we ask Dodd’s web server for the last-modified timestamps of those files, we get:

13A.jpg  Thu, 03 Jan 2008 8:50:20 PM CST
13B.jpg  Thu, 03 Jan 2008 4:49:18 PM CST
13C.jpg  Thu, 03 Jan 2008 7:13:24 PM CST
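One way to get timestamps like these (I’m assuming this approach; the post doesn’t say how it was done) is an HTTP HEAD request, which returns the headers without the image body. Servers report `Last-Modified` in GMT, so the sketch below also converts it to CST (UTC-6; no daylight saving applies in January) in the format used above.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from urllib.request import Request, urlopen

CST = timezone(timedelta(hours=-6), "CST")  # fixed offset, fine for January

def last_modified(url: str) -> datetime:
    """Fetch only the headers (HEAD) and parse the Last-Modified date."""
    with urlopen(Request(url, method="HEAD")) as response:
        header = response.headers.get("Last-Modified")
        if header is None:
            raise ValueError(f"{url} did not report a Last-Modified time")
        return parsedate_to_datetime(header)

def to_cst(http_date: str) -> str:
    """Convert an HTTP date (given in GMT) to e.g. 'Thu, 03 Jan 2008 8:50:20 PM CST'."""
    local = parsedate_to_datetime(http_date).astimezone(CST)
    hour = local.hour % 12 or 12  # 12-hour clock without a leading zero
    return f"{local:%a, %d %b %Y} {hour}:{local:%M:%S %p} CST"

print(to_cst("Fri, 04 Jan 2008 02:50:20 GMT"))  # prints Thu, 03 Jan 2008 8:50:20 PM CST
```

Note that a file modified at 8:50 PM Central is already dated the next day in GMT, which is worth keeping in mind when lining these up against the caucus timeline.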

Come up with whatever theories you will, but I’ll point out that the caucus began at 7:00 PM and the earliest mention of his dropping out was at 9:37 PM.