I've accepted an answer, but sadly, I believe we're stuck with our original worst case scenario: CAPTCHA everyone on purchase attempts of the crap. Short explanation: caching / web farms make it impossible to track hits, and any workaround (sending a non-cached web-beacon, writing to a unified table, etc.) slows the site down worse than the bots would. There is likely some pricey hardware from Cisco or the like that can help at a high level, but it's hard to justify the cost if CAPTCHA-ing everyone is an alternative. I'll attempt a more full explanation later, as well as cleaning this up for future searchers (though others are welcome to try, as it's community wiki).
Situation
This is about the bag o' crap sales on woot.com. I'm the president of Woot Workshop, the subsidiary of Woot that does the design, writes the product descriptions, podcasts, blog posts, and moderates the forums. I work with CSS/HTML and am only barely familiar with other technologies. I work closely with the developers and have talked through all of the answers here (and many other ideas we've had).
Usability is a massive part of my job, and making the site exciting and fun is most of the rest of it. That's where the three goals below derive. CAPTCHA harms usability, and bots steal the fun and excitement out of our crap sales.
Bots are slamming our front page tens of times a second screen scraping (and/or scanning our RSS) for the Random Crap sale. The moment they see that, it triggers a second stage of the program that logs in, clicks I want One, fills out the form, and buys the crap.
Evaluation
lc: On stackoverflow and other sites that use this method, they're almost always dealing with authenticated (logged in) users, because the task being attempted requires that.
On Woot, anonymous (non-logged) users can view our home page. In other words, the slamming bots can be non-authenticated (and essentially non-trackable except by IP address).
So we're back to scanning for IPs, which a) is fairly useless in this age of cloud networking and spambot zombies and b) catches too many innocents given the number of businesses that come from one IP address (not to mention the issues with non-static IP ISPs and potential performance hits to trying to track this).
Oh, and having people call us would be the worst possible scenario. Can we have them call you?
BradC: Ned Batchelder's methods look pretty cool, but they're pretty firmly designed to defeat bots built for a network of sites. Our problem is bots are built specifically to defeat our site. Some of these methods could likely work for a short time until the scripters evolved their bots to ignore the honeypot, screen-scrape for nearby label names instead of form ids, and use a javascript-capable browser control.
lc again: "Unless, of course, the hype is part of your marketing scheme." Yes, it definitely is. The surprise of when the item appears, as well as the excitement if you manage to get one is probably as much or more important than the crap you actually end up getting. Anything that eliminates first-come/first-serve is detrimental to the thrill of 'winning' the crap.
novatrust: And I, for one, welcome our new bot overlords. We actually do offer RSSfeeds to allow 3rd party apps to scan our site for product info, but not ahead of the main site HTML. If I'm interpreting it right, your solution does help goal 2 (performance issues) by completely sacrificing goal 1, and just resigning the fact that bots will be buying most of the crap. I up-voted your response, because your last paragraph pessimism feels accurate to me. There seems to be no silver bullet here.
The rest of the responses generally rely on IP tracking, which, again, seems to both be useless (with botnets/zombies/cloud networking) and detrimental (catching many innocents who come from same-IP destinations).
Any other approaches / ideas? My developers keep saying "let's just do CAPTCHA" but I'm hoping there's less intrusive methods to all actual humans wanting some of our crap.
Original question
Say you're selling something cheap that has a very high perceived value, and you have a very limited amount. No one knows exactly when you will sell this item. And over a million people regularly come by to see what you're selling.
You end up with scripters and bots attempting to programmatically [a] figure out when you're selling said item, and [b] make sure they're among the first to buy it. This sucks for two reasons:
- Your site is slammed by non-humans, slowing everything down for everyone.
- The scripters end up 'winning' the product, causing the regulars to feel cheated.
A seemingly obvious solution is to create some hoops for your users to jump through before placing their order, but there are at least three problems with this:
- The user experience sucks for humans, as they have to decipher CAPTCHA, pick out the cat, or solve a math problem.
- If the perceived benefit is high enough, and the crowd large enough, some group will find their way around any tweak, leading to an arms race. (This is especially true the simpler the tweak is; hidden 'comments' form, re-arranging the form elements, mis-labeling them, hidden 'gotcha' text all will work once and then need to be changed to fight targeting this specific form.)
- Even if the scripters can't 'solve' your tweak it doesn't prevent them from slamming your front page, and then sounding an alarm for the scripter to fill out the order, manually. Given they get the advantage from solving [a], they will likely still win [b] since they'll be the first humans reaching the order page. Additionally, 1. still happens, causing server errors and a decreased performance for everyone.
Another solution is to watch for IPs hitting too often, block them from the firewall, or otherwise prevent them from ordering. This could solve 2. and prevent [b] but the performance hit from scanning for IPs is massive and would likely cause more problems like 1. than the scripters were causing on their own. Additionally, the possibility of cloud networking and spambot zombies makes IP checking fairly useless.
A third idea, forcing the order form to be loaded for some time (say, half a second) would potentially slow the progress of the speedy orders, but again, the scripters would still be the first people in, at any speed not detrimental to actual users.
Goals
- Sell the item to non-scripting humans.
- Keep the site running at a speed not slowed by bots.
- Don't hassle the 'normal' users with any tasks to complete to prove they're human.