Epilogue


Posted by Si Dawson on 29/03/13 in Administration

In the two weeks since I announced the shutdown of Twit Cleaner, the following has happened:

I’ve personally responded to:

  • Somewhere between 3,000 and 4,000 Twitter messages (I figure, if people cared enough to comment, they deserved a response)
  • 200-ish blog comments
  • 50+ emails
  • a stack of meetings and proposals of various seriousness/ridiculousness (“You should write Angry Birds, but on a zipline! It’d be easy! We’ll make millions!”)

On top of that, I had offers from two different “competitors” (of sorts) to buy, variously: the domain, the traffic, the email list (ie, all your email addresses), and the Twitter account. One offered a trivial amount, in exchange for me spamming you all, three times a day for three months.

Uhh. Yeah, really.

Needless to say, I told them (politely) to bite the wax tadpole.

Possible Replacement Services

Unfortunately, despite looking, I haven’t found any other services that I can, in good conscience, point you to as a reasonable replacement.

There are a dozen services that do the basics – who’s not following you back, who’s left Twitter (although be warned here – Twitter gives false information; a lot of these services tell you WAY more people have left than Twit Cleaner would have, because I very carefully checked this data several times, to compensate for Twitter’s unreliability).

There are no services that go to the depth that Twit Cleaner did.

Why not? Simply because doing that requires an absolute ton of data. Getting that data is no longer possible in a reasonable timeframe. Thus, any service that could have done this would have had to shut down, same as I did.

The Source Code

Regarding the code-base. I’ve had several people suggest (*cough*demand*cough*) I open-source the whole lot, “so people could run their own reports” (aka “You’ve spent 10,000 hours writing this, give it to me for free!”).

Unfortunately, this simply isn’t practical. Twit Cleaner consists of several hundred database tables (many of which require specific maintenance processes to keep them operational and performant), and a dozen different processes all carefully balanced and interacting.

Additionally, the problems I had getting data from Twitter are actually worse if you only have access to a single account (because a lot of what I did was only possible by spreading data requests across tens if not hundreds of accounts – but shh, don’t tell Twitter that. It may have been a little naughty).

In a nutshell, the core reason Twit Cleaner has closed is that data access has dropped by a factor of somewhere between 111 and 333.

I do plan to roll my core Twitter library changes back into python-twitter (mainly more robust error handling code), if the maintainers want them. I’ll also be writing a post or two about some of the more subtle maintenance techniques I had to figure out to keep Twit Cleaner running. I’m sure they won’t be a surprise to the MySQL gurus out there, but other beginners might find them helpful.

What’s Left?

It’s nearing the end of the month, so the server will be shut down this weekend (otherwise I have to pay server fees for another month). I’ll be moving the blog over to another server (same domain, different hardware, different IP address), but all Twit Cleaner related services will cease once that happens. Which means you only have a couple of days left to do any last-minute cleaning, if you still have a report sitting around.

I will keep blogging here. There are still a few things worth saying in this space. Plus, of course, I’ll still be keeping you alerted to trials and tribulations in Twitter land, over at @TwitCleaner.

Twitter’s first major update to its API went live today.

Here’s how it impacts Twit Cleaner.

In short? We can no longer support large users, where “large” currently equals about 50k friends (those you’re following), but may drop further if I find the system is still struggling.

Why is this happening?

Well, as you know, Twit Cleaner does a lot of analysis. This requires a LOT of data.

Twitter puts limits on how much data-per-hour you can get from it. Specifically, how many things you can ask for (“requests”).

Previously, we were able to make 20,000 requests per hour – and that included any kind of request (eg, getting your profile image, your user information, your tweets, your friends, who you’ve talked to, etc).

Now, each type of request is broken into its own tiny little bucket, and each bucket is very, very small. Most of them are limited to one request per minute.
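To make the difference concrete, here is a minimal Python sketch of per-bucket limiting. It is not Twitter's implementation or Twit Cleaner's code: the class name, the bucket names, and the one-request-per-minute window applied to every bucket are all assumptions taken from the description above.

```python
import time
from collections import defaultdict, deque


class BucketLimiter:
    """Hypothetical limiter: each request type gets its own small allowance."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit              # requests allowed per window, per bucket
        self.window = window_seconds    # length of the window in seconds
        self.clock = clock              # injectable clock, handy for testing
        self.calls = defaultdict(deque) # bucket name -> timestamps of recent calls

    def try_acquire(self, bucket):
        """Return True if a request in this bucket may be sent right now."""
        now = self.clock()
        recent = self.calls[bucket]
        # Forget calls that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True


# Assumed figures: one request per minute in every bucket.
limiter = BucketLimiter(limit=1, window_seconds=60)
print(limiter.try_acquire("tweets"))     # first tweets request is allowed
print(limiter.try_acquire("tweets"))     # second one within the minute is refused
print(limiter.try_acquire("user_info"))  # a different bucket has its own allowance
```

Under the old scheme there was effectively a single shared bucket, so unused allowance for one kind of request could be spent on another. With separate buckets, the slowest bucket you depend on sets the pace.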

Twit Cleaner depends pretty heavily on analysing the tweets that someone has made (not just the easy, fast basic info you can get – bio, profile image, following/follower counts etc). This is why our reports take so much longer than any other unfollowing/analysis service out there. We dig deeper so we can (I like to think) add more value.

In order to do this, we have to make 1.01 (sometimes 2.01) requests (plus a little overhead) per person you’re following.

You follow 5k people? We have to make 5,050 requests to Twitter. 5000 of one kind (tweets), 50 of the other (user info).

You follow 100k people? We have to make 101,000 requests.
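The request counts above can be reproduced with a few lines of Python. This is only a sketch: the batch size of 100 users per user-info request is an assumption inferred from the 5,000-to-50 ratio, and the function and parameter names are made up for illustration.

```python
import math


def requests_needed(following, tweet_calls_per_user=1, users_per_lookup=100):
    """Estimate how many API requests one report needs.

    following            -- number of accounts the user follows
    tweet_calls_per_user -- 1 normally, 2 for the "sometimes 2.01" case
    users_per_lookup     -- assumed batch size for user-info requests
    """
    tweet_requests = following * tweet_calls_per_user
    user_info_requests = math.ceil(following / users_per_lookup)
    return tweet_requests + user_info_requests


print(requests_needed(5_000))    # 5,000 tweet requests + 50 user-info requests
print(requests_needed(100_000))  # 100,000 tweet requests + 1,000 user-info requests
```

The small per-report overhead mentioned above is left out, since no figure is given for it.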

At 20,000 requests per hour, a 100k report like that took about five hours, although with a bit of clever jiggery pokery (caching like crazy) we could usually cut that in half, ie, operating at about the maximum speed Twitter could send data to us anyway.

At 60 requests per hour (and assuming we’re similarly clever), doing a report for someone following 100k people would now take 101,000/60/2 = roughly 840 hours, or about five weeks. Now, we can definitely get that timeframe down a bit, but it requires an enormous drain on the server, and it’s still going to be slow as heck. Oh, plus it’ll slow things down for everyone else (despite my best efforts).
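For anyone who wants to check the sums, here is the same calculation as a small Python sketch. The function names are invented for illustration, and the factor of two for caching is the rough "cut that in half" figure from above, not a measured value.

```python
def report_hours(total_requests, requests_per_hour, cache_speedup=2.0):
    """Hours to fetch a report, given a rate limit and an assumed caching gain."""
    return total_requests / requests_per_hour / cache_speedup


def report_days(total_requests, requests_per_hour, cache_speedup=2.0):
    """Same estimate, expressed in days."""
    return report_hours(total_requests, requests_per_hour, cache_speedup) / 24


REQUESTS_FOR_100K = 101_000  # figure quoted above for someone following 100k people

old_hours = report_hours(REQUESTS_FOR_100K, 20_000)  # old limit: 20,000 requests/hour
new_hours = report_hours(REQUESTS_FOR_100K, 60)      # new limit: 60 requests/hour
new_days = report_days(REQUESTS_FOR_100K, 60)

print(f"old limit: {old_hours:.1f} hours")
print(f"new limit: {new_hours:.0f} hours ({new_days:.0f} days)")
```

With those figures the old limit gives about two and a half hours, and the new one gives about 842 hours, which is roughly 35 days.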

I don’t like slow. I really don’t like slow.

So, much as I hate to do this (and it sucks, it really does), I’m going to have to stop supporting large users. Why should you be penalised, just because you’re popular/successful/cute? I don’t know. Maybe best to ask Twitter that.

The good news is, most accounts aren’t large. The average following count is 500. So, most of you guys will be zippier than ever (not being slowed down by other people hogging server resources).

Since one 100k account uses 200 times as many resources (cpu/bandwidth/database contention/grey hair) as your average account, this is probably one of those “needs of the many outweigh the needs of the few” things. Most people will be better off, but I’m still sad not to be able to help larger users more.

Oh, and I’ve also had to shut down the Retweets section of the website (which was super-cute, I thought). The new version of the API just doesn’t support it any more.

Kind of a sad day.