Performance, Performance, Performance


Posted by Si Dawson on 07/02/12 in Improvements

It bothers me when you have to wait to get your reports.

Obviously there are some physical limits - how fast can we get data from Twitter, how many other people want reports at the same time, how much data do we need to analyse and so on. The answer to the last one can be A LOT, by the way. For every 100k people you follow, we need to ask Twitter for roughly 3 gigs of data.

There have been particular problems when super large users (100k+) have asked for reports. If Twitter was being particularly grumpy, this used to create enough of a backlog that it could slow down everyone's reports for an entire day. Blergh!

If you're only following a few hundred people, why should you have to wait just because some giant user requested a report right before you?

So, over the last few months I've been sitting here, all day every day, watching the systems on a second by second basis. I've identified roughly a dozen bottlenecks, some small, some huge.

The good news is, I've managed to bend the underlying technology in a few ways that are theoretically impossible (according to their documentation) to get around these limitations. I'm justifiably pleased.

You will have already seen some of the benefits as I've been gradually adjusting things on the live system. Most of the big improvements I've rolled out this evening.

Oh, I've also taken the opportunity to roll out a few big structural changes to allow me to offer significant new functionality. What? Ahh, you'll just have to wait and see!

Yes, I'm super pleased with progress. Yes, I'm even more excited about what's to come!

A few new sub categories


Posted by Si Dawson on 19/05/11 in Improvements

I've added some new sub-categories to the reports.

App Spam

Accounts where more than 50% of the tweets are auto-generated by an application. Examples of this include paper.li, 4sq, blip.fm, RunKeeper, miso, etc.
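
As a rough illustration only (this is not the code behind the reports), the 50% rule could be sketched in Python like this. The function name, the input shape (one source name per tweet) and the contents of the source list are all assumptions made for the example.

```python
# Hypothetical list of auto-posting apps, taken from the examples named above.
AUTO_POSTING_SOURCES = {"paper.li", "4sq", "blip.fm", "runkeeper", "miso"}


def is_app_spam(tweet_sources, threshold=0.5):
    """Return True if more than `threshold` of the tweets came from an app.

    tweet_sources: one source name per tweet (an assumed input shape).
    """
    if not tweet_sources:
        return False  # no tweets, nothing to judge
    auto = sum(1 for s in tweet_sources if s.lower() in AUTO_POSTING_SOURCES)
    return auto / len(tweet_sources) > threshold
```

Note the strict "more than": an account sitting at exactly 50% would not be flagged under this sketch.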

Uses Advertising Networks

Accounts that are being paid to put tweets into their stream. Advertising networks include organisations such as MyLikes, ad.ly, Magpie, Sponsored Tweets, etc.

Self Obsessed

Accounts where more than 50% of the tweets are about themselves. So, either starting their tweets with "I", "Ive", "I'm", "I'd" or retweeting things people have said about them.
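
Here's a minimal sketch of how a check like this might look in Python. It is not the real implementation: the function names, the punctuation handling and the "RT @someone: ..." retweet convention are assumptions for illustration.

```python
# First words that mark a tweet as being about the author (from the list above).
SELF_WORDS = {"i", "ive", "i'm", "i'd"}
TRAILING_PUNCT = ",.!?:"


def is_about_self(tweet, screen_name):
    """Return True if a single tweet looks like it is about its author."""
    words = tweet.split()
    if not words:
        return False
    first = words[0].lower().rstrip(TRAILING_PUNCT)
    if first in SELF_WORDS:
        return True
    # Assumed retweet convention: "RT @someone: ... @screen_name ...".
    # words[1] is the retweeted author, so only the rest is checked.
    mention = "@" + screen_name.lower()
    rest = (w.lower().rstrip(TRAILING_PUNCT) for w in words[2:])
    return first == "rt" and mention in rest


def is_self_obsessed(tweets, screen_name, threshold=0.5):
    """Return True if more than `threshold` of the tweets are about the author."""
    if not tweets:
        return False
    about_self = sum(1 for t in tweets if is_about_self(t, screen_name))
    return about_self / len(tweets) > threshold
```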

Relatively Unpopular

People who follow more than 3.3 times as many people as follow them. Or, put another way, those who have fewer than 30% as many followers as people they follow. Sometimes this is just because someone likes to follow lots of people, but occasionally it can also show someone (e.g. spam bots) whose behaviour is off-putting or boring, so very few people choose to follow them back.
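
The two ways of stating the rule line up because 1 / 3.3 is roughly 0.30. A minimal Python sketch of the check, with a made-up function name and parameters (the real report code isn't shown here):

```python
def is_relatively_unpopular(following, followers, ratio=3.3):
    """Return True if the account follows more than `ratio` times its followers.

    Multiplying instead of dividing avoids a division by zero
    when an account has no followers at all.
    """
    return following > ratio * followers
```

For example, someone following 400 people with 100 followers would be flagged (400 is more than 330), while someone following 200 with 100 followers would not.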

The Self Obsessed and Relatively Unpopular sub-categories go into a new category, called "Not Very Interesting." It will appear at the bottom of your reports.

As always, this is informational. It's always up to you who you want to follow.

Personally, I follow (& will always follow) a whole bunch of people that appear on my report. Why? Because they're entertaining, informative and I like them. Free will - it's not an accident!