Identifying high volume tweeters is the part of the reports that has taken the longest to really settle down.

I’m getting much happier with this latest (the 3rd? 4th?) incarnation.

If you remember, the previous version simply identified anyone that tweeted on average more than 50 times in a day. That was much more reliable than earlier versions, but suffered from one major limitation. There’s a huge difference between someone who tweets “hello world!” in the morning, then uses the other 49 tweets to chat to their friends, vs someone that just blasts out junk 50 times a day.

The difference is – how many of those tweets are public?

Why are high volume tweeters even a problem? Well, this is something that people tend to forget once they start following more than a few thousand people. When you’re following that many people, there are so many tweets flying past it’s mostly a blur. So if you tweet like crazy, who cares? It gets lost in the blur, right?

What is forgotten is this: of the active users on Twitter, most only follow 200-400 others. One high volume user can flood an entire tweetstream, making it impossible to connect with anyone else.

With that in mind, the new high volume algorithm works like this. If you tweet publicly (ie, anything other than a reply) more than 24 times a day, you’re listed as high volume. This isn’t completely accurate, since if you and a high volume tweeter have friends in common, you’ll see their replies to those friends too, but it’s a good estimate.
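For the curious, here is a rough sketch of that rule in Python. It is only an illustration of the idea, not the actual Twit Cleaner code: the `Tweet` class, its `is_reply` flag and the function names are all made up for the example, and it assumes the count is averaged over however many days you looked at.

```python
from dataclasses import dataclass
from typing import Iterable

# Threshold taken from the post: more than 24 public tweets a day.
PUBLIC_TWEETS_PER_DAY_LIMIT = 24


@dataclass
class Tweet:
    """Hypothetical tweet record; is_reply marks an @reply."""
    text: str
    is_reply: bool


def public_tweets_per_day(tweets: Iterable[Tweet], days_observed: float) -> float:
    """Average number of non-reply tweets per day over the observed window."""
    if days_observed <= 0:
        raise ValueError("days_observed must be positive")
    public_count = sum(1 for tweet in tweets if not tweet.is_reply)
    return public_count / days_observed


def is_high_volume(tweets: Iterable[Tweet], days_observed: float) -> bool:
    """True when the public (non-reply) rate is above the limit."""
    return public_tweets_per_day(tweets, days_observed) > PUBLIC_TWEETS_PER_DAY_LIMIT
```

So someone who says “hello world!” and then sends 49 replies in a day is fine, while someone who blasts out 50 public tweets gets listed.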

Once an hour may not sound like a lot, but once you factor in work, sleep, play – oh, and the fact that this is only public tweets, it’s an absolute ton. So talk, talk away! Just connect, make some friends! Don’t blather on about yourself all day :)

I’ve added a new category to the bottom of the reports, “Little Original Content.”

This covers two areas:

People who retweet 70% or more of the time

Of course, some people do find the best stuff out there, but in general, if someone is only ever RTing things by other people – why not just follow the other person? This is also something that is done a lot by spam bots, to make them appear ‘more human.’

People who post quotes more than 50% of the time

As with retweeting, spam bots often intersperse their crap with quotes. It’s a zero effort way for them to have ‘fresh’ content. In reality though, if they’re quoting Epicurus, this probably isn’t something you need to be getting second-by-second Twitter updates on. The guy’s been dead 2300 years!
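The two thresholds above boil down to something like the following sketch. It assumes the retweets and quotes have already been counted (how a tweet gets recognised as a retweet or a quote is not covered here), and the function name and return labels are invented for the example.

```python
from typing import Optional


def little_original_content(total: int, retweets: int, quotes: int) -> Optional[str]:
    """Return which rule (if any) an account trips, given simple tweet counts.

    Thresholds are the ones from the post: retweets 70% or more of the time,
    quotes more than 50% of the time. Integer arithmetic avoids float rounding.
    """
    if total <= 0:
        return None  # nothing to judge
    if retweets * 100 >= 70 * total:
        return "mostly retweets"
    if quotes * 100 > 50 * total:
        return "mostly quotes"
    return None
```

Note the slightly different comparisons: exactly 70% retweets is enough to be listed, but exactly 50% quotes is not.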

That said, as with everything on the reports, there will always be those you choose to follow that fit into the above categories (eg, I follow a couple of accounts that post nothing but quotes). Just click their icons & they’ll be saved.

If you don’t want to unfollow any of them, simply click the headings, & the entire category will be saved. As easy as ever!

No More Checkbox

Posted by Si on 23/02/10 in Improvements

The primary complaint I’ve had about the site, from about one in every two hundred users, is that people didn’t see the checkbox on the front page. Which one? The one that said “Tell your followers”:

If it was checked, a tweet went out on your behalf; if not, no tweet.

I’ve had an improvement waiting for a new version of the site before I rolled it out, but yesterday I was goaded by some Brazilians into pushing it out sooner instead.

So, the front page now looks like this:

(Well that’s cleaner & simpler, isn’t it?)

And when you click that, you now get a choice (rather than defaulting to on like it used to):

If there’s a way to make that any clearer, I’m not sure what it is :) And more happy people? Well, that’s always a good thing.

I’ve been keeping an eye on current best practices on Twitter (of course), & it appears Twitter is cracking down on bulk unfollow. This was an informative article, & an eye opening tweet.

Specifically, they want to avoid churning – that is, following a ton of people, then unfollowing those that don’t follow back, repeat ad nauseam. I suspect part of their motivation is to limit spammy behaviour, & part is that it’s a massive drain on their servers (getting hit with thousands of API requests in a short period of time).

I realise that with The Twit Cleaner, we’re walking a fine line. However, my priority is to keep your accounts safe & operate within Twitter’s guidelines, while improving the quality of experience for everyone.

In short: I want to improve the Twitter experience as much as I possibly can – but without pissing Twitter off (or causing them any hassle) in the process. I’ve been very careful to try & ensure that the service is of the least possible use to those I’m trying to rid Twitter of – those engaging in churn or other spammy practices. That is, there are a lot of things I could have put in, but deliberately haven’t, because of the possibility of abuse.

Of course if Twitter says jump, the only appropriate response is “how high”, but I believe we’re safe because:

  1. You only ever have the option to unfollow people that are bad Twitter citizens in the first place – typically a very small percentage of anyone’s account
  2. We do the unfollowing very, very slowly (only one every few seconds) to limit drain on Twitter’s servers
  3. We never unfollow more than a small percent of your account per day, no matter how many you request.

To this end, I have slowed the unfollow down even further than before. It will now not unfollow more than 5% (down from 20%, then 10%) of your friend count per day, or 500, whichever is smaller, as well as spacing each unfollow out much, much more slowly.
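As a worked example, the daily limit can be sketched like this. It is an illustration only: it assumes 5% is the figure currently in force, rounds the percentage down (the rounding isn’t specified anywhere), and uses made-up function names.

```python
def daily_unfollow_cap(friend_count: int, percent: int = 5, hard_cap: int = 500) -> int:
    """Most unfollows allowed in one day: percent of friends or hard_cap, whichever is smaller.

    Rounding down is an assumption made for this sketch.
    """
    return min(friend_count * percent // 100, hard_cap)


def unfollows_today(requested: int, friend_count: int) -> int:
    """How many of the requested unfollows would actually go through today."""
    return min(requested, daily_unfollow_cap(friend_count))
```

With 2,000 friends the cap works out to 100 a day; with 20,000 friends, 5% would be 1,000, so the 500 limit kicks in instead. Ask for 5,000 unfollows on a 2,000 friend account & only 100 of them happen on the first day.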

Ergo, if you want to use The Twit Cleaner to empty your account, you’re better off going somewhere else (it’s not something I’d recommend anyway). If you want to use it to trim out spammers & time wasters of course, we’re the guys for you.

It’ll still happen, just very, VERY slowly. Don’t hold your breath :)

Improving the auto-tweet

Posted by Si on 04/01/10 in Improvements

The “auto tweet” on the front page is a source of occasional consternation.

There are two specific behaviours that I’ve tweaked & improved.

1. If you had a problem with Twitter when authenticating, it would default back to tweeting (even if you’d previously deselected the checkbox).

2. People occasionally request a report (with the checkbox ticked, ie – send out a tweet), then a minute or two later seem to change their mind & request another report, with the checkbox unticked (no tweet). I guess they decide to read “Tell your followers” after they’ve clicked it?

Anyway, I’ve tweaked both these issues. The option should remember your choice – a bit hard to test, since I can’t exactly call up Twitter & ask them to break so I can test things, but it should be good. Plus, if you accidentally select the wrong option, as long as you’re quick about it, you should be able to overwrite your previous request by requesting one again – toot suite though!

Lists are a great new addition to Twitter.

I recently got to thinking (spurred on by @GLComputing – thank you!) about them in a different way.

Lists are groups of people that you’ve taken the time to say “Hey, this person is important to me.” If that’s the case, why should you need to tell The Twit Cleaner as well? You’ve already said it once, after all.

So, now the reports will automatically exclude anyone you’ve added to any of your lists (including the automated “conversationalist” list). There is a slight issue here with regards to people who’ve left Twitter but may still be on a list, but in general this will be more than offset by the benefit of far fewer false positives on your reports.

As an extra bonus, the benefit (time saved having to check everything) increases the more people you follow – or the larger your lists.
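In code terms, the list exclusion is just a set difference. Here is a minimal sketch, assuming the report is a simple list of usernames & your lists are a mapping of list name to members (both shapes are invented for the example, as is the function name):

```python
from typing import Dict, List, Set


def exclude_listed(flagged: List[str], my_lists: Dict[str, Set[str]]) -> List[str]:
    """Drop anyone who appears on at least one of the user's lists.

    The order of the remaining flagged accounts is preserved.
    """
    listed: Set[str] = set().union(*my_lists.values())
    return [user for user in flagged if user not in listed]
```

For example, with `flagged = ["spambot", "old_friend", "quote_machine"]` & a list containing `old_friend`, only `spambot` & `quote_machine` stay on the report.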

I’ve been wondering for some time now how to deal with the “Hey! My friend is on the report!” issue.

The Twit Cleaner looks at a limited set of data – it’s simply not practical to go back through every tweet someone has ever written, for example. As it is, some reports require us to download & analyse up to 3 Gigs of data. It can get pretty crazy.

What I have done instead is the following:

1. Look at who @mentions you. Obviously spammers do this all the time, so it’s not foolproof, however it will now take someone completely off the “never interacts” part of the report. If they are spammers, they’ll likely show up elsewhere.

2. Look at who you @reply to or RT. If you’re RTing or @ing someone, then obviously they’re significant to you – therefore, that person is now removed from the report completely.
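Put together, the two rules above could look something like this. It is a sketch under assumptions, not the real back-end: the report is modelled as a mapping of category name to usernames, & the names `apply_interaction_rules`, `mentioned_by` & `engaged_with` are hypothetical.

```python
from typing import Dict, Set

# Category name as it is referred to in the post.
NEVER_INTERACTS = "never interacts"


def apply_interaction_rules(
    report: Dict[str, Set[str]],
    mentioned_by: Set[str],   # people who have @mentioned you
    engaged_with: Set[str],   # people you have @replied to or RTed
) -> Dict[str, Set[str]]:
    """Return a new report with the two interaction rules applied."""
    cleaned: Dict[str, Set[str]] = {}
    for category, users in report.items():
        # Rule 2: anyone you reply to or RT comes off every category.
        kept = set(users) - engaged_with
        # Rule 1: anyone who mentions you comes off "never interacts" only.
        if category == NEVER_INTERACTS:
            kept -= mentioned_by
        cleaned[category] = kept
    return cleaned
```

The asymmetry is deliberate: a mention from someone else only clears them of “never interacts” (a spammer can still turn up in another category), whereas your own reply or RT clears them everywhere.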

Unfortunately, this isn’t quite as awesome as I’d like – mainly due to Twitter flakiness, but it’s a definite improvement. What flakiness? Here’s a typical conversation I recorded earlier:

Me: I’d like the last few hundred tweets this person made please!

Twitter: The last 57 you say? Sure thing! Here they are!

I have a plan in place to work around this, but it involves rewriting the entire back-end in a different language & moving it all to a different operating system altogether. This might take a little time. Heh.

It was brought to my attention recently that people were using the bit.ly & ow.ly URL shortening services to track who clicked through from their profiles.

I have to admit this is something that hadn’t occurred to me when I first wrote The Twit Cleaner (I didn’t realise some URL shorteners did this).

I investigated a bit more deeply, & my conclusion is that spammers in general aren’t using shortened URLs in their profiles so much these days. In fact, the number of people appearing on the report vs the number of “bad guys” was way too high.

So, I’ve removed that sub category from the report altogether. No point in just showing a bunch of shy people (or stats geeks like myself) on there. I’ve also removed the “No bio no url” sub category, since there was a similarly high false positive rate there. This means the entire “Secretive” category has now disappeared. One or two profiles will still show for the next little while, as they drain slowly from the cache, but their incidence will be drastically reduced.

In general, any spammers or other dodgy people will be adequately caught with the other criteria on the report.

I have more significant things in the pipeline to improve accuracy (particularly identifying people you care most about), but I have some technical hiccups (broken 3rd party libraries) to work around first. More on that later.