I’ve been wondering for some time now how to deal with the “Hey! My friend is on the report!” issue.

The Twit Cleaner looks at a limited set of data – it’s simply not practical to go back through every tweet someone has ever written, for example. As it is, some reports require us to download & analyse up to 3 Gigs of data. It can get pretty crazy.

What I have done instead is the following:

1. Look at who @mentions you. Obviously spammers do this all the time, so it’s not foolproof. However, an @mention will now take someone completely off the “never interacts” part of the report. If they are spammers, they’ll likely show up elsewhere.

2. Look at who you @reply to or RT. If you’re RTing or @ing someone, then obviously they’re significant to you – therefore, that person is now removed from the report completely.
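For the curious, the two rules above boil down to something like the following Python sketch. This is an illustration only, not the actual back-end code: the function name, the category names & the "dict of sets" shape of the report are all assumptions made up for the example.

```python
def apply_interaction_rules(report, mentioned_by, interacted_with):
    """Return a copy of the report with the two interaction rules applied.

    report          -- dict mapping a category name to a set of usernames
                       (hypothetical structure, for illustration only)
    mentioned_by    -- set of usernames who have @mentioned you
    interacted_with -- set of usernames you have @replied to or RTed
    """
    cleaned = {}
    for category, users in report.items():
        # Rule 2: anyone you @reply to or RT is removed from every category.
        remaining = set(users) - set(interacted_with)
        # Rule 1: anyone who @mentions you only leaves "never interacts".
        # "never_interacts" is an assumed key name, not the real one.
        if category == "never_interacts":
            remaining -= set(mentioned_by)
        cleaned[category] = remaining
    return cleaned
```

Note that someone who @mentions you but also trips another check stays on the report under that other category, which is the "they’ll likely show up elsewhere" behaviour described in rule 1.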

Unfortunately, this isn’t quite as awesome as I’d like – mainly due to Twitter flakiness, but it’s a definite improvement. What flakiness? Here’s a typical conversation I recorded earlier:

Me: I’d like the last few hundred tweets this person made please!

Twitter: The last 57 you say? Sure thing! Here they are!

I have a plan in place to work around this, but it involves rewriting the entire back-end in a different language & moving it all to a different operating system altogether. This might take a little time. Heh.

It was brought to my attention recently that people were using the bit.ly & ow.ly URL shortening services to track who clicked through from their profiles.

I have to admit this is something that hadn’t occurred to me when I first wrote The Twit Cleaner (I didn’t realise some URL shorteners did this).

I investigated a bit more deeply, & my conclusion is that spammers in general aren’t using shortened URLs in their profiles so much these days. In fact, the number of people appearing on the report vs the number of “bad guys” was way too high.

So, I’ve removed that sub category from the report altogether. No point in just showing a bunch of shy people (or stats geeks like myself) on there. I’ve also removed the “No bio no url” sub category, since there was a similarly high false positive rate there. This means the entire “Secretive” category has now disappeared. One or two profiles will still show for the next little while, as they drain slowly from the cache, but their incidence will be drastically reduced.

In general, any spammers or other dodgy people will be adequately caught with the other criteria on the report.

I have more significant things in the pipeline to improve accuracy (particularly identifying people you care most about), but I have some technical hiccups (broken 3rd party libraries) to work around first. More on that later.

Simplifying Pricing

Posted by Si on 30/11/09 in Administration

I’ve always been a huge fan of Apple products. Why? Because of their simplicity.

It takes a lot of work to get something that clean yet still intuitive.

I’ve always believed that the reason we have computers is to make things simpler for us. We shouldn’t be burning our own cycles if we can get the machine to do it for us.

Now, I’m the first to admit that The Twit Cleaner is nowhere near that level of elegance yet, but it’s an iterative process, & a high goal. I’ll keep pushing towards that.

In the meantime, I’ve simplified the pricing.

Now, things are like this: All reports are free, as always. If you follow fewer than 2000 people, we’ll auto-unfollow whoever you want for free. If you follow more than 2000 people, it’s five bucks.

Yep, Five US Dollars ($5) to clean your list.

It’s hard to get cleaner & simpler than that.

[Edit: Ok, that pricing has been in place for a week or so, & some interesting things have happened. For example, I had two users in the space of half an hour, each with 60k+ lists. Given that creating a report for a list of that size involves downloading many gigabytes of data & 6-12 hours of processing, I started to think "Is it really worth doing all that work, for five measly bucks?" This is even more the case when not everybody that requests a report pays, of course.

Even a list of 20,000 people basically takes ten times as long as a list of 2,000 - due to getting the lists from Twitter, downloading the data, & running the analysis. There are no real economies of scale.

So, I've adjusted the pricing (again). I still like the $5 mark, & for most people, that'll still be it. For the whales though? Above 25k users is $10, & above 50k is $20. It's still not a hell of a lot, but I feel it's a better reflection of the costs & effort involved.

We'll see how long this pricing sticks for. Everything is a work in progress, & over time I'm sure things will settle down.]
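If it helps to see the tiers laid out as code, here's a minimal Python sketch. It's only an illustration of the numbers above, not the real billing code. The function name is made up, & since the post doesn't say what happens at exactly 2,000 follows, the sketch assumes that case is free.

```python
def cleanup_price_usd(following_count):
    """Price in US dollars to clean a list of the given size.

    Tiers taken from the post: above 50k is $20, above 25k is $10,
    above 2,000 is $5, otherwise free. Reports themselves are always free.
    """
    if following_count > 50_000:
        return 20
    if following_count > 25_000:
        return 10
    if following_count > 2_000:
        return 5
    # Assumption: exactly 2,000 falls in the free tier (the post only
    # says "fewer than 2000" and "more than 2000").
    return 0
```

Checking the largest tier first keeps each branch to a single comparison, so adding another tier later is a one-line change.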

Cutting Back on DM Spam

Posted by Si on 24/11/09 in tips

Running The Twit Cleaner is great for cutting back on public timeline spam, but what about DMs?

Here are a few tricks that work well:

1. (This is the biggie) TweetLater/SocialOomph

You wouldn’t think it, but the vast majority (90+%) of auto-DMs come from users of this software.

Go here, & follow the instructions. Or, if you can’t be bothered following that link, do this:

  1. Follow @OptMeOut
  2. When @OptMeOut follows you back, it’ll DM you to let you know
  3. Send a DM to @OptMeOut (anything at all, eg “Hey! Stop spamming me, thanks!”)
  4. Unfollow @OptMeOut (totally up to you, just keeps it private)

Voila, no more auto-DMs from Social Oomph.

2. fun140.com

For some reason, fun140 thinks that fun involves DM spamming all your followers. Ok, each to their own, but if you don’t want an endless barrage of quizzes, polls & so on, go here & click “Don’t send me any direct messages.”

Fun140 uses Twitter to verify your identity, ie, prove that you are who you say you are. In other words, it looks like it might be a bit dodgy, but it’s actually completely safe. Yes, I have done this on all my accounts, & trust me, I checked VERY thoroughly.

The delightful thing is that they seem to enjoy ‘forgetting’ your opt out settings every month or so. So don’t be surprised if you need to go back & kick them again.

3. Twables.com

Go here (again, they’ll use Twitter to verify your identity, so click “allow access” when Twitter asks). Click both the “don’t contact me” preferences, & voila, no more spam from twables.

Once you’re done, go to your Twitter connection settings & revoke access.

4. PollPidgeon.com

Go here. Click the big button “Don’t send me any direct messages”, allow access when Twitter asks, then go to your Twitter connection settings & revoke access.

5. blip.fm

For some reason blip hasn’t set up any automated process to do this (can’t think why not). However, they promise that if you email support@blip.fm with the usernames you want removed, they’ll do it. If that doesn’t work, just hassle @blipfm on Twitter.

6. Mafiawars

Similar to blip.fm, you’ll need to send an email to tweetygame@gmail.com with your Twitter usernames.

7. Fauxlowers.com

Go here. It’ll take you straight to Twitter to verify your identity, then back to their site to finish the process. Go back to your Twitter connection settings & revoke access.

This site has been pretty broken recently – the above page errored, & now when you go there, it errors again when saving your request. The good news is, if it’s broken like that, it’s very likely not sending out spam auto DMs. If/when they do get it sorted, the above process will work.

8. PlaySpymaster.com

Go here. It’ll take you to Twitter to verify your account. You will need to have javascript AND cookies enabled for their site in order for it to work (otherwise it’ll go into a silly loop). Once done, go back to your Twitter connection settings & revoke access.


Just follow these instructions for any DMs you get – or all of them in advance if you want to be proactive. I’ll add to this list as I find more ways to cut back on some of the crap on Twitter.

Anything to help Twitter be more awesome.

The hack, the cleanup

Posted by Si on 12/11/09 in Administration

So, this site was hacked.

What happened
Previously, you could enter the url for anyone else’s report, & see it. These URLs weren’t public anywhere, & the backend was deliberately set up so you couldn’t just scan a directory & find them. I was aware of this, but figured the chance of randomly guessing another user’s name was low enough that it didn’t require immediate fixing. I was wrong.

What the hacker did was search for anyone who publicly mentioned The Twit Cleaner. That then gave him (because yes, I know exactly who did it) their usernames, & he could then see their report.

So, once he was viewing their report, he entered a rude message into the “Tweet my followers” box, & hit send.

That was the extent of it. Not very clever, mostly just annoying. At no point did he have access to any of our databases, your OAuth info, or any level of control over your account.

This affected around 1.6% of our customers, & the entire thing was over in 18 minutes (before I could figure out what was going on & shut down the right bits). I started by going to Twitter & killing our OAuth access, since I figured that was the most dangerous possibility – turned out it was much more trivial than that.


What damage was done
The worst affected were two close friends of the attacker (you know who you are). They got some very offensive messages posted on their accounts. Everyone else had a fairly innocuous message sent linking to another website (an innocent third party).


Why was this even possible
Ironically, the day the attack happened it was on my schedule to shut down that loophole altogether. Instead I spent a week cleaning up. There were two things I was going to do. The first was to remove the option to manually enter a message on the report (because who wants to do that anyway?). The second was to put security back in so you could only access your own reports.

Yep, that’s right, put the security back in. I’d had security in there a couple of weeks earlier addressing this very issue. However, in the first 6 hours it was in place, it managed to piss off 44 people, so I ripped it out again. I figured it was more important that people be able to easily see their reports, rather than just annoy people like crazy. I needed to think of a better way to implement the security, so I put it on the back burner while I worked on other things.

Obviously I made the wrong call.


What’s been done to stop this kind of thing happening again?
First, if you revoke access to The Twit Cleaner, you won’t be able to see your report. Yep, that’s kind of a pain for you – but it means that we know exactly who is looking at any given report. It means we can ensure that people only look at their own report. It also means that if any older accounts (eg, ones that used The Twit Cleaner ages back, but have revoked access) get their accounts hacked, they won’t have access to the system at all. You’d be amazed how many Twitter accounts are no longer active & thus easy targets for hackers. A LOT.

Second. We store information locally tracking that you are who you say you are. Just something to make it harder for people to try and get around the system, & no, nothing personal or incriminating, just a marker.

Third. You must be signed in as the specific user whose report you want to look at.

Fourth. Without cookies & javascript you won’t be able to access the system at all.
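
Put together, the four checks above amount to something like this Python sketch. It's a simplified illustration, not the site's actual code: the `Session` class, its field names & the `can_view_report` function are all hypothetical, invented just to show how the rules combine.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical summary of what the site knows about a visitor."""
    username: str                 # Twitter account the visitor is signed in as
    has_active_access: bool       # False once access to the app is revoked
    has_local_marker: bool        # the locally stored identity marker
    cookies_and_js_enabled: bool  # both are needed to use the system


def can_view_report(session, report_owner):
    """Return True only if every one of the four checks passes."""
    if not session.cookies_and_js_enabled:  # Fourth check
        return False
    if not session.has_active_access:       # First check
        return False
    if not session.has_local_marker:        # Second check
        return False
    # Third check: you can only look at your own report. Twitter
    # usernames are case-insensitive, so compare in lower case.
    return session.username.lower() == report_owner.lower()
```

The point of requiring all four is that guessing someone's username, which is all the attacker needed before, no longer gets you anywhere on its own.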


Finally
The trick with all these kinds of things is to make it so it’s not a massive pain in the ass to use.

I think I’ve achieved that. A common issue is that Twitter (or part of it) goes down, or is inaccessible. As much as possible, I’ve made it so you’ll still be able to securely see your report (or request one).

I’ve also tried to make as much of it invisible to you as possible. There’s a lot happening automatically in the background. If you clear cookies, or move to a new browser, you will need to re-authenticate with Twitter, but I’ve made this much simpler & cleaner than before. A new window pops up, you can watch it do its thing, then it goes away again. In most cases, you’ll click two buttons & be done. Very simple.

If (when?) Twitter dies, you’ll get a little message so you can just hit the button & try again.

Obviously this happening at all is. Ahh. Hmm. Significantly sub-optimal. I was hoping for a little more time before I needed to get super hardcore about security. Security is a multi-layered, complex thing. Big chunks had already been taken care of, but obviously not enough.

As I stated at the time, I’m extremely sorry to everyone that was hit by this hacker. All affected paying customers had their money refunded.

I also apologise for the time we’ve been offline – both to existing users wanting to see their reports, & new potential users.

Welcome to the new site!

Posted by Si on 04/11/09 in Administration

Just a brief hello.

I’ve inserted a blog section, so as things upgrade & improve I can provide a little more information than 140 characters allows – as much as I love Twitter.

It’s also a move to allow better feedback for & integration with you guys.

The comment system is run by Disqus, who are basically the best & largest comment system out there. You sign up with them once, register a picture etc, & can then comment without needing a sign-in on thousands of blogs across the web. A bit different if you’ve never seen it before, but thoroughly awesome.

I’ve also upgraded the hell out of the communication with Twitter. Twitter can be pretty unreliable, so I’ve done all I can to mitigate that – and make it a lot more visually obvious what’s going on.

Oh, and once you’ve requested a report, you can come back here any time & it’ll be accessible from the front page. It’ll tell you who you’re signed in as (if you have multiple accounts), & there’s a direct link to your report. If you’d like to rerun the report, it’s one click & done.

Anyway, have a play. I look forward to getting your feedback.