I've been wondering for some time now how to deal with the "Hey! My friend is on the report!" issue.

The Twit Cleaner looks at a limited set of data - it's simply not practical to go back through every tweet someone has ever written, for example. As it is, some reports require us to download & analyse up to 3 Gigs of data. It can get pretty crazy.

What I have done instead is the following:

1. Look at who @mentions you. Obviously spammers do this all the time, so it's not foolproof. However, an @mention will now take someone completely off the "never interacts" part of the report. If they are spammers, they'll likely show up elsewhere.

2. Look at who you @reply to or RT. If you're RTing or @ing someone, then obviously they're significant to you, so that person is now removed from the report completely.
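
For the curious, the two rules above can be sketched roughly like this. This is only an illustration, not the actual Twit Cleaner code: the function name, the dict-of-sets report layout, and the "posts_only_links" section are all made up for the example, and it assumes the @mention and @reply/RT lists have already been collected.

```python
def apply_interaction_rules(report, mentioned_me, i_engaged_with):
    """Return a filtered copy of a report (hypothetical layout).

    report:         dict mapping section name -> set of usernames
    mentioned_me:   usernames who @mentioned you (rule 1)
    i_engaged_with: usernames you @replied to or RTed (rule 2)
    """
    filtered = {}
    for section, users in report.items():
        # Rule 2: anyone you reply to or RT is dropped from every section.
        remaining = set(users) - set(i_engaged_with)
        # Rule 1: an @mention only clears the "never interacts" section,
        # so a spammer can still show up elsewhere in the report.
        if section == "never_interacts":
            remaining -= set(mentioned_me)
        filtered[section] = remaining
    return filtered


# Made-up example data.
report = {
    "never_interacts": {"alice", "bob", "spammy"},
    "posts_only_links": {"spammy", "carol"},
}
result = apply_interaction_rules(
    report,
    mentioned_me={"bob", "spammy"},
    i_engaged_with={"carol"},
)
```

Here "bob" and "spammy" both leave the "never interacts" section because they @mentioned you, but "spammy" stays in the other section, while "carol" disappears everywhere because you engaged with her.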

Unfortunately, this isn't quite as awesome as I'd like, mainly due to Twitter flakiness, but it's a definite improvement. What flakiness? Here's a typical conversation I recorded earlier:

Me: I'd like the last few hundred tweets this person made please!

Twitter: The last 57 you say? Sure thing! Here they are!

I have a plan in place to work around this, but it involves rewriting the entire back-end in a different language & moving it all to a different operating system altogether. This might take a little time. Heh.