58 hours offline


Posted by Si Dawson on 03/09/12 in Administration

Twit Cleaner was just offline for 58 hours. This is the longest we've been down (by a wide margin) in the last three years. Given the budget we're operating on, that's something to be pleased about.

Of course, being offline for ANY time bugs me (and inconveniences you).

So what happened?

Obviously, everything we do depends on Twitter. If we can't talk to Twitter (for whatever reason) we're as good as dead.

What happened was, the network connection between us and Twitter got broken, somewhere 6 hops down the line (in the 10 or so servers between us & Twitter).
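A minimal sketch of how a break like this can be located from a route trace: walk the hops in order and note the last one that still answers. The sample output below is invented for illustration (it uses documentation-only addresses and is not the actual trace from this outage), and `last_responding_hop` is a hypothetical helper name.

```python
import re

def last_responding_hop(traceroute_output: str) -> int:
    """Return the number of the last hop that answered, or 0 if none did."""
    last = 0
    for line in traceroute_output.splitlines():
        match = re.match(r"\s*(\d+)\s+(.*)", line)
        if not match:
            continue  # header or blank line, not a hop
        hop, rest = int(match.group(1)), match.group(2)
        # A hop that timed out is reported as asterisks only.
        if rest.replace("*", "").strip():
            last = hop
    return last

# Invented sample: hops 1-5 reply, everything from hop 6 onward is silent.
SAMPLE = """\
traceroute to example.invalid (192.0.2.80), 30 hops max, 60 byte packets
 1  10.0.0.1 (10.0.0.1)  0.512 ms  0.498 ms  0.471 ms
 2  198.51.100.1 (198.51.100.1)  1.204 ms  1.187 ms  1.165 ms
 3  198.51.100.9 (198.51.100.9)  2.310 ms  2.295 ms  2.280 ms
 4  203.0.113.5 (203.0.113.5)  8.002 ms  7.981 ms  7.960 ms
 5  203.0.113.17 (203.0.113.17)  9.441 ms  9.420 ms  9.402 ms
 6  * * *
 7  * * *
 8  * * *
"""

print(last_responding_hop(SAMPLE))
```

In this made-up trace the last hop to reply is 5, which puts the break at hop 6. It says where the path goes quiet, not who is responsible for it.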

Now, when companies screw up, they generally don't like to explain why or what they did, so I may never know exactly what happened.

Two main possibilities:

1. The company that runs hop 6 and hop 7 screwed up their config somehow.

2. The servers at the edge of Twitter's network (at hop 8) told the hop 7 machine to dump us.

Now, why would Twitter say that? Most likely is misconfiguration (this stuff is CRAZY complicated. It's super easy to screw something up & not notice immediately).

Less likely is that someone close to us was causing mischief, attacking Twitter. When that happens, servers are often set up to automatically drop all traffic from that neighbourhood. It's like if someone on your block throws a tantrum, the whole street gets cordoned off. If we'd been offline for exactly 48 hours, that would be my best guess. However, since it was 58 hours (a weird number), this seems unlikely.

Least likely of all is that Twitter was deliberately trying to take us offline. Why? Because a) they didn't tell us (and they've always been very open, friendly and helpful to us), and b) our API access stayed rock solid the whole time, just not from our specific server. E.g., I could still run things perfectly from my home laptop.

So, the most likely thing that happened was scenario one above. Nothing to do with Twitter at all, just some accidental screw up, deep in the bowels of a huge corporation. The reason it took so long to sort was that it all happened over a weekend.

Annoying for me (and you) but understandable. I know I've certainly screwed up thousands of times in the past (fortunately mostly without you guys seeing too much before I was able to fix things).

All the above is speculation - as I said, we may never know exactly what happened. The good news is, we're back and everything is hunky dory again. Have fun cleaning!

It's been a heck of a long time since there's been much external perception of progress around here.

Today, that all changes. Well, almost.

See, when I first started Twit Cleaner, I wanted to get something useful out there as soon as possible. A Minimum Viable Product, it's called. That worked well enough, but when I started looking at moving to a one-click-one-unfollow model, I realised that the current infrastructure - the way I'd designed things - simply wasn't going to work. It was possible to do new things, but it would have been horrible, painful & slow.

I made some (with hindsight, of course) silly technical choices, & they came back to bite me on the ass.

So, for the past 7 months I've been redesigning the entirety of Twit Cleaner, more or less from the ground up. The first of that giant chunk of work rolled out yesterday. Believe it or not, the previous version had almost everything just shoved into a giant directory tree*. So yes, that's a folder with many, many millions of files in it. It worked ok for the one task it was designed for, but it seriously hampered the ease & speed with which I could develop any neat new tools. There were a bunch of other bad technical decisions, but that was the key one.

Now, everything is in a big shiny database. Which has its own issues, of course (everything does) but I'll iron those out over the next couple of days. The database is loading up as we speak, and it looks like that might take a few days to complete, so expect the site to be a little shaky until it's finished (please be patient). Once that's over, it will smooth the way to quickly & easily roll out a bunch of new tools to help you manage & explore your Twitter life. Oh yes, I have many, many great ideas I've been working on.
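For a rough idea of what a files-to-database load like this can look like, here is a minimal sketch. It assumes SQLite and a single hypothetical `records` table keyed by each file's relative path; the post doesn't say which database or schema Twit Cleaner actually uses, so treat every name here as illustrative only.

```python
import sqlite3
from pathlib import Path

def load_directory(conn: sqlite3.Connection, root) -> int:
    """Copy every file under `root` into a `records` table; return the file count."""
    root = Path(root)
    count = 0
    with conn:  # one transaction: committed on success, rolled back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS records (name TEXT PRIMARY KEY, body TEXT)"
        )
        for path in sorted(root.rglob("*")):
            if not path.is_file():
                continue  # skip the directories themselves
            conn.execute(
                "INSERT OR REPLACE INTO records (name, body) VALUES (?, ?)",
                (path.relative_to(root).as_posix(), path.read_text(encoding="utf-8")),
            )
            count += 1
    return count
```

Once the rows are in, a lookup that used to mean walking the tree becomes a single query such as `SELECT body FROM records WHERE name = ?`, which is the sort of thing that makes new tools quicker to build.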

On the outside, things may have been serene, unbroken, just meandering along like a duck floating on a pond... but underneath I've been, just like a duck, paddling furiously seven days a week, all hours of the day & night, to get things working just the way they should be - and to get you guys the help you deserve.

Unfortunately, like a duck, there's not much to look at just yet. Oh, except reports will be much, much faster.

*If you're really curious, the very first version of Twit Cleaner used to run on my desktop machine at home in Melbourne, then copy things furiously back & forth to the web server, which at that time was in London. Now THAT was nutty.