RoutineHub Outages Update 09092022-145500

Updated 09092022-155055


Hello, RoutineHub community!

TL;DR

Why are there all these new problems arising ever since the site migrated? The problems have always been there. They've just been masked. The team can prove this because we have not yet been able to publish an update to the new site since the transfer. Our blog writer cannot add a new analytics script to the page's footer!

Why can the new team not update the site since the transfer? We're blocked. When the site was migrated to the new host, something was misconfigured, or a new bug made its way in. Either way, the only path to green is to figure out the problem. Harley's team was responsible for the migration, so they're the only ones who can help us. Although he has been gracious in offering support, his time is limited.

Why is communication so opaque? I take full responsibility for that. It's not part of my nature to announce that "we're blocked." So I've been patiently waiting for the engineering team to provide a mitigation plan. Right now, I'm speaking with Harley about some way for us to work with him to help dig into this system and find a path to green.

LONG-FORM

As promised, I want to update you on the status of RoutineHub. The transition from Harley Hicks has been difficult on every level. Here is the list of problems:

  1. There was no easy path to transfer ownership from one hosting provider to another. Harley's team had to build a new server, migrate a data dump, and transition the microservices. This was a rocky transfer, and it revealed a lot of internal issues. Since servers cannot be transferred from one account to another, much of the institutional knowledge held in command history was lost.

  2. The transfer triggered endless new problems with continuous integration and deployment.

  3. The transfer created new problems for site checks and validation.

  4. The new team is trying to keep the system going while having limited documentation of the site and limited time with the previous team's technical support.

  5. Increased traffic and popularity have accelerated all the problems at once.

WHERE ARE WE TODAY?

  1. We've transferred the site and database from one account to our new account.

  2. We now have microservices working.

  3. Our main mechanism for publishing updates to the site still does not work. This is our biggest problem, and resolving it is our main priority.

  4. Every time the team makes one change, a new problem arises. This means that there is a series of underlying problems. The only way to identify them is to keep making incremental changes, finding the problems, and then fixing them. I promise you, this is the worst possible way to inherit a site, but it is what it is. For example, the "log-in, log-out" issue has a lot to do with our current implementation of Django. Again, how do I know that the team has not done anything wrong to trigger this bug? Because we have not published a single update to the site since its transfer.

  5. The new team has had to write documentation and playbooks from the ground up. Anyone who has ever written documentation will know that it's painful, long, and expensive. It hurts to do this, but this is the only way the team has been able to transfer knowledge successfully.

  6. The site's increased traffic is a good thing. It's making us better and stronger, but it feels imperfect when things crash.

For those of you who ever used Twitter during its earliest days, you will know that Twitter always failed. It was painful and aggravating, but simultaneously, the team needed to strip out their implementation of Ruby on Rails for more robust technologies such as Scala. We're not throwing away our current implementation, but we must make big changes to accommodate the next level of RoutineHub. I recognize that many of you are frustrated by this transition, and in all honesty, a big part of the frustration comes from communication.

COMMUNICATION

I take full responsibility for the lack of communication from the site. It's not that I've been trying to remain opaque. Rather, I've been trying to understand the situation. As a leader, I'm not particularly eager to give half-baked information, especially to technical users like you. My default is to first understand what's happening. In this particular situation, the problems have appeared as a perfect storm. Either way, I recognize that I'm taking too long to communicate.

For this reason, I will send more frequent messages even if I have nothing to say except "hello." I aim to send you a message on a bi-weekly cadence until the end of Q4. Once we are in December, I will check in and decide if I should reduce the messages to one per month.

WHAT'S NEXT?

For those of you who have asked, "What's next?", I will send that response in my next message. If you're impatient, my immediate response is, "The team needs to first focus on the URGENT and IMPORTANT before we can think about anything else." Stability and reliability are URGENT and IMPORTANT.


09092022-154300

as someone who's struggled working on the docs/dev end with the opposite issue - giving in to the compulsion to blab before having absolutely every bit of info I need - I very much appreciate the sentiment behind this update. I'm no whiz or anything but the site's behavior has always been a bit sick - like... since the beginning - and I think I internalized a long time ago that it 1) might just be a fact of life indefinitely! there are much worse predicaments and 2) the suspicion that whatever you guys were doing... it probably wasn't just goofing off (negligence) and almost certainly was No Fun At All lol.

honestly though, even if I believed you were totally incompetent (which would be very presumptuous and hypocritical of me), the fact that you're sticking it out with this idea - this wee little idea - of creating a collaborative library around Siri Shortcuts (which absolutely nobody else has, for years now??) really does mean a lot!

||On that note, literally none of them actually seem to have bothered to declare their sunsetting, even... So actual details on site migration are a luxury as far as I'm concerned lol.||