Or why it is a bad idea to update core dependencies at the last minute
TL;DR: Our Windows service provider removed a software version we depend on shortly before the release of Mudlet 4.0, which caused a lot of stress. It also broke our automatic updates, so if you are using Mudlet 4.0.0 or 4.0.2, please update manually via mudlet.org/download.
I’m keneanung, and one of my main tasks in the Mudlet project is to make sure that compilation and deployment of test and release versions of the program go smoothly. This is why I want to share this post-mortem of the Mudlet 4.0 release with you.
The prelude
The story starts on 2019-08-02 at about 8:10 CEST, when Vadi notified me about an unusually high number of failing Windows builds. Those builds timed out when trying to install Qt 5.12.2, one of our core dependencies. This was strange, since the library should already have been installed on the machines supplied by our service provider Appveyor. A look into the build environment documentation revealed that the machines had received a software update and that unsupported versions of Qt had been removed to conserve space. This was 2 days before the release.
To get our process back on its feet, I switched it to version 5.13 of the framework, which was now provided on the machines. We deemed this pretty safe since we were already using it on our macOS machines. But the generated build showed the same issue we sometimes have with the debug window on Windows: all text was spaced out. So we went back a step and decided that a minor upgrade to Qt 5.12.4, the latest version of the current long-term support branch, would be good enough. "Long-term support" (LTS) usually means that software gets security updates and bug fixes but no new features or other breaking changes. Even though the release notes of Qt 5.12.4 mentioned an upgrade to OpenSSL, the library that provides our secure network access, I didn’t pay much attention to it since, hey, this is an LTS release, so they won’t break compatibility. And since this version started fine and didn’t show any glaring issues, we moved on to preparing the release.
Secure connections failing
Fast-forward to after the release of Mudlet 4.0.0, which was published on 2019-08-04 at 10:35 CEST. Shortly after the first upgrades were made, we were notified that calls to downloadFile() failed, as did secure connections to MUDs. The error message "Unable to initialize secure context" in particular rang a bell, and I re-read the announcement post of Qt 5.12.4. With a broken release in hand, the OpenSSL upgrade mentioned there turned out to be far less optional than I had initially thought. Having found the underlying cause, we immediately updated the installation instructions on our automated system and released Mudlet 4.0.1. Sadly, the missing secure connections also meant that we had effectively disabled our auto-update mechanism in 4.0.0. But we hoped that including this information in the release announcement would create enough awareness to spread the word. Mudlet 4.0.1 was released on 2019-08-04 at 18:28 CEST.
Editor crash
After the release, we didn’t get our well-deserved rest. Users notified us that under certain circumstances Mudlet would lock up and then force-stop on Windows. Not a good sign for sure. Since our core team is based all over Europe, we couldn’t start looking into this issue until the morning of 2019-08-05.
Using the reproduction steps from our issue tracker, Vadi was able to narrow the crash down to a function in our editor that had been unchanged for quite a few releases. So what happened? When testing with local builds against Qt 5.12.2, Vadi noticed that the crash could not be reproduced, which means the version upgrade of Qt changed enough, even on an LTS branch, to make our program crash. But since we couldn’t go back to Qt 5.12.2 (remember, it was removed by Appveyor just a few days before the release), Vadi tested the version that sits between the crashing one and the safe one: 5.12.3, which is also installed on the Appveyor build machines.
Luckily, the version proved to be stable and another bugfix release was created: Mudlet 4.0.2.
Secure connections failing, part II
The downgrade of Qt had one unintended and uncaught side effect: Qt 5.12.3 was not yet able to work with OpenSSL 1.1.1. In the heat of the moment, we forgot that we had to undo the change introduced with Mudlet 4.0.1 to be able to use secure connections. And while this change was relatively easy and painless, our release process still requires a lot of manual steps before a new version can go out. Additionally, the auto-updater was broken again in 4.0.2, so if you are on 4.0.0 or 4.0.2, you will need to download and install the update from mudlet.org/downloads by hand (all existing profiles will stay). But we hope to have finally reached a stable codebase and to be able to work on Mudlet 4.1 to deliver the next batch of awesome changes.
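The mismatch described above is the kind of thing a small pre-release check could catch. The following is a minimal sketch in Python, not part of Mudlet's actual build scripts. It assumes that Qt releases before 5.12.4 pair with the OpenSSL 1.0 series and that 5.12.4 and later pair with the 1.1 series; only the 5.12.3 / 5.12.4 boundary and OpenSSL 1.1.1 come from the events in this post. All function names are hypothetical.

```python
# Assumption: Qt before 5.12.4 expects the OpenSSL 1.0 series, Qt 5.12.4 and
# later expects the 1.1 series. Only the 5.12.3 / 5.12.4 boundary and
# OpenSSL 1.1.1 are taken from the post; the rest is illustrative.
OPENSSL_SWITCH_QT_VERSION = (5, 12, 4)


def parse_version(text):
    """Turn '5.12.4' or '1.1.1c' into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def expected_openssl_series(qt_version):
    """Return the (major, minor) OpenSSL series assumed for a Qt version."""
    if parse_version(qt_version) >= OPENSSL_SWITCH_QT_VERSION:
        return (1, 1)
    return (1, 0)


def is_compatible(qt_version, openssl_version):
    """Check whether the bundled OpenSSL matches what the Qt version expects."""
    series = parse_version(openssl_version)[:2]
    return series == expected_openssl_series(qt_version)
```

A build script could call `is_compatible()` with the Qt version it is about to build against and the OpenSSL version it is about to bundle, and abort when the result is `False`. The version table would have to be maintained by hand, so this only guards against a pairing that is already known to be bad.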
What lessons are there?
- Our greatest lesson is easy to summarize: Never trust minor upgrades. Ever. Not even on a long-term support branch. This case was a bit unfortunate because we had no choice but to change the version on short notice.
- Modification of the release process: We are currently discussing how to handle infrastructure changes during our freeze period of one week before the release. This may lead to a postponed release date if we have to make such last-minute changes again.
- Research shows that Appveyor does provide previous images for a time after an upgrade. Simply use "Previous <image name>" as the image. We should use that next time.
- Short-term: Keep ahead of service provider upgrade announcements. These are easy to subscribe to at Travis, our Linux and macOS service provider. However, Appveyor does not make it easy to keep up to date here. I need to think about how to implement that.
- Medium-term: We would like to create a set of tests that notifies us of similar issues with secure connectivity in the future so we don’t lose our auto-update functionality ever again. But we are unsure how to implement these tests at the moment.
- Long-term: Don’t depend on software a service provider installs for us. For this, I’m working on Windows Docker images that include all our dependencies, while demonnic is doing the same for Linux. We hope this step will make us less dependent on the underlying infrastructure and reduce the time needed to build our software.
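To give the medium-term item above a rough shape, here is a minimal sketch of a secure-connectivity smoke test. It is written in Python and therefore exercises Python's own TLS library, not the Qt and OpenSSL combination that ships with Mudlet. A real test would have to go through Mudlet itself, for example via downloadFile(). The function names are hypothetical.

```python
import socket
import ssl


def tls_context_initializes():
    """Return True if the local TLS library loads and yields a usable context."""
    try:
        ssl.create_default_context()
    except (ssl.SSLError, OSError):
        return False
    return True


def tls_handshake_succeeds(host, port=443, timeout=10.0):
    """Return True if a verified TLS handshake with host:port completes."""
    try:
        context = ssl.create_default_context()
        # Open a plain TCP connection first, then upgrade it to TLS.
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as secured:
                return secured.version() is not None
    except (ssl.SSLError, OSError):
        # Covers refused connections, timeouts and certificate failures.
        return False
```

A release job could run `tls_handshake_succeeds("mudlet.org")` (the host is only an example) and refuse to publish when it returns `False`. The first function mirrors the "Unable to initialize secure context" failure, the second the broken connections that followed from it.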
We hope that we won’t need to write another post-mortem any time soon, but we think this is a great way to share our experience and give you some insight into the work behind the scenes. Enjoy Mudlet and MUD on!