Service outage
Incident Report for
The cached DNS entries appear to have expired, and everything is back online.
Posted May 01, 2022 - 19:38 UTC
Everything is back online as usual. Some users may still experience DNS issues because of Cloudflare's NXDOMAIN (negative answer) time-to-live. This should resolve itself in around an hour (at 19:33 UTC), or sooner if you switch to a well-known public DNS resolver with an explicit cache-flushing policy.
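Why the DNS issues can outlast the fix: resolvers cache "this name does not exist" (NXDOMAIN) answers just like positive ones. Under RFC 2308, section 5, a negative answer is cached for the lesser of the SOA record's own TTL and its MINIMUM field. The minimal sketch below computes when such an entry should expire. It assumes a resolver that follows RFC 2308 exactly (real resolvers may cap or adjust negative TTLs), and the SOA values and cache time used are hypothetical, not figures from this incident.

```python
from datetime import datetime, timedelta, timezone


def negative_cache_ttl(soa_ttl: int, soa_minimum: int) -> int:
    """RFC 2308, section 5: a negative answer is cached for the lesser of the
    SOA record's TTL and its MINIMUM field (both in seconds)."""
    return min(soa_ttl, soa_minimum)


def nxdomain_expiry(cached_at: datetime, soa_ttl: int, soa_minimum: int) -> datetime:
    """Time at which an RFC 2308-compliant resolver drops a cached NXDOMAIN."""
    return cached_at + timedelta(seconds=negative_cache_ttl(soa_ttl, soa_minimum))


# Hypothetical example: an NXDOMAIN cached at 18:33 UTC, SOA TTL and MINIMUM both 3600 s.
cached = datetime(2022, 5, 1, 18, 33, tzinfo=timezone.utc)
print(nxdomain_expiry(cached, soa_ttl=3600, soa_minimum=3600).strftime("%H:%M UTC"))
# → 19:33 UTC
```

Switching resolvers helps because a different resolver is unlikely to hold the same cached negative entry; flushing the cache removes it before the TTL runs out.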

The underlying cause appears to be a brief influx of 2-3x the usual load, which led to a cluster-wide failure. The performance optimizations made last week do not appear to have prevented the initial failure, although they did help with recovery, as the oft-hated 'bucket system' was no longer needed.

We suspect this load influx was caused by server owners running 'scheduled restarts' of their servers at the top of the hour (in this case 18:00 UTC, a particularly bad time). This creates a thundering-herd scenario: both clients and servers reauthenticate with our services at the same moment, fail, retry (albeit often manually), and eventually overload some as-yet-unidentified backend system.
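A common general-purpose mitigation for thundering herds is exponential backoff with "full jitter": each retry waits a random delay between zero and an exponentially growing ceiling, so clients that failed together stop retrying in lockstep. The sketch below is an illustration of that technique only, not a description of how these services or their clients actually retry; the function names, the 1 s base, the 60 s cap and the 10,000-client herd are all hypothetical. It also only helps automatic retries, not the manual ones mentioned above.

```python
import random
from collections import Counter
from typing import Optional


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  jitter: bool = True, rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    The ceiling doubles each attempt, up to `cap`. With full jitter the
    actual delay is drawn uniformly from [0, ceiling], so clients that
    failed at the same moment spread their retries out.
    """
    ceiling = min(cap, base * 2 ** attempt)
    if not jitter:
        return ceiling
    rng = rng or random.Random()
    return rng.uniform(0, ceiling)


def peak_retries(clients: int, attempt: int, jitter: bool,
                 bucket: float = 0.1, seed: int = 0) -> int:
    """Simulate `clients` that all failed at t=0 and return the largest number
    of retries that land in any single `bucket`-second window."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(clients):
        delay = backoff_delay(attempt, jitter=jitter, rng=rng)
        counts[int(delay // bucket)] += 1
    return max(counts.values())


if __name__ == "__main__":
    # Hypothetical herd: 10,000 clients whose servers all restarted at once.
    print("without jitter:", peak_retries(10_000, attempt=3, jitter=False))
    print("with jitter:   ", peak_retries(10_000, attempt=3, jitter=True))
```

Without jitter, all 10,000 fourth retries land in the same 100 ms window; with full jitter they spread across the 8 s ceiling, so the peak per window is a small fraction of that. The same idea applies to restart schedules: offsetting each server's restart by a random number of minutes avoids everyone reconnecting at exactly the top of the hour.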
Posted May 01, 2022 - 18:57 UTC
We're currently investigating issues with some of our services. Please remain patient.
Posted May 01, 2022 - 18:09 UTC
This incident affected: Game Services (CnL, Keymaster).