Lobby servers, Infrastructure
January 5, 2026

Update on the recent server instability

Recent server outages have been caused by several software issues rather than a single failure, and these issues are under investigation. We are improving diagnostics and fixing issues as we identify them, but we cannot yet guarantee full stability.

Author: PtaQ
Last updated: January 5, 2026

We want to address the recent increase in server outages and share what we know so far.

First of all, there is no single definitive root cause, and unfortunately we can’t guarantee the stability of the system yet. What we do know is that it’s a software issue.

Over the past one to two months, we have identified two, possibly three, distinct issues, all of which are under investigation. One of them involves the server’s CPU hitting its limit at seemingly random times, although this happens more often during peak activity. More recently, we have also observed a new type of issue with a different technical signature.

We are investigating all of these. So far, much of the work has focused on adding proper telemetry and diagnostic tooling to our infrastructure. Until recently, the server's stability was satisfactory, which meant we had little immediate need for deep diagnostics and therefore limited visibility when these problems began to appear.

At this point, we have not identified a single root cause for the crashes. The instability seems to be triggered by high player counts, which set off a series of different failures that spread across the system, with none of them standing out as the underlying cause. With better observability, we are finding new bugs and fixing them one by one.

Fingers crossed, and thank you for your patience. 🤞

Some Frequently Asked Questions:

1) If I am in a game and the lobby server crashes, will my game be interrupted?

No. Game servers and lobby servers are separate. Once a game has started, it will not be interrupted by a lobby server outage.

2) Is Beyond All Reason being DDoS’d?

Based on our analysis so far, this seems unlikely. That said, we cannot fully rule it out yet.

3) Are the servers running in PtaQ’s bedroom?

No. We use standard, professional hosting services.

4) How can I help?

We would benefit most directly from people with hands-on experience running Erlang/Elixir applications in production: debugging them and knowing what to integrate for better visibility. If you have that or other relevant skills, head to https://beyond-all-reason.github.io/infrastructure/contributing/.

5) Does that mean BAR can’t grow anymore?

No. All of the above refers to our current, legacy infrastructure. In the longer term, the biggest improvement will come from shipping the new client together with Tachyon, which simplifies the overall system architecture. This is not a short-term fix, but it offers the largest long-term payoff in stability and maintainability. Deciding how much effort to spend on parts of the codebase we plan to deprecate, versus investing in the new architecture, is a difficult balance.

6) You didn’t actually tell us what the fixed issues were!

Okay, okay, I guess you want some technical details on what we have fixed so far:

  • High lock contention in a metrics library we integrated, which caused many operations to time out and the system to fail over under high load
  • We allowed a few too many commands for users who were not logged in, which triggered unexpected code paths, including unnecessarily fetching and parsing hundreds of MiB of JSON from the database
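To illustrate the second fix in general terms, here is a minimal Python sketch of gating commands behind login with an explicit allowlist. It is not the lobby server’s actual code: the `Session` class, the `dispatch` function and the command names (`PING`, `LOGIN`, `LIST_BATTLES`) are hypothetical, chosen only to show the pattern.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical command names for illustration; not the real lobby protocol.
UNAUTHENTICATED_COMMANDS = frozenset({"PING", "LOGIN"})


@dataclass
class Session:
    """Minimal per-connection state (hypothetical)."""
    logged_in: bool = False


class CommandRejected(Exception):
    """Raised when a command is not allowed in the current session state."""


Handler = Callable[[Session], str]


def dispatch(session: Session, command: str, handlers: Dict[str, Handler]) -> str:
    # Check the allowlist before looking up the handler, so a guest can never
    # reach code paths written with logged-in users in mind.
    if not session.logged_in and command not in UNAUTHENTICATED_COMMANDS:
        raise CommandRejected(f"{command} requires login")
    handler = handlers.get(command)
    if handler is None:
        raise CommandRejected(f"unknown command: {command}")
    return handler(session)


def handle_login(session: Session) -> str:
    # A real handler would verify credentials; this sketch just flips the flag.
    session.logged_in = True
    return "LOGGED_IN"


HANDLERS: Dict[str, Handler] = {
    "PING": lambda session: "PONG",
    "LOGIN": handle_login,
    # Stands in for an expensive operation meant only for logged-in users.
    "LIST_BATTLES": lambda session: "battle list",
}


if __name__ == "__main__":
    guest = Session()
    print(dispatch(guest, "PING", HANDLERS))  # PONG
    try:
        dispatch(guest, "LIST_BATTLES", HANDLERS)
    except CommandRejected as err:
        print("rejected:", err)  # rejected: LIST_BATTLES requires login
    dispatch(guest, "LOGIN", HANDLERS)
    print(dispatch(guest, "LIST_BATTLES", HANDLERS))  # battle list
```

Checking the allowlist before the handler lookup makes "login required" the default for every command: a command becomes reachable by guests only when someone adds it to the set explicitly.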
