Groundbreaking engine improvements are live! Multithreaded pathing and new renderer deployed

Major performance improvements delivered through the amazing work of Ivand and Tarnished Knight

November 6, 2021

Now we’re talking! 

Hi folks! As the alpha stage development progresses, we mostly cover little improvements which pile up and add up to a larger picture of progress. But this time it is different, for I come to you with a true game-changer.

The bigger - the better

To invoke the godfather of our project, Cavedog’s Total Annihilation: one of its most defining features, and what made the game really stand out in the RTS world of the ’90s, was:

The utterly unparalleled scale of gameplay for its time

Even in the original unmodded version it allowed you to run 200 units and even issue commands to all of them at once. At the time this was unthinkable and no other game engine could match this feat. Later community mods and tricks made it possible to have stable games with up to 1500 units.

As you might know, we are set on providing a similar but even greater experience in BAR, which is why we are always thinking about ways to optimize performance and allow for bigger and bigger battles, because everyone loves big battles :) On that front we might just have hit a bullseye:

Unlocking an order of magnitude higher performance

Unleash the FPS!!!

The unquestionable heroes of this post are Ivand and Tarnished Knight. They have spearheaded the BAR engine development, delivering two groundbreaking improvements this week: first, the new unit and feature drawing method utilizing GL4 capabilities, and second, a novel multithreaded pathing system.

Collectively, these 2 new improvements raise performance by an unbelievable factor of 3-10!

Considering we are talking about the 15-year-old Spring engine, it’s hard to believe we still have an opportunity to witness such big improvements, and yet this is the reality. Check the video for the showcase: 

So how was it achieved?

Learn more about the tech behind the improvements below

Divide and conquer! 

The first of the milestones can be fully credited to the amazing work of Tarnished Knight. He took upon himself the daunting task of figuring out how to utilize multiple processor threads in calculating unit paths.

Until now, this was thought to be nearly impossible to do in the Spring engine without causing a desync, as the engine was originally written entirely with a single-threaded approach in mind. It therefore required a major refactoring of the code, a project few have ever attempted.

The advantages of the new multithreaded pathing system can be summarized as:

  • Full and Partial pathing requests are now multi-threaded
  • Improved performance of pathing recalculations due to terrain deformation (e.g. chain explosions, d-gun, and restoring terrain)
  • Fixed issues where units could suddenly re-path away from the original goal over longer paths
  • No noticeable impact on the way the game plays; aside from the performance, you shouldn’t even realise changes were made
  • On average, it performs 4-5x better than the old pathfinding system

Multi-threaded HA* pathing system - how is it done?

(warning: technical language! this section is meant for interested, tech-savvy readers) 

  1. The work is split up by pathing requests - so each thread handles a complete request
  2. All the requests given within a frame are queued, and dispatched all at once. This also helps a little to limit cache misses
  3. The cache is used slightly less, because it cannot be added to during multi-threaded dispatching (that would cause a desync). On the plus side, we get accurate results for more units, because the cache doesn't give a 100% correct answer
  4. Heatmap cannot be updated during multi-thread dispatch because that can cause a desync; however, it is immediately updated afterwards. This causes a subtle change in how units path, but in practice it doesn't have any noticeable effect and it still creates the awesome effect where streams of opposing units try to avoid each other
  5. Pathing updates due to terrain deformation are multi-threaded. However, there is now an update-rate limiter in place to prevent recalculating tiny changes over the same area over multiple frames
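
As a rough illustration of points 1, 2 and 4, here is a minimal Python sketch of "queue everything, search in parallel, write shared state afterwards". It is not the engine's code (Spring is written in C++ and uses HA*). The grid, the breadth-first `find_path`, the `dispatch_frame` helper and the dictionary heatmap are all hypothetical stand-ins chosen to keep the example short. Python threads also won't show the real speedup; the point is only the ordering that keeps results deterministic.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def find_path(grid, start, goal):
    """Breadth-first search on a grid of 0 (free) / 1 (blocked) cells.

    Stand-in for the real search. It only reads `grid`, so many
    searches can run at the same time without affecting each other.
    """
    rows, cols = len(grid), len(grid[0])
    came_from = {start: None}
    frontier = deque([start])
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and nxt not in came_from):
                came_from[nxt] = cell
                frontier.append(nxt)
    return None  # goal unreachable


def dispatch_frame(grid, requests, heatmap, workers=4):
    """Run all requests queued this frame; each worker handles whole requests."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in request order, whichever thread finishes first.
        paths = list(pool.map(lambda req: find_path(grid, *req), requests))
    # Shared state (the heatmap) is written only after every search is done,
    # on one thread and in request order, so every run gives the same result.
    for path in paths:
        for cell in path or ():
            heatmap[cell] = heatmap.get(cell, 0) + 1
    return paths


grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
requests = [((0, 0), (2, 0)), ((2, 2), (0, 0))]
heatmap = {}
paths = dispatch_frame(grid, requests, heatmap)
```

Because the searches never write to shared data while running, the number of worker threads does not change the outcome, which is the property a lockstep-synchronized game needs.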

Transition to GL4 and the new unit drawer

When Ivand joined our team he was mainly interested in improving the graphical side of the game. He is the person behind the introduction of physically based rendering, map shader improvements, and countless other tweaks that have a giant influence on the awesome visuals we can enjoy in BAR.

After a while, he realized that in order to really bring BAR to the next level in terms of graphics, he had to get his hands dirty with engine code. And he did, despite all the difficulties of understanding its codebase.

While working with the engine, it became apparent that there was still potential for a major performance improvement, achievable by introducing:

Modern OpenGL4 rendering

It opens the gates to a plethora of possible improvements, from unit rendering to UI performance. One of the things it makes possible is the new drawing method described below. From the initial benchmarks we did, this rendering method is up to 10x more performant.

GL4 Unit/Feature accelerated rendering

Unit rendering in Spring has always been rather slow. The main reason was its drawing method, which is outdated yet universally supported.

The legacy renderer

With certain simplifications, the main rendering loop looked like this. For each unit/feature that fits the screen:

  • Load model data for a unit
  • Push world transformation (rotation, position, scale) data of a unit
  • Go piece by piece of the unit:
      • Push piece transformation data
      • Issue a piece draw command (usually something like 50-100 triangles or so)
      • Pop piece transformation data
  • Pop world transformation data
  • Unload model data for a unit, and so on for the next one…
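
To make the cost of this loop concrete, here is a small Python sketch that replays it against a plain list standing in for the graphics API. The `Unit` class, the command names and the three-piece "tank" are made up for illustration; the real engine issues OpenGL calls from C++.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    model: str
    pieces: list  # piece names; each piece gets its own draw command


def draw_legacy(units, log):
    """Replay the legacy loop, appending every command sent to the GPU to `log`."""
    for unit in units:
        log.append(("load_model", unit.model))
        log.append(("push_world_transform", unit.model))
        for piece in unit.pieces:
            log.append(("push_piece_transform", piece))
            log.append(("draw", piece))  # the only command that draws anything
            log.append(("pop_piece_transform", piece))
        log.append(("pop_world_transform", unit.model))
        log.append(("unload_model", unit.model))
    return log


units = [Unit("tank", ["base", "turret", "barrel"]) for _ in range(1000)]
log = draw_legacy(units, [])
draws = sum(1 for command, _ in log if command == "draw")
print(draws, len(log))  # prints: 3000 13000
```

Even with only three pieces per unit, a thousand units produce 3,000 tiny draw commands wrapped in 10,000 bookkeeping commands, all issued one after another.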

As can be seen, the drawing part here is only a single command; the rest is preparation. The drawing batch size is also very small, as the rendering is done piece by piece, unit by unit. This is very inefficient, and it forces the GPU, a highly parallel device, to stall while waiting for more data to be fed in a serial manner.

Fortunately, the industry realized this long ago, and newer OpenGL capabilities, which we refer to as GL4, allow for much better batching, to the point where all units sharing the same textures can be drawn with a single call! Compare that with the thousands of tiny calls before.

Time to get our hands dirty with the engine code? But hold on… what about those with old GPUs or incomplete OpenGL drivers that cannot run GL4? We’d like to support them too. So the software engineering task suddenly became much more complex: technical details aside, there’s a need to support 4 different rendering methods! Internally, the first three methods are classified as legacy renderers and draw pretty much the way they did previously. The last one, the GL4 renderer, is considered modern and has a very(!) different architecture.

So how does a modern GL4 renderer operate? 

First of all, the intermixed tasks of preparation and rendering seen in the legacy loop above are separated into two distinct stages: data preparation and rendering.

The data preparation:

  • Starts during the game load, when all relevant models of units, features and projectiles are uploaded into a big GPU buffer (quickly accessible memory, ready to render)
  • This way there is no need to switch model data buffers anymore!
  • Before the draw calls are issued, the transformation data of all units, features and pieces is calculated and placed into another big buffer on the GPU, so all the data is available for the renderer and for game-side widgets to use
  • Units and map features are processed to determine their draw flags: whether a unit/feature is drawn as a model, whether it needs to be transparent or opaque, whether it is so far away that it should be replaced with an icon, and whether it lies outside the current screen space and should not be drawn at all.
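
The draw-flag step in the last bullet can be pictured with a short Python sketch. Everything in it is an assumption made for illustration: the `DrawFlag` names, the `Drawable` fields, the rectangular `view` test (a simplification of real camera culling) and the `icon_distance` threshold are not taken from the engine.

```python
from dataclasses import dataclass
from enum import Flag, auto


class DrawFlag(Flag):
    NONE = 0             # outside the view: not drawn at all
    OPAQUE = auto()      # drawn as a solid model
    TRANSPARENT = auto()  # drawn as a see-through model
    ICON = auto()        # too far away: replaced with an icon


@dataclass
class Drawable:
    x: float
    y: float
    distance: float          # distance from the camera
    translucent: bool = False


def draw_flags(obj, view, icon_distance):
    """Decide how one unit/feature is drawn this frame.

    `view` is (min_x, min_y, max_x, max_y) of the visible area.
    """
    min_x, min_y, max_x, max_y = view
    if not (min_x <= obj.x <= max_x and min_y <= obj.y <= max_y):
        return DrawFlag.NONE
    if obj.distance > icon_distance:
        return DrawFlag.ICON
    return DrawFlag.TRANSPARENT if obj.translucent else DrawFlag.OPAQUE


view = (0, 0, 10, 10)
flags = [
    draw_flags(Drawable(5, 5, 10), view, icon_distance=100),
    draw_flags(Drawable(50, 5, 10), view, icon_distance=100),
    draw_flags(Drawable(5, 5, 500), view, icon_distance=100),
    draw_flags(Drawable(5, 5, 10, translucent=True), view, icon_distance=100),
]
```

Deciding all of this once, up front, means the rendering stage only has to read the flags instead of re-testing every object while it draws.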

The rendering loop. Here some time is also spent to prepare a batch:

  • The units/features eligible for rendering are grouped by their model type, so a GPU technique called instancing can be leveraged.
  • Rendering data of distinct models is placed into a rendering commands array.
  • Only one render call is issued so all the prepared data is submitted in one go.
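
The batching steps above can be sketched in Python as follows. This is a CPU-side illustration only, with hypothetical names (`build_draw_commands`, `model_table`). The command fields loosely mirror the indirect draw records that OpenGL 4 calls such as `glMultiDrawElementsIndirect` consume; whether the engine uses exactly that call is an assumption here, not something stated in this post.

```python
from collections import defaultdict


def build_draw_commands(visible, model_table):
    """Group visible objects by model and build one command per distinct model.

    visible:     list of (model_name, transform_index) pairs flagged for drawing
    model_table: model_name -> (first_index, index_count) in the shared model buffer
    """
    by_model = defaultdict(list)
    for model, transform_index in visible:
        by_model[model].append(transform_index)

    commands = []       # the "rendering commands array"
    instance_data = []  # transform indices, stored contiguously per model
    for model in sorted(by_model):
        first_index, index_count = model_table[model]
        instances = by_model[model]
        commands.append({
            "index_count": index_count,          # how much geometry one copy has
            "instance_count": len(instances),    # how many copies to draw
            "first_index": first_index,          # where the model sits in the buffer
            "base_instance": len(instance_data),  # where its instances start
        })
        instance_data.extend(instances)
    return commands, instance_data


model_table = {"tank": (0, 300), "bot": (300, 150)}
visible = [("tank", 0), ("bot", 1), ("tank", 2), ("tank", 3)]
commands, instance_data = build_draw_commands(visible, model_table)
```

Four visible objects collapse into two commands here (one per model), and the whole array can then be handed to the GPU in a single submission instead of one call per piece.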

The GPU finally has a lot of work to do and its parallel cores don’t stall. BAR's art assets are made so that all units of a particular faction (Armada or Cortex) share the same textures. So only two draw calls like the above are needed to draw the whole bunch of units on the battlefield (one for Armada, another for Cortex), no matter how many units or teams are in there!

The results are rather outstanding: the minimum speedup we usually see is around a 2x FPS increase in light/normal gameplay, rising to 3-8x in heavy end-game gameplay. In artificial tests with many units we even see up to a 10x FPS speedup.

The work is ongoing to enable the same technology for units/features assigned with custom materials so a lot of fun stuff is ahead.

Honorable mentions

Along with Ivand and Tarnished Knight, let's not forget to express our great gratitude to Beherith, who assisted the engine development process at every stage, prepared all the necessary game-side changes to make use of the new engine features, thoroughly battle-tested the improvements, and provided benchmarks. Many UI widgets are now using the new rendering pipeline Ivand provided, and we can thank Beherith for this.

Also, big kudos to all the testers and other affiliated people from all around the Spring ecosystem who helped, discussed, and shared their knowledge throughout the realization of this project.

Growing stronger every day

BAR has reached several big milestones in the last week. Since the announcement of the open Alpha release in July 2020, our community has grown tenfold, reaching a whopping 5,000 Discord members. We also have a devoted and active player base of 500 unique players daily in the multiplayer lobby.

We feel honored to be followed by such an amazing audience

The positive feedback we receive is outstanding, and we would like to thank all of these people around the project, playtesting, giving feedback and making us really enjoy our work! Without you all of this wouldn't be possible.

Another great thing that stems from BAR getting this recent traction is the number of new developers and contributors interested in pushing the project forward. We are happy to announce and honor the newest collaborators.

Say hi to:

Badosu | all-round assist developer, looking to improve the UI and aid game and engine development, creator of the new contextual grid build menu 

kek | a proficient developer and Linux specialist, most notably setting up and maintaining a Flatpak installer for Linux users

Tristan | 3D artist, helping to improve existing models and planning to produce cinematic rendered videos for BAR Trailers/cutscenes 

Teddy | Veteran player and a proficient developer, helping with new useful widgets development, consulting gameplay/balance, on a mission to produce the first standalone map editor for BAR 

When we were starting BAR development with the original 2-3 devs, no one would have even imagined the scale to which all of this has escalated.

With no engine or lobby developers we were struggling to deliver anything looking like a polished game. And just look at where we are now!

Nobody would have thought about the great things ahead of us. Looking back at how this project has grown is heartwarming. It is hard to believe how many amazing people are willing to sacrifice their free time to improve all aspects of BAR, from game and engine optimizations, through content creation up to infrastructure development and even compatibility.

It really feels like we are doing something exceptional and unseen in the world of open-source/indie game development. Big kudos to everyone involved in making BAR great!

Join us!

Did you find this post interesting? If you would like to join us as a player or contributor, or would just like to follow the development, be sure to join our Discord server.

Play BAR alpha now! See downloads
