Raising your game with Burst 1.7

March 14, 2022 in Technology | 9 min. read
The latest version of the Burst package comes with some great improvements to both iteration time and the Burst Inspector. In this post, we’ll look at what’s changed and how our High Performance C# (HPC#) compiler technology can now help you improve performance on all platforms with even greater ease.

While our DOTS technology stack leverages Burst to provide highly optimized code, Burst is a standalone package, available in the Package Manager for Unity 2019.4 or newer. Thousands of your projects on all the major desktop, console, and mobile platforms are already taking advantage of Burst.

Iterating on iteration time

In previous Burst releases, we made significant strides in improving the day-to-day experience of working with Burst. In Burst 1.7, we have continued that trend by focusing on improving iteration time. What do we mean by iteration time? We mean the “inner loop” of development – you make a change to a C# script, switch back to the Editor, wait for script compilation to finish, wait for Burst compilation to finish, and then enter Play mode to test your change.

In Burst 1.7, we have dramatically reduced the amount of time you’ll find yourself waiting for Burst, for the common scenario of making a few changes to your game code and testing it in Play mode. Burst compilation now occurs earlier in the pipeline (immediately after the script compilation pipeline has finished compiling .NET assemblies), so that in many cases it is finished by the time the resulting code needs to be run. Instead of compiling each Burst entry point separately, as happened in previous versions of Burst, Burst entry points (e.g., a job or function pointer) are now batched together to improve compiler throughput and reduce the number of libraries that the Editor needs to load.
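
To make "entry point" concrete, here is a minimal sketch of a Burst-compiled job. The names `ScaleJob` and `ScaleExample` are hypothetical and chosen only for illustration; each job struct marked with `[BurstCompile]` is one of the entry points that Burst 1.7 now batches together.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;

// Hypothetical job: a [BurstCompile] job struct is a Burst entry point.
[BurstCompile]
public struct ScaleJob : IJob
{
    public NativeArray<float> Values;
    public float Factor;

    public void Execute()
    {
        // Multiply every element in place.
        for (int i = 0; i < Values.Length; i++)
        {
            Values[i] *= Factor;
        }
    }
}

// Hypothetical managed helper that schedules the job and waits for it.
public static class ScaleExample
{
    public static float[] Run(float[] input, float factor)
    {
        // Copy the managed array into native memory for the job.
        var values = new NativeArray<float>(input, Allocator.TempJob);
        try
        {
            new ScaleJob { Values = values, Factor = factor }.Schedule().Complete();
            return values.ToArray();
        }
        finally
        {
            values.Dispose();
        }
    }
}
```

Editing `Execute` and returning to the Editor is the kind of change the "inner loop" described above is about.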

Burst 1.7 also includes a major improvement to Direct Call performance. Direct Call is a feature that we added in Burst 1.5 that permits managed C# code to directly call a Burst-compiled method, without going through BurstCompiler.CompileFunctionPointer. During a domain reload, there is some initialization work that needs to be done to wire up Direct Call methods, and in Burst 1.7, we have made this initialization up to 33 times faster.
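
As a rough sketch of the difference, the snippet below calls the same Burst-compiled method in both ways. `MathKernels`, `AddDelegate`, and `DirectCallExample` are hypothetical names; the sketch assumes the usual Direct Call requirements, namely a static method with `[BurstCompile]` inside a type that also carries `[BurstCompile]`.

```csharp
using Unity.Burst;

[BurstCompile]
public static class MathKernels
{
    // Delegate type needed only for the function-pointer route.
    public delegate int AddDelegate(int a, int b);

    [BurstCompile]
    public static int Add(int a, int b) => a + b;
}

public static class DirectCallExample
{
    // Explicit route: compile a function pointer, then invoke it.
    public static int ViaFunctionPointer(int a, int b)
    {
        FunctionPointer<MathKernels.AddDelegate> fp =
            BurstCompiler.CompileFunctionPointer<MathKernels.AddDelegate>(MathKernels.Add);
        return fp.Invoke(a, b);
    }

    // Direct Call route: an ordinary-looking call from managed code.
    public static int ViaDirectCall(int a, int b) => MathKernels.Add(a, b);
}
```

Both routes should return the same result. The second needs no delegate plumbing at the call site, and it is the wiring for this kind of call that is set up during a domain reload.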

As a last note on the topic of iteration time, we looked at the cost of SharedStatic initialization. SharedStatic is a mechanism that allows the sharing of data between managed C# and HPC#. In Burst 1.7, we have made the SharedStatic initialization up to 13 times faster. 
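
For readers who have not used it, a minimal SharedStatic sketch might look like the following. `GameCounters`, `FrameCountKey`, and `CounterKernels` are hypothetical names; the pattern of pairing a context type with a key type follows the usual `SharedStatic<T>.GetOrCreate` usage.

```csharp
using Unity.Burst;

public abstract class GameCounters
{
    // Hypothetical key type; together with GameCounters it identifies the shared slot.
    private class FrameCountKey { }

    // Created on the managed side, readable and writable from both sides.
    public static readonly SharedStatic<int> FrameCount =
        SharedStatic<int>.GetOrCreate<GameCounters, FrameCountKey>();
}

[BurstCompile]
public static class CounterKernels
{
    // Burst-compiled code mutates the shared value through the ref-returning Data property.
    [BurstCompile]
    public static void Increment()
    {
        GameCounters.FrameCount.Data++;
    }
}
```

Managed code can then set `GameCounters.FrameCount.Data = 0`, call `CounterKernels.Increment()`, and read the updated value back from `GameCounters.FrameCount.Data`.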

The following graphs show the performance improvements in Burst 1.7, compared to Burst 1.6. The measurements were taken in a large customer project. The first graph below shows timings taken with a stopwatch (an actual stopwatch, not System.Diagnostics.Stopwatch) observing the Editor, so they should reflect the sort of improvements you can expect to see in day-to-day usage.

Burst 1.7 performance improvements

The second graph below just focuses on Burst, so it excludes anything else that may be occurring in the Editor. For this particular project and modified file, Burst 1.7 is faster than Burst 1.6 in all three timings:

  • Cold cache - Burst has not yet cached any compilation results for the code in your project.
  • Warm cache - Burst has already compiled the code in your project and needs to load the cached compilation results from disk.
  • Change one file - After one file is changed, Burst checks to see which entry points need to be recompiled and compiles those. Note that the amount of improvement in Burst 1.7 is, in general, dependent on which file is changed. For example, if you change a method that is used by all Burst entry points, then the difference between Burst 1.6 and Burst 1.7 will be smaller. In this example, the entry point method itself was changed.
Burst 1.7 performance improvements 2

Burst Inspector

Burst Inspector (accessible via Jobs > Burst > Open Inspector…) is an incredibly useful tool for optimization work. With this tool, you can view the assembly code that will be executed on your target CPU(s). In Burst 1.7, we have added several much-requested features. A screenshot says a thousand words, so without further ado:

Burst Inspector branch markers
Screenshot of Burst Inspector showing branch markers.

As you can see, we have added branch markers to make it easier to visualize code execution paths. Note that branch markers can be switched off with the “Show Branch Flow” checkbox so they don’t get in the way when you don’t need them. A particularly nice aspect of this feature is that you can click on a branch flow arrow, and you’ll jump to the other end of the arrow, like this:

Example of clicking on branch marker to jump to branch destination

Less important blocks of disassembly (e.g., directives or constant data) are now automatically collapsed, but these can still be toggled when you want to view them.

Also new in Burst 1.7 is the ability to select just a section of disassembly and copy it.

Example of selecting and copying a specific section of disassembly

Miscellaneous improvements

Here’s a list of smaller but no less important improvements in Burst 1.7.

  • Arm Neon vst1* APIs are now fully supported. We added these APIs in Burst 1.6, but guarded them behind an experimental #define. In Burst 1.7, they are no longer guarded behind that #define and are fully supported.
  • System.Span<T> and System.ReadOnlySpan<T> are now supported within Burst-compiled code. These types are not allowed as entry point arguments.
  • Burst now uses LLVM Version 12.0.0 by default, bringing the latest optimization improvements from the LLVM project.
  • We changed the LLVM optimization pipeline to run the loop unroller exclusively after the loop vectorizer. This improves codegen in many cases.
  • We made fmod and floating-point modulus use a faster algorithm to improve performance.
  • Burst now generates a link.xml automatically to prevent IL stripping from causing missing symbols at runtime as a result of static constructor usage.
  • We improved compiler performance when doing large struct copies by detecting more cases where a load/store can be safely converted to a move-memory operation.
  • We changed how timings are displayed when the “Show Timings” option is enabled in the Burst menu, cleaning up the output and presenting the information more clearly.
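
To illustrate the Span<T> item in the list above, here is a small sketch that uses spans inside Burst-compiled code while keeping them out of the entry point signature. `SpanKernels` and its methods are hypothetical. The sketch assumes a Unity version whose API profile exposes System.Span<T>, and that the span members used here (indexer, `Length`, `Slice`) are among those Burst supports.

```csharp
using System;
using Unity.Burst;

[BurstCompile]
public static class SpanKernels
{
    private const int MaxCount = 64;

    // Entry point: takes only a primitive, because spans are not allowed as entry point arguments.
    [BurstCompile]
    public static int SumOfSquares(int count)
    {
        // Clamp so the stack buffer is never overrun.
        if (count < 0) count = 0;
        if (count > MaxCount) count = MaxCount;

        // Stack-allocated scratch buffer wrapped in a Span<int>.
        Span<int> squares = stackalloc int[MaxCount];
        for (int i = 0; i < count; i++)
        {
            squares[i] = i * i;
        }

        // Spans may be passed between non-entry-point methods.
        return Sum(squares.Slice(0, count));
    }

    // Internal helper, not an entry point, so a span parameter is fine here.
    private static int Sum(ReadOnlySpan<int> values)
    {
        int total = 0;
        for (int i = 0; i < values.Length; i++)
        {
            total += values[i];
        }
        return total;
    }
}
```

Calling `SpanKernels.SumOfSquares(4)` sums 0, 1, 4, and 9.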

What’s next for Burst

Note that Burst 1.7 is the last version to support Unity 2019.4. The next Burst version will have a minimum requirement of Unity 2020.3. If you have any thoughts, questions, or would just like to let us know what you are doing with Burst, then please feel free to leave us a message on the Burst forum.
