Every game creator knows that smooth performance is essential to creating immersive gaming experiences – and to achieve that, you need to profile your game. Not only do you need to know which tools to use and how, but also when to use them.
Our hot-off-the-press, 70+ page guide to advanced profiling was created together with both internal and external experts. It compiles advice and knowledge on how to profile an application in Unity and identify performance bottlenecks, among other best practices.
Let’s look at some helpful tips from the e-book.
When to profile
Profiling is like detective work, unraveling the mysteries of why performance in your application is lagging, or why code is allocating excess memory.
Profiling tools ultimately help you understand what’s going on “under the hood” of your Unity project. But don’t wait for significant performance problems to start showing before digging into your detective toolbox.
The best gains from profiling are made when you plan early on in your project’s development lifecycle, rather than just before you are about to ship your game. It’s an ongoing, proactive, and iterative process. By profiling early and often, you and your team can understand and establish a “performance signature” for the project. If performance takes a nosedive, for instance, you’ll be able to easily spot when things go wrong, and quickly remedy the issue.
You can also make before-and-after performance comparisons in smaller chunks by using a simple three-point procedure: First, establish a baseline by profiling before you make major changes. Next, profile during development to track performance and budgeting. Finally, profile after the changes have been implemented to verify whether they had the desired effect.
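To make the three-point procedure concrete, here is a minimal Python sketch that compares two captures. It assumes you have exported per-frame CPU times in milliseconds from a baseline capture and from a capture taken after your changes; the function names and sample values are hypothetical and not part of any Unity tool.

```python
from statistics import median

def summarize(frame_times_ms):
    """Return (median, worst) frame time in ms for one capture."""
    return median(frame_times_ms), max(frame_times_ms)

def compare(baseline_ms, after_ms):
    """Change in median and worst frame time between two captures (negative = faster)."""
    base_median, base_worst = summarize(baseline_ms)
    new_median, new_worst = summarize(after_ms)
    return {
        "median_delta_ms": new_median - base_median,
        "worst_delta_ms": new_worst - base_worst,
    }

# Hypothetical per-frame CPU times (ms) from two captures of the same scene.
baseline = [16.1, 16.4, 15.9, 16.8, 24.0, 16.2, 16.0, 16.5]
after = [14.2, 14.5, 14.1, 14.9, 15.3, 14.4, 14.0, 14.6]
print(compare(baseline, after))
```

Comparing the median and the worst frame, rather than a single average, keeps one long frame from hiding behind many fast ones.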
You should aim to profile a development build of your game, rather than profiling it from within the Unity Editor. There are two reasons for this:
The tools at your disposal
The most accurate profiling results come from running and profiling builds on target devices and using platform-specific tooling to dig into the hardware characteristics of each targeted platform.
While Unity ships with a range of free and powerful profiling tools for analyzing and optimizing your code, both in-Editor and on hardware, there are also several great native profiling tools designed for each platform, such as those available from Arm, Apple, Sony, and Microsoft. Using a combination provides a more holistic view of application performance across all target devices.
For a full overview of the tools available, check out the profiling tools page here.
Unity’s profiling tools are available in the Editor and Package Manager. Each tool specializes in profiling various parts of the process (a holistic “sum of all parts” workflow). Familiarize yourself with the following profilers so they become a part of your day-to-day toolbox:
How to use the tools
Steve McGreal, a senior Unity engineer and the co-author of our advanced profiling e-book, put together the following high-level overview. Please feel free to use it as a reference sheet.
While detailed explanations of how to use the tools can be found in the e-book, this flowchart illustrates three main observations to consider for your workflow:
Download the printable PDF version of this chart here. For more, see the linked resources on how to use each of the profiling tools at the end of this post.
Are you within frame budget?
A common way that gamers measure performance is through the frame rate, or frames per second. However, it’s recommended that you use frame time in milliseconds instead.
For example, you might have a game that renders 59 frames in 0.75 seconds at runtime, with the next frame taking 0.25 seconds to render. The average delivered frame rate of 60 fps sounds good, but in reality, players will notice a stutter effect since the last frame takes a quarter of a second to render.
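The arithmetic behind this example can be checked with a short Python sketch (the helper name is hypothetical). It computes the average frame rate alongside the worst single frame time, which is the number players actually feel:

```python
def frame_stats(frame_times_s):
    """Return (average fps, worst frame time in ms) for frame times given in seconds."""
    avg_fps = len(frame_times_s) / sum(frame_times_s)
    worst_ms = max(frame_times_s) * 1000.0
    return avg_fps, worst_ms

# The example from the text: 59 frames in 0.75 s, then one 0.25 s frame.
frames = [0.75 / 59] * 59 + [0.25]
avg_fps, worst_ms = frame_stats(frames)
print(f"average: {avg_fps:.1f} fps, worst frame: {worst_ms:.0f} ms")
```

The average comes out at 60 fps, while the worst frame takes 250 ms, far longer than the roughly 12.7 ms of each of the other 59 frames.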
Strive for a specific time budget per frame when profiling and optimizing your game, as this is crucial for creating a smooth and consistent player experience. Each frame has a time budget based on your target fps. An application targeting 30 fps should always take less than 33.33 ms per frame (1000 ms / 30 fps). Similarly, a target of 60 fps leaves 16.67 ms per frame.
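As a sketch, the budget calculation is just 1000 ms divided by the target frame rate. The helper names below are illustrative, not Unity APIs:

```python
def frame_budget_ms(target_fps):
    """Time available per frame, in milliseconds, for a given target frame rate."""
    return 1000.0 / target_fps

def within_budget(frame_time_ms, target_fps):
    """True if a frame finished in less time than the budget for target_fps."""
    return frame_time_ms < frame_budget_ms(target_fps)

for fps in (30, 60):
    print(f"{fps} fps -> {frame_budget_ms(fps):.2f} ms per frame")
# 30 fps -> 33.33 ms per frame
# 60 fps -> 16.67 ms per frame
```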
Most modern console and PC games aim to achieve a frame rate of 60 fps or more. In VR games, a consistently high frame rate is even more important, as drops in frame rate can cause nausea or discomfort for players. Mobile games might also require restrictive frame budgets to avoid overheating the devices they run on. For instance, a mobile game might target 30 fps with a frame budget of only 21–22 ms so that the CPU and GPU can cool down between frames.
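One way to express a thermally constrained budget is to reserve a fraction of each frame for idle time. In this sketch, active_fraction is an assumed tuning value rather than a Unity setting; 0.65 happens to land inside the 21–22 ms range mentioned above for a 30 fps target:

```python
def thermal_budget_ms(target_fps, active_fraction=0.65):
    """Frame budget that leaves part of each frame idle so the CPU and GPU can cool.

    active_fraction is a hypothetical tuning value: 0.65 means work should occupy
    about 65% of the frame, leaving roughly 35% idle.
    """
    return (1000.0 / target_fps) * active_fraction

print(f"{thermal_budget_ms(30):.1f} ms")  # 21.7 ms
```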
Use the Unity Profiler to see if you are within frame budget. Below is an image of a profiling capture from a Unity mobile game with ongoing profiling and optimization. The game targets 60 fps on high-spec mobile phones, and 30 fps on medium/low-spec phones, such as the one in this capture:
This game runs comfortably within the ~22 ms frame budget required for 30 fps without overheating. Note the WaitForTargetFPS marker padding the main thread time up until VSync, and the gray idle times in the render thread and worker threads. You can also observe the VBlank interval by looking at the end times of Gfx.Present frame over frame, or by drawing up a timescale in the Timeline area or on the Time ruler at the top to measure from one of these to the next.
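If you note down the end times of consecutive Gfx.Present markers from the Timeline view, measuring from one to the next is a simple subtraction. This Python sketch uses hypothetical timestamps; a median spacing of about 33.3 ms would be consistent with the 30 fps cadence in this example:

```python
from statistics import median

def present_intervals(present_end_times_ms):
    """Differences between consecutive Gfx.Present end times, in ms."""
    return [b - a for a, b in zip(present_end_times_ms, present_end_times_ms[1:])]

# Hypothetical end times read off the Timeline view for four consecutive frames.
end_times = [12.0, 45.3, 78.7, 112.0]
intervals = present_intervals(end_times)
print(f"median interval: {median(intervals):.1f} ms")
```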
If you’re within the frame budget, including any adjustments made to the budget to account for battery usage and thermal throttling, then you’ve successfully finished performance profiling until next time – congratulations! Now look at memory usage to see if it’s within budget as well.
If your game is not within frame budget, however, the next step is to detect the bottleneck. In other words, find out whether the CPU or the GPU is taking the longest. If it's the CPU, determine which thread is the busiest; therein lies the bottleneck.
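The decision described here can be sketched as a small Python function. It is a deliberate simplification: it assumes you have already read per-frame busy times (excluding idle and wait markers) for the main thread, the render thread, the busiest worker thread, and the GPU from your profiling tools, and it treats the slowest of these as the bottleneck. The function name and parameters are hypothetical.

```python
def find_bottleneck(budget_ms, main_ms, render_ms, worker_ms, gpu_ms):
    """Return the busiest unit if the frame is over budget, else "within budget".

    Inputs are busy times in ms for one frame (idle and wait markers excluded).
    Treating the slowest unit as the frame time is a simplification: the CPU
    and GPU run in parallel, so the slowest one usually limits the frame rate.
    """
    candidates = [
        ("CPU main thread", main_ms),
        ("CPU render thread", render_ms),
        ("CPU worker threads", worker_ms),
        ("GPU", gpu_ms),
    ]
    name, busiest_ms = max(candidates, key=lambda pair: pair[1])
    if busiest_ms < budget_ms:
        return "within budget"
    return name

print(find_bottleneck(33.33, main_ms=45.0, render_ms=12.0, worker_ms=8.0, gpu_ms=20.0))
# CPU main thread
```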
The point of profiling is to identify bottlenecks as targets for optimization. If you rely on guesswork, you can end up optimizing parts of the game that aren’t bottlenecks, resulting in little or no improvement. Some “optimizations” can even worsen your game’s overall performance.
Are you bound by the CPU main thread?
The main thread is where all game logic and scripts perform their work by default, and where features and systems such as physics, animation, UI, and rendering take place.
See the screenshot below for an example of what a project that is main thread-bound looks like:
Although the render and worker threads look like the previous example that’s within frame budget, the main thread here is clearly busy with work during the entire frame. Even if you account for the small amount of Profiler overhead at the end of the frame, the main thread is busy for over 45 ms, meaning that this project achieves frame rates of less than 22 fps. There is no marker that shows the main thread idly waiting for VSync; it’s busy for the whole frame.
The next stage of investigation is to identify the parts of the frame that take the longest time and pinpoint their underlying causes. Use both the Unity Profiler and the Profile Analyzer to evaluate and address the biggest costs. Common bottlenecks derive from physics, non-optimized scripts, the Garbage Collector (GC), animation, cameras, and UI. If the source of the issue is not immediately obvious, try enabling Deep Profiling or Call Stacks, or use a native CPU profiler.
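Profile Analyzer does this kind of aggregation for you across many frames. As a rough illustration of the idea only, this Python sketch ranks markers by their median time per frame; the marker names and timings are hypothetical placeholders:

```python
from statistics import median

def top_markers(samples_ms, count=3):
    """Rank markers by median time per frame, most expensive first."""
    ranked = sorted(samples_ms.items(), key=lambda item: median(item[1]), reverse=True)
    return [(name, median(times)) for name, times in ranked[:count]]

# Hypothetical per-frame timings (ms) for four placeholder marker names.
samples = {
    "Physics": [14.0, 15.2, 13.8],
    "Scripts": [18.5, 19.1, 18.9],
    "GC": [0.0, 6.4, 0.0],
    "Rendering prep": [2.1, 2.0, 2.2],
}
for name, ms in top_markers(samples):
    print(f"{name}: {ms:.1f} ms")
```

Ranking by median hides the single 6.4 ms GC spike in the sample data, which is why it's worth checking maximum times as well as typical ones.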
In our 95-page performance optimization guide, we collected a list of common pitfalls you can encounter and prepare for.
Are you bound by the CPU render thread?
During the rendering process, the main thread examines the scene and performs Camera culling, depth sorting, and draw call batching to compile a list of things to render. This list is passed to the render thread, which translates it from Unity's internal platform-agnostic representation to the graphics API calls required to instruct the GPU on a particular platform.
In the Profiler capture shown below, you can see that the main thread waits for the render thread before it can begin rendering the current frame, as indicated by the Gfx.WaitForPresentOnGfxThread marker.
The render thread is still submitting draw call commands from the previous frame and isn't ready to accept new draw calls from the main thread. The render thread spends its time in Camera.Render.
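The handoff between the two threads behaves like a bounded producer/consumer queue. The following Python toy model is an analogy, not Unity's implementation: the main thread builds a render list per frame, and because the queue holds at most one pending frame, the main thread blocks whenever the render thread falls behind, similar to what Gfx.WaitForPresentOnGfxThread shows in a capture.

```python
import queue
import threading

def run_frames(frame_count):
    """Toy main/render thread handoff; returns what the render thread 'submitted'."""
    handoff = queue.Queue(maxsize=1)  # at most one frame waiting for the render thread
    submitted = []

    def render_thread():
        while True:
            render_list = handoff.get()
            if render_list is None:  # sentinel: no more frames
                break
            # Stand-in for translating the list into graphics API calls.
            submitted.append([f"draw {item}" for item in render_list])

    worker = threading.Thread(target=render_thread)
    worker.start()
    for frame in range(frame_count):
        # Stand-in for culling, sorting, and batching on the main thread.
        render_list = [f"frame{frame}-obj{i}" for i in range(2)]
        handoff.put(render_list)  # blocks while the render thread is behind
    handoff.put(None)
    worker.join()
    return submitted

print(run_frames(2))
```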
The Rendering Profiler module shows an overview of the number of draw call batches and SetPass calls for every frame. The best tool for investigating which draw call batches your render thread issues to the GPU is the Frame Debugger. Common causes of render thread bottlenecks include poor draw call batching, having multiple active cameras in the scene, and inefficient Camera culling.
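To see why draw order matters for batching, consider this simplified Python model, which counts a state change every time consecutive draws use a different material. This is an assumption made for illustration; Unity's actual batching and SetPass rules involve more conditions than this, and the object and material names are hypothetical.

```python
def count_setpass_calls(draw_order):
    """Count material switches in a draw sequence (toy model of SetPass calls).

    draw_order is a list of (object_name, material_name) in submission order.
    """
    calls = 0
    previous = None
    for _, material in draw_order:
        if material != previous:
            calls += 1
            previous = material
    return calls

interleaved = [("rock1", "Stone"), ("tree1", "Bark"), ("rock2", "Stone"), ("tree2", "Bark")]
grouped = sorted(interleaved, key=lambda item: item[1])
print(count_setpass_calls(interleaved), count_setpass_calls(grouped))  # 4 2
```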
Are you bound by CPU worker threads?
Being bound by CPU threads other than the main or render threads is not a common issue, but it can arise in projects that use the Data-Oriented Technology Stack (DOTS), especially if work is moved off the main thread into worker threads using the C# Job System.
Here’s a capture from Play mode in-Editor that highlights a DOTS project running a particle fluid simulation on the CPU:
As you can see, the worker threads are packed tightly with jobs. This suggests a large amount of work being moved off the main thread. Note that the frame time of 48.14 ms and the gray WaitForJobGroupID marker of 35.57 ms on the main thread indicate that the worker threads are doing more work than can realistically be achieved within a single frame on this CPU.
WaitForJobGroupID shows that the main thread has scheduled jobs to run asynchronously on worker threads, but it needs the results of those jobs before the worker threads have finished running them. The blue Profiler markers beneath WaitForJobGroupID depict the main thread running jobs while it waits, in an attempt to make the jobs finish sooner.
The jobs in your project might not be as parallelized as in this example. Perhaps you just have one long job running in a single worker thread. This is fine, so long as the time between the job being scheduled and the time that it needs to be completed is long enough for the job to run. If it isn’t, you will see the main thread stall, waiting for the job to be complete, as in the above screenshot.
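The timing rule in this paragraph can be written down directly. In this Python sketch, all times are in milliseconds within a frame, and start_delay_ms is an assumed stand-in for how long the job waits for a free worker thread; the function and parameter names are hypothetical and do not come from the Job System API.

```python
def main_thread_stall_ms(scheduled_at_ms, needed_at_ms, job_duration_ms, start_delay_ms=0.0):
    """Estimate how long the main thread waits for a single job to complete.

    If the job finishes before the main thread needs its results, there is no stall.
    """
    finishes_at = scheduled_at_ms + start_delay_ms + job_duration_ms
    return max(0.0, finishes_at - needed_at_ms)

print(main_thread_stall_ms(scheduled_at_ms=2.0, needed_at_ms=14.0, job_duration_ms=9.0))   # 0.0
print(main_thread_stall_ms(scheduled_at_ms=10.0, needed_at_ms=14.0, job_duration_ms=9.0))  # 5.0
```

Scheduling the same 9 ms job earlier in the frame removes the stall entirely, without making the job itself any faster.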
You can use the Flow Events feature in the Timeline view of the CPU Usage Profiler module to see when jobs are scheduled and when their results are expected by the main thread. For more information on writing efficient DOTS code, see our DOTS best practices.
Are you GPU-bound?
You might notice that your main thread spends time waiting for the render thread (as exhibited by Profiler markers such as Gfx.WaitForPresentOnGfxThread). If, at the same time, your render thread displays markers such as Gfx.PresentFrame or <GraphicsAPIName>.WaitForLastPresent, your application is GPU-bound. In that case, focus your optimization efforts on GPU bottlenecks to improve overall performance.
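The marker check described here can be sketched as a simple heuristic over the marker names visible in one frame. The function is hypothetical, and <GraphicsAPIName>.WaitForLastPresent is matched by suffix because its prefix depends on the graphics API in use:

```python
def looks_gpu_bound(main_thread_markers, render_thread_markers):
    """Heuristic: main thread waits on the render thread, which waits on present."""
    main_waits = "Gfx.WaitForPresentOnGfxThread" in main_thread_markers
    render_waits = any(
        name == "Gfx.PresentFrame" or name.endswith(".WaitForLastPresent")
        for name in render_thread_markers
    )
    return main_waits and render_waits

print(looks_gpu_bound({"PlayerLoop", "Gfx.WaitForPresentOnGfxThread"},
                      {"Gfx.PresentFrame"}))  # True
```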
The following capture was taken on a Samsung Galaxy S7 using the Vulkan graphics API. Although some of the time spent in Gfx.PresentFrame in this example might be related to waiting for VSync, the extreme length of this Profiler marker indicates that the majority of time is spent waiting for the GPU to finish rendering the previous frame.
If your application appears to be GPU-bound, you can use the Frame Debugger to gain a quick understanding of the draw call batches being sent to the GPU. However, this tool can’t present any specific GPU timing information. It only reveals how the scene is constructed.
To carefully investigate the cause of GPU bottlenecks, examine a GPU capture from a suitable GPU Profiler. The tool that you use depends on the target hardware and chosen graphics API.
Common causes of poor GPU performance include inefficient shaders, expensive post-processing effects, transparent overdraw (often from particle effects or UI), large or uncompressed textures, meshes with excessively high polygon counts, and excessive output resolutions (e.g., rendering at 4K).
Get the free e-book
Performance optimization and profiling are massive topics. If you’re looking for more information, check out our recently released e-book, Ultimate guide to profiling Unity games. You’ll get more than 80 pages of tips and tricks created in partnership with multiple experts, including those on our Integrated Support services team.
In fact, some of these experts also helped put together our 100-page guide on performance optimization for mobile and PC/console – packed with actionable tips on how to avoid creating bottlenecks in the first place. For additional resources, take a look at our previous blog post series on physics, UI, and audio settings, graphics and assets on mobile or console, and memory and code architecture.
If you’re interested in learning how your team can gain direct access to engineers, expert advice, and project guidance, peruse Unity’s Success Plans here.
Upcoming profiling webinar
Tune in to our new Ultimate profiling tips webinar featuring experts from SYBO Games, Arm, and Unity for tips on how to identify common performance challenges in mobile games, using both Unity and native profiling tools.
This webinar will cover:
Join our roundtable and live Q&A on June 14, 2022 at 11:00 am ET / 8:00 am PT.
Didn’t find what you’re looking for?
We want to help you make the most of your Unity applications. If there’s any optimization topic that you’d like us to further explore, please let us know in the forums. We’d also like to hear about the formats that you prefer so we can improve our e-books and other learning materials.