
Fixing Time.deltaTime in Unity 2020.2 for smoother gameplay: What did it take?

October 1, 2020 in Technology | 18 min. read

Unity 2020.2 beta introduces a fix to an issue that afflicts many development platforms: inconsistent Time.deltaTime values, which lead to jerky, stuttering movements. Read this blog post to understand what was going on and how the upcoming version of Unity helps you create slightly smoother gameplay.

Since the dawn of gaming, achieving framerate-independent movement in video games has meant taking frame delta time into account:

void Update()
{
    transform.position += m_Velocity * Time.deltaTime;
}

This achieves the desired effect of an object moving at constant average velocity, regardless of the frame rate the game is running at. It should, in theory, also move the object at a steady pace if your frame rate is rock solid. In practice, the picture is quite different. If you looked at actual reported Time.deltaTime values, you might have seen this:

6.854 ms
7.423 ms
6.691 ms
6.707 ms
7.045 ms
7.346 ms
6.513 ms
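Before digging into where the variation comes from, the frame-rate independence property itself is easy to check outside Unity. The following is an illustrative Python sketch (not Unity code; the velocity and frame rates are arbitrary illustration values): integrating velocity * deltaTime gives the same displacement no matter how many frames the second is divided into.

```python
# Illustrative sketch (Python, not Unity): position += velocity * delta_time
# yields frame-rate-independent movement. Velocity and frame rates here are
# made-up illustration values.
def move(fps, velocity=2.0, seconds=1.0):
    delta_time = 1.0 / fps
    position = 0.0
    for _ in range(int(fps * seconds)):
        position += velocity * delta_time
    return position

# Displacement after one second is the same at 72 and 144 steps per second.
print(abs(move(72) - move(144)) < 1e-9)
```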

This is an issue that affects many game engines, including Unity – and we’re thankful to our users for bringing it to our attention. Happily, Unity 2020.2 beta begins to address it.

So why does this happen? Why, when the frame rate is locked to a constant 144 fps, is Time.deltaTime not equal to 1/144 seconds (~6.94 ms) every time? In this blog post, I’ll take you on the journey of investigating and ultimately fixing this phenomenon.

What is delta time and why is it important?

In layman’s terms, delta time is the amount of time your last frame took to complete. It sounds simple, but it’s not as intuitive as you might think. In most game development books you’ll find this canonical definition of a game loop:

while (true)
{
    ProcessInput();
    Update();
    Render();
}

With a game loop like this, it’s easy to calculate delta time:

var time = GetTime();
while (true)
{
    var lastTime = time;
    time = GetTime();
    var deltaTime = time - lastTime;
    ProcessInput();
    Update(deltaTime);
    Render(deltaTime);
}

While this model is simple and easy to understand, it’s highly inadequate for modern game engines. To achieve high performance, engines nowadays use a technique called “pipelining,” which allows an engine to work on more than one frame at any given time.

Compare this:

To this:

In both of these cases, individual parts of the game loop take the same amount of time, but the second case executes them in parallel, which allows it to push out more than twice as many frames in the same amount of time. Pipelining the engine changes the frame time from being equal to the sum of all pipeline stages to being equal to the longest one.
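As a rough numeric model of that statement (the per-stage costs below are made-up illustration values, not measurements from the post), pipelining changes the steady-state frame time from the sum of the stage times to the maximum:

```python
# Toy model of pipelining. Stage costs in ms are assumed illustration values.
stages_ms = {"ProcessInput": 1.0, "Update": 3.0, "Render": 4.0}

# Sequential loop: each frame runs every stage in a row.
sequential_frame_ms = sum(stages_ms.values())
# Pipelined loop: stages of different frames overlap, so the steady-state
# frame time is set by the slowest stage.
pipelined_frame_ms = max(stages_ms.values())

print(sequential_frame_ms)  # 8.0
print(pipelined_frame_ms)   # 4.0
```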

However, even that is a simplification of what actually happens every frame in the engine:

  • Each pipeline stage takes a different amount of time every frame. Perhaps this frame has more objects on the screen than the last, which would make rendering take longer. Or perhaps the player rolled their face on the keyboard, which made input processing take longer.
  • Since different pipeline stages take different amounts of time, we need to artificially halt the faster ones so they don’t get ahead too much. Most commonly, this is implemented by waiting until some previous frame is flipped to the front buffer (also known as the screen buffer). If VSync is enabled, this additionally synchronizes to the start of the display’s VBLANK period. I’ll touch more on this later.

With that knowledge in mind, let’s take a look at a typical frame timeline in Unity 2020.1. Since platform selection and various settings significantly affect it, this article assumes a Windows Standalone player with multithreaded rendering enabled, graphics jobs disabled, VSync enabled, and QualitySettings.maxQueuedFrames set to 2, running on a 144 Hz monitor without dropping any frames:

Unity’s frame pipeline wasn’t implemented from scratch. Instead, it evolved over the last decade to become what it is today. If you go back to past versions of Unity, you will find that it changes every few releases.

You may immediately notice a couple things about it:

  • Once all the work is submitted to the GPU, Unity doesn’t wait for that frame to be flipped to the screen: instead, it waits for the previous one. This is controlled by the QualitySettings.maxQueuedFrames API. This setting describes how far the frame that is currently being displayed can be behind the frame that’s currently rendering. The minimum possible value is 1, since the best you can do is render frame n+1 while frame n is being displayed on the screen. Since it is set to 2 in this case (which is the default), Unity makes sure that frame n gets displayed on the screen before it starts rendering frame n+2 (for instance, before Unity starts rendering frame 5, it waits for frame 3 to appear on the screen).
  • Frame 5 takes longer to render on the GPU than a single refresh interval of the monitor (7.22 ms vs. 6.94 ms); however, no frames are dropped. This happens because QualitySettings.maxQueuedFrames with a value of 2 delays when the frame actually appears on the screen, which creates a time buffer that safeguards against dropped frames, as long as the “spike” doesn’t become the norm. If it were set to 1, Unity would surely have dropped the frame, since the work would no longer overlap.

Even though screen refresh happens every 6.94 ms, Unity’s time sampling presents a different image:

t_deltaTime(5) = 1.4 + 3.19 + 1.51 + 0.5 + 0.67 = 7.27 ms
t_deltaTime(6) = 1.45 + 2.81 + 1.48 + 0.5 + 0.4 = 6.64 ms
t_deltaTime(7) = 1.43 + 3.13 + 1.61 + 0.51 + 0.35 = 7.03 ms

The delta time average in this case ((7.27 + 6.64 + 7.03)/3 = 6.98 ms) is very close to the actual monitor refresh rate (6.94 ms), and if you were to measure this for a longer period of time, it would eventually average out to exactly 6.94 ms. Unfortunately, if you use this delta time as it is to calculate visible object movement, you will introduce a very subtle jitter. To illustrate this, I created a simple Unity project. It contains three green squares moving across the world space:

The camera is attached to the top cube, so it appears perfectly still on the screen. If Time.deltaTime is accurate, the middle and bottom cubes would appear to be still as well. The cubes move twice the width of the display every second: the higher the velocity, the more visible the jitter becomes. To illustrate movement, I placed purple and pink non-moving cubes in fixed positions in the background so that you can tell how fast the cubes are actually moving.
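The effect described above can be reproduced numerically with the deltas reported at the start of the post: the average is right, so movement is correct over time, but individual frames drift away from the evenly spaced positions the display actually shows.

```python
# The deltas reported earlier in the post (ms); the true interval is ~6.94 ms.
deltas = [6.854, 7.423, 6.691, 6.707, 7.045, 7.346, 6.513]
mean = sum(deltas) / len(deltas)

# An object moving at 1 unit/ms, positioned with the noisy deltas, compared
# against the positions it would have if every frame took exactly 1000/144 ms.
true_interval = 1000.0 / 144
pos, worst_offset = 0.0, 0.0
for i, d in enumerate(deltas, start=1):
    pos += d  # velocity = 1 unit/ms
    worst_offset = max(worst_offset, abs(pos - i * true_interval))

print(round(mean, 2))          # close to 6.94: correct on average
print(round(worst_offset, 2))  # per-frame positional drift: the visible jitter
```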

In Unity 2020.1, the middle and the bottom cubes don’t quite match the top cube movement – they jitter slightly. Below is a video captured with a slow-motion camera (slowed down 20x):

Identifying the source of delta time variation

So where do these delta time inconsistencies come from? The display shows each frame for a fixed amount of time, changing the picture every 6.94 ms. This is the real delta time because that’s how much time it takes for a frame to appear on the screen and that’s the amount of time the player of your game will observe each frame for.

Each 6.94 ms interval consists of two parts: processing and sleeping. The example frame timeline shows that the delta time is calculated on the main thread, so it will be our main focus. The processing part of the main thread consists of pumping OS messages, processing input, calling Update and issuing rendering commands. “Wait for render thread” is the sleeping part. The sum of these two intervals is equal to the real frame time:

t_processing + t_waiting = 6.94 ms

Both of these timings fluctuate for various reasons every frame, but their sum remains constant: if the processing time increases, the waiting time decreases and vice versa, so the two always add up to exactly 6.94 ms. In fact, the sum of all the parts leading up to the wait always equals 6.94 ms:

t_issueGPUCommands(4) + t_pumpOSMessages(5) + t_processInput(5) + t_Update(5) + t_wait(5) = 1.51 + 0.5 + 0.67 + 1.45 + 2.81 = 6.94 ms
t_issueGPUCommands(5) + t_pumpOSMessages(6) + t_processInput(6) + t_Update(6) + t_wait(6) = 1.48 + 0.5 + 0.4 + 1.43 + 3.13 = 6.94 ms
t_issueGPUCommands(6) + t_pumpOSMessages(7) + t_processInput(7) + t_Update(7) + t_wait(7) = 1.61 + 0.51 + 0.35 + 1.28 + 3.19 = 6.94 ms
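The invariant can be verified mechanically from the numbers above:

```python
# Each row holds the timings (ms) from one of the equations above:
# issueGPUCommands (from the previous frame), pumpOSMessages, processInput,
# Update, and wait.
rows = [
    [1.51, 0.50, 0.67, 1.45, 2.81],
    [1.48, 0.50, 0.40, 1.43, 3.13],
    [1.61, 0.51, 0.35, 1.28, 3.19],
]
for row in rows:
    # Every row sums to the 6.94 ms refresh interval.
    assert abs(sum(row) - 6.94) < 1e-9

print("every row sums to the 6.94 ms refresh interval")
```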

However, Unity queries time at the beginning of Update. Because of that, any variation in time it takes to issue rendering commands, pump OS messages or process input events will throw off the result.

A simplified Unity main thread loop can be defined like this:

while (!ShouldQuit())
{
    PumpOSMessages();
    UpdateInput();
    SampleTime(); // We sample time here!
    Update();
    WaitForRenderThread();
    IssueRenderingCommands();
}

The solution to this problem seems to be straightforward: just move the time sampling to after the wait, so the game loop becomes this:

while (!ShouldQuit())
{
    PumpOSMessages();
    UpdateInput();
    Update();
    WaitForRenderThread();
    SampleTime();
    IssueRenderingCommands();
}

However, this change doesn’t work correctly: rendering would see different time readings than Update(), which causes inconsistencies throughout the engine. One option is to save the sampled time at this point and update the engine time only at the beginning of the next frame. However, that would mean the engine uses time from before the latest frame was rendered.

Since moving SampleTime() to after Update() doesn’t work, perhaps moving the wait to the beginning of the frame will be more successful:

while (!ShouldQuit())
{
    PumpOSMessages();
    UpdateInput();
    WaitForRenderThread();
    SampleTime();
    Update();
    IssueRenderingCommands();
}

Unfortunately, that causes another issue: now the render thread must finish its work almost as soon as it’s requested, which means the engine benefits only minimally from rendering in parallel.

Let’s look back at the frame timeline:

Unity enforces pipeline synchronization by waiting for the render thread each frame. This is needed so that the main thread doesn’t run too far ahead of what is being displayed on the screen. The render thread is considered “done working” when it finishes rendering and waits for a frame to appear on the screen. In other words, it waits for the back buffer to be flipped and become the front buffer. However, the render thread doesn’t actually care when the previous frame was displayed on the screen – only the main thread is concerned about it, because it needs to throttle itself. So instead of having the render thread wait for the frame to appear on the screen, this wait can be moved to the main thread. Let’s call it WaitForLastPresentation(). The main thread loop becomes:

while (!ShouldQuit())
{
    PumpOSMessages();
    UpdateInput();
    WaitForLastPresentation();
    SampleTime();
    Update();
    WaitForRenderThread();
    IssueRenderingCommands();
}

Time is now sampled just after the wait portion of the loop, so the timing will be aligned with the monitor’s refresh rate. Time is also sampled at the beginning of the frame, so Update() and Render() see the same timings.

It is very important to note that WaitForLastPresentation() does not wait for frame n-1 to appear on the screen. If that were the case, no pipelining would happen at all. Instead, it waits for frame n - QualitySettings.maxQueuedFrames to appear on the screen, which allows the main thread to continue without waiting for the last frame to complete (unless maxQueuedFrames is set to 1, in which case every frame must complete before a new one starts).
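The throttling rule can be sketched in a couple of lines (an assumed simplification, not engine source): the presentation the main thread waits for is simply the current frame number minus maxQueuedFrames.

```python
# Assumed sketch of the throttle (not engine code): before starting frame n,
# the main thread waits for frame n - max_queued_frames to have been presented.
def presentation_to_wait_for(frame, max_queued_frames):
    return frame - max_queued_frames

# With the default maxQueuedFrames = 2, frame 5 waits on frame 3's presentation.
print(presentation_to_wait_for(5, 2))
# With maxQueuedFrames = 1, frame 5 waits on frame 4: no pipelining remains.
print(presentation_to_wait_for(5, 1))
```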

Achieving stability: We need to go deeper!

After implementing this solution, delta time became much more stable than before, but some jitter and occasional variance still occurred, because we depend on the operating system waking the engine from sleep on time. This wake-up can be off by multiple microseconds, which introduces jitter into the delta time, especially on desktop platforms where many other programs are running at the same time.

So what do you do now? It turns out that most graphics APIs/platforms allow you to extract the exact timestamp of a frame being presented to the screen (or an off-screen buffer). For instance, Direct3D 11 and 12 have IDXGISwapChain::GetFrameStatistics, while macOS provides CVDisplayLink. There are a few downsides to this approach, though:

  • You need to write separate extraction code for every supported graphics API, which means that time measurement code is now platform-specific and each platform has its own separate implementation. Since each platform behaves differently, a change like this runs the risk of catastrophic consequences.
  • With some graphics APIs, to obtain this timestamp, VSync must be enabled. This means if VSync is disabled, the time must still be calculated manually.

However, I believe this approach is worth the risk and effort. The result obtained using this method is very reliable and produces the timings that directly correspond to what is seen on the display.

Since we now extract the presentation timestamp from the graphics API, the WaitForLastPresentation() and SampleTime() steps are combined into a single step:

while (!ShouldQuit())
{
    PumpOSMessages();
    UpdateInput();
    WaitForLastPresentationAndGetTimestamp();
    Update();
    WaitForRenderThread();
    IssueRenderingCommands();
}

With that, the problem of jittery movement is solved.

Input latency considerations

Input latency is a tricky subject. It’s not easy to measure accurately, and it can be introduced by several factors: input hardware, operating system, drivers, game engine, game logic, and the display. Here I focus on the game-engine part of input latency, since Unity can’t affect the other factors.

Engine input latency is the time between an input OS message becoming available and the image that reflects it being dispatched to the display. Given the main thread loop, you can trace input latency through the code (assuming QualitySettings.maxQueuedFrames is set to 2):

PumpOSMessages(); // Pump input OS messages for frame 0
UpdateInput(); // Process input for frame 0
--------------------- // Earliest input event from the OS that didn't become part of frame 0 arrives here!
WaitForLastPresentationAndGetTimestamp(); // Wait for frame -2 to appear on the screen
Update(); // Update game state for frame 0
WaitForRenderThread(); // Wait until all commands from frame -1 are submitted to the GPU
IssueRenderingCommands(); // Send rendering commands for frame 0 to the rendering thread

PumpOSMessages(); // Pump input OS messages for frame 1
UpdateInput(); // Process input for frame 1
WaitForLastPresentationAndGetTimestamp(); // Wait for frame -1 to appear on the screen
Update(); // Update game state for frame 1, finally seeing the input event that arrived
WaitForRenderThread(); // Wait until all commands from frame 0 are submitted to the GPU
IssueRenderingCommands(); // Send rendering commands for frame 1 to the rendering thread

PumpOSMessages(); // Pump input OS messages for frame 2
UpdateInput(); // Process input for frame 2
WaitForLastPresentationAndGetTimestamp(); // Wait for frame 0 to appear on the screen
Update(); // Update game state for frame 2
WaitForRenderThread(); // Wait until all commands from frame 1 are submitted to the GPU
IssueRenderingCommands(); // Send rendering commands for frame 2 to the rendering thread

PumpOSMessages(); // Pump input OS messages for frame 3
UpdateInput(); // Process input for frame 3
WaitForLastPresentationAndGetTimestamp(); // Wait for frame 1 to appear on the screen. This is where the changes from our input event appear.

Phew, that’s it! Quite a lot happens between input becoming available as an OS message and its results being visible on the screen. If Unity is not dropping frames and the game loop spends most of its time waiting rather than processing, the worst-case engine input latency at a 144 Hz refresh rate is 4 * 6.94 = 27.76 ms, because we wait for previous frames to appear on the screen four times (that is, four refresh intervals).
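The worst case quoted above is just the number of presentation waits multiplied by the refresh interval. A quick check, using the post's rounded 6.94 ms value:

```python
REFRESH_MS = 6.94  # the post's rounded value for 1000/144

def worst_case_latency_ms(presentation_waits):
    # Worst case: the input arrived just after the previous pump, and every
    # wait for a presentation costs one full refresh interval.
    return presentation_waits * REFRESH_MS

# Four presentation waits at 144 Hz, as in the walkthrough above.
print(round(worst_case_latency_ms(4), 2))  # 27.76
```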

You can improve latency by pumping OS events and updating input after waiting for the previous frame to be displayed:

while (!ShouldQuit())
{
    WaitForLastPresentationAndGetTimestamp();
    PumpOSMessages();
    UpdateInput();
    Update();
    WaitForRenderThread();
    IssueRenderingCommands();
}

This eliminates one wait from the equation, and now the worst-case input latency is 3 * 6.94 = 20.82 ms.

It is possible to reduce input latency even further by reducing QualitySettings.maxQueuedFrames to 1 on platforms that support it. Then, the chain of input processing looks like this:

--------------------- // Input event arrives from the OS!
WaitForLastPresentationAndGetTimestamp(); // Wait for frame -1 to appear on the screen
PumpOSMessages(); // Pump input OS messages for frame 0
UpdateInput(); // Process input for frame 0
Update(); // Update game state for frame 0 with the input event that we are measuring
WaitForRenderThread(); // Wait until all commands from frame -1 are submitted to the GPU
IssueRenderingCommands(); // Send rendering commands for frame 0 to the rendering thread
WaitForLastPresentationAndGetTimestamp(); // Wait for frame 0 to appear on the screen. This is where the changes from our input event appear.

Now, the worst-case input latency is 2 * 6.94 = 13.88 ms. This is as low as we can possibly go when using VSync.

Warning: Setting QualitySettings.maxQueuedFrames to 1 essentially disables pipelining in the engine, which makes it much harder to hit your target frame rate. Keep in mind that if you do end up running at a lower frame rate, your input latency will likely be worse than if you had kept QualitySettings.maxQueuedFrames at 2. For instance, if it causes you to drop to 72 frames per second, your input latency becomes 2 * 13.89 = 27.8 ms, which is worse than the previous latency of 20.82 ms. If you want to make use of this setting, we suggest adding it as an option to your game’s settings menu so gamers with fast hardware can reduce QualitySettings.maxQueuedFrames, while gamers with slower hardware keep the default setting.
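The tradeoff is easy to quantify. The sketch below uses exact 1000/fps intervals instead of the rounded 6.94 ms, so the results differ slightly from the text:

```python
def latency_ms(presentation_waits, fps):
    # Worst-case engine latency: waited refresh intervals at the given rate.
    return presentation_waits * 1000.0 / fps

# maxQueuedFrames = 2, still holding 144 fps: three waited intervals.
with_pipelining = latency_ms(3, 144)
# maxQueuedFrames = 1, but the lost pipelining drops the game to 72 fps.
without_pipelining = latency_ms(2, 72)

print(round(with_pipelining, 2))     # 20.83
print(round(without_pipelining, 2))  # 27.78: the "low latency" setting lost
```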

VSync effects on input latency

Disabling VSync can also help reduce input latency in certain situations. Recall that input latency is the amount of time that passes between an input becoming available from the OS and the frame that processed the input being displayed on the screen, or, as an equation:

latency = t_display - t_input

Given this equation, there are two ways to reduce input latency: make t_display lower (get the image to the display sooner) or make t_input higher (query input events later).

Sending image data from the GPU to the display is extremely data-intensive. Just do the math: sending a 2560x1440 non-HDR image to the display 144 times per second requires transmitting 12.7 gigabits every second (24 bits per pixel * 2560 * 1440 * 144). This data cannot be transmitted in an instant: the GPU transmits pixels to the display continuously. After each frame is transmitted, there’s a brief pause before transmission of the next frame begins. This pause is called VBLANK. When VSync is enabled, you’re essentially telling the OS to flip the frame buffer only during VBLANK:
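The bandwidth arithmetic above, spelled out:

```python
# Bandwidth needed to scan out 2560x1440 at 24 bits per pixel, 144 Hz.
bits_per_second = 24 * 2560 * 1440 * 144
print(bits_per_second)        # 12740198400
print(bits_per_second / 1e9)  # ~12.74 gigabits per second, matching the post
```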

When you turn VSync off, the back buffer gets flipped to the front buffer the moment rendering is finished, which means that the display will suddenly start taking data from the new image in the middle of its refresh cycle, causing the upper part of the frame to be from the older frame and the lower part of the frame to be from the newer frame:

This phenomenon is known as “tearing.” Tearing allows us to reduce t_display for the lower part of the frame, sacrificing visual quality and animation smoothness for input latency. This is especially effective when the game’s frame rate is lower than the display’s refresh rate, since it allows a partial recovery of the latency caused by a missed VSync. It is also more effective in games where the upper part of the screen is occupied by UI or a skybox, which makes the tearing harder to notice.

Another way disabling VSync can help reduce input latency is by increasing t_input. If the game is capable of rendering at a much higher frame rate than the refresh rate (for instance, at 150 fps on a 60 Hz display), then disabling VSync will make the game pump OS events several times during each refresh interval, which will reduce the average time they’re sitting in the OS input queue waiting for the engine to process them.

Keep in mind that disabling VSync should ultimately be up to the player of your game since it affects visual quality and can potentially cause nausea if the tearing ends up being noticeable. It is a best practice to provide a settings option in your game to enable/disable it if it’s supported by the platform.

Conclusion

With this fix implemented, Unity’s frame timeline looks like this:

But does it actually improve the smoothness of object movement? You bet it does!

We ran the Unity 2020.1 demo shown earlier in this post in Unity 2020.2.0b1. Here is the resulting slow-motion video:

This fix is available in the 2020.2 beta for these platforms and graphics APIs:

  • Windows, Xbox One, Universal Windows Platform (D3D11 and D3D12)
  • macOS, iOS, tvOS (Metal)
  • PlayStation 4
  • Switch

We plan to implement this for the remainder of our supported platforms in the near future.

Follow this forum thread for updates, and let us know what you think about our work so far.

Further reading on frame timing

Unity 2020.2 beta and beyond

If you’re interested in learning more about what’s available in 2020.2, check out the beta blog post and register for the Unity 2020.2 beta webinar. We’ve also recently shared our roadmap plans for 2021.
