Nobody likes loading screens. Did you know that you can quickly adjust Async Upload Pipeline (AUP) parameters to significantly improve your loading times? This article details how meshes and textures are loaded through the AUP. Understanding this process could help you speed up loading considerably: some projects have seen over 2x performance improvements!
Read on to learn how the AUP works from a technical standpoint and what APIs you should be using to get the most out of it.
The latest, most optimized implementation of the Async Upload Pipeline is available in the 2018.3 beta.
First, let’s take a detailed look at when the AUP is used and how the loading process works.
Prior to 2018.3, the AUP only handled textures. Starting with 2018.3 beta, the AUP now loads textures and meshes, but there are some exceptions. Textures that are read/write enabled, or meshes that are read/write enabled or compressed, will not use the AUP. (Note that Texture Mipmap Streaming, which was introduced in 2018.2, also uses AUP.)
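To find assets that fall under these exceptions, an editor script can scan import settings. The following is a minimal sketch, not an official tool: the class name and menu path are hypothetical, and it assumes that "compressed" corresponds to the model importer's Mesh Compression setting.

```csharp
// Editor-only sketch: place this file in an "Editor" folder.
// The class name and menu path are hypothetical.
using UnityEditor;
using UnityEngine;

public static class AupEligibilityReport
{
    [MenuItem("Tools/Report Assets That Bypass AUP")]
    private static void Report()
    {
        // Textures with Read/Write enabled will not use the AUP.
        foreach (string guid in AssetDatabase.FindAssets("t:Texture2D"))
        {
            string path = AssetDatabase.GUIDToAssetPath(guid);
            var importer = AssetImporter.GetAtPath(path) as TextureImporter;
            if (importer != null && importer.isReadable)
            {
                Debug.LogWarning("Read/Write enabled texture: " + path);
            }
        }

        // Meshes that are Read/Write enabled or compressed will not use the AUP.
        // Assumption: "compressed" maps to ModelImporter.meshCompression.
        foreach (string guid in AssetDatabase.FindAssets("t:Model"))
        {
            string path = AssetDatabase.GUIDToAssetPath(guid);
            var importer = AssetImporter.GetAtPath(path) as ModelImporter;
            if (importer == null)
            {
                continue;
            }

            bool compressed =
                importer.meshCompression != ModelImporterMeshCompression.Off;
            if (importer.isReadable || compressed)
            {
                Debug.LogWarning("Read/Write enabled or compressed mesh: " + path);
            }
        }
    }
}
```

Running the menu item logs one warning per asset that would be loaded outside the AUP, which gives a starting list of import settings to review.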
During the build process, the Texture or Mesh Object is written to a serialized file and the large binary data (texture or vertex data) is written to an accompanying .resS file. This layout applies to both player data and asset bundles. The separation of the object and binary data allows for faster loading of the serialized file (which will generally contain small objects), and it enables streamlined loading of the large binary data from the .resS file after. When the Texture or Mesh Object is deserialized, it submits a command to the AUP’s command queue. Once that command completes, the Texture or Mesh data has been uploaded to the GPU and the object can be integrated on the main thread.
During the upload process, the large binary data from the .resS file is read to a fixed-sized ring buffer. Once in memory, the data is uploaded to the GPU in a time-sliced fashion on the render thread. The size of the ring buffer and the duration of the time-slice are the two parameters that you can change to affect the behavior of the system.
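The Async Upload Pipeline has the following process for each command:
1. Wait until the required memory is available in the ring buffer.
2. Read the data from the source .resS file into the allocated memory.
3. Perform post-processing on background jobs.
4. Upload the data to the GPU in a time-sliced fashion on the render thread.
5. Release the ring buffer memory.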
Multiple commands can be in progress simultaneously, but all must allocate their required memory out of the same shared ring buffer. When the ring buffer fills up, new commands will wait; this waiting will not cause main-thread blocking or affect frame rate, it simply slows the async loading process.
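The waiting behavior can be illustrated with a toy model. This is only a sketch of the idea, not Unity's implementation: the class and member names are hypothetical, first-in-first-out ordering is an assumption, and commands larger than the whole buffer are deliberately left out of scope.

```csharp
using System;
using System.Collections.Generic;

// Toy model of commands sharing one fixed-size buffer budget.
// Hypothetical names; not Unity's implementation.
public sealed class UploadBudgetModel
{
    private readonly int capacityBytes;
    private int usedBytes;

    // Sizes of commands waiting for memory (assumed FIFO order).
    private readonly Queue<int> waiting = new Queue<int>();

    // Sizes of commands that currently hold buffer memory.
    private readonly Queue<int> inFlight = new Queue<int>();

    public UploadBudgetModel(int capacityBytes)
    {
        this.capacityBytes = capacityBytes;
    }

    public int WaitingCount { get { return waiting.Count; } }
    public int InFlightCount { get { return inFlight.Count; } }

    // Queue a command; it starts immediately only if its memory fits.
    public void Submit(int sizeBytes)
    {
        if (sizeBytes > capacityBytes)
        {
            // Oversized commands are out of scope for this sketch.
            throw new ArgumentException("Command is larger than the buffer.");
        }

        waiting.Enqueue(sizeBytes);
        StartWaitingCommands();
    }

    // The oldest in-flight command finished uploading; free its memory.
    public void CompleteOldest()
    {
        if (inFlight.Count == 0)
        {
            return;
        }

        usedBytes -= inFlight.Dequeue();
        StartWaitingCommands();
    }

    // Start waiting commands, in order, while their memory fits.
    private void StartWaitingCommands()
    {
        while (waiting.Count > 0 && usedBytes + waiting.Peek() <= capacityBytes)
        {
            int size = waiting.Dequeue();
            usedBytes += size;
            inFlight.Enqueue(size);
        }
    }
}
```

With a 4MB capacity, submitting commands of 3MB, 2MB, and 1MB leaves one command in flight and two waiting. Calling CompleteOldest() once frees the 3MB, after which both waiting commands start. Nothing in the model blocks the caller, which mirrors the point that a full buffer slows loading without stalling the main thread.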
A summary of these impacts is as follows:
Load Pipeline Comparison

| | Without AUP | AUP | Impact on you |
|---|---|---|---|
| Memory Usage | Allocate as data is read out of default heap (high memory watermarks) | Fixed-size ring buffer | Reduced high memory watermarks |
| Upload Process | Upload as data is available | Amortized uploading with fixed time-slice | Hitchless uploading |
| Post Processing | Performed on loading thread (blocks loading thread) | Performed on jobs in background | Faster loading |
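To take full advantage of the AUP in 2018.3, there are three parameters that can be adjusted at runtime for this system:
- QualitySettings.asyncUploadTimeSlice: the duration, in milliseconds, of each upload time-slice on the render thread. The default is 2ms.
- QualitySettings.asyncUploadBufferSize: the size of the ring buffer, in megabytes. The default is 4MB.
- QualitySettings.asyncUploadPersistentBuffer: whether the ring buffer remains allocated when there are no active loading operations.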
These settings can be adjusted through the scripting API or via the QualitySettings menu.
Let’s examine a workload with lots of textures and meshes being uploaded through the Async Upload Pipeline using the default 2ms time slice and a 4MB ring buffer. Since we’re loading, we get 2 time-slices per render frame, so we should have 4 milliseconds of upload time. Looking at the profiler data, we only use about 1.5 milliseconds. We can also see that immediately after the upload, a new read operation is issued now that memory is available in the ring buffer. This is a sign that a larger ring buffer is needed.
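The arithmetic behind this observation can be written out explicitly. The symbol names below are labels introduced here for illustration, and the 1.5 ms figure is the approximate reading from the profiler described above.

```latex
\begin{align*}
t_{\text{budget}} &= 2 \times 2\,\text{ms} = 4\,\text{ms} \\
\text{utilization} &= \frac{1.5\,\text{ms}}{4\,\text{ms}} = 37.5\%
\end{align*}
```

Less than half of the available upload time is being used, which suggests the render thread is running out of buffered data to upload, not out of time to upload it.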
Let’s try increasing the Ring Buffer size. Since we’re in a loading screen, it is also a good idea to increase the upload time-slice. Here’s what a 16MB Ring Buffer and a 4-millisecond time-slice look like:
Now we can see that we are spending almost all our render thread time uploading, and just a short time between uploads rendering the frame.
Below are the loading times of the sample workload with a variety of upload time-slices and Ring Buffer sizes. Tests were run on a MacBook Pro with a 2.8GHz Intel Core i7, running OS X El Capitan. Upload speeds and I/O speeds will vary on different platforms and devices. The workload is a subset of the Viking Village sample project that we use internally for performance testing. Because other objects are being loaded at the same time, we can’t isolate the precise performance gain for each combination of values. It’s safe to say in this case, however, that texture and mesh loading is at least twice as fast when switching from the 4MB/2ms settings to the 16MB/4ms settings.
Experimenting with these parameters outputs the following results.
To optimize loading times for this particular sample project, we should, therefore, configure settings like this:
QualitySettings.asyncUploadTimeSlice = 4;          // milliseconds
QualitySettings.asyncUploadBufferSize = 16;        // megabytes
QualitySettings.asyncUploadPersistentBuffer = true;
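Since the larger time-slice is suggested for loading screens, one option is to apply these values only while a load is in progress and restore the previous ones afterwards. The component below is a minimal sketch under that assumption: the class name and scene name are hypothetical, and the scene is loaded additively so that this object survives the load and can restore the settings.

```csharp
using System.Collections;
using UnityEngine;
using UnityEngine.SceneManagement;

// Hypothetical loading-screen component; names are for illustration only.
public class LoadingScreenUploadSettings : MonoBehaviour
{
    // Hypothetical scene name; replace with a scene from your build settings.
    [SerializeField] private string sceneName = "Level1";

    private IEnumerator Start()
    {
        // Remember the current values so they can be restored after loading.
        int previousTimeSlice = QualitySettings.asyncUploadTimeSlice;
        int previousBufferSize = QualitySettings.asyncUploadBufferSize;
        bool previousPersistent = QualitySettings.asyncUploadPersistentBuffer;

        // Values that worked for the sample project in this article;
        // profile your own project before adopting them.
        QualitySettings.asyncUploadTimeSlice = 4;      // milliseconds
        QualitySettings.asyncUploadBufferSize = 16;    // megabytes
        QualitySettings.asyncUploadPersistentBuffer = true;

        // Additive loading keeps this component alive until the load finishes.
        AsyncOperation load =
            SceneManager.LoadSceneAsync(sceneName, LoadSceneMode.Additive);
        yield return load;

        // Return to the gameplay settings once the loading screen is done.
        QualitySettings.asyncUploadTimeSlice = previousTimeSlice;
        QualitySettings.asyncUploadBufferSize = previousBufferSize;
        QualitySettings.asyncUploadPersistentBuffer = previousPersistent;
    }
}
```

Whether to restore the buffer size as well, or keep the larger buffer for the whole session, is a memory trade-off to evaluate per project.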
General recommendations for optimizing loading speed of textures and meshes:
Q: How often will time-sliced uploading occur on the render thread?
Q: What is loaded through the AUP?
Q: What if the ring buffer is not large enough to hold the data being uploaded (for example, a really large texture)?
Q: How do synchronous load APIs work? For example, Resources.Load, AssetBundle.LoadAsset, etc.
We’re always looking for feedback. Let us know what you think in the comments or on the Unity 2018.3 beta forum!