
Accessing texture data efficiently

May 25, 2023 in Engine & platform | 15 min. read

Learn about the benefits and trade-offs of different ways to access the underlying texture pixel data in your Unity project.

Working with pixel data in Unity

Pixel data describes the color of individual pixels in a texture. Unity provides methods that enable you to read from or write to pixel data with C# scripts.

You might use these methods to duplicate or update a texture (for example, adding a detail to a player’s profile picture), or use the texture’s data in a particular way, like reading a texture that represents a world map to determine where to place an object.

There are several ways of writing code that reads from or writes to pixel data. The one you choose depends on what you plan to do with the data and the performance needs of your project.

This blog and the accompanying sample project are intended to help you navigate the available API and common performance pitfalls. An understanding of both will help you write a performant solution or address performance bottlenecks as they appear.

CPU and GPU copies of pixel data

For most types of textures, Unity stores two copies of the pixel data: one in GPU memory, which is required for rendering, and one in CPU memory. The CPU copy is optional and allows you to read from, write to, and manipulate pixel data on the CPU. A texture with a copy of its pixel data stored in CPU memory is called a readable texture. One exception to note is RenderTexture, whose data exists only in GPU memory.

The differences between CPU and GPU

Memory

The memory available to the CPU differs from that of the GPU on most hardware. Some devices have a form of partially shared memory, but for this blog we will assume the classic PC configuration where the CPU only has direct access to the RAM plugged into the motherboard and the GPU relies on its own video RAM (VRAM). Any data transferred between these different environments has to pass through the PCI bus, which is slower than transferring data within the same type of memory. Due to these costs, you should try to limit the amount of data transferred each frame.

Flowchart visualizing the relationship between CPU and GPU memory, and a cross section of the API

Processing

Sampling textures in shaders is the most common GPU pixel data operation. To alter this data, you can copy between textures or render into a texture using a shader. All these operations can be performed quickly by the GPU.

In some cases, it may be preferable to manipulate your texture data on the CPU, which offers more flexibility in how data is accessed. CPU pixel data operations act only on the CPU copy of the data, so they require readable textures. If you want to sample the updated pixel data in a shader, you must first copy it from the CPU to the GPU by calling Apply. Depending on the texture involved and the complexity of the operations, it may be faster and easier to stick to CPU operations (for example, when copying several 2D textures into a Texture2DArray asset).

The Unity API provides several methods to access or process texture data. Some operations act on both the GPU and CPU copy if both are present. As a result, the performance of these methods varies depending on whether the textures are readable. Different methods can be used to achieve the same results, but each method has its own performance and ease-of-use characteristics.

Answer the following questions to determine the optimal solution:

  • Can the GPU perform your calculations faster than the CPU?
    • What level of pressure is the process putting on the texture caches? (For example, sampling many high-resolution textures without using mipmaps is likely to slow down the GPU.)
    • Does the process require a random write texture, or can it output to a color or depth attachment? (Writing to random pixels on a texture requires frequent cache flushes that slow down the process.)
  • Is my project already GPU bottlenecked? Even if the GPU is able to execute a process faster than the CPU, can the GPU afford to take on more work without exceeding its frame time budget?
    • If both the GPU and the CPU main thread are near their frame time limit, then perhaps the slow part of a process could be performed by CPU worker threads.
  • How much data needs to be uploaded to or downloaded from the GPU to calculate or process the results?
    • Could a shader or C# job pack the data into a smaller format to reduce the bandwidth required?
    • Could a RenderTexture be downsampled into a smaller resolution version that is downloaded instead?
  • Can the process be performed in chunks? (If a lot of data needs to be processed at once, there’s a risk of the GPU not having enough memory for it.)
  • How quickly are the results required? Can calculations or data transfers be performed asynchronously and handled later? (If too much work is done in a single frame, there is a risk that the GPU won’t have enough time to render the actual graphics for each frame.)

Making a texture readable or nonreadable

By default, texture assets that you import into your project are nonreadable, while textures created from a script are readable.

Readable textures use twice as much memory as nonreadable textures because they keep a copy of their pixel data in CPU RAM. Only make a texture readable when you need to, and make it nonreadable when you are done working with the data on the CPU.

To see if a texture asset in your project is readable and make edits, use the Read/Write Enabled option in Texture Import Settings, or the TextureImporter.isReadable API.
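As a rough illustration of the TextureImporter.isReadable route, the Editor-only sketch below toggles the Read/Write setting of one asset and reimports it. The asset path, class name, and menu entry are hypothetical and exist only for this example.

```csharp
#if UNITY_EDITOR
using UnityEditor;
using UnityEngine;

public static class TextureReadabilityMenu
{
    // Hypothetical asset path, used only for illustration.
    const string ExamplePath = "Assets/Textures/WorldMap.png";

    [MenuItem("Tools/Texture Examples/Toggle Read-Write On Example Texture")]
    static void ToggleReadable()
    {
        // The importer holds the import settings shown in the Inspector.
        var importer = AssetImporter.GetAtPath(ExamplePath) as TextureImporter;
        if (importer == null)
        {
            Debug.LogWarning($"No texture importer found at {ExamplePath}");
            return;
        }

        // Flip the Read/Write setting and reimport so the change takes effect.
        importer.isReadable = !importer.isReadable;
        importer.SaveAndReimport();
        Debug.Log($"{ExamplePath} readable: {importer.isReadable}");
    }
}
#endif
```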

To make a texture nonreadable, call its Apply method with the makeNoLongerReadable parameter set to “true” (for example, Texture2D.Apply or Cubemap.Apply). A nonreadable texture can’t be made readable again.
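A minimal sketch of this pattern, assuming a texture that is filled once from a script and never touched on the CPU again (the class name, size, and fill color are arbitrary):

```csharp
using UnityEngine;

public class UploadOnceExample : MonoBehaviour
{
    Texture2D texture;

    void Start()
    {
        // Textures created from a script start out readable.
        texture = new Texture2D(64, 64, TextureFormat.RGBA32, false);

        // Fill the CPU copy with a solid color.
        var pixels = new Color32[64 * 64];
        for (int i = 0; i < pixels.Length; i++)
            pixels[i] = new Color32(255, 0, 0, 255);
        texture.SetPixels32(pixels);

        // Upload to the GPU and release the CPU copy in the same call.
        // After this, CPU accessors such as GetPixels32 can no longer be used.
        texture.Apply(updateMipmaps: false, makeNoLongerReadable: true);
    }

    void OnDestroy()
    {
        Destroy(texture);
    }
}
```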

All textures are readable to the Editor in Edit and Play modes. Calling Apply to make the texture nonreadable will update the value of isReadable, preventing you from accessing the CPU data. However, some Unity processes will function as if the texture is readable because they see that the internal CPU data is valid.

Texture Access API examples in GitHub

Example of a texture generated on the CPU each frame

Performance differs greatly across the various ways of accessing texture data, especially on the CPU (although less so at lower resolutions). The Unity Texture Access API examples repository on GitHub contains a number of examples showing performance differences between various APIs that allow access to, or manipulation of, texture data. The UI only shows the main thread CPU timings. In some cases, DOTS features like Burst and the job system are used to maximize performance.

Here are the examples included in the GitHub repository:

  • SimpleCopy: Copying all pixels from one texture to another
  • PlasmaTexture: A plasma texture updated on the CPU per frame
  • TransferGPUTexture: Transferring (copying to a different size or format) all pixels on the GPU from a texture to a RenderTexture

Listed below are performance measurements taken from the examples on GitHub. These numbers are used to support the recommendations that follow. The measurements are from a player build on a system with a 3.7 GHz 8-core Xeon® W-2145 CPU and an RTX 2080.

SimpleCopy example

These are the median CPU times for SimpleCopy.UpdateTestCase with a texture size of 2,048.

Note that the Graphics methods complete nearly instantly on the main thread because they simply push work onto the RenderThread, which is later executed by the GPU. Their results will be ready when the next frame is being rendered.

Results

  • 1,326 ms – foreach(mip) for(x in width) for(y in height) SetPixel(x, y, GetPixel(x, y, mip), mip)
  • 32.14 ms – foreach(mip) SetPixels(source.GetPixels(mip), mip)
  • 6.96 ms – foreach(mip) SetPixels32(source.GetPixels32(mip), mip)
  • 6.74 ms – LoadRawTextureData(source.GetRawTextureData())
  • 3.54 ms – Graphics.CopyTexture(readableSource, readableTarget)
  • 2.87 ms – foreach(mip) SetPixelData<byte>(mip, GetPixelData<byte>(mip))
  • 2.87 ms – LoadRawTextureData(source.GetRawTextureData<byte>())
  • 0.00 ms – Graphics.ConvertTexture(source, target)
  • 0.00 ms – Graphics.CopyTexture(nonReadableSource, target)

PlasmaTexture example

These are the median CPU times for PlasmaTexture.UpdateTestCase with a texture size of 512.

You’ll see that SetPixels32 is unexpectedly slower than SetPixels. This is due to having to take the float-based Color result from the plasma pixel calculation and convert it to the byte-based Color32 struct. SetPixels32NoConversion skips this conversion and just assigns a default value to the Color32 output array, resulting in better performance than SetPixels. In order to beat the performance of SetPixels and the underlying color conversion performed by Unity, it is necessary to rework the pixel calculation method itself to directly output a Color32 value. A simple implementation using SetPixelData is almost guaranteed to give better results than careful SetPixels and SetPixels32 approaches.

Results

  • 126.95 ms – SetPixel
  • 113.16 ms – SetPixels32
  • 88.96 ms – SetPixels
  • 86.30 ms – SetPixels32NoConversion
  • 16.91 ms – SetPixelDataBurst
  • 4.27 ms – SetPixelDataBurstParallel
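To give an idea of what a Burst-compiled, parallel approach can look like, here is a simplified sketch. It is not the code from the sample repository: the job, the plasma formula, and all names are made up for illustration, and it assumes the Burst, Collections, and Mathematics packages are installed and that the texture uses the RGBA32 format.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;
using UnityEngine;

public class PlasmaSketch : MonoBehaviour
{
    const int Size = 512;
    Texture2D texture;
    NativeArray<Color32> pixels; // reused every frame to avoid allocations

    [BurstCompile]
    struct PlasmaJob : IJobParallelFor
    {
        public NativeArray<Color32> pixels;
        public int width;
        public float time;

        // Called once per pixel; writes Color32 directly, so no
        // float-to-byte conversion is left for Unity to perform.
        public void Execute(int index)
        {
            int x = index % width;
            int y = index / width;
            float v = math.sin(x * 0.05f + time) + math.sin(y * 0.05f + time * 1.3f);
            byte c = (byte)((v * 0.25f + 0.5f) * 255f); // map [-2, 2] to [0, 255]
            pixels[index] = new Color32(c, (byte)(255 - c), 128, 255);
        }
    }

    void Start()
    {
        texture = new Texture2D(Size, Size, TextureFormat.RGBA32, false);
        pixels = new NativeArray<Color32>(Size * Size, Allocator.Persistent);
    }

    void Update()
    {
        var job = new PlasmaJob { pixels = pixels, width = Size, time = Time.time };
        // Spread the work over worker threads in batches of 64 pixels,
        // then wait for completion before handing the data to Unity.
        job.Schedule(pixels.Length, 64).Complete();

        texture.SetPixelData(pixels, 0);
        texture.Apply(false);
    }

    void OnDestroy()
    {
        if (pixels.IsCreated) pixels.Dispose();
        Destroy(texture);
    }
}
```

Calling Complete immediately keeps the sketch short; a real project might schedule the job early in the frame and complete it later to overlap with other main thread work.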

TransferGPUTexture example

These are the Editor GPU times for TransferGPUTexture.UpdateTestCase with a texture size of 8,196.

Results

  • 1.584 ms – Blit
  • 0.882 ms – CopyTexture

Pixel data API recommendations

You can access pixel data in various ways. However, not all methods support every format, texture type, or use case, and some take longer to execute than others. This section goes over recommended methods, and the following section covers those to use with caution.

CopyTexture

CopyTexture is the fastest way to transfer GPU data from one texture into another. It does not perform any format conversion. You can partially copy data by specifying a source and target position, in addition to the width and height of the region. If both textures are readable, the copy operation will also be performed on the CPU data, bringing the total cost of this method closer to that of a CPU-only copy using SetPixelData with the result of GetPixelData from a source texture.
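The partial copy described above could look roughly like this. The helper name and parameters are assumptions for illustration; the sketch also assumes both textures have compatible formats, that the region fits inside both, and that the platform supports this kind of copy (see SystemInfo.copyTextureSupport).

```csharp
using UnityEngine;

public static class CopyRegionExample
{
    // Copies a width x height region from mip 0 of the source into mip 0
    // of the target. No format conversion or scaling takes place.
    public static void CopyRegion(
        Texture source, Texture target,
        int srcX, int srcY, int width, int height,
        int dstX, int dstY)
    {
        Graphics.CopyTexture(
            source, 0, 0,          // source element (array slice or face) and mip
            srcX, srcY, width, height,
            target, 0, 0,          // target element and mip
            dstX, dstY);
    }
}
```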

Blit

Blit is a fast and powerful method of transferring GPU data into a RenderTexture using a shader. In practice, this has to set up the graphics pipeline API state to render to the target RenderTexture. It comes with a small resolution-independent setup cost compared to CopyTexture. The default Blit shader used by the method takes an input texture and renders it into the target RenderTexture. By providing a custom material or shader, you can define complex texture-to-texture rendering processes.
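As a small sketch of both variants, the component below blits a source texture into a RenderTexture each frame, using a custom material if one is assigned and the default Blit shader otherwise. The class and field names are hypothetical, and the source texture is assumed to be assigned in the Inspector.

```csharp
using UnityEngine;

public class BlitExample : MonoBehaviour
{
    public Texture source;          // assumed to be assigned in the Inspector
    public Material effectMaterial; // optional custom material
    RenderTexture target;

    void Start()
    {
        // Color-only target (depth buffer of 0 bits) matching the source size.
        target = new RenderTexture(source.width, source.height, 0, RenderTextureFormat.ARGB32);
        target.Create();
    }

    void Update()
    {
        if (effectMaterial != null)
            Graphics.Blit(source, target, effectMaterial); // custom shader pass
        else
            Graphics.Blit(source, target);                 // default copy shader
    }

    void OnDestroy()
    {
        target.Release();
    }
}
```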

GetPixelData and SetPixelData

GetPixelData and SetPixelData (along with GetRawTextureData) are the fastest methods to use when only touching CPU data. Both methods require you to provide a struct type as a generic type parameter, which is used to reinterpret the data. The methods themselves only need this struct to derive the correct size, so you can just use byte if you don’t want to define a custom struct to represent the texture’s format.

When accessing individual pixels, it’s a good idea to define a custom struct with some utility methods for ease of use. For example, an R5G5B5A1 format struct could be made up out of a ushort data member and a few get/set methods to access the individual channels as bytes.

    public struct FormatR5G5B5A1
    {
        // All four channels packed into 16 bits: RRRRRGGG GGBBBBBA
        public ushort data;

        const ushort redOffset = 11;
        const ushort greenOffset = 6;
        const ushort blueOffset = 1;
        const ushort alphaOffset = 0;

        const ushort redMask = 31 << redOffset;
        const ushort greenMask = 31 << greenOffset;
        const ushort blueMask = 31 << blueOffset;
        const ushort alphaMask = 1;

        public byte red { get { return (byte)((data & redMask) >> redOffset); } }
        public byte green { get { return (byte)((data & greenMask) >> greenOffset); } }
        public byte blue { get { return (byte)((data & blueMask) >> blueOffset); } }
        public byte alpha { get { return (byte)((data & alphaMask) >> alphaOffset); } }
    }

The above code is an example from an implementation of an object representing a pixel in the R5G5B5A1 format; the corresponding property setters are omitted for brevity.
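For completeness, here is one way such setters might be written. This is a standalone sketch rather than the article's original implementation: the struct name and property names are made up, and it assumes each channel is supplied as an already-quantized value (0 to 31 for color, 0 or 1 for alpha).

```csharp
public struct PackedR5G5B5A1
{
    public ushort data;

    const int RedOffset = 11;
    const int GreenOffset = 6;
    const int BlueOffset = 1;
    const int FiveBits = 31; // 0b11111
    const int OneBit = 1;

    // Clears the channel's bits, then writes the new (masked) value in place.
    void Set(int value, int mask, int offset)
    {
        data = (ushort)((data & ~(mask << offset)) | ((value & mask) << offset));
    }

    public byte Red
    {
        get { return (byte)((data >> RedOffset) & FiveBits); }
        set { Set(value, FiveBits, RedOffset); }
    }

    public byte Green
    {
        get { return (byte)((data >> GreenOffset) & FiveBits); }
        set { Set(value, FiveBits, GreenOffset); }
    }

    public byte Blue
    {
        get { return (byte)((data >> BlueOffset) & FiveBits); }
        set { Set(value, FiveBits, BlueOffset); }
    }

    public byte Alpha
    {
        get { return (byte)(data & OneBit); }
        set { Set(value, OneBit, 0); }
    }
}
```

Values outside the channel's range are masked rather than clamped, which keeps the setters branch-free; clamping would be a reasonable alternative if inputs are not trusted.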

SetPixelData can be used to copy a full mip level of data into the target texture. GetPixelData will return a NativeArray that actually points to one mip level of Unity’s internal CPU texture data. This allows you to directly read/write that data without the need for any copy operations. The catch is that the NativeArray returned by GetPixelData is only guaranteed to be valid until the user code calling GetPixelData returns control to Unity, such as when MonoBehaviour.Update returns. Instead of storing the result of GetPixelData between frames, you have to get the correct NativeArray from GetPixelData for every frame you want to access this data from.
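The following sketch shows the per-frame pattern, assuming an RGBA32 texture without mipmaps so that Color32 matches the pixel layout (the class name and fill logic are placeholders):

```csharp
using Unity.Collections;
using UnityEngine;

public class DirectPixelAccess : MonoBehaviour
{
    Texture2D texture;

    void Start()
    {
        texture = new Texture2D(256, 256, TextureFormat.RGBA32, false);
    }

    void Update()
    {
        // Fetch the array again every frame instead of caching it in a field:
        // it is only guaranteed to be valid until Update returns.
        NativeArray<Color32> pixels = texture.GetPixelData<Color32>(0);

        // Writes go straight into Unity's CPU copy of mip 0, without extra copies.
        byte value = (byte)(Time.frameCount % 256);
        for (int i = 0; i < pixels.Length; i++)
            pixels[i] = new Color32(value, value, value, 255);

        // Upload the modified CPU data to the GPU.
        texture.Apply(false);
    }

    void OnDestroy()
    {
        Destroy(texture);
    }
}
```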

Apply

The Apply method returns after the CPU data has been uploaded to the GPU. The makeNoLongerReadable parameter should be set to “true” where possible to free up the memory of the CPU data after the upload.

RequestIntoNativeArray and RequestIntoNativeSlice

The RequestIntoNativeArray and RequestIntoNativeSlice methods asynchronously download GPU data from the specified Texture into (a slice of) a NativeArray provided by the user.

Calling the methods will return a request handle that can indicate if the requested data is done downloading. Support is limited to only a handful of formats, so use SystemInfo.IsFormatSupported with FormatUsage.ReadPixels to check format support. The AsyncGPUReadback class also has a Request method, which allocates a NativeArray for you. If you need to repeat this operation, you will get better performance if you allocate a NativeArray that you reuse instead.
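A sketch of the reuse pattern might look like the following. It assumes the source is a RenderTexture with an 8-bit-per-channel RGBA layout (so that one Color32 per pixel matches the data size); the class and field names are hypothetical.

```csharp
using Unity.Collections;
using UnityEngine;
using UnityEngine.Rendering;

public class AsyncReadbackExample : MonoBehaviour
{
    public RenderTexture source; // assumed: 4 bytes per pixel, assigned in the Inspector
    NativeArray<Color32> buffer; // allocated once and reused for every request
    bool requestInFlight;

    void Start()
    {
        buffer = new NativeArray<Color32>(source.width * source.height, Allocator.Persistent);
    }

    void Update()
    {
        // Only one request at a time, because they all write into the same buffer.
        if (requestInFlight) return;
        requestInFlight = true;
        AsyncGPUReadback.RequestIntoNativeArray(ref buffer, source, 0, OnReadback);
    }

    void OnReadback(AsyncGPUReadbackRequest request)
    {
        requestInFlight = false;
        if (request.hasError)
        {
            Debug.LogWarning("GPU readback failed");
            return;
        }

        // The buffer now holds the downloaded pixels and is safe to read.
        Color32 first = buffer[0];
        Debug.Log($"First pixel: {first}");
    }

    void OnDestroy()
    {
        // Make sure no request is still writing into the buffer before freeing it.
        AsyncGPUReadback.WaitAllRequests();
        if (buffer.IsCreated) buffer.Dispose();
    }
}
```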

Methods to use with caution

There are a number of methods that should be used with caution due to potentially significant performance impacts. Let’s take a look at them in more detail.

Pixel accessors with underlying conversions

These methods perform pixel format conversions of varying complexity. The Pixels32 variants are the most performant of the bunch, but even they can still perform format conversions if the underlying format of the texture doesn’t perfectly match the Color32 struct. This group includes the accessors measured in the examples above (GetPixel and SetPixel, GetPixels and SetPixels, GetPixels32 and SetPixels32). When using them, keep in mind that their cost grows with the number of pixels, and that it grows faster for some methods than for others.

Fast data accessors with a catch

GetRawTextureData and LoadRawTextureData are Texture2D-only methods that work with arrays containing the raw pixel data of all mip levels, one after another. The layout goes from the largest to the smallest mip, with each mip stored as “height” rows of “width” pixel values. These functions give quick access to CPU data. GetRawTextureData does have a “gotcha”: the non-templated variant returns a copy of the data. This is a bit slower, and does not allow direct manipulation of the underlying buffer managed by Unity. GetPixelData does not have this quirk; it always returns a NativeArray pointing to the underlying buffer, which remains valid until user code returns control to Unity.
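To make the layout concrete, the helper below computes where a given mip level starts inside such an array. It is a sketch under two assumptions that are not stated in the text above: the format is uncompressed with a fixed number of bytes per pixel, and the mips are tightly packed with each dimension halving (rounding down, minimum 1) per level. The class and method names are made up.

```csharp
public static class RawMipLayout
{
    // Returns the byte offset of the given mip level in a raw data array
    // that stores all mips back to back, largest first.
    public static int MipByteOffset(int width, int height, int bytesPerPixel, int mip)
    {
        int offset = 0;
        for (int i = 0; i < mip; i++)
        {
            // Size of mip i: each dimension is halved per level, never below 1.
            int w = System.Math.Max(1, width >> i);
            int h = System.Math.Max(1, height >> i);
            offset += w * h * bytesPerPixel;
        }
        return offset;
    }
}
```

For example, with a 4 x 4 RGBA32 texture (4 bytes per pixel), mip 0 occupies 64 bytes, so mip 1 starts at byte 64, and mip 2 starts at byte 80 after the 16 bytes of the 2 x 2 mip.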

ConvertTexture

ConvertTexture is a way to transfer the GPU data from one texture to another, where the source and destination textures don’t have the same size or format. This conversion process is as efficient as it gets under the circumstances, but it’s not cheap. This is the internal process:

  1. Allocate a temporary RenderTexture matching the destination texture.
  2. Perform a Blit from the source texture to the temporary RenderTexture.
  3. Copy the Blit result from the temporary RenderTexture to the destination texture.
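Written out by hand, the three steps above could be approximated as follows. This is only an illustration of the sequence, not Unity's internal code; it assumes the destination's format can be used as a render target and that CopyTexture is supported on the platform. The helper name is hypothetical.

```csharp
using UnityEngine;

public static class ManualConvert
{
    public static void Convert(Texture source, Texture2D destination)
    {
        // 1. Temporary RenderTexture matching the destination's size and format.
        RenderTexture temp = RenderTexture.GetTemporary(
            destination.width, destination.height, 0, destination.graphicsFormat);

        // 2. Blit resamples the source and converts it to the target format.
        Graphics.Blit(source, temp);

        // 3. Copy the result into the destination texture.
        Graphics.CopyTexture(temp, destination);

        RenderTexture.ReleaseTemporary(temp);
    }
}
```

Seen this way, it becomes clear why using a RenderTexture as the final destination is cheaper: steps 1 and 3 disappear and only the Blit remains.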

Answer the following questions to help determine if this method is suited to your use case:

  • Do I need to perform this conversion?
    • Can I make sure the source texture is created in the desired size/format for the target platform at import time?
    • Can I change my processes to use the same formats, allowing the result of one process to be directly used as an input for another process?
  • Can I create and use a RenderTexture as the destination instead? Doing so would reduce the conversion process to a single Blit to the destination RenderTexture.

ReadPixels

The ReadPixels method synchronously downloads GPU data from the active RenderTexture (RenderTexture.active) into a Texture2D’s CPU data. This enables you to store or process the output from a rendering operation. Support is limited to only a handful of formats, so use SystemInfo.IsFormatSupported with FormatUsage.ReadPixels to check format support.

Downloading data back from the GPU is a slow process. Before it can begin, ReadPixels has to wait for the GPU to complete all preceding work. It’s best to avoid this method because it does not return until the requested data is available, which stalls the calling thread in the meantime. Usability is also a concern because you need GPU data to be in a RenderTexture, which has to be configured as the currently active one. Both usability and performance are better when using the AsyncGPUReadback methods discussed earlier.
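For reference, a typical synchronous readback looks something like the sketch below, which also shows the bookkeeping around RenderTexture.active. The helper name is hypothetical, and the RGBA32 target format is an assumption that may need adjusting to match the source.

```csharp
using UnityEngine;

public static class ReadPixelsExample
{
    // Blocks until the GPU has finished rendering into 'source',
    // then copies its contents into the CPU data of a new Texture2D.
    public static Texture2D Capture(RenderTexture source)
    {
        var result = new Texture2D(source.width, source.height, TextureFormat.RGBA32, false);

        // ReadPixels always reads from the active RenderTexture,
        // so remember the previous one and restore it afterwards.
        RenderTexture previous = RenderTexture.active;
        RenderTexture.active = source;
        result.ReadPixels(new Rect(0, 0, source.width, source.height), 0, 0);
        RenderTexture.active = previous;

        // Only the CPU copy has been written at this point. Call result.Apply()
        // if the texture also needs to be sampled on the GPU.
        return result;
    }
}
```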

Methods to convert between image file formats

The ImageConversion class has methods to convert between Texture2D and several image file formats. LoadImage is able to load JPG, PNG, or EXR (since 2023.1) data into a Texture2D and upload this to the GPU for you. The loaded pixel data can be compressed on the fly depending on Texture2D’s original format. Other methods can convert a Texture2D or pixel data array to an array of JPG, PNG, TGA, or EXR data.

These methods are not particularly fast, but can be useful if your project needs to pass pixel data around through common image file formats. Typical use cases include loading a user’s avatar from disk and sharing it with other players over a network.
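As a sketch of the avatar use case, the helpers below write a texture to a PNG file and load an image file back. The class and method names are hypothetical; EncodeToPNG is assumed to be called on a readable, uncompressed texture.

```csharp
using System.IO;
using UnityEngine;

public static class AvatarIO
{
    // Encodes the texture's CPU data as PNG and writes it to disk.
    public static void SavePng(Texture2D texture, string path)
    {
        byte[] png = texture.EncodeToPNG();
        File.WriteAllBytes(path, png);
    }

    // Loads a JPG or PNG file into a new texture, or returns null on failure.
    public static Texture2D LoadImageFile(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);

        // The initial size is a placeholder; LoadImage resizes the texture
        // to match the dimensions of the image data.
        var texture = new Texture2D(2, 2);

        // Passing true marks the texture nonreadable after the GPU upload,
        // which frees the CPU copy.
        if (!texture.LoadImage(bytes, true))
        {
            Object.Destroy(texture);
            return null;
        }
        return texture;
    }
}
```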

Key takeaways and more advanced resources

There are many resources available to learn more about graphics optimization, related topics, and best practices in Unity. The graphics performance and profiling section of the documentation is a good starting point.

You can also check out several technical e-books for advanced users, including Ultimate guide to profiling Unity games, Optimize your mobile game performance, and Optimize your console and PC game performance.

You’ll find many more advanced best practices on the Unity how-to hub.

Here’s a summary of the key points to remember:

  • When manipulating textures, the first step is to assess which operations can be performed on the GPU for optimal performance. The existing CPU/GPU workload and size of the input/output data are key factors to consider.
  • Using low level functions like GetRawTextureData to implement a specific conversion path where necessary can offer improved performance over the more convenient methods that perform (often redundant) copies and conversions.
  • More complex operations, such as large readbacks and pixel calculations, are only viable on the CPU when performed asynchronously or in parallel. The combination of Burst and the job system allows C# to perform certain operations that would otherwise only be performant on a GPU.
  • Profile frequently: There are many pitfalls you can encounter during development, from unexpected and unnecessary conversions to stalls from waiting on another process. Some performance issues will only start surfacing as the game scales up and certain parts of your code see heavier usage. The example project demonstrates how seemingly small increases in texture resolution can cause certain APIs to become a performance issue.

Share your feedback on texture data with us in the Scripting or General Graphics forums. Be sure to watch for new technical blogs from other Unity developers as part of the ongoing Tech from the Trenches series.
