Feature highlight: IL2CPP runtime performance improvements in Unity 2021.2

February 17, 2022 in Technology | 7 min. read

Did you know that leveraging the expressivity and safety of C# while writing code for Unity projects can help you reach maximum runtime performance on the platforms you’re targeting? That’s why Unity’s .NET Tech Group works diligently to update the foundational tech stack behind your scripts.

In Unity 2021, we’ve made a number of improvements to speed things up when using the IL2CPP scripting backend. Let’s take a closer look at some of the key changes that’ll make your code noticeably faster.

Delegate invocation

While the delegate invocation mechanism is a strength of C#, the CreateDelegate API is rather complex. Delegates can be open or closed, and can call instance or static methods via virtual or interface calls – there’s even a distinction between generic methods and methods on generic types! All these variations mean that the runtime must complete several checks to ensure that the proper method is called with the proper invocation.

In Unity 2021.2, IL2CPP precomputes, at compile time, just about everything needed to determine the correct call type. This means that open delegates require just two indirect call instructions, whereas closed delegates require only one. This is the same approach to delegate invocation used in .NET Framework and .NET Core.

As a result, delegate invocation is now faster than it was before, and much faster in some targeted benchmarks.
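
To make the open versus closed distinction concrete, here is a minimal sketch using the standard Delegate.CreateDelegate overloads. The class and method names (DelegateKinds, CreateClosed, CreateOpen) are made up for illustration, and the sketch only shows the C# side; it says nothing about the code IL2CPP generates for these calls.

```csharp
using System;
using System.Reflection;

public static class DelegateKinds
{
    // Getter for string.Length, an instance method used for both delegate kinds.
    private static readonly MethodInfo LengthGetter =
        typeof(string).GetProperty("Length").GetGetMethod();

    // Closed delegate: the target instance is bound when the delegate is created,
    // so the delegate itself takes no arguments.
    public static Func<int> CreateClosed(string target)
    {
        return (Func<int>)Delegate.CreateDelegate(typeof(Func<int>), target, LengthGetter);
    }

    // Open instance delegate: no target is bound, so the instance is supplied
    // as the first argument on every invocation.
    public static Func<string, int> CreateOpen()
    {
        return (Func<string, int>)Delegate.CreateDelegate(typeof(Func<string, int>), LengthGetter);
    }
}
```

Calling DelegateKinds.CreateClosed("hello")() and DelegateKinds.CreateOpen()("hello") both return the length of the string; the only difference is when the instance is supplied.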

Delegate Invocation Performance

Unnecessary boxing checks

Boxing in C# is the process of converting a value type to an object of type System.Object. Since this involves allocation of space on the managed heap, it’s not necessarily a speedy process. But Unity 2021.2 accelerates the IL2CPP runtime by removing some boxing to squeeze more performance out of this operation.

Nullable types must be boxed in a particular way at runtime, so every boxing operation has to determine whether the given type is nullable. By eliminating some unnecessary checks at runtime, we’ve improved performance for boxing, specifically in nullable and generic cases.
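
The special handling of nullable types is standard C# behavior, and a small sketch makes it visible. The class and method names below are hypothetical; the behavior shown is that of Nullable<T> boxing in general, not anything specific to IL2CPP.

```csharp
using System;

public static class NullableBoxing
{
    // Boxing a Nullable<T> with no value yields a null reference,
    // not a boxed Nullable<T> struct.
    public static bool BoxedEmptyIsNull()
    {
        int? empty = null;
        object boxed = empty;
        return boxed == null;
    }

    // Boxing a Nullable<T> that has a value yields a boxed T (here System.Int32),
    // so the Nullable<T> wrapper never appears on the heap.
    public static Type BoxedValueType()
    {
        int? value = 42;
        object boxed = value;
        return boxed.GetType();
    }
}
```

Because the result depends on whether the type being boxed is Nullable<T>, a runtime that cannot rule this out ahead of time has to check on every box operation.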

Boxing performance graph

Calls to generic virtual methods

Generic virtual methods are expressive features of C# that are difficult to implement efficiently. Unlike direct method calls, the compiler has little information on the target for generic virtual method calls at build time, meaning that it must locate the target at runtime.

In Unity 2021.2, we have doubled the performance of generic virtual and interface method calls, as shown in the following examples:

interface Interface
{
    T GetValue<T>();
}

class Base : Interface
{
    public virtual T GetValue<T>() { return default(T); }
}

class Derived : Base
{
    public override T GetValue<T>() { return default(T); }
}

// Static so that the second initializer can refer to the first.
private static Base obj = new Derived();
private static Interface iface = obj;

// Generic virtual call through the base class.
public void CallToVirtualGenericMemberFunction()
{
    obj.GetValue<int>();
}

// Generic virtual call through the interface.
public void CallToGenericInterfaceMemberFunction()
{
    iface.GetValue<int>();
}

Generic Virtual Method Performance
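
As a rough illustration of why the target has to be found at runtime, the sketch below uses made-up types (IDescriber, BaseDescriber, DerivedDescriber) whose overrides return different strings. The static type at the call site is the base class or the interface, yet the override on the runtime type is the one that executes, for whichever T the caller picks.

```csharp
using System;

public interface IDescriber
{
    string Describe<T>();
}

public class BaseDescriber : IDescriber
{
    public virtual string Describe<T>() { return "Base<" + typeof(T).Name + ">"; }
}

public class DerivedDescriber : BaseDescriber
{
    public override string Describe<T>() { return "Derived<" + typeof(T).Name + ">"; }
}

public static class GenericVirtualDemo
{
    // Call through a base-class reference: dispatches to the derived override.
    public static string ViaBase()
    {
        BaseDescriber obj = new DerivedDescriber();
        return obj.Describe<int>();
    }

    // Call through an interface reference: also dispatches to the derived override.
    public static string ViaInterface()
    {
        IDescriber iface = new DerivedDescriber();
        return iface.Describe<int>();
    }
}
```

Both methods return "Derived<Int32>", even though neither call site mentions DerivedDescriber.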

Enum.HasFlag

C# enum types with the [Flags] attribute are a convenient way to represent different possible combinations of options. Code that checks for a given value in a flags enum often uses the Enum.HasFlag method. In Unity 2021.2, IL2CPP optimizes calls to this method, making them more than 100 times faster.

For example, a benchmark like this:

public void CallToEnumHasFlag() { _enum.HasFlag(_flag); }

…goes from this generated code…

IL2CPP_EXTERN_C IL2CPP_METHOD_ATTR void EnumHasFlag_CallToEnumHasFlag_m5819FE655D569D7AF856B879164E6416EFEFC30E (EnumHasFlag_t72757859AA4C348BBEE0A64FDBB747AFCFE326C2 * __this, const RuntimeMethod* method)
{
    static bool s_Il2CppMethodInitialized;
    if (!s_Il2CppMethodInitialized)
    {
        il2cpp_codegen_initialize_runtime_metadata((uintptr_t*)&MyEnum_t67450C4DBC081C689DA95C351B619398929DC7A1_il2cpp_TypeInfo_var);
        s_Il2CppMethodInitialized = true;
    }
    {
        int32_t L_0 = __this->____enum_0;
        int32_t L_1 = L_0;
        RuntimeObject * L_2 = Box(MyEnum_t67450C4DBC081C689DA95C351B619398929DC7A1_il2cpp_TypeInfo_var, &L_1);
        int32_t L_3 = __this->____flag_1;
        int32_t L_4 = L_3;
        RuntimeObject * L_5 = Box(MyEnum_t67450C4DBC081C689DA95C351B619398929DC7A1_il2cpp_TypeInfo_var, &L_4);
        NullCheck((Enum_t2A1A94B24E3B776EEF4E5E485E290BB9D4D072E2 *)L_2);
        bool L_6;
        L_6 = Enum_HasFlag_m15293B523AA7BA15272699C7304E908106AD7F7B((Enum_t2A1A94B24E3B776EEF4E5E485E290BB9D4D072E2 *)L_2, (Enum_t2A1A94B24E3B776EEF4E5E485E290BB9D4D072E2 *)L_5, NULL);
        return;
    }
}

…to this generated code.

IL2CPP_EXTERN_C IL2CPP_METHOD_ATTR void EnumHasFlag_CallToEnumHasFlag_m5819FE655D569D7AF856B879164E6416EFEFC30E (EnumHasFlag_t72757859AA4C348BBEE0A64FDBB747AFCFE326C2 * __this, const RuntimeMethod* method)
{
    {
        int32_t L_0 = __this->____enum_0;
        int32_t L_1 = L_0;
        int32_t L_2 = __this->____flag_1;
        int32_t L_3 = L_2;
        bool L_4 = il2cpp_codegen_enum_has_flag(L_1, L_3);
        return;
    }
}

Unnecessary boxing and null checks have been removed, which allows the C++ compiler to fully optimize the code. In this case, the microbenchmark went from 162 ns to 1.18 ns.
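
For readers less familiar with flags enums, the following sketch shows what Enum.HasFlag computes. The RenderOptions enum and the FlagChecks helpers are hypothetical names used only for this example. The .NET documentation describes the result of HasFlag as equivalent to the bitwise test (value & flag) == flag; whether il2cpp_codegen_enum_has_flag is implemented exactly this way is an assumption, since its body is not shown above.

```csharp
using System;

[Flags]
public enum RenderOptions
{
    None = 0,
    Shadows = 1 << 0,
    Bloom = 1 << 1,
    MotionBlur = 1 << 2
}

public static class FlagChecks
{
    // The convenient form: reads well, but is an instance method on System.Enum.
    public static bool ViaHasFlag(RenderOptions value, RenderOptions flag)
    {
        return value.HasFlag(flag);
    }

    // The equivalent bitwise form, operating directly on the underlying integers.
    public static bool ViaBitwise(RenderOptions value, RenderOptions flag)
    {
        return (value & flag) == flag;
    }
}
```

With a value of RenderOptions.Shadows | RenderOptions.Bloom, both helpers report that Bloom is set and MotionBlur is not.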

Enum.HasFlag Performance graph

Constrained calls

Constrained calls are also useful C# features. They enable developers to take what would normally be considered expensive operations, like virtual method calls, and make them much less expensive. But how? 

They essentially provide “hints” to the runtime about the way specific calls will be made. IL2CPP now picks up on more of these hints, to convert expensive calls into cheap direct calls.  

So, suppose your code has a value type like this:

private struct SimpleValueType { }

Along with a generic method that calls the virtual Equals method from System.Object on any type:

private static void Equals<T>(T t) { t.Equals(null); }

Then, a benchmark like this...

public void CallToEqualsValueType() { Equals(_simpleValueType); }

…will become 10 times faster.

This is because IL2CPP recognizes that it does not need to make a virtual call here, and instead makes a direct method call. As you can see, many of our constrained call benchmarks show tremendous improvement.
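
The sketch below is one way to see a constrained call from the C# side. The Counter struct and ConstrainedCallDemo class are invented for this example. When a method is called on a value of generic type T, the C# compiler emits the call with the IL constrained. prefix; if T turns out to be a value type that overrides the method, the override can be invoked directly on the value. How IL2CPP translates this into C++ is not shown here.

```csharp
using System;

public struct Counter
{
    // Counts how often the override runs, to show that it is the method reached.
    public static int EqualsCalls;

    public int Value;

    public override bool Equals(object other)
    {
        EqualsCalls++;
        return other is Counter c && c.Value == Value;
    }

    public override int GetHashCode()
    {
        return Value;
    }
}

public static class ConstrainedCallDemo
{
    // The call to Equals on a value of type T is emitted as a constrained call.
    public static bool CallEquals<T>(T value, object other)
    {
        return value.Equals(other);
    }
}
```

Calling ConstrainedCallDemo.CallEquals(new Counter { Value = 3 }, new Counter { Value = 3 }) returns true and increments Counter.EqualsCalls once, confirming that the struct's own override is the target.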

Constrained Call performance graph

Try it out today

As with any performance analysis, we encourage you to profile your scripting code for a better understanding of it. The improvements here are all expressed via targeted benchmarks, but of course, performance characteristics vary widely across projects. We’d love to hear about your experience. Please join us on the Unity forum to share your project performance analysis, improvements, and challenges faced along the way.