With the release of ARKit and the iPhone X paired with Unity, developers have an easy-to-use set of tools to create beautiful and expressive characters. This opens up exploring the magic of real-time puppeteering for the upcoming “Windup” animated short, directed by Yibing Jiang.
Unity Labs and the team behind “Windup” have come together to see how far we could push Unity’s ability to capture facial animation in real time on a cinematic character. We also enlisted the help of Roja Huchez of Beast House FX for modeling and rigging of the blend shapes to help bring the character expressions to life.
What the team created is Facial AR Remote, a low-overhead way to capture performance using a connected device directly into the Unity editor. We found using the Remote’s workflow is useful not just for animation authoring, but also for character and blend shape modeling and rigging, creating a streamlined way to build your own animoji or memoji type interactions in Unity. This allows developers to be able to iterate on the model in the editor without needing to build to the device, removing time-consuming steps in the process.
We saw an opportunity to build new animation tools for film projects opening up a future of real-time animation in Unity. There was also a “cool factor” in using AR tools for authoring and an opportunity to continue to push Unity’s real-time rendering. As soon as we had the basics working with data coming from the phone to the editor, our team and everyone around our desks could not stop having fun puppeteering our character. We saw huge potential for this kind of technology. What started as an experiment quickly proved itself both fun and useful. The project quickly expanded into the current Facial AR Remote and feature set.
The team set out expanding the project with Unity’s goal of democratizing development in mind. We wanted the tools and workflows around AR blend shape animation to be easier to use and more available than what was currently available and traditional methods of motion capture. The Facial Remote let us build out some tooling for iterating on blend shapes within the editor without needing to create a new build just to check mesh changes on the phone. What this means is a user is able to take a capture of an actor’s face and record it in Unity. And that capture can be used as a fixed point to iterate and update the character model or re-target the animation to another character without having to redo capture sessions with your actor. We found this workflow very useful for dialing in expressions on our character and refining the individual blend shapes.
The remote is made up of a client phone app, with a stream reader acting as the server in Unity’s editor. The client is a light app that’s able to make use of the latest additions to ARKit and send that data over the network to the Network Stream Source on the Stream Reader GameObject. Using a simple TCP/IP socket and fixed-size byte stream, we send every frame of blendshape, camera and head pose data from the device to the editor. The editor then decodes the stream and to updates the rigged character in real time. To smooth out some jitter due to network latency, the stream reader keeps a tunable buffer of historic frames for when the editor inevitably lags behind the phone. We found this to be a crucial feature for preserving a smooth look on the preview character while staying as close as possible the real actor's current pose. In poor network conditions, the preview will sometimes drop frames to catch up, but all data is still recorded with the original timestamps from the device.
On the editor side, we use the stream data to drive the character for preview as well as baking animation clips. Since we save the raw stream from the phone to disk, we can continue to play back this data on a character as we refine the blend shapes. And since the save data is just a raw stream from the phone, we can even re-target the motion to different characters. Once you have a stream you’re happy with captured, you can bake the stream to an animation clip on a character. This is great since they can use that clip that you have authored like any other animation in Unity to drive a character in Mecanim, Timeline or any of the other ways animation is used.
With the Windup rendering tech demo previously completed, the team was able to use those high-quality assets to start our animation exploration. Since we were able to get a baseline up and running rather quickly, we had a lot of time to iterate on the blend shapes using the tools we were developing. Jitter, smoothing and shape tuning quickly became the major areas of focus for the project. The solves for the jittering were improved by figuring out the connection between frame rate and lag in frame processing as well as removing camera movement from the playback. Removing the ability to move the camera really focused the users on capturing the blend shapes and facilitated us being able to mount the phone in a stand.
Understanding the blend shapes and getting the most out of the blend shape anchors in ARKit is what required the most iteration. It is difficult to understand the minutia of the different shapes from the documentation. So much of the final expression comes from the stylization of the character and how the shapes combine in some expected ways. We found that shapes like the eye/cheek squint shapes and mouth stretch were improved by limiting the influence of the blend shape changes to specific areas of the face. For example, the cheek squint should have little to no effect on the lower eyelid, and the lower eyelid in the squint should have little to no effect on the cheek. It also does not help that we initially missed how the `mouthClosed` shape was a corrective pose to bring the lips closed with the `jawOpen` shape at 100%.
Using information from the Skinned Mesh Renderer to look at the values that made up my expression on any frame, then under- or over-driving those values really helped to dial in the blend shapes. We were able to quickly over or underdrive the current blend shapes and determine if any blend shapes needed to be modified, and by how much. This helped with one of the hardest things to do, getting the right character to a key pose, like the way we wanted the little girl to smile. This was really helped by being able to see what shapes make up a given pose and in this case, it was the amount mouth stretch right and left worked with the smile to give the final shape. We found it helps to think of the shapes the phone provided as little building blocks, not as some face pose a human could make in isolation.
At the very end of art production on the demo, we wanted to try an experiment to improve some of the animation on the character. Armed with the collective understanding of the blend shapes from ARKit, we tried modifying the base neutral pose of the character. Due to the stylization of the little girl character, there was an idea that the base pose of the character had the eyes too wide and a little too much base smile to the face. This left too little in the delta between eyes wide and base, with too wide a delta between base and closed. The effect of the squint blend shapes also needed to be better accounted for. The squint as it turns out seems to always be at ~60-70% when someone closes their eyes for the people we tested on. The change to the neutral pose paid off, and along with all the other work makes for the expressive and dynamic character you see in the demo.
Combining Facial AR Remote and the rest of the tools in Unity, there is no limit to the amazing animations you can create! Soon anyone will be able to puppeteer digital characters, be it kids acting out and recording their favorite characters then sharing with friends and family, game streamers adding extra life to their avatars, or opening up new avenues for professionals and hobbyists to make animated content for broadcast. Get started by downloading Unity 2018 and checking out setup instructions on Facial AR Remote’s github. The team and the rest of Unity look forward to the artistic and creative uses of Facial AR Remote our users will create.