Designing computer vision solutions for home interior applications is complex, as gathering diverse and accurate data can be expensive, time-consuming, and riddled with privacy concerns. Read on to find out how Unity’s synthetic dataset generation tools and services enable the development of more capable computer vision applications for the home while mitigating roadblocks and challenges.
One of the most challenging aspects of building performant computer vision (CV) models is curating datasets with sufficient diversity and accurate labeling. Recently, synthetic datasets have emerged as a viable solution to these issues by minimizing the need for costly and time-consuming acquisition and annotation of real data.
Unity has been at the forefront of this shift to synthetic data. One of the key areas where our customers have sought our expertise in building synthetic training datasets is home interiors. This area covers a variety of applications in home automation, security, assistive technologies, healthcare, pet and baby monitoring, home interior design, and more.
For home interior applications, it is particularly challenging to acquire labeled datasets based on real homes due to privacy concerns. This is compounded by training datasets requiring a high degree of diversity in elements such as materials, colors, lighting, and furniture.
To help these customers, we have been developing tools and 3D content libraries for generating photorealistic home environments. To achieve a high level of variation in the data, we use procedural furniture placement and numerous randomized elements, such as camera positions, materials, lighting, time of day, sky and outdoor environment, and placement of additional custom objects into the environment. The image below depicts examples for four of these randomizations.
Our Home Interiors dataset is a small example of what we can generate with these tools. This dataset includes semantic and instance segmentation images (with labels for furniture, fixtures, walls, ceilings, and floors), as well as 2D and 3D bounding box annotations for furniture. Having several types of labels enables a wide range of CV tasks that may not be possible with datasets that are limited to one or two kinds of ground truth.
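To illustrate how different label types relate to one another, here is a minimal Python sketch that derives 2D bounding boxes from an instance segmentation mask. It is not the dataset's actual tooling or file format: the mask encoding (a grid of integer instance IDs, with 0 as background) and the function name `boxes_from_instance_mask` are assumptions made for this example.

```python
from typing import Dict, List, Tuple

# (x_min, y_min, x_max, y_max) in inclusive pixel coordinates.
Box = Tuple[int, int, int, int]


def boxes_from_instance_mask(
    mask: List[List[int]], background: int = 0
) -> Dict[int, Box]:
    """Return the tight 2D bounding box of every instance ID in the mask.

    Assumes (hypothetically) that each pixel stores an integer instance ID
    and that `background` marks pixels belonging to no instance.
    """
    boxes: Dict[int, Box] = {}
    for y, row in enumerate(mask):
        for x, instance_id in enumerate(row):
            if instance_id == background:
                continue
            if instance_id not in boxes:
                # First pixel seen for this instance: box is a single pixel.
                boxes[instance_id] = (x, y, x, y)
            else:
                # Grow the existing box to include this pixel.
                x0, y0, x1, y1 = boxes[instance_id]
                boxes[instance_id] = (
                    min(x0, x), min(y0, y), max(x1, x), max(y1, y)
                )
    return boxes


# Tiny 4x3 example: instance 1 is a sofa-like blob, instance 2 a lamp-like column.
example_mask = [
    [0, 1, 1, 0],
    [0, 1, 0, 2],
    [0, 0, 0, 2],
]
example_boxes = boxes_from_instance_mask(example_mask)
```

Because synthetic labels come from the same rendered scene, relationships like this one hold exactly, which is harder to guarantee with hand-annotated real images.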
We will now explore a few home interior use cases where we have been helping customers overcome the difficulties of acquiring real datasets.
Computer vision has gained immense traction with devices that automate and coordinate various tasks in the home. In many domains, applications that previously used sensor arrays are now shifting to CV-powered cameras in order to perform tasks more effectively and completely.
A good example is the smart robotic vacuum cleaner. Although these devices were first created almost 30 years ago, one of their main shortcomings remains the inability to distinguish between trash and other objects on the ground. In addition, they have difficulty navigating areas with many obstacles. Unity has allowed us to create bespoke solutions to these problems. For instance, using Unity’s physics engine, we can realistically place a variety of fabric objects with randomized folds and creases on the ground. Such a dataset can be used to teach a vacuum cleaner to identify clothes and avoid them.
Smart cameras are another area where we have worked with our customers to help improve their CV models using synthetic data. These cameras need to detect the presence of people in the room to control and personalize elements such as lighting and temperature accordingly. One example task in this domain where our tools have proven to be valuable is teaching a model to identify pets. For this purpose, one needs a large variety of pet images with varied poses, animations, fur colors, and more, within home environments. Unity’s capabilities as a game engine mean there is already a wide array of built-in features that make this task possible, including tools for animating skeletal models such as those of pets and simulating effects such as motion blur to achieve a more realistic output.
The synthetic approach also makes it easier to generate data that matches a real camera’s characteristics in terms of position and look direction, perspective projection, aspect ratio, lens distortion, and other image properties (contrast, saturation, etc.). For instance, in a synthetic setup, simulating night vision camera data is as easy as simulating an ordinary RGB camera. Furthermore, the quick iteration times with synthetic data mean you can easily switch your dataset from night vision to RGB if you decide to change your real camera.
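To make the camera characteristics above concrete, the following Python sketch projects a 3D point through a pinhole camera with a simple two-coefficient radial distortion term. This is a textbook model used here for illustration, not a description of how Unity renders images. The class and function names, and all numeric values, are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class CameraIntrinsics:
    """Hypothetical container for the parameters of a real camera to match."""
    fx: float          # focal length in pixels (horizontal)
    fy: float          # focal length in pixels (vertical)
    cx: float          # principal point x, in pixels
    cy: float          # principal point y, in pixels
    k1: float = 0.0    # first radial distortion coefficient
    k2: float = 0.0    # second radial distortion coefficient


def project_point(
    point_cam: Tuple[float, float, float], cam: CameraIntrinsics
) -> Tuple[float, float]:
    """Project a point given in camera coordinates (z forward) to pixels."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    # Perspective division onto the normalized image plane.
    xn, yn = x / z, y / z
    # Radial distortion scales points by their squared distance from the center.
    r2 = xn * xn + yn * yn
    scale = 1.0 + cam.k1 * r2 + cam.k2 * r2 * r2
    u = cam.fx * xn * scale + cam.cx
    v = cam.fy * yn * scale + cam.cy
    return u, v


ideal = CameraIntrinsics(fx=500.0, fy=500.0, cx=320.0, cy=240.0)
barrel = CameraIntrinsics(fx=500.0, fy=500.0, cx=320.0, cy=240.0, k1=-0.2)
```

Matching a different real camera then amounts to swapping in a different set of intrinsics and regenerating the dataset.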
Another area where CV has taken center stage involves solutions that strive to keep our homes safe or increase our quality of life through healthcare and assistive technologies.
To help the visually impaired safely navigate their homes, we need models that can form a holistic understanding of the layout of a house and the objects in it. Such models would need to identify safe paths in a home by detecting elements such as furniture, walls, staircases, and doors. In addition, the camera needs to know how far away walls and other obstacles are in order to warn the user if they get too close. To tackle this problem, we use a variety of single-story and multi-story home layouts and randomize the placement and types of furniture, as well as the degree to which doors and windows are open. In addition to dataset diversity, this level of complexity requires multiple types of ground truth to train on, including segmentation, bounding boxes, and depth maps, making it all the more difficult to compile highly diverse and accurately labeled real datasets.
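As a rough illustration of how depth information could drive a proximity warning, here is a minimal Python sketch. It assumes a depth map stored as rows of distances in meters, with 0.0 marking pixels that have no valid reading. The function names and the 0.5 m threshold are hypothetical choices for this example, not part of any product described here.

```python
from typing import List, Optional


def nearest_obstacle_m(
    depth_map: List[List[float]], invalid: float = 0.0
) -> Optional[float]:
    """Return the smallest valid depth in meters, or None if none is valid."""
    valid = [d for row in depth_map for d in row if d > invalid]
    return min(valid) if valid else None


def should_warn(depth_map: List[List[float]], threshold_m: float = 0.5) -> bool:
    """Warn when the closest surface is nearer than the threshold."""
    nearest = nearest_obstacle_m(depth_map)
    return nearest is not None and nearest < threshold_m


# A tiny 3x3 depth map: a wall about 2 m away and a chair leg at 0.4 m.
example_depth = [
    [2.1, 2.0, 2.1],
    [2.0, 0.4, 2.0],
    [0.0, 2.0, 2.1],  # 0.0 marks a pixel with no depth reading
]
```

A real system would more likely operate on depth predicted by a trained model and restrict the check to the user's direction of travel. The sketch only shows the final thresholding step.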
Synthetic home interiors can also serve as a realistic environment for creating models that perceive humans in home settings. We achieve this by adding 3D human models into the simulation and randomizing their pose, height, build, hair, skin, etc. These CV models can be utilized to monitor a person’s physical routines and detect anomalies in elements such as walking gait and dexterity. Systems powered by such models could also alert emergency services in case a user falls or does not move for long periods of time, especially in unusual places in the home that are not typical sleeping spots. Similarly, such a system can detect whether a user has forgotten to take their medication and remind them, by learning the usual body and arm movements involved in taking a specific type of medicine.
Machine learning has recently helped computers develop inklings of artistic vision, and interior design is one example. Some online retailers already offer users the ability to visualize products in their homes using smartphone cameras. This is achieved with CV models that can identify and measure the extent and area of the various surfaces (floor, tables, walls) in a room so as to realistically display virtual products on them. This is just a small example of the wide range of possibilities for augmented reality that CV unlocks.
For instance, one of the use cases we have worked on is detecting the materials and the existing interior design style of a home based on pictures uploaded by the user. A model with this kind of capability can be utilized in an online recommendation system, helping users quickly find products that match their homes. What makes this possible is Unity’s physically based rendering engine, which allows us to realistically render a wide range of materials such as different woods and laminates. In addition, our bespoke randomization tools make it easy to modify the various parameters of these materials in each generated frame.
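The idea of varying material parameters per frame can be sketched in a few lines of Python. This is an illustrative sketch only and does not reflect the interface of Unity's randomization tools. The material names, base colors, parameter ranges, and the `randomize_material` function are all assumptions made for this example.

```python
import random
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MaterialParams:
    """Hypothetical subset of physically based material parameters."""
    base_color: Tuple[float, float, float]  # RGB, each channel in [0, 1]
    smoothness: float                       # 0 = rough, 1 = mirror-like
    metallic: float                         # 0 for woods and laminates


# Illustrative base colors only; not measured from real materials.
BASE_COLORS = {
    "oak": (0.72, 0.56, 0.38),
    "walnut": (0.36, 0.25, 0.18),
    "white_laminate": (0.92, 0.92, 0.90),
}


def randomize_material(
    name: str, rng: random.Random, color_jitter: float = 0.05
) -> MaterialParams:
    """Draw one randomized variant of a named material for a single frame."""

    def jitter(channel: float) -> float:
        # Perturb the channel slightly and clamp it back into [0, 1].
        value = channel + rng.uniform(-color_jitter, color_jitter)
        return min(1.0, max(0.0, value))

    r, g, b = BASE_COLORS[name]
    return MaterialParams(
        base_color=(jitter(r), jitter(g), jitter(b)),
        smoothness=rng.uniform(0.2, 0.8),
        metallic=0.0,
    )


# Seeding the generator makes a run reproducible.
frame_materials = [randomize_material("oak", random.Random(seed)) for seed in range(3)]
```

Using a seeded generator means a given frame can be regenerated exactly, which helps when debugging a model's failure on a specific image.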
CV models can also detect the general shape and size of furniture objects present in a room in order to recommend and accurately visualize new furniture of similar size in the users’ homes. In a similar vein, a color scheme visualizer can be built that virtually assigns various colors to the users’ furniture, helping them decide which color palette to choose if they want to give their homes a fresh look.
The applications we explored here are just a fraction of what this technology enables. Synthetic data unlocks the ability to improve model performance in a wide variety of CV tasks. Some additional examples include mapping the interiors of older buildings in order to build high-accuracy plans for remodeling, recognizing locations, and procedurally generating synthetic environments. The ability to customize a synthetic dataset to specifically target your needs means that applications are now limited only by your imagination. While the focus of this post was on home interiors, our procedural tools have the capability to generate and randomize a large variety of environments.
Contact our team of experts to discuss a custom synthetic dataset for your home interior computer vision needs. If your application extends beyond the home, we support a wide array of customization options for your specific environments and labeling needs. Download the Home Interiors dataset to see an example of what a dataset could look like.