
Last week, the team at Facebook AI open-sourced AI Habitat, a new simulation platform for embodied AI research. AI Habitat is designed to train embodied agents, such as virtual robots, in photo-realistic 3D environments.

The blog post reads, “Our goal in sharing AI Habitat is to provide the most universal simulator to date for embodied research, with an open, modular design that’s both powerful and flexible enough to bring reproducibility and standardized benchmarks to this subfield.”

Last week, the Facebook AI team also shared Replica, a data set of reconstructions of various indoor spaces, such as a staged apartment or a retail store. Currently, AI Habitat can run Replica’s state-of-the-art reconstructions and can also work with existing 3D assets created for embodied research, including the Gibson and Matterport3D data sets.


AI Habitat’s modular software stack is built around the principles of compatibility and flexibility. The blog reads, “We incorporated direct feedback from the research community to develop this degree of flexibility, and also pushed the state of the art in training speeds, making the simulator able to render environments orders of magnitude faster than previous simulators.”

The platform has already been tested and is now available. The Facebook team recently hosted an autonomous navigation challenge that ran on the platform. The winning teams will be awarded Google Cloud credits at the Habitat Embodied Agents workshop at CVPR 2019.

AI Habitat is also part of Facebook AI’s ongoing effort to create systems that rely less on the large annotated data sets used for supervised training.

The blog reads, “As more researchers adopt the platform, we can collectively develop embodied AI techniques more quickly, as well as realize the larger benefits of replacing yesterday’s training data sets with active environments that better reflect the world we’re preparing machine assistants to operate in.”

The Facebook AI researchers published a paper, Habitat: A Platform for Embodied AI Research, in April this year. The paper highlights the set of design requirements the team sought to fulfill. A few of these requirements are listed below:

Performant rendering engine: The team aimed for a resource-efficient rendering engine that produces multiple channels of visual information, including RGB (red, green, blue), depth, semantic instance segmentation, and surface normals, for multiple concurrently operating agents.

Scene dataset ingestion API: The platform needed to be agnostic to 3D scene data sets so that users can bring their own. The team therefore aimed for a dataset ingestion API.

Agent API: This lets users specify parameterized embodied agents with well-defined geometry, physics, and actuation characteristics.

Sensor suite API: This lets users specify an arbitrary number of parameterized sensors, including RGB, depth, contact, GPS, and compass sensors, attached to each agent.
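
As a rough illustration of what the agent and sensor suite requirements imply, the Python sketch below models an agent specification with an arbitrary number of attached, parameterized sensors. The class names, field names, and default values (SensorSpec, AgentSpec, radius_m, and so on) are hypothetical and are not taken from Habitat's actual API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SensorSpec:
    """Hypothetical description of one parameterized sensor."""
    name: str
    kind: str  # e.g. "rgb", "depth", "contact", "gps", "compass"
    resolution: Tuple[int, int] = (256, 256)  # (height, width); unused by non-visual sensors
    position: Tuple[float, float, float] = (0.0, 1.5, 0.0)  # offset from the agent origin


@dataclass
class AgentSpec:
    """Hypothetical description of an embodied agent: geometry, actuation, sensors."""
    radius_m: float = 0.1
    height_m: float = 1.5
    # Action name -> amount (metres for moves, degrees for turns); values are assumptions.
    actions: Dict[str, float] = field(default_factory=lambda: {
        "move_forward": 0.25, "turn_left": 10.0, "turn_right": 10.0})
    sensors: List[SensorSpec] = field(default_factory=list)

    def attach(self, sensor: SensorSpec) -> None:
        """Attach a sensor, rejecting duplicate names so observations stay unambiguous."""
        if any(s.name == sensor.name for s in self.sensors):
            raise ValueError(f"duplicate sensor name: {sensor.name}")
        self.sensors.append(sensor)

    def observation_keys(self) -> List[str]:
        """Names under which each sensor's reading would be reported."""
        return [s.name for s in self.sensors]


agent = AgentSpec()
agent.attach(SensorSpec(name="front_rgb", kind="rgb"))
agent.attach(SensorSpec(name="front_depth", kind="depth"))
agent.attach(SensorSpec(name="compass", kind="compass"))
print(agent.observation_keys())
```

Keeping sensors as plain data attached to the agent is one way to make both the agent and its sensor suite configurable without touching the simulator itself.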

AI Habitat features a stack of three layers

With AI Habitat, the team aims to retain the simulation-related benefits that past projects demonstrated, including faster experimentation and RL-based training, and to apply them to a widely compatible and realistic platform.

AI Habitat features a stack of three modular layers, each of which can be configured or even replaced to work with different kinds of agents, evaluation protocols, training techniques, and environments. The simulation engine, Habitat-Sim, forms the base of the stack and includes built-in support for existing 3D environment data sets such as Gibson and Matterport3D. Habitat-Sim also abstracts away the details of specific data sets so that they can be used across simulations.

Habitat-API, the second layer in AI Habitat’s software stack, is a high-level library that defines tasks such as visual navigation and question answering. This API incorporates additional data and configurations, and it simplifies and standardizes the training and evaluation of embodied agents.

The third and final layer of this platform is where users specify training and evaluation parameters, such as how difficulty might ramp across multiple runs and which metrics to focus on.
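
To make the layering concrete, here is a toy Python sketch of a three-layer stack: a simulator at the base, a task definition on top of it, and an experiment configuration that sets a difficulty ramp and the metrics to report. Everything in it (ToySimulator, NavigationTask, ExperimentConfig, the one-dimensional corridor) is an assumption made for illustration and does not reflect the real Habitat-Sim or Habitat-API interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


class ToySimulator:
    """Layer 1 stand-in: a 1-D corridor instead of a rendered 3D scene."""

    def __init__(self, length: int) -> None:
        self.length = length
        self.position = 0

    def reset(self) -> int:
        self.position = 0
        return self.position

    def step(self, action: str) -> int:
        # Only "forward" moves the agent; any other action leaves it in place.
        if action == "forward":
            self.position = min(self.position + 1, self.length)
        return self.position


class NavigationTask:
    """Layer 2 stand-in: defines the goal and the success criterion."""

    def __init__(self, sim: ToySimulator, goal: int, max_steps: int) -> None:
        self.sim, self.goal, self.max_steps = sim, goal, max_steps

    def run_episode(self, policy: Callable[[int], str]) -> Dict[str, float]:
        obs = self.sim.reset()
        steps = 0
        while steps < self.max_steps and obs != self.goal:
            obs = self.sim.step(policy(obs))
            steps += 1
        return {"success": float(obs == self.goal), "steps": float(steps)}


@dataclass
class ExperimentConfig:
    """Layer 3 stand-in: how difficulty ramps and which metrics to report."""
    goals: List[int]   # increasingly distant goals act as the difficulty ramp
    max_steps: int
    metrics: List[str]


def evaluate(config: ExperimentConfig,
             policy: Callable[[int], str]) -> List[Dict[str, float]]:
    results = []
    for goal in config.goals:
        task = NavigationTask(ToySimulator(length=10), goal, config.max_steps)
        episode = task.run_episode(policy)
        results.append({m: episode[m] for m in config.metrics})
    return results


config = ExperimentConfig(goals=[2, 5, 8], max_steps=6, metrics=["success", "steps"])
results = evaluate(config, policy=lambda obs: "forward")
print(results)
```

With a six-step budget, the always-forward policy reaches the goals at distance 2 and 5 but runs out of steps before the goal at distance 8. Because each layer only depends on the one below it, any of the three could be swapped out without rewriting the others.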

According to the researchers, the future of AI Habitat and embodied AI research lies in simulated environments that are indistinguishable from real life.

Replica data sets by FRL researchers

For Replica, the FRL (Facebook Reality Labs) researchers created a data set consisting of scans of 18 scenes that range in size from an office conference room to a two-floor house. The team also annotated the environments with semantic labels, such as “window” and “stairs,” including labels for individual objects, such as “book” or “plant.” To create the data set, FRL researchers used proprietary camera technology as well as a spatial AI technique based on simultaneous localization and mapping (SLAM) approaches. Replica captures the details in the raw video, reconstructing dense 3D meshes along with high-resolution, high-dynamic-range textures.

Any personal details that could identify an individual, such as family photos, were removed from the data used to generate Replica. The researchers had to manually fill in the small holes that are inevitably missed during scanning. They also used a 3D paint tool to apply annotations directly onto the meshes.

The blog reads, “Running Replica’s assets on the AI Habitat platform reveals how versatile active environments are to the research community, not just for embodied AI but also for running experiments related to CV and ML.”

Habitat Challenge for the embodied platform

The researchers held the Habitat Challenge in April and May this year, a competition focused on evaluating the task of goal-directed visual navigation. The aim was to demonstrate the utility of AI Habitat’s modular approach as well as its emphasis on 3D photo-realism.

Unlike traditional challenges, in which participants upload predictions for a task on a given benchmark, this challenge required participants to upload code. That code was then run on new environments that the agents had not seen before.

The top-performing teams were Team Arnold (a group of researchers from CMU) and Team Mid-Level Vision (a group of researchers from Berkeley and Stanford).

The blog further reads, “Though AI Habitat and Replica are already powerful open resources, these releases are part of a larger commitment to research that’s grounded in physical environments. This is work that we’re pursuing through advanced simulations, as well as with robots that learn almost entirely through unsimulated, physical training. Traditional AI training methods have a head start on embodied techniques that’s measured in years, if not decades.”

To learn more, check out Facebook AI’s blog post.
