Date: 2020-01-21

Training an artificial intelligence agent to do something like navigate a complex 3D world is computationally expensive and time-consuming. In order to better create these potentially useful systems, Facebook engineers derived huge efficiency benefits from, essentially, leaving the slowest of the pack behind.

It's part of the company's new focus on "embodied AI," meaning machine learning systems that interact intelligently with their surroundings. That could mean lots of things — responding to a voice command using conversational context, for instance, but also more subtle things like a robot knowing it has entered the wrong room of a house. Exactly why Facebook is so interested in that I'll leave to your own speculation, but the fact is they've recruited and funded serious researchers to look into this and related domains of AI work.

To create such "embodied" systems, you need to train them using a reasonable facsimile of the real world. One can't expect an AI that's never seen an actual hallway to know what walls and doors are. And given how slowly real robots move in real life, you can't expect them to learn their lessons there. That's what led Facebook to create Habitat, a set of simulated real-world environments meant to be photorealistic enough that what an AI learns by navigating them could also be applied to the real world.

Such simulators, which are common in robotics and AI training, are also useful because, being simulators, you can run many instances of them at the same time — for simple ones, thousands simultaneously, each one with an agent in it attempting to solve a problem and eventually reporting back its findings to the central system that dispatched it.

Unfortunately, photorealistic 3D environments use a lot of computation compared to simpler virtual ones, meaning that researchers are limited to a handful of simultaneous instances, slowing learning to a comparative crawl.

The Facebook researchers, led by Dhruv Batra and Erik Wijmans, the former a professor and the latter a PhD student at Georgia Tech, found a way to speed up this process by an order of magnitude or more. And the result is an AI system that can navigate a 3D environment from a starting point to a goal with a 99.9% success rate and few mistakes.

Simple navigation is foundational to a working "embodied AI" or robot, which is why the team chose to pursue it without adding any extra difficulties.

"It's the first task. Forget the question answering, forget the context — can you just get from point A to point B? When the agent has a map this is easy, but with no map it's an open problem," said Batra. "Failing at navigation means whatever stack is built on top of it is going to come tumbling down."

The problem, they found, was that the training systems were spending too much time waiting on slowpokes. Perhaps it's unfair to call them that — these are AI agents that for whatever reason are simply unable to complete their task quickly.

"It's not necessarily that they're learning slowly," explained Wijmans. "But if you're simulating navigating a one-bedroom apartment, it's much easier to do that than navigate a 10-bedroom mansion."

The central system is designed to wait for all its dispatched agents to complete their virtual tasks and report back. If a single agent takes 10 times longer than the rest, that means there's a huge amount of wasted time while the system sits around waiting so it can update its information and send out a new batch.
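To put a rough number on that waste, here is a minimal back-of-the-envelope sketch in Python. The function name and the batch of eight agents are made up for illustration and are not taken from Facebook's system; the only thing borrowed from the description above is that a round lasts as long as its slowest agent.

```python
def sync_round_stats(durations):
    """Return (wall_time, idle_fraction) for one synchronous round.

    Every agent must report back before the update, so the round lasts
    as long as the slowest agent; the others sit idle for the difference.
    """
    wall_time = max(durations)
    busy = sum(durations)                 # time actually spent simulating
    total = wall_time * len(durations)    # time the workers were reserved
    return wall_time, 1 - busy / total


# Hypothetical round: seven agents need 1 time unit, one needs 10.
durations = [1, 1, 1, 1, 1, 1, 1, 10]
wall_time, idle_fraction = sync_round_stats(durations)
print(wall_time, round(idle_fraction, 4))
```

In this invented batch the round takes 10 time units even though seven of the eight agents were done after one, so nearly four-fifths of the reserved compute time is spent waiting.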

This little explanatory GIF shows how, when one agent gets stuck, it keeps the others from learning from its experience.

The innovation of the Facebook team is to intelligently cut off these unfortunate laggards before they finish. After a certain amount of time in simulation, they're done, and whatever data they've collected gets added to the hoard.
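The effect of such a cutoff can be sketched with a toy simulation. This is only an illustration under stated assumptions, not the team's actual code: the fixed `step_budget`, the function name and the step counts are all hypothetical, and the real system may decide when to cut agents off differently.

```python
def collect_round(steps_needed, step_budget):
    """Simulate one round in which every agent stops at `step_budget`.

    steps_needed[i] is how many simulation steps agent i would take to
    finish its task. Returns (wall_time, steps_kept, finished), where
    steps_kept[i] is the experience agent i contributes and finished[i]
    says whether it actually completed the task.
    """
    steps_kept = [min(n, step_budget) for n in steps_needed]
    finished = [n <= step_budget for n in steps_needed]
    wall_time = max(steps_kept)  # the round ends when the last agent stops
    return wall_time, steps_kept, finished


# Hypothetical batch: three quick agents and one 10x straggler.
steps_needed = [100, 100, 100, 1000]

# A budget equal to the slowest agent means nobody is ever cut off.
no_cutoff = collect_round(steps_needed, step_budget=max(steps_needed))
with_cutoff = collect_round(steps_needed, step_budget=150)

for label, (wall_time, kept, _) in [("no cutoff", no_cutoff),
                                    ("cutoff at 150", with_cutoff)]:
    print(label, wall_time, sum(kept), sum(kept) / wall_time)
```

With these invented numbers, the round without a cutoff gathers 1,300 steps of experience in 1,000 time units, while the round with a cutoff gathers 450 steps in 150. Each round yields less data, but the system collects experience more than twice as fast and can update and send out the next batch far sooner.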

