Vision and Language Navigation Using Minimal Voice Instructions
Keywords:
Indoor Navigation, Computer Vision, Natural Language Processing, Matterport 3D
Abstract
This work aims to design an algorithm that navigates any
3D-mapped environment in the Matterport 3D Simulator, given
only minimal voice instructions.
During the training phase, the nodes of a selected
environment are traversed sequentially in the Simulator and an
object recognition algorithm is applied on the panorama at each
node. This helps in identifying and tagging the objects in the
vicinity of each viewpoint. In the testing phase, a natural
language instruction specifying the goal location is taken as
input. The goal location is identified among the viewpoints
in the 3D environment by matching the instruction to the tags
generated in the training phase. A shortest path algorithm is
employed to navigate from the starting location to the goal
location. The proposed system focuses on the implementation of
this algorithm, which combines natural language processing and
computer vision and can be employed by agents for indoor
navigation.
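
The testing-phase pipeline described above (match the instruction to viewpoint tags, then plan a route) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the viewpoint IDs, tags, and edge distances are invented, the keyword-overlap matcher `find_goal` stands in for whatever language processing the system actually uses, and Dijkstra's algorithm is only one possible choice of shortest path algorithm, since the abstract does not name one.

```python
import heapq

# Hypothetical result of the training phase: object tags per viewpoint.
VIEWPOINT_TAGS = {
    "v0": {"door", "plant"},
    "v1": {"sofa", "television"},
    "v2": {"table", "chair"},
    "v3": {"bed", "lamp"},
}

# Hypothetical navigation graph: viewpoint -> {neighbor: distance}.
GRAPH = {
    "v0": {"v1": 2.0, "v2": 5.0},
    "v1": {"v0": 2.0, "v2": 1.5, "v3": 6.0},
    "v2": {"v0": 5.0, "v1": 1.5, "v3": 2.0},
    "v3": {"v1": 6.0, "v2": 2.0},
}


def find_goal(instruction, viewpoint_tags):
    """Return the viewpoint whose tags overlap most with the instruction words.

    Returns None if no tag appears in the instruction.
    """
    words = set(instruction.lower().split())
    best, best_score = None, 0
    for viewpoint, tags in viewpoint_tags.items():
        score = len(words & tags)
        if score > best_score:
            best, best_score = viewpoint, score
    return best


def shortest_path(graph, start, goal):
    """Dijkstra's algorithm; returns the list of viewpoints from start to goal,
    or None if the goal is unreachable."""
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == goal:
            break
        for neighbor, weight in graph[node].items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if goal not in dist:
        return None
    # Walk the predecessor links back from the goal to rebuild the route.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


goal = find_goal("go to the bed", VIEWPOINT_TAGS)
route = shortest_path(GRAPH, "v0", goal)
print(goal, route)
```

In this toy graph the instruction matches the viewpoint tagged with "bed", and the planner prefers the longer chain of short hops over the direct but costlier edges.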