Role of AI in Robot Localization and Mapping

Artificial intelligence (AI) techniques, particularly deep learning (DL), are contributing significantly to advances in robot localization and mapping (RLAM). By analyzing data from sensors such as cameras and light detection and ranging (LiDAR), AI algorithms help build detailed maps and accurately estimate robot location, empowering robots to autonomously perform complex tasks and navigate diverse environments. AI-based localization and mapping (LAM) thus facilitates the development of more independent and intelligent robots for a range of applications. This article discusses the application of AI techniques, specifically DL, to various aspects of RLAM.

Image credit: Zapp2Photo/Shutterstock

Introduction to RLAM

Robots that must autonomously make decisions need to effectively perceive their environment and determine their system states using onboard sensors. Robust and precise localization, together with the constant processing of new information and adaptation to different scenarios, is essential to enable a high level of autonomy in robots.

Localization is the ability to obtain the internal system states of the robot's motion, including velocities, orientations, and locations, while mapping is the ability to perceive the external environmental states and capture the semantics, appearance, and geometry of a two-dimensional (2D) or three-dimensional (3D) scene. Localization and mapping can act independently to sense the internal or external states, or jointly as simultaneous localization and mapping (SLAM) to track pose and develop a consistent environmental model in a global frame.

Importance of DL in RLAM

Recently, DL-based LAM has received significant attention as it solves the RLAM problem in a data-driven manner and serves as an alternative to hand-designed algorithms developed by exploiting geometric theories or physical models.

Learning methods leverage highly expressive deep neural networks (DNNs) as universal approximators and discover task-relevant features automatically, which enables learned models to become resilient to circumstances like inaccurate camera calibration, motion blur, dynamic lighting conditions, and featureless areas.

Additionally, learning approaches connect abstract elements with human-understandable terms, such as semantic labeling in SLAM. Learning methods also enable spatial machine intelligence systems to learn from previous experience and exploit new information proactively.

Developing a generic data-driven model eliminates the need for human effort to specify mathematical and physical rules to solve the domain-specific problem before deployment. This ability allows learning machines to discover new computational solutions automatically, develop themselves, and improve their models in new scenarios or while confronting unknown circumstances. Learned representations also effectively support high-level tasks like decision-making and path planning by constructing task-driven maps.

Moreover, DNNs allow full exploitation of the sensor data and computational power. The numerous parameters within a DNN are optimized automatically by minimizing a loss function over large training datasets using backpropagation and gradient-descent algorithms.
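The optimization loop described above can be illustrated with a minimal sketch: a single-layer linear "network" fit to noisy data by gradient descent on a mean-squared loss. All names and data here are illustrative; real DL-based LAM models have millions of parameters and backpropagate through many layers, but the principle is the same.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))          # 200 samples, 3 input features
true_w = np.array([0.5, -1.2, 2.0])    # "ground truth" the model must recover
y = X @ true_w + 0.01 * rng.normal(size=200)

w = np.zeros(3)                        # model parameters, initialized at zero
lr = 0.1                               # learning rate
for _ in range(500):
    pred = X @ w
    grad = 2.0 / len(y) * X.T @ (pred - y)   # gradient of the MSE loss
    w -= lr * grad                           # gradient-descent update

final_loss = float(np.mean((X @ w - y) ** 2))
```

After training, `w` closely approximates `true_w` and the loss shrinks to roughly the noise floor, mirroring how a DNN's parameters converge as the loss is minimized.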

Existing DL-based LAM approaches are categorized into odometry estimation, global localization, mapping, and SLAM.

Odometry Estimation

Odometry estimation involves calculating the relative change in pose (rotation and translation) between two or more frames of sensor data. It continuously tracks self-motion; the pose changes are then integrated from an initial state to derive a global pose in terms of position and orientation.

Odometry estimation provides pose information and serves as an odometry motion model within the robot control feedback loop. For instance, visual odometry measures a camera's ego-motion and integrates the relative motion between images into global poses.
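The integration step above can be sketched in a few lines: relative pose changes (dx, dy, dθ) from an odometry front-end, expressed in the robot frame, are chained into a global planar pose. The SE(2) representation and names here are illustrative, not from any specific paper.

```python
import numpy as np

def compose(pose, delta):
    """Apply a relative motion (in the robot frame) to a global (x, y, theta) pose."""
    x, y, th = pose
    dx, dy, dth = delta
    return (x + dx * np.cos(th) - dy * np.sin(th),
            y + dx * np.sin(th) + dy * np.cos(th),
            th + dth)

pose = (0.0, 0.0, 0.0)                           # initial global pose
relative_motions = [(1.0, 0.0, np.pi / 2)] * 4   # drive a 1 m square
for delta in relative_motions:
    pose = compose(pose, delta)
# After four legs with 90-degree turns, the robot is back at the origin.
```

Because each relative estimate is composed onto the previous global pose, any per-step error also accumulates, which is exactly the drift that mapping and global localization are later used to correct.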

DL methods extract high-level feature representations from images to solve the visual odometry problem. For instance, DeepVO combines a convolutional neural network (ConvNet) with a recurrent neural network (RNN) to enable end-to-end learning of visual odometry. In mobile robotics, the integration of inertial and visual data as visual-inertial odometry (VIO) is a well-defined problem, and a DNN framework has been proposed to realize VIO in an end-to-end manner.

VINet utilizes a ConvNet-based visual encoder to extract visual features from two consecutive red, green, blue (RGB) images. Additionally, it uses a long short-term memory (LSTM) inertial encoder to extract inertial features from a sequence of inertial measurement unit (IMU) data.

A study trained a DNN to regress linear velocities from inertial data, calibrated the collected accelerations to satisfy the learned velocity constraints, and doubly integrated the accelerations into locations using a conventional physical model. Data-driven methods similarly leverage DNNs to learn a mapping function from point-cloud scan sequences to pose estimates, solving LiDAR odometry in an end-to-end manner.
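The "doubly integrate accelerations into locations" step relies on the standard physical model of motion. The sketch below shows only that integration on made-up 1D data; in the study described above, the accelerations would first be calibrated against the DNN-predicted velocities.

```python
import numpy as np

dt = 0.01                                   # IMU sample period (s)
t = np.arange(0, 2, dt)                     # 2 seconds of samples
accel = np.full_like(t, 0.5)                # constant 0.5 m/s^2 (toy data)

velocity = np.cumsum(accel) * dt            # first integration:  a -> v
position = np.cumsum(velocity) * dt         # second integration: v -> p
# Analytic check: p(t) = 0.5 * a * t^2 = 1.0 m at t = 2 s,
# up to the discretization error of the rectangle rule.
```

The double integration also explains why raw inertial navigation drifts quadratically: any constant bias in `accel` grows as t² in `position`, which is why the learned velocity constraint is valuable.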

Mapping

Mapping reconstructs and builds a consistent model that describes the surrounding environment. Mapping is used to provide environment information for high-level robot tasks, retrieve the query observation for global localization, and limit the error drift of odometry estimation.

DL is utilized as an important tool to discover scene semantics and geometry from high-dimensional raw data for mapping. DL-based mapping methods are categorized into general, semantic, and geometric mapping, based on whether the neural network learns the explicit semantics or geometry of a scene, or encodes the scene into an implicit neural representation.

For instance, a convolutional decoder based on an octree-based formulation has been designed to enable scene reconstruction at a significantly higher resolution. Deep autoencoders identify the high-level compact representation of high-dimensional data. For instance, CodeSLAM encodes observed images into an optimizable and compact representation that contains the essential information of a dense scene.
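The idea of encoding a dense scene into a compact, optimizable code can be illustrated with a linear stand-in: compressing a set of flattened 16×16 "depth maps" into 8-dimensional codes via a truncated SVD and reconstructing them. This is only a toy analogue; CodeSLAM itself uses a learned convolutional autoencoder, and all data here is synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
basis = rng.normal(size=(8, 256))           # 8 latent scene patterns
codes_true = rng.normal(size=(100, 8))
depth_maps = codes_true @ basis             # 100 maps, each a flattened 16x16 grid

# Fit a linear "autoencoder": the top-8 right singular vectors of the data.
U, s, Vt = np.linalg.svd(depth_maps, full_matrices=False)
encode = lambda x: x @ Vt[:8].T             # depth map -> compact 8-dim code
decode = lambda c: c @ Vt[:8]               # code -> reconstructed depth map

codes = encode(depth_maps)
recon = decode(codes)
recon_error = float(np.max(np.abs(recon - depth_maps)))
```

Because the code is low-dimensional, a SLAM back-end can treat it as an optimizable variable alongside camera poses, which is the key property CodeSLAM exploits.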

Global Localization

Global localization retrieves the mobile agent's global pose in a known scene using prior knowledge. This is realized by matching the query input data with a pre-built 2D or 3D map, a previously visited scene, or other spatial references.

Global localization can be leveraged to solve the 'kidnapped robot' problem and to reduce the pose drift of a dead-reckoning system. DL is employed to address the data-association problem, which is complicated by changes in scene dynamics, weather, illumination, and viewpoint between the map and the query data.

In DL-based approaches, a pre-trained ConvNet model is reused to extract image-level features, which are then used to evaluate similarity against other images.
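The retrieval step can be sketched as follows: image-level feature vectors (random stand-ins here for real ConvNet descriptors) are compared by cosine similarity, and the most similar database image gives the coarse location. All names and data are illustrative.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(2)
database = rng.normal(size=(50, 128))       # 50 mapped places, 128-d features each
# A noisy revisit of place 17: same descriptor plus small perturbation.
query = database[17] + 0.05 * rng.normal(size=128)

scores = [cosine_similarity(query, f) for f in database]
best_match = int(np.argmax(scores))         # index of the retrieved place
```

In practice the descriptors come from a network such as a place-recognition ConvNet, and approximate nearest-neighbor indices replace the brute-force scan over the database.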

SLAM

SLAM integrates odometry estimation, global localization, and mapping as front-ends and optimizes these modules jointly to improve performance in both localization and mapping.

The combination of DL and SLAM is primarily embodied in two aspects: replacing one or more SLAM modules, such as feature extraction, inter-frame estimation, and depth estimation, with DL, and combining semantic maps with SLAM.

For instance, a point tracking system has been developed using two deep convolutional neural networks (DCNNs): the first network, MagicPoint, extracts 2D feature points from a single image, and its output is then matched by the second network, MagicWarp.
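The location-only matching idea can be sketched with a toy nearest-neighbor stand-in; MagicWarp itself is a learned network, and the grid of points here is purely illustrative.

```python
import numpy as np

# Feature points in frame A: a regular grid standing in for detected corners.
xs, ys = np.meshgrid(np.arange(0.0, 100.0, 20.0), np.arange(0.0, 100.0, 20.0))
pts_a = np.stack([xs.ravel(), ys.ravel()], axis=1)   # 25 points, shape (25, 2)

shift = np.array([0.5, -0.3])                # small inter-frame motion
pts_b = pts_a + shift                        # the same points seen in frame B

# Associate each point in A with its nearest point in B by location alone,
# without any local descriptors.
dists = np.linalg.norm(pts_a[:, None, :] - pts_b[None, :, :], axis=2)
matches = np.argmin(dists, axis=1)           # matches[i] is A->B correspondence
```

When inter-frame motion is small relative to the spacing between points, location alone recovers the correct correspondences, which is the regime in which descriptor-free matching is attractive.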

The points extracted by MagicPoint are used as SLAM features. Unlike conventional methods, this approach uses only the locations of the points and does not require local point descriptors. Similarly, UnDeepVO, a new monocular visual odometry system, was developed to estimate the depth of monocular images and the pose of the monocular camera. UnDeepVO has two distinct characteristics: it is trained with an unsupervised DL method, and it recovers absolute scale.

It was trained to recover scale using binocular (stereo) images and evaluated on continuous monocular images. Experiments on the KITTI dataset demonstrated that UnDeepVO achieves higher pose-estimation accuracy than other monocular visual odometry methods.

A ConvNet was employed to estimate depth information from monocular RGB images and their semantically segmented counterparts. A global semantic map was obtained when this information was combined with conventional SLAM. Experiments on the ICL-NUIM and NYUDv2 datasets showed that the method has good robustness and accuracy.

In another study, SLAM was performed on RGB-D image sequences and semantic maps were reconstructed. DVO SLAM was utilized to obtain the robot pose, and single-frame RGB-D images were semantically segmented using FuseNet. The RGB-D image sequences were then mapped to the global coordinate system using the obtained robot poses, and the overlapping parts of multi-frame images were associated to segment them semantically and obtain a consistent semantic map.
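The step of mapping a frame's points into the global coordinate system using the estimated robot pose can be sketched with a homogeneous transform; the pose values and point here are made up for illustration.

```python
import numpy as np

def to_global(points_cam, pose):
    """Transform (N, 3) camera-frame points to world frame via a 4x4 camera-to-world pose."""
    homo = np.hstack([points_cam, np.ones((len(points_cam), 1))])
    return (pose @ homo.T).T[:, :3]

# Example pose: robot rotated 90 degrees about z and translated to (1, 2, 0).
c, s = 0.0, 1.0                       # cos/sin of 90 degrees
pose = np.array([[c, -s, 0, 1],
                 [s,  c, 0, 2],
                 [0,  0, 1, 0],
                 [0,  0, 0, 1.0]])

points_cam = np.array([[1.0, 0.0, 0.0]])   # a point 1 m ahead along the camera x-axis
points_world = to_global(points_cam, pose)
# (1, 0, 0) rotated 90 degrees -> (0, 1, 0); plus translation -> (1, 3, 0)
```

Once all frames' points share one world frame, overlapping regions from different views can be associated, which is what enables the multi-frame semantic fusion described above.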

Experimental results showed that multi-view semantic segmentation of multi-frame images is more accurate than single-frame segmentation. Thus, combining DL-derived semantic maps with SLAM is advantageous for robot navigation. Overall, DL is revolutionizing RLAM by enabling data-driven solutions that outperform traditional methods. However, safety, reliability, interpretability, and scalability issues must be addressed before large-scale deployment of DL methods for RLAM.

References and Further Reading

Chen, C., Wang, B., Lu, C. X., Trigoni, N., Markham, A. (2020). A survey on deep learning for localization and mapping: Towards the age of spatial machine intelligence.

Jia, Y., Yan, X., Xu, Y. (2019). A survey of simultaneous localization and mapping for robot. 2019 IEEE 4th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), 1, 857-861.

Sünderhauf, N., Brock, O., Scheirer, W., Hadsell, R., Fox, D. et al. (2018). The limits and potentials of deep learning for robotics. The International Journal of Robotics Research, 37(4-5), 405-420.

Last Updated: Feb 13, 2024


Written by

Samudrapom Dam

Samudrapom Dam is a freelance scientific and business writer based in Kolkata, India. He has been writing articles related to business and scientific topics for more than one and a half years. He has extensive experience in writing about advanced technologies, information technology, machinery, metals and metal products, clean technologies, finance and banking, automotive, household products, and the aerospace industry. He is passionate about the latest developments in advanced technologies, the ways these developments can be implemented in a real-world situation, and how these developments can positively impact common people.


Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Dam, Samudrapom. (2024, February 13). Role of AI in Robot Localization and Mapping. AZoAi. Retrieved on April 16, 2024 from


