Reinforcement Learning for Boosting Autonomous Highway Safety

In a paper published in the journal Sensors, researchers addressed robust safety in autonomous highway driving with reinforcement learning (RL). They introduced the replay buffer constrained policy optimization (RECPO) method, which keeps policies within safety constraints while maximizing rewards.

Study: Reinforcement Learning for Boosting Autonomous Highway Safety. Image Credit: AlinStock/Shutterstock

They formulated the problem as a constrained Markov decision process (CMDP) and optimized highway driving policies within it. Deployed in the Car Learning to Act (CARLA) simulator, RECPO outperformed traditional methods, achieving zero collisions and more stable decision-making.


Past research in autonomous driving explored rule-based and optimization-driven algorithms using control Lyapunov functions (CLF), control barrier functions (CBF), and finite state machines (FSM) to enhance safety and traffic flow. More recent approaches, including machine learning (ML) methods such as deep RL, as well as model predictive control (MPC), aimed to improve decision-making efficiency but faced challenges in dynamic environments. Safe RL methods built on the CMDP formulation, such as constrained policy optimization (CPO), integrated constraints into policy optimization, addressing safety and adaptability concerns in autonomous systems.

Safe RL Framework Overview

In recasting highway autonomous driving as a safe RL problem, the CMDP is defined by its key components: state space (S), action space (A), reward function (R), and cost function (C). A CMDP extends the standard Markov decision process (MDP) by placing constraints on policies, ensuring safe decision-making under uncertainty. The observation space (S') includes ego vehicle (EV) data (S'EV), surrounding vehicle (SV) data (S'SV), and environmental information (S'EN). The action space covers longitudinal and lateral control behaviors, while the cost function evaluates risks like collisions and illegal maneuvers, guiding the agent toward safe driving practices.
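As a rough illustration of how a cost function of this kind might be expressed, the sketch below assigns a per-step cost from two risk flags. The `StepInfo` fields and the cost magnitudes are hypothetical placeholders chosen for the example; the article does not give the values used in the study.

```python
from dataclasses import dataclass


@dataclass
class StepInfo:
    """Hypothetical per-step risk flags reported by the simulator."""
    collided: bool
    illegal_maneuver: bool


def step_cost(info: StepInfo, collision_cost: float = 1.0, illegal_cost: float = 0.5) -> float:
    """Per-step cost C(s, a): zero when driving safely, positive for risky events.

    The two weights are illustrative assumptions, not values from the paper.
    """
    return collision_cost * info.collided + illegal_cost * info.illegal_maneuver
```

Keeping the cost separate from the reward is what distinguishes the CMDP setting from simply subtracting a penalty from the reward.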

The goal is to optimize a policy to maximize the expected cumulative reward while adhering to safety constraints defined by the cost function. This approach integrates real-time observations and safety considerations, which are crucial for autonomous vehicles navigating complex highway environments.
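In the conventional CMDP notation, this goal can be written as a constrained optimization problem. The symbols J_R, J_C, the discount factor γ, and the cost limit d are standard textbook notation assumed here; the article itself does not state them.

```latex
\max_{\pi} \; J_R(\pi) = \mathbb{E}_{\tau \sim \pi}\Big[\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t)\Big]
\quad \text{subject to} \quad
J_C(\pi) = \mathbb{E}_{\tau \sim \pi}\Big[\sum_{t=0}^{\infty} \gamma^{t} C(s_t, a_t)\Big] \le d
```

Here τ denotes a trajectory generated by following policy π, so the constraint bounds the expected discounted cost rather than the cost of any single step.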

Advanced Autonomous Driving Optimization

The RECPO algorithm is applied to autonomous highway driving, integrating advanced techniques to optimize policy updates while ensuring adherence to safety constraints. The algorithm continually samples trajectories and stores them in a replay buffer, then uses importance weights to reuse historical data efficiently, which speeds up learning and mitigates issues like catastrophic forgetting.
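The replay-and-reweight idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Transition` fields, the `ReplayBuffer` class, and the truncation threshold `clip` are all hypothetical names and choices introduced for the example.

```python
import math
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class Transition:
    state: tuple
    action: int
    reward: float
    cost: float
    behavior_logp: float  # log-probability under the policy that collected the sample


class ReplayBuffer:
    """Fixed-size FIFO store of past transitions (hypothetical helper)."""

    def __init__(self, capacity: int):
        # deque with maxlen silently drops the oldest item once full
        self.data = deque(maxlen=capacity)

    def add(self, transition: Transition) -> None:
        self.data.append(transition)

    def sample(self, batch_size: int, rng: random.Random) -> list:
        # Uniform sampling without replacement from the stored transitions
        return rng.sample(list(self.data), min(batch_size, len(self.data)))


def importance_weight(current_logp: float, behavior_logp: float, clip: float = 2.0) -> float:
    """Ratio pi_current(a|s) / pi_behavior(a|s), truncated at `clip` to limit variance.

    Truncation is an assumption of this sketch, not a detail from the article.
    """
    return min(math.exp(current_logp - behavior_logp), clip)
```

Each sampled transition's contribution to the policy objective would then be multiplied by its weight, so that data gathered under an older policy still gives a usable estimate for the current one.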

Policy updates in RECPO are guided by safety considerations using a trust region approach. It ensures that policy iterations maximize expected rewards without compromising safety. The algorithm evaluates policy updates based on safety metrics and applies different optimization methods depending on the safety level of the current policy. Additionally, RECPO updates the parameters of value networks using gradient-based methods to enhance prediction accuracy, supporting effective decision-making by the policy network in complex driving environments.
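The safety-dependent choice of update can be illustrated with a simplified trust-region step. The sketch below assumes a diagonal Fisher matrix and handles a would-be constraint violation by shrinking the reward step; CPO-style methods instead solve a small dual problem, so this is a stand-in for the idea rather than the RECPO update itself. All function and parameter names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def trust_region_step(grad, fisher_diag, delta):
    """Step along F^{-1} g scaled so that 0.5 * x^T F x equals delta (diagonal F assumed)."""
    nat_grad = [g / f for g, f in zip(grad, fisher_diag)]
    scale = math.sqrt(2.0 * delta / dot(grad, nat_grad))
    return [scale * n for n in nat_grad]


def safe_policy_step(reward_grad, cost_grad, fisher_diag, cost_value, cost_limit, delta=0.01):
    """Choose an update mode from the safety level of the current policy."""
    if cost_value > cost_limit:
        # Current policy is unsafe: take a recovery step that only reduces expected cost.
        return "recover", trust_region_step([-c for c in cost_grad], fisher_diag, delta)
    step = trust_region_step(reward_grad, fisher_diag, delta)
    predicted_increase = dot(cost_grad, step)  # linearized change in expected cost
    budget = cost_limit - cost_value
    if predicted_increase > budget:
        # Reward step would exceed the limit: shrink it to land on the boundary.
        shrink = budget / predicted_increase
        return "improve_scaled", [shrink * x for x in step]
    return "improve", step
```

The three return modes correspond to an unsafe policy, a safe policy whose reward step must be tempered, and a safe policy free to take the full reward step.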

Experimental Evaluation Summary

This study evaluated the effectiveness, robustness, and safety of the RECPO algorithm using the CARLA simulation platform in a constructed highway driving scenario. Comparative experiments included a deep deterministic policy gradient (DDPG)-based strategy, the intelligent driver model (IDM) combined with minimizing overall braking induced by lane changes (MOBIL), and the traditional CPO algorithm. The simulations used a custom-built, 10-km-long, three-lane highway scenario, providing realistic driving conditions with controlled variables like speed variations and lane changes.
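For readers unfamiliar with the rule-based baseline, the longitudinal part of IDM can be sketched as a single function. The formula is the standard IDM car-following law; the default parameter values and the clamping of the dynamic gap term at zero are illustrative assumptions, not settings reported for this study, and the MOBIL lane-change rule is not shown.

```python
import math


def idm_acceleration(v, gap, dv, v0=30.0, T=1.5, s0=2.0, a_max=1.5, b=2.0, delta=4.0):
    """Intelligent Driver Model longitudinal acceleration (m/s^2).

    v:   ego speed (m/s)
    gap: bumper-to-bumper distance to the leading vehicle (m)
    dv:  approach rate, ego speed minus leader speed (m/s)
    v0, T, s0, a_max, b, delta: desired speed, time headway, minimum gap,
    maximum acceleration, comfortable deceleration, and acceleration exponent
    (all defaults are illustrative).
    """
    # Desired gap grows with speed and with how quickly the ego closes in on the leader.
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)
```

With a very large gap the vehicle accelerates toward its desired speed, while a short gap at high closing speed yields strong braking.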

During the training phase, the RECPO, CPO, and DDPG algorithms were trained and compared on performance metrics such as rewards, safety, and efficiency. RECPO showed a significant advantage in reward convergence, reaching high values earlier in training than CPO. Both RECPO and CPO delivered effective safety improvements, with RECPO converging faster and achieving better cost reduction and success rates than CPO and DDPG. RECPO's importance-sampling-based experience replay pool facilitated quicker learning and adaptation to the driving environment, underscoring its efficacy in autonomous decision-making.

In deployment and performance testing, RECPO and CPO outperformed IDM + MOBIL in terms of driving efficiency, passenger comfort, and safety. RECPO maintained stable acceleration and low jerk during high-speed maneuvers, enhancing passenger comfort compared to IDM + MOBIL. Safety evaluations revealed RECPO's robust performance in maintaining safe distances and achieving a 100% success rate, contrasting sharply with DDPG's poor performance and high collision rates. RECPO demonstrated superior adaptability and faster convergence rates, highlighting its potential for enhancing autonomous driving systems in complex scenarios.
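Jerk, the comfort measure mentioned above, is the rate of change of acceleration. One simple way to estimate it from a logged acceleration trace is by finite differences, as in the sketch below; the function names and the fixed sampling interval `dt` are assumptions made for illustration, and the article does not say how the study computed it.

```python
def jerk_profile(accelerations, dt):
    """Finite-difference jerk (m/s^3) from accelerations sampled every dt seconds."""
    return [(a1 - a0) / dt for a0, a1 in zip(accelerations, accelerations[1:])]


def peak_abs_jerk(accelerations, dt):
    """Largest jerk magnitude in the trace; 0.0 if fewer than two samples."""
    return max((abs(j) for j in jerk_profile(accelerations, dt)), default=0.0)
```

A lower peak value over a maneuver indicates smoother changes in acceleration and therefore a more comfortable ride.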


To sum up, this paper proposed a decision-making framework for highway autonomous driving that prioritized safety and ensured robust performance. The framework used a CPO-based method to construct an RL policy that optimizes rewards while adhering to safety constraints. Importance sampling was introduced so that data collected and stored in a replay buffer could be reused, preventing catastrophic forgetting and enhancing policy optimization. The RECPO method optimized autonomous driving strategies in a CMDP framework, showing enhanced convergence speed, safety, and decision stability in CARLA simulations. Future work will target complex scenarios and sensor uncertainties with advanced neural networks and hierarchical decision frameworks.

Journal reference:
  • Zhao, R., et al. (2024). Towards Robust Decision-Making for Autonomous Highway Driving Based on Safe Reinforcement Learning. Sensors, 24(13), 4140. DOI: 10.3390/s24134140

Written by

Silpaja Chandrasekar

Dr. Silpaja Chandrasekar has a Ph.D. in Computer Science from Anna University, Chennai. Her research expertise lies in analyzing traffic parameters under challenging environmental conditions. Additionally, she has gained valuable exposure to diverse research areas, such as detection, tracking, classification, medical image analysis, cancer cell detection, chemistry, and Hamiltonian walks.


To cite this article:

    Chandrasekar, Silpaja. (2024, July 05). Reinforcement Learning for Boosting Autonomous Highway Safety. AZoAi.

