Hierarchical Reinforcement Learning Enhances Multi-UAV Air Combat

In an article published in the journal Scientific Reports, researchers from China proposed an approach to improve the decision-making efficiency of multiple unmanned aerial vehicles (UAVs) in air combat scenarios. They combined hierarchical reinforcement learning with experience decomposition and transformation techniques to enable the UAVs to learn complex combat strategies quickly and stably. The method was tested and validated on JSBSim, an open-source flight dynamics simulation platform, and outperformed several baseline algorithms.

Study: Hierarchical Reinforcement Learning Enhances Multi-UAV Air Combat. Image credit: Mike Mareen/Shutterstock


UAVs have become increasingly important in various domains, such as operations, surveillance, and reconnaissance, due to their cost-effectiveness and low risk of casualties. In recent years, they have been involved in several military operations worldwide, and many countries have invested in developing UAV-based combat decision-making technologies. However, many existing methods rely on rule-based design, which faces challenges in dealing with complex and dynamic multi-UAV combat environments.

Reinforcement learning (RL) is a subset of machine learning methods that enables agents to learn optimal behaviors through trial-and-error interactions with the environment. It has been applied to air combat decision-making problems, but it also suffers from some limitations, such as low training efficiency, high action space complexity, and poor adaptation to intricate scenarios. Therefore, there is a need for more effective and efficient RL methods for multi-UAV air combat decision-making.

About the Research

In the present paper, the authors designed a hierarchical decision-making network for multi-UAV air combat scenarios based on hierarchical reinforcement learning, a branch of RL that breaks complex problems into smaller, more manageable subtasks. Their technique separated the UAV decision control task into two types of actions: flight actions and attack actions. Flight actions involved controlling the UAV’s flight angle, speed, and altitude, while attack actions involved choosing whether to engage in an attack and selecting the target. The method used two parallel decision-making networks, one for flight actions and one for attack actions, to reduce the dimensionality of the action space and improve learning efficiency.
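To see why splitting the decision into two parallel heads shrinks the action space, consider the following minimal Python sketch. The discretization (three options each for heading, speed, and altitude) and all function names are assumptions made for illustration; the article does not give the actual bins or network layout.

```python
from itertools import product

# Hypothetical discretisation: the article does not list the actual bins.
HEADING = ("left", "hold", "right")
SPEED = ("slower", "hold", "faster")
ALTITUDE = ("descend", "hold", "climb")


def flight_actions():
    """Every (heading, speed, altitude) combination for the flight head."""
    return list(product(HEADING, SPEED, ALTITUDE))


def attack_actions(num_enemies):
    """None means 'hold fire'; an integer is the index of the chosen target."""
    return [None] + list(range(num_enemies))


def output_sizes(num_enemies):
    """Compare one flat joint head with two parallel heads."""
    n_flight = len(flight_actions())
    n_attack = len(attack_actions(num_enemies))
    return {"flat": n_flight * n_attack, "parallel": n_flight + n_attack}


def select_action(flight_q, attack_q, num_enemies):
    """Greedy choice: each head picks its own best action independently."""
    f_idx = max(range(len(flight_q)), key=flight_q.__getitem__)
    a_idx = max(range(len(attack_q)), key=attack_q.__getitem__)
    return flight_actions()[f_idx], attack_actions(num_enemies)[a_idx]


print(output_sizes(num_enemies=4))  # {'flat': 135, 'parallel': 32}
```

Under these assumed bins, a single flat head would need 135 outputs per UAV in a 4v4 battle, whereas two parallel heads need 32 in total, and the gap widens as the number of enemy targets grows.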

The developed approach also introduced an experience decomposition and transformation technique, which aimed to increase the quality and quantity of the training experiences. It decomposed the complex combat tasks into several stages based on the number of enemy UAVs destroyed and recalculated the associated rewards. This way, the technique not only expanded the experience gained during each combat round but also broke down the complex battle process into simpler subgoals, thereby reducing the learning complexity of the model.
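The idea of splitting one battle into stages can be sketched as follows. This is only an illustration under stated assumptions: the `Transition` fields, the `decompose` function, and the `stage_bonus` reward adjustment are hypothetical, since the article does not describe the exact reward recalculation used by the authors.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Transition:
    obs: tuple
    action: int
    reward: float
    kills: int          # enemy UAVs destroyed after this step
    done: bool = False  # True when the (sub-)episode ends here


def decompose(episode, stage_bonus=1.0):
    """Split one battle into stages, one per enemy-kill event.

    The step on which the kill count rises closes its stage: it is marked
    terminal and its reward is recalculated (here: a hypothetical bonus).
    """
    stages, current, prev_kills = [], [], 0
    for t in episode:
        if t.kills > prev_kills:
            current.append(replace(t, reward=t.reward + stage_bonus, done=True))
            stages.append(current)
            current, prev_kills = [], t.kills
        else:
            current.append(t)
    if current:  # steps after the last kill form a final stage
        stages.append(current)
    return stages


battle = [Transition((i,), 0, 0.0, k) for i, k in enumerate([0, 0, 1, 1, 2, 2])]
print([len(stage) for stage in decompose(battle)])  # [3, 2, 1]
```

Each stage can then be stored as its own short training episode with a nearby subgoal, which is one plausible reading of how a single combat round yields more, and simpler, experience.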

The study implemented the method on top of QMIX, a monotonic value function factorization algorithm that learns decentralized policies for each agent while a centralized mixing network estimates the joint action-value function during training. The technique was evaluated on the JSBSim simulation platform, an open-source flight dynamics model that can simulate the performance and behavior of various aircraft. The authors created combat scenarios such as four versus four (4v4) and eight versus eight (8v8) and compared the method with baseline algorithms including value decomposition networks (VDN), counterfactual multi-agent policy gradients (COMA), and standard QMIX.
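The monotonicity constraint at the heart of QMIX can be illustrated with a deliberately simplified mixer. The sketch below uses a single linear mixing layer in plain Python, whereas the published QMIX algorithm uses a small neural mixing network whose weights are produced by hypernetworks; the function name and the toy parameter values are assumptions for illustration only.

```python
def mix(agent_qs, state, hyper_w, hyper_b):
    """Combine per-agent Q-values into a joint value Q_tot.

    The mixing weights depend on the global state and are forced to be
    non-negative with abs(), so Q_tot never decreases when any single
    agent's Q-value increases (the monotonicity constraint).
    """
    weights = [abs(sum(h * s for h, s in zip(row, state))) for row in hyper_w]
    bias = sum(h * s for h, s in zip(hyper_b, state))
    return sum(w * q for w, q in zip(weights, agent_qs)) + bias


# Toy parameters, chosen arbitrarily for the demonstration.
state = [1.0, -2.0]
hyper_w = [[0.5, 1.0], [-1.0, 0.25]]  # one row per agent
hyper_b = [0.1, 0.2]

low = mix([1.0, 2.0], state, hyper_w, hyper_b)
high = mix([1.5, 2.0], state, hyper_w, hyper_b)
print(high >= low)  # True
```

Because the joint value is monotonic in each agent's value, every agent can act greedily on its own Q-values at execution time and the combined choice still maximizes the joint value the mixer learned during training.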

Research Findings

The results showed that the proposed method significantly outperformed the baseline algorithms in win rate, combat loss rate, convergence speed, and stability. The method achieved a win rate of over 90% in both 4v4 and 8v8 scenarios, while the baseline algorithms ranged from 60% to 80%. The method also had a lower combat loss rate, defined as the percentage of UAVs lost by the end of the battle, than the baseline algorithms. Moreover, the method converged faster and more stably than the baseline algorithms, indicating that it learned more efficiently and robustly.
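As a rough illustration of how the two headline metrics could be computed from a batch of simulated battles, consider the sketch below. The record layout and the rule that a battle counts as a win when every enemy UAV is destroyed are assumptions for this example; the article does not spell out the exact win condition, and the numbers are made up.

```python
def summarize(battles):
    """Compute win rate and mean combat loss rate.

    battles: list of (own_start, own_lost, enemy_start, enemy_lost) tuples.
    Assumed win condition: all enemy UAVs destroyed.
    """
    wins = sum(1 for _, _, e_start, e_lost in battles if e_lost == e_start)
    loss_rates = [own_lost / own_start for own_start, own_lost, _, _ in battles]
    return {
        "win_rate": wins / len(battles),
        "combat_loss_rate": sum(loss_rates) / len(loss_rates),
    }


# Four made-up 4v4 battles, for demonstration only.
battles = [(4, 1, 4, 4), (4, 4, 4, 2), (4, 0, 4, 4), (4, 2, 4, 4)]
print(summarize(battles))  # {'win_rate': 0.75, 'combat_loss_rate': 0.4375}
```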

The authors also conducted ablation studies to analyze the impact of different components of the method on the performance. They found that both the hierarchical decision-making network and the experience decomposition technique contributed to the improvement of the method, but the experience decomposition technique had a greater effect on the convergence speed, while the hierarchical decision-making network had a greater effect on the stability.

Furthermore, the authors tested the method in various disadvantageous combat situations, such as five versus eight (5v8), six versus eight (6v8), and seven versus eight (7v8), to evaluate its effectiveness in realistic scenarios. The results showed that the method still achieved a high win rate and a low combat loss rate in these situations, while the baseline algorithms performed poorly. The trained multi-UAV combat model also exhibited emergent strategies, such as efficient attack, dispersal of the formation, and high-speed circling to find a gaming advantage.


In summary, the proposed approach improves both the training efficiency and the decision-making performance of UAV agents in complex and dynamic air combat scenarios. It could also help design decision-making methods tailored to more complex and realistic multi-UAV combat environments, and it could be extended to other domains that involve multi-agent cooperation and competition, such as robotics, games, and transportation.

The researchers acknowledged limitations and challenges and suggested directions for future research. They recommended that further work explore more advanced learning techniques, such as meta-learning, transfer learning, or multi-task learning, to enhance the learning speed and performance of the model. More realistic factors, such as communication constraints, sensor noise, or environmental disturbances, could also be incorporated into the simulation platform to improve the robustness of the technique.

Written by

Muhammad Osama

Muhammad Osama is a full-time data analytics consultant and freelance technical writer based in Delhi, India. He specializes in transforming complex technical concepts into accessible content. He has a Bachelor of Technology in Mechanical Engineering with specialization in AI & Robotics from Galgotias University, India, and he has extensive experience in technical content writing, data science and analytics, and artificial intelligence.


Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Osama, Muhammad. (2024, March 01). Hierarchical Reinforcement Learning Enhances Multi-UAV Air Combat. AZoAi. Retrieved on May 27, 2024 from https://www.azoai.com/news/20240301/Hierarchical-Reinforcement-Learning-Enhances-Multi-UAV-Air-Combat.aspx.



The opinions expressed here are the views of the writer and do not necessarily reflect the views and opinions of AZoAi.