Assessing the Impact of LLMs on Aviation Tasks

In a paper published in the Journal of the Air Transport Research Society, researchers explored the impact of large language models (LLMs) on air transportation. Artificial intelligence (AI) has already enhanced various aspects of aviation, such as flight plan optimization, autonomous systems, predictive analytics, and passenger/crew assistance.

Study: Assessing The Impact of LLMs on Aviation Tasks. Image Credit: TippaPatt/Shutterstock

With their advanced text processing and generation capabilities, LLMs promise to transform these areas further. The study makes two main contributions: an experimental evaluation of 12 widely used LLMs on air transportation-related tasks, including fact retrieval, complex reasoning, and explanations, and a survey of graduate students at Beihang University, a leading aviation university in China, to understand their experiences with and uses of LLMs. The research aims to advance the dissemination and application of LLMs in the aviation sector.


Past work on LLMs highlights their reliance on the transformer architecture, which uses self-attention mechanisms to capture contextual nuances. LLMs undergo training on extensive and diverse datasets, frequently fine-tuned for specific tasks, leading to varied performance across different benchmarks. They have transformed fields like machine translation, content generation, customer service, software development, healthcare, legal research, and finance.
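To make the self-attention idea concrete, here is a minimal sketch in plain Python of scaled dot-product attention over a short sequence of token vectors. It is an illustration only: real transformers apply learned projection matrices to produce the queries, keys, and values and run many attention heads in parallel, all of which is omitted here. The function names and the toy vectors are assumptions made for this example and do not come from the study.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1 (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention: each output row is a weighted mix of the value rows."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dimension).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output component at a time.
        mixed = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(mixed)
    return outputs


# Toy input: two 2-dimensional token vectors (hypothetical, no learned projections).
tokens = [[1.0, 0.0], [0.0, 1.0]]
attended = self_attention(tokens, tokens, tokens)
```

Each token's output leans toward the tokens it is most similar to, which is how context from the rest of the sequence enters every position.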

Despite their significant advancements, challenges persist, including the substantial computational resources required for training and deploying these models and concerns about bias and fairness in their outputs stemming from inherent biases in the training data.

LLM Evaluation Summary

Through a comprehensive suite of experiments, this evaluation focuses on LLMs' performance, reliability, and applicability within the aviation field. The experiments target fact retrieval, complex reasoning, and explanation tasks, covering diverse aviation-related queries.

For instance, fact retrieval questions assessed the models' ability to retrieve precise data like engine types and airline alliances. In contrast, complex reasoning questions evaluated the models' capability to handle scenarios involving fuel hedging strategies and operational cost management. Explanation tasks explored the models' proficiency in articulating industry-specific terms and challenges.

The experiments revealed varying levels of accuracy among different LLMs. Models like Claude-2, Cohere, and enhanced representation through knowledge integration (ERNIE) demonstrated high precision in fact retrieval tasks but exhibited lower recall, indicating a tendency to miss some positive cases.

In complex reasoning tasks, models varied in their ability to provide accurate and insightful answers, with generative pre-trained transformer 3.5 (GPT-3.5) and large language model Meta AI 2 (Llama-2) performing well in explaining calculations and industry dynamics. Explanation tasks highlighted the models' ability to understand and articulate industry challenges, with GPT-3.5 and Llama-2 again showing strong performance by covering a broad range of contemporary issues.

The aggregated results emphasize the importance of balancing precision and recall in LLMs for aviation applications, where accurate data-driven decisions are crucial. While most models showed high precision, recall values were generally lower, suggesting areas for improvement. Analysis of response speed and textual similarities revealed notable patterns: Mistral and GPT-3.5 were the fastest in generating answers, while Chinese models like ERNIE were slower.
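The precision and recall trade-off described above can be illustrated with a short sketch. It assumes a fact retrieval question whose correct answer is a set of items and scores a model's answer against that set. The scoring scheme, the function names, and the placeholder airline labels are assumptions for illustration and are not taken from the study.

```python
def precision_recall(predicted, relevant):
    """Return (precision, recall) of a predicted set against the set of correct items."""
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical question with five correct items; the model lists only two, both correct.
reference = {"Airline A", "Airline B", "Airline C", "Airline D", "Airline E"}
model_answer = {"Airline A", "Airline B"}

p, r = precision_recall(model_answer, reference)
f1 = f1_score(p, r)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In this made-up case everything the model states is correct, so precision is 1.0, but it misses three of the five items, so recall is only 0.4. This is the pattern the article reports: high precision paired with lower recall.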

Textual similarity analysis showed high conceptual overlap among several models, indicating similar training methodologies or data sources. These findings underscore the need for continued optimization to enhance the accuracy and applicability of LLMs in the high-stakes aviation industry.
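The article does not say which similarity measure the study used. As one simple stand-in, the sketch below compares two answers by the cosine similarity of their word-count vectors. A measure of conceptual overlap would more likely rely on sentence embeddings, so treat this purely as an illustration; the function names and sample answers are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine similarity of the word-count vectors of two texts, in [0, 1]."""
    counts_a, counts_b = bag_of_words(text_a), bag_of_words(text_b)
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical answers from two models to the same explanation question.
answer_1 = "Fuel hedging locks in fuel prices to reduce cost volatility."
answer_2 = "Airlines use fuel hedging to lock in prices and limit volatility."

similarity = cosine_similarity(answer_1, answer_2)
print(f"similarity={similarity:.2f}")
```

Running such a function over every pair of models would give a similarity matrix in which clusters of high values point to models that answer in much the same way.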

LLM Usage Survey

The survey conducted among graduate students at Beihang University gathered 325 valid responses, providing insights into attitudes towards LLMs and usage patterns. The average age of the participants was 23 years (range 20-37), and males made up 70% of respondents. Most respondents began using LLMs within the past six months, although female respondents tended to start somewhat later.

The frequency of LLM usage varied significantly among participants. Around 60% of both male and female respondents reported using LLMs at most once a week. However, about a third of the respondents, evenly split between genders, indicated daily usage, suggesting a significant portion of regular users among the surveyed population.

Regarding the specific models used, OpenAI's GPT-3.5 and GPT-4 were the most prevalent among respondents, particularly the free variant of GPT-3.5. Other models were used far less often, reflecting a concentrated preference for the more widely known and accessible models. The survey also highlighted a broad range of purposes for which LLMs were employed, predominantly in education and research contexts, with significant usage in computer science-related subjects and in support of academic tasks like coding and literature reviews.


To sum up, the study evaluated LLMs' potential in the air transportation industry by combining experimental assessments with a survey of students at Beihang University. While LLMs achieve high precision in fact retrieval, their recall needs improvement, which matters for aviation's data-intensive operations. They demonstrate varying levels of reasoning depth, with models like GPT-3.5 showing promising diversity in responses.

Survey insights underscored students' optimism for LLMs' transformative role in aviation, tempered by concerns over reliability and safety standards. Future research should focus on enhancing LLMs' specificity for aviation applications, aiming to optimize operational efficiency and safety standards in sectors like air traffic control and pilot training.


Written by

Silpaja Chandrasekar

Dr. Silpaja Chandrasekar has a Ph.D. in Computer Science from Anna University, Chennai. Her research expertise lies in analyzing traffic parameters under challenging environmental conditions. Additionally, she has gained valuable exposure to diverse research areas, such as detection, tracking, classification, medical image analysis, cancer cell detection, chemistry, and Hamiltonian walks.


Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Chandrasekar, Silpaja. (2024, July 10). Assessing the Impact of LLMs on Aviation Tasks. AZoAi. Retrieved on July 17, 2024 from

  • MLA

    Chandrasekar, Silpaja. "Assessing the Impact of LLMs on Aviation Tasks". AZoAi. 17 July 2024. <>.

  • Chicago

    Chandrasekar, Silpaja. "Assessing the Impact of LLMs on Aviation Tasks". AZoAi. (accessed July 17, 2024).

  • Harvard

    Chandrasekar, Silpaja. 2024. Assessing the Impact of LLMs on Aviation Tasks. AZoAi, viewed 17 July 2024,


The opinions expressed here are the views of the writer and do not necessarily reflect the views and opinions of AZoAi.