Why Large Language Models Need Stronger Security and Ethical Governance

A sweeping academic review reveals how phishing, jailbreaking, hallucinations, and bias threaten public trust in AI and why robust defenses, combined with ethical oversight, are critical for the safe future of large language models.

Review: The ethical security of large language models: A systematic review. Image Credit: napaporn sawaspardit / Shutterstock

Large language models (LLMs) such as generative pre-trained transformer (GPT), bidirectional encoder representations from transformers (BERT), and T5 have transformed sectors ranging from education and healthcare to digital governance. Their ability to generate fluent, human-like text enables automation and accelerates information workflows. However, this same capability increases exposure to cyber-attacks, model manipulation, misinformation, and biased outputs that can mislead users or amplify social inequalities. Academic researchers warn that without systematic regulation and defense mechanisms, LLM misuse may threaten data security, public trust, and social stability. Based on these challenges, further research is required to improve model governance, strengthen defenses, and mitigate ethical risks.

Comprehensive Review of Ethical Security Risks

A research team from Shanghai Jiao Tong University and East China Normal University published a comprehensive review in the journal Frontiers of Engineering Management (2025) examining ethical security risks in large language models. The study screened over 10,000 documents and distilled 73 key works to summarize threats, including phishing attacks, malicious code generation, data leakage, hallucinations, social bias, and jailbreaking. The review further evaluates defense tools, including adversarial training, input preprocessing, watermarking, and model alignment strategies.

Misuse-Based Risks and Malicious Model Attacks

The review categorizes LLM-related security threats into two major domains: misuse-based risks and malicious attacks targeting models. Misuse includes phishing emails crafted with near-native fluency, automated malware scripting, identity spoofing, and the production of large-scale false information. Malicious attacks occur at both the data/model levels, such as model inversion, poisoning, extraction, and at the user-interaction level, including prompt injection and jailbreak techniques. These attacks may access private training data, bypass safety filters, or induce harmful content output.

Parameter Processing as a Defense Approach

On defense strategy, the study summarizes three mainstream technical approaches: parameter processing, which removes redundant parameters to reduce attack exposure; input preprocessing, which paraphrases prompts or detects adversarial triggers without retraining; and adversarial training, including red-teaming frameworks that simulate attacks to improve robustness.

Input Preprocessing and Adversarial Training Methods

The review also introduces detection technologies like semantic watermarking and CheckGPT, which can identify model-generated text with up to 98–99% accuracy. Despite progress, defenses often lag behind evolving attacks, indicating an urgent need for scalable, low-cost, multilingual-adaptive solutions.

Ethical Governance Beyond Technical Safeguards

The authors emphasize that technical safeguards must coexist with ethical governance. They argue that hallucination, bias, privacy leakage, and misinformation are social-level risks, not merely engineering problems. To ensure trust in LLM-based systems, future models should integrate transparency, verifiable content traceability, and cross-disciplinary oversight. Ethical review frameworks, dataset audit mechanisms, and public awareness education will become essential in preventing misuse and protecting vulnerable groups.

Future Directions for Secure and Responsible LLMs

The study suggests that secure and ethical development of LLMs will shape how societies adopt AI. Robust defense systems may protect financial systems from phishing, reduce medical misinformation, and maintain scientific integrity. Meanwhile, watermark-based traceability and red-teaming may become industry standards for model deployment. The researchers encourage future work toward AI responsible governance, unified regulation frameworks, safer training datasets, and model transparency reporting. If well-managed, LLMs can evolve into reliable tools to support education, digital healthcare, and innovation ecosystems while minimizing risks related to cybercrime and social misinformation.

Source:
Journal reference:

Comments

The opinions expressed here are the views of the writer and do not necessarily reflect the views and opinions of AZoAi.
Post a new comment
Post

Sign in to keep reading

We're committed to providing free access to quality science. By registering and providing insight into your preferences you're joining a community of over 1m science interested individuals and help us to provide you with insightful content whilst keeping our service free.

or

While we only use edited and approved content for Azthena answers, it may on occasions provide incorrect responses. Please confirm any data provided with the related suppliers or authors. We do not provide medical advice, if you search for medical information you must always consult a medical professional before acting on any information provided.

Your questions, but not your email details will be shared with OpenAI and retained for 30 days in accordance with their privacy principles.

Please do not ask questions that use sensitive or confidential information.

Read the full Terms & Conditions.

You might also like...
Why AI Still Can’t Fool Us: Study Reveals How Chatbots Miss Human Conversation Cues