A sweeping academic review reveals how phishing, jailbreaking, hallucinations, and bias threaten public trust in AI and why robust defenses, combined with ethical oversight, are critical for the safe future of large language models.

Review: The ethical security of large language models: A systematic review. Image Credit: napaporn sawaspardit / Shutterstock
Large language models (LLMs) such as generative pre-trained transformer (GPT), bidirectional encoder representations from transformers (BERT), and T5 have transformed sectors ranging from education and healthcare to digital governance. Their ability to generate fluent, human-like text enables automation and accelerates information workflows. However, this same capability increases exposure to cyber-attacks, model manipulation, misinformation, and biased outputs that can mislead users or amplify social inequalities. Academic researchers warn that without systematic regulation and defense mechanisms, LLM misuse may threaten data security, public trust, and social stability. Based on these challenges, further research is required to improve model governance, strengthen defenses, and mitigate ethical risks.
Comprehensive Review of Ethical Security Risks
A research team from Shanghai Jiao Tong University and East China Normal University published a comprehensive review in the journal Frontiers of Engineering Management (2025) examining ethical security risks in large language models. The study screened over 10,000 documents and distilled 73 key works to summarize threats, including phishing attacks, malicious code generation, data leakage, hallucinations, social bias, and jailbreaking. The review further evaluates defense tools, including adversarial training, input preprocessing, watermarking, and model alignment strategies.
Misuse-Based Risks and Malicious Model Attacks
The review categorizes LLM-related security threats into two major domains: misuse-based risks and malicious attacks targeting models. Misuse includes phishing emails crafted with near-native fluency, automated malware scripting, identity spoofing, and the production of large-scale false information. Malicious attacks occur at both the data/model levels, such as model inversion, poisoning, extraction, and at the user-interaction level, including prompt injection and jailbreak techniques. These attacks may access private training data, bypass safety filters, or induce harmful content output.
Parameter Processing as a Defense Approach
On defense strategy, the study summarizes three mainstream technical approaches: parameter processing, which removes redundant parameters to reduce attack exposure; input preprocessing, which paraphrases prompts or detects adversarial triggers without retraining; and adversarial training, including red-teaming frameworks that simulate attacks to improve robustness.
Input Preprocessing and Adversarial Training Methods
The review also introduces detection technologies like semantic watermarking and CheckGPT, which can identify model-generated text with up to 98–99% accuracy. Despite progress, defenses often lag behind evolving attacks, indicating an urgent need for scalable, low-cost, multilingual-adaptive solutions.
Ethical Governance Beyond Technical Safeguards
The authors emphasize that technical safeguards must coexist with ethical governance. They argue that hallucination, bias, privacy leakage, and misinformation are social-level risks, not merely engineering problems. To ensure trust in LLM-based systems, future models should integrate transparency, verifiable content traceability, and cross-disciplinary oversight. Ethical review frameworks, dataset audit mechanisms, and public awareness education will become essential in preventing misuse and protecting vulnerable groups.
Future Directions for Secure and Responsible LLMs
The study suggests that secure and ethical development of LLMs will shape how societies adopt AI. Robust defense systems may protect financial systems from phishing, reduce medical misinformation, and maintain scientific integrity. Meanwhile, watermark-based traceability and red-teaming may become industry standards for model deployment. The researchers encourage future work toward AI responsible governance, unified regulation frameworks, safer training datasets, and model transparency reporting. If well-managed, LLMs can evolve into reliable tools to support education, digital healthcare, and innovation ecosystems while minimizing risks related to cybercrime and social misinformation.
Source:
Journal reference: