A new survey from researchers in Shandong charts the rise of omni-modal language models that merge perception, reasoning, and generation across multiple data types, offering a roadmap for building AI systems with human-like cognitive versatility.

The newly published survey, “A Survey on Omni-Modal Language Models,” provides a systematic overview of the technological evolution, structural paradigms, and evaluation frameworks of omni-modal language models (OMLMs). These advanced AI systems unify perception, reasoning, and generation across multiple modalities. The study highlights the central role of OMLMs in advancing the pursuit of Artificial General Intelligence (AGI).
Background and authorship
The research was conducted by Lu Chen, a master’s student at the School of Computer and Artificial Intelligence, Shandong Jianzhu University, in collaboration with Dr. Zheyun Qin, a postdoctoral researcher at the School of Computer Science and Technology, Shandong University. Their joint work, titled “A Survey on Omni-Modal Language Models,” was published in the AI+ Journal.
Overview of omni-modal language models (OMLMs)
The paper provides a detailed review of the evolution, architecture, and evaluation methodologies of OMLMs, a new generation of artificial intelligence systems that can integrate and reason across diverse modalities, including text, images, audio, and video. Unlike conventional multimodal systems, which typically pair text with a single additional modality such as vision, OMLMs perform modality alignment, semantic fusion, and joint representation learning within a unified semantic space. This integration enables dynamic collaboration among modalities and supports end-to-end processing from perception through reasoning to generation, approximating human-like cognitive behavior.
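To make the idea of a unified semantic space concrete, the sketch below projects features from hypothetical text, image, and audio encoders into one shared dimension and fuses them with a single transformer encoder. The module names, dimensions, and layer counts are illustrative assumptions, not the architecture prescribed by the survey.

```python
# Minimal sketch: per-modality features are projected into a shared semantic
# space (alignment) and jointly encoded (fusion). All names and sizes here are
# illustrative; real OMLMs differ in encoders, fusion depth, and tokenization.
import torch
import torch.nn as nn

class OmniFusion(nn.Module):
    def __init__(self, text_dim=768, image_dim=1024, audio_dim=512, shared_dim=768):
        super().__init__()
        # One linear projection per modality maps encoder outputs into the shared space.
        self.proj_text = nn.Linear(text_dim, shared_dim)
        self.proj_image = nn.Linear(image_dim, shared_dim)
        self.proj_audio = nn.Linear(audio_dim, shared_dim)
        # A shared encoder learns a joint representation over the concatenated tokens.
        layer = nn.TransformerEncoderLayer(d_model=shared_dim, nhead=8, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, text_feats, image_feats, audio_feats):
        # Each input is (batch, tokens, modality_dim) from an upstream encoder.
        tokens = torch.cat([
            self.proj_text(text_feats),
            self.proj_image(image_feats),
            self.proj_audio(audio_feats),
        ], dim=1)
        # Joint representation usable by downstream reasoning or generation heads.
        return self.fusion(tokens)

# Random tensors stand in for real encoder outputs.
model = OmniFusion()
fused = model(torch.randn(2, 16, 768), torch.randn(2, 49, 1024), torch.randn(2, 32, 512))
print(fused.shape)  # torch.Size([2, 97, 768])
```

The key design point is that, once every modality lives in the same token space, a single backbone can attend across all of them, which is what allows end-to-end perception, reasoning, and generation in one model.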
Technical innovations and applications
The study introduces lightweight adaptation strategies, including modality pruning and adaptive scheduling, designed to enhance computational efficiency and enable real-time deployment in resource-constrained environments. These strategies are particularly applicable in medical diagnostics, industrial inspection, and intelligent education systems, underscoring OMLMs’ scalability across domains.
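As a rough illustration of modality pruning and adaptive scheduling, the sketch below uses a lightweight gate to score how relevant each modality is to the current input and skips the expensive encoder branches that fall below a threshold. The gating design, the text-based summary, and the threshold value are assumptions made for this example, not methods taken from the survey.

```python
# Minimal sketch of modality pruning with adaptive scheduling: a cheap gate
# predicts per-modality relevance, and low-relevance branches are never run,
# saving compute in resource-constrained deployments. All names are illustrative.
import torch
import torch.nn as nn

class ModalityGate(nn.Module):
    def __init__(self, shared_dim=768, num_modalities=3):
        super().__init__()
        # Relevance is predicted from a cheap summary (mean-pooled text tokens here).
        self.scorer = nn.Linear(shared_dim, num_modalities)

    def forward(self, text_summary):
        # Relevance score in [0, 1] for each modality.
        return torch.sigmoid(self.scorer(text_summary))

def prune_and_encode(text_feats, branches, gate, threshold=0.3):
    """branches: dict mapping modality name -> (encoder_fn, raw_input).
    Only branches whose predicted relevance exceeds the threshold are executed."""
    scores = gate(text_feats.mean(dim=1))      # (batch, num_modalities)
    keep = scores.mean(dim=0) > threshold      # batch-level scheduling decision
    outputs = {}
    for idx, (name, (encode, raw)) in enumerate(branches.items()):
        if keep[idx]:                          # skip low-relevance modalities entirely
            outputs[name] = encode(raw)
    return outputs, scores
```

In this toy setup the compute saving comes simply from never calling the pruned encoders; a production system would additionally have to handle the missing tokens downstream, for example with learned placeholder embeddings.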
Through its multidimensional evaluation framework, the survey also benchmarks representative OMLM architectures and assesses performance across general and task-specific scenarios, providing a comprehensive understanding of their practical potential.
Expert perspectives
“Omni-modal models represent a paradigm shift in artificial intelligence. By integrating perception, understanding, and reasoning within a unified framework, they bring AI closer to the characteristics of human cognition,” said Lu Chen, the first author of the paper.
Corresponding author Dr. Zheyun Qin added, “Our survey not only summarizes the current progress of omni-modal research but also provides forward-looking insights into structural flexibility and efficient deployment.”
Significance and future outlook
This work serves as a comprehensive reference for researchers and practitioners in multimodal intelligence, offering theoretical foundations and technical guidance for the next generation of AI systems that merge large language models with multimodal perception technologies. The survey underscores OMLMs’ potential to bridge cognitive gaps in artificial systems and propel the field closer to general-purpose, human-aligned intelligence.