By uniting domain-specific training with efficient fine-tuning, EnvGPT empowers scientists and policymakers with accurate, real-world insights, showing that smaller, smarter AI can rival the giants in environmental science.

Research: Fine-tuning large language models for interdisciplinary environmental challenges. Image Credit: tonton / Shutterstock
Environmental science integrates diverse disciplines, such as ecology, hydrology, and climate science, requiring models that can understand specialized jargon and heterogeneous data. While general-purpose large language models (LLMs) have advanced in fields such as medicine and law, they struggle with domain-specific environmental tasks due to their limited training on relevant corpora.
Previous efforts such as ClimateGPT and WaterGPT focused on narrow subdomains and lacked a unified, cross-disciplinary approach. Given these gaps, there is a critical need for integrated frameworks that generate high-quality environmental training data and enable rigorous model evaluation.
Introducing EnvGPT: A Specialized Environmental Language Model
In a study published in the journal Environmental Science and Ecotechnology, researchers from Southern University of Science and Technology and Tsinghua University unveiled EnvGPT, a language model fine-tuned specifically for environmental science.
The study presents a comprehensive pipeline comprising a multi-agent instruction generator (EnvInstruct), a balanced 100-million-token dataset (ChatEnv), and a 4,998-item benchmark (EnvBench). These components train and evaluate the model across five core environmental themes: climate change and atmospheric science; ecosystems and biodiversity; water resources and aquatic environment; soil and land-use management; and renewable energy and environmental management.
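For readers who want a concrete picture of what multi-agent instruction generation involves, the sketch below shows one plausible shape for such a pipeline. It assumes an OpenAI-style chat API; the prompts, model name, and agent roles are illustrative placeholders, not the paper's actual EnvInstruct design.

```python
# Minimal sketch of a multi-agent instruction-generation loop.
# Assumption: an OpenAI-style chat API; prompts, model name, and agent
# roles are illustrative, not the paper's actual EnvInstruct pipeline.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def chat(system: str, user: str) -> str:
    """Single-turn call to a GPT-4-class model (hypothetical settings)."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": user}],
    )
    return resp.choices[0].message.content

def generate_pair(passage: str, theme: str) -> dict:
    # Agent 1: write an instruction grounded in a corpus passage.
    question = chat(
        f"You write graduate-level questions about {theme}.",
        f"Write one self-contained question answerable from:\n{passage}",
    )
    # Agent 2: answer the instruction using the same passage.
    answer = chat(
        "You are an environmental-science expert. Answer precisely.",
        f"Context:\n{passage}\n\nQuestion: {question}",
    )
    # Agent 3: judge quality; keep only pairs that pass review.
    verdict = chat(
        "You review Q&A pairs. Reply PASS or FAIL only.",
        f"Question: {question}\nAnswer: {answer}\n"
        "Is the answer faithful to the context?",
    )
    return {"instruction": question, "response": answer,
            "keep": verdict.strip().upper().startswith("PASS")}
```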
Data and Methodology Behind EnvGPT
The research team constructed EnvCorpus from open-access environmental journals covering the five key themes and used a multi-agent GPT-4 system to generate 112,946 instruction–response pairs. EnvGPT was then fine-tuned with low-rank adaptation (LoRA), which significantly reduces computational cost while maintaining performance; training completed in roughly three days on four RTX 4090 GPUs.
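As a rough illustration of what a LoRA fine-tune of this scale looks like in practice, the sketch below uses Hugging Face PEFT and TRL. The base checkpoint, LoRA rank, and training hyperparameters are assumptions for the example; the paper's exact configuration is not reproduced here.

```python
# Sketch of LoRA supervised fine-tuning with Hugging Face PEFT + TRL.
# Base model, LoRA rank, and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Instruction-response pairs stored as chat-style JSONL, e.g.
# {"messages": [{"role": "user", ...}, {"role": "assistant", ...}]}
dataset = load_dataset("json", data_files="chatenv.jsonl", split="train")

# LoRA trains small low-rank adapter matrices instead of all base weights,
# which is what makes an 8B-parameter fine-tune feasible on four RTX 4090s.
peft_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,       # assumed values
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="meta-llama/Llama-3.1-8B-Instruct",     # assumed base checkpoint
    train_dataset=dataset,
    peft_config=peft_config,
    args=SFTConfig(output_dir="envgpt-lora",
                   num_train_epochs=3,            # assumed
                   per_device_train_batch_size=2),
)
trainer.train()
trainer.save_model()  # writes only the small LoRA adapter weights
```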
To assess performance, the study combined automated metrics, LLM-as-a-judge evaluations, rubric-based scoring, a university-level multiple-choice exam (EnviroExam), and real-world testing on the ELLE dataset. On the independently designed EnvBench, EnvGPT outperformed similarly sized models such as Llama-3.1-8B and Vicuna-1.5-7B; it even performed comparably to GPT-4o-mini on some quality dimensions and approached Qwen2.5-72B on EnviroExam.
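The LLM-as-a-judge component, in general form, amounts to prompting a strong model with a scoring rubric. Below is a minimal sketch; the rubric dimensions, 1-to-5 scale, and judge model are assumptions for illustration, not the study's actual protocol.

```python
# Sketch of rubric-based LLM-as-a-judge scoring. The rubric dimensions,
# 1-5 scale, and judge model are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI()

RUBRIC = ["accuracy", "completeness", "relevance", "clarity"]  # assumed

def judge(question: str, answer: str) -> dict:
    prompt = (
        "Score the answer on each dimension from 1 (poor) to 5 (excellent). "
        f"Dimensions: {', '.join(RUBRIC)}. Reply as a JSON object only.\n\n"
        f"Question: {question}\nAnswer: {answer}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed judge model
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # force parseable output
    )
    return json.loads(resp.choices[0].message.content)
```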
Notably, EnvGPT achieved 92.06% accuracy on EnviroExam, surpassing baseline models by roughly eight percentage points. The model also excelled in real-world applicability, especially on interdisciplinary and complex reasoning tasks, as validated on the ELLE dataset.
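Scoring a multiple-choice exam like EnviroExam reduces to exact-match grading of predicted option letters; a minimal scorer (with assumed field names) looks like this:

```python
# Minimal multiple-choice grader: compare predicted letters to the answer key.
# The field names "prediction" and "answer" are assumed for illustration.
def exam_accuracy(items: list[dict]) -> float:
    correct = sum(
        it["prediction"].strip().upper() == it["answer"].strip().upper()
        for it in items
    )
    return 100.0 * correct / len(items)

items = [{"prediction": "B", "answer": "B"},
         {"prediction": "c", "answer": "A"}]
print(f"{exam_accuracy(items):.2f}%")  # -> 50.00%
```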
Implications for Research and Policy
According to the authors, targeted domain-specific fine-tuning enables compact models to reach near state-of-the-art performance on environmental tasks, supporting research, education, and policy applications.
Future Directions and Open Access
EnvGPT can support researchers, educators, and policymakers by providing accurate, domain-aware responses to complex environmental queries. By openly releasing ChatEnv and EnvBench, the team promotes transparency and reproducibility while encouraging community-driven improvements.
Looking ahead, they plan to integrate retrieval-augmented generation, tool use, and multimodal data to keep the model adaptive and aligned with rapidly evolving scientific knowledge.
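To give a flavor of the retrieval-augmented direction the authors describe, the sketch below pairs a vector-search step with a prompt template. The embedding model, toy corpus, and prompt format are all assumptions; the authors' planned system is not specified here.

```python
# Minimal retrieval-augmented generation sketch: brute-force cosine search
# over a toy corpus. Embedding model and corpus are illustrative assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

corpus = [
    "Wetlands sequester carbon in waterlogged soils.",
    "LoRA adapts large models with low-rank weight updates.",
]
corpus_emb = embedder.encode(corpus, normalize_embeddings=True)

def retrieve(query: str, k: int = 1) -> list[str]:
    q = embedder.encode([query], normalize_embeddings=True)
    scores = corpus_emb @ q[0]            # cosine similarity (unit vectors)
    return [corpus[i] for i in np.argsort(-scores)[:k]]

def rag_prompt(query: str) -> str:
    # Retrieved passages are prepended so the model answers from evidence.
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using the context."

print(rag_prompt("How do wetlands store carbon?"))
```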
Journal reference:
- Zhang, Y., Lin, S., Xiong, Y., Li, N., Zhong, L., Ding, L., & Hu, Q. (2025). Fine-tuning large language models for interdisciplinary environmental challenges. Environmental Science and Ecotechnology, 27, 100608. DOI: 10.1016/j.ese.2025.100608, https://www.sciencedirect.com/science/article/pii/S2666498425000869