A new study shows how pairing effective retrieval methods with fine-tuned AI models can transform the way professionals query building codes, replacing tedious manual searches with smarter, more accurate compliance tools.

Building Smarter QA Systems for Building Codes
Researchers have focused on building a QA system that can answer user queries about building codes, replacing the laborious traditional practice of manual querying. One promising way to build a robust QA system is to leverage Retrieval-Augmented Generation (RAG). The researchers explored the potential of several retrieval methods and evaluated the efficiency of Low-Rank Adaptation (LoRA) for fine-tuning large language models (LLMs). Retrievers and LLMs are the core components of a RAG system, and their performance determines the overall performance of the QA system.
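To make the LoRA idea concrete, here is a minimal sketch of the standard low-rank update, in which a frozen weight matrix W is augmented by a trainable product B·A scaled by alpha/r. The shapes, values, and function name below are illustrative assumptions, not details from the paper.

```python
# Toy LoRA forward pass: y = W x + (alpha / r) * B (A x).
# W is frozen; only the small factors A (r x d_in) and B (d_out x r)
# would be trained, which is what makes LoRA parameter-efficient.

def lora_forward(x, W, A, B, alpha, r):
    """Apply frozen weights W plus a scaled low-rank update B @ A to x."""
    scale = alpha / r
    # Frozen path: W @ x
    base = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    # Low-rank path: B @ (A @ x)
    Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    BAx = [sum(b * ai for b, ai in zip(row, Ax)) for row in B]
    return [b + scale * d for b, d in zip(base, BAx)]

# Toy example: d_in = d_out = 2, rank r = 1.
W = [[1.0, 0.0], [0.0, 1.0]]   # frozen identity weights
A = [[1.0, 1.0]]               # 1 x 2 trainable factor
B = [[0.5], [0.5]]             # 2 x 1 trainable factor
y = lora_forward([2.0, 0.0], W, A, B, alpha=1.0, r=1)
```

With rank r far smaller than the weight dimensions, the number of trainable parameters drops from d_out·d_in to r·(d_out + d_in), which is why LoRA is attractive for adapting billion-parameter models.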
Why RAG for Building Codes?
Manual querying of building codes is often tedious, error-prone, and time-consuming. To address these challenges, researchers have turned to RAG. This framework integrates two key components: a retriever, which identifies and extracts relevant information from documents, and a language model, which generates precise answers by combining the retrieved content with the query.
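The two-stage flow described above can be sketched as follows. Everything here is a toy illustration: the keyword-overlap retriever, the placeholder generator, and the three sample code clauses are assumptions standing in for the paper's actual retriever, LLM, and corpus.

```python
# Minimal RAG flow: a retriever ranks documents against the query,
# then a generator combines the retrieved context with the query.

def retrieve(query, corpus, k=3):
    """Rank documents by word overlap with the query; return the top k."""
    q = set(query.lower().split())
    scored = sorted(corpus,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def generate(query, contexts):
    """Stand-in for an LLM: fuse retrieved context with the query."""
    return f"Q: {query}\nContext: {' | '.join(contexts)}"

corpus = [
    "Stairways shall have a minimum clear width of 44 inches.",
    "Handrails shall be provided on both sides of stairways.",
    "Exit doors shall swing in the direction of egress travel.",
]
top_docs = retrieve("minimum stairway width", corpus, k=2)
answer = generate("minimum stairway width", top_docs)
```

In a real system the overlap score would be replaced by a stronger retriever and `generate` by a fine-tuned LLM, but the division of labor, retrieve then generate, is the same.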
Challenges in Retrieval and Language Models
While RAG holds strong promise, both components present inherent challenges. Retrievers vary widely in performance, each with its own advantages and limitations. At the same time, language models are susceptible to hallucinations and typically require fine-tuning to adapt effectively to specialized domains. Recognizing this, researchers at the University of Alberta — Mr. Aqib, Dr. Qipei, Mr. Hamza, and Professor Chui — explored the performance of several retrievers and investigated the impact of fine-tuning LLMs for building code applications, demonstrating that such adaptation significantly enhances generation accuracy and domain alignment.
Evaluation and Key Findings
Their study systematically evaluated multiple retrievers, with ES emerging as the most effective. Experiments also showed that retrieving the top-3 to top-5 documents was sufficient to capture query-relevant context, yielding consistently high BERT F1 scores. In parallel, the researchers fine-tuned a range of LLMs spanning 1B to 24B parameters to better capture the nuances of building code language. Among these, Llama-3.1-8B delivered the strongest results, achieving a 6.83% relative improvement in BERT F1 score over its pre-trained baseline.
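The BERT F1 metric used here scores an answer by soft-matching its tokens against a reference via embedding similarity rather than exact string overlap. The sketch below shows the shape of that computation, assuming toy hand-made 2-d vectors in place of real BERT embeddings; it is an illustration of the metric's structure, not the paper's evaluation code.

```python
# BERTScore-style F1: precision is the mean best-match similarity for
# candidate tokens, recall the same for reference tokens, and F1 their
# harmonic mean.

import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def soft_f1(cand, ref, emb):
    """Greedy soft-token-matching F1 over pairwise cosine similarities."""
    sims = [[cosine(emb[c], emb[r]) for r in ref] for c in cand]
    precision = sum(max(row) for row in sims) / len(cand)
    recall = sum(max(col) for col in zip(*sims)) / len(ref)
    return 2 * precision * recall / (precision + recall)

# Hypothetical embeddings: "stair" and "stairway" point the same way,
# so the paraphrase still scores a perfect match.
emb = {"stair": [1.0, 0.0], "stairway": [1.0, 0.0], "width": [0.0, 1.0]}
score = soft_f1(["stair", "width"], ["stairway", "width"], emb)
```

Because similarity is computed in embedding space, paraphrases that an exact-match metric would penalize can still score highly, which is why this family of metrics suits generated answers.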
Future Directions
Together, these findings underscore the value of combining robust retrieval strategies with fine-tuned language models for building code compliance and query answering. For future work, Aqib noted that "there is a need to develop a fully integrated end-to-end RAG framework, validated against manually curated datasets. Moreover, continued domain-specific fine-tuning could bring performance closer to that of state-of-the-art commercial models such as GPT-4."
Publication Details
The paper, "Fine-tuning large language models and evaluating retrieval methods for improved question answering on building codes," was published in Smart Construction (ISSN: 2960-2033), a peer-reviewed, open-access journal covering all areas of intelligent construction, operation, and maintenance, from fundamental research to engineering applications. The journal publishes original research articles, communications, reviews, perspectives, reports, and commentaries. It is now indexed in Scopus, and article submission is free of charge until 2026.