Fine-Tuned AI Models and Smart Retrieval Power Next-Generation QA Systems for Building Codes

A new study shows how pairing effective retrieval methods with fine-tuned AI models can transform the way professionals query building codes, replacing tedious manual searches with smarter, more accurate compliance tools.

Image Credit: Tadija Savic / Shutterstock

Building Smarter QA Systems for Building Codes

Researchers have focused on building a question-answering (QA) system that can answer user queries about building codes, replacing the laborious traditional approach of manual querying. One promising way to build a robust QA system is Retrieval-Augmented Generation (RAG). The researchers explored the potential of several retrieval methods and the efficiency of Low-Rank Adaptation (LoRA) for fine-tuning large language models (LLMs). Retrievers and LLMs are the core components of a RAG system, and their performance determines the overall performance of the QA system.
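The article does not detail the study's training configuration, but the sketch below shows how LoRA adapters are typically attached to a causal language model with the Hugging Face PEFT library. The model name, rank, and target modules are illustrative assumptions, not values taken from the paper.

```python
# Minimal LoRA fine-tuning setup sketch using Hugging Face Transformers + PEFT.
# Assumptions: the model name, rank, and target modules are illustrative, not
# the study's actual configuration.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")

# LoRA adds small low-rank matrices to selected projection layers, so only a
# small fraction of parameters is trained while the base weights stay frozen.
lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update
    lora_alpha=32,                        # scaling factor for the update
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all parameters
# `model` can then be trained with a standard supervised fine-tuning loop on
# QA pairs drawn from building code text.
```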

Why RAG for Building Codes?

Manual querying of building codes is often tedious, error-prone, and time-consuming. To address these challenges, researchers have turned to RAG. This framework integrates two key components: a retriever, which identifies and extracts relevant information from documents, and a language model, which generates precise answers by combining the retrieved content with the query.
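As a concrete illustration of that two-step flow, the sketch below retrieves the clauses most similar to a query and assembles them into a prompt for a generator model. The TF-IDF retriever, the example clauses, and the prompt template are hypothetical stand-ins, not the retriever or prompt used in the study.

```python
# Sketch of a retrieve-then-generate (RAG) pipeline over building code clauses.
# The clauses, retriever, and prompt template below are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

clauses = [
    "Stairways shall have a minimum clear width of 1100 mm.",       # hypothetical
    "Guards shall be not less than 1070 mm high around landings.",  # hypothetical
    "Exit doors shall swing in the direction of exit travel.",      # hypothetical
]

def retrieve(query: str, k: int = 3) -> list[str]:
    """Return the top-k clauses most similar to the query."""
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(clauses + [query])
    scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    top = scores.argsort()[::-1][:k]
    return [clauses[i] for i in top]

def build_prompt(query: str, context: list[str]) -> str:
    """Combine retrieved clauses with the user query for the generator LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the building code excerpts below.\n"
        f"{joined}\n\nQuestion: {query}\nAnswer:"
    )

query = "How wide must a stairway be?"
prompt = build_prompt(query, retrieve(query))
print(prompt)  # this prompt would then be passed to the (fine-tuned) LLM
```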

Challenges in Retrieval and Language Models

While RAG holds strong promise, both components present inherent challenges. Retrievers vary widely in performance, each with its own advantages and limitations. At the same time, language models are susceptible to hallucinations and typically require fine-tuning to adapt effectively to specialized domains. Recognizing this, researchers at the University of Alberta (Aqib, Hamza, Mei, and Chui) evaluated the performance of several retrievers and investigated the impact of fine-tuning LLMs for building code applications, demonstrating that such adaptation significantly improves generation accuracy and domain alignment.

Evaluation and Key Findings

Their study systematically evaluated multiple retrievers, with ES emerging as the most effective. Experiments also showed that retrieving the top-3 to top-5 documents was sufficient to capture query-relevant context, achieving consistently high BERT F1 scores. In parallel, the researchers fine-tuned a range of LLMs spanning 1B to 24B parameters to better capture the nuances of building code language. Among these, Llama-3.1-8B delivered the strongest results, achieving a 6.83% relative improvement in BERT F1 score over its pre-trained state.
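For readers unfamiliar with the metric, BERT F1 compares generated answers to reference answers using contextual embeddings rather than exact word overlap. The snippet below uses the open-source bert-score package with invented answer strings to show how such a score is typically computed; it illustrates the metric, not the study's evaluation pipeline.

```python
# Sketch of computing BERTScore F1 with the bert-score package.
# The candidate and reference answers here are invented examples.
from bert_score import score

candidates = ["The minimum clear stairway width is 1100 mm."]  # model outputs
references = ["Stairways must be at least 1100 mm wide."]      # ground-truth answers

# Returns per-pair precision, recall, and F1 tensors based on BERT embeddings.
P, R, F1 = score(candidates, references, lang="en")
print(f"BERT F1: {F1.mean().item():.4f}")
```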

Future Directions

Together, these findings underscore the value of combining robust retrieval strategies with fine-tuned language models for building code compliance and query answering. For future work, Aqib noted that "there is a need to develop a fully integrated end-to-end RAG framework, validated against manually curated datasets. Moreover, continued domain-specific fine-tuning could bring performance closer to that of state-of-the-art commercial models such as GPT-4."

Publication Details

This paper, "Fine-tuning large language models and evaluating retrieval methods for improved question answering on building codes," was published in Smart Construction (ISSN: 2960-2033), a peer-reviewed, open-access journal dedicated to original research articles, communications, reviews, perspectives, reports, and commentaries across all areas of intelligent construction, operation, and maintenance, covering both fundamental research and engineering applications. The journal is now indexed in Scopus, and article submission is free of charge until 2026.

Journal reference:
  • Aqib M, Hamza M, Mei Q, Chui Y. Fine-tuning large language models and evaluating retrieval methods for improved question answering on building codes. Smart Constr. 2025(3):0021. DOI: 10.55092/sc20250021. https://www.elspub.com/papers/j/1934646370727886848
