AI Chatbots Struggle With Mental Health Drug Side Effects, Georgia Tech Study Finds

While AI chatbots offer 24/7 support and empathetic responses, new research shows they frequently miss the mark when it comes to detecting and addressing adverse drug reactions in mental health care, highlighting the urgent need for safer, smarter digital support tools.

Research: Lived Experience Not Found: LLMs Struggle to Align with Experts on Addressing Adverse Drug Reactions from Psychiatric Medication Use. Image Credit: Ratana21 / Shutterstock

Seeking advice from artificial intelligence can be tempting. Powered by large language models (LLMs), AI chatbots are available 24/7, are often free to use, and draw on troves of data to answer questions. Now, people with mental health conditions are asking AI for advice when experiencing potential side effects of psychiatric medicines - a decidedly higher-risk situation than asking it to summarize a report. 

One question puzzling the AI research community is how AI performs when asked about mental health emergencies. Globally, including in the U.S., there is a significant gap in mental health treatment, with many individuals having limited or no access to mental healthcare. It's no surprise that people have started turning to AI chatbots with urgent health-related questions.

Now, researchers at the Georgia Institute of Technology have developed a new framework to evaluate how well AI chatbots can detect potential adverse drug reactions in chat conversations and how closely their advice aligns with human experts. The study was led by Munmun De Choudhury, Associate Professor in the School of Interactive Computing, and Mohit Chandra, a third-year Ph.D. student in the field of computer science. De Choudhury is also a faculty member in the Georgia Tech Institute for People and Technology.

"People use AI chatbots for anything and everything," said Chandra, the study's first author. "When people have limited access to healthcare providers, they are increasingly likely to turn to AI agents to make sense of what's happening to them and what they can do to address their problem. We were curious how these tools would fare, given that mental health scenarios can be very subjective and nuanced."

De Choudhury, Chandra, and their colleagues introduced their new framework at the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics on April 29, 2025.

Putting AI to the Test

Going into their research, De Choudhury and Chandra wanted to answer two main questions: First, can AI chatbots accurately detect whether someone is having side effects or adverse reactions to medication? Second, if they can accurately detect these scenarios, can AI agents then recommend good strategies or action plans to mitigate or reduce harm? 

The researchers collaborated with a team of psychiatrists and psychiatry students to establish clinically accurate answers from a human perspective, using these insights to analyze AI responses.

To build their dataset, they visited Reddit, the internet's public square, where many have gone for years to ask questions about medications and their side effects. 
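
To make that step concrete, a minimal sketch of this kind of Reddit data collection is shown below using the PRAW library. This is not the authors' pipeline; the subreddit, search query, and API credentials are placeholder assumptions.

```python
# Illustrative sketch only - not the study's actual collection pipeline.
# Requires Reddit API credentials; the values below are placeholders.
import praw

reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",          # placeholder credential
    client_secret="YOUR_CLIENT_SECRET",  # placeholder credential
    user_agent="adr-collection-sketch/0.1",
)

# Hypothetical subreddit and query; the study's actual sources may differ.
for post in reddit.subreddit("antidepressants").search("side effects", limit=25):
    print(post.title)
    print(post.selftext[:200])  # first 200 characters of the post body
```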

They evaluated nine LLMs, including general-purpose models (such as GPT-4 and Llama-3.1) and specialized models trained on medical data. Using the evaluation criteria provided by the psychiatrists, they calculated the LLMs' precision in detecting adverse reactions and in correctly categorizing the types of adverse reactions caused by psychiatric medications.
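
As a toy illustration of how such precision scores can be computed against expert labels (not the authors' code), consider the sketch below; the labels and reaction categories are invented for the example.

```python
# Minimal sketch: precision of ADR detection and ADR-type categorization
# against psychiatrist-provided labels. All labels here are invented examples.
from sklearn.metrics import precision_score

# Binary detection: 1 = post describes an adverse drug reaction, 0 = it does not.
expert_detect = [1, 0, 1, 1, 0, 1]
llm_detect    = [1, 0, 0, 1, 1, 1]
detection_precision = precision_score(expert_detect, llm_detect)

# Categorizing the type of reaction (hypothetical category names).
expert_type = ["sedation", "weight_gain", "agitation", "sedation"]
llm_type    = ["sedation", "agitation", "agitation", "weight_gain"]
type_precision = precision_score(expert_type, llm_type, average="macro", zero_division=0)

print(f"ADR detection precision:      {detection_precision:.2f}")
print(f"ADR categorization precision: {type_precision:.2f}")
```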

Additionally, they prompted LLMs to generate answers to queries posted on Reddit and compared the alignment of LLM answers with those provided by the clinicians over four criteria: (1) emotion and tone expressed, (2) answer readability, (3) proposed harm-reduction strategies, and (4) actionability of the proposed strategies.
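
A rough sketch of what such a comparison could look like in code is given below. It is not the study's evaluation method: readability is approximated with a Flesch reading-ease score (via the textstat package) and strategy overlap with simple string similarity, while tone and actionability are left as placeholders because they require human or model-based rating. The example answers are hypothetical.

```python
# Crude stand-ins for two of the four alignment criteria; not the study's rubric.
import textstat                      # readability metrics
from difflib import SequenceMatcher  # rough text-overlap proxy

# Hypothetical answers to the same Reddit query.
clinician_answer = ("Contact your prescriber about the tremor before changing the dose. "
                    "Keep a daily symptom log and do not stop the medication abruptly.")
llm_answer = ("This might be a side effect. Try to relax and stay hydrated, "
              "and consider talking to a doctor at some point.")

# (2) Answer readability: gap between Flesch reading-ease scores.
readability_gap = abs(textstat.flesch_reading_ease(clinician_answer)
                      - textstat.flesch_reading_ease(llm_answer))

# (3) Proposed harm-reduction strategies: crude lexical overlap between answers.
strategy_overlap = SequenceMatcher(None, clinician_answer.lower(), llm_answer.lower()).ratio()

# (1) Emotion/tone and (4) actionability need expert or model-based judgments.
print(f"Readability gap (Flesch): {readability_gap:.1f}")
print(f"Strategy text overlap:    {strategy_overlap:.2f}")
```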

The research team found that LLMs stumbled over the nuances of adverse drug reactions and had difficulty distinguishing between different types of side effects. They also discovered that while the LLMs sounded like human psychiatrists in tone and emotion, coming across as helpful and polite, they struggled to provide accurate, actionable advice aligned with the experts'.

Better Bots, Better Outcomes

The team's findings could help AI developers build safer, more effective chatbots. Chandra's ultimate goals are to inform policymakers of the importance of accurate chatbots and help researchers and developers improve LLMs by making their advice more actionable and personalized. 

Chandra notes that improving AI for psychiatric and mental health concerns would be particularly life-changing for communities that lack access to mental healthcare.

"When you look at populations with little or no access to mental healthcare, these models are incredible tools for people to use in their daily lives," Chandra said. "They are always available, they can explain complex things in your native language, and they become a great option to go to for your queries.

 "When the AI gives you incorrect information by mistake, it could have serious implications on real life," Chandra added. "Studies like this are important because they help reveal the shortcomings of LLMs and identify where we can improve."

Journal reference:
  • Mohit Chandra, Siddharth Sriraman, Gaurav Verma, Harneet Singh Khanuja, Jose Suarez Campayo, Zihang Li, Michael L. Birnbaum, and Munmun De Choudhury. 2025. Lived Experience Not Found: LLMs Struggle to Align with Experts on Addressing Adverse Drug Reactions from Psychiatric Medication Use. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 11083–11113, Albuquerque, New Mexico. Association for Computational Linguistics. https://aclanthology.org/2025.naacl-long.553/
