The introduction of vision-enabled artificial intelligence (AI) into medical scribes – the recording devices used by doctors to document patient encounters in real time – could improve the accuracy of patient notes and save clinicians valuable time.
Combining Audio and Visual Data in AI Scribes
Researchers from Flinders University's College of Medicine and Public Health found that a vision-enabled AI scribe, built on Google's Gemini model and using Ray-Ban Meta smart glasses, substantially improved the documentation accuracy of pharmacist-patient consultations, reducing omissions and errors in clinical notes.
"AI scribes are already helping clinicians by listening to consultations, but healthcare involves far more than spoken words," says research author Bradley Menz, an academic pharmacist in Flinders' College of Medicine and Public Health.
"A lot of clinically important information is visual. Important visual cues during consultations include patients' medicine containers, prescriptions and devices, as well as their body language. When an AI system can use both what it hears and what it sees in these consultations, it captures more of the details that matter for patient care."
Study Design Using Smart Glasses and AI Models
In the study, 10 clinical pharmacists recorded 110 'mock' medication-history interviews featuring more than 100 different medicine containers, including tablets, capsules, injections and creams.
Researchers wore Ray-Ban Meta smart glasses to record each interview, then passed the video footage to the AI scribe, which was developed using Google's Gemini AI model.
Accuracy Gains with Vision-Enabled AI Systems
An AI scribe that analysed both video and audio achieved 98 per cent accuracy, compared with 81 per cent when the same system processed only audio information.
A key benefit was in capturing medication strength and form, which are crucial details for safe dosing. With video input, the AI scribe captured this information 97 per cent of the time; with audio-only input, this fell to 28 per cent.
Role of Human Oversight in AI Scribing
"This is an augmented tool, not a replacement for clinical judgement," says Mr Menz. "The clinician still needs to review and sign off the document.
"The AI scribe can include a verification step, take screenshots of medication packages, and generate a full spoken transcript, giving the health professional a much stronger basis for checking what the AI has produced."
Future of AI Scribes in Healthcare Workflows
Senior author, Associate Professor Ashley Hopkins, says the study may point to the next stage of AI scribe usage in health care.
"AI scribes have gained traction because they reduce the burden of documentation and give clinicians more time with their patients. These findings suggest that the next step, when the scribe can see as well as hear, produces a more accurate and complete draft," says Associate Professor Hopkins. "This means less time editing AI-generated documentation and even more time focusing on patient care.
"These findings suggest the next step may be that all scribe systems can interpret visual information as well as speech, which could open the door to wider clinical uses."
Limitations and Ethical Considerations
The authors acknowledge that the study has some limitations and underline the need for human oversight and careful governance before these tools are adopted more broadly. The paper also highlights privacy, consent, data security and workflow integration as important issues that will need to be addressed as vision-enabled AI scribes move closer to practice.
Study Publication and Acknowledgements
The paper – *Vision-Enabled AI scribes reduce omissions in clinical conversations: evidence from simulated medication histories, by Bradley Menz, Nicholas Scarfo, Natansh Modi (University of South Australia), Erik Cornelisse, Lee Li, Jin Quan Eugene Tan, Jimit Gandhi (University of South Australia), Dorsa Maher, Dib Kousa, Kezia Daniel, Vidya Menon, Stephen Bacchi, Ross McKinnon, Michael Wiese (University of South Australia), Andrew Rowland, Michael Sorich and Ashley Hopkins – was published in npj Digital Medicine (2026). https://doi.org/10.1038/s41746-026-02494-9
*Please note that this is an unedited version of the manuscript, released to give early access to its findings. Before final publication, the manuscript will undergo further editing. There may be errors present that affect the content, and all legal disclaimers apply.
Acknowledgements: The PhD scholarship of B.D.M. is supported by the National Health and Medical Research Council, Australia (APP2030913). A.M.H. holds an Emerging Leader Investigator Fellowship, National Health and Medical Research Council, Australia (APP2008119). M.J.S. is supported by a Beat Cancer Research Fellowship from the Cancer Council South Australia. S.B. is supported by a Fulbright Scholarship.
Journal reference:
- Menz, B. D., Scarfo, N. L., Modi, N. D., Cornelisse, E., Li, L. X., Tan, J. Q., Gandhi, J., Maher, D., Kousa, D., Daniel, K., Menon, V., Bacchi, S., McKinnon, R. A., Wiese, M. D., Rowland, A., Sorich, M. J., & Hopkins, A. M. (2026). Vision-Enabled AI scribes reduce omissions in clinical conversations: Evidence from simulated medication histories. npj Digital Medicine. DOI: 10.1038/s41746-026-02494-9, https://www.nature.com/articles/s41746-026-02494-9