NLP Approach for ICD-10 Inference in Ophthalmology Reports

By Muhammad Osama. Reviewed by Susha Cheriyedath, M.Sc. Apr 29 2024

In an article recently published in the journal Scientific Reports, researchers from Germany introduced a technique that uses natural language processing (NLP) to automatically infer International Classification of Diseases, 10th Revision (ICD-10) codes from German ophthalmology medical records. Their method employed the Word2vec approach and nearest-neighbor search to associate queries with the most suitable ICD-10 code.

*Study: NLP Approach for ICD-10 Inference in Ophthalmology Reports. Image Credit: chaponta/Shutterstock*

Background

ICD-10 serves as a crucial system of diagnosis codes, offering a common language for reporting and monitoring diseases and health conditions. These codes play pivotal roles in diverse areas such as reimbursement, registries, and research. However, the majority of diagnoses within physicians' reports are recorded in natural language, necessitating either manual or automated conversion into ICD-10 codes before they can be used for secondary purposes.

Manual coding is time-consuming, expensive, and prone to errors. Automated coding, in turn, is challenging because natural language is complex and variable, particularly in languages other than English. Therefore, there is a pressing need for robust and precise methods of detecting ICD-10 codes in natural language medical records, especially in highly specialized fields with a large number of diagnoses, such as ophthalmology.

About the Research

In this paper, the authors aimed to develop and assess an approach for accurately inferring ICD-10 codes using NLP techniques. NLP, a subset of artificial intelligence, focuses on understanding and generating natural language. The researchers utilized a Word2vec-based approach, a data-driven method that maps words and phrases into a multidimensional numerical space based on semantic relatedness.

For example, the term 'cataract' could be represented as a 300-dimensional vector, capturing its semantic meaning within the text corpus. Multi-word phrases like 'rhegmatogenous retinal detachment' were treated as single entities in the embedding space to preserve their semantic meaning.
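The idea of treating a multi-word phrase as one entity can be illustrated with a small preprocessing step. The following is a minimal sketch, not the authors' implementation: the function name `merge_phrases`, the underscore joiner, and the greedy longest-match rule are all assumptions made for illustration. It rewrites known phrases into single tokens so that an embedding model trained afterwards would see each phrase as one vocabulary item.

```python
def merge_phrases(tokens, phrases, joiner="_"):
    """Greedily replace known multi-word phrases with single joined tokens.

    Hypothetical helper: longest phrases are tried first so that
    'rhegmatogenous retinal detachment' wins over 'retinal detachment'.
    """
    candidates = sorted({tuple(p.split()) for p in phrases}, key=len, reverse=True)
    merged, i = [], 0
    while i < len(tokens):
        for phrase in candidates:
            n = len(phrase)
            if tuple(tokens[i:i + n]) == phrase:
                merged.append(joiner.join(phrase))  # phrase becomes one token
                i += n
                break
        else:
            merged.append(tokens[i])  # no phrase starts here; keep the word
            i += 1
    return merged


# Toy sentence using terms from the article (the study itself used German text).
tokens = "patient with rhegmatogenous retinal detachment and cataract".split()
known_phrases = ["rhegmatogenous retinal detachment", "retinal detachment"]
merged = merge_phrases(tokens, known_phrases)
print(merged)
```

With the toy input above, the three words of the phrase collapse into the single token `rhegmatogenous_retinal_detachment`, while the remaining words are left unchanged.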

To construct the embedding space, the researchers utilized a dataset comprising two gigabytes of historical physicians' reports from a university eye hospital in Freiburg, Germany. Additionally, they assembled an extensive ICD-10 thesaurus derived from the alphanumeric standard for the encoding of diagnoses (Alpha-ID) catalog and manual annotations of ICD-10 codes from the hospital's database.

The study labeled the embedding space by placing, for each ICD-10 code, a centroid computed from the thesaurus entries corresponding to that code. These centroids then served as targets for the nearest-neighbor search, which linked inference queries with the most suitable ICD-10 code.
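The centroid labeling described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the three-dimensional vectors, the toy thesaurus, and the names `centroid`, `embeddings`, and `thesaurus` are hypothetical, and the real embedding space would have far more dimensions and entries.

```python
def centroid(vectors):
    """Column-wise mean of a non-empty list of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Hypothetical 3-dimensional embeddings for a few thesaurus terms.
embeddings = {
    "senile_cataract": [1.0, 0.0, 0.0],
    "age_related_cataract": [0.5, 0.5, 0.0],
    "glaucoma": [0.0, 0.0, 1.0],
}

# Toy thesaurus mapping an ICD-10 code to its known terms (illustrative only).
thesaurus = {
    "H25": ["senile_cataract", "age_related_cataract"],
    "H40": ["glaucoma"],
}

# One centroid per ICD-10 code, skipping terms that have no embedding.
centroids = {
    code: centroid([embeddings[t] for t in terms if t in embeddings])
    for code, terms in thesaurus.items()
}
print(centroids)
```

Each ICD-10 code ends up represented by a single point in the embedding space, which is what makes a nearest-neighbor lookup against the codes possible.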

In cases where queries comprised multiple words, the authors computed the column-wise numerical average of the embedding vectors. For phrase queries, the phrase embedding was integrated into the array before averaging. Subsequently, the closest ICD-10 cluster in the embedding space was identified. The authors also calculated the cosine distance between the query embedding and the centroid of the matched ICD cluster, enabling thresholding to reduce misclassifications when no suitable neighbor was found.
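The inference step can be sketched as follows. This is a hedged illustration rather than the published code: the function names, the toy vectors, and the threshold value `max_distance=0.3` are assumptions (the article does not state the threshold used). A phrase embedding would simply be one more entry in the list of vectors that is averaged.

```python
import math


def mean_vector(vectors):
    """Column-wise average of the query's embedding vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def infer_code(query_tokens, embeddings, centroids, max_distance=0.3):
    """Return (code, distance); code is None if nothing suitable is found."""
    vectors = [embeddings[t] for t in query_tokens if t in embeddings]
    if not vectors:
        return None, None  # every token is out of vocabulary
    query = mean_vector(vectors)
    code, distance = min(
        ((c, cosine_distance(query, v)) for c, v in centroids.items()),
        key=lambda pair: pair[1],
    )
    if distance > max_distance:  # hypothetical rejection threshold
        return None, distance
    return code, distance


# Hypothetical toy data, not taken from the study.
embeddings = {
    "senile": [1.0, 0.0, 0.0],
    "cataract": [0.5, 0.5, 0.0],
    "myopia": [0.0, 1.0, 0.0],
}
centroids = {"H25": [0.75, 0.25, 0.0], "H40": [0.0, 0.0, 1.0]}

print(infer_code(["senile", "cataract"], embeddings, centroids))
print(infer_code(["myopia"], embeddings, centroids))
print(infer_code(["unknown_abbrev"], embeddings, centroids))
```

In this toy setup the first query lands on the `H25` centroid, the second is rejected because its nearest centroid lies beyond the threshold, and the third returns no code because none of its tokens has an embedding.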

The proposed method was evaluated using data from three different eye hospitals, with no overlap in patients or personnel. The first hospital provided the Word2vec embedding space, while the other two hospitals served as external sites. The researchers obtained diagnosis segments from 100 physicians' reports at each hospital, anonymized them manually, and submitted them for inference. The respective senders then judged whether each inferred ICD code was entirely correct or at least correct for the superordinate disease group.
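The two accuracy figures that result from such a rating can be tallied as in the sketch below. The rating labels (`"exact"`, `"group"`, `"wrong"`), the sample data, and the assumption that an entirely correct code also counts as correct at the disease-group level are illustrative choices, not details confirmed by the article.

```python
from collections import Counter


def accuracy_summary(ratings):
    """Return (exact accuracy, disease-group accuracy) for a list of ratings.

    Assumption: a code rated 'exact' is also counted as correct at the
    superordinate disease-group level.
    """
    counts = Counter(ratings)
    total = len(ratings)
    exact = counts["exact"] / total
    group = (counts["exact"] + counts["group"]) / total
    return exact, group


# Made-up ratings for ten inferred codes from one hypothetical site.
ratings = ["exact"] * 7 + ["group"] * 2 + ["wrong"] * 1
exact_acc, group_acc = accuracy_summary(ratings)
print(f"entirely correct: {exact_acc:.0%}, disease group correct: {group_acc:.0%}")
```

Under this assumption the disease-group accuracy can never fall below the exact accuracy, which matches the pattern of the percentages reported for the three hospitals.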

Research Findings

The authors extracted 3,332 word sequences, of which 2,806 were recognized as diagnoses. Employing their methodology, they inferred 2,806 corresponding ICD-10 codes. In the first hospital, 98% of the codes were classified as entirely correct, with 99% accuracy for superordinate disease categorization. In contrast, the percentages for the second hospital were 69% and 86%, respectively.

Similarly, for the third hospital, the figures were 69% and 91%. Common eye disorders, such as disorders of accommodation and refraction and senile cataracts, achieved accuracy rates exceeding 80% across all three hospitals. However, accuracy varied significantly for certain disease groups, including disorders of the globe and other retinal disorders. Rare diseases generally exhibited low accuracy, with some conditions registering 0% accuracy in the third hospital.

The study attributed the high accuracy in the first hospital to the close alignment between the training data and the queries, coupled with the comprehensive coverage of the ICD thesaurus. It identified out-of-vocabulary queries as significant contributors to performance degradation in the second and third hospitals, primarily due to center-specific abbreviations. Additionally, it noted that non-ophthalmological disease diagnoses resulting from local peculiarities might also contribute to performance differences.

The proposed method has the potential to streamline the secondary use of electronic healthcare records for registry research, which necessitates consistent and comprehensive diagnosis extraction. This approach can enhance the quality and efficiency of reimbursement, registries, and research by minimizing the manual effort and errors associated with coding. Furthermore, it can be readily adapted to other languages and specialties through the utilization of appropriate training data and thesauri.

Conclusion

In summary, the research showcased the feasibility and potential of employing NLP to automatically deduce ICD-10 codes from ophthalmology medical records. However, the observed variations in performance across hospitals highlighted the need for further refinement and adaptation to terminologies specific to each hospital. The researchers suggested that future work could concentrate on enhancing the training dataset, improving the method's capability to handle rare diseases, and refining the methodology to better accommodate regional variations.

Journal reference:

Böhringer, D., Angelova, P., Fuhrmann, L. et al. Automatic inference of ICD-10 codes from German ophthalmologic physicians’ letters using natural language processing. Sci Rep 14, 9035 (2024). https://doi.org/10.1038/s41598-024-59926-3, https://www.nature.com/articles/s41598-024-59926-3.

Written by

Muhammad Osama

Muhammad Osama is a full-time data analytics consultant and freelance technical writer based in Delhi, India. He specializes in transforming complex technical concepts into accessible content. He has a Bachelor of Technology in Mechanical Engineering with specialization in AI & Robotics from Galgotias University, India, and he has extensive experience in technical content writing, data science and analytics, and artificial intelligence.
