NLP Approach for ICD-10 Inference in Ophthalmology Reports

In an article recently published in the journal Scientific Reports, researchers from Germany introduced a technique to automatically infer the International Classification of Diseases 10th Revision (ICD-10) codes from German ophthalmology medical records using natural language processing (NLP). Their method employed the Word2vec approach and nearest-neighbor search to associate queries with the most suitable ICD-10 code.

Study: NLP Approach for ICD-10 Inference in Ophthalmology Reports. Image Credit: chaponta/Shutterstock

Background

ICD-10 serves as a crucial system of diagnosis codes, offering a common language for reporting and monitoring diseases and health conditions. These codes play pivotal roles in diverse areas such as reimbursement, registries, and research. However, the majority of diagnoses within physicians' reports are recorded in natural language, necessitating either manual or automated conversion into ICD-10 codes before they can be used for secondary purposes.

Manual coding is time-consuming, expensive, and error-prone. Automated coding, in turn, is difficult because natural language is complex and variable, particularly in languages other than English. There is therefore a need for robust and precise methods for inferring ICD-10 codes from natural-language medical records, especially in highly specialized fields such as ophthalmology, which has a large number of diagnoses.

About the Research

In this paper, the authors aimed to develop and assess an approach for accurately inferring ICD-10 codes using NLP techniques. NLP, a subset of artificial intelligence, focuses on understanding and generating natural language. The researchers utilized a Word2vec-based approach, a data-driven method that maps words and phrases into a multidimensional numerical space based on semantic relatedness.

For example, the term 'cataract' could be represented as a 300-dimensional vector, capturing its semantic meaning within the text corpus. Multi-word phrases like 'rhegmatogenous retinal detachment' were treated as single entities in the embedding space to preserve their semantic meaning.
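To illustrate how multi-word phrases might be collapsed into single tokens before an embedding is trained, here is a minimal sketch. The function name `merge_phrases`, the underscore joiner, and the hand-made phrase list are assumptions for illustration only; the article does not describe how the study detected phrases.

```python
def merge_phrases(tokens, phrases, joiner="_"):
    """Greedily replace known multi-word phrases with single tokens.

    `phrases` is a set of tuples of lowercase words (a hypothetical phrase list).
    At each position the longest matching phrase wins.
    """
    max_len = max((len(p) for p in phrases), default=1)
    merged = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate window first, down to two words.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            window = tuple(t.lower() for t in tokens[i:i + n])
            if window in phrases:
                merged.append(joiner.join(window))
                i += n
                break
        else:
            # No phrase starts here; keep the single token unchanged.
            merged.append(tokens[i])
            i += 1
    return merged


# Hypothetical phrase list built from the article's own example term.
PHRASES = {
    ("rhegmatogenous", "retinal", "detachment"),
    ("retinal", "detachment"),
}
example = merge_phrases(
    "Patient with rhegmatogenous retinal detachment".split(), PHRASES
)
```

After merging, the phrase is one vocabulary entry, so an embedding model would learn a single vector for it instead of three unrelated word vectors.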

To construct the embedding space, the researchers utilized a dataset comprising two gigabytes of historical physicians' reports from a university eye hospital in Freiburg, Germany. Additionally, they assembled an extensive ICD-10 thesaurus derived from the alphanumeric standard for the encoding of diagnoses (Alpha-ID) catalog and manual annotations of ICD-10 codes from the hospital's database.

The embedding space was labeled by computing, for each ICD-10 code, the centroid of the embeddings of its thesaurus entries. These centroids then served as targets for the nearest-neighbor method, linking inference queries with the most suitable ICD-10 code.
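The labeling step can be sketched as follows, assuming the embedding is available as a plain term-to-vector mapping. The toy three-dimensional vectors, the thesaurus terms, and the helper names are made up for illustration; skipping thesaurus terms that lack an embedding is likewise an assumption, not a detail reported in the article.

```python
def centroid(vectors):
    """Column-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def label_embedding_space(thesaurus, embeddings):
    """Map each ICD-10 code to the centroid of its thesaurus entries.

    `thesaurus` maps code -> list of terms; `embeddings` maps term -> vector.
    Terms without an embedding are skipped (an assumption); codes with no
    known term are left out of the result.
    """
    centroids = {}
    for code, terms in thesaurus.items():
        vectors = [embeddings[t] for t in terms if t in embeddings]
        if vectors:
            centroids[code] = centroid(vectors)
    return centroids


# Toy 3-dimensional vectors with made-up values (the article mentions 300 dimensions).
EMBEDDINGS = {
    "cataract": [1.0, 0.0, 0.0],
    "lens_opacity": [0.5, 0.5, 0.0],
    "retinal_detachment": [0.0, 0.0, 1.0],
}
# Illustrative thesaurus; the term lists are hypothetical.
THESAURUS = {
    "H25": ["cataract", "lens_opacity"],
    "H33": ["retinal_detachment", "term_not_in_corpus"],
}
CENTROIDS = label_embedding_space(THESAURUS, EMBEDDINGS)
```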

In cases where queries comprised multiple words, the authors computed the column-wise numerical average of the embedding vectors. For phrase queries, the phrase embedding was added to the array before averaging. Subsequently, the closest ICD-10 cluster in the embedding space was identified. The authors also calculated the cosine distance between the centroid of that ICD cluster and the query embedding, which allowed a threshold to be applied to reduce misclassifications when no suitable neighbor was found.
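The query step described above (average the word vectors, find the nearest centroid, reject matches that are too far away) might look like the sketch below. The threshold value of 0.5, the decision to ignore unknown tokens, and all names and toy vectors are assumptions; the article does not report the study's actual threshold.

```python
import math


def mean_vector(vectors):
    """Column-wise average of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def infer_code(query_tokens, embeddings, centroids, max_distance=0.5):
    """Return (code, distance) for the nearest centroid.

    Returns (None, distance) when the nearest centroid is farther than
    `max_distance` (a hypothetical threshold), and (None, None) when no
    query token has an embedding.
    """
    vectors = [embeddings[t] for t in query_tokens if t in embeddings]
    if not vectors:
        return None, None
    query = mean_vector(vectors)
    code, dist = min(
        ((c, cosine_distance(query, v)) for c, v in centroids.items()),
        key=lambda pair: pair[1],
    )
    return (code, dist) if dist <= max_distance else (None, dist)


# Toy data with made-up values.
EMBEDDINGS = {
    "senile": [0.8, 0.2, 0.0],
    "cataract": [1.0, 0.0, 0.0],
    "fracture": [0.0, 1.0, 0.0],
}
CENTROIDS = {"H25": [0.75, 0.25, 0.0], "H33": [0.0, 0.0, 1.0]}

match = infer_code(["senile", "cataract"], EMBEDDINGS, CENTROIDS)
rejected = infer_code(["fracture"], EMBEDDINGS, CENTROIDS)
```

In this toy setup the first query lands close to the H25 centroid, while the second is rejected because even its nearest centroid lies beyond the threshold. A merged phrase token would simply contribute one more vector to the average.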

The method was evaluated using data from three different eye hospitals with no overlap in patients or personnel. The first hospital had provided the data for the Word2vec embedding space, while the other two served as external sites. The researchers obtained diagnosis segments from 100 physicians' reports at each hospital, anonymized them manually, and submitted them for inference. The respective senders then assessed whether each inferred ICD code was completely correct or at least correct for the overarching disease group.

Research Findings

The authors extracted 3332 word sequences, of which 2806 were recognized as diagnoses. Applying their method, they inferred an ICD-10 code for each of these 2806 diagnoses. In the first hospital, 98% of the codes were rated entirely correct, and 99% were correct at the level of the superordinate disease category. For the second hospital, the corresponding figures were 69% and 86%.

Similarly, for the third hospital, the figures were 69% and 91%. Common eye disorders, such as disorders of accommodation and refraction and senile cataracts, achieved accuracy rates exceeding 80% across all hospitals. However, accuracy varied significantly for certain disease groups, including disorders of the globe and other retinal disorders. Rare diseases generally exhibited low accuracy, with some conditions registering 0% accuracy in the third hospital.
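The two accuracy figures reported per hospital can be read as nested rates. The sketch below shows one way such figures could be tallied from reviewer ratings, assuming that a completely correct code also counts as correct at the disease-group level. The rating labels and the sample ratings are hypothetical and are not data from the study.

```python
from collections import Counter


def accuracy_by_level(ratings):
    """Return (exact_accuracy, group_accuracy) from reviewer ratings.

    Each rating is one of the hypothetical labels 'exact', 'group', or 'wrong'.
    Group-level accuracy counts 'exact' ratings as well (an assumption).
    """
    counts = Counter(ratings)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no ratings supplied")
    exact = counts["exact"] / total
    group = (counts["exact"] + counts["group"]) / total
    return exact, group


# Made-up ratings for ten inferred codes.
sample = ["exact"] * 7 + ["group"] * 2 + ["wrong"]
exact_acc, group_acc = accuracy_by_level(sample)
```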

The study attributed the high accuracy in the first hospital to the close alignment between the training data and the queries, coupled with the comprehensive coverage of the ICD thesaurus. It identified out-of-vocabulary queries as significant contributors to performance degradation in the second and third hospitals, primarily due to center-specific abbreviations. Additionally, it noted that non-ophthalmological disease diagnoses resulting from local peculiarities might also contribute to performance differences.
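Because out-of-vocabulary tokens have no embedding, a simple check of how many query tokens fall outside the training vocabulary can flag this problem before inference. The sketch below is an illustration only: the whitespace tokenization, the function name `oov_report`, and the sample queries (including the made-up abbreviation "RRD") are assumptions, not part of the study.

```python
def oov_report(queries, vocabulary):
    """Return (oov_rate, sorted unknown tokens) across all query tokens.

    Queries are split on whitespace and lowercased (a simplifying assumption).
    """
    tokens = [t.lower() for q in queries for t in q.split()]
    oov_tokens = [t for t in tokens if t not in vocabulary]
    rate = len(oov_tokens) / len(tokens) if tokens else 0.0
    return rate, sorted(set(oov_tokens))


# Hypothetical vocabulary and queries; "RRD" and "od" stand in for
# center-specific abbreviations that the embedding has never seen.
VOCAB = {"senile", "cataract", "retinal", "detachment"}
QUERIES = ["senile cataract", "RRD od", "retinal detachment"]
oov_rate, unknown_tokens = oov_report(QUERIES, VOCAB)
```

A high rate at an external site would suggest extending the training corpus or the thesaurus with that site's abbreviations.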

The proposed method has the potential to streamline the secondary use of electronic healthcare records for registry research, which necessitates consistent and comprehensive diagnosis extraction. This approach can enhance the quality and efficiency of reimbursement, registries, and research by minimizing the manual effort and errors associated with coding. Furthermore, it can be readily adapted to other languages and specialties through the utilization of appropriate training data and thesauri.

Conclusion

In summary, the research showcased the feasibility and potential of employing NLP to automatically deduce ICD-10 codes from ophthalmology medical records. However, the observed variations in performance across hospitals highlighted the need for further refinement and adaptation to terminologies specific to each hospital. The researchers suggested that future work could concentrate on enhancing the training dataset, improving the method's capability to handle rare diseases, and refining the methodology to better accommodate regional variations.

Journal reference:

Written by

Muhammad Osama

Muhammad Osama is a full-time data analytics consultant and freelance technical writer based in Delhi, India. He specializes in transforming complex technical concepts into accessible content. He has a Bachelor of Technology in Mechanical Engineering with specialization in AI & Robotics from Galgotias University, India, and he has extensive experience in technical content writing, data science and analytics, and artificial intelligence.

Citations

Please use the following format to cite this article in your essay, paper or report:

    Osama, Muhammad. (2024, April 29). NLP Approach for ICD-10 Inference in Ophthalmology Reports. AZoAi. Retrieved on May 19, 2024 from https://www.azoai.com/news/20240429/NLP-Approach-for-ICD-10-Inference-in-Ophthalmology-Reports.aspx.
