In a paper published in the journal PLOS ONE, researchers presented a system that combines machine learning (ML) and knowledge graph (KG) technology to streamline medical diagnosis and treatment. By constructing a comprehensive KG and refining it in stages, the system aims to give medical professionals rapid, accurate answers as they navigate complex medical research and identify effective treatments.
Moreover, the system incorporated knowledge distillation schemes that let users enter keywords and retrieve information on related entities and relationships. It also introduced the concept of "joints" (entities with multiple inputs and outputs), resulting in a joint-version KG that enhanced the system's utility and comprehensiveness. Together with multiple levels refinement (MLR), these schemes were intended to make the system more robust and accessible for healthcare practitioners and researchers.
Previous work on KGs is often traced to Google's introduction of its Knowledge Graph in 2012, yet the concept has roots in decades of artificial intelligence research. While KGs are widely used in diverse applications today, creating them from disparate sources remains challenging because it requires extensive data reading, processing, and filtering. Additionally, ensuring the quality and accuracy of the extracted information is a major obstacle to developing comprehensive and reliable KGs.
Iterative KG Construction Methodology
The methodology uses MLR techniques to iteratively improve the quality of entities and relations within triplets during KG construction. This iterative process creates several kinds of KGs, including foundational, joint-version, and interactive KGs, as depicted in the paper's design flow.
Initially, in level one, articles are classified into different categories using VOSViewer, a tool for visualizing scientific landscapes, leading to the construction of corresponding KGs. Subsequently, in level two, natural language processing (NLP) techniques such as tokenization, tagging, and named entity recognition are applied to generate triplets from each article. However, the initial KGs may contain irrelevant information, necessitating further refinement.
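The level-two step can be illustrated with a deliberately small sketch. It is not the authors' pipeline: the relation lexicon, the function names, and the example sentences are all hypothetical, and a simple verb lookup stands in for the tagging and named entity recognition that a real system would perform with trained models.

```python
import re

# Hypothetical relation lexicon (an assumption for illustration only);
# a real pipeline would rely on a POS tagger and an NER model instead.
RELATION_VERBS = {"treats", "causes", "relieves", "worsens"}

def tokenize(sentence):
    """Split a sentence into lowercase word tokens."""
    return re.findall(r"[a-z0-9\-]+", sentence.lower())

def extract_triplet(sentence):
    """Return (head, relation, tail) around the first known relation verb, or None."""
    tokens = tokenize(sentence)
    for i, token in enumerate(tokens):
        if token in RELATION_VERBS:
            head = " ".join(tokens[:i])
            tail = " ".join(tokens[i + 1:])
            return (head, token, tail)
    return None

# Illustrative sentences, not data from the paper.
sentences = [
    "Salbutamol relieves bronchospasm",
    "Obesity worsens asthma",
    "Asthma is common",
]
triplets = [extract_triplet(s) for s in sentences]
for sentence, triplet in zip(sentences, triplets):
    print(sentence, "->", triplet)
```

The third sentence yields no triplet because it contains no verb from the lexicon. A sentence that starts or ends with a relation verb would produce an empty head or tail, which is the kind of incomplete triplet that later refinement has to remove.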
At level three, researchers refine three distinct KGs: medication, symptoms, and comorbidity. During this phase, professionals scrutinize and validate entities and filter out incomplete triplets. Even after this validation, the relative importance of each entity remains unclear. Consequently, joint-version KGs are developed in level four to analyze entities that overlap among triplets, treating entities with multiple relations as more significant. This process allows entity relationships to be explored in a more nuanced context.
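Levels three and four can be sketched in a few lines. This is a minimal illustration under stated assumptions: the completeness check only tests for non-empty strings (the expert validation described above is not modeled), the thresholds `min_in` and `min_out` are hypothetical parameters, and the triplets are made-up examples rather than data from the paper.

```python
from collections import defaultdict

# Illustrative triplets, including two incomplete ones.
triplets = [
    ("salbutamol", "relieves", "bronchospasm"),
    ("asthma", "causes", "bronchospasm"),
    ("bronchospasm", "causes", "wheezing"),
    ("bronchospasm", "causes", "breathlessness"),
    ("obesity", "worsens", "asthma"),
    ("", "treats", "asthma"),       # incomplete: missing head
    ("asthma", "causes", None),     # incomplete: missing tail
]

def is_complete(triplet):
    """A triplet is complete when head, relation, and tail are non-empty strings."""
    return all(isinstance(part, str) and part.strip() for part in triplet)

def find_joints(triplets, min_in=2, min_out=2):
    """Return entities with at least min_in incoming and min_out outgoing relations."""
    incoming = defaultdict(set)
    outgoing = defaultdict(set)
    for head, relation, tail in triplets:
        outgoing[head].add((relation, tail))
        incoming[tail].add((head, relation))
    entities = set(incoming) | set(outgoing)
    return sorted(
        e for e in entities
        if len(incoming[e]) >= min_in and len(outgoing[e]) >= min_out
    )

complete = [t for t in triplets if is_complete(t)]
joints = find_joints(complete)
print(len(complete), joints)  # 5 ['bronchospasm']
```

Here "bronchospasm" qualifies as a joint because two triplets point into it and two point out of it, while "asthma" has only one of each. Where exactly to set the thresholds is a design choice that this sketch leaves open.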
At level five, researchers introduce knowledge distillation techniques to extract desired entities from the KGs and visualize their connections. This final processing stage allows users to explore entity relationships efficiently. The methodology's strength lies in its ability to dynamically generate KGs, refine entities iteratively, and explore relationships comprehensively in a single framework, a capability not found in existing KG construction approaches.
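One way to picture this distillation step is as a keyword-driven neighborhood lookup over the triplets. The sketch below is an assumption about how such a lookup could work, not the authors' implementation: the function name `distill`, the `hops` parameter, and the sample triplets are all hypothetical.

```python
def distill(triplets, keyword, hops=1):
    """Return the triplets reachable from `keyword` within `hops` steps."""
    start = keyword.strip().lower()
    frontier = {start}   # entities to expand in the current hop
    visited = {start}    # entities already reached
    selected = []
    for _ in range(hops):
        next_frontier = set()
        for triplet in triplets:
            head, tail = triplet[0], triplet[2]
            if triplet in selected:
                continue
            if head in frontier or tail in frontier:
                selected.append(triplet)
                next_frontier.update({head, tail} - visited)
        visited |= next_frontier
        frontier = next_frontier
    return selected

# Illustrative KG, not data from the paper.
kg = [
    ("obesity", "worsens", "asthma"),
    ("asthma", "causes", "bronchospasm"),
    ("bronchospasm", "causes", "wheezing"),
    ("salbutamol", "relieves", "bronchospasm"),
]
one_hop = distill(kg, "Asthma")          # the two triplets that mention asthma
two_hop = distill(kg, "asthma", hops=2)  # also pulls in bronchospasm's neighbors
print(one_hop)
print(two_hop)
```

Limiting the number of hops keeps the returned subgraph small enough to visualize, at the cost of hiding more distant connections.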
Data collection forms the methodology's foundation: articles are drawn from medical literature databases and span relevant topics such as asthma treatment and diagnosis. ML-based analysis with VOSViewer aids article categorization and subsequent KG construction. NLP techniques then process the text and extract triplets, which form the basis of the KG. Because these initial KGs can contain irrelevant information, later processing stages remove it and validate the remaining entities. Through iterative refinement and interactive exploration, the methodology offers a comprehensive solution for constructing and analyzing KGs in healthcare applications.
Approach, Distillation, Challenges
The methodology applies MLR techniques to iteratively refine entities and relations within triplets during KG construction, producing foundational, joint-version, and interactive KGs. Articles are first categorized into domains using VOSViewer, and NLP techniques such as tokenization and tagging then extract triplets from each article. Refinement at level three scrutinizes and validates entities to filter out incomplete triplets, improving the accuracy and reliability of the KGs.
The knowledge distillation process extracts valuable insights from the KGs, enabling users to inquire about topics like asthma and visualize relevant information. For instance, knowledge distilled from medication, symptoms, and comorbidity KGs provides comprehensive information on asthma treatments, symptoms, and associated conditions. These results demonstrate the methodology's effectiveness in providing users with comprehensive and relevant information.
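A query of this kind can be sketched as a lookup across the three category KGs. Everything in the example is hypothetical: the function name `query_kgs`, the dictionary layout, and the triplets are illustrative assumptions, not the paper's data or code.

```python
def query_kgs(kgs, keyword):
    """Return, per KG, the triplets whose head or tail matches the keyword."""
    key = keyword.strip().lower()
    return {
        name: [t for t in triplets if key in (t[0], t[2])]
        for name, triplets in kgs.items()
    }

# Illustrative category KGs, not data from the paper.
kgs = {
    "medication": [
        ("salbutamol", "relieves", "asthma"),
        ("metformin", "treats", "diabetes"),
    ],
    "symptoms": [
        ("asthma", "causes", "wheezing"),
    ],
    "comorbidity": [
        ("asthma", "co-occurs with", "allergic rhinitis"),
        ("obesity", "co-occurs with", "hypertension"),
    ],
}

result = query_kgs(kgs, "Asthma")
for name, matches in result.items():
    print(name, matches)
```

Grouping the matches by source KG keeps treatments, symptoms, and associated conditions separate, so each category can be shown in its own view.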
Despite the advantages, constructing a KG poses challenges, including resource-intensive requirements for development and maintenance, ensuring data quality and accuracy, and striking a balance between detailed relationships and manageable complexity. Managing these challenges is crucial to maximize the benefits of a KG while mitigating potential drawbacks.
To sum up, the methodology for constructing a professional KG in healthcare involves a systematic approach encompassing five refinement levels. Beginning with the VOSViewer categorization of articles, the refinement process progresses through various stages, including NLP-based triplet generation, entity validation, joint-entity identification, and interactive exploration.
Each refinement level addresses specific challenges and contributes to the overall quality and usability of the KG. The iterative design flow both builds the KG and maintains entity quality throughout the process. The final result is a comprehensive KG with distilled knowledge and interactive visualization capabilities, providing valuable insights for medical professionals and researchers.