Deep Learning for 5HMC Detection in RNA Sequences

By Dr Silpaja Chandrasekar, PhD
Reviewed by Susha Cheriyedath, M.Sc.
Apr 30 2024

In a paper published in the journal Scientific Reports, researchers introduced Deep5HMC to detect 5-hydroxymethylcytosine (5HMC) in RNA samples by combining machine learning (ML) algorithms and advanced feature extraction techniques.

*The configuration of the proposed DNN model. Circles represent processing nodes. Image Credit: https://www.nature.com/articles/s41598-024-59777-y*

Deep5HMC achieved notable accuracy under K-fold cross-validation, surpassing previous methods. This advance holds promise for the early diagnosis of conditions like cancer and cardiovascular disease and could inform medical assessment and treatment protocols.

Related Work

Past work on ribonucleic acid (RNA) modifications, particularly the elusive 5HMC, has underscored its pivotal role in genetic processes across various organisms and diseases. Numerous studies have highlighted its influence on RNA splicing, translation, and decay, as well as its associations with conditions like cancer and cardiovascular disease. Traditional detection methods like chromatography and polymerase chain reaction (PCR) are effective, but they are both costly and time-consuming.

Recent advancements have seen the emergence of machine learning-based approaches, like support vector machines (SVM) and convolutional neural networks (CNN), for more efficient identification of 5HMC sites. However, existing models often struggle with prediction accuracy because they rely on traditional learning processes.

Exploring RNA Modifications

A comprehensive benchmark dataset containing both training and test samples was essential to ensure the effectiveness of the learning model. This dataset comprised diverse samples, including positive and negative examples of 5HMC sequences.

After preprocessing with the Cluster Database at High Identity with Tolerance (CD-HIT) software to remove sequences with more than 20% similarity, the dataset contained 1324 samples, evenly split between 662 positive 5HMC sequences and 662 negative non-5HMC sequences. To validate the model's performance, the researchers randomly selected 10% of the samples as an independent test set, leaving 90% for model training and evaluation.
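The 90/10 hold-out described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `stratified_split`, the fixed seed, and the choice to sample 10% from each class separately (so the test set stays balanced) are all assumptions, since the article only says the test samples were chosen at random.

```python
import random

def stratified_split(labels, test_fraction=0.10, seed=0):
    """Return (train_idx, test_idx), drawing test_fraction of each class for testing."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Placeholder labels with the class counts reported for the benchmark:
# 662 positive (1) and 662 negative (0) sequences.
labels = [1] * 662 + [0] * 662
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 1192 132
```

With 662 samples per class, rounding 66.2 down gives 66 test samples per class, so the hold-out set is slightly under 10% of the 1324 samples.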

Next, seven distinct feature extraction methods were employed to convert RNA samples into numerical feature vectors that learning models can consume. These methods included the k-mer descriptor, reverse complement k-mer (RC-Kmer), pseudo k-tuple nucleotide composition (PseKNC), tri-nucleotide-based auto covariance (TAC), tri-nucleotide-based cross covariance (TCC), and dinucleotide-based cross covariance (DCC). These techniques captured essential characteristics of RNA sequences, allowing for the construction of a comprehensive feature vector representation.
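To make two of these descriptors concrete, the sketch below computes k-mer frequencies and their reverse complement variant for a single RNA string. It is an illustration under assumptions: the function names, the normalization by window count, and the A/C/G/U alphabet are choices made here, not details taken from the paper.

```python
from itertools import product

ALPHABET = "ACGU"
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def kmer_frequencies(seq, k=2):
    """Normalized frequency of every length-k word over ACGU, in lexicographic order."""
    words = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(words, 0)
    total = len(seq) - k + 1
    for i in range(total):
        word = seq[i:i + k]
        if word in counts:  # windows with ambiguous bases are skipped
            counts[word] += 1
    if total <= 0:
        return [0.0] * len(words)
    return [counts[w] / total for w in words]

def reverse_complement(word):
    """Complement each base, then reverse the word."""
    return word.translate(COMPLEMENT)[::-1]

def rc_kmer_frequencies(seq, k=2):
    """Merge each k-mer with its reverse complement, which shortens the vector."""
    words = ["".join(p) for p in product(ALPHABET, repeat=k)]
    freqs = dict(zip(words, kmer_frequencies(seq, k)))
    merged = {}
    for w in words:
        key = min(w, reverse_complement(w))  # canonical representative of the pair
        merged[key] = merged.get(key, 0.0) + freqs[w]
    return [merged[key] for key in sorted(merged)]

seq = "ACGUACGU"  # toy sequence, not from the benchmark dataset
print(len(kmer_frequencies(seq, 2)), len(rc_kmer_frequencies(seq, 2)))  # 16 10
```

For k = 2, merging reverse complements reduces the 16 dinucleotide features to 10, because four dinucleotides (AU, UA, CG, GC) are their own reverse complement.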

Principal component analysis (PCA) was then applied to the feature vectors to remove noisy or redundant information. This unsupervised technique reduced the dimensionality of the feature space while retaining the components that capture most of the variance, resulting in a clearer and more compact representation of the data.
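The idea behind PCA can be shown with a small dependency-free sketch that finds only the first principal component by power iteration. This is an assumption-laden illustration rather than the study's pipeline: a real workflow would typically use a library implementation and keep several components, and the function name and toy data here are hypothetical.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (component, explained_ratio, scores) for the top principal component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue,
    # provided the starting vector is not orthogonal to it.
    vec = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    # Rayleigh quotient gives the variance captured along vec.
    eigenvalue = sum(vec[a] * sum(cov[a][b] * vec[b] for b in range(d))
                     for a in range(d))
    total_var = sum(cov[a][a] for a in range(d))
    ratio = eigenvalue / total_var if total_var else 0.0
    # Project each centered sample onto the component (one number per sample).
    scores = [sum(c[j] * vec[j] for j in range(d)) for c in centered]
    return vec, ratio, scores

# Toy feature matrix: the second and third columns largely repeat the first.
rows = [[1.0, 2.1, 0.9], [2.0, 3.9, 2.1], [3.0, 6.2, 2.9], [4.0, 7.8, 4.2]]
component, explained, scores = first_principal_component(rows)
print(f"variance explained by the first component: {explained:.3f}")
```

When features are strongly correlated, as in the toy matrix, a single component carries nearly all of the variance, which is why PCA can shrink a concatenated feature vector without losing much information.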

The core component of the model was a deep neural network (DNN), a class of models loosely inspired by the human nervous system. The architecture consisted of an input layer, three hidden layers, and an output layer. The researchers used the hyperbolic tangent (Tanh) activation function for the hidden layers and the sigmoid function for the output layer, allowing the model to capture complex nonlinear patterns and to output a probability for binary classification.
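A forward pass through such a network, with Tanh on three hidden layers and a sigmoid output, might look like the sketch below. The layer sizes, the random weight initialization, and the 0.5 decision threshold are assumptions for illustration; the article does not report them, and no training is shown.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(x, weights, biases):
    """Affine transform: one weighted sum plus bias per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, layers):
    """Apply Tanh after every hidden layer and sigmoid on the single output unit."""
    *hidden, output = layers
    for weights, biases in hidden:
        x = [math.tanh(z) for z in dense(x, weights, biases)]
    (z,) = dense(x, *output)
    return sigmoid(z)

rng = random.Random(0)
sizes = [8, 16, 8, 4, 1]  # hypothetical: 8 inputs, three hidden layers, 1 output
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

probability = forward([0.1] * 8, layers)  # untrained network, arbitrary input
label = int(probability >= 0.5)           # 1 = predicted 5HMC site, 0 = not
print(round(probability, 3), label)
```

The sigmoid squashes the final weighted sum into the range (0, 1), so the output can be read as the predicted probability that the sequence contains a 5HMC site.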

Finally, grid search techniques were employed to fine-tune hyperparameters and optimize the model's performance. This method exhaustively evaluated combinations of hyperparameter values to identify the most accurate model configuration. However, the specific hyperparameters considered and the range of values explored were not explicitly reported; stating them would enhance the credibility and reproducibility of the study.
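Exhaustive grid search is simple to express in code. In the sketch below, the grid values and the `toy_score` function are hypothetical placeholders: the values are not the ones the study searched, and the score stands in for a cross-validated accuracy that would come from actually training the model.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Evaluate every combination in param_grid and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical grid: 3 x 3 x 2 = 18 configurations.
param_grid = {
    "learning_rate": [0.001, 0.01, 0.1],
    "epochs": [50, 100, 200],
    "activation": ["tanh", "relu"],
}

def toy_score(params):
    """Arbitrary stand-in for validation accuracy; not a real result."""
    score = -abs(params["learning_rate"] - 0.01) - abs(params["epochs"] - 100) / 1000
    return score + (0.05 if params["activation"] == "tanh" else 0.0)

best_params, best_score = grid_search(param_grid, toy_score)
print(best_params)
```

The cost grows multiplicatively with each added hyperparameter, which is one reason reporting the searched ranges matters for anyone trying to reproduce the result.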

The methodology encompassed comprehensive dataset preparation, feature extraction, dimensionality reduction, deep learning model construction, and hyperparameter optimization, aiming to develop a robust and accurate model for predicting 5HMC sites in RNA sequences.

Model Performance Evaluation

The study evaluated the proposed model using several performance metrics: accuracy, sensitivity (recall), specificity, and the Matthews correlation coefficient (MCC). Together, these metrics assess how well the model predicts 5HMC sites in RNA sequences and expose its strengths and limitations across different aspects of classification performance.
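All of these metrics derive from the four confusion-matrix counts, as the sketch below shows using their standard definitions. The function name and the toy labels are illustrative assumptions and do not reproduce the study's results.

```python
import math

def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, and MCC from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Sensitivity (recall): share of true 5HMC sites that were found.
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        # Specificity: share of non-5HMC sequences correctly rejected.
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        # MCC ranges from -1 to 1; 0 means no better than chance.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }

# Toy labels: 3 true positives, 1 false negative, 3 true negatives, 1 false positive.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
# {'accuracy': 0.75, 'sensitivity': 0.75, 'specificity': 0.75, 'mcc': 0.5}
```

MCC is useful alongside accuracy because it uses all four counts and stays informative even when one class dominates, although the benchmark here is balanced.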

Additionally, optimizing model hyperparameters played a crucial role in enhancing the model's performance. The study identified optimal values for key parameters such as activation functions, learning rates, and the number of training epochs through experiments and grid search techniques. This optimization process significantly improved the model's accuracy and efficiency, demonstrating the importance of fine-tuning hyperparameters for achieving optimal performance in deep learning models.

Furthermore, the study benchmarked the proposed model against existing classifiers and models. Comparing accuracy, sensitivity, specificity, and other metrics, it reported that the proposed model outperformed traditional classifiers and earlier models in predicting 5HMC sites. This comparative analysis provided valuable insights into the advancements offered by the proposed model and its potential impact on RNA sequence analysis and modification prediction.

Conclusion

To summarize, the Deep5HMC model presented a promising solution for accurately identifying 5HMC modifications in RNA sequences. Deep5HMC achieved a prediction accuracy of 84.07%, outperforming previous methods by 7.59%. The study highlighted the potential of Deep5HMC as a cost-effective and efficient tool for the early detection and diagnosis of diseases associated with 5HMC alterations. Future research aims to optimize Deep5HMC further and to explore its integration with complementary computational or experimental approaches to enhance RNA modification analysis.

Journal reference:

Khan, S., et al. (2024). Sequence-Based Model Using Deep Neural Network and Hybrid Features for Identification of 5-Hydroxymethylcytosine Modification. Scientific Reports, 14(1), 9116. https://doi.org/10.1038/s41598-024-59777-y

Written by

Silpaja Chandrasekar

Dr. Silpaja Chandrasekar has a Ph.D. in Computer Science from Anna University, Chennai. Her research expertise lies in analyzing traffic parameters under challenging environmental conditions. Additionally, she has gained valuable exposure to diverse research areas, such as detection, tracking, classification, medical image analysis, cancer cell detection, chemistry, and Hamiltonian walks.
