Pre-trained large RNA language model enhances RNA N4-acetylcytidine site prediction.
RNA N4-acetylcytidine (ac4C) modification plays a crucial role in the regulation of gene expression. However, existing prediction methods are limited in how well they capture RNA sequence features, particularly sequence complexity and long-range dependencies. To improve the accuracy of RNA ac4C modification site prediction, this study introduces, for the first time, the transformer-based RNAErnie pre-trained model to extract deep semantic information from RNA sequences. The RNAErnie features are combined with six traditional feature extraction methods (such as One-hot and ENAC) to form a multidimensional feature set. On this basis, we propose the Voting-ac4C model, which uses a deep neural network for feature selection. The selected features are then fed into a soft voting ensemble learning model, which integrates the strengths of several machine learning algorithms to predict RNA ac4C modification sites. Experimental results show that, compared with single features or single models, Voting-ac4C achieves significant improvements across multiple metrics, including AUC, SN, SP, ACC, and MCC. This study provides a novel approach to RNA modification site prediction and highlights the potential of pre-trained models in biological sequence analysis.
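For readers unfamiliar with the terms used above, the following is a minimal sketch of three of the ideas named in the abstract: One-hot encoding of an RNA sequence, soft voting over the predicted probabilities of several classifiers, and the SN, SP, ACC, and MCC metrics. It is not the Voting-ac4C implementation. The function names (`one_hot`, `soft_vote`, `binary_metrics`), the `ACGU` column order, the equal default weights, and the 0.5 decision threshold are all assumptions made for illustration.

```python
import math

# Assumed nucleotide order for the one-hot columns (illustrative choice).
ALPHABET = "ACGU"


def one_hot(seq):
    """Encode an RNA sequence as a flat list with 4 values per base.

    T is treated as U; any other unknown symbol becomes four zeros.
    """
    vec = []
    for base in seq.upper().replace("T", "U"):
        vec.extend(1.0 if base == ref else 0.0 for ref in ALPHABET)
    return vec


def soft_vote(prob_lists, weights=None):
    """Soft voting: weighted average of positive-class probabilities.

    prob_lists holds one list of per-sample probabilities per base model.
    """
    if weights is None:
        weights = [1.0] * len(prob_lists)  # assumption: equal weights
    total = sum(weights)
    n_samples = len(prob_lists[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_samples)
    ]


def binary_metrics(y_true, y_prob, threshold=0.5):
    """Compute SN (sensitivity), SP (specificity), ACC, and MCC."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "SN": tp / (tp + fn) if (tp + fn) else 0.0,
        "SP": tn / (tn + fp) if (tn + fp) else 0.0,
        "ACC": (tp + tn) / len(y_true) if y_true else 0.0,
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
    }
```

In this sketch, each base model would supply one list of positive-class probabilities for the test sequences; `soft_vote` averages them, and `binary_metrics` scores the averaged probabilities against the true labels. AUC is omitted because it depends on the ranking of all scores rather than on a single threshold.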
Resource | Download | Description |
---|---|---|
Data (in this study) | train-positive | Positive sample data in the training set |
Data (in this study) | train-negative | Negative sample data in the training set |
Data (in this study) | test-positive | Positive sample data in the testing set |
Data (in this study) | test-negative | Negative sample data in the testing set |
Code Resources | Voting-ac4C code | Code resources for the Voting-ac4C model |
Yanna Jia et al., Pre-trained large RNA language model enhances RNA N4-acetylcytidine site prediction. 2024.
Zilong Zhang | zhangzilong@hainanu.edu.cn |
Feifei Cui | feifeicui@hainanu.edu.cn |
Yanna Jia | 23220854050004@hainanu.edu.cn |