Lian Liu¹, Xiujuan Lei¹, Jia Meng² and Zhen Wei²,*
¹School of Computer Sciences, Shaanxi Normal University, Xi’an, Shaanxi, 710119, China; ²Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China.
*To whom correspondence should be addressed: firstname.lastname@example.org (ZW).
N6-methyladenosine (m6A) is one of the most widely studied epigenetic modifications. It plays important roles in various biological processes, such as splicing, RNA localization and degradation, many of which are related to the functions of introns. Although a number of computational approaches have been proposed for predicting m6A sites in different species, none of them was optimized for intronic m6A sites. Because existing experimental data overwhelmingly relied on polyA selection in sample preparation, intronic RNAs are usually underrepresented in the captured RNA library, and the accuracy of general m6A site prediction approaches is therefore limited on the intronic m6A site prediction task. Here we propose WITMSG, the first computational framework dedicated to the large-scale prediction of intronic m6A RNA methylation sites in human. Based on the random forest algorithm and trained only on known intronic m6A sites, WITMSG combines conventional sequence features with a variety of genomic characteristics to improve the prediction of intron-specific m6A sites. We show that WITMSG outperformed competing approaches (trained with all m6A sites or with intronic m6A sites only) both in 10-fold cross-validation (AUC: 0.940) and when tested on independent datasets (AUC: 0.946). WITMSG was also applied intronome-wide in human to predict all possible intronic m6A sites, and the prediction results are freely accessible at: intron.rnamd.com.
Predicted human intronic m6A sites:
Prediction results of all intronic DRACH motifs:
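For readers unfamiliar with the DRACH consensus motif named above, the following minimal sketch shows how candidate sites could be enumerated in an intronic sequence. It is an illustration only, not the WITMSG pipeline: the function name `find_drach_sites` and the example sequence are hypothetical, and the DNA alphabet (T in place of U) is an assumption. The IUPAC codes are D = A/G/T, R = A/G and H = A/C/T, with the candidate methylated adenosine at the centre of the motif.

```python
import re

# DRACH in IUPAC notation: D = A/G/T, R = A/G, A, C, H = A/C/T.
# A lookahead is used so that overlapping motifs are all reported.
DRACH = re.compile(r"(?=([AGT][AG]AC[ACT]))")


def find_drach_sites(seq):
    """Return (position, motif) pairs for every DRACH motif in seq.

    position is the 0-based index of the central A, i.e. the candidate
    m6A site. Input may be DNA or RNA, in upper or lower case.
    """
    seq = seq.upper().replace("U", "T")  # assumption: normalise RNA to DNA letters
    return [(m.start() + 2, m.group(1)) for m in DRACH.finditer(seq)]


# Hypothetical example sequence, not taken from the paper.
example = "TTGGACTAAACAGG"
print(find_drach_sites(example))  # [(4, 'GGACT'), (9, 'AAACA')]
```

In a predictor of the kind described in the abstract, each position returned this way would be a candidate for which features are computed and a score is assigned; how WITMSG does that is not shown here.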