ncBench

Comparison and benchmark of deep learning methods for non-coding RNA classification

This repository aims to facilitate the use of state-of-the-art non-coding RNA classifiers.

Datasets

We provide three datasets.

  • Dataset1 is a version of the dataset proposed by Fiannaca et al. in 2017 along with the nRC tool. There is data leakage in the original dataset: 347 ncRNAs are present in both the training set and the test set. We fix this issue by removing the problematic sequences from the test set.
  • Dataset1-nd is a variant of the previous dataset, in which we remove sequences containing degenerate nucleotides.
  • Dataset2 is the one presented by Lima et al. in 2023.
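
The two cleaning steps described above (dropping test sequences that also appear in the training set, and dropping sequences with degenerate nucleotides) can be sketched as follows. This is a minimal illustration, not the code from dataset_creation.ipynb. The helper names, the `(name, sequence)` record layout, and the choice to treat two ncRNAs as identical when their sequences match after upper-casing and mapping T to U are all assumptions.

```python
# Hypothetical helpers: names, record layout and matching rule are assumptions,
# not the procedure actually used to build Dataset1 and Dataset1-nd.

CANONICAL = set("ACGTU")  # non-degenerate nucleotide symbols


def normalize(seq: str) -> str:
    """Upper-case and map T to U so DNA- and RNA-style spellings compare equal."""
    return seq.strip().upper().replace("T", "U")


def remove_leakage(train, test):
    """Return the test records whose sequence does not occur in the training set."""
    seen = {normalize(seq) for _, seq in train}
    return [(name, seq) for name, seq in test if normalize(seq) not in seen]


def has_degenerate(seq: str) -> bool:
    """True if the sequence contains any symbol other than A, C, G, T or U."""
    return any(base not in CANONICAL for base in seq.strip().upper())


def drop_degenerate(records):
    """Return only the records made of non-degenerate nucleotides."""
    return [(name, seq) for name, seq in records if not has_degenerate(seq)]


# Toy records, invented for illustration only.
train = [("t1", "ACGU"), ("t2", "GGCA")]
test = [("x1", "acgt"), ("x2", "GGNA"), ("x3", "UUAC")]

clean_test = remove_leakage(train, test)      # "x1" duplicates "t1" and is dropped
clean_test_nd = drop_degenerate(clean_test)   # "x2" contains N and is dropped
```

Filtering the test set rather than the training set mirrors the fix described for Dataset1: the training data stays unchanged, so only the evaluation split shrinks.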

The datasets are available in the datasets/ folder. See the notebook dataset_creation.ipynb for more details on how the datasets were obtained and formatted.

Methods

Some state-of-the-art tools lack sufficient documentation or benefit from small code changes that make them easier to use. Where necessary, we forked the original GitHub repositories and modified them. We provide conda environment files and executable scripts. A guide to using the included methods is provided in methods_guide.md.

Cite

[insert citation]