IRSOM

4608557f Update links of datasets · by Ludovic ISHIOMIN

Prerequisites

Automatic installation

Dependency

To run the automatic installation, conda with Python 3 must be installed. The installation of conda with Python 3 is described here.

Run the installation

Download the installation script from here, then run the installation with:

chmod +x script_irsom.sh
./script_irsom.sh ENV_NAME

where ENV_NAME is the name of the conda environment to be created.

Manual installation

Get sources

The IRSOM tools can be downloaded as follows:

git clone https://forge.ibisc.univ-evry.fr/lplaton/IRSOM.git

IRSOM has been developed with Python 3.

Python packages

  1. Matplotlib
  2. Pandas
  3. Plotnine
  4. Numpy
  5. TensorFlow
  6. Docopt

These packages can be installed by running the following command:

pip install -r ${path_IRSOM}/pip_package.txt

where ${path_IRSOM} is the path to the directory of IRSOM.

For better performance, we recommend rebuilding TensorFlow from source.

Compiling Featurer

The bin folder of the repository contains a Featurer binary compiled on Linux. If needed, the binary can be recompiled with the Qt5 tools:

cd ${path_IRSOM}/bin/
qmake ${path_IRSOM}/Featurer/Featurer.pro
make
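Before training, it can help to confirm that the checkout contains the files this README refers to and that bin/Featurer is executable. The following is a minimal sketch, not part of IRSOM: `check_irsom_layout` is a hypothetical helper, and the list of expected files is taken only from the paths mentioned in this README.

```python
import os
from pathlib import Path

# Paths mentioned in this README, relative to ${path_IRSOM} (not an exhaustive list)
EXPECTED_FILES = ("pip_package.txt", "scripts/train.py", "scripts/predict.py")
FEATURER = "bin/Featurer"


def check_irsom_layout(path_irsom):
    """Hypothetical helper: return a list of problems found (empty if none)."""
    root = Path(path_irsom)
    problems = [f"missing {rel}" for rel in EXPECTED_FILES if not (root / rel).is_file()]
    featurer = root / FEATURER
    if not featurer.is_file():
        problems.append(f"missing {FEATURER} (compile it with qmake and make)")
    elif not os.access(featurer, os.X_OK):
        # The binary exists but lacks execute permission for the current user
        problems.append(f"{FEATURER} is not executable (try chmod +x)")
    return problems


if __name__ == "__main__":
    # Reads the same path_IRSOM variable used in the commands of this README
    issues = check_irsom_layout(os.environ.get("path_IRSOM", "."))
    print("\n".join(issues) if issues else "layout OK")
```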

Datasets

An archive containing all the datasets can be downloaded here. To download a specific dataset, use the folder containing the individual datasets here.

Basic usage

Train a model

Default usage:

python ${path_IRSOM}/scripts/train.py --featurer=${path_IRSOM}/bin/Featurer -c coding.fasta -n noncoding.fasta --output=output_dir_of_model

Multiple FASTA files can be passed to this script. For example, the following command creates a model from 2 coding FASTA files and 3 non-coding FASTA files:

python ${path_IRSOM}/scripts/train.py --featurer=${path_IRSOM}/bin/Featurer -c coding1.fasta -c coding2.fasta -n noncoding1.fasta -n noncoding2.fasta -n noncoding3.fasta --output=output_dir_of_model
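Passing several files amounts to repeating -c or -n once per file. The sketch below assembles such a command from two Python lists. `build_train_command` is a hypothetical helper, not part of IRSOM, and /opt/IRSOM is an assumed install location standing in for ${path_IRSOM}.

```python
import shlex
from pathlib import Path


def build_train_command(path_irsom, coding, noncoding, output, python="python"):
    """Hypothetical helper: argument list for scripts/train.py.

    Each coding FASTA file gets its own -c flag and each non-coding
    FASTA file its own -n flag, as in the commands above.
    """
    root = Path(path_irsom)
    cmd = [
        python,
        str(root / "scripts" / "train.py"),
        f"--featurer={root / 'bin' / 'Featurer'}",
    ]
    for fasta in coding:
        cmd += ["-c", str(fasta)]
    for fasta in noncoding:
        cmd += ["-n", str(fasta)]
    cmd.append(f"--output={output}")
    return cmd


if __name__ == "__main__":
    cmd = build_train_command(
        "/opt/IRSOM",  # assumed install location
        ["coding1.fasta", "coding2.fasta"],
        ["noncoding1.fasta", "noncoding2.fasta", "noncoding3.fasta"],
        "output_dir_of_model",
    )
    # Print a shell-quoted version; subprocess.run(cmd, check=True) would execute it
    print(shlex.join(cmd))
```

Building a list instead of a single string avoids shell-quoting problems when file names contain spaces.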

The model parameters can be set with the following command-line options:

  • --dim0= SOM dimension 0 (default: 3).
  • --dim1= SOM dimension 1 (default: 3).
  • --batch_size= size of the batch given at each iteration (default: 10).
  • --penality= coefficient of the regularization term (default: 0.001).

By default, the computed features are removed from the output directory. To keep these files, use the --keep_features parameter.
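The options above can be generated with their documented defaults and appended to a training command. The sketch below is illustrative only: `train_options` is a hypothetical helper, and its validation rules (positive integer SOM dimensions and batch size, non-negative penalty) are assumptions, not checks performed by IRSOM.

```python
import shlex


def train_options(dim0=3, dim1=3, batch_size=10, penality=0.001, keep_features=False):
    """Hypothetical helper: optional train.py flags, defaults as documented above."""
    # Assumed constraints; IRSOM itself may accept or reject other values
    for name, value in (("dim0", dim0), ("dim1", dim1), ("batch_size", batch_size)):
        if not isinstance(value, int) or value < 1:
            raise ValueError(f"--{name} must be a positive integer, got {value!r}")
    if penality < 0:
        raise ValueError("--penality must be non-negative")
    opts = [
        f"--dim0={dim0}",
        f"--dim1={dim1}",
        f"--batch_size={batch_size}",
        f"--penality={penality}",
    ]
    if keep_features:
        opts.append("--keep_features")  # keep the computed features in the output
    return opts


if __name__ == "__main__":
    # Relative paths: assumes the command is run from ${path_IRSOM}
    base = ["python", "scripts/train.py", "--featurer=bin/Featurer",
            "-c", "coding.fasta", "-n", "noncoding.fasta",
            "--output=output_dir_of_model"]
    # A 5x5 SOM is an arbitrary example value
    print(shlex.join(base + train_options(dim0=5, dim1=5, keep_features=True)))
```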

Predict

Default usage:

python ${path_IRSOM}/scripts/predict.py --featurer=${path_IRSOM}/bin/Featurer --file=fasta_file.fasta --model=${path_IRSOM}/model/species/ --output=output_dir_of_result [--reject=${rejection_threshold}]

The rejection threshold can be set with the --reject option. By default, no rejection is applied.

As with the training script, the computed features are removed by default. To keep them, use the --keep_features parameter.
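The prediction command can be assembled the same way, adding --reject only when a threshold is given, since omitting it means no rejection. `build_predict_command` is a hypothetical helper, /opt/IRSOM is an assumed install location, and the threshold 0.5 used below is an arbitrary placeholder, not a recommended value.

```python
import shlex
from pathlib import Path


def build_predict_command(path_irsom, fasta, model_dir, output,
                          reject=None, keep_features=False, python="python"):
    """Hypothetical helper: argument list for scripts/predict.py."""
    root = Path(path_irsom)
    cmd = [
        python,
        str(root / "scripts" / "predict.py"),
        f"--featurer={root / 'bin' / 'Featurer'}",
        f"--file={fasta}",
        f"--model={model_dir}",
        f"--output={output}",
    ]
    if reject is not None:
        # Leaving --reject out keeps the default behaviour (no rejection)
        cmd.append(f"--reject={reject}")
    if keep_features:
        cmd.append("--keep_features")
    return cmd


if __name__ == "__main__":
    root = "/opt/IRSOM"  # assumed install location
    cmd = build_predict_command(root, "fasta_file.fasta", f"{root}/model/species/",
                                "output_dir_of_result", reject=0.5)  # placeholder threshold
    print(shlex.join(cmd))
```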