# GraphGONet
From the article entitled **GraphGONet: a self-explaining neural network encapsulating the Gene Ontology graph for phenotype prediction on gene expression** (submitted to Bioinformatics) by Victoria Bourgeais, Farida Zehraoui, and Blaise Hanczar.
---
## Description
GraphGONet is a self-explaining neural network integrating the Gene Ontology into its hidden layers.
## Get started
The code is implemented in Python using the [PyTorch](https://pytorch.org/) framework v1.7.1 (see [requirements.txt](https://forge.ibisc.univ-evry.fr/vbourgeais/GraphGONet/blob/master/requirements.txt) for more details).
### Dataset
The full microarray dataset can be downloaded from the ArrayExpress database under accession [E-MTAB-3732](https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-3732/). The pre-processed training and test sets are available here:
- [training set](https://entrepot.ibisc.univ-evry.fr/f/5b57ab5a69de4f6ab26b/?dl=1)
- [test set](https://entrepot.ibisc.univ-evry.fr/f/057f1ffa0e6c4aab9bee/?dl=1)
<!-- Additional files for NN architecture: [filesforNNarch](https://entrepot.ibisc.univ-evry.fr/f/6f1c513798df41999b5d/?dl=1) -->
The TCGA dataset can be downloaded from the [GDC portal](https://portal.gdc.cancer.gov/).
<!--
Here, you can find the pre-processed training and test sets:
[training set](https://entrepot.ibisc.univ-evry.fr/f/5b57ab5a69de4f6ab26b/?dl=1)
[test set](https://entrepot.ibisc.univ-evry.fr/f/057f1ffa0e6c4aab9bee/?dl=1)
Additional files for NN architecture: [filesforNNarch](https://entrepot.ibisc.univ-evry.fr/f/6f1c513798df41999b5d/?dl=1)
-->
### Usage
<!--
There exists 3 functions (flag *processing*): one is dedicated to the training of the model (*train*), another one to the evaluation of the model on the test set (*evaluate*), and the last one to the prediction of the outcomes of the samples from the test set (*predict*).
-->
#### 1) Train
On the microarray dataset:
```bash
python GraphGONet.py --n_inputs=36834 --n_nodes=10663 --n_nodes_annotated=8249 --n_classes=1 --mask="top" --selection_ratio=0.01 --n_epochs=50 --es --patience=5 --class_weight
```
On TCGA dataset:
```bash
python GraphGONet.py --n_inputs=18427 --n_nodes=10636 --n_nodes_annotated=8288 --n_classes=12 --mask="top" --selection_ratio=0.01 --n_epochs=50 --es --patience=5 --class_weight
```
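The `--mask="top"` flag together with `--selection_ratio=0.01` keeps only the highest-scoring 1% of GO-term nodes in the selection layer. A minimal NumPy sketch of that top-k selection (variable and function names here are illustrative, not taken from the actual code):

```python
import numpy as np

def top_k_mask(scores: np.ndarray, selection_ratio: float) -> np.ndarray:
    """Boolean mask keeping the top `selection_ratio` fraction of nodes by score."""
    n_nodes = scores.shape[0]
    k = max(1, int(round(n_nodes * selection_ratio)))
    # Indices of the k largest scores (order among them is irrelevant for a mask)
    top_idx = np.argpartition(scores, -k)[-k:]
    mask = np.zeros(n_nodes, dtype=bool)
    mask[top_idx] = True
    return mask

# Example: with 10663 GO nodes and ratio 0.01, 107 nodes are kept
scores = np.random.default_rng(0).standard_normal(10663)
mask = top_k_mask(scores, 0.01)
print(mask.sum())  # -> 107
```

In the network itself, the selection is applied to node representations inside the hidden layers; this sketch only shows the counting logic behind the ratio.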
<!--
#### 2) Evaluate
```bash
python DeepGONet.py --type_training="LGO" --alpha=1e-2 --EPOCHS=600 --is_training=False --restore=True --processing="evaluate"
```
#### 3) Predict
```bash
python DeepGONet.py --type_training="LGO" --alpha=1e-2 --EPOCHS=600 --is_training=False --restore=True --processing="predict"
```
The outcomes are saved into a numpy array.
-->
#### Help
All the details about the command-line flags can be displayed with the following command:
```bash
python GraphGONet.py --help
```
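For reference, the flags used in the commands above could be declared roughly as follows. This is an illustrative argparse sketch based only on the commands shown in this README; the actual definitions in `GraphGONet.py` may differ and include additional flags:

```python
import argparse

parser = argparse.ArgumentParser(description="GraphGONet (illustrative flag sketch)")
parser.add_argument("--n_inputs", type=int, help="number of input genes")
parser.add_argument("--n_nodes", type=int, help="number of GO-term nodes")
parser.add_argument("--n_nodes_annotated", type=int, help="GO nodes with gene annotations")
parser.add_argument("--n_classes", type=int, help="number of output classes")
parser.add_argument("--mask", choices=["top", "random"], default=None,
                    help="node-selection strategy (omit for no selection)")
parser.add_argument("--selection_ratio", type=float, default=0.01,
                    help="fraction of nodes kept by the selection")
parser.add_argument("--n_epochs", type=int, default=50)
parser.add_argument("--es", action="store_true", help="enable early stopping")
parser.add_argument("--patience", type=int, default=5)
parser.add_argument("--class_weight", action="store_true",
                    help="reweight the loss by class frequency")

# Parse the microarray training command from above
args = parser.parse_args(
    "--n_inputs=36834 --n_nodes=10663 --n_nodes_annotated=8249 "
    "--n_classes=1 --mask=top --selection_ratio=0.01 "
    "--n_epochs=50 --es --patience=5 --class_weight".split()
)
print(args.mask, args.selection_ratio)  # -> top 0.01
```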
For most flags, the default values can be kept. *log_dir* and *save_dir* can be set to your own directories. Only the flags shown in the commands above need to be adjusted for the desired experiment.
### Comparison with random selection
On the microarray dataset:
```bash
python GraphGONet.py --n_inputs=36834 --n_nodes=10663 --n_nodes_annotated=8249 --n_classes=1 --mask="random" --selection_ratio=0.01 --n_epochs=50 --es --patience=5 --class_weight
```
### Comparison with no selection
On the microarray dataset:
```bash
python GraphGONet.py --n_inputs=36834 --n_nodes=10663 --n_nodes_annotated=8249 --n_classes=1 --n_epochs=50 --es --patience=5 --class_weight
```
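With `--mask="random"`, the same number of nodes is kept but chosen uniformly at random instead of by score, which serves as a baseline for the top-scoring selection; omitting `--mask` keeps all nodes. A minimal sketch of the random variant (illustrative only, not the actual code):

```python
import numpy as np

def random_mask(n_nodes: int, selection_ratio: float, seed: int = 0) -> np.ndarray:
    """Boolean mask keeping a uniformly random `selection_ratio` fraction of nodes."""
    k = max(1, int(round(n_nodes * selection_ratio)))
    rng = np.random.default_rng(seed)
    # Sample k distinct node indices without replacement
    idx = rng.choice(n_nodes, size=k, replace=False)
    mask = np.zeros(n_nodes, dtype=bool)
    mask[idx] = True
    return mask

mask = random_mask(10663, 0.01)
print(mask.sum())  # -> 107
```

Both selection strategies keep the same node count, so any performance gap between them can be attributed to *which* nodes are selected rather than *how many*.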
<!--
### Interpretation tool
Please see the notebook entitled *Interpretation_tool.ipynb* to perform the biological interpretation of the results.
-->
---
## requirements.txt
Python==3.6.7
captum==0.3.1
jupyterlab==3.0.16
matplotlib==3.3.4
numpy==1.19.5
networkx==2.5
obonet==0.2.6
pandas==1.1.5
scikit-learn==0.24.2
seaborn==0.11.1
sklearn-pandas==2.2.0
torch==1.7.1+cu110
torch-cluster==1.5.8
torch-geometric==1.6.3
torch-scatter==2.0.5
torch-sparse==0.6.8
torch-spline-conv==1.2.0
torchsummary==1.5.1
torchtuples==0.2.0
torchvision==0.8.2+cu110
tqdm==4.56.0