GraphGONet is a self-explaining neural network integrating the Gene Ontology into the hidden layers of the network.
## Get started
The code is implemented in Python (3.6.7) using the [PyTorch](https://pytorch.org/) framework v1.7.1 (see [requirements.txt](https://forge.ibisc.univ-evry.fr/vbourgeais/GraphGONet/blob/master/requirements.txt) for more details about the additional packages used).
## Dataset
The full microarray dataset can be downloaded from the ArrayExpress database under the id [E-MTAB-3732](https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-3732/).
The TCGA dataset can be downloaded from the [GDC portal](https://portal.gdc.cancer.gov/).

The pre-processed training, validation and test sets, together with the additional files needed to build the NN architecture, are available here: https://entrepot.ibisc.univ-evry.fr/d/d4764174275347f09862/
## Usage
Example on TCGA dataset:

The *processing* flag selects what to do with the model: *train* trains it, *evaluate* evaluates a trained model on the test set, and *predict* outputs the predicted outcomes for the samples of the test set (*train_and_evaluate*, the default, chains training and evaluation).

For most of the flags, the default values can be employed. *dir_data*, *dir_files*, and *dir_log* can be set to your own repositories. Only the flags in the command lines displayed have to be adjusted to reproduce the results from the paper. If you have enough GPU memory, you can switch to the entire GO graph (argument *type_graph* set to "entire"). The graph can be reconstructed by following the notebooks Build_GONet_graph_part{1,2,3}.ipynb located in the notebooks directory; the values of the arguments *n_nodes* and *n_nodes_annotated* then have to be changed accordingly in the command line.

### Interpretation tool

Please see the notebook entitled *Interpretation_tool.ipynb* (located in the notebooks directory) to perform the biological interpretation of the results.
graph = nx.read_gpickle(os.path.join(args.dir_files, "gobp-{}-converted".format(args.type_graph)))  # read the GO graph which will be converted into the hidden layers of the network
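The line above loads the pre-built GO biological-process graph shipped with the repository. The idea of turning the DAG into successive hidden layers can be illustrated on a toy GO-like graph (the term ids below are made up): a topological order puts leaf terms first and the root last, which is the kind of ordering needed to stack the terms into layers.

```python
import networkx as nx

# Toy GO-like DAG (hypothetical term ids); edges point from child term to parent term
g = nx.DiGraph()
g.add_edges_from([
    ("GO:c", "GO:a"),
    ("GO:c", "GO:b"),
    ("GO:a", "GO:root"),
    ("GO:b", "GO:root"),
])

# Leaves come first, the root comes last
order = list(nx.topological_sort(g))
print(order[0], order[-1])  # -> GO:c GO:root
```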
parser.add_argument('--dir_files',type=str,default='files',help='repository for all the files needed for the training and the evaluation')
parser.add_argument('--dir_data',type=str,default='data',help='repository of the dataset')
parser.add_argument('--file_extension',type=int,default=None,help="option to save different models with the same setting")
parser.add_argument('--save',action='store_true',help="Do you need to save the model?")
parser.add_argument('--restore',action='store_true',help="Do you want to restore a previous model?")
parser.add_argument('--processing',type=str,default="train_and_evaluate",help="What to do with the model? {train,train_and_evaluate,evaluate,predict}")
# -- Architecture of the neural network --
parser.add_argument('--type_graph',type=str,default="truncated",help='type of GO graph considered (truncated,entire)')
parser.add_argument('--n_samples',type=int,default=None,help="number of samples to use")
parser.add_argument('--n_inputs',type=int,default=36834,help="number of features")
parser.add_argument('--n_nodes',type=int,default=10663,help="number of nodes of GO graph")
parser.add_argument('--n_nodes_annotated',type=int,default=8249,help="number of nodes annotated with the genes")
parser.add_argument('--dim_init_emb',type=int,default=1,help="initial dimension of the nodes embedding")
parser.add_argument('--n_prop1',type=int,default=1,help="number of neurons in the GO layers")
parser.add_argument('--n_layers',type=int,default=1,help="number of layers")
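Taken together, the flags above can be exercised with a small standalone parser (only a subset of the flags is reproduced here, with the defaults copied from the lines above; the `22000` override below is a made-up illustration, not the actual node count of the entire graph):

```python
import argparse

# Subset of GraphGONet's command-line flags, with their defaults
parser = argparse.ArgumentParser(description="GraphGONet options (subset)")
parser.add_argument('--processing', type=str, default="train_and_evaluate",
                    help="{train,train_and_evaluate,evaluate,predict}")
parser.add_argument('--type_graph', type=str, default="truncated",
                    help="type of GO graph considered (truncated, entire)")
parser.add_argument('--n_inputs', type=int, default=36834, help="number of features")
parser.add_argument('--n_nodes', type=int, default=10663, help="number of nodes of the GO graph")
parser.add_argument('--n_nodes_annotated', type=int, default=8249,
                    help="number of nodes annotated with the genes")

# Switching to the entire GO graph means overriding the node counts on the command line
args = parser.parse_args(['--type_graph', 'entire', '--n_nodes', '22000'])
print(args.type_graph, args.n_nodes, args.n_inputs)  # -> entire 22000 36834
```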
# DatasetLoader based on the tutorial "Creating Your Own Datasets" - pytorch-geometric
class GeneExpressionDataset(Dataset):
    """Gene expression dataset."""
...
...
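The body of the class is elided above; a torch-free sketch of the same map-style interface (`__len__`/`__getitem__`, where item *i* is an expression vector plus its label) could look like this (the class name and array shapes here are illustrative, not the repository's actual implementation):

```python
import numpy as np

class GeneExpressionSketch:
    """Map-style dataset sketch: item i is (expression vector, label)."""
    def __init__(self, X, y):
        self.X = np.asarray(X, dtype=np.float32)  # samples x genes
        self.y = np.asarray(y)                    # one label per sample

    def __len__(self):
        return self.X.shape[0]

    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]

ds = GeneExpressionSketch(np.zeros((8, 100), dtype=np.float32), np.zeros(8, dtype=int))
x, label = ds[3]
print(len(ds), x.shape[0])  # -> 8 100
```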
sss = StratifiedShuffleSplit(n_splits=1, train_size=n_samples, test_size=self.X.shape[0] - n_samples, random_state=42)  # keeping the proportions of the original classes
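`StratifiedShuffleSplit` keeps the class proportions of the full dataset in the subsample, which is what makes training on a small number of samples meaningful. A quick check on toy imbalanced labels (the 80/20 split below is made up for illustration):

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

X = np.zeros((100, 5))
y = np.array([0] * 80 + [1] * 20)  # 80/20 class imbalance
n_samples = 50

sss = StratifiedShuffleSplit(n_splits=1, train_size=n_samples,
                             test_size=X.shape[0] - n_samples, random_state=42)
train_idx, _ = next(sss.split(X, y))

# The 80/20 ratio is preserved in the 50-sample training split: 40 vs 10
print(int((y[train_idx] == 0).sum()), int((y[train_idx] == 1).sum()))  # -> 40 10
```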