Victoria BOURGEAIS

add notebooks to get the interpretation of a prediction and to build the GO layers architecture of the NN
files/
.ipynb_checkpoints/
scripts/__pycache__/
log/
@@ -10,7 +10,7 @@ GraphGONet is a self-explaining neural network integrating the Gene Ontology int
 ## Get started
-The code is implemented in Python (3.6.7) using the [PyTorch](https://pytorch.org/) framework v1.7.1 (see [requirements.txt](https://forge.ibisc.univ-evry.fr/vbourgeais/GraphGONet/blob/master/requirements.txt) for more details about the additional packages used).
+The code is implemented in Python (3.6.7) using [PyTorch v1.7.1](https://pytorch.org/) and [PyTorch-geometric v1.6.3](https://pytorch-geometric.readthedocs.io/en/1.6.3/modules/nn.html) (see [requirements.txt](https://forge.ibisc.univ-evry.fr/vbourgeais/GraphGONet/blob/master/requirements.txt) for more details about the additional packages used).
 ## Dataset
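Since PyTorch-geometric 1.6.3 depends on compiled companion wheels matching the PyTorch build, a minimal install sketch may help (assumptions not in the repository: a pip-based setup, CUDA 10.2, and the wheel index `pytorch-geometric.com/whl` documented for PyG 1.6.x; substitute `cpu` or your CUDA tag as needed):

```bash
# Sketch only: versions taken from the README above; the wheel index is an
# assumption based on the install pattern documented for PyG 1.6.x.
pip install torch==1.7.1
# torch-scatter and torch-sparse are compiled extensions required by PyTorch-geometric.
pip install torch-scatter torch-sparse -f https://pytorch-geometric.com/whl/torch-1.7.0+cu102.html
pip install torch-geometric==1.6.3
pip install -r requirements.txt   # remaining packages pinned by the repository
```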
@@ -31,49 +31,38 @@ There exist 3 functions (flag *processing*): one is dedicated to the training o
 <!-- On the microarray dataset:
 ```bash
-python3 GraphGONet.py --save --n_inputs=36834 --n_nodes=10663 --n_nodes_annotated=8249 --n_classes=1 --mask="top" --selection_ratio=0.001 --n_epochs=50 --es --patience=5 --class_weight
+python3 scripts/GraphGONet.py --save --n_inputs=36834 --n_nodes=10663 --n_nodes_annotated=8249 --n_classes=1 --selection_op="top" --selection_ratio=0.001 --n_epochs=50 --es --patience=5 --class_weight
 ```
 -->
 ```bash
-python3 GraphGONet.py --save --n_inputs=18427 --n_nodes=10636 --n_nodes_annotated=8288 --n_classes=12 --mask="top" --selection_ratio=0.001 --n_epochs=50 --es --patience=5 --class_weight
+python3 scripts/GraphGONet.py --save --n_inputs=18427 --n_nodes=10636 --n_nodes_annotated=8288 --n_classes=12 --selection_op="top" --selection_ratio=0.001 --n_epochs=50 --es --patience=5 --class_weight
 ```
 <!--
 ### 2) Evaluate
 ```bash
 python DeepGONet.py --type_training="LGO" --alpha=1e-2 --EPOCHS=600 --is_training=False --restore=True --processing="evaluate"
 ```
 ### 3) Predict
 ```bash
 python DeepGONet.py --type_training="LGO" --alpha=1e-2 --EPOCHS=600 --is_training=False --restore=True --processing="predict"
 ```
 The outcomes are saved into a numpy array.
 -->
 ### Comparison with random selection
 ```bash
-python GraphGONet.py --save --n_inputs=18427 --n_nodes=10636 --n_nodes_annotated=8288 --n_classes=12 --mask="random" --selection_ratio=0.001 --n_epochs=50 --es --patience=5 --class_weight
+python scripts/GraphGONet.py --save --n_inputs=18427 --n_nodes=10636 --n_nodes_annotated=8288 --n_classes=12 --selection_op="random" --selection_ratio=0.001 --n_epochs=50 --es --patience=5 --class_weight
 ```
 ### Comparison with no selection
 ```bash
-python GraphGONet.py --save --n_inputs=18427 --n_nodes=10636 --n_nodes_annotated=8288 --n_classes=12 --n_epochs=50 --es --patience=5 --class_weight
+python scripts/GraphGONet.py --save --n_inputs=18427 --n_nodes=10636 --n_nodes_annotated=8288 --n_classes=12 --n_epochs=50 --es --patience=5 --class_weight
 ```
 ### Train the model with a small number of training samples
 ```bash
-python GraphGONet.py --save --n_samples=50 --n_inputs=18427 --n_nodes=10636 --n_nodes_annotated=8288 --n_classes=12 --mask="top" --selection_ratio=0.001 --n_epochs=50 --es --patience=5 --class_weight
+python scripts/GraphGONet.py --save --n_samples=50 --n_inputs=18427 --n_nodes=10636 --n_nodes_annotated=8288 --n_classes=12 --selection_op="top" --selection_ratio=0.001 --n_epochs=50 --es --patience=5 --class_weight
 ```
 ### Help
@@ -81,7 +70,7 @@ python GraphGONet.py --save --n_samples=50 --n_inputs=18427 --n_nodes=10636 --n_
 All the details about the command-line flags can be obtained with the following command:
 ```bash
-python GraphGONet.py --help
+python scripts/GraphGONet.py --help
 ```
 For most flags, the default values can be used. *dir_data*, *dir_files*, and *dir_log* can be set to your own directories. Only the flags shown in the command lines above need to be adjusted to reproduce the results of the paper. If you have enough GPU memory, you can switch to the entire GO graph (argument *type_graph* set to "entire"). The graph can be rebuilt by following the notebooks Build_GONet_graph_part{1,2,3}.ipynb in the notebooks directory; then change the values of the arguments *n_nodes* and *n_nodes_annotated* in the command line accordingly, as in the sketch below.
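A sketch of such an entire-graph run (the node counts are hypothetical placeholders to be replaced with the values the notebooks compute, and the flag spelling `--type_graph` is an assumption inferred from the argument name above):

```bash
# <N_NODES> and <N_NODES_ANNOTATED> are placeholders: take the real counts
# from the output of Build_GONet_graph_part{1,2,3}.ipynb.
python3 scripts/GraphGONet.py --save --type_graph="entire" \
    --n_inputs=18427 --n_nodes=<N_NODES> --n_nodes_annotated=<N_NODES_ANNOTATED> \
    --n_classes=12 --selection_op="top" --selection_ratio=0.001 \
    --n_epochs=50 --es --patience=5 --class_weight
```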
@@ -90,7 +90,7 @@ def train(args):
     print("Launching the learning")
     device = torch.device(args.device)
     model = Net(n_genes=args.n_inputs,n_nodes=args.n_nodes,n_nodes_annot=args.n_nodes_annotated,n_nodes_emb=args.dim_init,n_classes=args.n_classes,
-                n_prop1=args.n_prop1,adj_mat_fc1=connection_matrix.values,mask=args.mask,ratio=args.selection_ratio).to(device)
+                n_prop1=args.n_prop1,adj_mat_fc1=connection_matrix.values,selection=args.selection_op,ratio=args.selection_ratio).to(device)
     print(model)
     print("(model mem allocation) - Memory available : {:.2e}".format(torch.cuda.memory_reserved(0)-torch.cuda.memory_allocated(0)))
@@ -309,7 +309,7 @@ def main():
     parser.add_argument('--n_classes', type=int, default=1, help="number of classes")
     # -- Learning and Hyperparameters --
-    parser.add_argument('--mask', type=str, default=None, help='type of selection (random,top)')
+    parser.add_argument('--selection_op', type=str, default=None, help='type of selection (random,top)')
     parser.add_argument('--selection_ratio', type=float, default=0.5, help='selection ratio')
     parser.add_argument('--optimizer', type=str, default='adam', help="optimizer {adam, momentum, adagrad, rmsprop}")
     parser.add_argument('--lr', type=float, default=0.001, help='learning rate')
@@ -329,10 +329,10 @@
     if not(os.path.isdir(args.dir_log)):
         os.mkdir(args.dir_log)
-    if args.mask:
-        args.dir_save=os.path.join(args.dir_log,'GraphGONet_MASK={}_SELECTRATIO={}'.format(args.mask,args.selection_ratio))
+    if args.selection_op:
+        args.dir_save=os.path.join(args.dir_log,'GraphGONet_SELECTOP={}_SELECTRATIO={}'.format(args.selection_op,args.selection_ratio))
     else:
-        args.dir_save=os.path.join(args.dir_log,'GraphGONet_MASK={}'.format(args.mask))
+        args.dir_save=os.path.join(args.dir_log,'GraphGONet_SELECTOP={}'.format(args.selection_op))
     if args.n_samples:
         args.dir_save+="_N_SAMPLES={}".format(args.n_samples)
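For reference, the naming scheme in this hunk yields run directories like the following (derived from the format strings above, assuming *dir_log* is left at its default `log/`):

```bash
# --selection_op="top" --selection_ratio=0.001   -> log/GraphGONet_SELECTOP=top_SELECTRATIO=0.001
# flag omitted (selection_op is None)            -> log/GraphGONet_SELECTOP=None
# adding --n_samples=50 appends the sample count -> log/GraphGONet_SELECTOP=top_SELECTRATIO=0.001_N_SAMPLES=50
```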
@@ -189,9 +189,9 @@ def concatenate_and_mask(x: Tensor, batch: Tensor, idx_nodes_kept : Tensor, num_
         output[i,mask]=x[i*num_nodes_kept[i]:(i+1)*num_nodes_kept[i]].view(-1) #shape : (num_nodes_by_graph,1) -> (_,max_num_nodes)
     return output

-class NoSelection(torch.nn.Module):
+class Mask(torch.nn.Module):
     def __init__(self, in_channels, method, n_nodes, **kwargs):
-        super(NoSelection, self).__init__()
+        super(Mask, self).__init__()
         self.method = method
         self.in_channels = in_channels
         if self.method.__name__ == "global_mean_pool":
@@ -211,8 +211,8 @@ class NoSelection(torch.nn.Module):
 class Net(torch.nn.Module):
     def __init__(self,n_genes,n_nodes,n_nodes_annot,n_nodes_emb,n_prop1,n_classes,adj_mat_fc1,
-                 propagation="DAGProp",mask=None,ratio=1.0,
-                 selection="concatenate_and_mask"):
+                 propagation="DAGProp",selection=None,ratio=1.0,
+                 mask="concatenate_and_mask"):
         super(Net, self).__init__()
         self.n_genes = n_genes
         self.n_nodes = n_nodes
@@ -226,15 +226,15 @@ class Net(torch.nn.Module):
         with torch.no_grad():
             self.fc1.weight.mul_(self.adj_mat_fc1) #mask all the connections btw genes and neurons that do not represent GO annotations
         self.propagation = eval(propagation)(in_channels=n_nodes_emb, out_channels=n_prop1,aggr = "mean") # expected dim: [nSamples, nNodes, nChannels]
-        if mask:
+        if selection:
             self.ratio = ratio
-            if mask=="random":
-                self.mask = RandomSelection(in_channels=n_prop1,ratio=ratio)
-            elif mask=="top":
-                self.mask = TopSelection(in_channels=n_prop1,ratio=ratio)
+            if selection=="random":
+                self.selection = RandomSelection(in_channels=n_prop1,ratio=ratio)
+            elif selection=="top":
+                self.selection = TopSelection(in_channels=n_prop1,ratio=ratio)
         else:
-            selection="concatenate"
-        self.selection = NoSelection(method=globals()[selection],in_channels=n_prop1,n_nodes=n_nodes) #option no selection => concatenate
+            mask="concatenate"
+        self.mask = Mask(method=globals()[mask],in_channels=n_prop1,n_nodes=n_nodes) #option no selection => concatenate
         self.fc2 = Linear(in_features=n_nodes,out_features=n_classes)

     def forward(self,transcriptomic_data,graph_data):
@@ -247,10 +247,13 @@ class Net(torch.nn.Module):
         num_nodes = scatter_add(batch.new_ones(x.size(0),dtype=torch.int16), batch, dim=0)
-        if self.mask:
-            x, edge_index, _, batch,idx_nodes_kept,_ = self.mask(x, edge_index, None, batch)
-        x = self.selection(x,batch,idx_nodes_kept,num_nodes)
+        if self.selection:
+            x, edge_index, _, batch,idx_nodes_kept,_ = self.selection(x, edge_index, None, batch)
+        if self.mask.method.__name__ == "concatenate_and_mask":
+            x = self.mask(x,batch,idx_nodes_kept,num_nodes)
+        else:
+            x = self.mask(x,batch)
         x = self.fc2(x)
         if self.n_classes>=2: