Victoria BOURGEAIS

add notebooks to get the interpretation of a prediction and to build the GO layers architecture of the NN
files/
.ipynb_checkpoints/
scripts/__pycache__/
log/
@@ -10,7 +10,7 @@ GraphGONet is a self-explaining neural network integrating the Gene Ontology int
 ## Get started
-The code is implemented in Python (3.6.7) using the [PyTorch](https://pytorch.org/) framework v1.7.1 (see [requirements.txt](https://forge.ibisc.univ-evry.fr/vbourgeais/GraphGONet/blob/master/requirements.txt) for more details about the additional packages used).
+The code is implemented in Python (3.6.7) using [PyTorch v1.7.1](https://pytorch.org/) and [PyTorch-geometric v1.6.3](https://pytorch-geometric.readthedocs.io/en/1.6.3/modules/nn.html) (see [requirements.txt](https://forge.ibisc.univ-evry.fr/vbourgeais/GraphGONet/blob/master/requirements.txt) for more details about the additional packages used).
 ## Dataset
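Since PyTorch-geometric 1.6.3 depends on compiled companion wheels matching the PyTorch build, a minimal install sketch may help (assumptions not in the repository: a pip-based setup, CUDA 10.2, and the wheel index `pytorch-geometric.com/whl` documented for PyG 1.6.x; substitute `cpu` or your CUDA tag as needed):

```bash
# Sketch only: versions taken from the README above; the wheel index is an
# assumption based on the install pattern documented for PyG 1.6.x.
pip install torch==1.7.1
# torch-scatter and torch-sparse are compiled extensions required by PyTorch-geometric.
pip install torch-scatter torch-sparse -f https://pytorch-geometric.com/whl/torch-1.7.0+cu102.html
pip install torch-geometric==1.6.3
pip install -r requirements.txt   # remaining packages pinned by the repository
```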
@@ -31,49 +31,38 @@ There exist 3 functions (flag *processing*): one is dedicated to the training o
 <!-- On the microarray dataset:
 ```bash
-python3 GraphGONet.py --save --n_inputs=36834 --n_nodes=10663 --n_nodes_annotated=8249 --n_classes=1 --mask="top" --selection_ratio=0.001 --n_epochs=50 --es --patience=5 --class_weight
+python3 scripts/GraphGONet.py --save --n_inputs=36834 --n_nodes=10663 --n_nodes_annotated=8249 --n_classes=1 --selection_op="top" --selection_ratio=0.001 --n_epochs=50 --es --patience=5 --class_weight
 ```
 -->
 ```bash
-python3 GraphGONet.py --save --n_inputs=18427 --n_nodes=10636 --n_nodes_annotated=8288 --n_classes=12 --mask="top" --selection_ratio=0.001 --n_epochs=50 --es --patience=5 --class_weight
+python3 scripts/GraphGONet.py --save --n_inputs=18427 --n_nodes=10636 --n_nodes_annotated=8288 --n_classes=12 --selection_op="top" --selection_ratio=0.001 --n_epochs=50 --es --patience=5 --class_weight
 ```
 <!--
 ### 2) Evaluate
 ```bash
 python DeepGONet.py --type_training="LGO" --alpha=1e-2 --EPOCHS=600 --is_training=False --restore=True --processing="evaluate"
 ```
 ### 3) Predict
 ```bash
 python DeepGONet.py --type_training="LGO" --alpha=1e-2 --EPOCHS=600 --is_training=False --restore=True --processing="predict"
 ```
 The outcomes are saved into a numpy array.
 -->
 ### Comparison with random selection
 ```bash
-python GraphGONet.py --save --n_inputs=18427 --n_nodes=10636 --n_nodes_annotated=8288 --n_classes=12 --mask="random" --selection_ratio=0.001 --n_epochs=50 --es --patience=5 --class_weight
+python scripts/GraphGONet.py --save --n_inputs=18427 --n_nodes=10636 --n_nodes_annotated=8288 --n_classes=12 --selection_op="random" --selection_ratio=0.001 --n_epochs=50 --es --patience=5 --class_weight
 ```
 ### Comparison with no selection
 ```bash
-python GraphGONet.py --save --n_inputs=18427 --n_nodes=10636 --n_nodes_annotated=8288 --n_classes=12 --n_epochs=50 --es --patience=5 --class_weight
+python scripts/GraphGONet.py --save --n_inputs=18427 --n_nodes=10636 --n_nodes_annotated=8288 --n_classes=12 --n_epochs=50 --es --patience=5 --class_weight
 ```
 ### Train the model with a small number of training samples
 ```bash
-python GraphGONet.py --save --n_samples=50 --n_inputs=18427 --n_nodes=10636 --n_nodes_annotated=8288 --n_classes=12 --mask="top" --selection_ratio=0.001 --n_epochs=50 --es --patience=5 --class_weight
+python scripts/GraphGONet.py --save --n_samples=50 --n_inputs=18427 --n_nodes=10636 --n_nodes_annotated=8288 --n_classes=12 --selection_op="top" --selection_ratio=0.001 --n_epochs=50 --es --patience=5 --class_weight
 ```
 ### Help
@@ -81,7 +70,7 @@ python GraphGONet.py --save --n_samples=50 --n_inputs=18427 --n_nodes=10636 --n_
 All the details about the command-line flags can be obtained with the following command:
 ```bash
-python GraphGONet.py --help
+python scripts/GraphGONet.py --help
 ```
 For most flags, the default values can be used. *dir_data*, *dir_files*, and *dir_log* can be set to your own directories. Only the flags shown in the command lines above need to be adjusted to reproduce the results of the paper. If you have enough GPU memory, you can switch to the entire GO graph (argument *type_graph* set to "entire"). The graph can be rebuilt by following the notebooks Build_GONet_graph_part{1,2,3}.ipynb in the notebooks directory; then change the values of the arguments *n_nodes* and *n_nodes_annotated* in the command line accordingly, as in the sketch below.
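A sketch of such an entire-graph run (the node counts are hypothetical placeholders to be replaced with the values the notebooks compute, and the flag spelling `--type_graph` is an assumption inferred from the argument name above):

```bash
# <N_NODES> and <N_NODES_ANNOTATED> are placeholders: take the real counts
# from the output of Build_GONet_graph_part{1,2,3}.ipynb.
python3 scripts/GraphGONet.py --save --type_graph="entire" \
    --n_inputs=18427 --n_nodes=<N_NODES> --n_nodes_annotated=<N_NODES_ANNOTATED> \
    --n_classes=12 --selection_op="top" --selection_ratio=0.001 \
    --n_epochs=50 --es --patience=5 --class_weight
```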
@@ -90,7 +90,7 @@ def train(args):
     print("Launching the learning")
     device = torch.device(args.device)
     model = Net(n_genes=args.n_inputs,n_nodes=args.n_nodes,n_nodes_annot=args.n_nodes_annotated,n_nodes_emb=args.dim_init,n_classes=args.n_classes,
-                n_prop1=args.n_prop1,adj_mat_fc1=connection_matrix.values,mask=args.mask,ratio=args.selection_ratio).to(device)
+                n_prop1=args.n_prop1,adj_mat_fc1=connection_matrix.values,selection=args.selection_op,ratio=args.selection_ratio).to(device)
     print(model)
     print("(model mem allocation) - Memory available : {:.2e}".format(torch.cuda.memory_reserved(0)-torch.cuda.memory_allocated(0)))
@@ -309,7 +309,7 @@ def main():
     parser.add_argument('--n_classes', type=int, default=1, help="number of classes")
     # -- Learning and Hyperparameters --
-    parser.add_argument('--mask', type=str, default=None, help='type of selection (random,top)')
+    parser.add_argument('--selection_op', type=str, default=None, help='type of selection (random,top)')
     parser.add_argument('--selection_ratio', type=float, default=0.5, help='selection ratio')
     parser.add_argument('--optimizer', type=str, default='adam', help="optimizer {adam, momentum, adagrad, rmsprop}")
     parser.add_argument('--lr', type=float, default=0.001, help='learning rate')
@@ -329,10 +329,10 @@
     if not(os.path.isdir(args.dir_log)):
         os.mkdir(args.dir_log)
-    if args.mask:
-        args.dir_save=os.path.join(args.dir_log,'GraphGONet_MASK={}_SELECTRATIO={}'.format(args.mask,args.selection_ratio))
+    if args.selection_op:
+        args.dir_save=os.path.join(args.dir_log,'GraphGONet_SELECTOP={}_SELECTRATIO={}'.format(args.selection_op,args.selection_ratio))
     else:
-        args.dir_save=os.path.join(args.dir_log,'GraphGONet_MASK={}'.format(args.mask))
+        args.dir_save=os.path.join(args.dir_log,'GraphGONet_SELECTOP={}'.format(args.selection_op))
     if args.n_samples:
         args.dir_save+="_N_SAMPLES={}".format(args.n_samples)
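For reference, the naming scheme in this hunk yields run directories like the following (derived from the format strings above, assuming *dir_log* is left at its default `log/`):

```bash
# --selection_op="top" --selection_ratio=0.001   -> log/GraphGONet_SELECTOP=top_SELECTRATIO=0.001
# flag omitted (selection_op is None)            -> log/GraphGONet_SELECTOP=None
# adding --n_samples=50 appends the sample count -> log/GraphGONet_SELECTOP=top_SELECTRATIO=0.001_N_SAMPLES=50
```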
@@ -189,9 +189,9 @@ def concatenate_and_mask(x: Tensor, batch: Tensor, idx_nodes_kept : Tensor, num_
         output[i,mask]=x[i*num_nodes_kept[i]:(i+1)*num_nodes_kept[i]].view(-1) #shape : (num_nodes_by_graph,1) -> (_,max_num_nodes)
     return output

-class NoSelection(torch.nn.Module):
+class Mask(torch.nn.Module):
     def __init__(self, in_channels, method, n_nodes, **kwargs):
-        super(NoSelection, self).__init__()
+        super(Mask, self).__init__()
         self.method = method
         self.in_channels = in_channels
         if self.method.__name__ == "global_mean_pool":
@@ -211,8 +211,8 @@ class NoSelection(torch.nn.Module):
 class Net(torch.nn.Module):
     def __init__(self,n_genes,n_nodes,n_nodes_annot,n_nodes_emb,n_prop1,n_classes,adj_mat_fc1,
-                 propagation="DAGProp",mask=None,ratio=1.0,
-                 selection="concatenate_and_mask"):
+                 propagation="DAGProp",selection=None,ratio=1.0,
+                 mask="concatenate_and_mask"):
         super(Net, self).__init__()
         self.n_genes = n_genes
         self.n_nodes = n_nodes
@@ -226,15 +226,15 @@ class Net(torch.nn.Module):
         with torch.no_grad():
             self.fc1.weight.mul_(self.adj_mat_fc1) #mask all the connections btw genes and neurons that do not represent GO annotations
         self.propagation = eval(propagation)(in_channels=n_nodes_emb, out_channels=n_prop1,aggr = "mean") # expected dim: [nSamples, nNodes, nChannels]
-        if mask:
+        if selection:
             self.ratio = ratio
-            if mask=="random":
-                self.mask = RandomSelection(in_channels=n_prop1,ratio=ratio)
-            elif mask=="top":
-                self.mask = TopSelection(in_channels=n_prop1,ratio=ratio)
+            if selection=="random":
+                self.selection = RandomSelection(in_channels=n_prop1,ratio=ratio)
+            elif selection=="top":
+                self.selection = TopSelection(in_channels=n_prop1,ratio=ratio)
         else:
-            selection="concatenate"
-        self.selection = NoSelection(method=globals()[selection],in_channels=n_prop1,n_nodes=n_nodes) #option no selection => concatenate
+            mask="concatenate"
+        self.mask = Mask(method=globals()[mask],in_channels=n_prop1,n_nodes=n_nodes) #option no selection => concatenate
         self.fc2 = Linear(in_features=n_nodes,out_features=n_classes)

     def forward(self,transcriptomic_data,graph_data):
@@ -247,10 +247,13 @@ class Net(torch.nn.Module):
         num_nodes = scatter_add(batch.new_ones(x.size(0),dtype=torch.int16), batch, dim=0)
-        if self.mask:
-            x, edge_index, _, batch,idx_nodes_kept,_ = self.mask(x, edge_index, None, batch)
-        x = self.selection(x,batch,idx_nodes_kept,num_nodes)
+        if self.selection:
+            x, edge_index, _, batch,idx_nodes_kept,_ = self.selection(x, edge_index, None, batch)
+        if self.mask.method.__name__ == "concatenate_and_mask":
+            x = self.mask(x,batch,idx_nodes_kept,num_nodes)
+        else:
+            x = self.mask(x,batch)
         x = self.fc2(x)
         if self.n_classes>=2: