add interpretation tool

Victoria BOURGEAIS
Commit bfbcf3ccf9b5ffde3cacdae8bc4b1b654934ae0f bfbcf3cc 1 parent 10147d57
Showing 5 changed files with 1276 additions and 4 deletions
.gitignore
Interpretation_tool.ipynb
README.md
filesforNNarch.zip
requirements.txt
--- a/.gitignore
+++ b/.gitignore
- adj_matrix.csv
- first_matrix_connection_GO.csv
- go_level.csv
 X_test.npz
 X_train.npz
+ filesforNNarch.zip
 __pycache__/
--- a/Interpretation_tool.ipynb 0 → 100644
+++ b/Interpretation_tool.ipynb 0 → 100644
+ {
+  "cells": [
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "# DeepGONet - interpretation tool\n",
+     "\n",
+     "**Objective:** this tool identifies the main biological functions used for cancer predictions and quantifies their contribution via LRP (layer-wise relevance propagation) relevance scores. It can be used at three levels: disease, subdisease, and patient.\n",
+     "\n",
+     "Before using this tool, the model must be trained and saved with the script *DeepGONet.py* (flag ``save=True``).\n",
+     "The package [*innvestigate*](https://github.com/albermax/innvestigate) must also be installed beforehand."
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "#### Load packages"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 1,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "import warnings\n",
+     "import os"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 2,
+    "metadata": {},
+    "outputs": [
+     {
+      "name": "stderr",
+      "output_type": "stream",
+      "text": [
+       "/nhome/siniac/vbourgeais/venv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:523: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+       "  _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n",
+       "/nhome/siniac/vbourgeais/venv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:524: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+       "  _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n",
+       "/nhome/siniac/vbourgeais/venv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+       "  _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n",
+       "/nhome/siniac/vbourgeais/venv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+       "  _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n",
+       "/nhome/siniac/vbourgeais/venv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+       "  _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n",
+       "/nhome/siniac/vbourgeais/venv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:532: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+       "  np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n"
+      ]
+     }
+    ],
+    "source": [
+     "import scipy\n",
+     "import tensorflow as tf\n",
+     "import numpy as np\n",
+     "import pandas as pd\n",
+     "import matplotlib.pyplot as plt\n",
+     "import pickle\n",
+     "import seaborn\n",
+     "from sklearn import preprocessing\n",
+     "import gc\n",
+     "from tqdm import tqdm\n",
+     "from tqdm import tqdm_notebook\n",
+     "from sklearn.utils import class_weight\n",
+     "import matplotlib.backends.backend_pdf"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 3,
+    "metadata": {},
+    "outputs": [
+     {
+      "name": "stderr",
+      "output_type": "stream",
+      "text": [
+       "Using TensorFlow backend.\n"
+      ]
+     }
+    ],
+    "source": [
+     "from keras.optimizers import Adam,SGD,Nadam\n",
+     "from sklearn.utils import class_weight\n",
+     "from keras.utils import to_categorical, plot_model\n",
+     "from keras.models import Model\n",
+     "from keras import backend as K\n",
+     "from keras import regularizers\n",
+     "from keras.layers import Layer\n",
+     "from keras.layers import Input, Dense, BatchNormalization, Dropout, ReLU, Activation,Lambda\n",
+     "from keras.models import Model\n",
+     "from keras.callbacks import ReduceLROnPlateau\n",
+     "from keras.backend import clear_session, set_session,l2_normalize\n",
+     "from keras.utils import multi_gpu_model"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 4,
+    "metadata": {},
+    "outputs": [
+     {
+      "data": {
+       "text/plain": [
+        "['/job:localhost/replica:0/task:0/device:GPU:0']"
+       ]
+      },
+      "execution_count": 4,
+      "metadata": {},
+      "output_type": "execute_result"
+     }
+    ],
+    "source": [
+     "from keras import backend as K\n",
+     "K.tensorflow_backend._get_available_gpus()"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "#### Environment definition"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 5,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "warnings.filterwarnings(\"ignore\")\n",
+     "os.environ[\"CUDA_DEVICE_ORDER\"]=\"PCI_BUS_ID\"\n",
+     "os.environ[\"CUDA_VISIBLE_DEVICES\"]=\"0\" #your GPU device\n",
+     "os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' "
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 9,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "dir_data = #TO BE DEFINED\n",
+     "save_dir= #TO BE DEFINED"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "## 1. Definition of the network"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "### a) Definition of the variables of the network"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "- number of layers/neurons by layer"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 6,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "NB_LAYERS = 6\n",
+     "N_HIDDEN_6 = 90 #Level 2\n",
+     "N_HIDDEN_5 = 255 #Level 3\n",
+     "N_HIDDEN_4 = 515 #Level 4\n",
+     "N_HIDDEN_3 = 951 #Level 5\n",
+     "N_HIDDEN_2 = 1386 #Level 6\n",
+     "N_HIDDEN_1 = 1574 #Level 7\n",
+     "SHAPE_FEATURES = 54675 #Input Layer\n",
+     "SHAPE_LABEL = 1"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "- control of the regularization term $L_{GO}$"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 7,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "ALPHA = 1e-2 #fixed"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "- training hyperparameters"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 8,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "OPT = \"adam\"\n",
+     "LR = 1e-03\n",
+     "DROP_RATIO = 1.0\n",
+     "\n",
+     "EPOCHS = 600\n",
+     "BATCH_SIZE = 2**9 #parameter that will be used later\n",
+     "\n",
+     "NB_STACK = 1\n",
+     "BN = False"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "### b) Definition of the functions"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 10,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "def mlp_for_sigmoid(drop_ratio=0.60,trainable=True):\n",
+     "    \n",
+     "    inputs = Input(shape=(SHAPE_FEATURES,))\n",
+     "    \n",
+     "    first_layer = Dense(N_HIDDEN_1,name='first_layer')(inputs)   \n",
+     "    x = ReLU()(first_layer)        \n",
+     "    x = Dropout(rate=drop_ratio)(x)\n",
+     "    \n",
+     "    second_layer = Dense(N_HIDDEN_2,name='second_layer')(x)    \n",
+     "    x = ReLU()(second_layer)\n",
+     "    x = Dropout(rate=drop_ratio)(x)\n",
+     "    \n",
+     "    third_layer = Dense(N_HIDDEN_3,name='third_layer')(x)    \n",
+     "    x = ReLU()(third_layer)   \n",
+     "    x = Dropout(rate=drop_ratio)(x)   \n",
+     "    \n",
+     "    fourth_layer = Dense(N_HIDDEN_4,name='fourth_layer')(x)    \n",
+     "    x = ReLU()(fourth_layer)   \n",
+     "    x = Dropout(rate=drop_ratio)(x)\n",
+     "    \n",
+     "    fifth_layer = Dense(N_HIDDEN_5,name='fifth_layer')(x)\n",
+     "    x = ReLU()(fifth_layer)\n",
+     "    x = Dropout(rate=drop_ratio)(x)\n",
+     "    \n",
+     "    sixth_layer = Dense(N_HIDDEN_6,name='sixth_layer')(x)\n",
+     "    x = ReLU()(sixth_layer)\n",
+     "    x = Dropout(rate=drop_ratio)(x)\n",
+     "    \n",
+     "    last_layer = Dense(SHAPE_LABEL,name='last_layer')(x)    \n",
+     "    predictions = Activation('sigmoid')(last_layer)\n",
+     "    \n",
+     "    model = Model(inputs=inputs, outputs=predictions)\n",
+     "    \n",
+     "    return model"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "## 2. Evaluation of the model"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "### a) Load the data"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 11,
+    "metadata": {},
+    "outputs": [
+     {
+      "name": "stdout",
+      "output_type": "stream",
+      "text": [
+       "CPU times: user 7.97 s, sys: 206 ms, total: 8.18 s\n",
+       "Wall time: 17.5 s\n"
+      ]
+     }
+    ],
+    "source": [
+     "%%time\n",
+     "loaded = np.load(os.path.join(dir_data,\"X_test.npz\"))\n",
+     "X_test = loaded['x']\n",
+     "y_test = loaded['y']\n",
+     "if SHAPE_LABEL>=2: \n",
+     "    y_test=to_categorical(y_test)"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 12,
+    "metadata": {},
+    "outputs": [
+     {
+      "data": {
+       "text/plain": [
+        "(4462, 54675)"
+       ]
+      },
+      "execution_count": 12,
+      "metadata": {},
+      "output_type": "execute_result"
+     }
+    ],
+    "source": [
+     "X_test.shape"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 13,
+    "metadata": {},
+    "outputs": [
+     {
+      "data": {
+       "text/plain": [
+        "(4462, 1)"
+       ]
+      },
+      "execution_count": 13,
+      "metadata": {},
+      "output_type": "execute_result"
+     }
+    ],
+    "source": [
+     "y_test.shape"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "### b) Restoration of the model learned with TF"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "#### i. Prepare the file \"model_vars.txt\"\n",
+     "This step only needs to be executed once. If the notebook is run again later, skip it and go directly to the step below."
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "%%time\n",
+     "PATH_REL_META = os.path.join(save_dir,'model.meta')\n",
+     "\n",
+     "#start tensorflow session\n",
+     "with tf.Session() as sess:\n",
+     "    #import graph\n",
+     "    saver = tf.train.import_meta_graph(PATH_REL_META)\n",
+     "    #load weights for graph\n",
+     "    saver.restore(sess, os.path.join(save_dir,\"model\"))\n",
+     "\n",
+     "    #get all global variables (including model variables)\n",
+     "    vars_global = tf.global_variables()\n",
+     "\n",
+     "    #get their name and value and put them into dictionary\n",
+     "    model_vars = {}\n",
+     "    \n",
+     "    with tf.device('/gpu:0'): #TO BE CHANGED if you didn't use this one.\n",
+     "        \n",
+     "        for var in vars_global:\n",
+     "            try:\n",
+     "                model_vars[var.name] = sess.run(var)\n",
+     "            except Exception:\n",
+     "                print(\"For var={}, an exception occurred\".format(var.name))"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "with open(os.path.join(save_dir,\"model_vars.txt\"), \"wb\") as fp:\n",
+     "    #Pickling\n",
+     "    pickle.dump(model_vars, fp)"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "tf.reset_default_graph()"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "#### ii. Fix the value of the model parameters (weights and biases)"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 14,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "indices = ['first_layer','second_layer','third_layer','fourth_layer',\n",
+     "                        'fifth_layer','sixth_layer','last_layer']"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 15,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "with open(os.path.join(save_dir,\"model_vars.txt\"), \"rb\") as fp:\n",
+     "    model_vars=pickle.load(fp)"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 16,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "model_for_innvestigate = mlp_for_sigmoid(drop_ratio=DROP_RATIO,trainable=False)"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 17,
+    "metadata": {},
+    "outputs": [
+     {
+      "name": "stdout",
+      "output_type": "stream",
+      "text": [
+       "CPU times: user 299 ms, sys: 391 ms, total: 690 ms\n",
+       "Wall time: 1.06 s\n"
+      ]
+     }
+    ],
+    "source": [
+     "%%time\n",
+     "for layer_idx,layer_name in enumerate(indices):\n",
+     "    if layer_name==\"last_layer\":\n",
+     "        layer_idx=\"_out\"\n",
+     "    else:\n",
+     "        layer_idx+=1\n",
+     "    patternW='W{}:0'.format(layer_idx)\n",
+     "    patternB='B{}:0'.format(layer_idx)\n",
+     "    model_for_innvestigate.get_layer(name=layer_name).set_weights([model_vars[patternW], model_vars[patternB]])"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "#### iii. Launch the model on the test set"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 18,
+    "metadata": {},
+    "outputs": [
+     {
+      "name": "stdout",
+      "output_type": "stream",
+      "text": [
+       "4462/4462 [==============================] - 2s 440us/step\n",
+       "test loss, test acc: [0.1820492276450989, 0.924697445091887]\n"
+      ]
+     }
+    ],
+    "source": [
+     "adam = Adam(lr=LR)\n",
+     "model_for_innvestigate.compile(optimizer=adam,loss='binary_crossentropy',metrics=['accuracy'])\n",
+     "results = model_for_innvestigate.evaluate(X_test, y_test)\n",
+     "print('test loss, test acc:', results)"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 19,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "y_result = model_for_innvestigate.predict(X_test)"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 20,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "y_result_final = (y_result>0.5).astype(dtype=int)"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 21,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "y_result_final=y_result_final.reshape(-1)"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 22,
+    "metadata": {},
+    "outputs": [
+     {
+      "data": {
+       "text/plain": [
+        "array([0, 1, 1, ..., 1, 1, 1])"
+       ]
+      },
+      "execution_count": 22,
+      "metadata": {},
+      "output_type": "execute_result"
+     }
+    ],
+    "source": [
+     "y_result_final"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 23,
+    "metadata": {},
+    "outputs": [
+     {
+      "data": {
+       "text/plain": [
+        "(2950,)"
+       ]
+      },
+      "execution_count": 23,
+      "metadata": {},
+      "output_type": "execute_result"
+     }
+    ],
+    "source": [
+     "y_result_final[y_result_final==1].shape"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 24,
+    "metadata": {},
+    "outputs": [
+     {
+      "data": {
+       "text/plain": [
+        "(1512,)"
+       ]
+      },
+      "execution_count": 24,
+      "metadata": {},
+      "output_type": "execute_result"
+     }
+    ],
+    "source": [
+     "y_result_final[y_result_final==0].shape"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 25,
+    "metadata": {},
+    "outputs": [
+     {
+      "name": "stdout",
+      "output_type": "stream",
+      "text": [
+       "[[1344  168]\n",
+       " [ 168 2782]]\n",
+       "              precision    recall  f1-score   support\n",
+       "\n",
+       "         0.0       0.89      0.89      0.89      1512\n",
+       "         1.0       0.94      0.94      0.94      2950\n",
+       "\n",
+       "   micro avg       0.92      0.92      0.92      4462\n",
+       "   macro avg       0.92      0.92      0.92      4462\n",
+       "weighted avg       0.92      0.92      0.92      4462\n",
+       "\n"
+      ]
+     }
+    ],
+    "source": [
+     "from sklearn.metrics import confusion_matrix,classification_report\n",
+     "print(confusion_matrix(y_test,y_result_final))\n",
+     "print(classification_report(y_test,y_result_final))"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "## 3. Biological Interpretation"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 26,
+    "metadata": {},
+    "outputs": [
+     {
+      "name": "stdout",
+      "output_type": "stream",
+      "text": [
+       "CPU times: user 10.5 s, sys: 501 ms, total: 11 s\n",
+       "Wall time: 51.7 s\n"
+      ]
+     }
+    ],
+    "source": [
+     "%%time\n",
+     "matrix_connection = []\n",
+     "matrix_connection.append(pd.read_csv(os.path.join(dir_data,\"matrix_connexion_h1.csv\"),index_col=0))\n",
+     "matrix_connection.append(pd.read_csv(os.path.join(dir_data,\"matrix_connexion_h2.csv\"),index_col=0))\n",
+     "matrix_connection.append(pd.read_csv(os.path.join(dir_data,\"matrix_connexion_h3.csv\"),index_col=0))\n",
+     "matrix_connection.append(pd.read_csv(os.path.join(dir_data,\"matrix_connexion_h4.csv\"),index_col=0))\n",
+     "matrix_connection.append(pd.read_csv(os.path.join(dir_data,\"matrix_connexion_h5.csv\"),index_col=0))\n",
+     "matrix_connection.append(pd.read_csv(os.path.join(dir_data,\"matrix_connexion_h6.csv\"),index_col=0))"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 27,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "import innvestigate\n",
+     "import innvestigate.utils as iutils"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 28,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "model_wo_sigmoid = Model(model_for_innvestigate.input, model_for_innvestigate.layers[len(model_for_innvestigate.layers)-2].output)"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 29,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "analyzer = innvestigate.create_analyzer(\"lrp.epsilon_IB\",\n",
+     "                                        model_wo_sigmoid,\n",
+     "                                        reverse_keep_tensors=True,epsilon=1e-07)"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "### a) Use-case 1: Patient level"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 31,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "idx_sample = #TO BE DEFINED\n",
+     "nb_go_to_show = 5"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "Check the true class, the predicted class, and the predicted probability for this sample"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 54,
+    "metadata": {},
+    "outputs": [
+     {
+      "data": {
+       "text/plain": [
+        "array([1.], dtype=float32)"
+       ]
+      },
+      "execution_count": 54,
+      "metadata": {},
+      "output_type": "execute_result"
+     }
+    ],
+    "source": [
+     "y_test[idx_sample]"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 55,
+    "metadata": {},
+    "outputs": [
+     {
+      "data": {
+       "text/plain": [
+        "1"
+       ]
+      },
+      "execution_count": 55,
+      "metadata": {},
+      "output_type": "execute_result"
+     }
+    ],
+    "source": [
+     "y_result_final[idx_sample]"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 56,
+    "metadata": {},
+    "outputs": [
+     {
+      "data": {
+       "text/plain": [
+        "array([0.9999999], dtype=float32)"
+       ]
+      },
+      "execution_count": 56,
+      "metadata": {},
+      "output_type": "execute_result"
+     }
+    ],
+    "source": [
+     "y_result[idx_sample]"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "After our model predicts the outcome of a patient with a probability score, the LRP relevance of each neuron is computed. "
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 52,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "analysis = analyzer.analyze(X_test)"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 53,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "LRP_matrix = analyzer._reversed_tensors"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "#### Interpretation of the prediction "
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "The neurons are sorted according to their relevance score and the most important ones are returned with their corresponding GO term and biological function."
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "##### First layer"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 57,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "idx_layer = 0"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 58,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "indexes = np.argsort(LRP_matrix[4][1][idx_sample])[::-1][:nb_go_to_show]\n",
+     "lrp_scores = np.sort(LRP_matrix[4][1][idx_sample])[::-1][:nb_go_to_show]"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 59,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "list_go=dict()\n",
+     "for idx,idx_go in enumerate(indexes[:nb_go_to_show]):\n",
+     "    list_go[matrix_connection[idx_layer].columns[idx_go]]= lrp_scores[idx]"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "Top-5"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 63,
+    "metadata": {
+     "scrolled": true
+    },
+    "outputs": [
+     {
+      "name": "stdout",
+      "output_type": "stream",
+      "text": [
+       "GO:0015031 : 1.83\n",
+       "GO:0006468 : 1.11\n",
+       "GO:0030335 : 1.09\n",
+       "GO:0007268 : 0.87\n",
+       "GO:0006412 : 0.86\n"
+      ]
+     }
+    ],
+    "source": [
+     "for go, lrp_score in list_go.items():\n",
+     "    print(go,\":\",np.round(lrp_score,decimals=2))"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "The biological function associated with each GO term can be looked up with a public tool such as [QuickGO](https://www.ebi.ac.uk/QuickGO/)."
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "##### Second layer"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 64,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "idx_layer += 1"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 65,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "indexes = np.argsort(LRP_matrix[7][1][idx_sample])[::-1][:nb_go_to_show]\n",
+     "lrp_scores = np.sort(LRP_matrix[7][1][idx_sample])[::-1][:nb_go_to_show]"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 66,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "list_go=dict()\n",
+     "for idx,idx_go in enumerate(indexes[:nb_go_to_show]):\n",
+     "    list_go[matrix_connection[idx_layer].columns[idx_go]]= lrp_scores[idx]"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "Top-5"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 67,
+    "metadata": {},
+    "outputs": [
+     {
+      "name": "stdout",
+      "output_type": "stream",
+      "text": [
+       "GO:0044257 : 0.61\n",
+       "GO:0071420 : 0.56\n",
+       "GO:0043009 : 0.54\n",
+       "GO:1901258 : 0.53\n",
+       "GO:0010737 : 0.49\n"
+      ]
+     }
+    ],
+    "source": [
+     "for go, lrp_score in list_go.items():\n",
+     "    print(go,\":\",np.round(lrp_score,decimals=2))"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "##### Third layer"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 68,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "idx_layer += 1"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 73,
+    "metadata": {},
+    "outputs": [
+     {
+      "data": {
+       "text/plain": [
+        "2"
+       ]
+      },
+      "execution_count": 73,
+      "metadata": {},
+      "output_type": "execute_result"
+     }
+    ],
+    "source": [
+     "idx_layer"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 75,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "indexes = np.argsort(LRP_matrix[10][1][idx_sample])[::-1][:nb_go_to_show]\n",
+     "lrp_scores = np.sort(LRP_matrix[10][1][idx_sample])[::-1][:nb_go_to_show]"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 76,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "list_go=dict()\n",
+     "for idx,idx_go in enumerate(indexes[:nb_go_to_show]):\n",
+     "    list_go[matrix_connection[idx_layer].columns[idx_go]]= lrp_scores[idx]"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "Top-5"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 77,
+    "metadata": {
+     "scrolled": true
+    },
+    "outputs": [
+     {
+      "name": "stdout",
+      "output_type": "stream",
+      "text": [
+       "GO:1901355 : 0.49\n",
+       "GO:0035556 : 0.41\n",
+       "GO:0048382 : 0.36\n",
+       "GO:1905144 : 0.31\n",
+       "GO:0048864 : 0.29\n"
+      ]
+     }
+    ],
+    "source": [
+     "for go, lrp_score in list_go.items():\n",
+     "    print(go,\":\",np.round(lrp_score,decimals=2))"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "##### Fourth layer"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 78,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "idx_layer += 1"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 79,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "indexes = np.argsort(LRP_matrix[13][1][idx_sample])[::-1][:nb_go_to_show]\n",
+     "lrp_scores = np.sort(LRP_matrix[13][1][idx_sample])[::-1][:nb_go_to_show]"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 80,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "list_go=dict()\n",
+     "for idx,idx_go in enumerate(indexes[:nb_go_to_show]):\n",
+     "    list_go[matrix_connection[idx_layer].columns[idx_go]]= lrp_scores[idx]"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "Top-5"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 81,
+    "metadata": {
+     "scrolled": true
+    },
+    "outputs": [
+     {
+      "name": "stdout",
+      "output_type": "stream",
+      "text": [
+       "GO:0071709 : 0.54\n",
+       "GO:0055085 : 0.43\n",
+       "GO:0014070 : 0.39\n",
+       "GO:0042127 : 0.38\n",
+       "GO:0010243 : 0.38\n"
+      ]
+     }
+    ],
+    "source": [
+     "for go, lrp_score in list_go.items():\n",
+     "    print(go,\":\",np.round(lrp_score,decimals=2))"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "##### Fifth layer"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 82,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "idx_layer += 1"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 83,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "indexes = np.argsort(LRP_matrix[16][1][idx_sample])[::-1][:nb_go_to_show]\n",
+     "lrp_scores = np.sort(LRP_matrix[16][1][idx_sample])[::-1][:nb_go_to_show]"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 84,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "list_go=dict()\n",
+     "for idx,idx_go in enumerate(indexes[:nb_go_to_show]):\n",
+     "    list_go[matrix_connection[idx_layer].columns[idx_go]]= lrp_scores[idx]"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "Top-5"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 85,
+    "metadata": {
+     "scrolled": true
+    },
+    "outputs": [
+     {
+      "name": "stdout",
+      "output_type": "stream",
+      "text": [
+       "GO:0044091 : 0.62\n",
+       "GO:0060322 : 0.61\n",
+       "GO:0050794 : 0.57\n",
+       "GO:0050808 : 0.5\n",
+       "GO:0051098 : 0.5\n"
+      ]
+     }
+    ],
+    "source": [
+     "for go, lrp_score in list_go.items():\n",
+     "    print(go,\":\",np.round(lrp_score,decimals=2))"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "##### Sixth layer"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 86,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "idx_layer += 1"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 87,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "indexes = np.argsort(LRP_matrix[19][1][idx_sample])[::-1][:nb_go_to_show]\n",
+     "lrp_scores = np.sort(LRP_matrix[19][1][idx_sample])[::-1][:nb_go_to_show]"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 88,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "list_go=dict()\n",
+     "for idx,idx_go in enumerate(indexes[:nb_go_to_show]):\n",
+     "    list_go[matrix_connection[idx_layer].columns[idx_go]]= lrp_scores[idx]"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "Top-5"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 89,
+    "metadata": {
+     "scrolled": true
+    },
+    "outputs": [
+     {
+      "name": "stdout",
+      "output_type": "stream",
+      "text": [
+       "GO:0010463 : 0.7\n",
+       "GO:0030534 : 0.7\n",
+       "GO:0006739 : 0.63\n",
+       "GO:0031128 : 0.61\n",
+       "GO:0048856 : 0.61\n"
+      ]
+     }
+    ],
+    "source": [
+     "for go, lrp_score in list_go.items():\n",
+     "    print(go,\":\",np.round(lrp_score,decimals=2))"
+    ]
+   }
+  ],
+  "metadata": {
+   "celltoolbar": "Format de la Cellule Texte Brut",
+   "kernelspec": {
+    "display_name": "venv",
+    "language": "python",
+    "name": "venv"
+   },
+   "language_info": {
+    "codemirror_mode": {
+     "name": "ipython",
+     "version": 3
+    },
+    "file_extension": ".py",
+    "mimetype": "text/x-python",
+    "name": "python",
+    "nbconvert_exporter": "python",
+    "pygments_lexer": "ipython3",
+    "version": "3.6.7"
+   }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 2
+ }
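The notebook restores the TensorFlow checkpoint into Keras by name: `first_layer` receives `W1:0` and `B1:0`, and so on up to `last_layer`, which receives `W_out:0` and `B_out:0`. It also reads the hidden-layer relevances from `LRP_matrix[4]`, `LRP_matrix[7]`, up to `LRP_matrix[19]`. Both mappings can be sketched in isolation as below. The helper names `checkpoint_names` and `lrp_tensor_index` are hypothetical, and the `4 + 3 * k` rule is only inferred from the indices used in the cells above, so it may not hold for another architecture or another innvestigate version.

```python
# Layer names as listed in the notebook's `indices` variable.
LAYER_NAMES = [
    "first_layer", "second_layer", "third_layer", "fourth_layer",
    "fifth_layer", "sixth_layer", "last_layer",
]


def checkpoint_names(layer_names):
    """Map each Keras layer name to its (weight, bias) checkpoint variable names.

    Hidden layers are numbered from 1; the output layer uses the "_out" suffix,
    mirroring the loop in the notebook.
    """
    mapping = {}
    for position, name in enumerate(layer_names, start=1):
        suffix = "_out" if name == "last_layer" else str(position)
        mapping[name] = ("W{}:0".format(suffix), "B{}:0".format(suffix))
    return mapping


def lrp_tensor_index(hidden_layer):
    """Return the position of a hidden layer's relevance in the reversed tensors.

    `hidden_layer` is 0-based. ASSUMPTION: the 4, 7, 10, 13, 16, 19 pattern seen
    in the notebook generalizes to 4 + 3 * hidden_layer for this model only.
    """
    if hidden_layer < 0:
        raise ValueError("hidden_layer must be non-negative")
    return 4 + 3 * hidden_layer


if __name__ == "__main__":
    for layer, (w_name, b_name) in checkpoint_names(LAYER_NAMES).items():
        print(layer, w_name, b_name)
    print([lrp_tensor_index(k) for k in range(6)])
```

Keeping the two rules in small functions makes it easier to check them against `model_vars.keys()` and against the length of `analyzer._reversed_tensors` before trusting the per-layer results.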
--- a/README.md
+++ b/README.md
@@ -20,6 +20,8 @@ The full dataset can be downloaded on ArrayExpress database under the id [E-MTAB
 
 [test set](https://entrepot.ibisc.univ-evry.fr/f/057f1ffa0e6c4aab9bee/?dl=1) 
 
+ Additional files for NN architecture: [filesforNNarch](https://entrepot.ibisc.univ-evry.fr/f/6f1c513798df41999b5d/?dl=1) 
+ 
 ### Usage
 
 Deep GONet was achieved with the $L_{GO}$ regularization and the hyperparameter $\alpha=1e^{-2}$.  
@@ -81,3 +83,7 @@ Without regularization:
 ```bash
 python DeepGONet.py --alpha=0 --EPOCHS=100 --is_training=True --display_step=5 --save=True --processing="train"
 ```
+ 
+ ###  Interpretation tool
+ 
+ Please see the notebook entitled *Interpretation_tool.ipynb* to perform the biological interpretation of the results.
\ No newline at end of file
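As a rough illustration of what the notebook does for each layer, the ranking step can be sketched with the standard library alone. The function name `top_go_terms` and the toy GO identifiers are assumptions made for this example; the notebook itself uses `np.argsort` on the relevance tensors and takes the GO terms from the columns of the connection matrices.

```python
import heapq


def top_go_terms(relevances, go_ids, k=5):
    """Return the k (GO term, relevance) pairs with the highest relevance.

    `relevances` holds one LRP score per neuron of a layer for one sample;
    `go_ids` holds the GO identifier attached to each neuron, in the same order.
    """
    if len(relevances) != len(go_ids):
        raise ValueError("relevances and go_ids must have the same length")
    # nlargest returns the pairs sorted from highest to lowest score.
    best = heapq.nlargest(k, zip(relevances, go_ids), key=lambda pair: pair[0])
    return [(go, score) for score, go in best]


if __name__ == "__main__":
    # Toy values, not taken from the model.
    scores = [0.2, 1.5, -0.3, 0.9]
    ids = ["GO:A", "GO:B", "GO:C", "GO:D"]
    for go, score in top_go_terms(scores, ids, k=2):
        print(go, ":", round(score, 2))
```

Returning a list of pairs rather than a dictionary keeps the ranking order explicit, which matters when the result is printed or exported.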
--- a/filesforNNarch.zip
+++ b/filesforNNarch.zip
--- a/requirements.txt
+++ b/requirements.txt
@@ -12,4 +12,5 @@ scipy==1.2.0
 seaborn==0.9.0
 tensorflow==1.12.0
 tensorflow-gpu==1.12.0
- tqdm==4.36.1
\ No newline at end of file
+ tqdm==4.36.1
+ innvestigate==1.0.8