Victoria BOURGEAIS

add intepretation tool

adj_matrix.csv
first_matrix_connection_GO.csv
go_level.csv
X_test.npz
X_train.npz
filesforNNarch.zip
__pycache__/
......
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# DeepGONet - interpretation tool\n",
"\n",
"**Objective:** this tool points out the main biological functions used for cancer predictions and quantify their contribution via LRP relevance score. It can be used at three levels: disease, subdisease, and patient.\n",
"\n",
"To be used, the model has to be trained first and saved with the script *DeepGONet.py* (flag ``save=True``).\n",
"Moreover, the package [*innvestigate*](https://github.com/albermax/innvestigate) has to be installed beforehand."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Load packages"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import warnings\n",
"import os"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/nhome/siniac/vbourgeais/venv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:523: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n",
"/nhome/siniac/vbourgeais/venv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:524: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n",
"/nhome/siniac/vbourgeais/venv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n",
"/nhome/siniac/vbourgeais/venv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n",
"/nhome/siniac/vbourgeais/venv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n",
"/nhome/siniac/vbourgeais/venv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:532: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n"
]
}
],
"source": [
"import scipy\n",
"import tensorflow as tf\n",
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import pickle\n",
"import seaborn\n",
"from sklearn import preprocessing\n",
"import gc\n",
"from tqdm import tqdm\n",
"from tqdm import tqdm_notebook\n",
"from sklearn.utils import class_weight\n",
"import matplotlib.backends.backend_pdf"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Using TensorFlow backend.\n"
]
}
],
"source": [
"from keras.optimizers import Adam,SGD,Nadam\n",
"from sklearn.utils import class_weight\n",
"from keras.utils import to_categorical, plot_model\n",
"from keras.models import Model\n",
"from keras import backend as K\n",
"from keras import regularizers\n",
"from keras.layers import Layer\n",
"from keras.layers import Input, Dense, BatchNormalization, Dropout, ReLU, Activation,Lambda\n",
"from keras.models import Model\n",
"from keras.callbacks import ReduceLROnPlateau\n",
"from keras.backend import clear_session, set_session,l2_normalize\n",
"from keras.utils import multi_gpu_model"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['/job:localhost/replica:0/task:0/device:GPU:0']"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from keras import backend as K\n",
"K.tensorflow_backend._get_available_gpus()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Environnement definition"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"warnings.filterwarnings(\"ignore\")\n",
"os.environ[\"CUDA_DEVICE_ORDER\"]=\"PCI_BUS_ID\"\n",
"os.environ[\"CUDA_VISIBLE_DEVICES\"]=\"0\" #your GPU device\n",
"os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' "
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"dir_data = #TO BE DEFINED\n",
"save_dir= #TO BE DEFINED"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Definition of the network"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### a) Definition of the variables of the network"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- number of layers/neurons by layer"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"NB_LAYERS = 6\n",
"N_HIDDEN_6 = 90 #Level 2\n",
"N_HIDDEN_5 = 255 #Level 3\n",
"N_HIDDEN_4 = 515 #Level 4\n",
"N_HIDDEN_3 = 951 #Level 5\n",
"N_HIDDEN_2 = 1386 #Level 6\n",
"N_HIDDEN_1 = 1574 #Level 7\n",
"SHAPE_FEATURES = 54675 #Input Layer\n",
"SHAPE_LABEL = 1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- control of the regularization term $L_{GO}$"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"ALPHA = 1e-2 #fixe"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- training hyperparameters"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"OPT = \"adam\"\n",
"LR = 1e-03\n",
"DROP_RATIO = 1.0\n",
"\n",
"EPOCHS = 600\n",
"BATCH_SIZE = 2**9 #parameter that will be used later\n",
"\n",
"NB_STACK = 1\n",
"BN = False"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### b) Definition of the functions"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"def mlp_for_sigmoid(drop_ratio=0.60,trainable=True):\n",
" \n",
" inputs = Input(shape=(SHAPE_FEATURES,))\n",
" \n",
" first_layer = Dense(N_HIDDEN_1,name='first_layer')(inputs) \n",
" x = ReLU()(first_layer) \n",
" x = Dropout(rate=drop_ratio)(x)\n",
" \n",
" second_layer = Dense(N_HIDDEN_2,name='second_layer')(x) \n",
" x = ReLU()(second_layer)\n",
" x = Dropout(rate=drop_ratio)(x)\n",
" \n",
" third_layer = Dense(N_HIDDEN_3,name='third_layer')(x) \n",
" x = ReLU()(third_layer) \n",
" x = Dropout(rate=drop_ratio)(x) \n",
" \n",
" fourth_layer = Dense(N_HIDDEN_4,name='fourth_layer')(x) \n",
" x = ReLU()(fourth_layer) \n",
" x = Dropout(rate=drop_ratio)(x)\n",
" \n",
" fifth_layer = Dense(N_HIDDEN_5,name='fifth_layer')(x)\n",
" x = ReLU()(fifth_layer)\n",
" x = Dropout(rate=drop_ratio)(x)\n",
" \n",
" sixth_layer = Dense(N_HIDDEN_6,name='sixth_layer')(x)\n",
" x = ReLU()(sixth_layer)\n",
" x = Dropout(rate=drop_ratio)(x)\n",
" \n",
" last_layer = Dense(SHAPE_LABEL,name='last_layer')(x) \n",
" predictions = Activation('sigmoid')(last_layer)\n",
" \n",
" model = Model(inputs=inputs, outputs=predictions)\n",
" \n",
" return model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Evaluation of the model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### a) Load the data"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 7.97 s, sys: 206 ms, total: 8.18 s\n",
"Wall time: 17.5 s\n"
]
}
],
"source": [
"%%time\n",
"loaded = np.load(os.path.join(dir_data,\"X_test.npz\"))\n",
"X_test = loaded['x']\n",
"y_test = loaded['y']\n",
"if SHAPE_LABEL>=2: \n",
" y_test=to_categorical(y_test)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(4462, 54675)"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X_test.shape"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(4462, 1)"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_test.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### b) Restoration of the model learned with TF"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### i. Prepare the file \"model_vars.txt\"\n",
"It has to be executed only once. It means that if the notebook is executed again later, you can skip this step. You can directly jump to the step below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"PATH_REL_META = os.path.join(save_dir,'model.meta')\n",
"\n",
"#start tensorflow session\n",
"with tf.Session() as sess:\n",
" #import graph\n",
" saver = tf.train.import_meta_graph(PATH_REL_META)\n",
" #load weights for graph\n",
" saver.restore(sess, os.path.join(save_dir,\"model\"))\n",
"\n",
" #get all global variables (including model variables)\n",
" vars_global = tf.global_variables()\n",
"\n",
" #get their name and value and put them into dictionary\n",
" model_vars = {}\n",
" \n",
" with tf.device('/gpu:0'): #TO BE CHANGED if you didn't use this one.\n",
" \n",
" for var in vars_global:\n",
" try:\n",
" model_vars[var.name] = sess.run(var)\n",
" except:\n",
" print(\"For var={}, an exception occurred\".format(var.name))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"with open(os.path.join(save_dir,\"model_vars.txt\"), \"wb\") as fp:\n",
" #Pickling\n",
" pickle.dump(model_vars, fp)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tf.reset_default_graph()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### ii. Fix the value of the model parameters (weights and biases)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"indices = ['first_layer','second_layer','third_layer','fourth_layer',\n",
" 'fifth_layer','sixth_layer','last_layer']"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"with open(os.path.join(save_dir,\"model_vars.txt\"), \"rb\") as fp:\n",
" model_vars=pickle.load(fp)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"model_for_innvestigate = mlp_for_sigmoid(drop_ratio=DROP_RATIO,trainable=False)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 299 ms, sys: 391 ms, total: 690 ms\n",
"Wall time: 1.06 s\n"
]
}
],
"source": [
"%%time\n",
"for layer_idx,layer_name in enumerate(indices):\n",
" if layer_name==\"last_layer\":\n",
" layer_idx=\"_out\"\n",
" else:\n",
" layer_idx+=1\n",
" patternW='W{}:0'.format(layer_idx)\n",
" patternB='B{}:0'.format(layer_idx)\n",
" model_for_innvestigate.get_layer(name=layer_name).set_weights([model_vars[patternW], model_vars[patternB]])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### iii. Launch the model on the test set"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"4462/4462 [==============================] - 2s 440us/step\n",
"test loss, test acc: [0.1820492276450989, 0.924697445091887]\n"
]
}
],
"source": [
"adam = Adam(lr=LR)\n",
"model_for_innvestigate.compile(optimizer=adam,loss='binary_crossentropy',metrics=['accuracy'])\n",
"results = model_for_innvestigate.evaluate(X_test, y_test)\n",
"print('test loss, test acc:', results)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"y_result = model_for_innvestigate.predict(X_test)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"y_result_final = (y_result>0.5).astype(dtype=int)"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"y_result_final=y_result_final.reshape(-1)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0, 1, 1, ..., 1, 1, 1])"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_result_final"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(2950,)"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_result_final[y_result_final==1].shape"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(1512,)"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_result_final[y_result_final==0].shape"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[1344 168]\n",
" [ 168 2782]]\n",
" precision recall f1-score support\n",
"\n",
" 0.0 0.89 0.89 0.89 1512\n",
" 1.0 0.94 0.94 0.94 2950\n",
"\n",
" micro avg 0.92 0.92 0.92 4462\n",
" macro avg 0.92 0.92 0.92 4462\n",
"weighted avg 0.92 0.92 0.92 4462\n",
"\n"
]
}
],
"source": [
"from sklearn.metrics import confusion_matrix,classification_report\n",
"print(confusion_matrix(y_test,y_result_final))\n",
"print(classification_report(y_test,y_result_final))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Biological Interpretation"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 10.5 s, sys: 501 ms, total: 11 s\n",
"Wall time: 51.7 s\n"
]
}
],
"source": [
"%%time\n",
"matrix_connection = []\n",
"matrix_connection.append(pd.read_csv(os.path.join(dir_data,\"matrix_connexion_h1.csv\"),index_col=0))\n",
"matrix_connection.append(pd.read_csv(os.path.join(dir_data,\"matrix_connexion_h2.csv\"),index_col=0))\n",
"matrix_connection.append(pd.read_csv(os.path.join(dir_data,\"matrix_connexion_h3.csv\"),index_col=0))\n",
"matrix_connection.append(pd.read_csv(os.path.join(dir_data,\"matrix_connexion_h4.csv\"),index_col=0))\n",
"matrix_connection.append(pd.read_csv(os.path.join(dir_data,\"matrix_connexion_h5.csv\"),index_col=0))\n",
"matrix_connection.append(pd.read_csv(os.path.join(dir_data,\"matrix_connexion_h6.csv\"),index_col=0))"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"import innvestigate\n",
"import innvestigate.utils as iutils"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [],
"source": [
"model_wo_sigmoid = Model(model_for_innvestigate.input, model_for_innvestigate.layers[len(model_for_innvestigate.layers)-2].output)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"analyzer = innvestigate.create_analyzer(\"lrp.epsilon_IB\",\n",
" model_wo_sigmoid,\n",
" reverse_keep_tensors=True,epsilon=1e-07)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### a) Use-case 1: Patient level"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [],
"source": [
"idx_sample = #TO BE DEFINED\n",
"nb_go_to_show = 5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Check the real, predicted, and probability class associated to this sample"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([1.], dtype=float32)"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_test[idx_sample]"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_result_final[idx_sample]"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0.9999999], dtype=float32)"
]
},
"execution_count": 56,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_result[idx_sample]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After our model predicts the outcome of a patient with a probability score, the LRP relevance of each neuron is computed. "
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [],
"source": [
"analysis = analyzer.analyze(X_test)"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [],
"source": [
"LRP_matrix = analyzer._reversed_tensors"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Interpretation of the prediction "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The neurons are sorted according to their relevance score and the most important ones are returned with their corresponding GO term and biological function."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### First layer"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [],
"source": [
"idx_layer = 0"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [],
"source": [
"indexes = np.argsort(LRP_matrix[4][1][idx_sample])[::-1][:nb_go_to_show]\n",
"lrp_scores = np.sort(LRP_matrix[4][1][idx_sample])[::-1][:nb_go_to_show]"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [],
"source": [
"list_go=dict()\n",
"for idx,idx_go in enumerate(indexes[:nb_go_to_show]):\n",
" list_go[matrix_connection[idx_layer].columns[idx_go]]= lrp_scores[idx]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Top-5"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"GO:0015031 : 1.83\n",
"GO:0006468 : 1.11\n",
"GO:0030335 : 1.09\n",
"GO:0007268 : 0.87\n",
"GO:0006412 : 0.86\n"
]
}
],
"source": [
"for go, lrp_score in list_go.items():\n",
" print(go,\":\",np.round(lrp_score,decimals=2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The biological function associated to each GO term can be obtained with public tool like [QuickGO](https://www.ebi.ac.uk/QuickGO/)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Second layer"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [],
"source": [
"idx_layer += 1"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {},
"outputs": [],
"source": [
"indexes = np.argsort(LRP_matrix[7][1][idx_sample])[::-1][:nb_go_to_show]\n",
"lrp_scores = np.sort(LRP_matrix[7][1][idx_sample])[::-1][:nb_go_to_show]"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {},
"outputs": [],
"source": [
"list_go=dict()\n",
"for idx,idx_go in enumerate(indexes[:nb_go_to_show]):\n",
" list_go[matrix_connection[idx_layer].columns[idx_go]]= lrp_scores[idx]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Top-5"
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"GO:0044257 : 0.61\n",
"GO:0071420 : 0.56\n",
"GO:0043009 : 0.54\n",
"GO:1901258 : 0.53\n",
"GO:0010737 : 0.49\n"
]
}
],
"source": [
"for go, lrp_score in list_go.items():\n",
" print(go,\":\",np.round(lrp_score,decimals=2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Third layer"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {},
"outputs": [],
"source": [
"idx_layer += 1"
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2"
]
},
"execution_count": 73,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"idx_layer"
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {},
"outputs": [],
"source": [
"indexes = np.argsort(LRP_matrix[10][1][idx_sample])[::-1][:nb_go_to_show]\n",
"lrp_scores = np.sort(LRP_matrix[10][1][idx_sample])[::-1][:nb_go_to_show]"
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {},
"outputs": [],
"source": [
"list_go=dict()\n",
"for idx,idx_go in enumerate(indexes[:nb_go_to_show]):\n",
" list_go[matrix_connection[idx_layer].columns[idx_go]]= lrp_scores[idx]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Top-5"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"GO:1901355 : 0.49\n",
"GO:0035556 : 0.41\n",
"GO:0048382 : 0.36\n",
"GO:1905144 : 0.31\n",
"GO:0048864 : 0.29\n"
]
}
],
"source": [
"for go, lrp_score in list_go.items():\n",
" print(go,\":\",np.round(lrp_score,decimals=2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Fourth layer"
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {},
"outputs": [],
"source": [
"idx_layer += 1"
]
},
{
"cell_type": "code",
"execution_count": 79,
"metadata": {},
"outputs": [],
"source": [
"indexes = np.argsort(LRP_matrix[13][1][idx_sample])[::-1][:nb_go_to_show]\n",
"lrp_scores = np.sort(LRP_matrix[13][1][idx_sample])[::-1][:nb_go_to_show]"
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {},
"outputs": [],
"source": [
"list_go=dict()\n",
"for idx,idx_go in enumerate(indexes[:nb_go_to_show]):\n",
" list_go[matrix_connection[idx_layer].columns[idx_go]]= lrp_scores[idx]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Top-5"
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"GO:0071709 : 0.54\n",
"GO:0055085 : 0.43\n",
"GO:0014070 : 0.39\n",
"GO:0042127 : 0.38\n",
"GO:0010243 : 0.38\n"
]
}
],
"source": [
"for go, lrp_score in list_go.items():\n",
" print(go,\":\",np.round(lrp_score,decimals=2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Fifth layer"
]
},
{
"cell_type": "code",
"execution_count": 82,
"metadata": {},
"outputs": [],
"source": [
"idx_layer += 1"
]
},
{
"cell_type": "code",
"execution_count": 83,
"metadata": {},
"outputs": [],
"source": [
"indexes = np.argsort(LRP_matrix[16][1][idx_sample])[::-1][:nb_go_to_show]\n",
"lrp_scores = np.sort(LRP_matrix[16][1][idx_sample])[::-1][:nb_go_to_show]"
]
},
{
"cell_type": "code",
"execution_count": 84,
"metadata": {},
"outputs": [],
"source": [
"list_go=dict()\n",
"for idx,idx_go in enumerate(indexes[:nb_go_to_show]):\n",
" list_go[matrix_connection[idx_layer].columns[idx_go]]= lrp_scores[idx]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Top-5"
]
},
{
"cell_type": "code",
"execution_count": 85,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"GO:0044091 : 0.62\n",
"GO:0060322 : 0.61\n",
"GO:0050794 : 0.57\n",
"GO:0050808 : 0.5\n",
"GO:0051098 : 0.5\n"
]
}
],
"source": [
"for go, lrp_score in list_go.items():\n",
" print(go,\":\",np.round(lrp_score,decimals=2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Sixth layer"
]
},
{
"cell_type": "code",
"execution_count": 86,
"metadata": {},
"outputs": [],
"source": [
"idx_layer += 1"
]
},
{
"cell_type": "code",
"execution_count": 87,
"metadata": {},
"outputs": [],
"source": [
"indexes = np.argsort(LRP_matrix[19][1][idx_sample])[::-1][:nb_go_to_show]\n",
"lrp_scores = np.sort(LRP_matrix[19][1][idx_sample])[::-1][:nb_go_to_show]"
]
},
{
"cell_type": "code",
"execution_count": 88,
"metadata": {},
"outputs": [],
"source": [
"list_go=dict()\n",
"for idx,idx_go in enumerate(indexes[:nb_go_to_show]):\n",
" list_go[matrix_connection[idx_layer].columns[idx_go]]= lrp_scores[idx]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Top-5"
]
},
{
"cell_type": "code",
"execution_count": 89,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"GO:0010463 : 0.7\n",
"GO:0030534 : 0.7\n",
"GO:0006739 : 0.63\n",
"GO:0031128 : 0.61\n",
"GO:0048856 : 0.61\n"
]
}
],
"source": [
"for go, lrp_score in list_go.items():\n",
" print(go,\":\",np.round(lrp_score,decimals=2))"
]
}
],
"metadata": {
"celltoolbar": "Format de la Cellule Texte Brut",
"kernelspec": {
"display_name": "venv",
"language": "python",
"name": "venv"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
......@@ -20,6 +20,8 @@ The full dataset can be downloaded on ArrayExpress database under the id [E-MTAB
[test set](https://entrepot.ibisc.univ-evry.fr/f/057f1ffa0e6c4aab9bee/?dl=1)
Additional files for NN architecture: [filesforNNarch](https://entrepot.ibisc.univ-evry.fr/f/6f1c513798df41999b5d/?dl=1)
### Usage
Deep GONet was achieved with the $L_{GO}$ regularization and the hyperparameter $\alpha=1e^{-2}$.
......@@ -81,3 +83,7 @@ Without regularization:
```bash
python DeepGONet.py --alpha=0 --EPOCHS=100 --is_training=True --display_step=5 --save=True --processing="train"
```
### Interpretation tool
Please see the notebook entitled *Interpretation_tool.ipynb* to perform the biological interpretation of the results.
\ No newline at end of file
......
No preview for this file type
......@@ -12,4 +12,5 @@ scipy==1.2.0
seaborn==0.9.0
tensorflow==1.12.0
tensorflow-gpu==1.12.0
tqdm==4.36.1
\ No newline at end of file
tqdm==4.36.1
innvestigate==1.0.8
......