Victoria BOURGEAIS

add notebooks to get the interpretation of a prediction and to build the GO layers architecture of the NN
files/
.ipynb_checkpoints/
scripts/__pycache__/
log/
@@ -10,7 +10,7 @@ GraphGONet is a self-explaining neural network integrating the Gene Ontology int
## Get started
The code is implemented in Python (3.6.7) using the [PyTorch](https://pytorch.org/) framework v1.7.1 (see [requirements.txt](https://forge.ibisc.univ-evry.fr/vbourgeais/GraphGONet/blob/master/requirements.txt) for more details about the additional packages used).
The code is implemented in Python (3.6.7) using [PyTorch v1.7.1](https://pytorch.org/) and [PyTorch-geometric v1.6.3](https://pytorch-geometric.readthedocs.io/en/1.6.3/modules/nn.html) (see [requirements.txt](https://forge.ibisc.univ-evry.fr/vbourgeais/GraphGONet/blob/master/requirements.txt) for more details about the additional packages used).
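
To compare an existing environment with the versions quoted above, a short check along the following lines may help. It is only a sketch and not part of the repository: `check_versions` and `EXPECTED` are hypothetical names, and the check assumes each package exposes a `__version__` attribute (which `torch` and `torch_geometric` do).

```python
import sys
from importlib import import_module

# Versions quoted in this README; other packages are listed in requirements.txt.
EXPECTED = {"torch": "1.7.1", "torch_geometric": "1.6.3"}


def check_versions(expected=EXPECTED):
    """Map each package name to (expected version, installed version or None)."""
    report = {}
    for name, wanted in expected.items():
        try:
            module = import_module(name)
        except ImportError:
            report[name] = (wanted, None)  # package is not installed
            continue
        report[name] = (wanted, getattr(module, "__version__", None))
    return report


print("Python", ".".join(str(part) for part in sys.version_info[:3]))
for name, (wanted, found) in check_versions().items():
    print("{}: expected {}, found {}".format(name, wanted, found))
```

A local build suffix in the installed version (for instance a CUDA tag appended by PyTorch) is reported as-is, so the comparison is left to the reader rather than enforced.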
## Dataset
@@ -31,49 +31,38 @@ There exists 3 functions (flag *processing*): one is dedicated to the training o
<!-- On the microarray dataset:
```bash
python3 GraphGONet.py --save --n_inputs=36834 --n_nodes=10663 --n_nodes_annotated=8249 --n_classes=1 --mask="top" --selection_ratio=0.001 --n_epochs=50 --es --patience=5 --class_weight
python3 scripts/GraphGONet.py --save --n_inputs=36834 --n_nodes=10663 --n_nodes_annotated=8249 --n_classes=1 --selection_op="top" --selection_ratio=0.001 --n_epochs=50 --es --patience=5 --class_weight
```
-->
```bash
python3 GraphGONet.py --save --n_inputs=18427 --n_nodes=10636 --n_nodes_annotated=8288 --n_classes=12 --mask="top" --selection_ratio=0.001 --n_epochs=50 --es --patience=5 --class_weight
python3 scripts/GraphGONet.py --save --n_inputs=18427 --n_nodes=10636 --n_nodes_annotated=8288 --n_classes=12 --selection_op="top" --selection_ratio=0.001 --n_epochs=50 --es --patience=5 --class_weight
```
<!--
### 2) Evaluate
```bash
python DeepGONet.py --type_training="LGO" --alpha=1e-2 --EPOCHS=600 --is_training=False --restore=True --processing="evaluate"
```
### 3) Predict
```bash
python DeepGONet.py --type_training="LGO" --alpha=1e-2 --EPOCHS=600 --is_training=False --restore=True --processing="predict"
```
The outcomes are saved into a numpy array.
-->
### Comparison with random selection
```bash
python GraphGONet.py --save --n_inputs=18427 --n_nodes=10636 --n_nodes_annotated=8288 --n_classes=12 --mask="random" --selection_ratio=0.001 --n_epochs=50 --es --patience=5 --class_weight
python scripts/GraphGONet.py --save --n_inputs=18427 --n_nodes=10636 --n_nodes_annotated=8288 --n_classes=12 --selection_op="random" --selection_ratio=0.001 --n_epochs=50 --es --patience=5 --class_weight
```
### Comparison with no selection
```bash
python GraphGONet.py --save --n_inputs=18427 --n_nodes=10636 --n_nodes_annotated=8288 --n_classes=12 --n_epochs=50 --es --patience=5 --class_weight
python scripts/GraphGONet.py --save --n_inputs=18427 --n_nodes=10636 --n_nodes_annotated=8288 --n_classes=12 --n_epochs=50 --es --patience=5 --class_weight
```
### Train the model with a small number of training samples
```bash
python GraphGONet.py --save --n_samples=50 --n_inputs=18427 --n_nodes=10636 --n_nodes_annotated=8288 --n_classes=12 --mask="top" --selection_ratio=0.001 --n_epochs=50 --es --patience=5 --class_weight
python scripts/GraphGONet.py --save --n_samples=50 --n_inputs=18427 --n_nodes=10636 --n_nodes_annotated=8288 --n_classes=12 --selection_op="top" --selection_ratio=0.001 --n_epochs=50 --es --patience=5 --class_weight
```
### Help
@@ -81,7 +70,7 @@ python GraphGONet.py --save --n_samples=50 --n_inputs=18427 --n_nodes=10636 --n_
Details about all the command-line flags are displayed by the following command:
```bash
python GraphGONet.py --help
python scripts/GraphGONet.py --help
```
The default values can be used for most of the flags. *dir_data*, *dir_files*, and *dir_log* can be set to your own directories. Only the flags shown in the command lines above need to be adjusted to reproduce the results from the paper. If you have enough GPU memory, you can switch to the entire GO graph (argument *type_graph* set to "entire"). The graph can be rebuilt by following the notebooks Build_GONet_graph_part{1,2,3}.ipynb located in the notebooks directory. You should then change the values of the arguments *n_nodes* and *n_nodes_annotated* in the command line accordingly.
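
As a rough illustration of the paragraph above, the command line can be assembled programmatically so that only the graph-dependent flags change between runs. This is a sketch under stated assumptions, not code from the repository: `build_command`, `BASE_FLAGS`, and `SWITCHES` are hypothetical names, the flag values are copied from the training command in this README, and the node counts passed for the entire graph are placeholders to be replaced by the sizes obtained from the notebooks.

```python
import shlex

# Flag values copied from the training command shown in this README.
BASE_FLAGS = {
    "n_inputs": 18427,
    "n_nodes": 10636,
    "n_nodes_annotated": 8288,
    "n_classes": 12,
    "selection_op": "top",
    "selection_ratio": 0.001,
    "n_epochs": 50,
    "patience": 5,
}
SWITCHES = ["save", "es", "class_weight"]  # flags that take no value


def build_command(overrides=None, script="scripts/GraphGONet.py"):
    """Return the argument list for one run (hypothetical helper)."""
    flags = dict(BASE_FLAGS)
    flags.update(overrides or {})
    args = ["python3", script]
    args += ["--{}".format(name) for name in SWITCHES]
    args += ["--{}={}".format(name, value) for name, value in flags.items()]
    return args


# Placeholder node counts: replace them with the sizes of the graph you rebuilt.
entire = build_command({"type_graph": "entire", "n_nodes": 0, "n_nodes_annotated": 0})
print(" ".join(shlex.quote(arg) for arg in entire))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues for values such as `selection_op`.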
......
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Configuration of the architecture of the GO hidden layers of GraphGONet (step 1)\n",
"\n",
"## Summary\n",
"This notebook extracts the annotations linking the input genes to the GO terms. <br>\n",
"The variable **SUBONTOLOGY** sets the subontology used as the structure of the GO layers. It defaults to BP, as in the paper, but MF or CC can be chosen instead. <br>\n",
"The variable **DATASET** must be set to the dataset you are using. The two datasets used in the article have different input features, so each one refers to a different annotation database. To apply the method to a dataset not studied in the article, check the type and version of the annotation database (see variable *annot_ref_db*). <br>\n",
"Set the directories from which the data are loaded and to which they are saved."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Load packages"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"%config Completer.use_jedi = False"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import numpy as np\n",
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import rpy2.robjects as robjects\n",
"from rpy2.robjects.packages import importr\n",
"from rpy2.robjects import pandas2ri\n",
"import rpy2.rinterface as rinterface\n",
"import rpy2.robjects.help as rh"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"R[write to console]: Error in (function () : \n",
" org.Hs.egPFAM is defunct. Please use select() if you need access to\n",
" PFAM or PROSITE accessions.\n",
"\n",
"R[write to console]: De plus : \n",
"R[write to console]: Warning messages:\n",
"\n",
"R[write to console]: 1: \n",
"R[write to console]: In (function (package, help, pos = 2, lib.loc = NULL, character.only = FALSE, :\n",
"R[write to console]: \n",
" \n",
"R[write to console]: les bibliothèques ‘/usr/local/lib/R/site-library’, ‘/usr/lib/R/site-library’ ne contiennent aucun package\n",
"\n",
"R[write to console]: 2: \n",
"R[write to console]: In (function (package, help, pos = 2, lib.loc = NULL, character.only = FALSE, :\n",
"R[write to console]: \n",
" \n",
"R[write to console]: les bibliothèques ‘/usr/local/lib/R/site-library’, ‘/usr/lib/R/site-library’ ne contiennent aucun package\n",
"\n",
"R[write to console]: 3: \n",
"R[write to console]: In (function (package, help, pos = 2, lib.loc = NULL, character.only = FALSE, :\n",
"R[write to console]: \n",
" \n",
"R[write to console]: les bibliothèques ‘/usr/local/lib/R/site-library’, ‘/usr/lib/R/site-library’ ne contiennent aucun package\n",
"\n",
"R[write to console]: 4: \n",
"R[write to console]: In (function (package, help, pos = 2, lib.loc = NULL, character.only = FALSE, :\n",
"R[write to console]: \n",
" \n",
"R[write to console]: les bibliothèques ‘/usr/local/lib/R/site-library’, ‘/usr/lib/R/site-library’ ne contiennent aucun package\n",
"\n",
"R[write to console]: 5: \n",
"R[write to console]: In (function (package, help, pos = 2, lib.loc = NULL, character.only = FALSE, :\n",
"R[write to console]: \n",
" \n",
"R[write to console]: les bibliothèques ‘/usr/local/lib/R/site-library’, ‘/usr/lib/R/site-library’ ne contiennent aucun package\n",
"\n",
"R[write to console]: 6: \n",
"R[write to console]: In (function (package, help, pos = 2, lib.loc = NULL, character.only = FALSE, :\n",
"R[write to console]: \n",
" \n",
"R[write to console]: les bibliothèques ‘/usr/local/lib/R/site-library’, ‘/usr/lib/R/site-library’ ne contiennent aucun package\n",
"\n",
"R[write to console]: 7: \n",
"R[write to console]: In (function (package, help, pos = 2, lib.loc = NULL, character.only = FALSE, :\n",
"R[write to console]: \n",
" \n",
"R[write to console]: les bibliothèques ‘/usr/local/lib/R/site-library’, ‘/usr/lib/R/site-library’ ne contiennent aucun package\n",
"\n",
"R[write to console]: 8: \n",
"R[write to console]: In (function (package, help, pos = 2, lib.loc = NULL, character.only = FALSE, :\n",
"R[write to console]: \n",
" \n",
"R[write to console]: les bibliothèques ‘/usr/local/lib/R/site-library’, ‘/usr/lib/R/site-library’ ne contiennent aucun package\n",
"\n",
"R[write to console]: 9: \n",
"R[write to console]: In (function (package, help, pos = 2, lib.loc = NULL, character.only = FALSE, :\n",
"R[write to console]: \n",
" \n",
"R[write to console]: les bibliothèques ‘/usr/local/lib/R/site-library’, ‘/usr/lib/R/site-library’ ne contiennent aucun package\n",
"\n",
"R[write to console]: 10: \n",
"R[write to console]: In (function (package, help, pos = 2, lib.loc = NULL, character.only = FALSE, :\n",
"R[write to console]: \n",
" \n",
"R[write to console]: les bibliothèques ‘/usr/local/lib/R/site-library’, ‘/usr/lib/R/site-library’ ne contiennent aucun package\n",
"\n",
"/usr/local/lib/python3.6/dist-packages/rpy2/robjects/packages.py:262: UserWarning: R C-API Rf_findVarInFrame()\n",
" warn(str(rre))\n",
"R[write to console]: Error in (function () : \n",
" org.Hs.egPROSITE is defunct. Please use select() if you need access to\n",
" PFAM or PROSITE accessions.\n",
"\n"
]
}
],
"source": [
"base = importr('base')\n",
"utils = importr(\"utils\")\n",
"biomanager = importr(\"BiocManager\")\n",
"annotate = importr(\"annotate\")\n",
"godb = importr(\"GO.db\")\n",
"AnnotationDbi = importr(\"AnnotationDbi\")\n",
"biomaRt = importr(\"biomaRt\")\n",
"grdevices = importr('grDevices')\n",
"ah = importr(\"AnnotationHub\")\n",
"org = importr(\"org.Hs.eg.db\")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"R[write to console]: \n",
"\n"
]
},
{
"data": {
"text/html": [
"\n",
" <span>StrVector with 25 elements.</span>\n",
" <table>\n",
" <tbody>\n",
" <tr>\n",
" \n",
" <td>\n",
" 'hgu133pl...\n",
" </td>\n",
" \n",
" <td>\n",
" 'org.Hs.e...\n",
" </td>\n",
" \n",
" <td>\n",
" 'Annotati...\n",
" </td>\n",
" \n",
" <td>\n",
" ...\n",
" </td>\n",
" \n",
" <td>\n",
" 'datasets'\n",
" </td>\n",
" \n",
" <td>\n",
" 'methods'\n",
" </td>\n",
" \n",
" <td>\n",
" 'base'\n",
" </td>\n",
" \n",
" </tr>\n",
" </tbody>\n",
" </table>\n",
" "
],
"text/plain": [
"<rpy2.robjects.vectors.StrVector object at 0x7fec4b03e308> [RTYPES.STRSXP]\n",
"R classes: ('character',)\n",
"['hgu133pl..., 'org.Hs.e..., 'Annotati..., 'BiocFile..., ..., 'utils', 'datasets', 'methods', 'base']"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"robjects.r('''\n",
" f <- function() {\n",
"\n",
" library(\"hgu133plus2.db\")\n",
" }\n",
" ''')\n",
"r_f = robjects.globalenv['f']\n",
"r_f()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"packageVersion = robjects.r['package.version']"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"sessionInfo = robjects.r['sessionInfo']"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"scrolled": true,
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"R version 4.0.3 (2020-10-10)\n",
"Platform: x86_64-pc-linux-gnu (64-bit)\n",
"Running under: Ubuntu 18.04.6 LTS\n",
"\n",
"Matrix products: default\n",
"BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3\n",
"LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so\n",
"\n",
"locale:\n",
" [1] LC_CTYPE=fr_FR.UTF-8 LC_NUMERIC=C \n",
" [3] LC_TIME=fr_FR.UTF-8 LC_COLLATE=fr_FR.UTF-8 \n",
" [5] LC_MONETARY=fr_FR.UTF-8 LC_MESSAGES=fr_FR.UTF-8 \n",
" [7] LC_PAPER=fr_FR.UTF-8 LC_NAME=C \n",
" [9] LC_ADDRESS=C LC_TELEPHONE=C \n",
"[11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C \n",
"\n",
"attached base packages:\n",
" [1] parallel stats4 tools stats graphics grDevices utils \n",
" [8] datasets methods base \n",
"\n",
"other attached packages:\n",
" [1] hgu133plus2.db_3.2.3 org.Hs.eg.db_3.12.0 AnnotationHub_2.22.0\n",
" [4] BiocFileCache_1.14.0 dbplyr_2.0.0 biomaRt_2.46.0 \n",
" [7] GO.db_3.12.1 annotate_1.68.0 XML_3.99-0.5 \n",
"[10] AnnotationDbi_1.52.0 IRanges_2.24.0 S4Vectors_0.28.0 \n",
"[13] Biobase_2.50.0 BiocGenerics_0.36.0 BiocManager_1.30.10 \n",
"\n",
"loaded via a namespace (and not attached):\n",
" [1] progress_1.2.2 tidyselect_1.1.0 \n",
" [3] BiocVersion_3.12.0 purrr_0.3.4 \n",
" [5] vctrs_0.3.6 generics_0.1.0 \n",
" [7] htmltools_0.5.0 yaml_2.2.1 \n",
" [9] interactiveDisplayBase_1.28.0 blob_1.2.1 \n",
"[11] rlang_0.4.9 pillar_1.4.7 \n",
"[13] later_1.1.0.1 glue_1.4.2 \n",
"[15] DBI_1.1.0 rappdirs_0.3.1 \n",
"[17] bit64_4.0.5 lifecycle_0.2.0 \n",
"[19] stringr_1.4.0 memoise_1.1.0 \n",
"[21] fastmap_1.0.1 httpuv_1.5.4 \n",
"[23] curl_4.3 Rcpp_1.0.6 \n",
"[25] xtable_1.8-4 promises_1.1.1 \n",
"[27] openssl_1.4.3 mime_0.9 \n",
"[29] bit_4.0.4 hms_0.5.3 \n",
"[31] askpass_1.1 digest_0.6.27 \n",
"[33] stringi_1.5.3 dplyr_1.0.2 \n",
"[35] shiny_1.5.0 magrittr_2.0.1 \n",
"[37] RSQLite_2.2.1 tibble_3.0.5 \n",
"[39] crayon_1.3.4 pkgconfig_2.0.3 \n",
"[41] ellipsis_0.3.1 xml2_1.3.2 \n",
"[43] prettyunits_1.1.1 assertthat_0.2.1 \n",
"[45] httr_1.4.2 R6_2.5.0 \n",
"[47] compiler_4.0.3 \n",
"\n"
]
}
],
"source": [
"res=sessionInfo()\n",
"print(res)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Set environment"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"SUBONTOLOGY = \"BP\""
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdin",
"output_type": "stream",
"text": [
"Which dataset are you using? TCGA or microarray TCGA\n"
]
}
],
"source": [
"DATASET=input(\"Which dataset are you using? TCGA or microarray\")"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"dir_files = \"../files\" #can be modified to your own path"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"dir_data = \"../data\" #can be modified to your own path"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"filename = os.path.join(dir_data,\"id_genes.npz\") #corresponds to the column names of the dataset"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Set the annotation database used as the reference for mapping the gene IDs to GO terms."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"annot_ref_db = org.org_Hs_eg_db if DATASET==\"TCGA\" else robjects.r('hgu133plus2.db')"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"keyRef = \"ENSEMBL\" if DATASET==\"TCGA\" else \"PROBE\""
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"def getInformations(annot_ref,inputId,type_var):\n",
"    # Query the annotation database: for each input ID of type `type_var`,\n",
"    # return the matching Entrez gene ID and GO annotations.\n",
"    return AnnotationDbi.select(x=annot_ref, keys=inputId, columns=robjects.StrVector([\"ENTREZID\",\"GO\"]),keytype=robjects.StrVector([type_var]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1. Collect the identifiers of the variables in the dataset"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"['ENSG00000000003',\n",
" 'ENSG00000000005',\n",
" 'ENSG00000000419',\n",
" 'ENSG00000000457',\n",
" 'ENSG00000000460',\n",
" 'ENSG00000000938',\n",
" 'ENSG00000000971',\n",
" 'ENSG00000001036',\n",
" 'ENSG00000001084',\n",
" 'ENSG00000001167',\n",
" 'ENSG00000001460',\n",
" 'ENSG00000001461',\n",
" 'ENSG00000001497',\n",
" 'ENSG00000001561',\n",
" 'ENSG00000001617',\n",
" 'ENSG00000001626',\n",
" 'ENSG00000001629',\n",
" 'ENSG00000001630',\n",
" 'ENSG00000001631',\n",
" 'ENSG00000002016',\n",
" 'ENSG00000002079',\n",
" 'ENSG00000002330',\n",
" 'ENSG00000002549',\n",
" 'ENSG00000002586',\n",
" 'ENSG00000002587',\n",
" 'ENSG00000002726',\n",
" 'ENSG00000002745',\n",
" 'ENSG00000002746',\n",
" 'ENSG00000002822',\n",
" 'ENSG00000002834',\n",
" 'ENSG00000002919',\n",
" 'ENSG00000002933',\n",
" 'ENSG00000003056',\n",
" 'ENSG00000003096',\n",
" 'ENSG00000003137',\n",
" 'ENSG00000003147',\n",
" 'ENSG00000003249',\n",
" 'ENSG00000003393',\n",
" 'ENSG00000003400',\n",
" 'ENSG00000003402',\n",
" 'ENSG00000003436',\n",
" 'ENSG00000003509',\n",
" 'ENSG00000003756',\n",
" 'ENSG00000003987',\n",
" 'ENSG00000003989',\n",
" 'ENSG00000004059',\n",
" 'ENSG00000004139',\n",
" 'ENSG00000004142',\n",
" 'ENSG00000004399',\n",
" 'ENSG00000004455',\n",
" 'ENSG00000004468',\n",
" 'ENSG00000004478',\n",
" 'ENSG00000004487',\n",
" 'ENSG00000004534',\n",
" 'ENSG00000004660',\n",
" 'ENSG00000004700',\n",
" 'ENSG00000004766',\n",
" 'ENSG00000004776',\n",
" 'ENSG00000004777',\n",
" 'ENSG00000004779',\n",
" 'ENSG00000004799',\n",
" 'ENSG00000004809',\n",
" 'ENSG00000004838',\n",
" 'ENSG00000004846',\n",
" 'ENSG00000004848',\n",
" 'ENSG00000004864',\n",
" 'ENSG00000004866',\n",
" 'ENSG00000004897',\n",
" 'ENSG00000004939',\n",
" 'ENSG00000004948',\n",
" 'ENSG00000004961',\n",
" 'ENSG00000004975',\n",
" 'ENSG00000005001',\n",
" 'ENSG00000005007',\n",
" 'ENSG00000005020',\n",
" 'ENSG00000005022',\n",
" 'ENSG00000005059',\n",
" 'ENSG00000005073',\n",
" 'ENSG00000005075',\n",
" 'ENSG00000005100',\n",
" 'ENSG00000005102',\n",
" 'ENSG00000005108',\n",
" 'ENSG00000005156',\n",
" 'ENSG00000005175',\n",
" 'ENSG00000005187',\n",
" 'ENSG00000005189',\n",
" 'ENSG00000005194',\n",
" 'ENSG00000005206',\n",
" 'ENSG00000005238',\n",
" 'ENSG00000005243',\n",
" 'ENSG00000005249',\n",
" 'ENSG00000005302',\n",
" 'ENSG00000005339',\n",
" 'ENSG00000005379',\n",
" 'ENSG00000005381',\n",
" 'ENSG00000005421',\n",
" 'ENSG00000005436',\n",
" 'ENSG00000005448',\n",
" 'ENSG00000005469',\n",
" 'ENSG00000005471',\n",
" 'ENSG00000005483',\n",
" 'ENSG00000005486',\n",
" 'ENSG00000005513',\n",
" 'ENSG00000005700',\n",
" 'ENSG00000005801',\n",
" 'ENSG00000005810',\n",
" 'ENSG00000005812',\n",
" 'ENSG00000005844',\n",
" 'ENSG00000005882',\n",
" 'ENSG00000005884',\n",
" 'ENSG00000005889',\n",
" 'ENSG00000005893',\n",
" 'ENSG00000005961',\n",
" 'ENSG00000005981',\n",
" 'ENSG00000006007',\n",
" 'ENSG00000006015',\n",
" 'ENSG00000006016',\n",
" 'ENSG00000006025',\n",
" 'ENSG00000006042',\n",
" 'ENSG00000006047',\n",
" 'ENSG00000006059',\n",
" 'ENSG00000006062',\n",
" 'ENSG00000006071',\n",
" 'ENSG00000006116',\n",
" 'ENSG00000006118',\n",
" 'ENSG00000006125',\n",
" 'ENSG00000006128',\n",
" 'ENSG00000006194',\n",
" 'ENSG00000006210',\n",
" 'ENSG00000006282',\n",
" 'ENSG00000006283',\n",
" 'ENSG00000006327',\n",
" 'ENSG00000006377',\n",
" 'ENSG00000006432',\n",
" 'ENSG00000006451',\n",
" 'ENSG00000006453',\n",
" 'ENSG00000006459',\n",
" 'ENSG00000006468',\n",
" 'ENSG00000006530',\n",
" 'ENSG00000006534',\n",
" 'ENSG00000006555',\n",
" 'ENSG00000006576',\n",
" 'ENSG00000006606',\n",
" 'ENSG00000006607',\n",
" 'ENSG00000006611',\n",
" 'ENSG00000006625',\n",
" 'ENSG00000006634',\n",
" 'ENSG00000006638',\n",
" 'ENSG00000006652',\n",
" 'ENSG00000006659',\n",
" 'ENSG00000006695',\n",
" 'ENSG00000006704',\n",
" 'ENSG00000006712',\n",
" 'ENSG00000006715',\n",
" 'ENSG00000006740',\n",
" 'ENSG00000006744',\n",
" 'ENSG00000006747',\n",
" 'ENSG00000006756',\n",
" 'ENSG00000006757',\n",
" 'ENSG00000006788',\n",
" 'ENSG00000006831',\n",
" 'ENSG00000006837',\n",
" 'ENSG00000007001',\n",
" 'ENSG00000007038',\n",
" 'ENSG00000007047',\n",
" 'ENSG00000007062',\n",
" 'ENSG00000007080',\n",
" 'ENSG00000007129',\n",
" 'ENSG00000007168',\n",
" 'ENSG00000007171',\n",
" 'ENSG00000007174',\n",
" 'ENSG00000007202',\n",
" 'ENSG00000007216',\n",
" 'ENSG00000007237',\n",
" 'ENSG00000007255',\n",
" 'ENSG00000007264',\n",
" 'ENSG00000007306',\n",
" 'ENSG00000007312',\n",
" 'ENSG00000007314',\n",
" 'ENSG00000007341',\n",
" 'ENSG00000007350',\n",
" 'ENSG00000007372',\n",
" 'ENSG00000007376',\n",
" 'ENSG00000007384',\n",
" 'ENSG00000007392',\n",
" 'ENSG00000007402',\n",
" 'ENSG00000007516',\n",
" 'ENSG00000007520',\n",
" 'ENSG00000007541',\n",
" 'ENSG00000007545',\n",
" 'ENSG00000007866',\n",
" 'ENSG00000007908',\n",
" 'ENSG00000007923',\n",
" 'ENSG00000007933',\n",
" 'ENSG00000007944',\n",
" 'ENSG00000007952',\n",
" 'ENSG00000007968',\n",
" 'ENSG00000008018',\n",
" 'ENSG00000008056',\n",
" 'ENSG00000008083',\n",
" 'ENSG00000008086',\n",
" 'ENSG00000008118',\n",
" 'ENSG00000008128',\n",
" 'ENSG00000008130',\n",
" 'ENSG00000008196',\n",
" 'ENSG00000008197',\n",
" 'ENSG00000008226',\n",
" 'ENSG00000008256',\n",
" 'ENSG00000008277',\n",
" 'ENSG00000008282',\n",
" 'ENSG00000008283',\n",
" 'ENSG00000008294',\n",
" 'ENSG00000008300',\n",
" 'ENSG00000008311',\n",
" 'ENSG00000008323',\n",
" 'ENSG00000008324',\n",
" 'ENSG00000008382',\n",
" 'ENSG00000008394',\n",
" 'ENSG00000008405',\n",
" 'ENSG00000008438',\n",
" 'ENSG00000008441',\n",
" 'ENSG00000008513',\n",
" 'ENSG00000008516',\n",
" 'ENSG00000008517',\n",
" 'ENSG00000008710',\n",
" 'ENSG00000008735',\n",
" 'ENSG00000008838',\n",
" 'ENSG00000008853',\n",
" 'ENSG00000008869',\n",
" 'ENSG00000008952',\n",
" 'ENSG00000008988',\n",
" 'ENSG00000009307',\n",
" 'ENSG00000009335',\n",
" 'ENSG00000009413',\n",
" 'ENSG00000009694',\n",
" 'ENSG00000009709',\n",
" 'ENSG00000009724',\n",
" 'ENSG00000009765',\n",
" 'ENSG00000009780',\n",
" 'ENSG00000009790',\n",
" 'ENSG00000009830',\n",
" 'ENSG00000009844',\n",
" 'ENSG00000009950',\n",
" 'ENSG00000009954',\n",
" 'ENSG00000010017',\n",
" 'ENSG00000010030',\n",
" 'ENSG00000010072',\n",
" 'ENSG00000010165',\n",
" 'ENSG00000010219',\n",
" 'ENSG00000010244',\n",
" 'ENSG00000010256',\n",
" 'ENSG00000010270',\n",
" 'ENSG00000010278',\n",
" 'ENSG00000010282',\n",
" 'ENSG00000010292',\n",
" 'ENSG00000010295',\n",
" 'ENSG00000010310',\n",
" 'ENSG00000010318',\n",
" 'ENSG00000010319',\n",
" 'ENSG00000010322',\n",
" 'ENSG00000010327',\n",
" 'ENSG00000010361',\n",
" 'ENSG00000010379',\n",
" 'ENSG00000010404',\n",
" 'ENSG00000010438',\n",
" 'ENSG00000010539',\n",
" 'ENSG00000010610',\n",
" 'ENSG00000010626',\n",
" 'ENSG00000010671',\n",
" 'ENSG00000010704',\n",
" 'ENSG00000010803',\n",
" 'ENSG00000010810',\n",
" 'ENSG00000010818',\n",
" 'ENSG00000010932',\n",
" 'ENSG00000011007',\n",
" 'ENSG00000011009',\n",
" 'ENSG00000011021',\n",
" 'ENSG00000011028',\n",
" 'ENSG00000011052',\n",
" 'ENSG00000011083',\n",
" 'ENSG00000011105',\n",
" 'ENSG00000011114',\n",
" 'ENSG00000011132',\n",
" 'ENSG00000011143',\n",
" 'ENSG00000011198',\n",
" 'ENSG00000011201',\n",
" 'ENSG00000011243',\n",
" 'ENSG00000011258',\n",
" 'ENSG00000011260',\n",
" 'ENSG00000011275',\n",
" 'ENSG00000011295',\n",
" 'ENSG00000011304',\n",
" 'ENSG00000011332',\n",
" 'ENSG00000011347',\n",
" 'ENSG00000011376',\n",
" 'ENSG00000011405',\n",
" 'ENSG00000011422',\n",
" 'ENSG00000011426',\n",
" 'ENSG00000011451',\n",
" 'ENSG00000011454',\n",
" 'ENSG00000011465',\n",
" 'ENSG00000011478',\n",
" 'ENSG00000011485',\n",
" 'ENSG00000011523',\n",
" 'ENSG00000011566',\n",
" 'ENSG00000011590',\n",
" 'ENSG00000011600',\n",
" 'ENSG00000011638',\n",
" 'ENSG00000011677',\n",
" 'ENSG00000012048',\n",
" 'ENSG00000012061',\n",
" 'ENSG00000012124',\n",
" 'ENSG00000012171',\n",
" 'ENSG00000012174',\n",
" 'ENSG00000012211',\n",
" 'ENSG00000012223',\n",
" 'ENSG00000012232',\n",
" 'ENSG00000012504',\n",
" 'ENSG00000012660',\n",
" 'ENSG00000012779',\n",
" 'ENSG00000012817',\n",
" 'ENSG00000012822',\n",
" 'ENSG00000012963',\n",
" 'ENSG00000012983',\n",
" 'ENSG00000013016',\n",
" 'ENSG00000013275',\n",
" 'ENSG00000013288',\n",
" 'ENSG00000013293',\n",
" 'ENSG00000013297',\n",
" 'ENSG00000013306',\n",
" 'ENSG00000013364',\n",
" 'ENSG00000013374',\n",
" 'ENSG00000013375',\n",
" 'ENSG00000013392',\n",
" 'ENSG00000013441',\n",
" 'ENSG00000013503',\n",
" 'ENSG00000013523',\n",
" 'ENSG00000013561',\n",
" 'ENSG00000013563',\n",
" 'ENSG00000013573',\n",
" 'ENSG00000013583',\n",
" 'ENSG00000013588',\n",
" 'ENSG00000013619',\n",
" 'ENSG00000013725',\n",
" 'ENSG00000013810',\n",
" 'ENSG00000014123',\n",
" 'ENSG00000014138',\n",
" 'ENSG00000014164',\n",
" 'ENSG00000014216',\n",
" 'ENSG00000014257',\n",
" 'ENSG00000014641',\n",
" 'ENSG00000014824',\n",
" 'ENSG00000014914',\n",
" 'ENSG00000014919',\n",
" 'ENSG00000015133',\n",
" 'ENSG00000015153',\n",
" 'ENSG00000015171',\n",
" 'ENSG00000015285',\n",
" 'ENSG00000015413',\n",
" 'ENSG00000015475',\n",
" 'ENSG00000015479',\n",
" 'ENSG00000015520',\n",
" 'ENSG00000015532',\n",
" 'ENSG00000015568',\n",
" 'ENSG00000015592',\n",
" 'ENSG00000015676',\n",
" 'ENSG00000016082',\n",
" 'ENSG00000016391',\n",
" 'ENSG00000016402',\n",
" 'ENSG00000016490',\n",
" 'ENSG00000016602',\n",
" 'ENSG00000016864',\n",
" 'ENSG00000017260',\n",
" 'ENSG00000017427',\n",
" 'ENSG00000017483',\n",
" 'ENSG00000017797',\n",
" 'ENSG00000018189',\n",
" 'ENSG00000018236',\n",
" 'ENSG00000018280',\n",
" 'ENSG00000018408',\n",
" 'ENSG00000018510',\n",
" 'ENSG00000018607',\n",
" 'ENSG00000018610',\n",
" 'ENSG00000018625',\n",
" 'ENSG00000018699',\n",
" 'ENSG00000018869',\n",
" 'ENSG00000019102',\n",
" 'ENSG00000019144',\n",
" 'ENSG00000019169',\n",
" 'ENSG00000019186',\n",
" 'ENSG00000019485',\n",
" 'ENSG00000019505',\n",
" 'ENSG00000019549',\n",
" 'ENSG00000019582',\n",
" 'ENSG00000019991',\n",
" 'ENSG00000019995',\n",
" 'ENSG00000020129',\n",
" 'ENSG00000020181',\n",
" 'ENSG00000020219',\n",
" 'ENSG00000020256',\n",
" 'ENSG00000020426',\n",
" 'ENSG00000020577',\n",
" 'ENSG00000020633',\n",
" 'ENSG00000020922',\n",
" 'ENSG00000021300',\n",
" 'ENSG00000021355',\n",
" 'ENSG00000021461',\n",
" 'ENSG00000021488',\n",
" 'ENSG00000021574',\n",
" 'ENSG00000021645',\n",
" 'ENSG00000021762',\n",
" 'ENSG00000021776',\n",
" 'ENSG00000021826',\n",
" 'ENSG00000021852',\n",
" 'ENSG00000022267',\n",
" 'ENSG00000022277',\n",
" 'ENSG00000022355',\n",
" 'ENSG00000022556',\n",
" 'ENSG00000022567',\n",
" 'ENSG00000022840',\n",
" 'ENSG00000022976',\n",
" 'ENSG00000023041',\n",
" 'ENSG00000023171',\n",
" 'ENSG00000023191',\n",
" 'ENSG00000023228',\n",
" 'ENSG00000023287',\n",
" 'ENSG00000023318',\n",
" 'ENSG00000023330',\n",
" 'ENSG00000023445',\n",
" 'ENSG00000023516',\n",
" 'ENSG00000023572',\n",
" 'ENSG00000023608',\n",
" 'ENSG00000023697',\n",
" 'ENSG00000023734',\n",
" 'ENSG00000023839',\n",
" 'ENSG00000023892',\n",
" 'ENSG00000023902',\n",
" 'ENSG00000023909',\n",
" 'ENSG00000024048',\n",
" 'ENSG00000024422',\n",
" 'ENSG00000024526',\n",
" 'ENSG00000024862',\n",
" 'ENSG00000025039',\n",
" 'ENSG00000025156',\n",
" 'ENSG00000025293',\n",
" 'ENSG00000025423',\n",
" 'ENSG00000025434',\n",
" 'ENSG00000025708',\n",
" 'ENSG00000025770',\n",
" 'ENSG00000025772',\n",
" 'ENSG00000025796',\n",
" 'ENSG00000025800',\n",
" 'ENSG00000026025',\n",
" 'ENSG00000026036',\n",
" 'ENSG00000026103',\n",
" 'ENSG00000026297',\n",
" 'ENSG00000026508',\n",
" 'ENSG00000026559',\n",
" 'ENSG00000026652',\n",
" 'ENSG00000026751',\n",
" 'ENSG00000026950',\n",
" 'ENSG00000027001',\n",
" 'ENSG00000027075',\n",
" 'ENSG00000027644',\n",
" 'ENSG00000027697',\n",
" 'ENSG00000027847',\n",
" 'ENSG00000027869',\n",
" 'ENSG00000028116',\n",
" 'ENSG00000028137',\n",
" 'ENSG00000028203',\n",
" 'ENSG00000028277',\n",
" 'ENSG00000028310',\n",
" 'ENSG00000028528',\n",
" 'ENSG00000028839',\n",
" 'ENSG00000029153',\n",
" 'ENSG00000029363',\n",
" 'ENSG00000029364',\n",
" 'ENSG00000029534',\n",
" 'ENSG00000029559',\n",
" 'ENSG00000029639',\n",
" 'ENSG00000029725',\n",
" 'ENSG00000029993',\n",
" 'ENSG00000030066',\n",
" 'ENSG00000030110',\n",
" 'ENSG00000030304',\n",
" 'ENSG00000030419',\n",
" 'ENSG00000030582',\n",
" 'ENSG00000031003',\n",
" 'ENSG00000031081',\n",
" 'ENSG00000031691',\n",
" 'ENSG00000031698',\n",
" 'ENSG00000031823',\n",
" 'ENSG00000032219',\n",
" 'ENSG00000032389',\n",
" 'ENSG00000032444',\n",
" 'ENSG00000032742',\n",
" 'ENSG00000033011',\n",
" 'ENSG00000033030',\n",
" 'ENSG00000033050',\n",
" 'ENSG00000033100',\n",
" 'ENSG00000033122',\n",
" 'ENSG00000033170',\n",
" 'ENSG00000033178',\n",
" 'ENSG00000033327',\n",
" 'ENSG00000033627',\n",
" 'ENSG00000033800',\n",
" 'ENSG00000033867',\n",
" 'ENSG00000034053',\n",
" 'ENSG00000034152',\n",
" 'ENSG00000034239',\n",
" 'ENSG00000034510',\n",
" 'ENSG00000034533',\n",
" 'ENSG00000034677',\n",
" 'ENSG00000034693',\n",
" 'ENSG00000034713',\n",
" 'ENSG00000034971',\n",
" 'ENSG00000035115',\n",
" 'ENSG00000035141',\n",
" 'ENSG00000035403',\n",
" 'ENSG00000035499',\n",
" 'ENSG00000035664',\n",
" 'ENSG00000035681',\n",
" 'ENSG00000035687',\n",
" 'ENSG00000035720',\n",
" 'ENSG00000035862',\n",
" 'ENSG00000035928',\n",
" 'ENSG00000036054',\n",
" 'ENSG00000036257',\n",
" 'ENSG00000036448',\n",
" 'ENSG00000036473',\n",
" 'ENSG00000036530',\n",
" 'ENSG00000036549',\n",
" 'ENSG00000036565',\n",
" 'ENSG00000036672',\n",
" 'ENSG00000036828',\n",
" 'ENSG00000037042',\n",
" 'ENSG00000037241',\n",
" 'ENSG00000037280',\n",
" 'ENSG00000037474',\n",
" 'ENSG00000037637',\n",
" 'ENSG00000037749',\n",
" 'ENSG00000037757',\n",
" 'ENSG00000037897',\n",
" 'ENSG00000037965',\n",
" 'ENSG00000038002',\n",
" 'ENSG00000038210',\n",
" 'ENSG00000038219',\n",
" 'ENSG00000038274',\n",
" 'ENSG00000038295',\n",
" 'ENSG00000038358',\n",
" 'ENSG00000038382',\n",
" 'ENSG00000038427',\n",
" 'ENSG00000038532',\n",
" 'ENSG00000038945',\n",
" 'ENSG00000039068',\n",
" 'ENSG00000039123',\n",
" 'ENSG00000039139',\n",
" 'ENSG00000039319',\n",
" 'ENSG00000039523',\n",
" 'ENSG00000039537',\n",
" 'ENSG00000039560',\n",
" 'ENSG00000039600',\n",
" 'ENSG00000039650',\n",
" 'ENSG00000039987',\n",
" 'ENSG00000040199',\n",
" 'ENSG00000040275',\n",
" 'ENSG00000040341',\n",
" 'ENSG00000040487',\n",
" 'ENSG00000040531',\n",
" 'ENSG00000040608',\n",
" 'ENSG00000040633',\n",
" 'ENSG00000040731',\n",
" 'ENSG00000040933',\n",
" 'ENSG00000041353',\n",
" 'ENSG00000041357',\n",
" 'ENSG00000041515',\n",
" 'ENSG00000041802',\n",
" 'ENSG00000041880',\n",
" 'ENSG00000041982',\n",
" 'ENSG00000041988',\n",
" 'ENSG00000042062',\n",
" 'ENSG00000042088',\n",
" 'ENSG00000042286',\n",
" 'ENSG00000042304',\n",
" 'ENSG00000042317',\n",
" 'ENSG00000042429',\n",
" 'ENSG00000042445',\n",
" 'ENSG00000042493',\n",
" 'ENSG00000042753',\n",
" 'ENSG00000042781',\n",
" 'ENSG00000042813',\n",
" 'ENSG00000042832',\n",
" 'ENSG00000042980',\n",
" 'ENSG00000043039',\n",
" 'ENSG00000043093',\n",
" 'ENSG00000043143',\n",
" 'ENSG00000043355',\n",
" 'ENSG00000043462',\n",
" 'ENSG00000043514',\n",
" 'ENSG00000043591',\n",
" 'ENSG00000044012',\n",
" 'ENSG00000044090',\n",
" 'ENSG00000044115',\n",
" 'ENSG00000044446',\n",
" 'ENSG00000044459',\n",
" 'ENSG00000044524',\n",
" 'ENSG00000044574',\n",
" 'ENSG00000046604',\n",
" 'ENSG00000046647',\n",
" 'ENSG00000046651',\n",
" 'ENSG00000046653',\n",
" 'ENSG00000046774',\n",
" 'ENSG00000046889',\n",
" 'ENSG00000047056',\n",
" 'ENSG00000047188',\n",
" 'ENSG00000047230',\n",
" 'ENSG00000047249',\n",
" 'ENSG00000047315',\n",
" 'ENSG00000047346',\n",
" 'ENSG00000047365',\n",
" 'ENSG00000047410',\n",
" 'ENSG00000047457',\n",
" 'ENSG00000047578',\n",
" 'ENSG00000047579',\n",
" 'ENSG00000047597',\n",
" 'ENSG00000047617',\n",
" 'ENSG00000047621',\n",
" 'ENSG00000047634',\n",
" 'ENSG00000047644',\n",
" 'ENSG00000047648',\n",
" 'ENSG00000047662',\n",
" 'ENSG00000047849',\n",
" 'ENSG00000047932',\n",
" 'ENSG00000047936',\n",
" 'ENSG00000048028',\n",
" 'ENSG00000048052',\n",
" 'ENSG00000048140',\n",
" 'ENSG00000048162',\n",
" 'ENSG00000048342',\n",
" 'ENSG00000048392',\n",
" 'ENSG00000048405',\n",
" 'ENSG00000048462',\n",
" 'ENSG00000048471',\n",
" 'ENSG00000048540',\n",
" 'ENSG00000048544',\n",
" 'ENSG00000048545',\n",
" 'ENSG00000048649',\n",
" 'ENSG00000048707',\n",
" 'ENSG00000048740',\n",
" 'ENSG00000048828',\n",
" 'ENSG00000048991',\n",
" 'ENSG00000049089',\n",
" 'ENSG00000049130',\n",
" 'ENSG00000049167',\n",
" 'ENSG00000049192',\n",
" 'ENSG00000049239',\n",
" 'ENSG00000049245',\n",
" 'ENSG00000049246',\n",
" 'ENSG00000049247',\n",
" 'ENSG00000049249',\n",
" 'ENSG00000049283',\n",
" 'ENSG00000049323',\n",
" 'ENSG00000049449',\n",
" 'ENSG00000049540',\n",
" 'ENSG00000049541',\n",
" 'ENSG00000049618',\n",
" 'ENSG00000049656',\n",
" 'ENSG00000049759',\n",
" 'ENSG00000049768',\n",
" 'ENSG00000049769',\n",
" 'ENSG00000049860',\n",
" 'ENSG00000049883',\n",
" 'ENSG00000050030',\n",
" 'ENSG00000050130',\n",
" 'ENSG00000050165',\n",
" 'ENSG00000050327',\n",
" 'ENSG00000050344',\n",
" 'ENSG00000050393',\n",
" 'ENSG00000050405',\n",
" 'ENSG00000050426',\n",
" 'ENSG00000050438',\n",
" 'ENSG00000050555',\n",
" 'ENSG00000050628',\n",
" 'ENSG00000050730',\n",
" 'ENSG00000050748',\n",
" 'ENSG00000050767',\n",
" 'ENSG00000050820',\n",
" 'ENSG00000051009',\n",
" 'ENSG00000051108',\n",
" 'ENSG00000051128',\n",
" 'ENSG00000051180',\n",
" 'ENSG00000051341',\n",
" 'ENSG00000051382',\n",
" 'ENSG00000051523',\n",
" 'ENSG00000051596',\n",
" 'ENSG00000051620',\n",
" 'ENSG00000051825',\n",
" 'ENSG00000052126',\n",
" 'ENSG00000052344',\n",
" 'ENSG00000052723',\n",
" 'ENSG00000052749',\n",
" 'ENSG00000052795',\n",
" 'ENSG00000052802',\n",
" 'ENSG00000052841',\n",
" 'ENSG00000052850',\n",
" 'ENSG00000053108',\n",
" 'ENSG00000053254',\n",
" 'ENSG00000053328',\n",
" 'ENSG00000053371',\n",
" 'ENSG00000053372',\n",
" 'ENSG00000053438',\n",
" 'ENSG00000053501',\n",
" 'ENSG00000053524',\n",
" 'ENSG00000053702',\n",
" 'ENSG00000053747',\n",
" 'ENSG00000053770',\n",
" 'ENSG00000053900',\n",
" 'ENSG00000053918',\n",
" 'ENSG00000054116',\n",
" 'ENSG00000054118',\n",
" 'ENSG00000054148',\n",
" 'ENSG00000054179',\n",
" 'ENSG00000054219',\n",
" 'ENSG00000054267',\n",
" 'ENSG00000054277',\n",
" 'ENSG00000054282',\n",
" 'ENSG00000054356',\n",
" 'ENSG00000054392',\n",
" 'ENSG00000054523',\n",
" 'ENSG00000054598',\n",
" 'ENSG00000054611',\n",
" 'ENSG00000054654',\n",
" 'ENSG00000054690',\n",
" 'ENSG00000054793',\n",
" 'ENSG00000054796',\n",
" 'ENSG00000054803',\n",
" 'ENSG00000054938',\n",
" 'ENSG00000054965',\n",
" 'ENSG00000054967',\n",
" 'ENSG00000054983',\n",
" 'ENSG00000055044',\n",
" 'ENSG00000055070',\n",
" 'ENSG00000055118',\n",
" 'ENSG00000055130',\n",
" 'ENSG00000055147',\n",
" 'ENSG00000055163',\n",
" 'ENSG00000055208',\n",
" 'ENSG00000055211',\n",
" 'ENSG00000055332',\n",
" 'ENSG00000055483',\n",
" 'ENSG00000055609',\n",
" 'ENSG00000055732',\n",
" 'ENSG00000055813',\n",
" 'ENSG00000055917',\n",
" 'ENSG00000055950',\n",
" 'ENSG00000055955',\n",
" 'ENSG00000055957',\n",
" 'ENSG00000056050',\n",
" 'ENSG00000056097',\n",
" 'ENSG00000056277',\n",
" 'ENSG00000056291',\n",
" 'ENSG00000056487',\n",
" 'ENSG00000056558',\n",
" 'ENSG00000056586',\n",
" 'ENSG00000056736',\n",
" 'ENSG00000056972',\n",
" 'ENSG00000056998',\n",
" 'ENSG00000057019',\n",
" 'ENSG00000057149',\n",
" 'ENSG00000057252',\n",
" 'ENSG00000057294',\n",
" 'ENSG00000057468',\n",
" 'ENSG00000057593',\n",
" 'ENSG00000057608',\n",
" 'ENSG00000057657',\n",
" 'ENSG00000057663',\n",
" 'ENSG00000057704',\n",
" 'ENSG00000057757',\n",
" 'ENSG00000057935',\n",
" 'ENSG00000058056',\n",
" 'ENSG00000058063',\n",
" 'ENSG00000058085',\n",
" 'ENSG00000058091',\n",
" 'ENSG00000058262',\n",
" 'ENSG00000058272',\n",
" 'ENSG00000058335',\n",
" 'ENSG00000058404',\n",
" 'ENSG00000058453',\n",
" 'ENSG00000058600',\n",
" 'ENSG00000058668',\n",
" 'ENSG00000058673',\n",
" 'ENSG00000058729',\n",
" 'ENSG00000058799',\n",
" 'ENSG00000058804',\n",
" 'ENSG00000058866',\n",
" 'ENSG00000059122',\n",
" 'ENSG00000059145',\n",
" 'ENSG00000059377',\n",
" 'ENSG00000059378',\n",
" 'ENSG00000059573',\n",
" 'ENSG00000059588',\n",
" 'ENSG00000059691',\n",
" 'ENSG00000059728',\n",
" 'ENSG00000059758',\n",
" 'ENSG00000059769',\n",
" 'ENSG00000059804',\n",
" 'ENSG00000059915',\n",
" 'ENSG00000060069',\n",
" 'ENSG00000060138',\n",
" 'ENSG00000060140',\n",
" 'ENSG00000060237',\n",
" 'ENSG00000060303',\n",
" 'ENSG00000060339',\n",
" 'ENSG00000060491',\n",
" 'ENSG00000060558',\n",
" 'ENSG00000060566',\n",
" 'ENSG00000060642',\n",
" 'ENSG00000060656',\n",
" 'ENSG00000060688',\n",
" 'ENSG00000060709',\n",
" 'ENSG00000060718',\n",
" 'ENSG00000060749',\n",
" 'ENSG00000060762',\n",
" 'ENSG00000060971',\n",
" 'ENSG00000060982',\n",
" 'ENSG00000061273',\n",
" 'ENSG00000061337',\n",
" 'ENSG00000061455',\n",
" 'ENSG00000061492',\n",
" 'ENSG00000061656',\n",
" 'ENSG00000061676',\n",
" 'ENSG00000061794',\n",
" 'ENSG00000061918',\n",
" 'ENSG00000061936',\n",
" 'ENSG00000061938',\n",
" 'ENSG00000061987',\n",
" 'ENSG00000062038',\n",
" 'ENSG00000062096',\n",
" 'ENSG00000062194',\n",
" 'ENSG00000062282',\n",
" 'ENSG00000062370',\n",
" 'ENSG00000062485',\n",
" 'ENSG00000062524',\n",
" 'ENSG00000062582',\n",
" 'ENSG00000062598',\n",
" 'ENSG00000062650',\n",
" 'ENSG00000062716',\n",
" 'ENSG00000062725',\n",
" 'ENSG00000062822',\n",
" 'ENSG00000063015',\n",
" 'ENSG00000063046',\n",
" 'ENSG00000063127',\n",
" 'ENSG00000063169',\n",
" 'ENSG00000063176',\n",
" 'ENSG00000063177',\n",
" 'ENSG00000063180',\n",
" 'ENSG00000063241',\n",
" 'ENSG00000063244',\n",
" 'ENSG00000063245',\n",
" 'ENSG00000063322',\n",
" 'ENSG00000063438',\n",
" 'ENSG00000063515',\n",
" 'ENSG00000063587',\n",
" 'ENSG00000063601',\n",
" 'ENSG00000063660',\n",
" 'ENSG00000063761',\n",
" 'ENSG00000063854',\n",
" 'ENSG00000063978',\n",
" 'ENSG00000064012',\n",
" 'ENSG00000064042',\n",
" 'ENSG00000064102',\n",
" 'ENSG00000064115',\n",
" 'ENSG00000064195',\n",
" 'ENSG00000064199',\n",
" 'ENSG00000064201',\n",
" 'ENSG00000064205',\n",
" 'ENSG00000064218',\n",
" 'ENSG00000064225',\n",
" 'ENSG00000064270',\n",
" 'ENSG00000064300',\n",
" 'ENSG00000064309',\n",
" 'ENSG00000064313',\n",
" 'ENSG00000064393',\n",
" 'ENSG00000064419',\n",
" 'ENSG00000064489',\n",
" 'ENSG00000064490',\n",
" 'ENSG00000064545',\n",
" 'ENSG00000064547',\n",
" 'ENSG00000064601',\n",
" 'ENSG00000064607',\n",
" 'ENSG00000064651',\n",
" 'ENSG00000064652',\n",
" 'ENSG00000064655',\n",
" 'ENSG00000064666',\n",
" 'ENSG00000064687',\n",
" 'ENSG00000064692',\n",
" 'ENSG00000064703',\n",
" 'ENSG00000064726',\n",
" 'ENSG00000064763',\n",
" 'ENSG00000064787',\n",
" 'ENSG00000064835',\n",
" 'ENSG00000064886',\n",
" 'ENSG00000064932',\n",
" 'ENSG00000064933',\n",
" 'ENSG00000064961',\n",
" 'ENSG00000064989',\n",
" 'ENSG00000064995',\n",
" 'ENSG00000064999',\n",
" 'ENSG00000065000',\n",
" 'ENSG00000065029',\n",
" 'ENSG00000065054',\n",
" 'ENSG00000065057',\n",
" 'ENSG00000065060',\n",
" 'ENSG00000065135',\n",
" 'ENSG00000065150',\n",
" 'ENSG00000065154',\n",
" 'ENSG00000065183',\n",
" 'ENSG00000065243',\n",
" 'ENSG00000065268',\n",
" 'ENSG00000065308',\n",
" 'ENSG00000065320',\n",
" 'ENSG00000065325',\n",
" 'ENSG00000065328',\n",
" 'ENSG00000065357',\n",
" 'ENSG00000065361',\n",
" 'ENSG00000065371',\n",
" 'ENSG00000065413',\n",
" 'ENSG00000065427',\n",
" 'ENSG00000065457',\n",
" 'ENSG00000065485',\n",
" 'ENSG00000065491',\n",
" 'ENSG00000065518',\n",
" 'ENSG00000065526',\n",
" 'ENSG00000065534',\n",
" 'ENSG00000065548',\n",
" 'ENSG00000065559',\n",
" 'ENSG00000065600',\n",
" 'ENSG00000065609',\n",
" 'ENSG00000065613',\n",
" 'ENSG00000065615',\n",
" 'ENSG00000065618',\n",
" 'ENSG00000065621',\n",
" 'ENSG00000065665',\n",
" 'ENSG00000065675',\n",
" 'ENSG00000065717',\n",
" 'ENSG00000065802',\n",
" 'ENSG00000065809',\n",
" 'ENSG00000065833',\n",
" 'ENSG00000065882',\n",
" 'ENSG00000065883',\n",
" 'ENSG00000065911',\n",
" 'ENSG00000065923',\n",
" 'ENSG00000065970',\n",
" 'ENSG00000065978',\n",
" 'ENSG00000065989',\n",
" 'ENSG00000066027',\n",
" 'ENSG00000066032',\n",
" 'ENSG00000066044',\n",
" 'ENSG00000066056',\n",
" 'ENSG00000066084',\n",
" 'ENSG00000066117',\n",
" 'ENSG00000066135',\n",
" 'ENSG00000066136',\n",
" 'ENSG00000066185',\n",
" 'ENSG00000066230',\n",
" 'ENSG00000066248',\n",
" 'ENSG00000066279',\n",
" 'ENSG00000066294',\n",
" 'ENSG00000066322',\n",
" 'ENSG00000066336',\n",
" 'ENSG00000066379',\n",
" 'ENSG00000066382',\n",
" 'ENSG00000066405',\n",
" 'ENSG00000066422',\n",
" 'ENSG00000066427',\n",
" 'ENSG00000066455',\n",
" 'ENSG00000066468',\n",
" 'ENSG00000066557',\n",
" 'ENSG00000066583',\n",
" 'ENSG00000066629',\n",
" 'ENSG00000066651',\n",
" 'ENSG00000066654',\n",
" 'ENSG00000066697',\n",
" 'ENSG00000066735',\n",
" 'ENSG00000066739',\n",
" 'ENSG00000066777',\n",
" 'ENSG00000066813',\n",
" 'ENSG00000066827',\n",
" 'ENSG00000066855',\n",
" 'ENSG00000066923',\n",
" 'ENSG00000066926',\n",
" 'ENSG00000066933',\n",
" 'ENSG00000067048',\n",
" 'ENSG00000067057',\n",
" 'ENSG00000067064',\n",
" 'ENSG00000067066',\n",
" 'ENSG00000067082',\n",
" 'ENSG00000067113',\n",
" 'ENSG00000067141',\n",
" 'ENSG00000067167',\n",
" ...]"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"loaded = np.load(filename,allow_pickle=True)\n",
"list_genes = loaded[\"genes\"].astype(str).tolist()\n",
"list_genes"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"56602"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"NB_VARS=len(list_genes) \n",
"NB_VARS"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2. Mapping with the Gene Ontology"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"R[write to console]: 'select()' returned 1:many mapping between keys and columns\n",
"\n"
]
}
],
"source": [
"df_All = getInformations(annot_ref=annot_ref_db,inputId=robjects.StrVector(list_genes),type_var=keyRef) # "
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<rpy2.rinterface_lib.sexp.NULLType object at 0x7fec80543b48> [RTYPES.NILSXP]"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_All.to_csvfile(os.path.join(dir_files,\"annotations-gene-GO.csv\"))"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 178 ms, sys: 7.24 ms, total: 185 ms\n",
"Wall time: 183 ms\n"
]
}
],
"source": [
"%%time\n",
"tmp = pd.read_csv(os.path.join(dir_files,\"annotations-gene-GO.csv\"),dtype={\"GO\":object,\"ENTREZID\":object})"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ENSEMBL</th>\n",
" <th>ENTREZID</th>\n",
" <th>GO</th>\n",
" <th>EVIDENCE</th>\n",
" <th>ONTOLOGY</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>ENSG00000000003</td>\n",
" <td>7105</td>\n",
" <td>GO:0005515</td>\n",
" <td>IPI</td>\n",
" <td>MF</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>ENSG00000000003</td>\n",
" <td>7105</td>\n",
" <td>GO:0005887</td>\n",
" <td>IBA</td>\n",
" <td>CC</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>ENSG00000000003</td>\n",
" <td>7105</td>\n",
" <td>GO:0039532</td>\n",
" <td>IMP</td>\n",
" <td>BP</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>ENSG00000000003</td>\n",
" <td>7105</td>\n",
" <td>GO:0043123</td>\n",
" <td>HMP</td>\n",
" <td>BP</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>ENSG00000000003</td>\n",
" <td>7105</td>\n",
" <td>GO:0070062</td>\n",
" <td>HDA</td>\n",
" <td>CC</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ENSEMBL ENTREZID GO EVIDENCE ONTOLOGY\n",
"1 ENSG00000000003 7105 GO:0005515 IPI MF\n",
"2 ENSG00000000003 7105 GO:0005887 IBA CC\n",
"3 ENSG00000000003 7105 GO:0039532 IMP BP\n",
"4 ENSG00000000003 7105 GO:0043123 HMP BP\n",
"5 ENSG00000000003 7105 GO:0070062 HDA CC"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tmp.head()"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"(372639, 5)"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tmp.shape"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [],
"source": [
"tmp = tmp.dropna()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3. Select a subontology"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [],
"source": [
"onto_bp = tmp[tmp.ONTOLOGY == SUBONTOLOGY]"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(158050, 5)"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"onto_bp.shape"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"18427"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"NB_VARS_REAL_LINKED_TO_GO = len(onto_bp[keyRef].unique())\n",
"NB_VARS_REAL_LINKED_TO_GO"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"('Number of genes missing :', 38175)"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(\"Number of genes missing :\", NB_VARS - NB_VARS_REAL_LINKED_TO_GO)"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"67.44461326454896"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(NB_VARS-NB_VARS_REAL_LINKED_TO_GO)/NB_VARS*100"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "venv-pytorch",
"language": "python",
"name": "venv-pytorch"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.9"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
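The last cells of the notebook above count how many input genes are linked to at least one term of the selected sub-ontology, then report the share that is not. The following is a minimal, dependency-free sketch of that bookkeeping. The function name `coverage`, the toy gene list, and the GO terms attached to genes other than `ENSG00000000003` are illustrative assumptions, not part of the notebook.

```python
# Toy annotation rows (gene id, GO term, sub-ontology), shaped like the
# ENSEMBL / GO / ONTOLOGY columns of the annotation table in the notebook.
# Only the ENSG00000000003 rows are taken from the notebook output;
# the other rows are made up for illustration.
annotations = [
    ("ENSG00000000003", "GO:0005515", "MF"),
    ("ENSG00000000003", "GO:0039532", "BP"),
    ("ENSG00000000003", "GO:0043123", "BP"),
    ("ENSG00000000005", "GO:0005737", "CC"),
    ("ENSG00000000419", "GO:0006486", "BP"),
]
input_genes = [
    "ENSG00000000003",
    "ENSG00000000005",
    "ENSG00000000419",
    "ENSG00000000457",
]


def coverage(genes, rows, subontology):
    """Return (n_linked, n_missing, percent_missing) for one sub-ontology.

    Assumes `genes` contains no duplicates.
    """
    # Genes with at least one annotation in the requested sub-ontology.
    linked = {gene for gene, _, onto in rows if onto == subontology}
    # Ignore annotations for genes that are not part of the input list.
    linked &= set(genes)
    n_missing = len(genes) - len(linked)
    return len(linked), n_missing, 100.0 * n_missing / len(genes)


n_linked, n_missing, pct_missing = coverage(input_genes, annotations, "BP")
print(n_linked, n_missing, pct_missing)
```

With the notebook's own counts (56602 input genes, 18427 of them linked to the selected sub-ontology), the same formula gives 38175 missing genes, about 67.4% of the input.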
This diff could not be displayed because it is too large.
@@ -90,7 +90,7 @@ def train(args):
     print("Launching the learning")
     device = torch.device(args.device)
     model = Net(n_genes=args.n_inputs,n_nodes=args.n_nodes,n_nodes_annot=args.n_nodes_annotated,n_nodes_emb=args.dim_init,n_classes=args.n_classes,
-                n_prop1=args.n_prop1,adj_mat_fc1=connection_matrix.values,mask=args.mask,ratio=args.selection_ratio).to(device)
+                n_prop1=args.n_prop1,adj_mat_fc1=connection_matrix.values,selection=args.selection_op,ratio=args.selection_ratio).to(device)
     print(model)
     print("(model mem allocation) - Memory available : {:.2e}".format(torch.cuda.memory_reserved(0)-torch.cuda.memory_allocated(0)))
@@ -309,7 +309,7 @@ def main():
     parser.add_argument('--n_classes', type=int, default=1, help="number of classes")
     # -- Learning and Hyperparameters --
-    parser.add_argument('--mask', type=str, default=None, help='type of selection (random,top)')
+    parser.add_argument('--selection_op', type=str, default=None, help='type of selection (random,top)')
     parser.add_argument('--selection_ratio', type=float, default=0.5, help='selection ratio')
     parser.add_argument('--optimizer', type=str, default='adam', help="optimizer {adam, momentum, adagrad, rmsprop}")
     parser.add_argument('--lr', type=float, default=0.001, help='learning rate')
@@ -329,10 +329,10 @@ def main():
     if not(os.path.isdir(args.dir_log)):
         os.mkdir(args.dir_log)
-    if args.mask:
-        args.dir_save=os.path.join(args.dir_log,'GraphGONet_MASK={}_SELECTRATIO={}'.format(args.mask,args.selection_ratio))
+    if args.selection_op:
+        args.dir_save=os.path.join(args.dir_log,'GraphGONet_SELECTOP={}_SELECTRATIO={}'.format(args.selection_op,args.selection_ratio))
     else:
-        args.dir_save=os.path.join(args.dir_log,'GraphGONet_MASK={}'.format(args.mask))
+        args.dir_save=os.path.join(args.dir_log,'GraphGONet_SELECTOP={}'.format(args.selection_op))
     if args.n_samples:
         args.dir_save+="_N_SAMPLES={}".format(args.n_samples)
......
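The hunks above rename the `--mask` flag to `--selection_op` and rename the matching part of the log directory. A minimal sketch of how the renamed flag feeds the directory name follows. The helper names `build_parser` and `save_dir`, the `--dir_log` default of `log`, and the type given to `--n_samples` are assumptions made for illustration; only `--selection_op`, `--selection_ratio`, and the format strings are taken from the diff.

```python
import argparse
import os


def build_parser():
    """Reduced parser with only the options that influence the save directory."""
    parser = argparse.ArgumentParser()
    # Assumed default and type: the diff uses args.dir_log and args.n_samples
    # but does not show how they are declared.
    parser.add_argument('--dir_log', type=str, default='log')
    parser.add_argument('--n_samples', type=int, default=None)
    # Declarations as they appear after the rename.
    parser.add_argument('--selection_op', type=str, default=None,
                        help='type of selection (random,top)')
    parser.add_argument('--selection_ratio', type=float, default=0.5,
                        help='selection ratio')
    return parser


def save_dir(args):
    """Build the save directory name the same way the updated main() does."""
    if args.selection_op:
        path = os.path.join(
            args.dir_log,
            'GraphGONet_SELECTOP={}_SELECTRATIO={}'.format(
                args.selection_op, args.selection_ratio))
    else:
        # selection_op is None here, so the name ends in "SELECTOP=None".
        path = os.path.join(
            args.dir_log,
            'GraphGONet_SELECTOP={}'.format(args.selection_op))
    if args.n_samples:
        path += "_N_SAMPLES={}".format(args.n_samples)
    return path


args = build_parser().parse_args(['--selection_op=top', '--selection_ratio=0.001'])
print(save_dir(args))
```

Runs saved before this commit sit in directories named `GraphGONet_MASK=...`, so scripts that look up earlier results by directory name would need the old prefix.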
@@ -189,9 +189,9 @@ def concatenate_and_mask(x: Tensor, batch: Tensor, idx_nodes_kept : Tensor, num_
         output[i,mask]=x[i*num_nodes_kept[i]:(i+1)*num_nodes_kept[i]].view(-1) #shape : (num_nodes_by_graph,1) -> (_,max_num_nodes)
     return output
-class NoSelection(torch.nn.Module):
+class Mask(torch.nn.Module):
     def __init__(self, in_channels, method, n_nodes, **kwargs):
-        super(NoSelection, self).__init__()
+        super(Mask, self).__init__()
         self.method = method
         self.in_channels = in_channels
         if self.method.__name__ == "global_mean_pool":
@@ -211,8 +211,8 @@ class NoSelection(torch.nn.Module):
 class Net(torch.nn.Module):
     def __init__(self,n_genes,n_nodes,n_nodes_annot,n_nodes_emb,n_prop1,n_classes,adj_mat_fc1,
-                 propagation="DAGProp",mask=None,ratio=1.0,
-                 selection="concatenate_and_mask"):
+                 propagation="DAGProp",selection=None,ratio=1.0,
+                 mask="concatenate_and_mask"):
         super(Net, self).__init__()
         self.n_genes = n_genes
         self.n_nodes = n_nodes
@@ -226,15 +226,15 @@ class Net(torch.nn.Module):
         with torch.no_grad():
             self.fc1.weight.mul_(self.adj_mat_fc1) #mask all the connections btw genes and neurons that do not represent GO annotations
         self.propagation = eval(propagation)(in_channels=n_nodes_emb, out_channels=n_prop1,aggr = "mean") # expected dim: [nSamples, nNodes, nChannels]
-        if mask:
+        if selection:
             self.ratio = ratio
-            if mask=="random":
-                self.mask = RandomSelection(in_channels=n_prop1,ratio=ratio)
-            elif mask=="top":
-                self.mask = TopSelection(in_channels=n_prop1,ratio=ratio)
+            if selection=="random":
+                self.selection = RandomSelection(in_channels=n_prop1,ratio=ratio)
+            elif selection=="top":
+                self.selection = TopSelection(in_channels=n_prop1,ratio=ratio)
         else:
-            selection="concatenate"
-        self.selection = NoSelection(method=globals()[selection],in_channels=n_prop1,n_nodes=n_nodes) #option no selection => concatenate
+            mask="concatenate"
+        self.mask = Mask(method=globals()[mask],in_channels=n_prop1,n_nodes=n_nodes) #option no selection => concatenate
         self.fc2 = Linear(in_features=n_nodes,out_features=n_classes)
     def forward(self,transcriptomic_data,graph_data):
@@ -247,10 +247,13 @@ class Net(torch.nn.Module):
         num_nodes = scatter_add(batch.new_ones(x.size(0),dtype=torch.int16), batch, dim=0)
-        if self.mask:
-            x, edge_index, _, batch,idx_nodes_kept,_ = self.mask(x, edge_index, None, batch)
-        x = self.selection(x,batch,idx_nodes_kept,num_nodes)
+        if self.selection:
+            x, edge_index, _, batch,idx_nodes_kept,_ = self.selection(x, edge_index, None, batch)
+        if self.mask.method.__name__ == "concatenate_and_mask":
+            x = self.mask(x,batch,idx_nodes_kept,num_nodes)
+        else:
+            x = self.mask(x,batch)
         x = self.fc2(x)
         if self.n_classes>=2:
......
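After this change, `forward` runs in two stages: a selection step that keeps a subset of nodes, then a mask step that places the surviving values back into a fixed-width row for `fc2`. Neither `TopSelection` nor the full body of `concatenate_and_mask` is visible in this diff, so the following is only a plain-Python sketch of one plausible reading. Keeping `ceil(ratio * n)` nodes, ranking directly on the node values, and filling dropped positions with zeros are all assumptions, and the real modules operate on batched tensors rather than lists.

```python
import math


def top_selection(scores, ratio):
    """Indices of the ceil(ratio * n) highest-scoring nodes, in ascending order.

    Assumption: at least one node is always kept.
    """
    k = max(1, math.ceil(ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def concatenate_and_mask(values, kept, n_nodes):
    """Scatter the kept node values into a zero-filled row of length n_nodes."""
    row = [0.0] * n_nodes
    for idx, val in zip(kept, values):
        row[idx] = val
    return row


# One sample with five GO nodes, one scalar value per node.
scores = [0.1, 0.9, 0.3, 0.7, 0.2]
kept = top_selection(scores, ratio=0.4)
row = concatenate_and_mask([scores[i] for i in kept], kept, n_nodes=len(scores))
print(kept, row)
```

Keeping the row width equal to the total number of nodes is what lets the final layer stay `Linear(in_features=n_nodes, ...)` regardless of the selection ratio.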