Louis BECQUEY

beta 1.5 pre-commit for docker building

@@ -18,9 +18,7 @@ Dockerfile
18 LICENSE 18 LICENSE
19 CHANGELOG 19 CHANGELOG
20 *.md 20 *.md
21 -scripts/automate.sh 21 +scripts/*.sh
22 -scripts/kill_rnanet.sh
23 -scripts/build_docker_image.sh
24 scripts/*.tar 22 scripts/*.tar
25 scripts/measure.py 23 scripts/measure.py
26 scripts/recompute_some_chains.py 24 scripts/recompute_some_chains.py
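Judging from its entries, the first file appears to be the ignore list for the Docker build context (probably `.dockerignore`). This change folds three individually listed scripts into the glob `scripts/*.sh`. The sketch below approximates that matching with Python's `fnmatch`. This is an assumption, not Docker's own matcher: Docker applies Go's `filepath.Match` rules, where `*` does not cross `/`, while `fnmatch`'s `*` does. The two only agree for flat paths like these. The helper name `is_ignored` is hypothetical.

```python
from fnmatch import fnmatch

# Patterns copied from the new side of the ignore-list hunk (subset).
PATTERNS = ["scripts/*.sh", "scripts/*.tar", "scripts/measure.py", "scripts/recompute_some_chains.py"]


def is_ignored(path: str, patterns=PATTERNS) -> bool:
    """Approximate ignore-list matching with fnmatch (Docker itself uses Go's filepath.Match)."""
    return any(fnmatch(path, pattern) for pattern in patterns)


# The three scripts that used to be listed one by one are all covered by the glob,
# while scripts/x3dna-dssr (moved into /usr/local/bin by the Dockerfile) is still shipped.
for name in ["scripts/automate.sh", "scripts/kill_rnanet.sh", "scripts/build_docker_image.sh", "scripts/x3dna-dssr"]:
    print(name, is_ignored(name))
```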
......
1 # execution outputs: 1 # execution outputs:
2 nohup.out 2 nohup.out
3 log_of_the_run.sh 3 log_of_the_run.sh
4 +latest_run.log
4 5
5 # results 6 # results
6 results/* 7 results/*
......
@@ -25,13 +25,13 @@ RUN apk update && apk add --no-cache \
25 \ 25 \
26 mv /RNANet/scripts/x3dna-dssr /usr/local/bin/x3dna-dssr && chmod +x /usr/local/bin/x3dna-dssr && \ 26 mv /RNANet/scripts/x3dna-dssr /usr/local/bin/x3dna-dssr && chmod +x /usr/local/bin/x3dna-dssr && \
27 \ 27 \
28 - curl -SL http://eddylab.org/infernal/infernal-1.1.3.tar.gz | tar xz && cd infernal-1.1.3 && \ 28 + curl -SL http://eddylab.org/infernal/infernal-1.1.4.tar.gz | tar xz && cd infernal-1.1.4 && \
29 ./configure && make -j 16 && make install && cd easel && make install && cd / && \ 29 ./configure && make -j 16 && make install && cd easel && make install && cd / && \
30 \ 30 \
31 curl -SL https://github.com/epruesse/SINA/releases/download/v1.7.1/sina-1.7.1-linux.tar.gz | tar xz && mv sina-1.7.1-linux /sina && \ 31 curl -SL https://github.com/epruesse/SINA/releases/download/v1.7.1/sina-1.7.1-linux.tar.gz | tar xz && mv sina-1.7.1-linux /sina && \
32 ln -s /sina/bin/sina /usr/local/bin/sina && \ 32 ln -s /sina/bin/sina /usr/local/bin/sina && \
33 \ 33 \
34 - rm -rf /infernal-1.1.3 && \ 34 + rm -rf /infernal-1.1.4 && \
35 \ 35 \
36 apk del openblas-dev gcc g++ gfortran binutils \ 36 apk del openblas-dev gcc g++ gfortran binutils \
37 curl \ 37 curl \
......
@@ -10,16 +10,16 @@
10 # Required computational resources 10 # Required computational resources
11 - CPU: no requirements. The program is optimized for multi-core CPUs; you might want to use Intel Xeons, AMD Ryzens, etc. 11 - CPU: no requirements. The program is optimized for multi-core CPUs; you might want to use Intel Xeons, AMD Ryzens, etc.
12 - GPU: not required 12 - GPU: not required
13 -- RAM: 16 GB with a large swap partition is okay. 32 GB is recommended (usage peaks at ~27 GB) 13 +- RAM: 16 GB with a large swap partition is okay. 32 GB is recommended (usage peaks at ~27 GB, but this number depends on your number of CPU cores)
14 - Storage: to date, it takes 60 GB for the 3D data (36 GB if you don't use the `--extract` option), 11 GB for the sequence data, and 7 GB for the outputs (5.6 GB database, 1 GB archive of CSV files). Add a few more GB for the dependencies. Pick a 100 GB partition and you are good to go. Computation is much faster on a fast storage device (e.g. an SSD instead of a hard drive, or even better, an NVMe SSD) because of the constant I/O with the SQLite database. 14 - Storage: to date, it takes 60 GB for the 3D data (36 GB if you don't use the `--extract` option), 11 GB for the sequence data, and 7 GB for the outputs (5.6 GB database, 1 GB archive of CSV files). Add a few more GB for the dependencies. Pick a 100 GB partition and you are good to go. Computation is much faster on a fast storage device (e.g. an SSD instead of a hard drive, or even better, an NVMe SSD) because of the constant I/O with the SQLite database.
15 - Network: we query the Rfam public MySQL server on port 4497. Make sure your network allows this connection (there should not be any issue on private networks, but your company or university may close ports by default). You will get an error message if the port is not open. Around 30 GB of data is downloaded. 15 - Network: we query the Rfam public MySQL server on port 4497. Make sure your network allows this connection (there should not be any issue on private networks, but your company or university may close ports by default). You will get an error message if the port is not open. Around 30 GB of data is downloaded.
16 16
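Two of the requirements above can be checked before a long run: free space on the target partition and whether the Rfam MySQL port is reachable. Below is a minimal pre-flight sketch using only the standard library. The helper names are hypothetical, and the 100 GB default simply mirrors the README's advice.

```python
import shutil
import socket


def enough_disk(path: str, required_gb: float = 100.0) -> bool:
    """True if the partition holding `path` has at least `required_gb` (10^9 bytes) free."""
    return shutil.disk_usage(path).free >= required_gb * 1e9


def port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """True if a TCP connection to host:port can be opened within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False
```

For example, `enough_disk("/path/to/3D/data/folder")` checks the 3D data partition. `port_open("mysql-rfam-public.ebi.ac.uk", 4497)` tests the database port, assuming that is the Rfam public server's hostname (check the Rfam documentation).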
17 # Method 1 : Installation using Docker 17 # Method 1 : Installation using Docker
18 18
19 -* Step 1 : Download the [Docker container](https://entrepot.ibisc.univ-evry.fr/d/1aff90a9ef214a19b848/files/?p=/rnanet_v1.3_docker.tar&dl=1). Open a terminal and move to the appropriate directory. 19 +* Step 1 : Download the [Docker container](https://entrepot.ibisc.univ-evry.fr/d/1aff90a9ef214a19b848/files/?p=/rnanet_v1.5b_docker.tar&dl=1). Open a terminal and move to the appropriate directory.
20 * Step 2 : Extract the archive to a Docker image named *rnanet* in your local installation 20 * Step 2 : Extract the archive to a Docker image named *rnanet* in your local installation
21 ``` 21 ```
22 -$ docker load -i rnanet_v1.3_docker.tar 22 +$ docker load -i rnanet_v1.5b_docker.tar
23 ``` 23 ```
24 * Step 3 : Run the container, giving it 3 folders to mount as volumes: a first to store the 3D data, a second to store the sequence data and alignments, and a third to output the results, data and logs: 24 * Step 3 : Run the container, giving it 3 folders to mount as volumes: a first to store the 3D data, a second to store the sequence data and alignments, and a third to output the results, data and logs:
25 ``` 25 ```
@@ -36,7 +36,7 @@ nohup bash -c 'time docker run --rm -v /path/to/3D/data/folder:/3D -v /path/to/s
36 36
37 You need to install the dependencies: 37 You need to install the dependencies:
38 - DSSR: you need to register on the X3DNA forum [here](http://forum.x3dna.org/site-announcements/download-instructions/) and then download the DSSR binary [on that page](http://forum.x3dna.org/downloads/3dna-download/). Make sure the `x3dna-dssr` binary is in your $PATH so that RNANet.py finds it. 38 - DSSR: you need to register on the X3DNA forum [here](http://forum.x3dna.org/site-announcements/download-instructions/) and then download the DSSR binary [on that page](http://forum.x3dna.org/downloads/3dna-download/). Make sure the `x3dna-dssr` binary is in your $PATH so that RNANet.py finds it.
39 -- Infernal, to download at [Eddylab](http://eddylab.org/infernal/), several options are available depending on your preferences. Make sure to have the `cmalign`, `esl-alimanip`, `esl-alipid` and `esl-reformat` binaries in your $PATH variable, so that RNANet.py can find them. 39 +- Infernal, to download at [Eddylab](http://eddylab.org/infernal/), several options are available depending on your preferences. Make sure to have the `cmalign`, `cmfetch`, `cmbuild`, `esl-alimanip`, `esl-alipid` and `esl-reformat` binaries in your $PATH variable, so that RNANet.py can find them.
40 - SINA, follow [these instructions](https://sina.readthedocs.io/en/latest/install.html) for example. Make sure to have the `sina` binary in your $PATH. 40 - SINA, follow [these instructions](https://sina.readthedocs.io/en/latest/install.html) for example. Make sure to have the `sina` binary in your $PATH.
41 - Sqlite 3, available under the name *sqlite* in every distro's package manager, 41 - Sqlite 3, available under the name *sqlite* in every distro's package manager,
42 - Python >= 3.8 (unfortunately, Python 3.6 is no longer supported because of changes in the multiprocessing and threading packages; untested with Python 3.7.\*) 42 - Python >= 3.8 (unfortunately, Python 3.6 is no longer supported because of changes in the multiprocessing and threading packages; untested with Python 3.7.\*)
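The dependency list asks for several executables to be on `$PATH`. A quick way to see which ones are still missing is `shutil.which`, which searches `$PATH` the same way a shell would. The sketch below uses exactly the binary names listed in the README. The function name `missing_binaries` is hypothetical.

```python
import shutil

# Executables the README says RNANet.py must find through $PATH.
REQUIRED_BINARIES = [
    "x3dna-dssr",
    "cmalign", "cmfetch", "cmbuild",
    "esl-alimanip", "esl-alipid", "esl-reformat",
    "sina",
]


def missing_binaries(names=REQUIRED_BINARIES):
    """Return the names that cannot be resolved on the current $PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_binaries()
    print("All dependencies found." if not missing else "Missing: " + ", ".join(missing))
```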
@@ -112,13 +112,14 @@ The most useful options in that list are
112 * Computation of sequence identity matrices 112 * Computation of sequence identity matrices
113 * Statistics over the sequence lengths, nucleotide frequencies, and basepair types by RNA family 113 * Statistics over the sequence lengths, nucleotide frequencies, and basepair types by RNA family
114 * Overall database content statistics 114 * Overall database content statistics
115 - * Detailed analysis of the eta-theta pseudotorsion angles (use `--stats-opts "--wadley"` after `-s`) or 3D distance matrices and their averages per family (use `--stats-opts "--distance-matrices"`) 115 + * Detailed analysis of the eta-theta pseudotorsion angles (use `--stats-opts="--wadley"` after `-s`) or 3D distance matrices and their averages per family (use `--stats-opts="--distance-matrices"`)
116 * `--redundant`, to yield all the available data and not only the BGSU NR-List representatives 116 * `--redundant`, to yield all the available data and not only the BGSU NR-List representatives
117 117
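The new documentation writes `--stats-opts="--wadley"` with `=`. The shell strips the quotes, so the option and its value reach the program as a single argv element. With the separate form, the value is a second element that itself starts with `--`, and some parsers (argparse, for instance) refuse such a value. The sketch below uses Python's `getopt` as an assumption about how the options might be parsed. Only `stats-opts=` is taken from this commit; the rest of RNANet's option table is not reproduced.

```python
import getopt

# Hypothetical subset of the option table; "stats-opts=" means the option takes a value.
SHORT_OPTS = "s"
LONG_OPTS = ["stats-opts=", "extract"]


def parse(argv):
    """Return the parsed options as a {option: value} dict."""
    opts, _ = getopt.getopt(argv, SHORT_OPTS, LONG_OPTS)
    return dict(opts)


# What `RNAnet.py -s --stats-opts="--wadley --distance-matrices"` puts in argv after shell quoting:
print(parse(["-s", "--stats-opts=--wadley --distance-matrices"]))
```

The value still arrives as one string. It has to be split into separate arguments before it is forwarded to `statistics.py`.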
118 # Computation time 118 # Computation time
119 119
120 To give you an estimate, our last full run took exactly 12h, excluding the time to download the mmCIF files containing RNA (around 25 GB) and the time to compute statistics. 120 To give you an estimate, our last full run took exactly 12h, excluding the time to download the mmCIF files containing RNA (around 25 GB) and the time to compute statistics.
121 Measured on June 23rd, 2020, on an 8-core (16-thread) AMD Ryzen 7 3700X CPU @3.60GHz, with 32 GB of RAM and a 7200 rpm hard drive. Total CPU time spent: 135 hours (user+kernel modes), corresponding to 12h of real time on this 16-thread CPU. 121 Measured on June 23rd, 2020, on an 8-core (16-thread) AMD Ryzen 7 3700X CPU @3.60GHz, with 32 GB of RAM and a 7200 rpm hard drive. Total CPU time spent: 135 hours (user+kernel modes), corresponding to 12h of real time on this 16-thread CPU.
122 +Another recent full run, including the mmCIF downloads and the computation of heavy statistics (`--wadley --distance-matrices`), lasted 13h (real time) on a 60-core Xeon E7-4850v4 @2.10GHz with 120 GB of RAM. The user+kernel time was about 300h.
122 123
123 Update runs are much quicker, around 3 hours. The duration depends mostly on which RNA families are affected by the update. 124 Update runs are much quicker, around 3 hours. The duration depends mostly on which RNA families are affected by the update.
124 125
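The timing figures combine CPU time and real (wall-clock) time. Their ratio gives the average number of cores kept busy over a run. The small calculation below applies it to the two runs quoted above: 135 h of CPU over 12 h, and about 300 h over 13 h. The helper name is hypothetical. The ratio says nothing about which phases of the pipeline were parallel.

```python
def average_parallelism(cpu_hours: float, wall_hours: float) -> float:
    """Average number of busy cores: total user+kernel time divided by real time."""
    return cpu_hours / wall_hours


runs = {
    "first full run": (135, 12),
    "second full run": (300, 13),
}
for name, (cpu, wall) in runs.items():
    print(f"{name}: about {average_parallelism(cpu, wall):.1f} cores busy on average")
```

The first run works out to about 11 busy cores on average and the second to about 23.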
@@ -135,9 +136,11 @@ By default, this computes:
135 * Statistics over the sequence lengths, nucleotide frequencies, and basepair types by RNA family 136 * Statistics over the sequence lengths, nucleotide frequencies, and basepair types by RNA family
136 * Overall database content statistics 137 * Overall database content statistics
137 138
139 +If you have run RNANet once with option `--extract`, you can additionally compute more statistics by passing the following options:
140 +* Option `--distance-matrices` computes pairwise residue distances within every chain, then the average and standard deviation of these distances by RNA family. This is meant to capture the average shape of an RNA family. Each distance matrix has the size of the family's covariance model (number of match states). Unresolved nucleotides and deletions relative to the covariance model are NaNs.
141 +
138 If you have run RNANet once with options `--no-homology` and `--extract`, you unlock new statistics over unmapped chains. 142 If you have run RNANet once with options `--no-homology` and `--extract`, you unlock new statistics over unmapped chains.
139 * You will be allowed to use option `--wadley` to automatically reproduce the results of Wadley et al. (2007), a clustering of the backbone pseudotorsion angles. 143 * You will be allowed to use option `--wadley` to automatically reproduce the results of Wadley et al. (2007), a clustering of the backbone pseudotorsion angles.
140 -* (experimental) You will be allowed to use option `--distance-matrices` to compute pairwise residue distances within the chain for every chain, and compute average and standard deviations by RNA families. This is supposed to capture the average shape of an RNA family.
141 144
142 # Output files 145 # Output files
143 146
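The `--distance-matrices` statistics compute one matrix per chain, with positions indexed by covariance-model match state and NaNs where a nucleotide is missing. A family average and standard deviation are then taken. Below is a minimal plain-Python sketch of that idea, not RNANet's implementation. It makes three assumptions. Each chain is given as a list of one `(x, y, z)` point per match state, with `None` for unresolved or deleted positions. The choice of atom representing a residue is left open. The population standard deviation (`pstdev`) is used, because the README does not say which variant is computed. The function names are hypothetical.

```python
import math
from statistics import mean, pstdev

NAN = float("nan")


def distance_matrix(coords):
    """Pairwise Euclidean distances; NaN wherever either position is missing (None)."""
    n = len(coords)
    return [
        [math.dist(coords[i], coords[j]) if coords[i] is not None and coords[j] is not None else NAN
         for j in range(n)]
        for i in range(n)
    ]


def family_average(matrices):
    """Element-wise mean and population standard deviation over chains, ignoring NaNs."""
    n = len(matrices[0])
    avg = [[NAN] * n for _ in range(n)]
    std = [[NAN] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            values = [m[i][j] for m in matrices if not math.isnan(m[i][j])]
            if values:  # stays NaN if no chain resolves both positions
                avg[i][j] = mean(values)
                std[i][j] = pstdev(values)
    return avg, std
```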
......
@@ -969,6 +969,7 @@ class Pipeline:
969 self.REUSE_ALL = False 969 self.REUSE_ALL = False
970 self.REDUNDANT = False 970 self.REDUNDANT = False
971 self.ALIGNOPTS = None 971 self.ALIGNOPTS = None
972 + self.STATSOPTS = None
972 self.USESINA = False 973 self.USESINA = False
973 self.SELECT_ONLY = None 974 self.SELECT_ONLY = None
974 self.ARCHIVE = False 975 self.ARCHIVE = False
@@ -1102,6 +1103,8 @@ class Pipeline:
1102 self.REUSE_ALL = True 1103 self.REUSE_ALL = True
1103 elif opt == "cmalign-opts": 1104 elif opt == "cmalign-opts":
1104 self.ALIGNOPTS = arg 1105 self.ALIGNOPTS = arg
1106 + elif opt == "stats-opts":
1107 + self.STATSOPTS = arg.split()
1105 elif opt == "--all": 1108 elif opt == "--all":
1106 self.REUSE_ALL = True 1109 self.REUSE_ALL = True
1107 self.USE_KNOWN_ISSUES = False 1110 self.USE_KNOWN_ISSUES = False
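The `--stats-opts` value arrives as one string and must be turned into a list of arguments. That means calling `split` on the option string itself. Receiver and argument are easy to swap, and the swapped form raises no error. It silently returns a list containing a single space, because the one-space string is split on the whole option text. The snippet below shows the difference. `shlex.split` is an alternative if the option string may ever contain quoted arguments.

```python
import shlex

raw = "--wadley --distance-matrices"

swapped = " ".split(raw)      # splits the string " " using `raw` as the separator
intended = raw.split()        # splits the option string on runs of whitespace
quoted = shlex.split(raw)     # same here, but also honours shell-style quoting

print(swapped, intended, quoted)
```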
@@ -1545,9 +1548,12 @@ class Pipeline:
1545 1548
1546 # Run statistics files 1549 # Run statistics files
1547 subprocess.run([python_executable, fileDir+"/scripts/regression.py", runDir + "/results/RNANet.db"]) 1550 subprocess.run([python_executable, fileDir+"/scripts/regression.py", runDir + "/results/RNANet.db"])
1548 - subprocess.run([python_executable, fileDir+"/statistics.py", "--3d-folder", path_to_3D_data, 1551 + if self.STATSOPTS is None:
1552 + subprocess.run([python_executable, fileDir+"/statistics.py", "--3d-folder", path_to_3D_data,
1549 "--seq-folder", path_to_seq_data, "-r", str(self.CRYSTAL_RES)]) 1553 "--seq-folder", path_to_seq_data, "-r", str(self.CRYSTAL_RES)])
1550 - 1554 + else:
1555 + subprocess.run([python_executable, fileDir+"/statistics.py", "--3d-folder", path_to_3D_data,
1556 + "--seq-folder", path_to_seq_data, "-r", str(self.CRYSTAL_RES)] + self.STATSOPTS)
1551 # Save additional information 1557 # Save additional information
1552 with sqlite3.connect(runDir+"/results/RNANet.db") as conn: 1558 with sqlite3.connect(runDir+"/results/RNANet.db") as conn:
1553 conn.execute('pragma journal_mode=wal') 1559 conn.execute('pragma journal_mode=wal')
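The statistics step runs `statistics.py` either with or without the extra options. The two branches differ only in the trailing arguments, so they could collapse into a single argv list. `statistics_command` below is a hypothetical helper, not part of RNANet. `sys.executable` stands in for the pipeline's `python_executable`. As a defensive assumption, a plain string is split on whitespace before being appended, so a string is never expanded character by character.

```python
import subprocess
import sys


def statistics_command(file_dir, path_to_3d, path_to_seq, crystal_res, stats_opts=None):
    """Build the argv list for statistics.py, appending optional extra options."""
    cmd = [sys.executable, file_dir + "/statistics.py",
           "--3d-folder", path_to_3d,
           "--seq-folder", path_to_seq,
           "-r", str(crystal_res)]
    if isinstance(stats_opts, str):
        stats_opts = stats_opts.split()
    # `or []` handles the None default, so no if/else duplication is needed.
    return cmd + list(stats_opts or [])


def run_statistics(*args, **kwargs):
    """Launch statistics.py with the command built above."""
    return subprocess.run(statistics_command(*args, **kwargs))
```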
......
1 +6ydp_1_AA_1176-2737
2 +6ydw_1_AA_1176-2737
1 2z9q_1_A_1-72 3 2z9q_1_A_1-72
2 1ml5_1_b_5-121 4 1ml5_1_b_5-121
3 1ml5_1_a_1-2914 5 1ml5_1_a_1-2914
@@ -9,6 +11,9 @@
9 1qza_1_B_1-73 11 1qza_1_B_1-73
10 1ls2_1_B_1-73 12 1ls2_1_B_1-73
11 1gsg_1_T_1-72 13 1gsg_1_T_1-72
14 +7d1a_1_A_805-902
15 +7d0g_1_A_805-913
16 +7d0f_1_A_817-913
12 3jcr_1_H_1-115 17 3jcr_1_H_1-115
13 1vy7_1_AY_1-73 18 1vy7_1_AY_1-73
14 1vy7_1_CY_1-73 19 1vy7_1_CY_1-73
@@ -18,15 +23,21 @@
18 4v48_1_A9_3-118 23 4v48_1_A9_3-118
19 4v47_1_A9_3-118 24 4v47_1_A9_3-118
20 2ob7_1_A_10-319 25 2ob7_1_A_10-319
21 -1x1l_1_A_1-132 26 +1x1l_1_A_1-130
22 -1zc8_1_Z_1-93 27 +1zc8_1_Z_1-91
23 -2ob7_1_D_1-132 28 +2ob7_1_D_1-130
24 -4v42_1_BB_5-121
25 4v42_1_BA_1-2914 29 4v42_1_BA_1-2914
30 +4v42_1_BB_5-121
26 1r2x_1_C_1-58 31 1r2x_1_C_1-58
27 1r2w_1_C_1-58 32 1r2w_1_C_1-58
28 1eg0_1_L_1-56 33 1eg0_1_L_1-56
29 -5zzm_1_N_1-2904 34 +3dg2_1_A_1-1542
35 +3dg0_1_A_1-1542
36 +4v48_1_BA_1-1543
37 +4v47_1_BA_1-1542
38 +3dg4_1_A_1-1542
39 +3dg5_1_A_1-1542
40 +5zzm_1_N_1-2903
30 2rdo_1_B_1-2904 41 2rdo_1_B_1-2904
31 3dg2_1_B_1-2904 42 3dg2_1_B_1-2904
32 3dg0_1_B_1-2904 43 3dg0_1_B_1-2904
@@ -34,21 +45,17 @@
34 4v47_1_A0_1-2904 45 4v47_1_A0_1-2904
35 3dg4_1_B_1-2904 46 3dg4_1_B_1-2904
36 3dg5_1_B_1-2904 47 3dg5_1_B_1-2904
37 -3dg2_1_A_1-1542
38 -3dg0_1_A_1-1542
39 -4v48_1_BA_1-1543
40 -4v47_1_BA_1-1542
41 -3dg4_1_A_1-1542
42 -3dg5_1_A_1-1542
43 1eg0_1_O_1-73 48 1eg0_1_O_1-73
44 1zc8_1_A_1-59 49 1zc8_1_A_1-59
45 -1mvr_1_D_1-61
46 -4adx_1_9_1-123
47 -1zn1_1_B_1-59
48 1jgq_1_A_2-1520 50 1jgq_1_A_2-1520
49 4v42_1_AA_2-1520 51 4v42_1_AA_2-1520
50 1jgo_1_A_2-1520 52 1jgo_1_A_2-1520
51 1jgp_1_A_2-1520 53 1jgp_1_A_2-1520
54 +1mvr_1_D_1-59
55 +4c9d_1_D_29-1
56 +4c9d_1_C_29-1
57 +4adx_1_9_1-121
58 +1zn1_1_B_1-59
52 1emi_1_B_1-108 59 1emi_1_B_1-108
53 3iy9_1_A_498-1027 60 3iy9_1_A_498-1027
54 3ep2_1_B_1-50 61 3ep2_1_B_1-50
@@ -61,7 +68,7 @@
61 3cw1_1_V_1-138 68 3cw1_1_V_1-138
62 3cw1_1_v_1-138 69 3cw1_1_v_1-138
63 2iy3_1_B_9-105 70 2iy3_1_B_9-105
64 -3jcr_1_N_1-107 71 +3jcr_1_N_1-106
65 2vaz_1_A_64-177 72 2vaz_1_A_64-177
66 2ftc_1_R_81-1466 73 2ftc_1_R_81-1466
67 3jcr_1_M_1-141 74 3jcr_1_M_1-141
@@ -70,9 +77,10 @@
70 3iy8_1_A_1-540 77 3iy8_1_A_1-540
71 4v5z_1_BY_2-113 78 4v5z_1_BY_2-113
72 4v5z_1_BZ_1-70 79 4v5z_1_BZ_1-70
73 -4v5z_1_B1_2-125 80 +4v5z_1_B1_2-123
74 -4adx_1_0_1-2925 81 +1mvr_1_B_1-96
75 -1mvr_1_B_3-96 82 +4adx_1_0_1-2923
76 3eq4_1_Y_1-69 83 3eq4_1_Y_1-69
77 -6uz7_1_8_2140-2827 84 +7a5p_1_2_259-449
85 +6uz7_1_8_2140-2825
78 4v5z_1_AA_1-1563 86 4v5z_1_AA_1-1563
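The entries in this known-issues list appear to follow the pattern `<pdb id>_<model>_<chain>_<start>-<end>`. This is inferred from the entries themselves, not from documentation. Two new entries, `4c9d_1_D_29-1` and `4c9d_1_C_29-1`, have a start greater than their end. The companion reasons file reports these as reversed mappings. The sketch below parses such identifiers and flags reversed ones. The names are hypothetical. It assumes non-negative positions and a chain field that may contain underscores but no trailing `_<digits>-<digits>` of its own.

```python
from typing import NamedTuple


class ChainId(NamedTuple):
    pdb_id: str
    model: int
    chain: str
    start: int
    end: int

    @property
    def is_reversed(self) -> bool:
        """True when the mapping runs backwards (start after end)."""
        return self.start > self.end


def parse_chain_id(text: str) -> ChainId:
    """Split '<pdb>_<model>_<chain>_<start>-<end>' into its fields."""
    pdb_id, model, rest = text.strip().split("_", 2)
    chain, span = rest.rsplit("_", 1)  # the range is always the last field
    start, end = span.split("-")
    return ChainId(pdb_id, int(model), chain, int(start), int(end))


def reversed_mappings(lines):
    """Return the parsed identifiers whose mapping is reversed, skipping blank lines."""
    return [c for c in map(parse_chain_id, filter(str.strip, lines)) if c.is_reversed]
```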
......
1 +6ydp_1_AA_1176-2737
2 +Could not find nucleotides of chain AA in annotation 6ydp.json. Either there is a problem with 6ydp mmCIF download, or the bases are not resolved in the structure. Delete it and retry.
3 +
4 +6ydw_1_AA_1176-2737
5 +Could not find nucleotides of chain AA in annotation 6ydw.json. Either there is a problem with 6ydw mmCIF download, or the bases are not resolved in the structure. Delete it and retry.
6 +
1 2z9q_1_A_1-72 7 2z9q_1_A_1-72
2 DSSR warning 2z9q.json: no nucleotides found. Ignoring 2z9q_1_A_1-72. 8 DSSR warning 2z9q.json: no nucleotides found. Ignoring 2z9q_1_A_1-72.
3 9
@@ -31,6 +37,15 @@ DSSR warning 1ls2.json: no nucleotides found. Ignoring 1ls2_1_B_1-73.
31 1gsg_1_T_1-72 37 1gsg_1_T_1-72
32 DSSR warning 1gsg.json: no nucleotides found. Ignoring 1gsg_1_T_1-72. 38 DSSR warning 1gsg.json: no nucleotides found. Ignoring 1gsg_1_T_1-72.
33 39
40 +7d1a_1_A_805-902
41 +Could not find nucleotides of chain A in annotation 7d1a.json. Either there is a problem with 7d1a mmCIF download, or the bases are not resolved in the structure. Delete it and retry.
42 +
43 +7d0g_1_A_805-913
44 +Could not find nucleotides of chain A in annotation 7d0g.json. Either there is a problem with 7d0g mmCIF download, or the bases are not resolved in the structure. Delete it and retry.
45 +
46 +7d0f_1_A_817-913
47 +Could not find nucleotides of chain A in annotation 7d0f.json. Either there is a problem with 7d0f mmCIF download, or the bases are not resolved in the structure. Delete it and retry.
48 +
34 3jcr_1_H_1-115 49 3jcr_1_H_1-115
35 DSSR warning 3jcr.json: no nucleotides found. Ignoring 3jcr_1_H_1-115. 50 DSSR warning 3jcr.json: no nucleotides found. Ignoring 3jcr_1_H_1-115.
36 51
@@ -58,21 +73,21 @@ DSSR warning 4v47.json: no nucleotides found. Ignoring 4v47_1_A9_3-118.
58 2ob7_1_A_10-319 73 2ob7_1_A_10-319
59 DSSR warning 2ob7.json: no nucleotides found. Ignoring 2ob7_1_A_10-319. 74 DSSR warning 2ob7.json: no nucleotides found. Ignoring 2ob7_1_A_10-319.
60 75
61 -1x1l_1_A_1-132 76 +1x1l_1_A_1-130
62 -DSSR warning 1x1l.json: no nucleotides found. Ignoring 1x1l_1_A_1-132. 77 +DSSR warning 1x1l.json: no nucleotides found. Ignoring 1x1l_1_A_1-130.
63 -
64 -1zc8_1_Z_1-93
65 -DSSR warning 1zc8.json: no nucleotides found. Ignoring 1zc8_1_Z_1-93.
66 78
67 -2ob7_1_D_1-132 79 +1zc8_1_Z_1-91
68 -DSSR warning 2ob7.json: no nucleotides found. Ignoring 2ob7_1_D_1-132. 80 +DSSR warning 1zc8.json: no nucleotides found. Ignoring 1zc8_1_Z_1-91.
69 81
70 -4v42_1_BB_5-121 82 +2ob7_1_D_1-130
71 -Could not find nucleotides of chain BB in annotation 4v42.json. Either there is a problem with 4v42 mmCIF download, or the bases are not resolved in the structure. Delete it and retry. 83 +DSSR warning 2ob7.json: no nucleotides found. Ignoring 2ob7_1_D_1-130.
72 84
73 4v42_1_BA_1-2914 85 4v42_1_BA_1-2914
74 Could not find nucleotides of chain BA in annotation 4v42.json. Either there is a problem with 4v42 mmCIF download, or the bases are not resolved in the structure. Delete it and retry. 86 Could not find nucleotides of chain BA in annotation 4v42.json. Either there is a problem with 4v42 mmCIF download, or the bases are not resolved in the structure. Delete it and retry.
75 87
88 +4v42_1_BB_5-121
89 +Could not find nucleotides of chain BB in annotation 4v42.json. Either there is a problem with 4v42 mmCIF download, or the bases are not resolved in the structure. Delete it and retry.
90 +
76 1r2x_1_C_1-58 91 1r2x_1_C_1-58
77 DSSR warning 1r2x.json: no nucleotides found. Ignoring 1r2x_1_C_1-58. 92 DSSR warning 1r2x.json: no nucleotides found. Ignoring 1r2x_1_C_1-58.
78 93
@@ -82,8 +97,26 @@ DSSR warning 1r2w.json: no nucleotides found. Ignoring 1r2w_1_C_1-58.
82 1eg0_1_L_1-56 97 1eg0_1_L_1-56
83 DSSR warning 1eg0.json: no nucleotides found. Ignoring 1eg0_1_L_1-56. 98 DSSR warning 1eg0.json: no nucleotides found. Ignoring 1eg0_1_L_1-56.
84 99
85 -5zzm_1_N_1-2904 100 +3dg2_1_A_1-1542
86 -DSSR warning 5zzm.json: no nucleotides found. Ignoring 5zzm_1_N_1-2904. 101 +DSSR warning 3dg2.json: no nucleotides found. Ignoring 3dg2_1_A_1-1542.
102 +
103 +3dg0_1_A_1-1542
104 +DSSR warning 3dg0.json: no nucleotides found. Ignoring 3dg0_1_A_1-1542.
105 +
106 +4v48_1_BA_1-1543
107 +DSSR warning 4v48.json: no nucleotides found. Ignoring 4v48_1_BA_1-1543.
108 +
109 +4v47_1_BA_1-1542
110 +DSSR warning 4v47.json: no nucleotides found. Ignoring 4v47_1_BA_1-1542.
111 +
112 +3dg4_1_A_1-1542
113 +DSSR warning 3dg4.json: no nucleotides found. Ignoring 3dg4_1_A_1-1542.
114 +
115 +3dg5_1_A_1-1542
116 +DSSR warning 3dg5.json: no nucleotides found. Ignoring 3dg5_1_A_1-1542.
117 +
118 +5zzm_1_N_1-2903
119 +DSSR warning 5zzm.json: no nucleotides found. Ignoring 5zzm_1_N_1-2903.
87 120
88 2rdo_1_B_1-2904 121 2rdo_1_B_1-2904
89 DSSR warning 2rdo.json: no nucleotides found. Ignoring 2rdo_1_B_1-2904. 122 DSSR warning 2rdo.json: no nucleotides found. Ignoring 2rdo_1_B_1-2904.
@@ -106,39 +139,12 @@ DSSR warning 3dg4.json: no nucleotides found. Ignoring 3dg4_1_B_1-2904.
106 3dg5_1_B_1-2904 139 3dg5_1_B_1-2904
107 DSSR warning 3dg5.json: no nucleotides found. Ignoring 3dg5_1_B_1-2904. 140 DSSR warning 3dg5.json: no nucleotides found. Ignoring 3dg5_1_B_1-2904.
108 141
109 -3dg2_1_A_1-1542
110 -DSSR warning 3dg2.json: no nucleotides found. Ignoring 3dg2_1_A_1-1542.
111 -
112 -3dg0_1_A_1-1542
113 -DSSR warning 3dg0.json: no nucleotides found. Ignoring 3dg0_1_A_1-1542.
114 -
115 -4v48_1_BA_1-1543
116 -DSSR warning 4v48.json: no nucleotides found. Ignoring 4v48_1_BA_1-1543.
117 -
118 -4v47_1_BA_1-1542
119 -DSSR warning 4v47.json: no nucleotides found. Ignoring 4v47_1_BA_1-1542.
120 -
121 -3dg4_1_A_1-1542
122 -DSSR warning 3dg4.json: no nucleotides found. Ignoring 3dg4_1_A_1-1542.
123 -
124 -3dg5_1_A_1-1542
125 -DSSR warning 3dg5.json: no nucleotides found. Ignoring 3dg5_1_A_1-1542.
126 -
127 1eg0_1_O_1-73 142 1eg0_1_O_1-73
128 DSSR warning 1eg0.json: no nucleotides found. Ignoring 1eg0_1_O_1-73. 143 DSSR warning 1eg0.json: no nucleotides found. Ignoring 1eg0_1_O_1-73.
129 144
130 1zc8_1_A_1-59 145 1zc8_1_A_1-59
131 DSSR warning 1zc8.json: no nucleotides found. Ignoring 1zc8_1_A_1-59. 146 DSSR warning 1zc8.json: no nucleotides found. Ignoring 1zc8_1_A_1-59.
132 147
133 -1mvr_1_D_1-61
134 -DSSR warning 1mvr.json: no nucleotides found. Ignoring 1mvr_1_D_1-61.
135 -
136 -4adx_1_9_1-123
137 -DSSR warning 4adx.json: no nucleotides found. Ignoring 4adx_1_9_1-123.
138 -
139 -1zn1_1_B_1-59
140 -DSSR warning 1zn1.json: no nucleotides found. Ignoring 1zn1_1_B_1-59.
141 -
142 1jgq_1_A_2-1520 148 1jgq_1_A_2-1520
143 Could not find nucleotides of chain A in annotation 1jgq.json. Either there is a problem with 1jgq mmCIF download, or the bases are not resolved in the structure. Delete it and retry. 149 Could not find nucleotides of chain A in annotation 1jgq.json. Either there is a problem with 1jgq mmCIF download, or the bases are not resolved in the structure. Delete it and retry.
144 150
@@ -151,6 +157,21 @@ Could not find nucleotides of chain A in annotation 1jgo.json. Either there is a
151 1jgp_1_A_2-1520 157 1jgp_1_A_2-1520
152 Could not find nucleotides of chain A in annotation 1jgp.json. Either there is a problem with 1jgp mmCIF download, or the bases are not resolved in the structure. Delete it and retry. 158 Could not find nucleotides of chain A in annotation 1jgp.json. Either there is a problem with 1jgp mmCIF download, or the bases are not resolved in the structure. Delete it and retry.
153 159
160 +1mvr_1_D_1-59
161 +DSSR warning 1mvr.json: no nucleotides found. Ignoring 1mvr_1_D_1-59.
162 +
163 +4c9d_1_D_29-1
164 +Mapping is reversed, this case is not supported (yet).
165 +
166 +4c9d_1_C_29-1
167 +Mapping is reversed, this case is not supported (yet).
168 +
169 +4adx_1_9_1-121
170 +DSSR warning 4adx.json: no nucleotides found. Ignoring 4adx_1_9_1-121.
171 +
172 +1zn1_1_B_1-59
173 +DSSR warning 1zn1.json: no nucleotides found. Ignoring 1zn1_1_B_1-59.
174 +
154 1emi_1_B_1-108 175 1emi_1_B_1-108
155 DSSR warning 1emi.json: no nucleotides found. Ignoring 1emi_1_B_1-108. 176 DSSR warning 1emi.json: no nucleotides found. Ignoring 1emi_1_B_1-108.
156 177
@@ -187,8 +208,8 @@ DSSR warning 3cw1.json: no nucleotides found. Ignoring 3cw1_1_v_1-138.
187 2iy3_1_B_9-105 208 2iy3_1_B_9-105
188 DSSR warning 2iy3.json: no nucleotides found. Ignoring 2iy3_1_B_9-105. 209 DSSR warning 2iy3.json: no nucleotides found. Ignoring 2iy3_1_B_9-105.
189 210
190 -3jcr_1_N_1-107 211 +3jcr_1_N_1-106
191 -DSSR warning 3jcr.json: no nucleotides found. Ignoring 3jcr_1_N_1-107. 212 +DSSR warning 3jcr.json: no nucleotides found. Ignoring 3jcr_1_N_1-106.
192 213
193 2vaz_1_A_64-177 214 2vaz_1_A_64-177
194 DSSR warning 2vaz.json: no nucleotides found. Ignoring 2vaz_1_A_64-177. 215 DSSR warning 2vaz.json: no nucleotides found. Ignoring 2vaz_1_A_64-177.
@@ -214,19 +235,22 @@ DSSR warning 4v5z.json: no nucleotides found. Ignoring 4v5z_1_BY_2-113.
214 4v5z_1_BZ_1-70 235 4v5z_1_BZ_1-70
215 DSSR warning 4v5z.json: no nucleotides found. Ignoring 4v5z_1_BZ_1-70. 236 DSSR warning 4v5z.json: no nucleotides found. Ignoring 4v5z_1_BZ_1-70.
216 237
217 -4v5z_1_B1_2-125 238 +4v5z_1_B1_2-123
218 -DSSR warning 4v5z.json: no nucleotides found. Ignoring 4v5z_1_B1_2-125. 239 +DSSR warning 4v5z.json: no nucleotides found. Ignoring 4v5z_1_B1_2-123.
219 240
220 -4adx_1_0_1-2925 241 +1mvr_1_B_1-96
221 -DSSR warning 4adx.json: no nucleotides found. Ignoring 4adx_1_0_1-2925. 242 +DSSR warning 1mvr.json: no nucleotides found. Ignoring 1mvr_1_B_1-96.
222 243
223 -1mvr_1_B_3-96 244 +4adx_1_0_1-2923
224 -DSSR warning 1mvr.json: no nucleotides found. Ignoring 1mvr_1_B_3-96. 245 +DSSR warning 4adx.json: no nucleotides found. Ignoring 4adx_1_0_1-2923.
225 246
226 3eq4_1_Y_1-69 247 3eq4_1_Y_1-69
227 DSSR warning 3eq4.json: no nucleotides found. Ignoring 3eq4_1_Y_1-69. 248 DSSR warning 3eq4.json: no nucleotides found. Ignoring 3eq4_1_Y_1-69.
228 249
229 -6uz7_1_8_2140-2827 250 +7a5p_1_2_259-449
251 +Could not find nucleotides of chain 2 in annotation 7a5p.json. Either there is a problem with 7a5p mmCIF download, or the bases are not resolved in the structure. Delete it and retry.
252 +
253 +6uz7_1_8_2140-2825
230 Could not find nucleotides of chain 8 in annotation 6uz7.json. Either there is a problem with 6uz7 mmCIF download, or the bases are not resolved in the structure. Delete it and retry. 254 Could not find nucleotides of chain 8 in annotation 6uz7.json. Either there is a problem with 6uz7 mmCIF download, or the bases are not resolved in the structure. Delete it and retry.
231 255
232 4v5z_1_AA_1-1563 256 4v5z_1_AA_1-1563
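This reasons file pairs each chain identifier with a one-line explanation and separates entries with a blank line. Two messages explain most entries: "DSSR warning ...: no nucleotides found" and "Could not find nucleotides of chain ...". Below is a minimal sketch that loads the file into a dictionary, for example to group chains by failure reason. The layout is inferred from the entries shown, and the function name is hypothetical.

```python
import re


def parse_known_issues_reasons(text: str) -> dict:
    """Map each chain id to its reason, assuming 'id line, reason line(s), blank line' blocks."""
    reasons = {}
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.strip().splitlines() if line.strip()]
        if len(lines) >= 2:
            reasons[lines[0]] = " ".join(lines[1:])
    return reasons
```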
......
@@ -4,7 +4,7 @@ cd /home/lbecquey/Projects/RNANet
4 rm -rf latest_run.log errors.txt 4 rm -rf latest_run.log errors.txt
5 5
6 # Run RNANet 6 # Run RNANet
7 -bash -c 'time python3.8 ./RNAnet.py --3d-folder /home/lbecquey/Data/RNA/3D/ --seq-folder /home/lbecquey/Data/RNA/sequences/ -r 20.0 --extract -s --archive' > latest_run.log 2>&1 7 +bash -c 'time python3.8 ./RNAnet.py --3d-folder /home/lbecquey/Data/RNA/3D/ --seq-folder /home/lbecquey/Data/RNA/sequences/ --sina -r 20.0 --extract -s --archive' > latest_run.log 2>&1
8 echo 'Compressing RNANet.db.gz...' >> latest_run.log 8 echo 'Compressing RNANet.db.gz...' >> latest_run.log
9 touch results/RNANet.db # update last modification date 9 touch results/RNANet.db # update last modification date
10 gzip -k /home/lbecquey/Projects/RNANet/results/RNANet.db # compress it 10 gzip -k /home/lbecquey/Projects/RNANet/results/RNANet.db # compress it
......