- CPU: no requirements. The program is optimized for multi-core CPUs, so you might want to use Intel Xeons, AMD Ryzens, etc.
- GPU: not required
- RAM: 16 GB with a large swap partition is okay. 32 GB is recommended (usage peaks at ~27 GB)
- RAM: 16 GB with a large swap partition is okay. 32 GB is recommended (usage peaks at ~27 GB, but this number depends on your number of CPU cores)
- Storage: to date, it takes 60 GB for the 3D data (36 GB if you don't use the --extract option), 11 GB for the sequence data, and 7 GB for the outputs (5.6 GB database, 1 GB archive of CSV files). Add a few more GB for the dependencies. A 100 GB partition is enough. Computation is much faster on a fast storage device (e.g. an SSD instead of a hard drive, or better still an NVMe SSD) because of constant I/O with the SQLite database.
- Network: We query the Rfam public MySQL server on port 4497. Make sure your network allows this communication (there should not be any issue on private networks, but your company or university may close ports by default). You will get an error message if the port is not open. Around 30 GB of data is downloaded.
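
To check the network requirement before starting a long run, you can test whether the port is reachable from your machine. The snippet below is a minimal sketch, not part of RNANet: the helper `port_is_open` is a hypothetical name, and the host name `mysql-rfam-public.ebi.ac.uk` is an assumption that you should check against the Rfam documentation.

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


# Assumed host name of the Rfam public MySQL server (not stated in this README).
RFAM_HOST = "mysql-rfam-public.ebi.ac.uk"
# Port given in the network requirement above.
RFAM_PORT = 4497
```

Calling `port_is_open(RFAM_HOST, RFAM_PORT)` returns `False` when a firewall blocks the port, in which case you probably need to ask your network administrator to open it.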
# Method 1 : Installation using Docker
* Step 1 : Download the [Docker container](https://entrepot.ibisc.univ-evry.fr/d/1aff90a9ef214a19b848/files/?p=/rnanet_v1.3_docker.tar&dl=1). Open a terminal and move to the appropriate directory.
* Step 1 : Download the [Docker container](https://entrepot.ibisc.univ-evry.fr/d/1aff90a9ef214a19b848/files/?p=/rnanet_v1.5b_docker.tar&dl=1). Open a terminal and move to the appropriate directory.
* Step 2 : Extract the archive to a Docker image named *rnanet* in your local installation
```
$ docker load -i rnanet_v1.3_docker.tar
$ docker load -i rnanet_v1.5b_docker.tar
```
* Step 3 : Run the container, giving it 3 folders to mount as volumes: a first to store the 3D data, a second to store the sequence data and alignments, and a third to output the results, data and logs:
- DSSR, you need to register to the X3DNA forum [here](http://forum.x3dna.org/site-announcements/download-instructions/) and then download the DSSR binary [on that page](http://forum.x3dna.org/downloads/3dna-download/). Make sure to have the `x3dna-dssr` binary in your $PATH variable so that RNANet.py finds it.
- Infernal, to download at [Eddylab](http://eddylab.org/infernal/), several options are available depending on your preferences. Make sure to have the `cmalign`, `esl-alimanip`, `esl-alipid` and `esl-reformat` binaries in your $PATH variable, so that RNANet.py can find them.
- Infernal, to download at [Eddylab](http://eddylab.org/infernal/), several options are available depending on your preferences. Make sure to have the `cmalign`, `cmfetch`, `cmbuild`, `esl-alimanip`, `esl-alipid` and `esl-reformat` binaries in your $PATH variable, so that RNANet.py can find them.
- SINA, follow [these instructions](https://sina.readthedocs.io/en/latest/install.html) for example. Make sure to have the `sina` binary in your $PATH.
- SQLite 3, available under the name *sqlite* in every distro's package manager,
- Python >= 3.8 (unfortunately, Python 3.6 is no longer supported because of changes in the `multiprocessing` and `threading` modules; untested with Python 3.7.\*)
...
...
@@ -112,13 +112,14 @@ The most useful options in that list are
* Computation of sequence identity matrices
* Statistics over the sequence lengths, nucleotide frequencies, and basepair types by RNA family
* Overall database content statistics
* Detailed analysis of the eta-theta pseudotorsion angles (use `--stats-opts "--wadley"` after `-s`) or 3D distance matrices and their averages per family (use `--stats-opts "--distance-matrices"`)
* Detailed analysis of the eta-theta pseudotorsion angles (use `--stats-opts="--wadley"` after `-s`) or 3D distance matrices and their averages per family (use `--stats-opts="--distance-matrices"`)
* `--redundant`, to yield all the available data and not only the BGSU NR-List representatives
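
For readers unfamiliar with the eta-theta pseudotorsions analysed by `--wadley`: they are dihedral angles defined on the P and C4' atoms of consecutive nucleotides. The sketch below shows how such angles can be computed from atom coordinates. It is an illustration under the usual definitions (eta: C4'(i-1), P(i), C4'(i), P(i+1); theta: P(i), C4'(i), P(i+1), C4'(i+1)), not RNANet's own code, and the function names are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four 3D points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


def eta(c4_prev, p, c4, p_next):
    """Pseudotorsion eta of nucleotide i: C4'(i-1), P(i), C4'(i), P(i+1)."""
    return dihedral(c4_prev, p, c4, p_next)


def theta(p, c4, p_next, c4_next):
    """Pseudotorsion theta of nucleotide i: P(i), C4'(i), P(i+1), C4'(i+1)."""
    return dihedral(p, c4, p_next, c4_next)
```

Each argument is an `(x, y, z)` tuple. Neither angle is defined for the first and last nucleotides of a chain, because the neighbouring atoms are missing.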
# Computation time
To give you an estimate, our last full run took 12h, excluding the time to download the mmCIF files containing RNA (around 25 GB to download) and the time to compute statistics.
Measured on the 23rd of June 2020 on a 16-core AMD Ryzen 7 3700X CPU @3.60GHz, with 32 GB of RAM and a 7200 rpm hard drive. Total CPU time spent: 135 hours (user+kernel modes), corresponding to 12h of wall-clock time on the 16-core CPU.
Another recent full run, including the mmCIF downloads and the computation of heavy statistics (`--wadley --distance-matrices`), lasted 13h (real time) on a 60-core Xeon E7-4850v4 @2.10GHz with 120 GB of RAM. The user+kernel time was about 300h.
Update runs are much quicker, around 3 hours. The duration depends mostly on which RNA families are affected by the update.
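
The CPU and wall-clock times quoted above give a rough idea of how many cores were kept busy on average. The short sketch below only redoes that arithmetic from the numbers in this section. The helper name is hypothetical, and the two runs are not directly comparable because the second one includes downloads and heavy statistics.

```python
def effective_parallelism(cpu_hours, wall_hours):
    """Average number of cores kept busy: CPU time divided by wall-clock time."""
    return cpu_hours / wall_hours


# (CPU hours, wall-clock hours, core count) as quoted in this section.
runs = {
    "Ryzen 7 3700X": (135, 12, 16),
    "Xeon E7-4850v4": (300, 13, 60),
}

for name, (cpu_h, wall_h, cores) in runs.items():
    busy = effective_parallelism(cpu_h, wall_h)
    print(f"{name}: ~{busy:.1f} cores busy on average, {busy / cores:.0%} of {cores}")
```

A ratio well below the core count suggests that part of the run is limited by something other than CPU, for example disk or network I/O.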
...
...
@@ -135,9 +136,11 @@ By default, this computes:
* Statistics over the sequence lengths, nucleotide frequencies, and basepair types by RNA family
* Overall database content statistics
If you have run RNANet once with option `--extract`, you can additionally compute more by passing the option:
* `--distance-matrices`, to compute pairwise residue distances within every chain, and their average and standard deviation per RNA family. This is meant to capture the average shape of an RNA family. The distance matrices have the size of the family's covariance model (match states). Unresolved nucleotides and deletions relative to the covariance model are NaNs.
If you have run RNANet once with options `--no-homology` and `--extract`, you unlock new statistics over unmapped chains.
* You will be allowed to use option `--wadley` to reproduce the results of Wadley et al. (2007) automatically. These are clustering results of the pseudotorsion angles of the backbone.
* (experimental) You will be allowed to use option `--distance-matrices` to compute pairwise residue distances within every chain, and their average and standard deviation per RNA family. This is meant to capture the average shape of an RNA family.
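
To make the idea behind `--distance-matrices` concrete, here is a minimal sketch of per-family averaging that ignores missing positions. It is not RNANet's implementation: the function names, the toy coordinates and the use of `None` for unresolved residues are assumptions made for illustration (it relies on `math.dist`, available from Python 3.8).

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances; None entries (unresolved residues) stay NaN."""
    n = len(coords)
    mat = [[math.nan] * n for _ in range(n)]
    for i, a in enumerate(coords):
        for j, b in enumerate(coords):
            if a is not None and b is not None:
                mat[i][j] = math.dist(a, b)
    return mat


def nan_mean_std(matrices):
    """Element-wise mean and population standard deviation, ignoring NaN cells."""
    n = len(matrices[0])
    mean = [[math.nan] * n for _ in range(n)]
    std = [[math.nan] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            vals = [m[i][j] for m in matrices if not math.isnan(m[i][j])]
            if vals:
                mu = sum(vals) / len(vals)
                mean[i][j] = mu
                std[i][j] = math.sqrt(sum((v - mu) ** 2 for v in vals) / len(vals))
    return mean, std


# Two hypothetical chains mapped to a 3-column model; chain_b has an unresolved residue.
chain_a = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (6.0, 8.0, 0.0)]
chain_b = [(0.0, 0.0, 0.0), None, (0.0, 0.0, 12.0)]
mean, std = nan_mean_std([distance_matrix(chain_a), distance_matrix(chain_b)])
```

Cells that are NaN in every chain stay NaN in the averages, so poorly resolved positions of a family remain visible instead of being silently filled in.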
Could not find nucleotides of chain AA in annotation 6ydp.json. Either there is a problem with 6ydp mmCIF download, or the bases are not resolved in the structure. Delete it and retry.
6ydw_1_AA_1176-2737
Could not find nucleotides of chain AA in annotation 6ydw.json. Either there is a problem with 6ydw mmCIF download, or the bases are not resolved in the structure. Delete it and retry.
2z9q_1_A_1-72
DSSR warning 2z9q.json: no nucleotides found. Ignoring 2z9q_1_A_1-72.
DSSR warning 1gsg.json: no nucleotides found. Ignoring 1gsg_1_T_1-72.
7d1a_1_A_805-902
Could not find nucleotides of chain A in annotation 7d1a.json. Either there is a problem with 7d1a mmCIF download, or the bases are not resolved in the structure. Delete it and retry.
7d0g_1_A_805-913
Could not find nucleotides of chain A in annotation 7d0g.json. Either there is a problem with 7d0g mmCIF download, or the bases are not resolved in the structure. Delete it and retry.
7d0f_1_A_817-913
Could not find nucleotides of chain A in annotation 7d0f.json. Either there is a problem with 7d0f mmCIF download, or the bases are not resolved in the structure. Delete it and retry.
3jcr_1_H_1-115
DSSR warning 3jcr.json: no nucleotides found. Ignoring 3jcr_1_H_1-115.
DSSR warning 2ob7.json: no nucleotides found. Ignoring 2ob7_1_A_10-319.
1x1l_1_A_1-132
DSSR warning 1x1l.json: no nucleotides found. Ignoring 1x1l_1_A_1-132.
1zc8_1_Z_1-93
DSSR warning 1zc8.json: no nucleotides found. Ignoring 1zc8_1_Z_1-93.
1x1l_1_A_1-130
DSSR warning 1x1l.json: no nucleotides found. Ignoring 1x1l_1_A_1-130.
2ob7_1_D_1-132
DSSR warning 2ob7.json: no nucleotides found. Ignoring 2ob7_1_D_1-132.
1zc8_1_Z_1-91
DSSR warning 1zc8.json: no nucleotides found. Ignoring 1zc8_1_Z_1-91.
4v42_1_BB_5-121
Could not find nucleotides of chain BB in annotation 4v42.json. Either there is a problem with 4v42 mmCIF download, or the bases are not resolved in the structure. Delete it and retry.
2ob7_1_D_1-130
DSSR warning 2ob7.json: no nucleotides found. Ignoring 2ob7_1_D_1-130.
4v42_1_BA_1-2914
Could not find nucleotides of chain BA in annotation 4v42.json. Either there is a problem with 4v42 mmCIF download, or the bases are not resolved in the structure. Delete it and retry.
4v42_1_BB_5-121
Could not find nucleotides of chain BB in annotation 4v42.json. Either there is a problem with 4v42 mmCIF download, or the bases are not resolved in the structure. Delete it and retry.
1r2x_1_C_1-58
DSSR warning 1r2x.json: no nucleotides found. Ignoring 1r2x_1_C_1-58.
DSSR warning 3dg5.json: no nucleotides found. Ignoring 3dg5_1_B_1-2904.
3dg2_1_A_1-1542
DSSR warning 3dg2.json: no nucleotides found. Ignoring 3dg2_1_A_1-1542.
3dg0_1_A_1-1542
DSSR warning 3dg0.json: no nucleotides found. Ignoring 3dg0_1_A_1-1542.
4v48_1_BA_1-1543
DSSR warning 4v48.json: no nucleotides found. Ignoring 4v48_1_BA_1-1543.
4v47_1_BA_1-1542
DSSR warning 4v47.json: no nucleotides found. Ignoring 4v47_1_BA_1-1542.
3dg4_1_A_1-1542
DSSR warning 3dg4.json: no nucleotides found. Ignoring 3dg4_1_A_1-1542.
3dg5_1_A_1-1542
DSSR warning 3dg5.json: no nucleotides found. Ignoring 3dg5_1_A_1-1542.
1eg0_1_O_1-73
DSSR warning 1eg0.json: no nucleotides found. Ignoring 1eg0_1_O_1-73.
1zc8_1_A_1-59
DSSR warning 1zc8.json: no nucleotides found. Ignoring 1zc8_1_A_1-59.
1mvr_1_D_1-61
DSSR warning 1mvr.json: no nucleotides found. Ignoring 1mvr_1_D_1-61.
4adx_1_9_1-123
DSSR warning 4adx.json: no nucleotides found. Ignoring 4adx_1_9_1-123.
1zn1_1_B_1-59
DSSR warning 1zn1.json: no nucleotides found. Ignoring 1zn1_1_B_1-59.
1jgq_1_A_2-1520
Could not find nucleotides of chain A in annotation 1jgq.json. Either there is a problem with 1jgq mmCIF download, or the bases are not resolved in the structure. Delete it and retry.
...
...
@@ -151,6 +157,21 @@ Could not find nucleotides of chain A in annotation 1jgo.json. Either there is a
1jgp_1_A_2-1520
Could not find nucleotides of chain A in annotation 1jgp.json. Either there is a problem with 1jgp mmCIF download, or the bases are not resolved in the structure. Delete it and retry.
1mvr_1_D_1-59
DSSR warning 1mvr.json: no nucleotides found. Ignoring 1mvr_1_D_1-59.
4c9d_1_D_29-1
Mapping is reversed, this case is not supported (yet).
4c9d_1_C_29-1
Mapping is reversed, this case is not supported (yet).
4adx_1_9_1-121
DSSR warning 4adx.json: no nucleotides found. Ignoring 4adx_1_9_1-121.
1zn1_1_B_1-59
DSSR warning 1zn1.json: no nucleotides found. Ignoring 1zn1_1_B_1-59.
1emi_1_B_1-108
DSSR warning 1emi.json: no nucleotides found. Ignoring 1emi_1_B_1-108.
DSSR warning 4v5z.json: no nucleotides found. Ignoring 4v5z_1_BZ_1-70.
4v5z_1_B1_2-125
DSSR warning 4v5z.json: no nucleotides found. Ignoring 4v5z_1_B1_2-125.
4v5z_1_B1_2-123
DSSR warning 4v5z.json: no nucleotides found. Ignoring 4v5z_1_B1_2-123.
4adx_1_0_1-2925
DSSR warning 4adx.json: no nucleotides found. Ignoring 4adx_1_0_1-2925.
1mvr_1_B_1-96
DSSR warning 1mvr.json: no nucleotides found. Ignoring 1mvr_1_B_1-96.
1mvr_1_B_3-96
DSSR warning 1mvr.json: no nucleotides found. Ignoring 1mvr_1_B_3-96.
4adx_1_0_1-2923
DSSR warning 4adx.json: no nucleotides found. Ignoring 4adx_1_0_1-2923.
3eq4_1_Y_1-69
DSSR warning 3eq4.json: no nucleotides found. Ignoring 3eq4_1_Y_1-69.
6uz7_1_8_2140-2827
7a5p_1_2_259-449
Could not find nucleotides of chain 2 in annotation 7a5p.json. Either there is a problem with 7a5p mmCIF download, or the bases are not resolved in the structure. Delete it and retry.
6uz7_1_8_2140-2825
Could not find nucleotides of chain 8 in annotation 6uz7.json. Either there is a problem with 6uz7 mmCIF download, or the bases are not resolved in the structure. Delete it and retry.