Louis BECQUEY

KnownIssues update

-* [Required computational resources](#required-computational-resources)
+* [Required hardware resources](#required-computational-resources)
 * [Method 1 : Using Docker](#method-1-:-installation-using-docker)
 * [Method 2 : Classical command-line installation](#method-2-:-classical-command-line-installation-linux-only)
 * [Command options](#command-options)
@@ -16,7 +16,7 @@
 - or customize options --cmalign-opts and --cmalign-rrna-opts with cmalign arguments --cpu (number of cores to use) and --mxsize (max memory to allocate per core), so that it fits your machine. In very hard cases, also increase the parameter --maxtau from 0.05 to 0.1, but this reduces the quality of the alignments.
 - In regular "update" mode, when the alignments already exists, less RAM is required, 64GB should be fine. If not, use the same options than the first time for your update runs.
 - In 'no homology' mode, just for annotation of the structures without mapping to families, each core can peak to ~3GB (but not all at the same time if you are lucky). Use option --maxcores to reduce the number of cores if you do not have enough RAM. 32GB is fine in most cases.
-- **Storage**: to date, it takes 60 GB for the 3D data (36 GB if you don't use the --extract option), 11 GB for the sequence data, and 7GB for the outputs (5.6 GB database, 1 GB archive of CSV files). You need to add a few more for the dependencies. If you compute geometry statistics and parameter distributions, you need to count a 80GB more (permanent) and 100GB more (that will be deleted at the end of the run). So, pick a 500GB partition and you are good to go. The computation speed is much higher if you use a fast storage device (e.g. SSD instead of hard drive, or even better, a NVMe M.2) because of constant I/O with the SQlite database.
+- **Storage**: to date, it takes 60 GB for the 3D data (36 GB if you don't use the --extract option), 11 GB for the sequence data, and 7GB for the outputs (5.6 GB database, 1 GB archive of CSV files). You need to add a few more for the dependencies. If you compute geometry statistics and parameter distributions, you need to count a 80GB more (permanent) and 100GB more (that will be deleted at the end of the run). So, pick a 250GB partition and you are good to go. The computation speed is much higher if you use a fast storage device (e.g. SSD instead of hard drive, or even better, a NVMe M.2) because of constant I/O with the SQlite database.
 - **Network** : We query the Rfam public MySQL server on port 4497. Make sure your network enables communication (there should not be any issue on private networks, but your university may close ports by default). You will get an error message if the port is not open. Around 30 GB of data is downloaded.
 
 The IBISC-EvryRNA server example :
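
Note on the **Storage** bullet in the hunk above: the figures it quotes can be tallied with a short script. This is only a sketch. The constant and function names are labels made up for illustration and are not identifiers used by RNANet; the numbers are copied from the bullet. The "few more" GB for dependencies is not quantified in the text, so it is left out of the totals.

```python
# Storage figures in GB, copied from the Storage bullet of the README.
# All names below are illustrative labels, not RNANet identifiers.
DATA_3D_WITH_EXTRACT = 60     # 3D data when --extract is used
DATA_3D_WITHOUT_EXTRACT = 36  # 3D data when --extract is not used
SEQUENCE_DATA = 11
OUTPUTS = 7                   # database plus CSV archive
STATS_PERMANENT = 80          # geometry statistics, kept after the run
STATS_TEMPORARY = 100         # geometry statistics, deleted at the end of the run


def storage_budget(extract=True, statistics=True):
    """Return (permanent_gb, peak_gb), excluding the unquantified dependencies."""
    data_3d = DATA_3D_WITH_EXTRACT if extract else DATA_3D_WITHOUT_EXTRACT
    permanent = data_3d + SEQUENCE_DATA + OUTPUTS
    peak = permanent
    if statistics:
        permanent += STATS_PERMANENT
        # Temporary files only exist during the run, so they count toward the peak only.
        peak = permanent + STATS_TEMPORARY
    return permanent, peak


if __name__ == "__main__":
    for extract in (True, False):
        for statistics in (True, False):
            permanent, peak = storage_budget(extract, statistics)
            print(f"extract={extract} statistics={statistics}: "
                  f"{permanent} GB permanent, {peak} GB peak")
```

Under these assumptions, a run with --extract and geometry statistics peaks at 258 GB before dependencies (234 GB without --extract), so the recommended partition size is worth checking against the options you plan to use.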
@@ -29,7 +29,7 @@ The IBISC-EvryRNA server example :
 * Step 1 : Download the [Docker container](https://entrepot.ibisc.univ-evry.fr/d/1aff90a9ef214a19b848/files/?p=/rnanet_v1.6b_docker.tar&dl=1). Open a terminal and move to the appropriate directory.
 * Step 2 : Extract the archive to a Docker image named *rnanet* in your local installation
 ```
-$ docker load -i rnanet_v1.5b_docker.tar
+$ docker load -i rnanet_v1.6b_docker.tar
 ```
 * Step 3 : Run the container, giving it 3 folders to mount as volumes: a first to store the 3D data, a second to store the sequence data and alignments, and a third to output the results, data and logs:
 ```
......
 # Known Issues
 
 ## Annotation and numbering issues
-* Some GDPs that are listed as HETATMs in the mmCIF files are not detected correctly to be real nucleotides. (e.g. 1e8o-E)
+* [SOLVED] Some GDPs that are listed as HETATMs in the mmCIF files are not detected correctly to be real nucleotides. (e.g. 1e8o-E)
 * Some chains are truncated in different pieces with different chain names. Reason unknown (e.g. 6ztp-AX)
-* Some chains are not correctly renamed A in the produced separate files (e.g. 1d4r-B)
+* [SOLVED] Some chains are not correctly renamed A in the produced separate files (e.g. 1d4r-B)
 
 ## Alignment issues
-* Chain names appear in triple in the FASTA header (e.g. 1d4r[1]-B 1d4r[1]-B 1d4r[1]-B)
+* [SOLVED] Chain names appear in triple in the FASTA header (e.g. 1d4r[1]-B 1d4r[1]-B 1d4r[1]-B)
 
 # Known feature requests
 * Automated annotation of detected Recurrent Interaction Networks (RINs), see http://carnaval.lri.fr/ .
@@ -17,7 +17,7 @@
 * Possibly, more metrics about the alignments coming from Infernal.
 * Run cmscan ourselves from the NDB instead of using Rfam-PDB mappings ? (Iff this actually makes a real difference, untested yet)
 * Use and save Infernal alignment bounds and truncation information
-* Save if a chain is a representative in BGSU list
+* Save if a chain is a representative or not in BGSU list, so that they can be filtered easily
 * Annotate unstructured regions (on a nucleotide basis)
 
 ## Technical to-do list
......