New documentation

Louis BECQUEY
Commit b5935bd6dc75c92194d9624e446a758a7d9867d4 (1 parent: 4de494b7)
Showing 12 changed files with 466 additions and 1898 deletions
.dockerignore
CHANGELOG
Database.md
Dockerfile
Errors.md
FAQ.md
INSTALL.md
KnownIssues.md
README.md
RNAnet.py
known_issues.txt
known_issues_reasons.txt
--- a/.dockerignore
+++ b/.dockerignore
@@ -9,12 +9,18 @@ esl*
 .vscode/
 __pycache__/
 .git/
+ .gitignore
+ .dockerignore
 errors.txt
 known_issues.txt
 known_issues_reasons.txt
 Dockerfile
 LICENSE
- README.md
+ CHANGELOG
+ *.md
 scripts/automate.sh
 scripts/kill_rnanet.sh
 scripts/build_docker_image.sh
+ scripts/*.tar
+ scripts/measure.py
+ scripts/recompute_some_chains.py
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -27,9 +27,3 @@ BUG CORRECTIONS
     - Modified nucleotides were not always correctly transformed to N in the alignments (and nucleotide.nt_align_code fields).
     Now, the alignments and nt_align_code (and consensus) only contain "ACGUN-" chars. 
     Now, 'N' means 'other', while '-' means 'nothing' or 'unknown'.
- 
- COMING SOON
-     - Automated annotation of detected Recurrent Interaction Networks (RINs), see http://carnaval.lri.fr/ .
-     - Possibly, automated detection of HLs and ILs from the 3D Motif Atlas (BGSU). Maybe. Their own website already does the job.
-     - A field estimating the quality of the sequence alignment in table family.
-     - Possibly, more metrics about the alignments coming from Infernal.
\ No newline at end of file
--- a/Database.md 0 → 100644
+++ b/Database.md 0 → 100644
+ 
+ # More about the database structure
+ To help you design your own SQL requests, we provide a description of the database tables and fields.
+ 
+ ## Table `family`, for Rfam families and their properties
+ * `rfam_acc`: The family codename, from Rfam's numbering (Rfam accession number)
+ * `description`: What RNAs fit in this family
+ * `nb_homologs`: The number of hits known to be homologous downloaded from Rfam to compute nucleotide frequencies
+ * `nb_3d_chains`: The number of 3D RNA chains mapped to the family (from Rfam-PDB mappings, or inferred using the redundancy list)
+ * `nb_total_homol`: Sum of the two previous fields, the number of sequences in the multiple sequence alignment, used to compute nucleotide frequencies
+ * `max_len`: The longest RNA sequence among the homologs (in bases, unaligned)
+ * `ali_len`: The aligned sequences length (in bases, aligned)
+ * `ali_filtered_len`: The aligned sequences length when we filter the alignment to keep only the RNANet chains (which have a 3D structure) and some gap-only columns.
+ * `comput_time`: Time required to compute the family's multiple sequence alignment in seconds,
+ * `comput_peak_mem`: RAM (or swap) required to compute the family's multiple sequence alignment in megabytes,
+ * `idty_percent`: Average identity percentage over pairs of the 3D chains' sequences from the family
+ 
+ ## Table `structure`, for 3D structures of the PDB
+ * `pdb_id`: The 4-char PDB identifier
+ * `pdb_model`: The model used in the PDB file
+ * `date`: The first submission date of the 3D structure to a public database
+ * `exp_method`: A string indicating whether the structure has been obtained by X-ray crystallography ('X-RAY DIFFRACTION'), electron microscopy ('ELECTRON MICROSCOPY'), or NMR (not seen yet)
+ * `resolution`: Resolution of the structure, in Ångströms
+ 
+ ## Table `chain`, for the datapoints: one chain mapped to one Rfam family
+ * `chain_id`: A unique identifier
+ * `structure_id`: The `pdb_id` where the chain comes from
+ * `chain_name`: The chain label, extracted from the 3D file
+ * `eq_class`: The BGSU equivalence class label containing this chain
+ * `rfam_acc`: The family which the chain is mapped to (if not mapped, value is *unmappd*)
+ * `pdb_start`: Position in the chain where the mapping to Rfam begins (absolute position, not residue number)
+ * `pdb_end`: Position in the chain where the mapping to Rfam ends (absolute position, not residue number)
+ * `reversed`: Whether the mapping numbering order differs from the residue numbering order in the mmCIF file (e.g. 4c9d, chains C and D)
+ * `issue`: Whether an issue occurred with this structure while downloading, extracting, annotating or parsing the annotation. See the file known_issues_reasons.txt for more information about why your chain is marked as an issue.
+ * `inferred`: Whether the mapping has been inferred using the redundancy list (value is 1) or is known directly from Rfam-PDB mappings (value is 0)
+ * `chain_freq_A`, `chain_freq_C`, `chain_freq_G`, `chain_freq_U`, `chain_freq_other`: Nucleotide frequencies in the chain
+ * `pair_count_cWW`, `pair_count_cWH`, ... `pair_count_tSS`: Counts of the non-canonical base-pair types in the chain (intra-chain counts only)
+ 
+ ## Table `nucleotide`, for individual nucleotide descriptors
+ * `nt_id`: A unique identifier
+ * `chain_id`: The chain the nucleotide belongs to
+ * `index_chain`: its absolute position within the portion of chain mapped to Rfam, from 1 to X. This is completely uncorrelated to any gene start or 3D chain residue numbers.
+ * `nt_position`: relative position within the portion of chain mapped to Rfam, from 0 to 1
+ * `old_nt_resnum`: The residue number in the 3D mmCIF file (it's a string actually, some contain a letter like '37A')
+ * `nt_name`: The residue type. This includes modified nucleotide names (e.g. 5MC for 5-methylcytosine)
+ * `nt_code`: One-letter name. Lowercase "acgu" letters are used for modified "ACGU" bases.
+ * `nt_align_code`: One-letter name used for sequence alignment. Initially contains only "ACGUN-" characters; gaps may then be replaced by the most common letter at this position (default)
+ * `is_A`, `is_C`, `is_G`, `is_U`, `is_other`: One-hot encoding of the nucleotide base
+ * `dbn`: character used at this position if we look at the dot-bracket encoding of the secondary structure. Includes inter-chain (RNA complexes) contacts.
+ * `paired`: empty, or comma separated list of `index_chain` values referring to nucleotides the base is interacting with. Up to 3 values. Inter-chain interactions are marked paired to '0'.
+ * `nb_interact`: number of interactions with other nucleotides. Up to 3 values. Includes inter-chain interactions.
+ * `pair_type_LW`: The Leontis-Westhof nomenclature codes of the interactions. The first letter concerns cis/trans orientation, the second this base's side interacting, and the third the other base's side.
+ * `pair_type_DSSR`: Same but using the DSSR nomenclature (Hoogsteen edge approximately corresponds to Major-groove and Sugar edge to minor-groove)
+ * `alpha`, `beta`, `gamma`, `delta`, `epsilon`, `zeta`: The 6 torsion angles of the RNA backbone for this nucleotide
+ * `epsilon_zeta`: Difference between epsilon and zeta angles
+ * `bb_type`: conformation of the backbone (BI, BII or ..)
+ * `chi`: torsion angle between the sugar and base (O-C1'-N-C4)
+ * `glyco_bond`: syn or anti configuration of the sugar-base bond
+ * `v0`, `v1`, `v2`, `v3`, `v4`: 5 torsion angles of the ribose cycle
+ * `form`: if the nucleotide is involved in a stem, the stem type (A, B or Z)
+ * `ssZp`: Z-coordinate of the 3’ phosphorus atom with reference to the 5’ base plane
+ * `Dp`: Perpendicular distance of the 3’ P atom to the glycosidic bond
+ * `eta`, `theta`: Pseudotorsions of the backbone, using phosphorus and carbon 4'
+ * `eta_prime`, `theta_prime`: Pseudotorsions of the backbone, using phosphorus and carbon 1'
+ * `eta_base`, `theta_base`: Pseudotorsions of the backbone, using phosphorus and the base center
+ * `phase_angle`: Conformation of the ribose cycle
+ * `amplitude`: Amplitude of the sugar puckering
+ * `puckering`: Conformation of the ribose cycle (10 classes depending on the phase_angle value)
+ 
+ ## Table `align_column`, for positions in multiple sequence alignments
+ * `column_id`: A unique identifier
+ * `rfam_acc`: The family's MSA the column belongs to
+ * `index_ali`: Position of the column in the alignment (starts at 1)
+ * `freq_A`, `freq_C`, `freq_G`, `freq_U`, `freq_other`: Nucleotide frequencies in the alignment at this position
+ * `gap_percent`: The frequencies of gaps at this position in the alignment (between 0.0 and 1.0)
+ * `consensus`: A consensus character (ACGUN or '-') summarizing the column, if we can. If >75% of the sequences are gaps at this position, the gap is picked as consensus. Otherwise, A/C/G/U is chosen if >50% of the non-gap positions are A/C/G/U. Otherwise, N is the consensus.
+ 
+ There always is an entry, for each family (rfam_acc), with index_ali = 0; gap_percent = 1.0; and nucleotide frequencies set to 0.0. This entry is used when the nucleotide frequencies cannot be determined because of local alignment issues.
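
Review note: the consensus rule described for `align_column.consensus` can be sketched in a few lines of Python. This is only an illustration of the rule as worded in the documentation above, not RNANet's actual implementation; the function name `column_consensus` is hypothetical, and it assumes that only '-' counts as a gap and that the column is passed as a plain string of "ACGUN-" characters.

```python
from collections import Counter


def column_consensus(column):
    """Return the consensus character of one alignment column.

    Assumption: `column` is a string over "ACGUN-", where '-' is the only gap.
    """
    total = len(column)
    if total == 0:
        return "-"
    counts = Counter(column.upper())
    gaps = counts.get("-", 0)
    # Rule 1: more than 75% of the sequences are gaps -> the gap wins.
    if gaps / total > 0.75:
        return "-"
    # Rule 2: a base wins if it covers more than 50% of the non-gap positions.
    non_gap = total - gaps  # strictly positive here, since gaps <= 75%
    for base in "ACGU":
        if counts.get(base, 0) / non_gap > 0.5:
            return base
    # Rule 3: no majority base -> 'N'.
    return "N"
```

For example, `column_consensus("----A")` gives '-', while `column_consensus("---A")` gives 'A' because exactly 75% of gaps does not exceed the threshold.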
+ 
+ ## Table `re_mapping`, to map a nucleotide to an alignment column
+ * `remapping_id`: A unique identifier
+ * `chain_id`: The chain which is mapped to an alignment
+ * `index_chain`: The absolute position of the nucleotide in the chain (from 1 to X)
+ * `index_ali`: The position of that nucleotide in its family alignment
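
Review note: since Database.md is meant to help users write their own SQL requests, a short query example may be useful. The sketch below uses Python's `sqlite3` module on a toy in-memory database. The table and column names come from the description above, but the reduced schema, the column types, and the inserted rows (`1abc`, `2xyz`) are made up for illustration. Against a real run you would presumably connect to `results/RNANet.db` instead.

```python
import sqlite3

# Nucleotides of all issue-free chains mapped to one family,
# restricted to structures at or below a resolution threshold.
QUERY = """
SELECT s.pdb_id, c.chain_name, n.index_chain, n.nt_code
FROM nucleotide AS n
JOIN chain AS c ON c.chain_id = n.chain_id
JOIN structure AS s ON s.pdb_id = c.structure_id
WHERE c.rfam_acc = ? AND c.issue = 0 AND s.resolution <= ?
ORDER BY s.pdb_id, c.chain_name, n.index_chain
"""


def build_toy_database():
    """Create a tiny in-memory database with a hypothetical subset of the schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE structure (pdb_id TEXT PRIMARY KEY, resolution REAL);
        CREATE TABLE chain (chain_id INTEGER PRIMARY KEY, structure_id TEXT,
                            chain_name TEXT, rfam_acc TEXT, issue INTEGER);
        CREATE TABLE nucleotide (nt_id INTEGER PRIMARY KEY, chain_id INTEGER,
                                 index_chain INTEGER, nt_code TEXT);
        INSERT INTO structure VALUES ('1abc', 2.5), ('2xyz', 8.0);
        INSERT INTO chain VALUES (1, '1abc', 'A', 'RF00005', 0),
                                 (2, '2xyz', 'B', 'RF00005', 0);
        INSERT INTO nucleotide VALUES (1, 1, 1, 'G'), (2, 1, 2, 'c'), (3, 2, 1, 'A');
    """)
    return conn


def nucleotides_of_family(conn, rfam_acc, max_resolution=4.0):
    """Run the query with bound parameters and return all rows."""
    return conn.execute(QUERY, (rfam_acc, max_resolution)).fetchall()
```

With the toy data, only the chain from the 2.5 Å structure passes the 4.0 Å filter.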
--- a/Dockerfile
+++ b/Dockerfile
@@ -40,6 +40,13 @@ RUN apk update && apk add --no-cache \
         musl-dev \
         py3-pip py3-wheel \
         freetype-dev zlib-dev
+ RUN addgroup -S appgroup -g 1000 && \
+     adduser -S appuser -u 1000 -G appgroup && \
+     chown -R appuser:appgroup /3D && \
+     chown -R appuser:appgroup /sequences && \
+     mkdir /runDir && \
+     chown -R appuser:appgroup /runDir
+ USER appuser
 VOLUME ["/3D", "/sequences", "/runDir"]
 WORKDIR /runDir
 ENTRYPOINT ["/RNANet/RNAnet.py", "--3d-folder", "/3D", "--seq-folder", "/sequences" ]
\ No newline at end of file
--- a/Errors.md 0 → 100644
+++ b/Errors.md 0 → 100644
+ 
+ # Warnings and errors in RNANet
+ 
+ Use Ctrl + F on this page to look for your error message in the list.
+ 
+ * **Could not load X.json with JSON package** : 
+ The JSON format produced as DSSR output could not be loaded by Python. Try deleting the file and re-running DSSR (through RNANet).
+ 
+ * **Found DSSR warning in annotation X.json: no nucleotides found. Ignoring X.** : 
+ DSSR complains because the CIF structure does not seem to contain nucleotides. This can happen on low resolution structures where only P atoms are solved, you should ignore them. This also can happen if the .cif file is corrupted (failed download, etc). Check with a 3D visualization software if your chain contains well-defined nucleotides. Try deleting the .cif and retry. If the problem persists, just ignore the chain.
+ 
+ * **Could not find nucleotides of chain X in annotation X.json. Ignoring chain X.** : Basically the same as above, but some nucleotides have been observed in another chain of the same structure. 
+ 
+ * **Could not find real nucleotides of chain X between START and STOP. Ignoring chain X.** : Same as the two above, but nucleotides can be found outside of the mapping interval. This can happen if there is a mapping problem, e.g., the interval was considered absolute when it should not have been.
+ 
+ * **Error while parsing DSSR X.json output: {custom-error}** : The DSSR annotations lack some of our required fields. It is likely that DSSR changed something in their fields names. Contact us so that we fix the problem with the latest DSSR version.
+ 
+ * **Mapping is reversed, this case is not supported (yet). Ignoring chain X.** : The mapping coordinates, as obtained from Rfam, have an end position coming before the start position (meaning, the sequence has to be reversed to map the RNA covariance model). We do not support this yet, we ignore this chain.
+ 
+ * **Error with parsing of X duplicate residue numbers. Ignoring it.** : This 3D chain contains new kind(s) of issue(s) in the residue numberings that are not part of the issues we already know how to tackle. Contact us, so that we add support for this entry.
+ 
+ * **Found duplicated index_chain N in X. Keeping only the first.** : This RNA 3D chain contains two (or more) residues with the same numbering N. This often happens when a nucleic-like ligand is annotated as part of the RNA chain, and DSSR considers it a nucleotide. By default, RNANet keeps only the first of the multiple residues with the same number. You may want to check that the produced 3D structure contains the appropriate nucleotide and no ligand.
+ 
+ * **Missing index_chain N in X !** : DSSR annotations for chain X are discontinuous, position N is missing. This means residue N has not been recognized as a nucleotide by DSSR. Is the .cif structure file corrupted ? Delete it and retry.
+ 
+ * **X sequence is too short, let's ignore it.** : We discard very short RNA chains.
+ 
+ * **Error downloading and/or extracting Rfam.cm !** : We cannot retrieve the Rfam covariance models file. RNANet tries to find it at ftp://ftp.ebi.ac.uk/pub/databases/Rfam/CURRENT/Rfam.cm.gz so check that your network is not blocking the FTP protocol (port 21 must be open on your network), and check that the address has not changed. If it has, contact us so that we update RNANet with the correct address.
+ 
+ * **Something's wrong with the SQL database. Check mysql-rfam-public.ebi.ac.uk status and try again later. Not printing statistics.** : We cannot retrieve family statistics from the Rfam public server. Check if you can connect to it by hand : `mysql -u rfamro -P 4497 -D Rfam -h mysql-rfam-public.ebi.ac.uk`. If not, check that port 4497 is open on your network.
+ 
+ * **Error downloading RFXXXXX.fa.gz: {custom-error}** : We cannot reach the Rfam FTP server to download homologous sequences. We look in ftp://ftp.ebi.ac.uk/pub/databases/Rfam/CURRENT/fasta_files/ so, check if you can access it from your network (check that port 21 is opened on your network). Check if the address has changed and notify us.
+ 
+ * **Error downloading NR list !** : We cannot download BGSU's equivalence classes from their website. Check if you can access http://rna.bgsu.edu/rna3dhub/nrlist/download/current/20.0A/csv from a web browser. Their website is sometimes unresponsive; in that case, the previous download is re-used.
+ 
+ * **Error downloading the LSU/SSU database from SILVA** : We cannot reach SILVA's arb files. We are looking for http://www.arb-silva.de/fileadmin/arb_web_db/release_132/ARB_files/SILVA_132_LSURef_07_12_17_opt.arb.gz and http://www.arb-silva.de/fileadmin/silva_databases/release_138/ARB_files/SILVA_138_SSURef_05_01_20_opt.arb.gz , can you download and extract them from your web browser and place them in the realigned/ subfolder ?
+ 
+ * **Assuming mapping to RFXXXXX is an absolute position interval.** : The mapping provided by Rfam concerns a nucleotide interval START-END, but no nucleotides are defined in 3D in that interval. When this happens, we assume that the numbering is not relative to the residue numbers in the 3D file, but to the absolute position in the chain, starting at 1. And yes, we tried to apply this behavior to all mappings; this yields the opposite issue, where some mappings fall outside the available nucleotides. To be solved the day Rfam explains how they build the mappings.
+ 
+ * **Added newly discovered issues to known issues** : You discovered new chains that cannot be perfectly understood as they actually are, congrats. For each chain of the list, another warning has been raised, refer to them. 
+ 
+ * **Structures without referenced chains have been detected.** : Something went wrong, because the database contains references to 3D structures that are not used by any entry in the `chain` table. You should rerun RNANet. The option `--only` may help to rerun it just for one chain.
+ 
+ * **Chains without referenced structures have been detected** : 
+ Something went wrong, because the database contains references to 3D chains that are not used by any entry in the `structure` table. You should rerun RNANet. The option `--only` may help to rerun it just for one chain.
+ 
+ * **Chains were not remapped** : Something went wrong, because the database contains references to 3D chains that are not used by any entry in the `re_mapping` table, assuming you were interested in homology data. You should rerun RNANet. The option `--only` may help to rerun it just for one chain. If you are not interested in homology data, use option `--no-homology` to skip alignment and remapping steps.
+ 
+ * **Operational Error: database is locked, retrying in 0.2s** : Too many workers are trying to access the database at the same time. Do not try to run several instances of RNANet in parallel. Even with only one instance, this might still happen if your storage device has slow I/O. Try running RNANet from an SSD.
+ 
+ * **Tried to reach database 100 times and failed. Aborting.** : Same as above, but in a more serious way.
+ 
+ * **Nothing to do !** : RNANet is up-to-date, or did not detect any modification to do, so nothing changed in your database.
+ 
+ * **KeyboardInterrupt, terminating workers.** : You interrupted the computation by pressing Ctrl+C. The database may be in an unstable state, rerun RNANet to solve the problem.
+ 
+ * **Found mappings to RFXXXXX in both directions on the same interval, keeping only the 5'->3' one.**  : A chain has been mapped to family RFXXXXX, but the mapping has been found twice, with the limits inverted. We only keep one (in 5'->3' sense).
+ 
+ * **There are mappings for RFXXXXX in both directions** : A chain has been mapped to family RFXXXXX several times, and the mappings are not in the same sequence direction (some are reversed, with END < START). In that case, we cannot decide which mapping to use for this chain, and we abort. 
+ 
+ * **Unable to download XXXX.cif. Ignoring it.** :  We cannot access a certain 3D structure from RCSB's download site, can you access it from your web browser and put it in the RNAcifs/ folder ? We look at http://files.rcsb.org/download/XXXX.cif , replacing XXXX by the right PDB code.
+ 
+ * **Wtf, structure XXXX has no resolution ? Check https://files.rcsb.org/header/XXXX.cif to figure it out.** : We cannot find the resolution of structure XXXX from the .cif file. We are looking for it in the fields `_refine.ls_d_res_high`, `_refine.ls_d_res_low`, and `_em_3d_reconstruction.resolution`. Maybe the information is stored in another field ? If you find it, contact us so that we support this new CIF field.
+ 
+ * **Could not find annotations for X, ignoring it.** : It seems that DSSR has not been run for structure X, or failed. Rerun RNANet.
+ 
+ * **Nucleotides not inserted: {custom-error}** : For some reason, no nucleotides were saved to the database for this chain. Contact us.
+ 
+ * **Removing N doublons from existing RFXXXXX++.fa and using their newest version** : You are trying to re-compute sequence alignments of 3D structures that had already been computed in the past. The duplicates will be removed from the alignment and recomputed, in case the sequences have changed.
+ 
+ * **Removing N doublons from existing RFXXXXX++.stk and using their newest version** :  Same as above.
+ 
+ * **Error during sequence alignment: {custom-error}** : Something went wrong during sequence alignment. Recompute the alignments using the `--update-homologous` option.
+ 
+ * **Failed to realign RFXXXXX (killed)** : You ran out of memory while computing multiple sequence alignments. Try to run RNANet on a machine with at least 32 GB of RAM.
+ 
+ * **RFXXXXX's alignment is wrong. Recompute it and retry.** : We could not load RFXXXXX's multiple sequence alignment. It may have failed to compute, or be corrupted. Recompute the alignments using the `--update-homologous` option.
\ No newline at end of file
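
Review note: the "database is locked, retrying in 0.2s" and "Tried to reach database 100 times and failed" messages documented in Errors.md describe a retry loop around SQLite access. A minimal sketch of such a loop is shown below for readers who want to reproduce the behavior in their own scripts. It is an assumption about the general pattern, not RNANet's actual code; the function name `execute_with_retry` and its parameters are hypothetical, with defaults taken from the two messages.

```python
import sqlite3
import time


def execute_with_retry(conn, sql, params=(), max_tries=100, delay=0.2):
    """Run one SQL statement, retrying while the database is locked.

    Assumption: a lock shows up as sqlite3.OperationalError whose message
    contains the word 'locked'; any other error is re-raised immediately.
    """
    for _ in range(max_tries):
        try:
            rows = conn.execute(sql, params).fetchall()
            conn.commit()
            return rows
        except sqlite3.OperationalError as err:
            if "locked" not in str(err):
                raise
            time.sleep(delay)  # wait before the next attempt
    raise RuntimeError(f"Tried to reach database {max_tries} times and failed.")
```

Usage is the same as a plain `conn.execute(...)` call, for example `execute_with_retry(conn, "SELECT COUNT(*) FROM chain")` on an open connection.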
--- a/FAQ.md 0 → 100644
+++ b/FAQ.md 0 → 100644
+ 
+ # FAQ
+ 
+ * **What is the difference between . and - in alignments ?**
+ 
+ In `cmalign` alignments, '-' means a nucleotide is missing compared to the covariance model. It represents a deletion. The dot '.' indicates that another chain has an insertion compared to the covariance model. The current chain does not lack anything; another chain has more.
+ 
+ In the final filtered alignment that we provide for download, the same rule applies, but on top of that, some '.' are replaced by '-' when a gap in the 3D structure (a missing, unresolved nucleotide) is mapped to an insertion gap.
+ 
+ * **Why are there some gap-only columns in the alignment ?**
+ 
+ These columns are not completely gap-only, they contain at least one dash-gap '-'. This means an actual, physical nucleotide which should exist in the 3D structure should be located there. The previous and following nucleotides are **not** contiguous in space in 3D.
+ 
+ * **Why is the numbering of residues in my 3D chain weird ?**
+ 
+ Probably because the numbering in the original chain already was a mess, and the RNANet re-numbering process failed to understand it correctly. If you ran RNANet yourself, check the `logs/` folder and find your chain's log. It explains how the chain was re-numbered.
+ 
+ * **What is your standardized way to re-number residues ?**
+ 
+ We first remove the nucleotides whose number is outside the family mapping (if any). Then, we renumber the following way:
+ 
+     0) For truncated chains, we shift the numbering of every nucleotide so that the first nucleotide is 1.
+     1) We identify duplicate residue numbers and increase by 1 the numbering of all nucleotides starting at the duplicate, recursively, until we find a gap in the numbering sequence. If no gap is found, residue numbers are shifted until the end of the chain.
+     2) We proceed similarly for nucleotides with letter numbering (e.g. 17, 17A and 17B will be renumbered to 17, 18 and 19, and the following nucleotides in the chain are also shifted).
+     3) Nucleotides with partial numbering and a letter are hopefully detected and processed with their correct numbering (e.g. in ...1629, 1630, 163B, 1631, ... the residue 163B has nothing to do with number 163 or 164, the series will be renumbered 1629, 1630, 1631, 1632 and the following will be shifted).
+     4) Nucleotides numbered -1 at the beginning of a chain are shifted (with the following ones) to 1.
+     5) Ligands at the end of the chain are removed. Any residue which is not A/C/G/U and has no defined puckering or no defined torsion angles is detected as a ligand. Residues are also considered to be ligands if they are at the end of the chain with a residue number more than 50 above that of the previous residue (ligands are sometimes numbered 1000 or 9999). Finally, residues "GNG", "E2C", "OHX", "IRI", "MPD", "8UZ" at the end of a chain are removed.
+     6) Ligands at the beginning of a chain are removed. DSSR annotates them with index_chain 1, 2, 3..., so we can detect that there is a redundancy with the real nucleotides 1, 2, 3. We keep only the first, which hopefully is the real nucleotide. We also remove the ones that have a negative number (since we renumbered the truncated chain to 1, some became negative).
+     7) We attempt to detect and renumber nucleotides with creative, disruptive numbering, even if the numbers fall outside the family mapping interval. For example, the series ... 1003, 2003, 3003, 1004... will be renumbered ...1003, 1004, 1005, 1006 ... and the following accordingly.
+     8) Nucleotides missing from portions not resolved in 3D are created as gaps, with correct numbering, to fill the portion between the previous and the following resolved ones.
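
Review note: steps 1 and 2 of the list above (duplicates and letter-suffixed numbers pushed forward until a gap absorbs the shift) can be illustrated with a deliberately simplified sketch. This is not RNANet's renumbering code. The function `renumber` is hypothetical, takes residue labels as strings in chain order, and simply forces the numbering to be strictly increasing. It does not handle ligands, truncated chains, or the "creative" numbering of step 7.

```python
import re

# A residue label: an integer, optionally followed by one insertion letter.
LABEL = re.compile(r"^(-?\d+)[A-Za-z]?$")


def renumber(labels):
    """Return strictly increasing integers for residue labels in chain order."""
    new_numbers = []
    previous = None
    for label in labels:
        match = LABEL.match(label)
        if match is None:
            raise ValueError(f"Unrecognized residue label: {label!r}")
        number = int(match.group(1))
        # A duplicate or out-of-order number is pushed just after the previous
        # residue; the shift propagates until a gap in the numbering absorbs it.
        if previous is not None and number <= previous:
            number = previous + 1
        new_numbers.append(number)
        previous = number
    return new_numbers
```

For instance, `renumber(["17", "17A", "17B", "18", "19", "25"])` returns `[17, 18, 19, 20, 21, 25]`: the residues after 17B are shifted, and the gap before 25 stops the shift.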
+ 
+ * **What are the versions of the dependencies you use ?**
+ 
+ `cmalign` is v1.1.3, `sina` is v1.6.0, `x3dna-dssr` is v1.9.9, Biopython is v1.78.
+     
\ No newline at end of file
--- a/INSTALL.md 0 → 100644
+++ b/INSTALL.md 0 → 100644
+ 
+ * [Required computational resources](#required-computational-resources)
+ * [Method 1 : Using Docker](#method-1-:-installation-using-docker)
+ * [Method 2 : Classical command-line installation](#method-2-:-classical-command-line-installation-linux-only)
+ * [Command options](#command-options)
+ * [Computation time](#computation-time)
+ * [Post-computation tasks](#post-computation-tasks-estimate-quality)
+ * [Output files](#output-files)
+ 
+ # Required computational resources
+ - CPU: no requirements. The program is optimized for multi-core CPUs, you might want to use Intel Xeons, AMD Ryzens, etc.
+ - GPU: not required
+ - RAM: 16 GB with a large swap partition is okay. 32 GB is recommended (usage peaks at ~27 GB)
+ - Storage: to date, it takes 60 GB for the 3D data (36 GB if you don't use the --extract option), 11 GB for the sequence data, and 7 GB for the outputs (5.6 GB database, 1 GB archive of CSV files). You need to add a few more for the dependencies. Pick a 100 GB partition and you are good to go. The computation speed is way better if you use a fast storage device (e.g. SSD instead of hard drive, or even better, an NVMe SSD) because of constant I/O with the SQLite database.
+ - Network : We query the Rfam public MySQL server on port 4497. Make sure your network enables communication (there should not be any issue on private networks, but maybe your company/university closes ports by default). You will get an error message if the port is not open. Around 30 GB of data is downloaded.
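
Review note: the network requirement above can be checked before launching a long run. The sketch below is one possible way to test whether a TCP port is reachable, using only the Python standard library. The helper name `port_is_open` is hypothetical and is not part of RNANet. The host and port in the usage line are the ones quoted in this documentation.

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        # create_connection resolves the host and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Typical usage would be `port_is_open("mysql-rfam-public.ebi.ac.uk", 4497)`. A `False` result suggests that the port is filtered on your network or that the server is down.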
+ 
+ # Method 1 : Installation using Docker
+ 
+ * Step 1 : Download the [Docker container](https://entrepot.ibisc.univ-evry.fr/d/1aff90a9ef214a19b848/files/?p=/rnanet_v1.3_docker.tar&dl=1). Open a terminal and move to the appropriate directory.
+ * Step 2 : Extract the archive to a Docker image named *rnanet* in your local installation
+ ```
+ $ docker load -i rnanet_v1.3_docker.tar
+ ```
+ * Step 3 : Run the container, giving it 3 folders to mount as volumes: a first to store the 3D data, a second to store the sequence data and alignments, and a third to output the results, data and logs:
+ ```
+ $ docker run --rm -v path/to/3D/data/folder:/3D -v path/to/sequence/data/folder:/sequences -v path/to/experiment/results/folder:/runDir rnanet [ - other options ]
+ ```
+ 
+ Typical usage:
+ ```
+ nohup bash -c 'time docker run --rm -v /path/to/3D/data/folder:/3D -v /path/to/sequence/data/folder:/sequences -v /path/to/experiment/folder:/runDir rnanet -s --no-logs ' &
+ ```
+ 
+ 
+ # Method 2 : Classical command line installation (Linux only)
+ 
+ You need to install the dependencies:
+ - DSSR, you need to register to the X3DNA forum [here](http://forum.x3dna.org/site-announcements/download-instructions/) and then download the DSSR binary [on that page](http://forum.x3dna.org/downloads/3dna-download/).  Make sure to have the `x3dna-dssr` binary in your $PATH variable so that RNANet.py finds it.
+ - Infernal, to download at [Eddylab](http://eddylab.org/infernal/), several options are available depending on your preferences. Make sure to have the `cmalign`, `esl-alimanip`, `esl-alipid` and `esl-reformat` binaries in your $PATH variable, so that RNANet.py can find them.
+ - SINA, follow [these instructions](https://sina.readthedocs.io/en/latest/install.html) for example. Make sure to have the `sina` binary in your $PATH.
+ - Sqlite 3, available under the name *sqlite* in every distro's package manager,
+ - Python >= 3.8, (Unfortunately, python3.6 is no longer supported, because of changes in the multiprocessing and Threading packages. Untested with Python 3.7.\*)
+ - The following Python packages: `python3.8 -m pip install biopython matplotlib pandas psutil pymysql requests scipy setproctitle sqlalchemy tqdm`. 
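
Review note: because several external binaries must be reachable through `$PATH`, a quick pre-flight check can save a failed run. The following is a small sketch using `shutil.which`. The binary names are those listed in the dependency list above, while the helper `missing_binaries` is a hypothetical name and not something RNANet provides.

```python
import shutil

# Binaries named in the installation instructions above.
REQUIRED_BINARIES = [
    "x3dna-dssr",
    "cmalign",
    "esl-alimanip",
    "esl-alipid",
    "esl-reformat",
    "sina",
]


def missing_binaries(names=REQUIRED_BINARIES):
    """Return the names that cannot be found in the current $PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_binaries()
    if missing:
        print("Not found in $PATH:", ", ".join(missing))
    else:
        print("All required binaries were found.")
```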
+ 
+ Then, run it from the command line, preferably using nohup if your shell will be interrupted:
+ ```
+  ./RNANet.py --3d-folder path/to/3D/data/folder --seq-folder path/to/sequence/data/folder [ - other options ]
+ ```
+ 
+ Typical usage:
+ ```
+ nohup bash -c 'time ~/Projects/RNANet/RNAnet.py --3d-folder ~/Data/RNA/3D/ --seq-folder ~/Data/RNA/sequences -s --no-logs' &
+ ```
+ 
+ # Command options
+ 
+ The detailed list of options is below:
+ 
+ ```
+ -h [ --help ]                   Print this help message
+ --version                       Print the program version
+ 
+ -f [ --full-inference ]         Infer new mappings even if Rfam already provides some. Yields more copies of chains
+                                 mapped to different families.
+ -r 4.0 [ --resolution=4.0 ]     Maximum 3D structure resolution to consider a RNA chain.
+ -s                              Run statistics computations after completion
+ --extract                       Extract the portions of 3D RNA chains to individual mmCIF files.
+ --keep-hetatm=False             (True | False) Keep ions, waters and ligands in produced mmCIF files. 
+                                 Does not affect the descriptors.
+ --3d-folder=…                   Path to a folder to store the 3D data files. Subfolders will contain:
+                                         RNAcifs/                Full structures containing RNA, in mmCIF format
+                                         rna_mapped_to_Rfam/     Extracted 'pure' RNA chains
+                                         datapoints/             Final results in CSV file format.
+ --seq-folder=…                  Path to a folder to store the sequence and alignment files. Subfolders will be:
+                                         rfam_sequences/fasta/   Compressed hits to Rfam families
+                                         realigned/              Sequences, covariance models, and alignments by family
+ --no-homology                   Do not try to compute PSSMs and do not align sequences.
+                                 Allows to yield more 3D data (consider chains without a Rfam mapping).
+ 
+ --all                           Build chains even if they already are in the database.
+ --only                          Ask to process a specific chain label only
+ --ignore-issues                 Do not ignore already known issues and attempt to compute them
+ --update-homologous             Re-download Rfam and SILVA databases, realign all families, and recompute all CSV files
+ --from-scratch                  Delete database, local 3D and sequence files, and known issues, and recompute.
+ --archive                       Create a tar.gz archive of the datapoints text files, and update the link to the latest archive
+ --no-logs                       Do not save per-chain logs of the numbering modifications
+ ```
+ Options --3d-folder and --seq-folder are mandatory for command-line installations, but should not be used for installations with Docker. In the Docker container, they are set by default to the paths you provide with the -v options.
+ 
+ The most useful options in that list are 
+ * ` --extract`, to actually produce some re-numbered 3D mmCIF files of the RNA chains individually,
+ * ` --no-homology`, to ignore the family mapping and sequence alignment parts and only focus on 3D data download and annotation. This would yield more data since many RNAs are not mapped to any Rfam family.
+ * ` -s`, to run the "statistics" which are a few useful post-computation tasks such as:
+     * Computation of sequence identity matrices
+     * Statistics over the sequence lengths, nucleotide frequencies, and basepair types by RNA family
+     * Overall database content statistics
+ 
+ # Computation time 
+ 
+ To give you an estimate, our last full run took exactly 12h, excluding the time to download the mmCIF files containing RNA (around 25 GB to download) and the time to compute statistics.
+ Measured on the 23rd of June 2020 on a 16-core AMD Ryzen 7 3700X CPU @3.60GHz, with 32 GB of RAM and a 7200rpm hard drive. Total CPU time spent: 135 hours (user+kernel modes), corresponding to 12h (actual time spent with the 16-core CPU). 
+ 
+ Update runs are much quicker, around 3 hours. It depends mostly on what RNA families are concerned by the update.
+ 
+ 
+ # Post-computation tasks (estimate quality)
+ If you did not ask for an automatic run of the statistics over the produced dataset with the `-s` option, you can run them later using the file `statistics.py`.
+ ```
+ python3.8 statistics.py --3d-folder path/to/3D/data/folder --seq-folder path/to/sequence/data/folder -r 20.0
+ ```
+ /!\ Beware: if no threshold is specified with option `-r`, no resolution threshold is applied and all the data in RNANet.db is used.
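
To illustrate what a resolution threshold means in terms of the database, here is a minimal sketch of the corresponding SQL filter on the `structure` table (columns `pdb_id` and `resolution`, as described in Database.md). It is not the code of `statistics.py`: the function name `structures_within`, the use of `<=` for the comparison, and the toy rows in an in-memory database are all assumptions made for the example.

```python
import sqlite3


def structures_within(connection, max_resolution=None):
    """Return the pdb_id of structures whose resolution is <= max_resolution.

    With max_resolution=None, no threshold is applied and every structure is returned.
    """
    if max_resolution is None:
        rows = connection.execute("SELECT pdb_id FROM structure ORDER BY pdb_id")
    else:
        rows = connection.execute(
            "SELECT pdb_id FROM structure WHERE resolution <= ? ORDER BY pdb_id",
            (max_resolution,),
        )
    return [row[0] for row in rows]


# Toy in-memory stand-in for results/RNANet.db (hypothetical identifiers and values).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE structure (pdb_id TEXT PRIMARY KEY, resolution REAL)")
conn.executemany(
    "INSERT INTO structure VALUES (?, ?)",
    [("toy1", 2.5), ("toy2", 4.0), ("toy3", 25.0)],
)

print(structures_within(conn, 20.0))  # threshold applied
print(structures_within(conn))        # no threshold: everything is used
```

On a real installation, the same function could be pointed at `sqlite3.connect("results/RNANet.db")` instead of the in-memory database.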
+ 
+ By default, this performs the following tasks:
+ * Computation of sequence identity matrices
+ * Statistics over the sequence lengths, nucleotide frequencies, and basepair types by RNA family
+ * Overall database content statistics
+ 
+ If you have run RNANet once with options `--no-homology` and `--extract`, you unlock new statistics over unmapped chains:
+ * You can use option `--wadley` to automatically reproduce the results of Wadley et al. (2007), a clustering of the pseudotorsion angles of the backbone.
+ * (experimental) You can use option `--distance-matrices` to compute pairwise residue distances within every chain, then their average and standard deviation by RNA family. This is meant to capture the average shape of an RNA family.
+ 
+ # Output files
+ 
+ * `results/RNANet.db` is a SQLite database file containing several tables with all the information, which you can query yourself with your custom requests,
+ * `3D-folder-you-passed-in-option/datapoints/*` are flat text CSV files, one per RNA chain mapped to one RNA family, gathering the per-position nucleotide descriptors,
+ * `archive/RNANET_datapoints_{DATE}.tar.gz` is a compressed archive of the above CSV files (only if you passed the `--archive` option),
+ * `archive/RNANET_alignments_latest.tar.gz` is a compressed archive of multiple sequence alignments in FASTA format, one per RNA family, including only the portions of chains with a 3D structure which are mapped to a family. Each alignment was computed with all the Rfam sequences of that family, which were then removed from the file,
+ * `path-to-3D-folder-you-passed-in-option/rna_mapped_to_Rfam`: if you used the `--extract` option, this folder contains one mmCIF file per RNA chain mapped to one RNA family, without the other chains or proteins (and, by default, without ions and ligands). If you used both `--extract` and `--no-homology`, this folder is called `rna_only`.
+ * `results/summary.csv` summarizes information about the RNA chains
+ * `results/families.csv` summarizes information about the RNA families
+ * `results/pair_types.csv` summarizes statistics about base-pair types in every family.
+ * `results/frequencies.csv` summarizes statistics about nucleotide frequencies in every family (including all known modified bases)
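
As a quick way to check what a run produced, the sketch below lists the files found in the `datapoints/` subfolder of the 3D folder. The helper name `list_datapoints` is hypothetical, and the demonstration builds a temporary folder with two empty placeholder files instead of relying on a real RNANet output.

```python
import tempfile
from pathlib import Path


def list_datapoints(three_d_folder):
    """Return the sorted list of files stored in <three_d_folder>/datapoints/."""
    datapoints = Path(three_d_folder) / "datapoints"
    return sorted(p for p in datapoints.iterdir() if p.is_file())


# Demonstration on a temporary folder; the file names are placeholders only.
with tempfile.TemporaryDirectory() as folder:
    (Path(folder) / "datapoints").mkdir()
    for name in ("placeholder_b", "placeholder_a"):
        (Path(folder) / "datapoints" / name).touch()
    found = [p.name for p in list_datapoints(folder)]
    print(found)
```

With a real installation, pass the same path you gave to `--3d-folder`.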
+ 
+ Other folders are created and not deleted. You might want to keep them to avoid re-computations in later runs:
+ 
+ * `path-to-sequence-folder-you-passed-in-option/rfam_sequences/fasta/` contains compressed FASTA files of the homologous sequences used, by Rfam family.
+ * `path-to-sequence-folder-you-passed-in-option/realigned/` contains the families' covariance models (\*.cm), unaligned lists of sequences (\*.fa), and multiple sequence alignments in both Stockholm and Aligned-FASTA formats (\*.stk and \*.afa). It also contains the SINA homologous sequence databases LSU.arb and SSU.arb, and their index files (\*.sidx).
+ * `path-to-3D-folder-you-passed-in-option/RNAcifs/` contains mmCIF structures directly downloaded from the PDB, which contain RNA chains,
+ * `path-to-3D-folder-you-passed-in-option/annotations/` contains the raw JSON annotation files of the previous mmCIF structures. You may find additional information in them which is not properly supported by RNANet yet.
\ No newline at end of file
--- a/KnownIssues.md 0 → 100644
View file @b5935bd
+++ b/KnownIssues.md 0 → 100644
View file @b5935bd
+ # Known Issues
+ 
+ ## Annotation and numbering issues
+ * Some GDPs that are listed as HETATMs in the mmCIF files are not correctly detected as real nucleotides (e.g. 1e8o-E).
+ * Some chains are split into several pieces with different chain names, for unknown reasons (e.g. 6ztp-AX).
+ * Some chains are not correctly renamed A in the produced separate files (e.g. 1d4r-B)
+ 
+ ## Alignment issues
+ * [SOLVED] Filtered alignments are shorter than the number of alignment columns saved to the SQL table `align_column`
+ * Chain names appear three times in the FASTA header (e.g. 1d4r[1]-B 1d4r[1]-B 1d4r[1]-B)
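
Until the repeated chain names are fixed, a reader of the FASTA files can collapse them on their side. The sketch below is a hypothetical workaround, not part of RNANet: `collapse_repeated_label` is an illustrative name, and it only rewrites headers made of one single label repeated several times, leaving every other header untouched.

```python
def collapse_repeated_label(header):
    """Collapse a FASTA header made of the same label repeated several times.

    ">1d4r[1]-B 1d4r[1]-B 1d4r[1]-B" becomes ">1d4r[1]-B".
    Headers whose tokens differ are returned unchanged.
    """
    prefix = ">" if header.startswith(">") else ""
    tokens = header[len(prefix):].split()
    if len(tokens) > 1 and len(set(tokens)) == 1:
        return prefix + tokens[0]
    return header


print(collapse_repeated_label(">1d4r[1]-B 1d4r[1]-B 1d4r[1]-B"))
```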
+ 
+ ## Technical running issues
+ * [SOLVED] Files produced by Docker containers are owned by root and require root permissions to be read 
+ * [SOLVED] SQLite WAL files are not deleted properly
+ 
+ # Known feature requests
+ * [DONE] Get filtered versions of the sequence alignments containing the 3D chains, publicly available for download
+ * [DONE] Get a consensus residue for each alignment column
+ * [DONE] Get an option to limit the number of cores 
+ * [UPCOMING] Automated annotation of detected Recurrent Interaction Networks (RINs), see http://carnaval.lri.fr/ .
+ * [UPCOMING] Possibly, automated detection of HLs and ILs from the 3D Motif Atlas (BGSU). Maybe. Their own website already does the job.
+ * A field estimating the quality of the sequence alignment in table family.
+ * Possibly, more metrics about the alignments coming from Infernal.
\ No newline at end of file
--- a/README.md
View file @b5935bd
+++ b/README.md
View file @b5935bd
 # RNANet
- Building a dataset following the ProteinNet philosophy, but for RNA.
- 
- We use the Rfam mappings between 3D structures and known Rfam families, using the sequences that are known to belong to an Rfam family (hits provided in RF0XXXX.fasta files from Rfam).
- Future versions might compute a real MSA-based clusering directly with Rfamseq ncRNA sequences, like ProteinNet does with protein sequences, but this requires a tool similar to jackHMMER in the Infernal software suite, which is not available yet.
- 
- This script prepares the dataset from available public data in PDB and Rfam.
 
 Contents:
- * [What it does](#what-it-does)
- * [Output files](#output-files)
- * [How to run](#how-to-run)
-     * [Required computational resources](#required-computational-resources)
-     * [Using Docker](#using-docker)
-     * [Using classical command line installation](#using-classical-command-line-installation)
-     * [Post-computation task: estimate quality](#post-computation-task:-estimate-quality)
+ * [What is RNANet ?](#what-is-rnanet)
+ * [Install and run RNANet](INSTALL.md)
 * [How to further filter the dataset](#how-to-further-filter-the-dataset)
     * [Filter on 3D structure resolution](#filter-on-3D-structure-resolution)
     * [Filter on 3D structure publication date](#filter-on-3d-structure-publication-date)
     * [Filter to avoid chain redundancy when several mappings are available](#filter-to-avoid-chain-redundancy-when-several-mappings-are-available)
- * [More about the database structure](#more-about-the-database-structure)
+ * [Database tables documentation](Database.md)
+ * [FAQ](FAQ.md)
 * [Troubleshooting](#troubleshooting)
 * [Contact](#contact)
 
- **Please cite**: *Coming soon, expect it in 2021*
+ ## Cite us
 
- # What it does
- The script follows these steps:
- * Gets a list of 3D structures containing RNA from BGSU's non-redundant list (but keeps the redundant structures /!\\),
- * Asks Rfam for mappings of these structures onto Rfam families (~50% of structures have a direct mapping, some more are inferred using the redundancy list)
- * Downloads the corresponding 3D structures (mmCIFs)
- * If desired, extracts the right chain portions that map onto an Rfam family
+ * Louis Becquey, Eric Angel, and Fariza Tahi, (2020) **RNANet: an automatically built dual-source dataset integrating homologous sequences and RNA structures**, *Bioinformatics*, 2020, btaa944, [DOI](https://doi.org/10.1093/bioinformatics/btaa944), [Read the OpenAccess paper here](https://doi.org/10.1093/bioinformatics/btaa944)
 
- Now, compute the features:
+ Additional relevant references:
 
- * Extract the sequence for every 3D chain
- * Downloads Rfamseq ncRNA sequence hits for the concerned Rfam families
- * Realigns Rfamseq hits and sequences from the 3D structures together to obtain a multiple sequence alignment for each Rfam family (using `cmalign --cyk`, except for ribosomal LSU and SSU, where SINA is used)
- * Computes nucleotide frequencies at every position for each alignment
- * For each aligned 3D chain, get the nucleotide frequencies in the corresponding RNA family for each residue
+ The "ProteinNet" philosophy which inspired this work:
+ * AlQuraishi, M. (2019b). **ProteinNet: A standardized data set for machine learning of protein structure.** *BMC Bioinformatics*, 20(1), 311
 
- Then, compute the labels:
+ If you use our annotations by DSSR, you might want to cite:
+ * Lu, X.-J. et al. (2015). **DSSR: An integrated software tool for dissecting the spatial structure of RNA.** *Nucleic Acids Research*, 43(21), e142–e142.
 
- * Run DSSR on every RNA structure to get a variety of descriptors per position, describing secondary and tertiary structure. Basepair types annotations include intra-chain and inter-chain interactions.
+ If you use our multiple sequence alignments and homology data, you might want to cite:
+ * Pruesse, E. et al. (2012). **SINA: accurate high-throughput multiple sequence alignment of ribosomal RNA genes.** *Bioinformatics*, 28(14), 1823–1829
+ * Nawrocki, E. P. and Eddy, S. R. (2013). **Infernal 1.1: 100-fold faster RNA homology searches.** *Bioinformatics*, 29(22), 2933–2935.
 
- Finally, export this data from the SQLite database into flat CSV files.
 
- # Output files
+ # What is RNANet ?
+ RNANet is a multiscale dataset of non-coding RNA structures, including sequences, secondary structures, non-canonical interactions, 3D geometrical descriptors, and sequence homology.
 
- * `results/RNANet.db` is a SQLite database file containing several tables with all the information, which you can query yourself with your custom requests,
- * `3D-folder-you-passed-in-option/datapoints/*` are flat text CSV files, one for one RNA chain mapped to one RNA family, gathering the per-position nucleotide descriptors,
- * `archive/RNANET_datapoints_{DATE}.tar.gz` is a compressed archive of the above CSV files (only if you passed the --archive option)
- * `path-to-3D-folder-you-passed-in-option/rna_mapped_to_Rfam` If you used the `--extract` option, this folder contains one mmCIF file per RNA chain mapped to one RNA family, without other chains, proteins (nor ions and ligands by default). If you used both `--extract` and `--no-homology`, this folder is called `rnaonly`.
- * `results/summary.csv` summarizes information about the RNA chains
- * `results/families.csv` summarizes information about the RNA families
+ It is available in machine-learning ready formats like CSV files per chain or an SQL database.
 
- Other folders are created and not deleted, which you might want to conserve to avoid re-computations in later runs:
+ Most interestingly, nucleotides have been renumbered in a standardized way, and the 3D chains have been re-aligned with homologous sequences from the [Rfam](https://rfam.org/) database.
 
- * `path-to-sequence-folder-you-passed-in-option/rfam_sequences/fasta/` contains compressed FASTA files of the homologous sequences used, by Rfam family.
- * `path-to-sequence-folder-you-passed-in-option/realigned/` contains families covariance models (\*.cm), unaligned list of sequences (\*.fa), and multiple sequence alignments in both formats Stockholm and Aligned-FASTA (\*.stk and \*.afa). Also contains SINA homolgous sequences databases LSU.arb and SSU.arb, and their index files (\*.sidx).
- * `path-to-3D-folder-you-passed-in-option/RNAcifs/` contains mmCIF structures directly downloaded from the PDB, which contain RNA chains,
- * `path-to-3D-folder-you-passed-in-option/annotations/` contains the raw JSON annotation files of the previous mmCIF structures. You may find additional information into them which is not properly supported by RNANet yet.
 
- # How to run
- RNANet is availbale on Linux (x86-64) only. It could theoretically work on Mac using command line installation (*untested*).
+ ## Methodology
+ We use the Rfam mappings between 3D structures and known Rfam families, using the sequences that are known to belong to an Rfam family (hits provided in RF0XXXX.fasta files from Rfam).
+ Future versions might compute a real MSA-based clustering directly with Rfamseq ncRNA sequences, like ProteinNet does with protein sequences, but this requires a tool similar to jackHMMER in the Infernal software suite, which is not available yet.
 
- ## Required computational resources
- - CPU: no requirements. The program is optimized for multi-core CPUs, you might want to use Intel Xeons, AMD Ryzens, etc.
- - GPU: not required
- - RAM: 16 GB with a large swap partition is okay. 32 GB is recommended (usage peaks at ~27 GB)
- - Storage: to date, it takes 60 GB for the 3D data (36 GB if you don't use the --extract option), 11 GB for the sequence data, and 7GB for the outputs (5.6 GB database, 1 GB archive of CSV files). You need to add a few more for the dependencies. Pick a 100GB partition and you are good to go. The computation speed is way better if you use a fast storage device (e.g. SSD instead of hard drive, or even better, a NVMe SSD) because of constant I/O with the SQlite database.
- - Network : We query the Rfam public MySQL server on port 4497. Make sure your network enables communication (there should not be any issue on private networks, but maybe you company/university closes ports by default). You will get an error message if the port is not open. Around 30 GB of data is downloaded.
+ This script prepares the dataset from available public data in PDB, RNA 3D Hub, Rfam and SILVA.
 
- To give you an estimation, our last full run took exactly 12h, excluding the time to download the MMCIF files containing RNA (around 25GB to download) and the time to compute statistics.
- Measured the 23rd of June 2020 on a 16-core AMD Ryzen 7 3700X CPU @3.60GHz, plus 32 Go RAM, and a 7200rpm Hard drive. Total CPU time spent: 135 hours (user+kernel modes), corresponding to 12h (actual time spent with the 16-core CPU). 
 
- Update runs are much quicker, around 3 hours. It depends mostly on what RNA families are concerned by the update.
+ ## Pipeline
+ The script follows these steps:
 
- ## Using Docker
+ To gather structures:
+ * Gets a list of 3D structures containing RNA from BGSU's non-redundant list (but keeps the redundant structures /!\\),
+ * Asks Rfam for mappings of these structures onto Rfam families (~50% of structures have a direct mapping, some more are inferred using the redundancy list)
+ * Downloads the corresponding 3D structures (mmCIFs)
+ * If desired, extracts the right chain portions that map onto an Rfam family to a separate mmCIF file
 
- * Step 1 : Download the [Docker container](https://entrepot.ibisc.univ-evry.fr/f/e5edece989884a7294a6/?dl=1). Open a terminal and move to the appropriate directory.
- * Step 2 : Extract the archive to a Docker image named *rnanet* in your local installation
- ```
- $ docker load -i rnanet_v1.2_docker.tar
- ```
- * Step 3 : Run the container, giving it 3 folders to mount as volumes: a first to store the 3D data, a second to store the sequence data and alignments, and a third to output the results, data and logs:
- ```
- $ docker run --rm -v path/to/3D/data/folder:/3D -v path/to/sequence/data/folder:/sequences -v path/to/experiment/results/folder:/runDir rnanet [ - other options ]
- ```
+ To compute homology information:
+ * Extracts the sequence of every 3D chain
+ * Downloads Rfamseq ncRNA sequence hits for the concerned Rfam families (or ARB databases of SSU or LSU sequences from SILVA for rRNAs)
+ * Realigns Rfamseq hits and sequences from the 3D structures together to obtain a multiple sequence alignment for each Rfam family (using `cmalign --cyk`, except for ribosomal LSU and SSU, where SINA is used)
+ * Computes nucleotide frequencies at every position for each alignment
+ * Maps each nucleotide of a 3D chain to its position in the corresponding family sequence alignment
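
To make the "nucleotide frequencies at every position" step concrete, here is a minimal sketch of a per-column frequency computation over an alignment written with the "ACGUN-" alphabet. It is not RNANet's implementation. The function name `column_frequencies` is hypothetical, the output keys mimic the `freq_A` ... `freq_other` fields of the `align_column` table, and two choices are assumptions of this example: gaps are excluded from the denominator, and gap-only columns yield `None`.

```python
from collections import Counter

NUCLEOTIDES = "ACGU"


def column_frequencies(aligned_sequences):
    """Per-column nucleotide frequencies of an alignment over the "ACGUN-" alphabet."""
    if not aligned_sequences:
        return []
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("all aligned sequences must have the same length")

    frequencies = []
    for column in zip(*aligned_sequences):
        counts = Counter(column)
        # Assumption: gaps ('-') are ignored, only A, C, G, U and N are counted.
        total = sum(counts[c] for c in NUCLEOTIDES + "N")
        if total == 0:
            frequencies.append(None)  # gap-only column: nothing to estimate
            continue
        row = {f"freq_{c}": counts[c] / total for c in NUCLEOTIDES}
        row["freq_other"] = counts["N"] / total
        frequencies.append(row)
    return frequencies


# Toy alignment of three sequences and five columns (made-up data).
toy_alignment = ["ACGU-", "AC-UN", "AGGU-"]
for index_ali, row in enumerate(column_frequencies(toy_alignment), start=1):
    print(index_ali, row)
```

Mapping a nucleotide of a 3D chain to its alignment column then amounts to looking up the row with the matching `index_ali`.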
 
- The detailed list of options is below:
+ To compute 3D annotations:
+ * Runs DSSR on every RNA structure to get a variety of descriptors per position, describing secondary and tertiary structure. Base-pair type annotations include intra-chain and inter-chain interactions.
 
- ```
- -h [ --help ]                   Print this help message
- --version                       Print the program version
- 
- -f [ --full-inference ]         Infer new 3D->family mappings even if Rfam already provides some. Yields more copies of chains
-                                 mapped to different families.
- -r 4.0 [ --resolution=4.0 ]     Maximum 3D structure resolution to consider a RNA chain.
- -s                              Run statistics computations after completion
- --extract                       Extract the portions of 3D RNA chains to individual mmCIF files.
- --keep-hetatm=False             (True | False) Keep ions, waters and ligands in produced mmCIF files. 
-                                 Does not affect the descriptors.
- --fill-gaps=True                (True | False) Replace gaps in nt_align_code field due to unresolved residues
-                                 by the most common nucleotide at this position in the alignment.
- --3d-folder=…                   Path to a folder to store the 3D data files. Subfolders will contain:
-                                         RNAcifs/                Full structures containing RNA, in mmCIF format
-                                         rna_mapped_to_Rfam/     Extracted 'pure' RNA chains
-                                         datapoints/             Final results in CSV file format.
- --seq-folder=…                  Path to a folder to store the sequence and alignment files. Subfolders will be:
-                                         rfam_sequences/fasta/   Compressed hits to Rfam families
-                                         realigned/              Sequences, covariance models, and alignments by family
- --no-homology                   Do not try to compute PSSMs and do not align sequences.
-                                 Allows to yield more 3D data (consider chains without a Rfam mapping).
- 
- --all                           Build chains even if they already are in the database.
- --only                          Ask to process a specific chain label only
- --ignore-issues                 Do not ignore already known issues and attempt to compute them
- --update-homologous             Re-download Rfam and SILVA databases, realign all families, and recompute all CSV files
- --from-scratch                  Delete database, local 3D and sequence files, and known issues, and recompute.
- --archive                       Create a tar.gz archive of the datapoints text files, and update the link to the latest archive
- --no-logs                       Do not save per-chain logs of the numbering modifications
- ```
- You may not use the --3d-folder and --seq-folder options, they are set by default to the paths you provide with the -v options when running Docker.
+ Finally, exports this data from the SQLite database into flat CSV files.
 
- Typical usage:
- ```
- nohup bash -c 'time docker run --rm -v /path/to/3D/data/folder:/3D -v /path/to/sequence/data/folder:/sequences -v /path/to/experiment/folder:/runDir rnanet -s --no-logs ' &
- ```
+ ## Data provided
 
- ## Using classical command line installation
+ We provide a couple of resources to exploit this dataset. You can download them on [EvryRNA](https://evryrna.ibisc.univ-evry.fr/evryrna/rnanet/rnanet_home).
+ * A series of tables in the SQLite3 database, see [the database documentation](Database.md) and [examples of useful queries](#how-to-further-filter-the-dataset),
+ * One CSV file per RNA chain, summarizing all the relevant information about it,
+ * Filtered alignment files in FASTA format containing only the sequences with a 3D structure available in RNANet, but which have been aligned using all the homologous sequences of this family from Rfam or SILVA,
+ * Additional statistics files about nucleotide frequencies, modified bases, basepair types within each chain or by RNA family.
 
- You need to install the dependencies:
- - DSSR, you need to register to the X3DNA forum [here](http://forum.x3dna.org/site-announcements/download-instructions/) and then download the DSSR binary [on that page](http://forum.x3dna.org/downloads/3dna-download/).  Make sure to have the `x3dna-dssr` binary in your $PATH variable so that RNANet.py finds it.
- - Infernal, to download at [Eddylab](http://eddylab.org/infernal/), several options are available depending on your preferences. Make sure to have the `cmalign`, `esl-alimanip`, `esl-alipid` and `esl-reformat` binaries in your $PATH variable, so that RNANet.py can find them.
- - SINA, follow [these instructions](https://sina.readthedocs.io/en/latest/install.html) for example. Make sure to have the `sina` binary in your $PATH.
- - Sqlite 3, available under the name *sqlite* in every distro's package manager,
- - Python >= 3.8, (Unfortunately, python3.6 is no longer supported, because of changes in the multiprocessing and Threading packages. Untested with Python 3.7.\*)
- - The following Python packages: `python3.8 -m pip install biopython==1.76 matplotlib pandas psutil pymysql requests scipy setproctitle sqlalchemy tqdm`. Note that Biopython versions 1.77 or later do not work (yet) since they removed the alphabet system.
+ For now, we do not provide the set of cleaned 3D structures or the full alignments with Rfam sequences as public downloads. If you need them, [recompute them](INSTALL.md) or ask us.
 
- Then, run it from the command line, preferably using nohup if your shell will be interrupted:
- ```
-  ./RNANet.py --3d-folder path/to/3D/data/folder --seq-folder path/to/sequence/data/folder [ - other options ]
- ```
- See the list of possible options juste above in the [Using Docker](#using-docker) section. Expect hours (maybe days) of computation.
+ ## Updates
+ RNANet is updated monthly to take into account new structures proposed in the [BGSU Non-redundant lists](http://rna.bgsu.edu/rna3dhub/nrlist/). The monthly runs realign previous alignments with the new sequences using `esl-alimerge` from Infernal.
 
- Typical usage:
- ```
- nohup bash -c 'time ~/Projects/RNANet/RNAnet.py --3d-folder ~/Data/RNA/3D/ --seq-folder ~/Data/RNA/sequences --no-logs -s' &
- ```
+ It is updated yearly from scratch to take into account new Rfam sequences or updates in the covariance models, and updates in the PDB 3D files.
 
- ## Post-computation task: estimate quality
- If your did not ask for automatic run of statistics over the produced dataset with the `-s` option, you can run them later using the file statistics.py. 
- ```
- python3.8 statistics.py --3d-folder path/to/3D/data/folder --seq-folder path/to/sequence/data/folder -r 20.0
- ```
- /!\ Beware, if not precised with option `-r`, no resolution threshold is applied and all the data in RNANet.db is used.
+ For now, the SILVA releases used are fixed (LSU release 132 and SSU release 138) and not automatically updated. SILVA authors, if you read this: please provide a "latest" download link to ease automatic retrieval of the latest version.
 
- If you have run RNANet twice, once with option `--no-homology`, and once without, you unlock new statistics over unmapped chains. You will also be allowed to use option `--wadley` to reproduce Wadley & al. (2007) results automatically.
+ See what's new in the latest version of RNANet [in the CHANGELOG](CHANGELOG).
 
 # How to further filter the dataset
 You may want to build your own sub-dataset by querying the results/RNANet.db file. Here are quick examples using Python3 and its sqlite3 package.
@@ -240,133 +166,21 @@ with sqlite3.connect("results/RNANet.db) as connection:
 ```
 Then proceed to steps 2 and 3.
 
- # More about the database structure
- To help you design your own requests, here follows a description of the database tables and fields.
- 
- ## Table `family`, for Rfam families and their properties
- * `rfam_acc`: The family codename, from Rfam's numbering (Rfam accession number)
- * `description`: What RNAs fit in this family
- * `nb_homologs`: The number of hits known to be homologous downloaded from Rfam to compute nucleotide frequencies
- * `nb_3d_chains`: The number of 3D RNA chains mapped to the family (from Rfam-PDB mappings, or inferred using the redundancy list)
- * `nb_total_homol`: Sum of the two previous fields, the number of sequences in the multiple sequence alignment, used to compute nucleotide frequencies
- * `max_len`: The longest RNA sequence among the homologs (in bases, unaligned)
- * `ali_len`: The aligned sequences length (in bases, aligned)
- * `ali_filtered_len`: The aligned sequences length when we filter the alignment to keep only the RNANet chains (which have a 3D structure) and remove the gap-only columns.
- * `comput_time`: Time required to compute the family's multiple sequence alignment in seconds,
- * `comput_peak_mem`: RAM (or swap) required to compute the family's multiple sequence alignment in megabytes,
- * `idty_percent`: Average identity percentage over pairs of the 3D chains' sequences from the family
- 
- ## Table `structure`, for 3D structures of the PDB
- * `pdb_id`: The 4-char PDB identifier
- * `pdb_model`: The model used in the PDB file
- * `date`: The first submission date of the 3D structure to a public database
- * `exp_method`: A string to know wether the structure as been obtained by X-ray crystallography ('X-RAY DIFFRACTION'), electron microscopy ('ELECTRON MICROSCOPY'), or NMR (not seen yet)
- * `resolution`: Resolution of the structure, in Angstöms
- 
- ## Table `chain`, for the datapoints: one chain mapped to one Rfam family
- * `chain_id`: A unique identifier
- * `structure_id`: The `pdb_id` where the chain comes from
- * `chain_name`: The chain label, extracted from the 3D file
- * `eq_class`: The BGSU equivalence class label containing this chain
- * `rfam_acc`: The family which the chain is mapped to (if not mapped, value is *unmappd*)
- * `pdb_start`: Position in the chain where the mapping to Rfam begins (absolute position, not residue number)
- * `pdb_end`: Position in the chain where the mapping to Rfam ends (absolute position, not residue number)
- * `reversed`: Wether the mapping numbering order differs from the residue numbering order in the mmCIF file (eg 4c9d, chains C and D)
- * `issue`: Wether an issue occurred with this structure while downloading, extracting, annotating or parsing the annotation. See the file known_issues_reasons.txt for more information about why your chain is marked as an issue.
- * `inferred`: Wether the mapping has been inferred using the redundancy list (value is 1) or just known from Rfam-PDB mappings (value is 0)
- * `chain_freq_A`, `chain_freq_C`, `chain_freq_G`, `chain_freq_U`, `chain_freq_other`: Nucleotide frequencies in the chain
- * `pair_count_cWW`, `pair_count_cWH`, ... `pair_count_tSS`: Counts of the non-canonical base-pair types in the chain (intra-chain counts only)
- 
- ## Table `nucleotide`, for individual nucleotide descriptors
- * `nt_id`: A unique identifier
- * `chain_id`: The chain the nucleotide belongs to
- * `index_chain`: its absolute position within the portion of chain mapped to Rfam, from 1 to X. This is completely uncorrelated to any gene start or 3D chain residue numbers.
- * `nt_position`: relative position within the portion of chain mapped to RFam, from 0 to 1
- * `old_nt_resnum`: The residue number in the 3D mmCIF file (it's a string actually, some contain a letter like '37A')
- * `nt_name`: The residue type. This includes modified nucleotide names (e.g. 5MC for 5-methylcytosine)
- * `nt_code`: One-letter name. Lowercase "acgu" letters are used for modified "ACGU" bases.
- * `nt_align_code`: One-letter name used for sequence alignment. Contains "ACGUN-" only first, and then, gaps may be replaced by the most common letter at this position (default)
- * `is_A`, `is_C`, `is_G`, `is_U`, `is_other`: One-hot encoding of the nucleotide base
- * `dbn`: character used at this position if we look at the dot-bracket encoding of the secondary structure. Includes inter-chain (RNA complexes) contacts.
- * `paired`: empty, or comma separated list of `index_chain` values referring to nucleotides the base is interacting with. Up to 3 values. Inter-chain interactions are marked paired to '0'.
- * `nb_interact`: number of interactions with other nucleotides. Up to 3 values. Includes inter-chain interactions.
- * `pair_type_LW`: The Leontis-Westhof nomenclature codes of the interactions. The first letter concerns cis/trans orientation, the second this base's side interacting, and the third the other base's side.
- * `pair_type_DSSR`: Same but using the DSSR nomenclature (Hoogsteen edge approximately corresponds to Major-groove and Sugar edge to minor-groove)
- * `alpha`, `beta`, `gamma`, `delta`, `epsilon`, `zeta`: The 6 torsion angles of the RNA backabone for this nucleotide
- * `epsilon_zeta`: Difference between epsilon and zeta angles
- * `bb_type`: conformation of the backbone (BI, BII or ..)
- * `chi`: torsion angle between the sugar and base (O-C1'-N-C4)
- * `glyco_bond`: syn or anti configuration of the sugar-base bond
- * `v0`, `v1`, `v2`, `v3`, `v4`: 5 torsion angles of the ribose cycle
- * `form`: if the nucleotide is involved in a stem, the stem type (A, B or Z)
- * `ssZp`: Z-coordinate of the 3’ phosphorus atom with reference to the5’ base plane
- * `Dp`: Perpendicular distance of the 3’ P atom to the glycosidic bond
- * `eta`, `theta`: Pseudotorsions of the backbone, using phosphorus and carbon 4'
- * `eta_prime`, `theta_prime`: Pseudotorsions of the backbone, using phosphorus and carbon 1'
- * `eta_base`, `theta_base`: Pseudotorsions of the backbone, using phosphorus and the base center
- * `phase_angle`: Conformation of the ribose cycle
- * `amplitude`: Amplitude of the sugar puckering
- * `puckering`: Conformation of the ribose cycle (10 classes depending on the phase_angle value)
- 
- ## Table `align_column`, for positions in multiple sequence alignments
- * `column_id`: A unique identifier
- * `rfam_acc`: The family's MSA the column belongs to
- * `index_ali`: Position of the column in the alignment (starts at 1)
- * `freq_A`, `freq_C`, `freq_G`, `freq_U`, `freq_other`: Nucleotide frequencies in the alignment at this position
- 
- There always is an entry, for each family (rfam_acc), with index_ali = zero and nucleotide frequencies set to freq_other = 1.0. This entry is used when the nucleotide frequencies cannot be determined because of local alignment issues.
- 
- ## Table `re_mapping`, to map a nucleotide to an alignment column
- * `remapping_id`: A unique identifier
- * `chain_id`: The chain which is mapped to an alignment
- * `index_chain`: The absolute position of the nucleotide in the chain (from 1 to X)
- * `index_ali` The position of that nucleotide in its family alignment
- 
 # Troubleshooting
 
- ## Understanding the warnings and errors
- 
- * **Could not load X.json with JSON package** : 
- The JSON format produced as DSSR output could not be loaded by Python. Try deleting the file and re-running DSSR (through RNANet).
- * **Found DSSR warning in annotation X.json: no nucleotides found. Ignoring X.** : 
- DSSR complains because the CIF structure does not seem to contain nucleotides. This can happen on low resolution structures where only P atoms are solved, you should ignore them. This also can happen if the .cif file is corrupted (failed download, etc). Check with a 3D visualization software if your chain contains well-defined nucleotides. Try deleting the .cif and retry. If the problem persists, just ignore the chain.
- * **Could not find nucleotides of chain X in annotation X.json. Ignoring chain X.** : Basically the same as above, but some nucleotides have been observed in another chain of the same structure. 
- * **Could not find real nucleotides of chain X between START and STOP. Ignoring chain X."** : Same as the two above, but nucleotides can be found outside of the mapping interval. This can happen if there is a mapping problem, e.g., considered absolute interval when it should not.
- * **Error while parsing DSSR X.json output: {custom-error}** : The DSSR annotations lack some of our required fields. It is likely that DSSR changed something in their fields names. Contact us so that we fix the problem with the latest DSSR version.
- * **Mapping is reversed, this case is not supported (yet). Ignoring chain X.** : The mapping coordinates, as obtained from Rfam, have an end position coming before the start position (meaning, the sequence has to be reversed to map the RNA covariance model). We do not support this yet, we ignore this chain.
- * **Error with parsing of X duplicate residue numbers. Ignoring it.** : This 3D chain contains new kind(s) of issue(s) in the residue numberings that are not part of the issues we already know how to tackle. Contact us, so that we add support for this entry.
- * **Found duplicated index_chain N in X. Keeping only the first.** : This RNA 3D chain contains two (or more) residues with the same numbering N. This often happens when a nucleic-like ligand is annotated as part of the RNA chain, and DSSR considers it a nucleotide. By default, RNANet keeps only the first of the multiple residues with the same number. You may want to check that the produced 3D structure contains the appropriate nucleotide and no ligand.
- * **Missing index_chain N in X !** : DSSR annotations for chain X are discontinuous, position N is missing. This means residue N has not been recognized as a nucleotide by DSSR. Is the .cif structure file corrupted ? Delete it and retry.
- * **X sequence is too short, let's ignore it.** : We discard very short RNA chains.
- * **Error downloading and/or extracting Rfam.cm !** : We cannot retrieve the Rfam covariance models file. RNANet tries to find it at ftp://ftp.ebi.ac.uk/pub/databases/Rfam/CURRENT/Rfam.cm.gz so check that your network is not blocking the FTP protocol (port 21 must be open), and check that the address has not changed. If it has, contact us so that we update RNANet with the correct address.
- * **Something's wrong with the SQL database. Check mysql-rfam-public.ebi.ac.uk status and try again later. Not printing statistics.** : We cannot retrieve family statistics from Rfam's public server. Check whether you can connect to it by hand: `mysql -u rfamro -P 4497 -D Rfam -h mysql-rfam-public.ebi.ac.uk`. If not, check that port 4497 is open on your network.
- * **Error downloading RFXXXXX.fa.gz: {custom-error}** : We cannot reach the Rfam FTP server to download homologous sequences. We look in ftp://ftp.ebi.ac.uk/pub/databases/Rfam/CURRENT/fasta_files/ so, check if you can access it from your network (check that port 21 is opened on your network). Check if the address has changed and notify us.
- * **Error downloading NR list !** : We cannot download BGSU's equivalence classes from their website. Check if you can access http://rna.bgsu.edu/rna3dhub/nrlist/download/current/20.0A/csv from a web browser. It actually happens that their website is not responding, the previous download will be re-used.
- * **Error downloading the LSU/SSU database from SILVA** : We cannot reach SILVA's arb files. We are looking for http://www.arb-silva.de/fileadmin/arb_web_db/release_132/ARB_files/SILVA_132_LSURef_07_12_17_opt.arb.gz and http://www.arb-silva.de/fileadmin/silva_databases/release_138/ARB_files/SILVA_138_SSURef_05_01_20_opt.arb.gz , can you download and extract them from your web browser and place them in the realigned/ subfolder ?
- * **Assuming mapping to RFXXXXX is an absolute position interval.** : The mapping provided by Rfam concerns a nucleotide interval START-END, but no nucleotides are defined in 3D in that interval. When this happens, we assume that the numbering is not relative to the residue numbers in the 3D file, but to the absolute position in the chain, starting at 1. And yes, we tried to apply this behavior to all mappings, but this yields the opposite issue, where some mappings fall outside the available nucleotides. To be solved the day Rfam explains how they build the mappings.
- * **Added newly discovered issues to known issues** : You discovered new chains that cannot be perfectly understood as they actually are, congrats. For each chain of the list, another warning has been raised, refer to them. 
- * **Structures without referenced chains have been detected.** : Something went wrong, because the database contains references to 3D structures that are not used by any entry in the `chain` table. You should rerun RNANet. The option `--only` may help to rerun it just for one chain.
- * **Chains without referenced structures have been detected** : 
- Something went wrong, because the database contains references to 3D chains that are not used by any entry in the `structure` table. You should rerun RNANet. The option `--only` may help to rerun it just for one chain.
- * **Chains were not remapped** : Something went wrong, because the database contains references to 3D chains that are not used by any entry in the `re_mapping` table, assuming you were interested in homology data. You should rerun RNANet. The option `--only` may help to rerun it just for one chain. If you are not interested in homology data, use option `--no-homology` to skip alignment and remapping steps.
- * **Operational Error: database is locked, retrying in 0.2s** : Too many workers are trying to access the database at the same time. Do not try to run several instances of RNANet in parallel. Even with only one instance, this might still happen if your device has slow I/O delays. Try to run RNANet from a SSD ?
- * **Tried to reach database 100 times and failed. Aborting.** : Same as above, but in a more serious way.
- * **Nothing to do !** : RNANet is up-to-date, or did not detect any modification to do, so nothing changed in your database.
- * **KeyboardInterrupt, terminating workers.** : You interrupted the computation by pressing Ctrl+C. The database may be in an unstable state, rerun RNANet to solve the problem.
- * **Found mappings to RFXXXXX in both directions on the same interval, keeping only the 5'->3' one.**  : A chain has been mapped to family RFXXXXX, but the mapping has been found twice, with the limits inverted. We only keep one (in 5'->3' sense).
- * **There are mappings for RFXXXXX in both directions** : A chain has been mapped to family RFXXXXX several times, and the mappings are not in the same sequence sense (some are reverted, with END < START). Then, we do not know what to decide for this chain, and we abort. 
- * **Unable to download XXXX.cif. Ignoring it.** :  We cannot access a certain 3D structure from RCSB's download site, can you access it from your web browser and put it in the RNAcifs/ folder ? We look at http://files.rcsb.org/download/XXXX.cif , replacing XXXX by the right PDB code.
- * **Wtf, structure XXXX has no resolution ? Check https://files.rcsb.org/header/XXXX.cif to figure it out.** : We cannot find the resolution of structure XXXX from the .cif file. We are looking for it in the fields `_refine.ls_d_res_high`, `_refine.ls_d_res_low`, and `_em_3d_reconstruction.resolution`. Maybe the information is stored in another field ? If you find it, contact us so that we support this new CIF field.
- * **Could not find annotations for X, ignoring it.** : It seems that DSSR has not been run for structure X, or failed. Rerun RNANet.
- * **Nucleotides not inserted: {custom-error}** : For some reason, no nucleotides were saved to the database for this chain. Contact us.
- * **Removing N doublons from existing RFXXXXX++.fa and using their newest version** : You are trying to re-compute sequence alignments of 3D structures that had already been computed in the past. They will be removed from the alignment and recomputed, for the case the sequences have changed.
- * **Removing N doublons from existing RFXXXXX++.stk and using their newest version** :  Same as above.
- * **Error during sequence alignment: {custom-error}** : Something went wrong during sequence alignment. Recompute the alignments using the `--update-homologous` option.
- * **Failed to realign RFXXXXX (killed)** : You ran out of memory while computing multiple sequence alignments. Try to run RNANet on a machine with at least 32 GB of RAM.
- * **RFXXXXX's alignment is wrong. Recompute it and retry.** : We could not load RFXXXXX's multiple sequence alignment. It may have failed to compute, or be corrupted. Recompute the alignments using the `--update-homologous` option.
- 
- ## Not enough memory
- If you run out of memory, you may want to reduce the number of jobs run in parallel. #TODO: explain how
+ Check if your problem is listed in the [known issues](KnownIssues.md).
+ 
+ ### Warnings and Errors
+ If you ran RNANet and got an error or a warning that you do not fully understand, check the [Error documentation](Errors.md).
+ 
+ ### Not enough memory
+ If you run out of memory (job killed), you may want to reduce the number of jobs run in parallel. Use the `--maxcores` option with a small number to ask RNANet to limit the concurrency and the simultaneous need for a lot of RAM. The computation time will increase accordingly.
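As a rough illustration of what `--maxcores` does, the sketch below mirrors the `ncores = min(ncores, int(arg))` logic that this commit adds to `process_options()`. The function name `cap_cores` and the lower bound of 1 are assumptions made for the example; they are not part of RNANet.

```python
import getopt


def cap_cores(argv, detected_cores):
    """Return the number of cores to use after applying an optional --maxcores cap."""
    opts, _ = getopt.getopt(argv, "", ["maxcores="])
    ncores = detected_cores
    for opt, arg in opts:
        if opt == "--maxcores":
            # Same cap as in the commit: never exceed the detected CPU count.
            # The max(1, ...) guard is an addition of this sketch, not RNANet behavior.
            ncores = max(1, min(ncores, int(arg)))
    return ncores


print(cap_cores(["--maxcores=4"], 16))   # capped by the user
print(cap_cores(["--maxcores=64"], 16))  # capped by the hardware
print(cap_cores([], 16))                 # no option given
```

Asking for more cores than the machine has changes nothing, since the detected CPU count remains the upper bound.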
+ 
+ ### Not enough memory/too slow (developer trick)
+ If `--maxcores` is not enough and you have identified the step that fails, you can try editing the Python code. Look for the `coeff_ncores` argument of some function calls. This is the coefficient applied to `--maxcores` for different steps of the pipeline. You can lower it to reduce concurrency (and use less memory) or raise it to increase concurrency (and compute faster).
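To get a feeling for the effect of such a coefficient, here is a minimal sketch of how a per-step worker count could be derived from the core count. How RNANet actually rounds the product is not shown in this document, so the truncation and the minimum of one worker are assumptions, and `effective_workers` is a hypothetical helper. The coefficients 0.5 and 0.3 are the ones visible in this commit for `dl_and_annotate` and its retry.

```python
def effective_workers(ncores, coeff_ncores):
    """Number of parallel workers for one pipeline step (assumed: truncate, at least 1)."""
    return max(1, int(coeff_ncores * ncores))


ncores = 16  # e.g. the value left after applying --maxcores
for step, coeff in [("dl_and_annotate", 0.5), ("dl_and_annotate (retry)", 0.3)]:
    print(f"{step}: {effective_workers(ncores, coeff)} workers")
```

Under these assumptions, lowering a step's coefficient from 0.5 to 0.3 on a 16-core machine would halve the number of simultaneous jobs for that step, from 8 to 4.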
 
 # Contact
- louis.becquey@univ-evry.fr
+ RNANet is still in beta, which means we are truly open to (and enjoy) all the feedback we can get from interested users.
+ 
+ Please send all your questions, feature requests, bug reports or angry reacts to
+ louis.becquey@univ-evry.fr .
--- a/RNAnet.py
View file @b5935bd
+++ b/RNAnet.py
View file @b5935bd
@@ -979,9 +979,9 @@ class Pipeline:
         setproctitle("RNANet.py process_options()")
 
         try:
-             opts, _ = getopt.getopt(sys.argv[1:], "r:fhs", ["help", "resolution=", "3d-folder=", "seq-folder=", "keep-hetatm=",  "only=",
+             opts, _ = getopt.getopt(sys.argv[1:], "r:fhs", ["help", "resolution=", "3d-folder=", "seq-folder=", "keep-hetatm=",  "only=", "maxcores=",
                                                             "from-scratch", "full-inference", "no-homology", "ignore-issues", "extract", 
-                                                             "all", "no-logs", "archive", "update-homologous"])
+                                                             "all", "no-logs", "archive", "update-homologous", "version"])
         except getopt.GetoptError as err:
             print(err)
             sys.exit(2)
@@ -1000,13 +1000,19 @@ class Pipeline:
                 print("-h [ --help ]\t\t\tPrint this help message")
                 print("--version\t\t\tPrint the program version")
                 print()
-                 print("-f [ --full-inference ]\t\tInfer new mappings even if Rfam already provides some. Yields more copies of chains"
-                       "\n\t\t\t\tmapped to different families.")
-                 print("-r 4.0 [ --resolution=4.0 ]\tMaximum 3D structure resolution to consider a RNA chain.")
+                 print("Select what to do:")
+                 print("--------------------------------------------------------------------------------------------------------------")
+                 print("-f [ --full-inference ]\t\tInfer new mappings even if Rfam already provides some. Yields more copies of"
+                       "\n\t\t\t\t chains mapped to different families.")
                 print("-s\t\t\t\tRun statistics computations after completion")
                 print("--extract\t\t\tExtract the portions of 3D RNA chains to individual mmCIF files.")
                 print("--keep-hetatm=False\t\t(True | False) Keep ions, waters and ligands in produced mmCIF files. "
-                       "\n\t\t\t\tDoes not affect the descriptors.")
+                       "\n\t\t\t\t Does not affect the descriptors.")
+                 print("--no-homology\t\t\tDo not try to compute PSSMs and do not align sequences."
+                       "\n\t\t\t\t Allows to yield more 3D data (consider chains without a Rfam mapping).")
+                 print()
+                 print("Select how to do it:")
+                 print("--------------------------------------------------------------------------------------------------------------")
                 print("--3d-folder=…\t\t\tPath to a folder to store the 3D data files. Subfolders will contain:"
                       "\n\t\t\t\t\tRNAcifs/\t\tFull structures containing RNA, in mmCIF format"
                       "\n\t\t\t\t\trna_mapped_to_Rfam/\tExtracted 'pure' RNA chains"
@@ -1014,22 +1020,28 @@ class Pipeline:
                 print("--seq-folder=…\t\t\tPath to a folder to store the sequence and alignment files. Subfolders will be:"
                       "\n\t\t\t\t\trfam_sequences/fasta/\tCompressed hits to Rfam families"
                       "\n\t\t\t\t\trealigned/\t\tSequences, covariance models, and alignments by family")
-                 print("--no-homology\t\t\tDo not try to compute PSSMs and do not align sequences."
-                       "\n\t\t\t\tAllows to yield more 3D data (consider chains without a Rfam mapping).")
+                 print("--maxcores=…\t\t\tLimit the number of cores to use in parallel portions to reduce the simultaneous"
+                       "\n\t\t\t\t need of RAM. Should be a number between 1 and your number of CPUs. Note that portions"
+                       "\n\t\t\t\t of the pipeline already limit themselves to 50% or 70% of that number by default.")
+                 print("--archive\t\t\tCreate tar.gz archives of the datapoints text files and the alignments,"
+                       "\n\t\t\t\t and update the link to the latest archive. ")
+                 print("--no-logs\t\t\tDo not save per-chain logs of the numbering modifications")
                 print()
+                 print("Select which data we are interested in:")
+                 print("--------------------------------------------------------------------------------------------------------------")
+                 print("-r 4.0 [ --resolution=4.0 ]\tMaximum 3D structure resolution to consider a RNA chain.")
                 print("--all\t\t\t\tBuild chains even if they already are in the database.")
                 print("--only\t\t\t\tAsk to process a specific chain label only")
                 print("--ignore-issues\t\t\tDo not ignore already known issues and attempt to compute them")
                 print("--update-homologous\t\tRe-download Rfam and SILVA databases, realign all families, and recompute all CSV files")
                 print("--from-scratch\t\t\tDelete database, local 3D and sequence files, and known issues, and recompute.")
-                 print("--archive\t\t\tCreate a tar.gz archive of the datapoints text files, and update the link to the latest archive")
-                 print("--no-logs\t\t\tDo not save per-chain logs of the numbering modifications")
                 print()
                 print("Typical usage:")
-                 print(f"nohup bash -c 'time {fileDir}/RNAnet.py --3d-folder ~/Data/RNA/3D/ --seq-folder ~/Data/RNA/sequences -s' &")
+                 print(f"nohup bash -c 'time {fileDir}/RNAnet.py --3d-folder ~/Data/RNA/3D/ --seq-folder ~/Data/RNA/sequences -s --no-logs' &")
                 sys.exit()
             elif opt == '--version':
-                 print("RNANet 1.3 beta, parallelized, Dockerized")
+                 print("RNANet v1.3 beta, parallelized, Dockerized")
+                 print("Last revision : Jan 2021")
                 sys.exit()
             elif opt == "-r" or opt == "--resolution":
                 assert float(arg) > 0.0 and float(arg) <= 20.0
@@ -1084,6 +1096,9 @@ class Pipeline:
                 self.ARCHIVE = True
             elif opt == "--no-logs":
                 self.SAVELOGS = False
+             elif opt == "--maxcores":
+                 global ncores
+                 ncores = min(ncores, int(arg))
             elif opt == "-f" or opt == "--full-inference":
                 self.FULLINFERENCE = True
 
@@ -2614,9 +2629,9 @@ if __name__ == "__main__":
     runDir = os.getcwd()
     fileDir = os.path.dirname(os.path.realpath(__file__))
     ncores = read_cpu_number()
-     print(f"> Running {python_executable} on {ncores} CPU cores in folder {runDir}.")
     pp = Pipeline()
     pp.process_options()
+     print(f"> Running {python_executable} on {ncores} CPU cores in folder {runDir}.")
 
     # Prepare folders
     os.makedirs(runDir + "/results", exist_ok=True)
@@ -2639,8 +2654,7 @@ if __name__ == "__main__":
 
     # Download and annotate new RNA 3D chains (Chain objects in pp.update)
     # If the original cif file and/or the Json DSSR annotation file already exist, they are not redownloaded/recomputed.
-     # pp.dl_and_annotate(coeff_ncores=0.5)
-     pp.dl_and_annotate(coeff_ncores=1.0)
+     pp.dl_and_annotate(coeff_ncores=0.5)
     print("Here we go.")
 
     # At this point, the structure table is up to date.
@@ -2652,7 +2666,7 @@ if __name__ == "__main__":
         # Redownload and re-annotate
         print("> Retrying to annotate some structures which just failed.", flush=True)
         pp.dl_and_annotate(retry=True, coeff_ncores=0.3)  #
-         pp.build_chains(retry=True, coeff_ncores=1.0)     # Use half the cores to reduce required amount of memory
+         pp.build_chains(retry=True, coeff_ncores=0.5)     # Use half the cores to reduce required amount of memory
     print(f"> Loaded {len(pp.loaded_chains)} RNA chains ({len(pp.update) - len(pp.loaded_chains)} ignored/errors).")
     if len(no_nts_set):
         print(f"Among errors, {len(no_nts_set)} structures seem to contain RNA chains without defined nucleotides:", no_nts_set, flush=True)
--- a/known_issues.txt deleted 100644 → 0
View file @4de494b
+++ b/known_issues.txt deleted 100644 → 0
View file @4de494b
- 1apg_1_D
- 1b2m_1_C
- 1b2m_1_D
- 1b2m_1_E
- 1cgm_1_I
- 1cwp_1_D
- 1cwp_1_E
- 1cwp_1_F
- 1ddl_1_E
- 1e8s_1_C
- 1eg0_1_L
- 1eg0_1_L_1-56
- 1eg0_1_M
- 1eg0_1_O
- 1eg0_1_O_1-73
- 1emi_1_B
- 1emi_1_B_1-108
- 1gsg_1_T
- 1gsg_1_T_1-72
- 1h2c_1_R
- 1h2d_1_R
- 1h2d_1_S
- 1i5l_1_U
- 1i5l_1_Y
- 1ibl_1_Z
- 1ibm_1_Z
- 1jgo_1_A
- 1jgo_1_A_2-1520
- 1jgp_1_A
- 1jgp_1_A_2-1520
- 1jgq_1_A
- 1jgq_1_A_2-1520
- 1laj_1_R
- 1ls2_1_B
- 1ls2_1_B_1-73
- 1m8w_1_E
- 1m8w_1_F
- 1mj1_1_Q
- 1mj1_1_R
- 1ml5_1_A
- 1ml5_1_a_1-2914
- 1ml5_1_A_2-1520
- 1ml5_1_b_5-121
- 1mvr_1_1
- 1mvr_1_A
- 1mvr_1_B
- 1mvr_1_B_3-96
- 1mvr_1_C
- 1mvr_1_D
- 1mvr_1_D_1-61
- 1mvr_1_E
- 1n1h_1_B
- 1n32_1_Z
- 1n33_1_Z
- 1n34_1_Z
- 1n38_1_B
- 1nb7_1_E
- 1nb7_1_F
- 1pn7_1_C
- 1pn8_1_D
- 1pvo_1_G
- 1pvo_1_H
- 1pvo_1_J
- 1pvo_1_K
- 1pvo_1_L
- 1qln_1_R
- 1qvg_1_3
- 1qzc_1_A
- 1qzc_1_B
- 1qzc_1_C
- 1r2w_1_C
- 1r2w_1_C_1-58
- 1r2x_1_C
- 1r2x_1_C_1-58
- 1rmv_1_B
- 1t1m_1_A
- 1t1m_1_B
- 1trj_1_B
- 1trj_1_C
- 1utd_1_1
- 1utd_1_2
- 1utd_1_3
- 1utd_1_4
- 1utd_1_5
- 1utd_1_6
- 1utd_1_7
- 1utd_1_8
- 1utd_1_9
- 1utd_1_Z
- 1uvi_1_D
- 1uvi_1_E
- 1uvi_1_F
- 1uvj_1_D
- 1uvj_1_E
- 1uvj_1_F
- 1uvn_1_B
- 1uvn_1_D
- 1uvn_1_F
- 1vq6_1_4
- 1vqn_1_4
- 1vqo_1_4
- 1vtm_1_R
- 1vy7_1_AY_1-73
- 1vy7_1_CY_1-73
- 1x18_1_A
- 1x18_1_B
- 1x18_1_C
- 1x18_1_D
- 1x1l_1_A
- 1x1l_1_A_1-132
- 1xmo_1_W
- 1xmq_1_W
- 1xnq_1_W
- 1xnr_1_W
- 1xpo_1_G
- 1xpo_1_H
- 1xpo_1_J
- 1xpo_1_K
- 1xpo_1_L
- 1xpo_1_M
- 1xpr_1_G
- 1xpr_1_H
- 1xpr_1_J
- 1xpr_1_K
- 1xpr_1_L
- 1xpr_1_M
- 1xpu_1_G
- 1xpu_1_H
- 1xpu_1_J
- 1xpu_1_K
- 1xpu_1_L
- 1xpu_1_M
- 1y1y_1_P
- 1ytu_1_D
- 1ytu_1_F
- 1zc8_1_A
- 1zc8_1_A_1-59
- 1zc8_1_B
- 1zc8_1_C
- 1zc8_1_F
- 1zc8_1_G
- 1zc8_1_H
- 1zc8_1_I
- 1zc8_1_J
- 1zc8_1_Z
- 1zc8_1_Z_1-93
- 1zn0_1_C
- 1zn1_1_B
- 1zn1_1_B_1-59
- 1zn1_1_C
- 2a1r_1_C
- 2a1r_1_D
- 2a8v_1_D
- 2atw_1_B
- 2atw_1_D
- 2az0_1_C
- 2az0_1_D
- 2az2_1_C
- 2az2_1_D
- 2b2d_1_S
- 2f4v_1_Z
- 2ftc_1_R
- 2ftc_1_R_81-1466
- 2fz2_1_D
- 2ht1_1_J
- 2ht1_1_K
- 2iy3_1_B
- 2iy3_1_B_9-105
- 2ob7_1_A
- 2ob7_1_A_10-319
- 2ob7_1_D
- 2ob7_1_D_1-132
- 2om3_1_R
- 2qqp_1_R
- 2r1g_1_A
- 2r1g_1_B
- 2r1g_1_C
- 2r1g_1_D
- 2r1g_1_E
- 2r1g_1_F
- 2r1g_1_X
- 2rdo_1_A
- 2rdo_1_A_3-118
- 2rdo_1_B
- 2rdo_1_B_1-2904
- 2tmv_1_R
- 2uxb_1_X
- 2uxc_1_Y
- 2uxd_1_X
- 2vaz_1_A
- 2vaz_1_A_64-177
- 2voo_1_C
- 2voo_1_D
- 2vrt_1_E
- 2vrt_1_F
- 2vrt_1_G
- 2vrt_1_H
- 2wj8_1_A
- 2wj8_1_B
- 2wj8_1_C
- 2wj8_1_D
- 2wj8_1_E
- 2wj8_1_F
- 2wj8_1_G
- 2wj8_1_H
- 2wj8_1_I
- 2wj8_1_J
- 2wj8_1_K
- 2wj8_1_L
- 2wj8_1_M
- 2wj8_1_N
- 2wj8_1_O
- 2wj8_1_P
- 2wj8_1_Q
- 2wj8_1_R
- 2wj8_1_S
- 2wj8_1_T
- 2x1a_1_B
- 2x1f_1_B
- 2xea_1_R
- 2xnr_1_C
- 2xpj_1_D
- 2xs5_1_D
- 2xs7_1_B
- 2z9q_1_A
- 2z9q_1_A_1-72
- 2zde_1_E
- 2zde_1_F
- 2zde_1_G
- 2zde_1_H
- 3avt_1_T
- 3b0u_1_A
- 3b0u_1_B
- 3bbv_1_Z
- 3cd6_1_4
- 3cma_1_5
- 3cme_1_5
- 3cw1_1_V
- 3cw1_1_v_1-138
- 3cw1_1_V_1-138
- 3cw1_1_W
- 3cw1_1_w_1-138
- 3cw1_1_X
- 3cw1_1_x_1-138
- 3d2s_1_F
- 3d2s_1_H
- 3ep2_1_A
- 3ep2_1_B
- 3ep2_1_B_1-50
- 3ep2_1_C
- 3ep2_1_D
- 3ep2_1_E
- 3ep2_1_Y
- 3ep2_1_Y_1-72
- 3eq3_1_A
- 3eq3_1_B
- 3eq3_1_B_1-50
- 3eq3_1_C
- 3eq3_1_D
- 3eq3_1_E
- 3eq3_1_Y
- 3eq3_1_Y_1-72
- 3eq4_1_A
- 3eq4_1_B
- 3eq4_1_B_1-50
- 3eq4_1_C
- 3eq4_1_D
- 3eq4_1_E
- 3eq4_1_Y
- 3eq4_1_Y_1-69
- 3er8_1_F
- 3er8_1_G
- 3er8_1_H
- 3er9_1_D
- 3erc_1_G
- 3gpq_1_E
- 3gpq_1_F
- 3ie1_1_E
- 3ie1_1_F
- 3ie1_1_G
- 3ie1_1_H
- 3iy8_1_A
- 3iy8_1_A_1-540
- 3iy9_1_A
- 3iy9_1_A_498-1027
- 3j06_1_R
- 3j0l_1_A
- 3j0l_1_B
- 3j0l_1_C
- 3j0l_1_D
- 3j0l_1_F
- 3j0l_1_H
- 3j0o_1_A
- 3j0o_1_B
- 3j0o_1_C
- 3j0o_1_D
- 3j0o_1_F
- 3j0o_1_H
- 3j0p_1_A
- 3j0p_1_C
- 3j0p_1_D
- 3j0p_1_F
- 3j0p_1_H
- 3j0q_1_A
- 3j0q_1_C
- 3j0q_1_D
- 3j0q_1_F
- 3j0q_1_H
- 3j2k_1_0
- 3j2k_1_1
- 3j2k_1_2
- 3j2k_1_3
- 3j2k_1_4
- 3j46_1_A
- 3j46_1_P
- 3j6b_1_E
- 3j6x_1_IR
- 3j6y_1_IR
- 3j9m_1_U
- 3j9y_1_V
- 3jb7_1_M
- 3jb7_1_T
- 3jbu_1_B
- 3jbu_1_V
- 3jbv_1_B
- 3jcj_1_G
- 3jcj_1_V
- 3jcn_1_V
- 3jcr_1_H
- 3jcr_1_H_1-115
- 3jcr_1_M
- 3jcr_1_M_1-141
- 3jcr_1_N
- 3jcr_1_N_1-107
- 3koa_1_C
- 3m7n_1_Z
- 3m85_1_X
- 3m85_1_Y
- 3m85_1_Z
- 3nma_1_B
- 3nma_1_C
- 3nvk_1_G
- 3nvk_1_S
- 3ok4_1_2
- 3ok4_1_4
- 3ok4_1_H
- 3ok4_1_J
- 3ok4_1_L
- 3ok4_1_N
- 3ok4_1_P
- 3ok4_1_R
- 3ok4_1_T
- 3ok4_1_V
- 3ok4_1_X
- 3ok4_1_Z
- 3ol6_1_D
- 3ol6_1_H
- 3ol6_1_L
- 3ol6_1_P
- 3ol7_1_D
- 3ol7_1_H
- 3ol7_1_L
- 3ol7_1_P
- 3ol8_1_D
- 3ol8_1_H
- 3ol8_1_L
- 3ol8_1_P
- 3ol9_1_D
- 3ol9_1_H
- 3ol9_1_L
- 3ol9_1_P
- 3olb_1_D
- 3olb_1_H
- 3olb_1_L
- 3olb_1_P
- 3p6y_1_Q
- 3p6y_1_T
- 3p6y_1_U
- 3p6y_1_V
- 3p6y_1_W
- 3pdm_1_R
- 3pf5_1_S
- 3pgw_1_N
- 3pgw_1_N_1-164
- 3pgw_1_R
- 3pgw_1_R_1-164
- 3qsu_1_P
- 3qsu_1_R
- 3rtj_1_D
- 3rzo_1_R
- 3s4g_1_B
- 3s4g_1_C
- 3t1h_1_W
- 3t1y_1_W
- 3u2e_1_C
- 3u2e_1_D
- 3wzi_1_C
- 486d_1_F
- 486d_1_G
- 4a3b_1_P
- 4a3c_1_P
- 4a3e_1_P
- 4a3g_1_P
- 4a3j_1_P
- 4a3m_1_P
- 4adx_1_0
- 4adx_1_0_1-2925
- 4adx_1_8
- 4adx_1_9
- 4adx_1_9_1-123
- 4afy_1_C
- 4afy_1_D
- 4am3_1_D
- 4am3_1_H
- 4am3_1_I
- 4b3r_1_W
- 4b3s_1_W
- 4b3t_1_W
- 4ba2_1_R
- 4bbl_1_Y
- 4bbl_1_Z
- 4csf_1_A
- 4csf_1_C
- 4csf_1_E
- 4csf_1_G
- 4csf_1_I
- 4csf_1_K
- 4csf_1_M
- 4csf_1_O
- 4csf_1_Q
- 4csf_1_S
- 4csf_1_U
- 4csf_1_W
- 4cxg_1_A
- 4cxg_1_B
- 4cxg_1_C
- 4cxh_1_A
- 4cxh_1_B
- 4cxh_1_C
- 4cxh_1_X
- 4d61_1_J
- 4dr4_1_V
- 4dr5_1_V
- 4dr6_1_B
- 4dr6_1_V
- 4dr7_1_B
- 4dr7_1_V
- 4dwa_1_D
- 4e6b_1_A
- 4e6b_1_B
- 4e6b_1_E
- 4e6b_1_F
- 4ejt_1_G
- 4eya_1_A
- 4eya_1_B
- 4eya_1_C
- 4eya_1_D
- 4eya_1_E
- 4eya_1_F
- 4eya_1_G
- 4eya_1_H
- 4eya_1_I
- 4eya_1_J
- 4eya_1_K
- 4eya_1_L
- 4eya_1_M
- 4eya_1_N
- 4eya_1_O
- 4eya_1_P
- 4eya_1_Q
- 4eya_1_R
- 4eya_1_S
- 4eya_1_T
- 4g0a_1_E
- 4g0a_1_F
- 4g0a_1_G
- 4g0a_1_H
- 4g7o_1_I
- 4g7o_1_S
- 4g9z_1_E
- 4g9z_1_F
- 4gkj_1_W
- 4gkk_1_W
- 4gv3_1_B
- 4gv3_1_C
- 4gv6_1_B
- 4gv6_1_C
- 4gv9_1_E
- 4hor_1_X
- 4hos_1_X
- 4hot_1_X
- 4ht9_1_E
- 4i67_1_B
- 4ii9_1_C
- 4j7m_1_B
- 4jzu_1_C
- 4jzv_1_C
- 4k4s_1_D
- 4k4s_1_H
- 4k4t_1_D
- 4k4t_1_H
- 4k4u_1_D
- 4k4u_1_H
- 4k4x_1_D
- 4k4x_1_H
- 4k4x_1_L
- 4k4x_1_P
- 4k4z_1_D
- 4k4z_1_H
- 4k4z_1_L
- 4k4z_1_P
- 4kzx_1_I
- 4kzy_1_I
- 4kzz_1_I
- 4kzz_1_J
- 4lj0_1_C
- 4lj0_1_D
- 4lj0_1_E
- 4lq3_1_R
- 4m7d_1_P
- 4n2s_1_B
- 4n48_1_D
- 4n48_1_G
- 4nia_1_1
- 4nia_1_2
- 4nia_1_3
- 4nia_1_4
- 4nia_1_5
- 4nia_1_6
- 4nia_1_7
- 4nia_1_8
- 4nia_1_A
- 4nia_1_B
- 4nia_1_C
- 4nia_1_D
- 4nia_1_E
- 4nia_1_F
- 4nia_1_G
- 4nia_1_H
- 4nia_1_I
- 4nia_1_J
- 4nia_1_K
- 4nia_1_L
- 4nia_1_M
- 4nia_1_N
- 4nia_1_O
- 4nia_1_U
- 4nia_1_W
- 4nia_1_Z
- 4nku_1_D
- 4nku_1_H
- 4oau_1_A
- 4oav_1_A
- 4oav_1_C
- 4ohy_1_B
- 4ohz_1_B
- 4oi0_1_B
- 4oi1_1_B
- 4oq8_1_D
- 4oq9_1_1
- 4oq9_1_2
- 4oq9_1_3
- 4oq9_1_4
- 4oq9_1_5
- 4oq9_1_6
- 4oq9_1_7
- 4oq9_1_8
- 4oq9_1_A
- 4oq9_1_B
- 4oq9_1_C
- 4oq9_1_D
- 4oq9_1_E
- 4oq9_1_F
- 4oq9_1_G
- 4oq9_1_H
- 4oq9_1_I
- 4oq9_1_J
- 4oq9_1_K
- 4oq9_1_L
- 4oq9_1_M
- 4oq9_1_N
- 4oq9_1_O
- 4oq9_1_U
- 4oq9_1_W
- 4oq9_1_Z
- 4peh_1_V
- 4peh_1_W
- 4peh_1_X
- 4peh_1_Y
- 4peh_1_Z
- 4pei_1_V
- 4pei_1_W
- 4pei_1_X
- 4pei_1_Y
- 4pei_1_Z
- 4qm6_1_C
- 4qm6_1_D
- 4qu6_1_B
- 4qu7_1_U
- 4qu7_1_V
- 4qu7_1_X
- 4qvc_1_G
- 4qvd_1_H
- 4rcj_1_B
- 4s2x_1_B
- 4s2y_1_B
- 4tu0_1_F
- 4tu0_1_G
- 4udv_1_R
- 4v42_1_AA
- 4v42_1_AA_2-1520
- 4v42_1_BA
- 4v42_1_BA_1-2914
- 4v42_1_BB
- 4v42_1_BB_5-121
- 4v47_1_A0
- 4v47_1_A0_1-2904
- 4v47_1_A9
- 4v47_1_A9_3-118
- 4v47_1_BA
- 4v47_1_BA_1-1542
- 4v48_1_A0
- 4v48_1_A0_1-2904
- 4v48_1_A6
- 4v48_1_A6_1-73
- 4v48_1_A9
- 4v48_1_A9_3-118
- 4v48_1_BA
- 4v48_1_BA_1-1543
- 4v4f_1_A0
- 4v4f_1_A1
- 4v4f_1_A2
- 4v4f_1_A3
- 4v4f_1_A4
- 4v4f_1_A5
- 4v4f_1_A6
- 4v4f_1_A7
- 4v4f_1_A8
- 4v4f_1_A9
- 4v4f_1_AZ
- 4v4f_1_B0
- 4v4f_1_B1
- 4v4f_1_B2
- 4v4f_1_B3
- 4v4f_1_B4
- 4v4f_1_B5
- 4v4f_1_B6
- 4v4f_1_B7
- 4v4f_1_B8
- 4v4f_1_B9
- 4v4f_1_BZ
- 4v4i_1_W
- 4v4i_1_X
- 4v4i_1_Y
- 4v4i_1_Z
- 4v4j_1_W
- 4v4j_1_X
- 4v4j_1_Y
- 4v4j_1_Z
- 4v5z_1_AA
- 4v5z_1_AA_1-1563
- 4v5z_1_AB
- 4v5z_1_AC
- 4v5z_1_AD
- 4v5z_1_AE
- 4v5z_1_AF
- 4v5z_1_AG
- 4v5z_1_AH
- 4v5z_1_B0
- 4v5z_1_B0_1-2902
- 4v5z_1_B1
- 4v5z_1_B1_2-125
- 4v5z_1_BA
- 4v5z_1_BB
- 4v5z_1_BC
- 4v5z_1_BD
- 4v5z_1_BE
- 4v5z_1_BF
- 4v5z_1_BG
- 4v5z_1_BH
- 4v5z_1_BI
- 4v5z_1_BJ
- 4v5z_1_BK
- 4v5z_1_BL
- 4v5z_1_BM
- 4v5z_1_BN
- 4v5z_1_BO
- 4v5z_1_BP
- 4v5z_1_BQ
- 4v5z_1_BR
- 4v5z_1_BS
- 4v5z_1_BT
- 4v5z_1_BU
- 4v5z_1_BV
- 4v5z_1_BW
- 4v5z_1_BX
- 4v5z_1_BY
- 4v5z_1_BY_2-113
- 4v5z_1_BZ
- 4v5z_1_BZ_1-70
- 4v68_1_A0
- 4v7e_1_AA
- 4v7e_1_AB
- 4v7e_1_AC
- 4v7e_1_AD
- 4v7e_1_AE
- 4v7j_1_AV
- 4v7j_1_AW
- 4v7j_1_BV
- 4v7j_1_BW
- 4v7k_1_AV
- 4v7k_1_AW
- 4v7k_1_BV
- 4v7k_1_BW
- 4v8t_1_1
- 4v8z_1_CX
- 4v99_1_AC
- 4v99_1_AH
- 4v99_1_AM
- 4v99_1_AR
- 4v99_1_AW
- 4v99_1_BC
- 4v99_1_BH
- 4v99_1_BM
- 4v99_1_BR
- 4v99_1_BW
- 4v99_1_CC
- 4v99_1_CH
- 4v99_1_CM
- 4v99_1_CR
- 4v99_1_CW
- 4v99_1_DC
- 4v99_1_DH
- 4v99_1_DM
- 4v99_1_DR
- 4v99_1_DW
- 4v99_1_EC
- 4v99_1_EH
- 4v99_1_EM
- 4v99_1_ER
- 4v99_1_EW
- 4v99_1_FC
- 4v99_1_FH
- 4v99_1_FM
- 4v99_1_FR
- 4v99_1_FW
- 4v99_1_GC
- 4v99_1_GH
- 4v99_1_GM
- 4v99_1_GR
- 4v99_1_GW
- 4v99_1_HC
- 4v99_1_HH
- 4v99_1_HM
- 4v99_1_HR
- 4v99_1_HW
- 4v99_1_IC
- 4v99_1_IH
- 4v99_1_IM
- 4v99_1_IR
- 4v99_1_IW
- 4v99_1_JC
- 4v99_1_JH
- 4v99_1_JM
- 4v99_1_JR
- 4v99_1_JW
- 4v9e_1_AA
- 4v9e_1_AG
- 4v9e_1_AM
- 4v9e_1_BA
- 4v9e_1_BG
- 4v9e_1_BM
- 4w2e_1_W
- 4w2e_1_X
- 4w2h_1_CY_1-73
- 4wkr_1_C
- 4wt8_1_AB
- 4wt8_1_BB
- 4wt8_1_CS
- 4wt8_1_DS
- 4wti_1_P
- 4wti_1_T
- 4wtj_1_P
- 4wtj_1_T
- 4wtk_1_P
- 4wtk_1_T
- 4wtl_1_P
- 4wtl_1_T
- 4wtm_1_P
- 4wtm_1_T
- 4x4u_1_H
- 4x62_1_B
- 4x64_1_B
- 4x65_1_B
- 4x66_1_B
- 4x9e_1_G
- 4x9e_1_H
- 4xbf_1_D
- 4xln_1_Q
- 4xln_1_T
- 4xlr_1_Q
- 4xlr_1_T
- 4y4p_1_1W
- 4y4p_1_1X
- 4y4p_1_1Y
- 4y4p_1_2W
- 4y4p_1_2X
- 4y4p_1_2Y
- 4yln_1_3
- 4yln_1_6
- 4yln_1_9
- 4ylo_1_3
- 4ylo_1_6
- 4ylo_1_9
- 4yoe_1_E
- 4z3s_1_1W
- 4z3s_1_1X
- 4z3s_1_1Y
- 4z3s_1_2W
- 4z3s_1_2X
- 4z3s_1_2Y
- 4z8c_1_1X
- 4z8c_1_2X
- 4zer_1_1X
- 4zer_1_2X
- 5a0v_1_F
- 5a79_1_R
- 5a7a_1_R
- 5afi_1_V
- 5afi_1_W
- 5afi_1_Y
- 5aj0_1_BV
- 5aj0_1_BW
- 5bud_1_D
- 5bud_1_E
- 5c0y_1_C
- 5ceu_1_C
- 5ceu_1_D
- 5det_1_P
- 5doy_1_1W
- 5doy_1_1X
- 5doy_1_1Y
- 5doy_1_2W
- 5doy_1_2X
- 5doy_1_2Y
- 5dto_1_B
- 5e02_1_C
- 5elk_1_R
- 5els_1_I
- 5elt_1_E
- 5elt_1_F
- 5f6c_1_C
- 5f6c_1_E
- 5f8k_1_1X
- 5f8k_1_2X
- 5fl8_1_X
- 5fl8_1_Y
- 5fl8_1_Z
- 5flx_1_Z
- 5g2x_1_A_595-692
- 5gmf_1_E
- 5gmf_1_F
- 5gmf_1_G
- 5gmf_1_H
- 5gmg_1_C
- 5gmg_1_D
- 5gxi_1_B
- 5h5u_1_H
- 5hau_1_1W
- 5hau_1_2W
- 5hcp_1_1X
- 5hcp_1_2X
- 5hcq_1_1X
- 5hcq_1_2X
- 5hcr_1_1X
- 5hcr_1_2X
- 5hd1_1_1X
- 5hd1_1_2X
- 5hjz_1_C
- 5hk0_1_F
- 5hkc_1_C
- 5i2d_1_K
- 5i2d_1_V
- 5ipl_1_3
- 5ipm_1_3
- 5ipn_1_3
- 5it9_1_I
- 5j4b_1_1W
- 5j4b_1_1X
- 5j4b_1_1Y
- 5j4b_1_2W
- 5j4b_1_2X
- 5j4b_1_2Y
- 5j4c_1_1W
- 5j4c_1_1X
- 5j4c_1_1Y
- 5j4c_1_2W
- 5j4c_1_2X
- 5j4c_1_2Y
- 5j8b_1_W
- 5j8b_1_X
- 5j8b_1_Y
- 5jcs_1_X
- 5jcs_1_Y
- 5jcs_1_Z
- 5jju_1_C
- 5k77_1_V
- 5k77_1_W
- 5k77_1_X
- 5k77_1_Y
- 5k77_1_Z
- 5k78_1_X
- 5k78_1_Y
- 5k8h_1_A
- 5kal_1_Y
- 5kal_1_Z
- 5kcr_1_1X
- 5kcs_1_1X
- 5l3p_1_X
- 5l3p_1_Y
- 5lza_1_V
- 5lzb_1_V
- 5lzb_1_W
- 5lzb_1_X
- 5lzb_1_Y
- 5lzc_1_V
- 5lzc_1_W
- 5lzc_1_X
- 5lzc_1_Y
- 5lzd_1_V
- 5lzd_1_W
- 5lzd_1_X
- 5lzd_1_Y
- 5lze_1_V
- 5lze_1_W
- 5lze_1_X
- 5lze_1_Y
- 5lzf_1_V
- 5lzf_1_Y
- 5lzs_1_II
- 5lzy_1_HH
- 5mc6_1_M
- 5mc6_1_N
- 5mfx_1_B
- 5mgp_1_X
- 5mmi_1_Z
- 5mmj_1_A
- 5mmm_1_Z
- 5mq0_1_3
- 5mrc_1_AA
- 5mrc_1_BB
- 5mre_1_AA
- 5mre_1_BB
- 5mrf_1_AA
- 5mrf_1_BB
- 5new_1_C
- 5o1y_1_B
- 5o2r_1_X
- 5o3j_1_B
- 5odv_1_A
- 5odv_1_B
- 5odv_1_C
- 5odv_1_D
- 5odv_1_E
- 5odv_1_F
- 5odv_1_G
- 5odv_1_H
- 5odv_1_I
- 5odv_1_J
- 5odv_1_K
- 5odv_1_L
- 5odv_1_M
- 5odv_1_N
- 5odv_1_O
- 5odv_1_P
- 5odv_1_Q
- 5odv_1_R
- 5odv_1_S
- 5odv_1_T
- 5odv_1_U
- 5odv_1_V
- 5odv_1_W
- 5odv_1_X
- 5sze_1_C
- 5t2c_1_AN
- 5tbw_1_SR
- 5u4i_1_X
- 5u4i_1_Y
- 5u4j_1_X
- 5u4j_1_Z
- 5udi_1_B
- 5udj_1_B
- 5udk_1_B
- 5udl_1_B
- 5uef_1_C
- 5uef_1_D
- 5uh5_1_I
- 5uh6_1_I
- 5uh8_1_I
- 5uh9_1_I
- 5uhc_1_I
- 5uk4_1_U
- 5uk4_1_V
- 5uk4_1_W
- 5uk4_1_X
- 5uq7_1_X
- 5uq7_1_Y
- 5uq7_1_Z
- 5uq8_1_X
- 5uq8_1_Y
- 5uq8_1_Z
- 5vi5_1_Q
- 5vyc_1_I1
- 5vyc_1_I2
- 5vyc_1_I3
- 5vyc_1_I4
- 5vyc_1_I5
- 5vyc_1_I6
- 5w0m_1_H
- 5w0m_1_I
- 5w0m_1_J
- 5w4k_1_1W
- 5w4k_1_1X
- 5w4k_1_1Y
- 5w4k_1_2W
- 5w4k_1_2X
- 5w4k_1_2Y
- 5w5h_1_B
- 5w5h_1_D
- 5w5i_1_B
- 5w5i_1_D
- 5wdt_1_V
- 5wdt_1_W
- 5wdt_1_Y
- 5we4_1_V
- 5we4_1_W
- 5we4_1_Y
- 5we6_1_V
- 5we6_1_W
- 5we6_1_Y
- 5wf0_1_V
- 5wf0_1_W
- 5wf0_1_Y
- 5wfk_1_V
- 5wfk_1_W
- 5wfk_1_Y
- 5wfs_1_V
- 5wfs_1_W
- 5wfs_1_Y
- 5wis_1_1W
- 5wis_1_1X
- 5wis_1_1Y
- 5wis_1_2W
- 5wis_1_2X
- 5wis_1_2Y
- 5wit_1_1W
- 5wit_1_1X
- 5wit_1_1Y
- 5wit_1_2W
- 5wit_1_2X
- 5wit_1_2Y
- 5wnp_1_B
- 5wnt_1_B
- 5wnu_1_B
- 5wnv_1_B
- 5x21_1_I
- 5x22_1_I
- 5x22_1_S
- 5x70_1_E
- 5x70_1_G
- 5x8r_1_A
- 5y88_1_X
- 5yts_1_B
- 5ytv_1_B
- 5ytx_1_B
- 5z4a_1_B
- 5z4d_1_B
- 5z4j_1_B
- 5zeb_1_V
- 5zep_1_W
- 5zeu_1_A
- 5zeu_1_V
- 5zsa_1_C
- 5zsa_1_D
- 5zsb_1_C
- 5zsb_1_D
- 5zsc_1_C
- 5zsc_1_D
- 5zsd_1_C
- 5zsd_1_D
- 5zsl_1_D
- 5zsl_1_E
- 5zsn_1_D
- 5zsn_1_E
- 5zuu_1_G
- 5zuu_1_I
- 6a4e_1_B
- 6a4e_1_D
- 6a6l_1_D
- 6b6h_1_3
- 6bk8_1_I
- 6c4i_1_X
- 6c4i_1_Y
- 6cae_1_1W
- 6cae_1_1X
- 6cae_1_1Y
- 6cae_1_2W
- 6cae_1_2X
- 6cae_1_2Y
- 6cfj_1_1W
- 6cfj_1_1X
- 6cfj_1_1Y
- 6cfj_1_2W
- 6cfj_1_2X
- 6cfj_1_2Y
- 6d1v_1_C
- 6d2z_1_C
- 6d30_1_C
- 6dmn_1_B
- 6dmv_1_B
- 6do8_1_B
- 6do9_1_B
- 6doa_1_B
- 6dob_1_B
- 6doc_1_B
- 6dod_1_B
- 6doe_1_B
- 6dof_1_B
- 6dog_1_B
- 6doh_1_B
- 6doi_1_B
- 6doj_1_B
- 6dok_1_B
- 6dol_1_B
- 6dom_1_B
- 6don_1_B
- 6doo_1_B
- 6dop_1_B
- 6doq_1_B
- 6dor_1_B
- 6dos_1_B
- 6dot_1_B
- 6dou_1_B
- 6dov_1_B
- 6dow_1_B
- 6dox_1_B
- 6doz_1_B
- 6dp0_1_B
- 6dp1_1_B
- 6dp2_1_B
- 6dp3_1_B
- 6dp4_1_B
- 6dp5_1_B
- 6dp6_1_B
- 6dp7_1_B
- 6dp8_1_B
- 6dp9_1_B
- 6dpa_1_B
- 6dpb_1_B
- 6dpc_1_B
- 6dpd_1_B
- 6dpe_1_B
- 6dpf_1_B
- 6dpg_1_B
- 6dph_1_B
- 6dpi_1_B
- 6dpj_1_B
- 6dpk_1_B
- 6dpl_1_B
- 6dpm_1_B
- 6dpn_1_B
- 6dpo_1_B
- 6dpp_1_B
- 6dti_1_W
- 6dzi_1_H
- 6e0o_1_B
- 6e0o_1_C
- 6e4p_1_J
- 6e4p_1_K
- 6een_1_G
- 6een_1_H
- 6een_1_I
- 6enf_1_X
- 6enj_1_X
- 6enu_1_X
- 6eri_1_AX
- 6evj_1_M
- 6evj_1_N
- 6fqr_1_C
- 6ftg_1_U
- 6ftg_1_V
- 6ftg_1_W
- 6fti_1_Q
- 6fti_1_U
- 6fti_1_V
- 6fti_1_W
- 6ftj_1_U
- 6ftj_1_V
- 6ftj_1_W
- 6gc5_1_F
- 6gc5_1_G
- 6gc5_1_H
- 6gfw_1_R
- 6gwt_1_X
- 6gx6_1_B
- 6gxm_1_X
- 6gxn_1_X
- 6gxo_1_X
- 6gz3_1_BV
- 6gz3_1_BW
- 6gz4_1_BV
- 6gz4_1_BW
- 6gz5_1_BV
- 6gz5_1_BW
- 6h4n_1_W
- 6h58_1_W
- 6h58_1_WW
- 6ha1_1_X
- 6ha8_1_X
- 6hcj_1_Q3
- 6hcq_1_Q3
- 6hhq_1_SR
- 6htq_1_U
- 6htq_1_V
- 6htq_1_W
- 6hxx_1_AA
- 6hxx_1_AB
- 6hxx_1_AC
- 6hxx_1_AD
- 6hxx_1_AE
- 6hxx_1_AF
- 6hxx_1_AG
- 6hxx_1_AH
- 6hxx_1_AI
- 6hxx_1_AJ
- 6hxx_1_AK
- 6hxx_1_AL
- 6hxx_1_AM
- 6hxx_1_AN
- 6hxx_1_AO
- 6hxx_1_AP
- 6hxx_1_AQ
- 6hxx_1_AR
- 6hxx_1_AS
- 6hxx_1_AT
- 6hxx_1_AU
- 6hxx_1_AV
- 6hxx_1_AW
- 6hxx_1_AX
- 6hxx_1_AY
- 6hxx_1_AZ
- 6hxx_1_BA
- 6hxx_1_BB
- 6hxx_1_BC
- 6hxx_1_BD
- 6hxx_1_BE
- 6hxx_1_BF
- 6hxx_1_BG
- 6hxx_1_BH
- 6hxx_1_BI
- 6hyu_1_D
- 6i0t_1_B
- 6i0u_1_B
- 6i0v_1_B
- 6i2n_1_U
- 6i7o_1_2B
- 6i7o_1_L
- 6i7o_1_M
- 6i7o_1_MB
- 6i7o_1_N
- 6i7o_1_NB
- 6ij2_1_E
- 6ij2_1_F
- 6ij2_1_G
- 6ij2_1_H
- 6ip5_1_2M
- 6ip5_1_ZU
- 6ip5_1_ZY
- 6ip6_1_2M
- 6ip6_1_ZY
- 6ip6_1_ZZ
- 6ip8_1_2M
- 6ip8_1_ZY
- 6ip8_1_ZZ
- 6is0_1_C
- 6j7z_1_C
- 6k32_1_P
- 6k32_1_T
- 6kqd_1_I
- 6kqd_1_S
- 6kqe_1_I
- 6kql_1_I
- 6kr6_1_B
- 6ktc_1_V
- 6kug_1_B
- 6l74_1_I
- 6lkq_1_S
- 6lkq_1_T
- 6lkq_1_U
- 6lkq_1_W
- 6m6v_1_E
- 6m6v_1_F
- 6m6v_1_G
- 6m7k_1_B
- 6mkn_1_W
- 6mpf_1_W
- 6mpi_1_W
- 6n6a_1_D
- 6n6c_1_D
- 6n6d_1_D
- 6n6e_1_D
- 6n6f_1_D
- 6n6g_1_D
- 6n6h_1_D
- 6n6i_1_C
- 6n6i_1_D
- 6n6j_1_C
- 6n6j_1_D
- 6n6k_1_C
- 6n6k_1_D
- 6n9e_1_1X
- 6n9e_1_2W
- 6n9e_1_2X
- 6n9f_1_1X
- 6n9f_1_2X
- 6nd5_1_1W
- 6nd5_1_1X
- 6nd5_1_1Y
- 6nd5_1_2W
- 6nd5_1_2X
- 6nd5_1_2Y
- 6nd6_1_1W
- 6nd6_1_1X
- 6nd6_1_1Y
- 6nd6_1_2W
- 6nd6_1_2X
- 6nd6_1_2Y
- 6nu2_1_U
- 6nu3_1_U
- 6o6v_1_C
- 6o6v_1_D
- 6o6x_1_C
- 6o6x_1_D
- 6o75_1_C
- 6o75_1_D
- 6o78_1_E
- 6o79_1_C
- 6o7b_1_C
- 6o7b_1_D
- 6o7h_1_K
- 6o7i_1_I
- 6o7k_1_G
- 6o7k_1_V
- 6o8w_1_U
- 6o97_1_1W
- 6o97_1_1X
- 6o97_1_1Y
- 6o97_1_2W
- 6o97_1_2X
- 6o97_1_2Y
- 6o9j_1_V
- 6o9k_1_Y
- 6of1_1_1W
- 6of1_1_1X
- 6of1_1_1Y
- 6of1_1_2W
- 6of1_1_2X
- 6of1_1_2Y
- 6ogy_1_M
- 6ogy_1_N
- 6okk_1_G
- 6ole_1_T
- 6ole_1_U
- 6ole_1_V
- 6olf_1_T
- 6olf_1_U
- 6olf_1_V
- 6olg_1_BV
- 6oli_1_T
- 6oli_1_U
- 6oli_1_V
- 6olz_1_BV
- 6om0_1_T
- 6om0_1_U
- 6om0_1_V
- 6om7_1_T
- 6om7_1_U
- 6om7_1_V
- 6ov0_1_E
- 6ov0_1_F
- 6ov0_1_G
- 6ov0_1_H
- 6ovy_1_I
- 6ow3_1_I
- 6owl_1_B
- 6owl_1_C
- 6oy5_1_I
- 6oy6_1_I
- 6p71_1_I
- 6p7p_1_D
- 6p7p_1_E
- 6p7p_1_F
- 6p7q_1_D
- 6p7q_1_E
- 6p7q_1_F
- 6pb4_1_3
- 6pmi_1_3
- 6pmj_1_3
- 6ppn_1_A
- 6ppn_1_I
- 6q1h_1_D
- 6q1h_1_H
- 6q8y_1_M
- 6q8y_1_N
- 6qcs_1_M
- 6qdw_1_A
- 6qdw_1_B
- 6qdw_1_V
- 6qik_1_X
- 6qik_1_Y
- 6qt0_1_X
- 6qt0_1_Y
- 6qtz_1_X
- 6qtz_1_Y
- 6qx3_1_G
- 6r7b_1_D
- 6r7b_1_E
- 6r9m_1_B
- 6r9o_1_B
- 6r9p_1_B
- 6r9q_1_B
- 6r9r_1_D
- 6r9r_1_E
- 6raz_1_Y
- 6rcl_1_C
- 6ri5_1_X
- 6ri5_1_Y
- 6rt4_1_C
- 6rt4_1_D
- 6rt5_1_A
- 6rt5_1_E
- 6rt6_1_A
- 6rt6_1_E
- 6rt7_1_A
- 6rt7_1_E
- 6rzz_1_X
- 6rzz_1_Y
- 6s05_1_X
- 6s05_1_Y
- 6s0m_1_C
- 6sag_1_R
- 6sce_1_B
- 6scf_1_I
- 6scf_1_K
- 6scf_1_L
- 6scf_1_M
- 6skf_1_AA
- 6skg_1_AA
- 6spc_1_A
- 6spe_1_A
- 6sty_1_C
- 6sty_1_F
- 6sv4_1_2B
- 6sv4_1_2C
- 6sv4_1_MB
- 6sv4_1_MC
- 6sv4_1_N
- 6sv4_1_NB
- 6sv4_1_NC
- 6swa_1_Q
- 6swa_1_R
- 6swa_1_S
- 6szs_1_X
- 6t34_1_A
- 6t34_1_B
- 6t34_1_C
- 6t34_1_D
- 6t34_1_E
- 6t34_1_F
- 6t34_1_G
- 6t34_1_H
- 6t34_1_I
- 6t34_1_J
- 6t34_1_K
- 6t34_1_L
- 6t34_1_M
- 6t34_1_N
- 6t34_1_O
- 6t34_1_P
- 6t34_1_Q
- 6t34_1_R
- 6t34_1_S
- 6t83_1_1B
- 6t83_1_2B
- 6t83_1_3B
- 6t83_1_4B
- 6t83_1_6B
- 6t83_1_A
- 6t83_1_AA
- 6t83_1_BB
- 6t83_1_CA
- 6tb3_1_N
- 6th6_1_AA
- 6tnu_1_M
- 6tnu_1_N
- 6ty9_1_M
- 6tz1_1_N
- 6u6y_1_E
- 6u6y_1_F
- 6u6y_1_G
- 6u6y_1_H
- 6u9x_1_H
- 6u9x_1_K
- 6ucq_1_1X
- 6ucq_1_1Y
- 6ucq_1_2X
- 6ucq_1_2Y
- 6uej_1_B
- 6uo1_1_1W
- 6uo1_1_1X
- 6uo1_1_1Y
- 6uo1_1_2W
- 6uo1_1_2X
- 6uo1_1_2Y
- 6utw_1_333
- 6uu0_1_333
- 6uu1_1_333
- 6uu2_1_333
- 6uu3_1_333
- 6uu4_1_333
- 6uu6_1_333
- 6uuc_1_333
- 6uz7_1_8_2140-2827
- 6v39_1_SN1
- 6v39_1_V
- 6v3a_1_SN1
- 6v3a_1_V
- 6v3b_1_SN1
- 6v3e_1_SN1
- 6vm6_1_G
- 6vm6_1_H
- 6vm6_1_I
- 6vm6_1_J
- 6vm6_1_K
- 6vyt_1_Y
- 6vyu_1_Y
- 6vyw_1_Y
- 6vyx_1_Y
- 6vyy_1_Y
- 6vyz_1_Y
- 6vz2_1_Y
- 6vz3_1_Y
- 6vz5_1_Y
- 6vz7_1_Y
- 6w6l_1_T
- 6w6l_1_U
- 6w6l_1_V
- 6wan_1_G
- 6wan_1_H
- 6wan_1_I
- 6wan_1_J
- 6wan_1_K
- 6wan_1_L
- 6wox_1_I
- 6woy_1_I
- 6wre_1_D
- 6x1b_1_D
- 6x1b_1_F
- 6xqd_1_1X
- 6xqd_1_2X
- 6xqe_1_1X
- 6xqe_1_2X
- 6xz7_1_F
- 6xz7_1_G
- 6y69_1_W
- 6ybv_1_K
- 6ybv_1_W
- 6ys3_1_A
- 6ys3_1_B
- 6ys3_1_V
- 6ysr_1_W
- 6yss_1_W
- 6yst_1_W
- 6ysu_1_W
- 6yud_1_K
- 6yud_1_M
- 6yud_1_O
- 6yud_1_P
- 6yud_1_Q
- 6ywo_1_E
- 6ywo_1_F
- 6ywo_1_I
- 6ywo_1_K
- 6z1p_1_AA
- 6z1p_1_AB
- 6z1p_1_BA
- 6z1p_1_BB
- 6z8k_1_X
- 6zmw_1_W
- 6zvh_1_X
- 6zvi_1_D
- 6zvi_1_E
- 6zvi_1_H
- 7jql_1_1X
- 7jql_1_2X
- 7jqm_1_1X
- 7jqm_1_2X
- 7jyy_1_E
- 7jyy_1_F
- 7jz0_1_E
- 7jz0_1_F
- 7k00_1_5
- 7k00_1_B
- 1qzb_1_B_1-73
- 1qza_1_B_1-73
- 5zzm_1_M_3-118
- 5zzm_1_N_1-2904
- 3dg2_1_B_1-2904
- 3dg0_1_B_1-2904
- 3dg4_1_B_1-2904
- 3dg5_1_B_1-2904
- 3dg2_1_A_1-1542
- 3dg0_1_A_1-1542
- 3dg4_1_A_1-1542
- 3dg5_1_A_1-1542
--- a/known_issues_reasons.txt deleted 100644 → 0
+++ b/known_issues_reasons.txt deleted 100644 → 0