Louis BECQUEY

New documentation

@@ -9,12 +9,18 @@ esl*
 .vscode/
 __pycache__/
 .git/
+.gitignore
+.dockerignore
 errors.txt
 known_issues.txt
 known_issues_reasons.txt
 Dockerfile
 LICENSE
-README.md
+CHANGELOG
+*.md
 scripts/automate.sh
 scripts/kill_rnanet.sh
 scripts/build_docker_image.sh
+scripts/*.tar
+scripts/measure.py
+scripts/recompute_some_chains.py
@@ -27,9 +27,3 @@ BUG CORRECTIONS
 - Modified nucleotides were not always correctly transformed to N in the alignments (and nucleotide.nt_align_code fields).
 Now, the alignments and nt_align_code (and consensus) only contain "ACGUN-" chars.
 Now, 'N' means 'other', while '-' means 'nothing' or 'unknown'.
-
-COMING SOON
- - Automated annotation of detected Recurrent Interaction Networks (RINs), see http://carnaval.lri.fr/ .
- - Possibly, automated detection of HLs and ILs from the 3D Motif Atlas (BGSU). Maybe. Their own website already does the job.
- - A field estimating the quality of the sequence alignment in table family.
- - Possibly, more metrics about the alignments coming from Infernal.
\ No newline at end of file
1 +
2 +# More about the database structure
3 +To help you design your own SQL requests, we provide a description of the database tables and fields.
4 +
5 +## Table `family`, for Rfam families and their properties
6 +* `rfam_acc`: The family codename, from Rfam's numbering (Rfam accession number)
7 +* `description`: What RNAs fit in this family
8 +* `nb_homologs`: The number of homologous hits downloaded from Rfam, used to compute nucleotide frequencies
9 +* `nb_3d_chains`: The number of 3D RNA chains mapped to the family (from Rfam-PDB mappings, or inferred using the redundancy list)
10 +* `nb_total_homol`: Sum of the two previous fields, the number of sequences in the multiple sequence alignment, used to compute nucleotide frequencies
11 +* `max_len`: The longest RNA sequence among the homologs (in bases, unaligned)
12 +* `ali_len`: The aligned sequences length (in bases, aligned)
13 +* `ali_filtered_len`: The aligned sequences length when we filter the alignment to keep only the RNANet chains (which have a 3D structure) and some gap-only columns.
14 +* `comput_time`: Time required to compute the family's multiple sequence alignment, in seconds
15 +* `comput_peak_mem`: RAM (or swap) required to compute the family's multiple sequence alignment, in megabytes
16 +* `idty_percent`: Average identity percentage over pairs of the 3D chains' sequences from the family
17 +
18 +## Table `structure`, for 3D structures of the PDB
19 +* `pdb_id`: The 4-char PDB identifier
20 +* `pdb_model`: The model used in the PDB file
21 +* `date`: The first submission date of the 3D structure to a public database
22 +* `exp_method`: A string indicating whether the structure has been obtained by X-ray crystallography ('X-RAY DIFFRACTION'), electron microscopy ('ELECTRON MICROSCOPY'), or NMR (not seen yet)
23 +* `resolution`: Resolution of the structure, in ångströms (Å)
24 +
25 +## Table `chain`, for the datapoints: one chain mapped to one Rfam family
26 +* `chain_id`: A unique identifier
27 +* `structure_id`: The `pdb_id` where the chain comes from
28 +* `chain_name`: The chain label, extracted from the 3D file
29 +* `eq_class`: The BGSU equivalence class label containing this chain
30 +* `rfam_acc`: The family which the chain is mapped to (if not mapped, value is *unmappd*)
31 +* `pdb_start`: Position in the chain where the mapping to Rfam begins (absolute position, not residue number)
32 +* `pdb_end`: Position in the chain where the mapping to Rfam ends (absolute position, not residue number)
33 +* `reversed`: Whether the mapping numbering order differs from the residue numbering order in the mmCIF file (e.g. 4c9d, chains C and D)
34 +* `issue`: Whether an issue occurred with this structure while downloading, extracting, annotating or parsing the annotation. See the file known_issues_reasons.txt for more information about why your chain is marked as an issue.
35 +* `inferred`: Whether the mapping has been inferred using the redundancy list (value is 1) or just known from Rfam-PDB mappings (value is 0)
36 +* `chain_freq_A`, `chain_freq_C`, `chain_freq_G`, `chain_freq_U`, `chain_freq_other`: Nucleotide frequencies in the chain
37 +* `pair_count_cWW`, `pair_count_cWH`, ... `pair_count_tSS`: Counts of each Leontis-Westhof base-pair type in the chain (intra-chain counts only)
38 +
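As an illustration of the kind of request these tables allow, here is a minimal Python sketch using the standard `sqlite3` module. It assumes that `chain.structure_id` can be joined on `structure.pdb_id`, that `issue = 0` marks issue-free chains, and that unmapped chains carry the *unmappd* value described above; the helper name `good_mapped_chains` is hypothetical.

```python
import sqlite3

# Assumed join and filters, based on the field descriptions above.
CHAIN_QUERY = """
SELECT DISTINCT c.chain_id, c.structure_id, c.chain_name, c.rfam_acc, s.resolution
FROM chain AS c
JOIN structure AS s ON s.pdb_id = c.structure_id
WHERE c.issue = 0                 -- skip chains flagged as issues
  AND c.rfam_acc != 'unmappd'     -- keep only chains mapped to an Rfam family
  AND s.resolution <= ?           -- resolution threshold, in angstroms
ORDER BY s.resolution
"""

def good_mapped_chains(conn: sqlite3.Connection, max_resolution: float = 4.0):
    """Return (chain_id, pdb_id, chain_name, rfam_acc, resolution) tuples."""
    return conn.execute(CHAIN_QUERY, (max_resolution,)).fetchall()
```

For example, `good_mapped_chains(sqlite3.connect("results/RNANet.db"), 3.5)` would list the mapped chains solved at 3.5 Å or better, assuming the default output location.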
39 +## Table `nucleotide`, for individual nucleotide descriptors
40 +* `nt_id`: A unique identifier
41 +* `chain_id`: The chain the nucleotide belongs to
42 +* `index_chain`: Its absolute position within the portion of chain mapped to Rfam, from 1 to X. This is completely uncorrelated to any gene start or 3D chain residue numbers.
43 +* `nt_position`: Relative position within the portion of chain mapped to Rfam, from 0 to 1
44 +* `old_nt_resnum`: The residue number in the 3D mmCIF file (it's a string actually, some contain a letter like '37A')
45 +* `nt_name`: The residue type. This includes modified nucleotide names (e.g. 5MC for 5-methylcytosine)
46 +* `nt_code`: One-letter name. Lowercase "acgu" letters are used for modified "ACGU" bases.
47 +* `nt_align_code`: One-letter name used for sequence alignment. It first contains only "ACGUN-" characters; then, by default, gaps may be replaced by the most common letter at this position.
48 +* `is_A`, `is_C`, `is_G`, `is_U`, `is_other`: One-hot encoding of the nucleotide base
49 +* `dbn`: Character used at this position in the dot-bracket encoding of the secondary structure. Includes inter-chain (RNA complexes) contacts.
50 +* `paired`: Empty, or comma-separated list of `index_chain` values referring to nucleotides the base is interacting with. Up to 3 values. Inter-chain interactions are marked as paired to '0'.
51 +* `nb_interact`: Number of interactions with other nucleotides (up to 3). Includes inter-chain interactions.
52 +* `pair_type_LW`: The Leontis-Westhof nomenclature codes of the interactions. The first letter gives the cis/trans orientation, the second the edge of this base involved in the interaction, and the third the edge of the other base.
53 +* `pair_type_DSSR`: Same but using the DSSR nomenclature (the Hoogsteen edge approximately corresponds to the major groove, and the Sugar edge to the minor groove)
54 +* `alpha`, `beta`, `gamma`, `delta`, `epsilon`, `zeta`: The 6 torsion angles of the RNA backbone for this nucleotide
55 +* `epsilon_zeta`: Difference between epsilon and zeta angles
56 +* `bb_type`: Conformation of the backbone (BI, BII or ..)
57 +* `chi`: Torsion angle of the glycosidic bond between sugar and base (O4'-C1'-N9-C4 for purines, O4'-C1'-N1-C2 for pyrimidines)
58 +* `glyco_bond`: Syn or anti configuration of the sugar-base bond
59 +* `v0`, `v1`, `v2`, `v3`, `v4`: The 5 torsion angles of the ribose ring
60 +* `form`: If the nucleotide is involved in a stem, the stem type (A, B or Z)
61 +* `ssZp`: Z-coordinate of the 3’ phosphorus atom with reference to the 5’ base plane
62 +* `Dp`: Perpendicular distance of the 3’ P atom to the glycosidic bond
63 +* `eta`, `theta`: Pseudotorsions of the backbone, using phosphorus and carbon 4'
64 +* `eta_prime`, `theta_prime`: Pseudotorsions of the backbone, using phosphorus and carbon 1'
65 +* `eta_base`, `theta_base`: Pseudotorsions of the backbone, using phosphorus and the base center
66 +* `phase_angle`: Pseudorotation phase angle of the ribose ring
67 +* `amplitude`: Amplitude of the sugar puckering
68 +* `puckering`: Conformation of the ribose ring (10 classes depending on the `phase_angle` value)
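The `phase_angle`, `amplitude` and `puckering` fields are related to the five ribose torsions `v0` to `v4`. The sketch below shows the classical Altona-Sundaralingam pseudorotation formulas and a split of the phase angle into 10 classes of 36°. It is not RNANet's code (the stored values are assumed to come from DSSR's annotations), and the assumption that DSSR uses exactly these class boundaries should be checked against its documentation.

```python
import math

# 10 puckering classes, one per 36-degree sector of the pseudorotation wheel
# (assumed boundaries, as commonly used by 3DNA/DSSR).
PUCKER_CLASSES = [
    "C3'-endo", "C4'-exo", "O4'-endo", "C1'-exo", "C2'-endo",
    "C3'-exo", "C4'-endo", "O4'-exo", "C1'-endo", "C2'-exo",
]

_S = math.sin(math.radians(36)) + math.sin(math.radians(72))

def pseudorotation(v0, v1, v2, v3, v4):
    """Return (phase_angle, amplitude) in degrees from the ribose torsions (degrees).

    tan(P) = ((v4 + v1) - (v3 + v0)) / (2 * v2 * (sin 36 + sin 72)),
    amplitude = v2 / cos(P) (numerically unstable when P is close to 90 or 270).
    """
    num = (v4 + v1) - (v3 + v0)
    den = 2.0 * v2 * _S
    phase = math.degrees(math.atan2(num, den)) % 360.0  # atan2 picks the right quadrant
    amplitude = v2 / math.cos(math.radians(phase))
    return phase, amplitude

def pucker_class(phase: float) -> str:
    """Map a phase angle (degrees) to one of the 10 classes."""
    return PUCKER_CLASSES[int((phase % 360.0) // 36) % 10]
```

A C3'-endo sugar (phase angle around 18°) is typical of A-form helices, while C2'-endo (around 162°) is more frequent in loops.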
69 +
70 +## Table `align_column`, for positions in multiple sequence alignments
71 +* `column_id`: A unique identifier
72 +* `rfam_acc`: The family's MSA the column belongs to
73 +* `index_ali`: Position of the column in the alignment (starts at 1)
74 +* `freq_A`, `freq_C`, `freq_G`, `freq_U`, `freq_other`: Nucleotide frequencies in the alignment at this position
75 +* `gap_percent`: The frequency of gaps at this position in the alignment (between 0.0 and 1.0)
76 +* `consensus`: A consensus character (ACGUN or '-') summarizing the column, if we can. If >75% of the sequences are gaps at this position, the gap is picked as consensus. Otherwise, A/C/G/U is chosen if >50% of the non-gap positions are A/C/G/U. Otherwise, N is the consensus.
77 +
78 +For each family (`rfam_acc`), there is always an entry with `index_ali` = 0, `gap_percent` = 1.0, and all nucleotide frequencies set to 0.0. This entry is used when the nucleotide frequencies cannot be determined because of local alignment issues.
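The consensus rule described for the `consensus` field can be written as a small function over one alignment column. In this sketch, the column is a string with one character per aligned sequence; treating both '-' and '.' as gaps and upper-casing letters before counting are assumptions, since the exact treatment in RNANet is not described here.

```python
from collections import Counter

GAP_CHARS = "-."  # assumption: dash-gaps and dot-gaps both count as gaps

def column_consensus(column: str) -> str:
    """Consensus character ('A', 'C', 'G', 'U', 'N' or '-') of one alignment column."""
    if not column:
        return "-"
    gaps = sum(1 for c in column if c in GAP_CHARS)
    if gaps / len(column) > 0.75:          # more than 75% gaps: the gap wins
        return "-"
    non_gap = len(column) - gaps           # > 0 here, since at most 75% are gaps
    counts = Counter(c.upper() for c in column if c not in GAP_CHARS)
    for base in "ACGU":
        if counts[base] / non_gap > 0.5:   # strict majority among non-gap characters
            return base
    return "N"                             # no clear majority: 'other'
```

For instance, a column reading `AAAU-` gives `A`, while `A----` gives `-` (80% gaps).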
79 +
80 +## Table `re_mapping`, to map a nucleotide to an alignment column
81 +* `remapping_id`: A unique identifier
82 +* `chain_id`: The chain which is mapped to an alignment
83 +* `index_chain`: The absolute position of the nucleotide in the chain (from 1 to X)
84 +* `index_ali`: The position of that nucleotide in its family alignment
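The `re_mapping` table links per-nucleotide data to per-column data. A sketch of the corresponding join follows, assuming that the family of a chain is read from `chain.rfam_acc` and that `align_column` rows are identified by the (`rfam_acc`, `index_ali`) pair; the function name `columns_of_chain` is hypothetical. Nucleotides remapped to the `index_ali` = 0 entry come back with `gap_percent` = 1.0.

```python
import sqlite3

# Assumed join path: nucleotide -> re_mapping (chain_id, index_chain)
#                    -> chain (rfam_acc) -> align_column (rfam_acc, index_ali)
REMAP_QUERY = """
SELECT n.index_chain, n.nt_code, a.index_ali, a.consensus, a.gap_percent
FROM nucleotide AS n
JOIN re_mapping AS r ON r.chain_id = n.chain_id AND r.index_chain = n.index_chain
JOIN chain AS c ON c.chain_id = n.chain_id
JOIN align_column AS a ON a.rfam_acc = c.rfam_acc AND a.index_ali = r.index_ali
WHERE n.chain_id = ?
ORDER BY n.index_chain
"""

def columns_of_chain(conn: sqlite3.Connection, chain_id: int):
    """Return (index_chain, nt_code, index_ali, consensus, gap_percent) per nucleotide."""
    return conn.execute(REMAP_QUERY, (chain_id,)).fetchall()
```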
@@ -40,6 +40,13 @@ RUN apk update && apk add --no-cache \
 musl-dev \
 py3-pip py3-wheel \
 freetype-dev zlib-dev
+RUN addgroup -S appgroup -g 1000 && \
+ adduser -S appuser -u 1000 -G appgroup && \
+ chown -R appuser:appgroup /3D && \
+ chown -R appuser:appgroup /sequences && \
+ mkdir /runDir && \
+ chown -R appuser:appgroup /runDir
+USER appuser
 VOLUME ["/3D", "/sequences", "/runDir"]
 WORKDIR /runDir
 ENTRYPOINT ["/RNANet/RNAnet.py", "--3d-folder", "/3D", "--seq-folder", "/sequences" ]
\ No newline at end of file
1 +
2 +# Warnings and errors in RNANet
3 +
4 +Use Ctrl + F on this page to look for your error message in the list.
5 +
6 +* **Could not load X.json with JSON package** :
7 +The JSON file produced by DSSR could not be loaded by Python. Try deleting the file and re-running DSSR (through RNANet).
8 +
9 +* **Found DSSR warning in annotation X.json: no nucleotides found. Ignoring X.** :
10 +DSSR complains because the CIF structure does not seem to contain nucleotides. This can happen with low-resolution structures where only P atoms are resolved; you should ignore them. This can also happen if the .cif file is corrupted (failed download, etc.). Check with a 3D visualization software whether your chain contains well-defined nucleotides. Try deleting the .cif file and retrying. If the problem persists, just ignore the chain.
11 +
12 +* **Could not find nucleotides of chain X in annotation X.json. Ignoring chain X.** : Basically the same as above, but some nucleotides have been observed in another chain of the same structure.
13 +
14 +* **Could not find real nucleotides of chain X between START and STOP. Ignoring chain X.** : Same as the two above, but nucleotides can be found outside of the mapping interval. This can happen if there is a mapping problem, e.g., the interval was considered absolute when it should not have been.
15 +
16 +* **Error while parsing DSSR X.json output: {custom-error}** : The DSSR annotations lack some of our required fields. DSSR has likely changed some of its field names. Contact us so that we can fix the problem for the latest DSSR version.
17 +
18 +* **Mapping is reversed, this case is not supported (yet). Ignoring chain X.** : The mapping coordinates, as obtained from Rfam, have an end position coming before the start position (meaning, the sequence has to be reversed to map the RNA covariance model). We do not support this yet, so the chain is ignored.
19 +
20 +* **Error with parsing of X duplicate residue numbers. Ignoring it.** : This 3D chain contains new kinds of issues in its residue numbering that we do not know how to tackle yet. Contact us so that we can add support for this entry.
21 +
22 +* **Found duplicated index_chain N in X. Keeping only the first.** : This RNA 3D chain contains two (or more) residues with the same numbering N. This often happens when a nucleic-like ligand is annotated as part of the RNA chain, and DSSR considers it a nucleotide. By default, RNANet keeps only the first of the multiple residues with the same number. You may want to check that the produced 3D structure contains the appropriate nucleotide and no ligand.
23 +
24 +* **Missing index_chain N in X !** : DSSR annotations for chain X are discontinuous, position N is missing. This means residue N has not been recognized as a nucleotide by DSSR. Is the .cif structure file corrupted ? Delete it and retry.
25 +
26 +* **X sequence is too short, let's ignore it.** : We discard very short RNA chains.
27 +
28 +* **Error downloading and/or extracting Rfam.cm !** : We cannot retrieve the Rfam covariance models file. RNANet tries to find it at ftp://ftp.ebi.ac.uk/pub/databases/Rfam/CURRENT/Rfam.cm.gz, so check that your network is not blocking the FTP protocol (port 21 must be open on your network), and check that the address has not changed. If it has, contact us so that we update RNANet with the correct address.
29 +
30 +* **Something's wrong with the SQL database. Check mysql-rfam-public.ebi.ac.uk status and try again later. Not printing statistics.** : We cannot retrieve family statistics from Rfam's public server. Check if you can connect to it by hand: `mysql -u rfamro -P 4497 -D Rfam -h mysql-rfam-public.ebi.ac.uk`. If not, check that port 4497 is open on your network.
31 +
32 +* **Error downloading RFXXXXX.fa.gz: {custom-error}** : We cannot reach the Rfam FTP server to download homologous sequences. We look in ftp://ftp.ebi.ac.uk/pub/databases/Rfam/CURRENT/fasta_files/, so check whether you can access it from your network (port 21 must be open on your network). If the address has changed, notify us.
33 +
34 +* **Error downloading NR list !** : We cannot download BGSU's equivalence classes from their website. Check if you can access http://rna.bgsu.edu/rna3dhub/nrlist/download/current/20.0A/csv from a web browser. Their website occasionally does not respond; in that case, the previous download is re-used.
35 +
36 +* **Error downloading the LSU/SSU database from SILVA** : We cannot reach SILVA's arb files. We are looking for http://www.arb-silva.de/fileadmin/arb_web_db/release_132/ARB_files/SILVA_132_LSURef_07_12_17_opt.arb.gz and http://www.arb-silva.de/fileadmin/silva_databases/release_138/ARB_files/SILVA_138_SSURef_05_01_20_opt.arb.gz. Try downloading them from your web browser, then extract them and place them in the realigned/ subfolder.
37 +
38 +* **Assuming mapping to RFXXXXX is an absolute position interval.** : The mapping provided by Rfam concerns a nucleotide interval START-END, but no nucleotides are defined in 3D in that interval. When this happens, we assume that the numbering is not relative to the residue numbers in the 3D file, but to the absolute position in the chain, starting at 1. And yes, we tried to apply this behavior to all mappings; this yields the opposite issue, where some mappings fall outside the available nucleotides. To be solved the day Rfam explains how they build the mappings.
39 +
40 +* **Added newly discovered issues to known issues** : You discovered new chains that cannot be correctly interpreted as they are, congrats. For each chain in the list, another warning has been raised; refer to it.
41 +
42 +* **Structures without referenced chains have been detected.** : Something went wrong, because the database contains references to 3D structures that are not used by any entry in the `chain` table. You should rerun RNANet. The option `--only` may help to rerun it just for one chain.
43 +
44 +* **Chains without referenced structures have been detected** :
45 +Something went wrong, because the database contains references to 3D chains that are not used by any entry in the `structure` table. You should rerun RNANet. The option `--only` may help to rerun it just for one chain.
46 +
47 +* **Chains were not remapped** : Something went wrong, because the database contains references to 3D chains that are not used by any entry in the `re_mapping` table, assuming you were interested in homology data. You should rerun RNANet. The option `--only` may help to rerun it just for one chain. If you are not interested in homology data, use option `--no-homology` to skip alignment and remapping steps.
48 +
49 +* **Operational Error: database is locked, retrying in 0.2s** : Too many workers are trying to access the database at the same time. Do not try to run several instances of RNANet in parallel. Even with only one instance, this might still happen if your device has slow I/O. Try running RNANet from an SSD.
50 +
51 +* **Tried to reach database 100 times and failed. Aborting.** : Same as above, but in a more serious way.
52 +
53 +* **Nothing to do !** : RNANet is up-to-date, or did not detect any modification to do, so nothing changed in your database.
54 +
55 +* **KeyboardInterrupt, terminating workers.** : You interrupted the computation by pressing Ctrl+C. The database may be in an unstable state, rerun RNANet to solve the problem.
56 +
57 +* **Found mappings to RFXXXXX in both directions on the same interval, keeping only the 5'->3' one.** : A chain has been mapped to family RFXXXXX, but the mapping has been found twice, with the limits inverted. We only keep one (in 5'->3' sense).
58 +
59 +* **There are mappings for RFXXXXX in both directions** : A chain has been mapped to family RFXXXXX several times, and the mappings are not in the same sequence sense (some are reversed, with END < START). In that case, we do not know what to decide for this chain, and we abort.
60 +
61 +* **Unable to download XXXX.cif. Ignoring it.** : We cannot access a certain 3D structure from RCSB's download site, can you access it from your web browser and put it in the RNAcifs/ folder ? We look at http://files.rcsb.org/download/XXXX.cif , replacing XXXX by the right PDB code.
62 +
63 +* **Wtf, structure XXXX has no resolution ? Check https://files.rcsb.org/header/XXXX.cif to figure it out.** : We cannot find the resolution of structure XXXX from the .cif file. We are looking for it in the fields `_refine.ls_d_res_high`, `_refine.ls_d_res_low`, and `_em_3d_reconstruction.resolution`. Maybe the information is stored in another field ? If you find it, contact us so that we support this new CIF field.
64 +
65 +* **Could not find annotations for X, ignoring it.** : It seems that DSSR has not been run for structure X, or failed. Rerun RNANet.
66 +
67 +* **Nucleotides not inserted: {custom-error}** : For some reason, no nucleotides were saved to the database for this chain. Contact us.
68 +
69 +* **Removing N doublons from existing RFXXXXX++.fa and using their newest version** : You are trying to re-compute sequence alignments of 3D structures that had already been computed in the past. They will be removed from the alignment and recomputed, in case the sequences have changed.
70 +
71 +* **Removing N doublons from existing RFXXXXX++.stk and using their newest version** : Same as above.
72 +
73 +* **Error during sequence alignment: {custom-error}** : Something went wrong during sequence alignment. Recompute the alignments using the `--update-homologous` option.
74 +
75 +* **Failed to realign RFXXXXX (killed)** : You ran out of memory while computing multiple sequence alignments. Try to run RNANet on a machine with at least 32 GB of RAM.
76 +
77 +* **RFXXXXX's alignment is wrong. Recompute it and retry.** : We could not load RFXXXXX's multiple sequence alignment. It may have failed to compute, or be corrupted. Recompute the alignments using the `--update-homologous` option.
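Several of the messages above come down to a blocked port (21 for the EBI FTP server, 4497 for the Rfam public MySQL server). A quick way to test reachability from Python, without any RNANet code, is a plain TCP connection attempt. This is a minimal sketch using only the standard `socket` module; the helper name `port_is_open` is hypothetical, and an open port does not guarantee that the remote service itself is healthy.

```python
import socket

def port_is_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure or timeout
        return False
```

For example, `port_is_open("mysql-rfam-public.ebi.ac.uk", 4497)` and `port_is_open("ftp.ebi.ac.uk", 21)` should both return `True` on a network where RNANet can work.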
\ No newline at end of file
1 +
2 +# FAQ
3 +
4 +* **What is the difference between . and - in alignments ?**
5 +
6 +In `cmalign` alignments, '-' means a nucleotide is missing compared to the covariance model: it represents a deletion. The dot '.' indicates that another chain has an insertion compared to the covariance model. The current chain does not lack anything; another chain has more.
7 +
8 +In the final filtered alignment that we provide for download, the same rule applies, but on top of that, some '.' are replaced by '-' when a gap in the 3D structure (a missing, unresolved nucleotide) is mapped to an insertion gap.
9 +
10 +* **Why are there some gap-only columns in the alignment ?**
11 +
12 +These columns are not completely gap-only, they contain at least one dash-gap '-'. This means an actual, physical nucleotide which should exist in the 3D structure should be located there. The previous and following nucleotides are **not** contiguous in space in 3D.
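To locate such columns in a downloaded filtered alignment, a sketch like the following can help. It assumes the alignment has already been read into a list of equal-length aligned strings (for example with Biopython, which RNANet already depends on); the function name `dash_gap_columns` is hypothetical.

```python
from typing import List

def dash_gap_columns(alignment: List[str]) -> List[int]:
    """Return 1-based indices of columns made only of gaps ('-' or '.')
    that contain at least one '-', i.e. an unresolved nucleotide in 3D."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    result = []
    for j in range(length):
        column = [seq[j] for seq in alignment]
        if all(c in "-." for c in column) and "-" in column:
            result.append(j + 1)
    return result
```

Columns containing only '.' are not reported, since they would not correspond to any missing physical nucleotide.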
13 +
14 +* **Why is the numbering of residues in my 3D chain weird ?**
15 +
16 +Probably because the numbering in the original chain already was a mess, and the RNANet renumbering process failed to understand it correctly. If you ran RNANet yourself, check the `logs/` folder and find your chain's log. It explains how the chain was renumbered.
17 +
18 +* **What is your standardized way to re-number residues ?**
19 +
20 +We first remove the nucleotides whose number is outside the family mapping (if any). Then, we renumber the following way:
21 +
22 + 0) For truncated chains, we shift the numbering of every nucleotide so that the first nucleotide is 1.
23 + 1) We identify duplicate residue numbers and increase by 1 the numbering of all nucleotides starting at the duplicate, recursively, until we find a gap in the numbering sequence. If no gap is found, residue numbers are shifted until the end of the chain.
24 + 2) We proceed in a similar way for nucleotides with letter numbering (e.g. 17, 17A and 17B will be renumbered to 17, 18 and 19, and the following nucleotides in the chain are also shifted).
25 + 3) Nucleotides with partial numbering and a letter are hopefully detected and processed with their correct numbering (e.g. in ...1629, 1630, 163B, 1631, ... the residue 163B has nothing to do with number 163 or 164; the series will be renumbered 1629, 1630, 1631, 1632 and the following will be shifted).
26 + 4) Nucleotides numbered -1 at the beginning of a chain are shifted (with the following ones) to 1.
27 + 5) Ligands at the end of the chain are removed. Any residue which is not A/C/G/U and has no defined puckering or no defined torsion angles is detected as a ligand. Residues at the end of the chain are also considered to be ligands if their residue number exceeds that of the previous residue by more than 50 (ligands are sometimes numbered 1000 or 9999). Finally, residues "GNG", "E2C", "OHX", "IRI", "MPD", "8UZ" at the end of a chain are removed.
28 + 6) Ligands at the beginning of a chain are removed. DSSR annotates them with index_chain 1, 2, 3..., so we can detect that there is a redundancy with the real nucleotides 1, 2, 3. We keep only the first, which hopefully is the real nucleotide. We also remove the ones that have a negative number (since we renumbered the truncated chain to start at 1, some became negative).
29 + 7) We attempt to detect and renumber nucleotides with creative, disruptive numbering, even if the numbers fall outside the family mapping interval. For example, the series ... 1003, 2003, 3003, 1004... will be renumbered ...1003, 1004, 1005, 1006 ... and the following accordingly.
30 + 8) Nucleotides missing from portions not resolved in 3D are created as gaps, with correct numbering, to fill the portion between the previous and the following resolved ones.
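Steps 1 and 2 can be illustrated with a short sketch. It is a simplified illustration, not RNANet's implementation: it only strips insertion letters and pushes colliding numbers up by one until a gap in the original numbering absorbs the shift. It ignores steps 0 and 3 to 8, so for instance a label like 163B would wrongly be read as 163; the function name `renumber` is hypothetical.

```python
import re
from typing import List

def renumber(resnums: List[str]) -> List[int]:
    """Make residue numbers strictly increasing (simplified steps 1 and 2)."""
    nums = []
    for label in resnums:
        match = re.match(r"-?\d+", label.strip())   # '17A' -> 17, '-1' -> -1
        if match is None:
            raise ValueError(f"no residue number in {label!r}")
        nums.append(int(match.group()))
    # A duplicate (or an insertion letter, now a duplicate) is shifted by one;
    # the following residues keep being shifted until a gap absorbs the shift.
    for i in range(1, len(nums)):
        if nums[i] <= nums[i - 1]:
            nums[i] = nums[i - 1] + 1
    return nums
```

With this sketch, the series 16, 17, 17A, 17B, 18, 25 becomes 16, 17, 18, 19, 20, 25: the gap between 18 and 25 absorbs the shift.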
31 +
32 +* **What are the versions of the dependencies you use ?**
33 +
34 +`cmalign` is v1.1.3, `sina` is v1.6.0, `x3dna-dssr` is v1.9.9, Biopython is v1.78.
35 +
\ No newline at end of file
1 +
2 +* [Required computational resources](#required-computational-resources)
3 +* [Method 1 : Using Docker](#method-1-:-installation-using-docker)
4 +* [Method 2 : Classical command-line installation](#method-2-:-classical-command-line-installation-linux-only)
5 +* [Command options](#command-options)
6 +* [Computation time](#computation-time)
7 +* [Post-computation tasks](#post-computation-tasks-estimate-quality)
8 +* [Output files](#output-files)
9 +
10 +# Required computational resources
11 +- CPU: no requirements. The program is optimized for multi-core CPUs; you might want to use Intel Xeons, AMD Ryzens, etc.
12 +- GPU: not required
13 +- RAM: 16 GB with a large swap partition is okay. 32 GB is recommended (usage peaks at ~27 GB)
14 +- Storage: to date, it takes 60 GB for the 3D data (36 GB if you don't use the `--extract` option), 11 GB for the sequence data, and 7 GB for the outputs (5.6 GB database, 1 GB archive of CSV files). You need to add a few more for the dependencies. Pick a 100 GB partition and you are good to go. The computation is much faster if you use a fast storage device (e.g. an SSD instead of a hard drive, or even better, an NVMe SSD) because of constant I/O with the SQLite database.
15 +- Network: We query the Rfam public MySQL server on port 4497. Make sure your network enables communication (there should not be any issue on private networks, but maybe your company/university closes ports by default). You will get an error message if the port is not open. Around 30 GB of data is downloaded.
16 +
17 +# Method 1 : Installation using Docker
18 +
19 +* Step 1 : Download the [Docker image archive](https://entrepot.ibisc.univ-evry.fr/d/1aff90a9ef214a19b848/files/?p=/rnanet_v1.3_docker.tar&dl=1). Open a terminal and move to the appropriate directory.
20 +* Step 2 : Load the archive as a Docker image named *rnanet* in your local installation:
21 +```
22 +$ docker load -i rnanet_v1.3_docker.tar
23 +```
24 +* Step 3 : Run the container, giving it 3 folders to mount as volumes: a first to store the 3D data, a second to store the sequence data and alignments, and a third to output the results, data and logs:
25 +```
26 +$ docker run --rm -v path/to/3D/data/folder:/3D -v path/to/sequence/data/folder:/sequences -v path/to/experiment/results/folder:/runDir rnanet [ - other options ]
27 +```
28 +
29 +Typical usage:
30 +```
31 +nohup bash -c 'time docker run --rm -v /path/to/3D/data/folder:/3D -v /path/to/sequence/data/folder:/sequences -v /path/to/experiment/folder:/runDir rnanet -s --no-logs ' &
32 +```
33 +
34 +
35 +# Method 2 : Classical command line installation (Linux only)
36 +
37 +You need to install the dependencies:
38 +- DSSR, you need to register to the X3DNA forum [here](http://forum.x3dna.org/site-announcements/download-instructions/) and then download the DSSR binary [on that page](http://forum.x3dna.org/downloads/3dna-download/). Make sure to have the `x3dna-dssr` binary in your $PATH variable so that RNANet.py finds it.
39 +- Infernal, available from [Eddylab](http://eddylab.org/infernal/); several options are available depending on your preferences. Make sure to have the `cmalign`, `esl-alimanip`, `esl-alipid` and `esl-reformat` binaries in your $PATH variable, so that RNANet.py can find them.
40 +- SINA, follow [these instructions](https://sina.readthedocs.io/en/latest/install.html) for example. Make sure to have the `sina` binary in your $PATH.
41 +- SQLite 3, available under the name *sqlite* in every distro's package manager.
42 +- Python >= 3.8 (unfortunately, Python 3.6 is no longer supported because of changes in the multiprocessing and threading packages; untested with Python 3.7.\*).
43 +- The following Python packages: `python3.8 -m pip install biopython matplotlib pandas psutil pymysql requests scipy setproctitle sqlalchemy tqdm`.
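Before a first run, you can check that the binaries and Python packages listed above are visible. This is a minimal sketch using only the standard library; the binary and module names are taken from the list above (Biopython is imported as `Bio`), the helper names are hypothetical, and versions are not checked.

```python
import importlib.util
import shutil
import sys
from typing import Iterable, List

# Binaries that the instructions above ask you to put in your $PATH.
REQUIRED_BINARIES = ("x3dna-dssr", "cmalign", "esl-alimanip", "esl-alipid",
                     "esl-reformat", "sina")
# Import names of the pip packages listed above.
REQUIRED_MODULES = ("Bio", "matplotlib", "pandas", "psutil", "pymysql", "requests",
                    "scipy", "setproctitle", "sqlalchemy", "tqdm")

def missing_binaries(names: Iterable[str] = REQUIRED_BINARIES) -> List[str]:
    """Return the executables that cannot be found (in $PATH, or at the given path)."""
    return [name for name in names if shutil.which(name) is None]

def missing_modules(names: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the top-level modules that the current interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    print("Python OK" if sys.version_info >= (3, 8) else "Python >= 3.8 required")
    print("Missing binaries:", ", ".join(missing_binaries()) or "none")
    print("Missing modules:", ", ".join(missing_modules()) or "none")
```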
44 +
45 +Then, run it from the command line, preferably using nohup if your shell will be interrupted:
46 +```
47 + ./RNAnet.py --3d-folder path/to/3D/data/folder --seq-folder path/to/sequence/data/folder [ - other options ]
48 +```
49 +
50 +Typical usage:
51 +```
52 +nohup bash -c 'time ~/Projects/RNANet/RNAnet.py --3d-folder ~/Data/RNA/3D/ --seq-folder ~/Data/RNA/sequences -s --no-logs' &
53 +```
54 +
55 +# Command options
56 +
57 +The detailed list of options is below:
58 +
59 +```
60 +-h [ --help ] Print this help message
61 +--version Print the program version
62 +
63 +-f [ --full-inference ] Infer new mappings even if Rfam already provides some. Yields more copies of chains
64 + mapped to different families.
65 +-r 4.0 [ --resolution=4.0 ] Maximum 3D structure resolution to consider a RNA chain.
66 +-s Run statistics computations after completion
67 +--extract Extract the portions of 3D RNA chains to individual mmCIF files.
68 +--keep-hetatm=False (True | False) Keep ions, waters and ligands in produced mmCIF files.
69 + Does not affect the descriptors.
70 +--3d-folder=… Path to a folder to store the 3D data files. Subfolders will contain:
71 + RNAcifs/ Full structures containing RNA, in mmCIF format
72 + rna_mapped_to_Rfam/ Extracted 'pure' RNA chains
73 + datapoints/ Final results in CSV file format.
74 +--seq-folder=… Path to a folder to store the sequence and alignment files. Subfolders will be:
75 + rfam_sequences/fasta/ Compressed hits to Rfam families
76 + realigned/ Sequences, covariance models, and alignments by family
77 +--no-homology Do not try to compute PSSMs and do not align sequences.
78 + Allows to yield more 3D data (consider chains without a Rfam mapping).
79 +
80 +--all Build chains even if they already are in the database.
81 +--only Ask to process a specific chain label only
82 +--ignore-issues Do not ignore already known issues and attempt to compute them
83 +--update-homologous Re-download Rfam and SILVA databases, realign all families, and recompute all CSV files
84 +--from-scratch Delete database, local 3D and sequence files, and known issues, and recompute.
85 +--archive Create a tar.gz archive of the datapoints text files, and update the link to the latest archive
86 +--no-logs Do not save per-chain logs of the numbering modifications
87 +```
88 +Options --3d-folder and --seq-folder are mandatory for command-line installations, but should not be used for installations with Docker. In the Docker container, they are set by default to the paths you provide with the -v options.
89 +
90 +The most useful options in that list are:
91 +* `--extract`, to actually produce some re-numbered 3D mmCIF files of the RNA chains individually,
92 +* `--no-homology`, to ignore the family mapping and sequence alignment parts and only focus on 3D data download and annotation. This would yield more data since many RNAs are not mapped to any Rfam family.
93 +* `-s`, to run the "statistics" which are a few useful post-computation tasks such as:
94 + * Computation of sequence identity matrices
95 + * Statistics over the sequence lengths, nucleotide frequencies, and basepair types by RNA family
96 + * Overall database content statistics
97 +
98 +# Computation time
99 +
100 +To give you an estimation, our last full run took exactly 12 h, excluding the time to download the mmCIF files containing RNA (around 25 GB to download) and the time to compute statistics.
101 +Measured on the 23rd of June 2020 on an 8-core (16-thread) AMD Ryzen 7 3700X CPU @3.60GHz, with 32 GB RAM and a 7200 rpm hard drive. Total CPU time spent: 135 hours (user+kernel modes), corresponding to 12 h of actual time with the 16 hardware threads.
102 +
103 +Update runs are much quicker, around 3 hours. It depends mostly on which RNA families are affected by the update.
104 +
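As a rough reading of the figures above, 135 CPU-hours spread over 12 hours of elapsed time means that about 11 hardware threads were busy on average, out of the 16 available. The short sketch below only redoes that arithmetic with the numbers quoted in the measurement.

```python
# Back-of-the-envelope parallel efficiency of the full run described above.
cpu_hours = 135.0   # total user + kernel CPU time
wall_hours = 12.0   # actual elapsed time
threads = 16        # hardware threads of the machine used for the measurement

avg_busy_threads = cpu_hours / wall_hours   # average number of busy threads
efficiency = avg_busy_threads / threads     # fraction of the machine kept busy

print(f"average busy threads: {avg_busy_threads:.2f}")  # 11.25
print(f"parallel efficiency: {efficiency:.0%}")          # 70%
```

In other words, roughly 30% of the machine sits idle on average, which is consistent with phases of the pipeline that are I/O-bound or only partially parallel.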
105 +
106 +# Post-computation tasks (estimate quality)
107 +If you did not ask for an automatic run of the statistics over the produced dataset with the `-s` option, you can run them later using the script `statistics.py`.
108 +```
109 +python3.8 statistics.py --3d-folder path/to/3D/data/folder --seq-folder path/to/sequence/data/folder -r 20.0
110 +```
111 +/!\ Beware: if no value is given with option `-r`, no resolution threshold is applied and all the data in RNANet.db is used.
112 +
113 +By default, this computes:
114 +* Computation of sequence identity matrices
115 +* Statistics over the sequence lengths, nucleotide frequencies, and basepair types by RNA family
116 +* Overall database content statistics
117 +
118 +If you have run RNANet once with options `--no-homology` and `--extract`, you unlock new statistics over unmapped chains.
119 +* You will be allowed to use option `--wadley` to reproduce the results of Wadley et al. (2007) automatically. These are clustering results of the pseudotorsion angles of the backbone.
120 +* (experimental) You will be allowed to use option `--distance-matrices` to compute pairwise residue distances within every chain, and their averages and standard deviations by RNA family. This is supposed to capture the average shape of an RNA family.
121 +
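The `--distance-matrices` statistics can be pictured with the minimal sketch below. It is not RNANet's implementation: it assumes each chain is given as a list of 3D coordinates (one hypothetical representative atom per residue), already placed on a common numbering shared by the whole family (for instance the alignment columns), with `None` where a chain has no residue. The functions `distance_matrix` and `family_average` are illustrative names.

```python
import math
import statistics
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]
Matrix = List[List[Optional[float]]]


def distance_matrix(coords: Sequence[Optional[Coord]]) -> Matrix:
    """Pairwise Euclidean distances between the residues of one chain.

    None marks a position where the chain has no resolved residue.
    """
    n = len(coords)
    return [
        [
            math.dist(coords[i], coords[j])
            if coords[i] is not None and coords[j] is not None
            else None
            for j in range(n)
        ]
        for i in range(n)
    ]


def family_average(matrices: Sequence[Matrix]) -> Tuple[Matrix, Matrix]:
    """Element-wise mean and population standard deviation over chains.

    Missing values are skipped; all matrices must have the same size.
    """
    n = len(matrices[0])
    mean: Matrix = [[None] * n for _ in range(n)]
    std: Matrix = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            values = [m[i][j] for m in matrices if m[i][j] is not None]
            if values:
                mean[i][j] = statistics.fmean(values)
                std[i][j] = statistics.pstdev(values)
    return mean, std


# Two toy chains on a 3-position numbering; the second lacks position 3.
chain_a = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 1.0)]
chain_b = [(0.0, 0.0, 0.0), (6.0, 8.0, 0.0), None]
mean, std = family_average([distance_matrix(chain_a), distance_matrix(chain_b)])
# mean[0][1] == 7.5 and std[0][1] == 2.5 (distances 5.0 and 10.0)
```

Skipping missing values rather than treating them as zero keeps unresolved residues from pulling the family average towards short distances.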
122 +# Output files
123 +
124 +* `results/RNANet.db` is a SQLite database file containing several tables with all the information, which you can query yourself with your custom requests,
125 +* `3D-folder-you-passed-in-option/datapoints/*` are flat text CSV files, one per RNA chain mapped to an RNA family, gathering the per-position nucleotide descriptors,
126 +* `archive/RNANET_datapoints_{DATE}.tar.gz` is a compressed archive of the above CSV files (only if you passed the `--archive` option),
127 +* `archive/RNANET_alignments_latest.tar.gz` is a compressed archive of multiple sequence alignments in FASTA format, one per RNA family, including only the portions of chains with a 3D structure which are mapped to a family. The alignments were computed with all the homologous sequences of each family (from Rfam or SILVA), which were then removed.
128 +* `path-to-3D-folder-you-passed-in-option/rna_mapped_to_Rfam`: if you used the `--extract` option, this folder contains one mmCIF file per RNA chain mapped to one RNA family, without other chains or proteins (and, by default, without ions and ligands). If you used both `--extract` and `--no-homology`, this folder is called `rna_only`.
129 +* `results/summary.csv` summarizes information about the RNA chains
130 +* `results/families.csv` summarizes information about the RNA families
131 +* `results/pair_types.csv` summarizes statistics about base-pair types in every family.
132 +* `results/frequencies.csv` summarizes statistics about nucleotide frequencies in every family (including all known modified bases)
133 +
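Since `results/RNANet.db` is a plain SQLite file, Python's standard `sqlite3` module is enough to explore it. The sketch below counts the chains mapped to each Rfam family. It assumes the `chain` table exposes the `rfam_acc` and `issue` columns described in the database documentation, and that `issue = 0` marks chains without known problems; `chains_per_family` is an illustrative name, not part of RNANet.

```python
import sqlite3
from contextlib import closing
from pathlib import Path
from typing import List, Tuple


def chains_per_family(db_path: str = "results/RNANet.db") -> List[Tuple[str, int]]:
    """Count chains per Rfam family, most populated families first."""
    # Open read-only, so that a wrong path raises an error instead of
    # silently creating an empty database file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    query = """
        SELECT rfam_acc, COUNT(*) AS n_chains
        FROM chain
        WHERE issue = 0          -- assumption: 0 means no known issue
        GROUP BY rfam_acc
        ORDER BY n_chains DESC, rfam_acc
    """
    # closing() makes sure the connection is closed when the block exits.
    with closing(sqlite3.connect(uri, uri=True)) as connection:
        return connection.execute(query).fetchall()
```

Run from the folder that contains `results/`, `for rfam_acc, n in chains_per_family(): print(rfam_acc, n)` prints one line per family.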
134 +Other folders are also created and kept; you may want to preserve them to avoid re-computations in later runs:
135 +
136 +* `path-to-sequence-folder-you-passed-in-option/rfam_sequences/fasta/` contains compressed FASTA files of the homologous sequences used, by Rfam family.
137 +* `path-to-sequence-folder-you-passed-in-option/realigned/` contains families covariance models (\*.cm), unaligned list of sequences (\*.fa), and multiple sequence alignments in both formats Stockholm and Aligned-FASTA (\*.stk and \*.afa). Also contains SINA homologous sequences databases LSU.arb and SSU.arb, and their index files (\*.sidx).
138 +* `path-to-3D-folder-you-passed-in-option/RNAcifs/` contains mmCIF structures directly downloaded from the PDB, which contain RNA chains,
139 +* `path-to-3D-folder-you-passed-in-option/annotations/` contains the raw JSON annotation files of the previous mmCIF structures. They may contain additional information that RNANet does not properly support yet.
...\ No newline at end of file ...\ No newline at end of file
1 +# Known Issues
2 +
3 +## Annotation and numbering issues
4 +* Some GDPs that are listed as HETATMs in the mmCIF files are not correctly detected as real nucleotides (e.g. 1e8o-E)
5 +* Some chains are split into several pieces with different chain names, for an unknown reason (e.g. 6ztp-AX)
6 +* Some chains are not correctly renamed A in the produced separate files (e.g. 1d4r-B)
7 +
8 +## Alignment issues
9 +* [SOLVED] Filtered alignments are shorter than the number of alignment columns saved to the SQL table `align_column`
10 +* Chain names appear in triple in the FASTA header (e.g. 1d4r[1]-B 1d4r[1]-B 1d4r[1]-B)
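Until the duplicated chain names are fixed, one possible workaround is to post-process the FASTA files and collapse headers made of a single label repeated several times. This is a hypothetical helper, not part of RNANet; it assumes the repeated labels are separated by whitespace, as in the example above, and leaves any other header untouched.

```python
from typing import Iterable, Iterator


def dedupe_fasta_headers(lines: Iterable[str]) -> Iterator[str]:
    """Collapse FASTA headers made of one label repeated several times."""
    for line in lines:
        if line.startswith(">"):
            newline = "\n" if line.endswith("\n") else ""
            labels = line[1:].split()
            # Only rewrite when every token is identical, so that headers
            # carrying a genuine description are kept as they are.
            if len(labels) > 1 and all(label == labels[0] for label in labels):
                line = ">" + labels[0] + newline
        yield line
```

For example, `out.writelines(dedupe_fasta_headers(src))`, with `src` and `out` two open text files, writes a cleaned copy of an alignment file.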
11 +
12 +## Technical running issues
13 +* [SOLVED] Files produced by Docker containers are owned by root and require root permissions to be read
14 +* [SOLVED] SQLite WAL files are not deleted properly
15 +
16 +# Known feature requests
17 +* [DONE] Get filtered versions of the sequence alignments containing the 3D chains, publicly available for download
18 +* [DONE] Get a consensus residue for each alignment column
19 +* [DONE] Get an option to limit the number of cores
20 +* [UPCOMING] Automated annotation of detected Recurrent Interaction Networks (RINs), see http://carnaval.lri.fr/ .
21 +* [UPCOMING] Possibly, automated detection of HLs and ILs from the 3D Motif Atlas (BGSU). Maybe. Their own website already does the job.
22 +* A field estimating the quality of the sequence alignment in table family.
23 +* Possibly, more metrics about the alignments coming from Infernal.
...\ No newline at end of file ...\ No newline at end of file
1 # RNANet 1 # RNANet
2 -Building a dataset following the ProteinNet philosophy, but for RNA.
3 -
4 -We use the Rfam mappings between 3D structures and known Rfam families, using the sequences that are known to belong to an Rfam family (hits provided in RF0XXXX.fasta files from Rfam).
5 -Future versions might compute a real MSA-based clusering directly with Rfamseq ncRNA sequences, like ProteinNet does with protein sequences, but this requires a tool similar to jackHMMER in the Infernal software suite, which is not available yet.
6 -
7 -This script prepares the dataset from available public data in PDB and Rfam.
8 2
9 Contents: 3 Contents:
10 -* [What it does](#what-it-does) 4 +* [What is RNANet?](#what-is-rnanet)
11 -* [Output files](#output-files) 5 +* [Install and run RNANet](INSTALL.md)
12 -* [How to run](#how-to-run)
13 - * [Required computational resources](#required-computational-resources)
14 - * [Using Docker](#using-docker)
15 - * [Using classical command line installation](#using-classical-command-line-installation)
16 - * [Post-computation task: estimate quality](#post-computation-task:-estimate-quality)
17 * [How to further filter the dataset](#how-to-further-filter-the-dataset) 6 * [How to further filter the dataset](#how-to-further-filter-the-dataset)
18 * [Filter on 3D structure resolution](#filter-on-3D-structure-resolution) 7 * [Filter on 3D structure resolution](#filter-on-3D-structure-resolution)
19 * [Filter on 3D structure publication date](#filter-on-3d-structure-publication-date) 8 * [Filter on 3D structure publication date](#filter-on-3d-structure-publication-date)
20 * [Filter to avoid chain redundancy when several mappings are available](#filter-to-avoid-chain-redundancy-when-several-mappings-are-available) 9 * [Filter to avoid chain redundancy when several mappings are available](#filter-to-avoid-chain-redundancy-when-several-mappings-are-available)
21 -* [More about the database structure](#more-about-the-database-structure) 10 +* [Database tables documentation](Database.md)
11 +* [FAQ](FAQ.md)
22 * [Troubleshooting](#troubleshooting) 12 * [Troubleshooting](#troubleshooting)
23 * [Contact](#contact) 13 * [Contact](#contact)
24 14
25 -**Please cite**: *Coming soon, expect it in 2021* 15 +## Cite us
26 16
27 -# What it does 17 +* Louis Becquey, Eric Angel, and Fariza Tahi (2020). **RNANet: an automatically built dual-source dataset integrating homologous sequences and RNA structures.** *Bioinformatics*, btaa944. [DOI](https://doi.org/10.1093/bioinformatics/btaa944), [Read the Open Access paper here](https://doi.org/10.1093/bioinformatics/btaa944)
28 -The script follows these steps:
29 -* Gets a list of 3D structures containing RNA from BGSU's non-redundant list (but keeps the redundant structures /!\\),
30 -* Asks Rfam for mappings of these structures onto Rfam families (~50% of structures have a direct mapping, some more are inferred using the redundancy list)
31 -* Downloads the corresponding 3D structures (mmCIFs)
32 -* If desired, extracts the right chain portions that map onto an Rfam family
33 18
34 -Now, compute the features: 19 +Additional relevant references:
35 20
36 -* Extract the sequence for every 3D chain 21 +The "ProteinNet" philosophy which inspired this work:
37 -* Downloads Rfamseq ncRNA sequence hits for the concerned Rfam families 22 +* AlQuraishi, M. (2019). **ProteinNet: A standardized data set for machine learning of protein structure.** *BMC Bioinformatics*, 20(1), 311.
38 -* Realigns Rfamseq hits and sequences from the 3D structures together to obtain a multiple sequence alignment for each Rfam family (using `cmalign --cyk`, except for ribosomal LSU and SSU, where SINA is used)
39 -* Computes nucleotide frequencies at every position for each alignment
40 -* For each aligned 3D chain, get the nucleotide frequencies in the corresponding RNA family for each residue
41 23
42 -Then, compute the labels: 24 +If you use our annotations by DSSR, you might want to cite:
25 +* Lu, X.-J. et al. (2015). **DSSR: An integrated software tool for dissecting the spatial structure of RNA.** *Nucleic Acids Research*, 43(21), e142.
43 26
44 -* Run DSSR on every RNA structure to get a variety of descriptors per position, describing secondary and tertiary structure. Basepair types annotations include intra-chain and inter-chain interactions. 27 +If you use our multiple sequence alignments and homology data, you might want to cite:
28 +* Pruesse, E. et al. (2012). **SINA: accurate high-throughput multiple sequence alignment of ribosomal RNA genes.** *Bioinformatics*, 28(14), 1823–1829.
29 +* Nawrocki, E. P. and Eddy, S. R. (2013). **Infernal 1.1: 100-fold faster RNA homology searches.** *Bioinformatics*, 29(22), 2933–2935.
45 30
46 -Finally, export this data from the SQLite database into flat CSV files.
47 31
48 -# Output files 32 +# What is RNANet?
33 +RNANet is a multiscale dataset of non-coding RNA structures, including sequences, secondary structures, non-canonical interactions, 3D geometrical descriptors, and sequence homology.
49 34
50 -* `results/RNANet.db` is a SQLite database file containing several tables with all the information, which you can query yourself with your custom requests, 35 +It is available in machine-learning-ready formats, such as one CSV file per chain or an SQL database.
51 -* `3D-folder-you-passed-in-option/datapoints/*` are flat text CSV files, one for one RNA chain mapped to one RNA family, gathering the per-position nucleotide descriptors,
52 -* `archive/RNANET_datapoints_{DATE}.tar.gz` is a compressed archive of the above CSV files (only if you passed the --archive option)
53 -* `path-to-3D-folder-you-passed-in-option/rna_mapped_to_Rfam` If you used the `--extract` option, this folder contains one mmCIF file per RNA chain mapped to one RNA family, without other chains, proteins (nor ions and ligands by default). If you used both `--extract` and `--no-homology`, this folder is called `rnaonly`.
54 -* `results/summary.csv` summarizes information about the RNA chains
55 -* `results/families.csv` summarizes information about the RNA families
56 36
57 -Other folders are created and not deleted, which you might want to conserve to avoid re-computations in later runs: 37 +Most interestingly, nucleotides have been renumbered in a standardized way, and the 3D chains have been re-aligned with homologous sequences from the [Rfam](https://rfam.org/) database.
58 38
59 -* `path-to-sequence-folder-you-passed-in-option/rfam_sequences/fasta/` contains compressed FASTA files of the homologous sequences used, by Rfam family.
60 -* `path-to-sequence-folder-you-passed-in-option/realigned/` contains families covariance models (\*.cm), unaligned list of sequences (\*.fa), and multiple sequence alignments in both formats Stockholm and Aligned-FASTA (\*.stk and \*.afa). Also contains SINA homolgous sequences databases LSU.arb and SSU.arb, and their index files (\*.sidx).
61 -* `path-to-3D-folder-you-passed-in-option/RNAcifs/` contains mmCIF structures directly downloaded from the PDB, which contain RNA chains,
62 -* `path-to-3D-folder-you-passed-in-option/annotations/` contains the raw JSON annotation files of the previous mmCIF structures. You may find additional information into them which is not properly supported by RNANet yet.
63 39
64 -# How to run 40 +## Methodology
65 -RNANet is availbale on Linux (x86-64) only. It could theoretically work on Mac using command line installation (*untested*). 41 +We use the Rfam mappings between 3D structures and known Rfam families, using the sequences that are known to belong to an Rfam family (hits provided in RF0XXXX.fasta files from Rfam).
42 +Future versions might compute a real MSA-based clustering directly with Rfamseq ncRNA sequences, like ProteinNet does with protein sequences, but this requires a tool similar to jackHMMER in the Infernal software suite, which is not available yet.
66 43
67 -## Required computational resources 44 +This script prepares the dataset from available public data in PDB, RNA 3D Hub, Rfam and SILVA.
68 -- CPU: no requirements. The program is optimized for multi-core CPUs, you might want to use Intel Xeons, AMD Ryzens, etc.
69 -- GPU: not required
70 -- RAM: 16 GB with a large swap partition is okay. 32 GB is recommended (usage peaks at ~27 GB)
71 -- Storage: to date, it takes 60 GB for the 3D data (36 GB if you don't use the --extract option), 11 GB for the sequence data, and 7GB for the outputs (5.6 GB database, 1 GB archive of CSV files). You need to add a few more for the dependencies. Pick a 100GB partition and you are good to go. The computation speed is way better if you use a fast storage device (e.g. SSD instead of hard drive, or even better, a NVMe SSD) because of constant I/O with the SQlite database.
72 -- Network : We query the Rfam public MySQL server on port 4497. Make sure your network enables communication (there should not be any issue on private networks, but maybe you company/university closes ports by default). You will get an error message if the port is not open. Around 30 GB of data is downloaded.
73 45
74 -To give you an estimation, our last full run took exactly 12h, excluding the time to download the MMCIF files containing RNA (around 25GB to download) and the time to compute statistics.
75 -Measured the 23rd of June 2020 on a 16-core AMD Ryzen 7 3700X CPU @3.60GHz, plus 32 Go RAM, and a 7200rpm Hard drive. Total CPU time spent: 135 hours (user+kernel modes), corresponding to 12h (actual time spent with the 16-core CPU).
76 46
77 -Update runs are much quicker, around 3 hours. It depends mostly on what RNA families are concerned by the update. 47 +## Pipeline
48 +The script follows these steps:
78 49
79 -## Using Docker 50 +To gather structures:
51 +* Gets a list of 3D structures containing RNA from BGSU's non-redundant list (but keeps the redundant structures /!\\),
52 +* Asks Rfam for mappings of these structures onto Rfam families (~50% of structures have a direct mapping, some more are inferred using the redundancy list)
53 +* Downloads the corresponding 3D structures (mmCIFs)
54 +* If desired, extracts the right chain portions that map onto an Rfam family to a separate mmCIF file
80 55
81 -* Step 1 : Download the [Docker container](https://entrepot.ibisc.univ-evry.fr/f/e5edece989884a7294a6/?dl=1). Open a terminal and move to the appropriate directory. 56 +To compute homology information:
82 -* Step 2 : Extract the archive to a Docker image named *rnanet* in your local installation 57 +* Extract the sequence for every 3D chain
83 -``` 58 +* Downloads Rfamseq ncRNA sequence hits for the concerned Rfam families (or ARB databases of SSU or LSU sequences from SILVA for rRNAs)
84 -$ docker load -i rnanet_v1.2_docker.tar 59 +* Realigns Rfamseq hits and sequences from the 3D structures together to obtain a multiple sequence alignment for each Rfam family (using `cmalign --cyk`, except for ribosomal LSU and SSU, where SINA is used)
85 -``` 60 +* Computes nucleotide frequencies at every position for each alignment
86 -* Step 3 : Run the container, giving it 3 folders to mount as volumes: a first to store the 3D data, a second to store the sequence data and alignments, and a third to output the results, data and logs: 61 +* Map each nucleotide of a 3D chain to its position in the corresponding family sequence alignment
87 -```
88 -$ docker run --rm -v path/to/3D/data/folder:/3D -v path/to/sequence/data/folder:/sequences -v path/to/experiment/results/folder:/runDir rnanet [ - other options ]
89 -```
90 62
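To make the frequency and mapping steps concrete, here is a minimal sketch, assuming each 3D chain appears in the family alignment as one gapped row over the "ACGUN-" alphabet. It numbers residues from 1 (like `index_chain`) and alignment columns from 1 (like `index_ali`), and computes per-column frequencies where gaps are ignored and 'N' counts as "other". These conventions and the function names are assumptions for illustration; RNANet's own code may treat gaps differently.

```python
from collections import Counter
from typing import Dict, List, Sequence


def residue_to_column(aligned_row: str) -> Dict[int, int]:
    """Map each residue (1-based, like index_chain) to its 1-based alignment column."""
    mapping = {}
    index_chain = 0
    for index_ali, char in enumerate(aligned_row, start=1):
        if char != "-":  # a gap consumes a column but no residue
            index_chain += 1
            mapping[index_chain] = index_ali
    return mapping


def column_frequencies(rows: Sequence[str]) -> List[Dict[str, float]]:
    """Per-column frequencies of A, C, G, U and 'other', ignoring gaps."""
    result = []
    for column in zip(*rows):  # rows must all have the alignment length
        counts = Counter(char for char in column if char != "-")
        total = sum(counts.values())
        freq = {base: (counts[base] / total if total else 0.0) for base in "ACGU"}
        n_other = total - sum(counts[base] for base in "ACGU")
        freq["other"] = n_other / total if total else 0.0
        result.append(freq)
    return result
```

For instance, `residue_to_column("AC-G-U")` gives `{1: 1, 2: 2, 3: 4, 4: 6}`: the third residue (G) sits in column 4 because column 3 is a gap.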
91 -The detailed list of options is below: 63 +To compute 3D annotations:
64 +* Run DSSR on every RNA structure to get a variety of descriptors per position, describing secondary and tertiary structure. Basepair types annotations include intra-chain and inter-chain interactions.
92 65
93 -``` 66 +Finally, export this data from the SQLite database into flat CSV files.
94 --h [ --help ] Print this help message
95 ---version Print the program version
96 -
97 --f [ --full-inference ] Infer new 3D->family mappings even if Rfam already provides some. Yields more copies of chains
98 - mapped to different families.
99 --r 4.0 [ --resolution=4.0 ] Maximum 3D structure resolution to consider a RNA chain.
100 --s Run statistics computations after completion
101 ---extract Extract the portions of 3D RNA chains to individual mmCIF files.
102 ---keep-hetatm=False (True | False) Keep ions, waters and ligands in produced mmCIF files.
103 - Does not affect the descriptors.
104 ---fill-gaps=True (True | False) Replace gaps in nt_align_code field due to unresolved residues
105 - by the most common nucleotide at this position in the alignment.
106 ---3d-folder=… Path to a folder to store the 3D data files. Subfolders will contain:
107 - RNAcifs/ Full structures containing RNA, in mmCIF format
108 - rna_mapped_to_Rfam/ Extracted 'pure' RNA chains
109 - datapoints/ Final results in CSV file format.
110 ---seq-folder=… Path to a folder to store the sequence and alignment files. Subfolders will be:
111 - rfam_sequences/fasta/ Compressed hits to Rfam families
112 - realigned/ Sequences, covariance models, and alignments by family
113 ---no-homology Do not try to compute PSSMs and do not align sequences.
114 - Allows to yield more 3D data (consider chains without a Rfam mapping).
115 -
116 ---all Build chains even if they already are in the database.
117 ---only Ask to process a specific chain label only
118 ---ignore-issues Do not ignore already known issues and attempt to compute them
119 ---update-homologous Re-download Rfam and SILVA databases, realign all families, and recompute all CSV files
120 ---from-scratch Delete database, local 3D and sequence files, and known issues, and recompute.
121 ---archive Create a tar.gz archive of the datapoints text files, and update the link to the latest archive
122 ---no-logs Do not save per-chain logs of the numbering modifications
123 -```
124 -You may not use the --3d-folder and --seq-folder options, they are set by default to the paths you provide with the -v options when running Docker.
125 67
126 -Typical usage: 68 +## Data provided
127 -```
128 -nohup bash -c 'time docker run --rm -v /path/to/3D/data/folder:/3D -v /path/to/sequence/data/folder:/sequences -v /path/to/experiment/folder:/runDir rnanet -s --no-logs ' &
129 -```
130 69
131 -## Using classical command line installation 70 +We provide a couple of resources to exploit this dataset. You can download them from [EvryRNA](https://evryrna.ibisc.univ-evry.fr/evryrna/rnanet/rnanet_home).
71 +* A series of tables in the SQLite3 database, see [the database documentation](Database.md) and [examples of useful queries](#how-to-further-filter-the-dataset),
72 +* One CSV file per RNA chain, summarizing all the relevant information about it,
73 +* Filtered alignment files in FASTA format containing only the sequences with a 3D structure available in RNANet, but which have been aligned using all the homologous sequences of this family from Rfam or SILVA,
74 +* Additional statistics files about nucleotide frequencies, modified bases, basepair types within each chain or by RNA family.
132 75
133 -You need to install the dependencies: 76 +For now, we do not provide public downloads of the cleaned 3D structures or of the full alignments with Rfam sequences. If you need them, [recompute them](INSTALL.md) or ask us.
134 -- DSSR, you need to register to the X3DNA forum [here](http://forum.x3dna.org/site-announcements/download-instructions/) and then download the DSSR binary [on that page](http://forum.x3dna.org/downloads/3dna-download/). Make sure to have the `x3dna-dssr` binary in your $PATH variable so that RNANet.py finds it.
135 -- Infernal, to download at [Eddylab](http://eddylab.org/infernal/), several options are available depending on your preferences. Make sure to have the `cmalign`, `esl-alimanip`, `esl-alipid` and `esl-reformat` binaries in your $PATH variable, so that RNANet.py can find them.
136 -- SINA, follow [these instructions](https://sina.readthedocs.io/en/latest/install.html) for example. Make sure to have the `sina` binary in your $PATH.
137 -- Sqlite 3, available under the name *sqlite* in every distro's package manager,
138 -- Python >= 3.8, (Unfortunately, python3.6 is no longer supported, because of changes in the multiprocessing and Threading packages. Untested with Python 3.7.\*)
139 -- The following Python packages: `python3.8 -m pip install biopython==1.76 matplotlib pandas psutil pymysql requests scipy setproctitle sqlalchemy tqdm`. Note that Biopython versions 1.77 or later do not work (yet) since they removed the alphabet system.
140 77
141 -Then, run it from the command line, preferably using nohup if your shell will be interrupted: 78 +## Updates
142 -``` 79 +RNANet is updated monthly to take into account new structures proposed in the [BGSU Non-redundant lists](http://rna.bgsu.edu/rna3dhub/nrlist/). The monthly runs realign previous alignments with the new sequences using `esl-alimerge` from Infernal.
143 - ./RNANet.py --3d-folder path/to/3D/data/folder --seq-folder path/to/sequence/data/folder [ - other options ]
144 -```
145 -See the list of possible options juste above in the [Using Docker](#using-docker) section. Expect hours (maybe days) of computation.
146 80
147 -Typical usage: 81 +It is updated yearly from scratch to take into account new Rfam sequences or updates in the covariance models, and updates in the PDB 3D files.
148 -```
149 -nohup bash -c 'time ~/Projects/RNANet/RNAnet.py --3d-folder ~/Data/RNA/3D/ --seq-folder ~/Data/RNA/sequences --no-logs -s' &
150 -```
151 82
152 -## Post-computation task: estimate quality 83 +For now, the SILVA releases used are fixed (LSU release 132 and SSU release 138) and not automatically updated. SILVA authors, if you read this: please provide a "latest" download link to ease automatic retrieval of the latest version.
153 -If your did not ask for automatic run of statistics over the produced dataset with the `-s` option, you can run them later using the file statistics.py.
154 -```
155 -python3.8 statistics.py --3d-folder path/to/3D/data/folder --seq-folder path/to/sequence/data/folder -r 20.0
156 -```
157 -/!\ Beware, if not precised with option `-r`, no resolution threshold is applied and all the data in RNANet.db is used.
158 84
159 -If you have run RNANet twice, once with option `--no-homology`, and once without, you unlock new statistics over unmapped chains. You will also be allowed to use option `--wadley` to reproduce Wadley & al. (2007) results automatically. 85 +See what's new in the latest version of RNANet [in the CHANGELOG](CHANGELOG).
160 86
161 # How to further filter the dataset 87 # How to further filter the dataset
162 You may want to build your own sub-dataset by querying the results/RNANet.db file. Here are quick examples using Python3 and its sqlite3 package. 88 You may want to build your own sub-dataset by querying the results/RNANet.db file. Here are quick examples using Python3 and its sqlite3 package.
...@@ -240,133 +166,21 @@ with sqlite3.connect("results/RNANet.db) as connection: ...@@ -240,133 +166,21 @@ with sqlite3.connect("results/RNANet.db) as connection:
240 ``` 166 ```
241 Then proceed to steps 2 and 3. 167 Then proceed to steps 2 and 3.
242 168
243 -# More about the database structure
244 -To help you design your own requests, here follows a description of the database tables and fields.
245 -
246 -## Table `family`, for Rfam families and their properties
247 -* `rfam_acc`: The family codename, from Rfam's numbering (Rfam accession number)
248 -* `description`: What RNAs fit in this family
249 -* `nb_homologs`: The number of hits known to be homologous downloaded from Rfam to compute nucleotide frequencies
250 -* `nb_3d_chains`: The number of 3D RNA chains mapped to the family (from Rfam-PDB mappings, or inferred using the redundancy list)
251 -* `nb_total_homol`: Sum of the two previous fields, the number of sequences in the multiple sequence alignment, used to compute nucleotide frequencies
252 -* `max_len`: The longest RNA sequence among the homologs (in bases, unaligned)
253 -* `ali_len`: The aligned sequences length (in bases, aligned)
254 -* `ali_filtered_len`: The aligned sequences length when we filter the alignment to keep only the RNANet chains (which have a 3D structure) and remove the gap-only columns.
255 -* `comput_time`: Time required to compute the family's multiple sequence alignment in seconds,
256 -* `comput_peak_mem`: RAM (or swap) required to compute the family's multiple sequence alignment in megabytes,
257 -* `idty_percent`: Average identity percentage over pairs of the 3D chains' sequences from the family
258 -
259 -## Table `structure`, for 3D structures of the PDB
260 -* `pdb_id`: The 4-char PDB identifier
261 -* `pdb_model`: The model used in the PDB file
262 -* `date`: The first submission date of the 3D structure to a public database
263 -* `exp_method`: A string to know wether the structure as been obtained by X-ray crystallography ('X-RAY DIFFRACTION'), electron microscopy ('ELECTRON MICROSCOPY'), or NMR (not seen yet)
264 -* `resolution`: Resolution of the structure, in Angstöms
265 -
266 -## Table `chain`, for the datapoints: one chain mapped to one Rfam family
267 -* `chain_id`: A unique identifier
268 -* `structure_id`: The `pdb_id` where the chain comes from
269 -* `chain_name`: The chain label, extracted from the 3D file
270 -* `eq_class`: The BGSU equivalence class label containing this chain
271 -* `rfam_acc`: The family which the chain is mapped to (if not mapped, value is *unmappd*)
272 -* `pdb_start`: Position in the chain where the mapping to Rfam begins (absolute position, not residue number)
273 -* `pdb_end`: Position in the chain where the mapping to Rfam ends (absolute position, not residue number)
274 -* `reversed`: Wether the mapping numbering order differs from the residue numbering order in the mmCIF file (eg 4c9d, chains C and D)
275 -* `issue`: Wether an issue occurred with this structure while downloading, extracting, annotating or parsing the annotation. See the file known_issues_reasons.txt for more information about why your chain is marked as an issue.
276 -* `inferred`: Wether the mapping has been inferred using the redundancy list (value is 1) or just known from Rfam-PDB mappings (value is 0)
277 -* `chain_freq_A`, `chain_freq_C`, `chain_freq_G`, `chain_freq_U`, `chain_freq_other`: Nucleotide frequencies in the chain
278 -* `pair_count_cWW`, `pair_count_cWH`, ... `pair_count_tSS`: Counts of the non-canonical base-pair types in the chain (intra-chain counts only)
279 -
280 -## Table `nucleotide`, for individual nucleotide descriptors
281 -* `nt_id`: A unique identifier
282 -* `chain_id`: The chain the nucleotide belongs to
283 -* `index_chain`: its absolute position within the portion of chain mapped to Rfam, from 1 to X. This is completely uncorrelated to any gene start or 3D chain residue numbers.
284 -* `nt_position`: relative position within the portion of chain mapped to RFam, from 0 to 1
285 -* `old_nt_resnum`: The residue number in the 3D mmCIF file (it's a string actually, some contain a letter like '37A')
286 -* `nt_name`: The residue type. This includes modified nucleotide names (e.g. 5MC for 5-methylcytosine)
287 -* `nt_code`: One-letter name. Lowercase "acgu" letters are used for modified "ACGU" bases.
288 -* `nt_align_code`: One-letter name used for sequence alignment. Contains "ACGUN-" only first, and then, gaps may be replaced by the most common letter at this position (default)
289 -* `is_A`, `is_C`, `is_G`, `is_U`, `is_other`: One-hot encoding of the nucleotide base
290 -* `dbn`: character used at this position if we look at the dot-bracket encoding of the secondary structure. Includes inter-chain (RNA complexes) contacts.
291 -* `paired`: empty, or comma separated list of `index_chain` values referring to nucleotides the base is interacting with. Up to 3 values. Inter-chain interactions are marked paired to '0'.
292 -* `nb_interact`: number of interactions with other nucleotides. Up to 3 values. Includes inter-chain interactions.
293 -* `pair_type_LW`: The Leontis-Westhof nomenclature codes of the interactions. The first letter concerns cis/trans orientation, the second this base's side interacting, and the third the other base's side.
294 -* `pair_type_DSSR`: Same but using the DSSR nomenclature (Hoogsteen edge approximately corresponds to Major-groove and Sugar edge to minor-groove)
295 -* `alpha`, `beta`, `gamma`, `delta`, `epsilon`, `zeta`: The 6 torsion angles of the RNA backabone for this nucleotide
296 -* `epsilon_zeta`: Difference between epsilon and zeta angles
297 -* `bb_type`: conformation of the backbone (BI, BII or ..)
298 -* `chi`: torsion angle between the sugar and base (O-C1'-N-C4)
299 -* `glyco_bond`: syn or anti configuration of the sugar-base bond
300 -* `v0`, `v1`, `v2`, `v3`, `v4`: 5 torsion angles of the ribose cycle
301 -* `form`: if the nucleotide is involved in a stem, the stem type (A, B or Z)
302 -* `ssZp`: Z-coordinate of the 3’ phosphorus atom with reference to the5’ base plane
303 -* `Dp`: Perpendicular distance of the 3’ P atom to the glycosidic bond
304 -* `eta`, `theta`: Pseudotorsions of the backbone, using phosphorus and carbon 4'
305 -* `eta_prime`, `theta_prime`: Pseudotorsions of the backbone, using phosphorus and carbon 1'
306 -* `eta_base`, `theta_base`: Pseudotorsions of the backbone, using phosphorus and the base center
307 -* `phase_angle`: Conformation of the ribose cycle
308 -* `amplitude`: Amplitude of the sugar puckering
309 -* `puckering`: Conformation of the ribose cycle (10 classes depending on the phase_angle value)
310 -
311 -## Table `align_column`, for positions in multiple sequence alignments
312 -* `column_id`: A unique identifier
313 -* `rfam_acc`: The family's MSA the column belongs to
314 -* `index_ali`: Position of the column in the alignment (starts at 1)
315 -* `freq_A`, `freq_C`, `freq_G`, `freq_U`, `freq_other`: Nucleotide frequencies in the alignment at this position
316 -
317 -There always is an entry, for each family (rfam_acc), with index_ali = zero and nucleotide frequencies set to freq_other = 1.0. This entry is used when the nucleotide frequencies cannot be determined because of local alignment issues.
318 -
319 -## Table `re_mapping`, to map a nucleotide to an alignment column
320 -* `remapping_id`: A unique identifier
321 -* `chain_id`: The chain which is mapped to an alignment
322 -* `index_chain`: The absolute position of the nucleotide in the chain (from 1 to X)
323 -* `index_ali` The position of that nucleotide in its family alignment
324 -
325 # Troubleshooting 169 # Troubleshooting
326 170
327 -## Understanding the warnings and errors 171 +Check if your problem is listed in the [known issues](KnownIssues.md).
328 - 172 +
329 -* **Could not load X.json with JSON package** : 173 +### Warnings and errors
330 -The JSON format produced as DSSR output could not be loaded by Python. Try deleting the file and re-running DSSR (through RNANet). 174 +If you ran RNANet and got an error or a warning that you do not fully understand, check the [Error documentation](Errors.md).
331 -* **Found DSSR warning in annotation X.json: no nucleotides found. Ignoring X.** : 175 +
332 -DSSR complains because the CIF structure does not seem to contain nucleotides. This can happen on low resolution structures where only P atoms are solved, you should ignore them. This also can happen if the .cif file is corrupted (failed download, etc). Check with a 3D visualization software if your chain contains well-defined nucleotides. Try deleting the .cif and retry. If the problem persists, just ignore the chain. 176 +### Not enough memory
333 -* **Could not find nucleotides of chain X in annotation X.json. Ignoring chain X.** : Basically the same as above, but some nucleotides have been observed in another chain of the same structure. 177 +If you run out of memory (job killed), you may want to reduce the number of jobs run in parallel. Use the `--maxcores` option with a small number to make RNANet limit concurrency, and therefore its peak RAM usage. The computation time will increase accordingly.
334 -* **Could not find real nucleotides of chain X between START and STOP. Ignoring chain X."** : Same as the two above, but nucleotides can be found outside of the mapping interval. This can happen if there is a mapping problem, e.g., considered absolute interval when it should not. 178 +
335 -* **Error while parsing DSSR X.json output: {custom-error}** : The DSSR annotations lack some of our required fields. It is likely that DSSR changed something in their fields names. Contact us so that we fix the problem with the latest DSSR version. 179 +### Not enough memory/too slow (developer trick)
336 -* **Mapping is reversed, this case is not supported (yet). Ignoring chain X.** : The mapping coordinates, as obtained from Rfam, have an end position coming before the start position (meaning, the sequence has to be reversed to map the RNA covariance model). We do not support this yet, we ignore this chain. 180 +If `--maxcores` is not enough and you have identified the step that fails, you can edit the Python code. Look for the `coeff_ncores` argument of some function calls: it is the coefficient applied to `--maxcores` at the different steps of the pipeline. Lower it to reduce concurrency and use less memory, or raise it to increase concurrency and compute faster.
337 -* **Error with parsing of X duplicate residue numbers. Ignoring it.** : This 3D chain contains new kind(s) of issue(s) in the residue numberings that are not part of the issues we already know how to tackle. Contact us, so that we add support for this entry.
338 -* **Found duplicated index_chain N in X. Keeping only the first.** : This RNA 3D chain contains two (or more) residues with the same numbering N. This often happens when a nucleic-like ligand is annotated as part of the RNA chain, and DSSR considers it a nucleotide. By default, RNANet keeps only the first of the multiple residues with the same number. You may want to check that the produced 3D structure contains the appropriate nucleotide and no ligand.
339 -* **Missing index_chain N in X !** : DSSR annotations for chain X are discontinuous, position N is missing. This means residue N has not been recognized as a nucleotide by DSSR. Is the .cif structure file corrupted ? Delete it and retry.
340 -* **X sequence is too short, let's ignore it.** : We discard very short RNA chains.
341 -* **Error downloading and/or extracting Rfam.cm !** : We cannot retrieve the Rfam covariance models file. RNANet tries to find it at ftp://ftp.ebi.ac.uk/pub/databases/Rfam/CURRENT/Rfam.cm.gz so, check that your network is not blocking the FTP protocol (port 21 is open on your network), and check that the address has not changed. If so, contact us so that we update RNANet with the correct address.
342 -* **Something's wrong with the SQL database. Check mysql-rfam-public.ebi.ac.uk status and try again later. Not printing statistics.** : We cannot retrieve family statistics from Rfam public server. Check if you can connect to it by hand : `mysql -u rfamro -P 4497 -D Rfam -h mysql-rfam-public.ebi.ac.uk`. If not, check that port 4497 is open on your network.
343 -* **Error downloading RFXXXXX.fa.gz: {custom-error}** : We cannot reach the Rfam FTP server to download homologous sequences. We look in ftp://ftp.ebi.ac.uk/pub/databases/Rfam/CURRENT/fasta_files/ so, check if you can access it from your network (check that port 21 is opened on your network). Check if the address has changed and notify us.
344 -* **Error downloading NR list !** : We cannot download BGSU's equivalence classes from their website. Check if you can access http://rna.bgsu.edu/rna3dhub/nrlist/download/current/20.0A/csv from a web browser. It actually happens that their website is not responding, the previous download will be re-used.
345 -* **Error downloading the LSU/SSU database from SILVA** : We cannot reach SILVA's arb files. We are looking for http://www.arb-silva.de/fileadmin/arb_web_db/release_132/ARB_files/SILVA_132_LSURef_07_12_17_opt.arb.gz and http://www.arb-silva.de/fileadmin/silva_databases/release_138/ARB_files/SILVA_138_SSURef_05_01_20_opt.arb.gz , can you download and extract them from your web browser and place them in the realigned/ subfolder ?
346 -* **Assuming mapping to RFXXXXX is an absolute position interval.** : The mapping provided by Rfam concerns a nucleotide interval START-END, but no nucleotides are defined in 3D in that interval. When this happens, we assume that the numbering is not relative to the residue numbers in the 3D file, but to the absolute position in the chain, starting at 1. And yes, we tried to apply this behavior to all mappings, this yields the opposite issue where some mappings get outside the available nucleotides. To be solved the day Rfam explains how they build the mappings.
347 -* **Added newly discovered issues to known issues** : You discovered new chains that cannot be perfectly understood as they actually are, congrats. For each chain of the list, another warning has been raised, refer to them.
348 -* **Structures without referenced chains have been detected.** : Something went wrong, because the database contains references to 3D structures that are not used by any entry in the `chain` table. You should rerun RNANet. The option `--only` may help to rerun it just for one chain.
349 -* **Chains without referenced structures have been detected** :
350 -Something went wrong, because the database contains references to 3D chains that are not used by any entry in the `structure` table. You should rerun RNANet. The option `--only` may help to rerun it just for one chain.
351 -* **Chains were not remapped** : Something went wrong, because the database contains references to 3D chains that are not used by any entry in the `re_mapping` table, assuming you were interested in homology data. You should rerun RNANet. The option `--only` may help to rerun it just for one chain. If you are not interested in homology data, use option `--no-homology` to skip alignment and remapping steps.
352 -* **Operational Error: database is locked, retrying in 0.2s** : Too many workers are trying to access the database at the same time. Do not try to run several instances of RNANet in parallel. Even with only one instance, this might still happen if your device has slow I/O delays. Try running RNANet from an SSD.
353 -* **Tried to reach database 100 times and failed. Aborting.** : Same as above, but in a more serious way.
354 -* **Nothing to do !** : RNANet is up-to-date, or did not detect any modification to do, so nothing changed in your database.
355 -* **KeyboardInterrupt, terminating workers.** : You interrupted the computation by pressing Ctrl+C. The database may be in an unstable state, rerun RNANet to solve the problem.
356 -* **Found mappings to RFXXXXX in both directions on the same interval, keeping only the 5'->3' one.** : A chain has been mapped to family RFXXXXX, but the mapping has been found twice, with the limits inverted. We only keep one (in 5'->3' sense).
357 -* **There are mappings for RFXXXXX in both directions** : A chain has been mapped to family RFXXXXX several times, and the mappings are not in the same sequence sense (some are reverted, with END < START). Then, we do not know what to decide for this chain, and we abort.
358 -* **Unable to download XXXX.cif. Ignoring it.** : We cannot access a certain 3D structure from RCSB's download site, can you access it from your web browser and put it in the RNAcifs/ folder ? We look at http://files.rcsb.org/download/XXXX.cif , replacing XXXX by the right PDB code.
359 -* **Wtf, structure XXXX has no resolution ? Check https://files.rcsb.org/header/XXXX.cif to figure it out.** : We cannot find the resolution of structure XXXX from the .cif file. We are looking for it in the fields `_refine.ls_d_res_high`, `_refine.ls_d_res_low`, and `_em_3d_reconstruction.resolution`. Maybe the information is stored in another field ? If you find it, contact us so that we support this new CIF field.
360 -* **Could not find annotations for X, ignoring it.** : It seems that DSSR has not been run for structure X, or failed. Rerun RNANet.
361 -* **Nucleotides not inserted: {custom-error}** : For some reason, no nucleotides were saved to the database for this chain. Contact us.
362 -* **Removing N doublons from existing RFXXXXX++.fa and using their newest version** : You are trying to re-compute sequence alignments of 3D structures that had already been computed in the past. They will be removed from the alignment and recomputed, for the case the sequences have changed.
363 -* **Removing N doublons from existing RFXXXXX++.stk and using their newest version** : Same as above.
364 -* **Error during sequence alignment: {custom-error}** : Something went wrong during sequence alignment. Recompute the alignments using the `--update-homologous` option.
365 -* **Failed to realign RFXXXXX (killed)** : You ran out of memory while computing multiple sequence alignments. Try to run RNANet on a machine with at least 32 GB of RAM.
366 -* **RFXXXXX's alignment is wrong. Recompute it and retry.** : We could not load RFXXXXX's multiple sequence alignment. It may have failed to compute, or be corrupted. Recompute the alignments using the `--update-homologous` option.
367 -
368 -## Not enough memory
369 -If you run out of memory, you may want to reduce the number of jobs run in parallel. #TODO: explain how
370 181
371 # Contact 182 # Contact
372 -louis.becquey@univ-evry.fr 183 +RNANet is still in beta, which means we are truly open to (and enjoy) any feedback we can get from interested users.
184 +
185 +Please send all your questions, feature requests, bug reports or angry reacts to
186 +louis.becquey@univ-evry.fr .
......
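The `align_column` and `re_mapping` descriptions in the README hunk above are easiest to grasp with a join. The sketch below builds a toy in-memory SQLite database containing only the documented columns (assuming the RNANet database is SQLite, which the "database is locked" message suggests), then retrieves per-nucleotide alignment frequencies for one chain. The `rfam_acc` is passed explicitly because the table linking a chain to its family is not described here; the toy rows, and the assumption that an unalignable nucleotide is mapped to `index_ali = 0`, are illustrative only.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory DB with only the documented columns and toy data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE align_column (
            column_id INTEGER PRIMARY KEY, rfam_acc TEXT, index_ali INTEGER,
            freq_A REAL, freq_C REAL, freq_G REAL, freq_U REAL, freq_other REAL);
        CREATE TABLE re_mapping (
            remapping_id INTEGER PRIMARY KEY, chain_id INTEGER,
            index_chain INTEGER, index_ali INTEGER);
    """)
    conn.executemany(
        "INSERT INTO align_column (rfam_acc, index_ali, freq_A, freq_C, freq_G, freq_U, freq_other) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        [("RF00005", 0, 0.0, 0.0, 0.0, 0.0, 1.0),   # sentinel column (index_ali = 0)
         ("RF00005", 1, 0.7, 0.1, 0.1, 0.1, 0.0),
         ("RF00005", 2, 0.0, 0.0, 0.9, 0.1, 0.0)])
    conn.executemany(
        "INSERT INTO re_mapping (chain_id, index_chain, index_ali) VALUES (?, ?, ?)",
        [(1, 1, 1), (1, 2, 2), (1, 3, 0)])  # nucleotide 3 could not be aligned
    return conn


def frequencies_for_chain(conn, chain_id, rfam_acc):
    """Return (index_chain, freq_A, freq_C, freq_G, freq_U, freq_other) rows in chain order."""
    query = """
        SELECT r.index_chain, a.freq_A, a.freq_C, a.freq_G, a.freq_U, a.freq_other
        FROM re_mapping AS r
        JOIN align_column AS a
          ON a.rfam_acc = ? AND a.index_ali = r.index_ali
        WHERE r.chain_id = ?
        ORDER BY r.index_chain
    """
    return conn.execute(query, (rfam_acc, chain_id)).fetchall()
```

Because the sentinel column always exists for each family, a nucleotide mapped to `index_ali = 0` still gets a row (with `freq_other = 1.0`) instead of disappearing from the inner join.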
...@@ -979,9 +979,9 @@ class Pipeline: ...@@ -979,9 +979,9 @@ class Pipeline:
979 setproctitle("RNANet.py process_options()") 979 setproctitle("RNANet.py process_options()")
980 980
981 try: 981 try:
982 - opts, _ = getopt.getopt(sys.argv[1:], "r:fhs", ["help", "resolution=", "3d-folder=", "seq-folder=", "keep-hetatm=", "only=", 982 + opts, _ = getopt.getopt(sys.argv[1:], "r:fhs", ["help", "resolution=", "3d-folder=", "seq-folder=", "keep-hetatm=", "only=", "maxcores=",
983 "from-scratch", "full-inference", "no-homology", "ignore-issues", "extract", 983 "from-scratch", "full-inference", "no-homology", "ignore-issues", "extract",
984 - "all", "no-logs", "archive", "update-homologous"]) 984 + "all", "no-logs", "archive", "update-homologous", "version"])
985 except getopt.GetoptError as err: 985 except getopt.GetoptError as err:
986 print(err) 986 print(err)
987 sys.exit(2) 987 sys.exit(2)
...@@ -1000,13 +1000,19 @@ class Pipeline: ...@@ -1000,13 +1000,19 @@ class Pipeline:
1000 print("-h [ --help ]\t\t\tPrint this help message") 1000 print("-h [ --help ]\t\t\tPrint this help message")
1001 print("--version\t\t\tPrint the program version") 1001 print("--version\t\t\tPrint the program version")
1002 print() 1002 print()
1003 - print("-f [ --full-inference ]\t\tInfer new mappings even if Rfam already provides some. Yields more copies of chains" 1003 + print("Select what to do:")
1004 - "\n\t\t\t\tmapped to different families.") 1004 + print("--------------------------------------------------------------------------------------------------------------")
1005 - print("-r 4.0 [ --resolution=4.0 ]\tMaximum 3D structure resolution to consider a RNA chain.") 1005 + print("-f [ --full-inference ]\t\tInfer new mappings even if Rfam already provides some. Yields more copies of"
1006 + "\n\t\t\t\t chains mapped to different families.")
1006 print("-s\t\t\t\tRun statistics computations after completion") 1007 print("-s\t\t\t\tRun statistics computations after completion")
1007 print("--extract\t\t\tExtract the portions of 3D RNA chains to individual mmCIF files.") 1008 print("--extract\t\t\tExtract the portions of 3D RNA chains to individual mmCIF files.")
1008 print("--keep-hetatm=False\t\t(True | False) Keep ions, waters and ligands in produced mmCIF files. " 1009 print("--keep-hetatm=False\t\t(True | False) Keep ions, waters and ligands in produced mmCIF files. "
1009 - "\n\t\t\t\tDoes not affect the descriptors.") 1010 + "\n\t\t\t\t Does not affect the descriptors.")
1011 + print("--no-homology\t\t\tDo not try to compute PSSMs and do not align sequences."
1012 + "\n\t\t\t\t Allows to yield more 3D data (consider chains without a Rfam mapping).")
1013 + print()
1014 + print("Select how to do it:")
1015 + print("--------------------------------------------------------------------------------------------------------------")
1010 print("--3d-folder=…\t\t\tPath to a folder to store the 3D data files. Subfolders will contain:" 1016 print("--3d-folder=…\t\t\tPath to a folder to store the 3D data files. Subfolders will contain:"
1011 "\n\t\t\t\t\tRNAcifs/\t\tFull structures containing RNA, in mmCIF format" 1017 "\n\t\t\t\t\tRNAcifs/\t\tFull structures containing RNA, in mmCIF format"
1012 "\n\t\t\t\t\trna_mapped_to_Rfam/\tExtracted 'pure' RNA chains" 1018 "\n\t\t\t\t\trna_mapped_to_Rfam/\tExtracted 'pure' RNA chains"
...@@ -1014,22 +1020,28 @@ class Pipeline: ...@@ -1014,22 +1020,28 @@ class Pipeline:
1014 print("--seq-folder=…\t\t\tPath to a folder to store the sequence and alignment files. Subfolders will be:" 1020 print("--seq-folder=…\t\t\tPath to a folder to store the sequence and alignment files. Subfolders will be:"
1015 "\n\t\t\t\t\trfam_sequences/fasta/\tCompressed hits to Rfam families" 1021 "\n\t\t\t\t\trfam_sequences/fasta/\tCompressed hits to Rfam families"
1016 "\n\t\t\t\t\trealigned/\t\tSequences, covariance models, and alignments by family") 1022 "\n\t\t\t\t\trealigned/\t\tSequences, covariance models, and alignments by family")
1017 - print("--no-homology\t\t\tDo not try to compute PSSMs and do not align sequences." 1023 + print("--maxcores=…\t\t\tLimit the number of cores to use in parallel portions to reduce the simultaneous"
1018 - "\n\t\t\t\tAllows to yield more 3D data (consider chains without a Rfam mapping).") 1024 + "\n\t\t\t\t need of RAM. Should be a number between 1 and your number of CPUs. Note that portions"
1025 + "\n\t\t\t\t of the pipeline already limit themselves to 50% or 70% of that number by default.")
1026 + print("--archive\t\t\tCreate tar.gz archives of the datapoints text files and the alignments,"
1027 + "\n\t\t\t\t and update the link to the latest archive. ")
1028 + print("--no-logs\t\t\tDo not save per-chain logs of the numbering modifications")
1019 print() 1029 print()
1030 + print("Select which data we are interested in:")
1031 + print("--------------------------------------------------------------------------------------------------------------")
1032 + print("-r 4.0 [ --resolution=4.0 ]\tMaximum 3D structure resolution to consider a RNA chain.")
1020 print("--all\t\t\t\tBuild chains even if they already are in the database.") 1033 print("--all\t\t\t\tBuild chains even if they already are in the database.")
1021 print("--only\t\t\t\tAsk to process a specific chain label only") 1034 print("--only\t\t\t\tAsk to process a specific chain label only")
1022 print("--ignore-issues\t\t\tDo not ignore already known issues and attempt to compute them") 1035 print("--ignore-issues\t\t\tDo not ignore already known issues and attempt to compute them")
1023 print("--update-homologous\t\tRe-download Rfam and SILVA databases, realign all families, and recompute all CSV files") 1036 print("--update-homologous\t\tRe-download Rfam and SILVA databases, realign all families, and recompute all CSV files")
1024 print("--from-scratch\t\t\tDelete database, local 3D and sequence files, and known issues, and recompute.") 1037 print("--from-scratch\t\t\tDelete database, local 3D and sequence files, and known issues, and recompute.")
1025 - print("--archive\t\t\tCreate a tar.gz archive of the datapoints text files, and update the link to the latest archive")
1026 - print("--no-logs\t\t\tDo not save per-chain logs of the numbering modifications")
1027 print() 1038 print()
1028 print("Typical usage:") 1039 print("Typical usage:")
1029 - print(f"nohup bash -c 'time {fileDir}/RNAnet.py --3d-folder ~/Data/RNA/3D/ --seq-folder ~/Data/RNA/sequences -s' &") 1040 + print(f"nohup bash -c 'time {fileDir}/RNAnet.py --3d-folder ~/Data/RNA/3D/ --seq-folder ~/Data/RNA/sequences -s --no-logs' &")
1030 sys.exit() 1041 sys.exit()
1031 elif opt == '--version': 1042 elif opt == '--version':
1032 - print("RNANet 1.3 beta, parallelized, Dockerized") 1043 + print("RNANet v1.3 beta, parallelized, Dockerized")
1044 + print("Last revision : Jan 2021")
1033 sys.exit() 1045 sys.exit()
1034 elif opt == "-r" or opt == "--resolution": 1046 elif opt == "-r" or opt == "--resolution":
1035 assert float(arg) > 0.0 and float(arg) <= 20.0 1047 assert float(arg) > 0.0 and float(arg) <= 20.0
...@@ -1084,6 +1096,9 @@ class Pipeline: ...@@ -1084,6 +1096,9 @@ class Pipeline:
1084 self.ARCHIVE = True 1096 self.ARCHIVE = True
1085 elif opt == "--no-logs": 1097 elif opt == "--no-logs":
1086 self.SAVELOGS = False 1098 self.SAVELOGS = False
1099 + elif opt == "--maxcores":
1100 + global ncores
1101 + ncores = min(ncores, int(arg))
1087 elif opt == "-f" or opt == "--full-inference": 1102 elif opt == "-f" or opt == "--full-inference":
1088 self.FULLINFERENCE = True 1103 self.FULLINFERENCE = True
1089 1104
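The new `--maxcores` branch caps the detected core count with `min(ncores, int(arg))`, so the option can only lower the concurrency, never raise it above the machine's CPU count. Below is a minimal, self-contained sketch of that behavior with the standard `getopt` module; the function name `parse_maxcores` and the shortened long-option list are illustrative assumptions, not RNANet's actual code.

```python
import getopt


def parse_maxcores(argv, detected_cores):
    """Return the core budget after applying an optional --maxcores=N cap.

    Mirrors the diff: ncores = min(ncores, int(arg)).
    """
    ncores = detected_cores
    # Short options copied from the diff; long options reduced to the one we need.
    opts, _ = getopt.getopt(argv, "r:fhs", ["maxcores="])
    for opt, arg in opts:
        if opt == "--maxcores":
            ncores = min(ncores, int(arg))
    return ncores
```

For example, `parse_maxcores(["--maxcores=4"], 16)` yields 4, while `--maxcores=64` on a 16-core machine still yields 16.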
...@@ -2614,9 +2629,9 @@ if __name__ == "__main__": ...@@ -2614,9 +2629,9 @@ if __name__ == "__main__":
2614 runDir = os.getcwd() 2629 runDir = os.getcwd()
2615 fileDir = os.path.dirname(os.path.realpath(__file__)) 2630 fileDir = os.path.dirname(os.path.realpath(__file__))
2616 ncores = read_cpu_number() 2631 ncores = read_cpu_number()
2617 - print(f"> Running {python_executable} on {ncores} CPU cores in folder {runDir}.")
2618 pp = Pipeline() 2632 pp = Pipeline()
2619 pp.process_options() 2633 pp.process_options()
2634 + print(f"> Running {python_executable} on {ncores} CPU cores in folder {runDir}.")
2620 2635
2621 # Prepare folders 2636 # Prepare folders
2622 os.makedirs(runDir + "/results", exist_ok=True) 2637 os.makedirs(runDir + "/results", exist_ok=True)
...@@ -2639,8 +2654,7 @@ if __name__ == "__main__": ...@@ -2639,8 +2654,7 @@ if __name__ == "__main__":
2639 2654
2640 # Download and annotate new RNA 3D chains (Chain objects in pp.update) 2655 # Download and annotate new RNA 3D chains (Chain objects in pp.update)
2641 # If the original cif file and/or the Json DSSR annotation file already exist, they are not redownloaded/recomputed. 2656 # If the original cif file and/or the Json DSSR annotation file already exist, they are not redownloaded/recomputed.
2642 - # pp.dl_and_annotate(coeff_ncores=0.5) 2657 + pp.dl_and_annotate(coeff_ncores=0.5)
2643 - pp.dl_and_annotate(coeff_ncores=1.0)
2644 print("Here we go.") 2658 print("Here we go.")
2645 2659
2646 # At this point, the structure table is up to date. 2660 # At this point, the structure table is up to date.
...@@ -2652,7 +2666,7 @@ if __name__ == "__main__": ...@@ -2652,7 +2666,7 @@ if __name__ == "__main__":
2652 # Redownload and re-annotate 2666 # Redownload and re-annotate
2653 print("> Retrying to annotate some structures which just failed.", flush=True) 2667 print("> Retrying to annotate some structures which just failed.", flush=True)
2654 pp.dl_and_annotate(retry=True, coeff_ncores=0.3) # 2668 pp.dl_and_annotate(retry=True, coeff_ncores=0.3) #
2655 - pp.build_chains(retry=True, coeff_ncores=1.0) # Use half the cores to reduce required amount of memory 2669 + pp.build_chains(retry=True, coeff_ncores=0.5) # Use half the cores to reduce required amount of memory
2656 print(f"> Loaded {len(pp.loaded_chains)} RNA chains ({len(pp.update) - len(pp.loaded_chains)} ignored/errors).") 2670 print(f"> Loaded {len(pp.loaded_chains)} RNA chains ({len(pp.update) - len(pp.loaded_chains)} ignored/errors).")
2657 if len(no_nts_set): 2671 if len(no_nts_set):
2658 print(f"Among errors, {len(no_nts_set)} structures seem to contain RNA chains without defined nucleotides:", no_nts_set, flush=True) 2672 print(f"Among errors, {len(no_nts_set)} structures seem to contain RNA chains without defined nucleotides:", no_nts_set, flush=True)
......
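The calls above now pass `coeff_ncores=0.5` (and `0.3` for the annotation retry) to reduce the number of simultaneous workers. This diff does not show how RNANet turns the coefficient into a worker count; the hypothetical helper below assumes `max(1, int(ncores * coeff_ncores))`, which illustrates why a lower coefficient lowers peak memory at the cost of longer run time.

```python
def workers_for_step(ncores, coeff_ncores):
    """Hypothetical rule: scale the core budget by coeff_ncores, keeping at least one worker."""
    return max(1, int(ncores * coeff_ncores))


# Example: with --maxcores=4, a step called with coeff_ncores=0.5 would use 2 workers,
# and the retry step called with coeff_ncores=0.3 would use 1.
```

Under this assumption, the floor of one worker guarantees that a very small `--maxcores` combined with a small coefficient still makes progress.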
1 -1apg_1_D
2 -1b2m_1_C
3 -1b2m_1_D
4 -1b2m_1_E
5 -1cgm_1_I
6 -1cwp_1_D
7 -1cwp_1_E
8 -1cwp_1_F
9 -1ddl_1_E
10 -1e8s_1_C
11 -1eg0_1_L
12 -1eg0_1_L_1-56
13 -1eg0_1_M
14 -1eg0_1_O
15 -1eg0_1_O_1-73
16 -1emi_1_B
17 -1emi_1_B_1-108
18 -1gsg_1_T
19 -1gsg_1_T_1-72
20 -1h2c_1_R
21 -1h2d_1_R
22 -1h2d_1_S
23 -1i5l_1_U
24 -1i5l_1_Y
25 -1ibl_1_Z
26 -1ibm_1_Z
27 -1jgo_1_A
28 -1jgo_1_A_2-1520
29 -1jgp_1_A
30 -1jgp_1_A_2-1520
31 -1jgq_1_A
32 -1jgq_1_A_2-1520
33 -1laj_1_R
34 -1ls2_1_B
35 -1ls2_1_B_1-73
36 -1m8w_1_E
37 -1m8w_1_F
38 -1mj1_1_Q
39 -1mj1_1_R
40 -1ml5_1_A
41 -1ml5_1_a_1-2914
42 -1ml5_1_A_2-1520
43 -1ml5_1_b_5-121
44 -1mvr_1_1
45 -1mvr_1_A
46 -1mvr_1_B
47 -1mvr_1_B_3-96
48 -1mvr_1_C
49 -1mvr_1_D
50 -1mvr_1_D_1-61
51 -1mvr_1_E
52 -1n1h_1_B
53 -1n32_1_Z
54 -1n33_1_Z
55 -1n34_1_Z
56 -1n38_1_B
57 -1nb7_1_E
58 -1nb7_1_F
59 -1pn7_1_C
60 -1pn8_1_D
61 -1pvo_1_G
62 -1pvo_1_H
63 -1pvo_1_J
64 -1pvo_1_K
65 -1pvo_1_L
66 -1qln_1_R
67 -1qvg_1_3
68 -1qzc_1_A
69 -1qzc_1_B
70 -1qzc_1_C
71 -1r2w_1_C
72 -1r2w_1_C_1-58
73 -1r2x_1_C
74 -1r2x_1_C_1-58
75 -1rmv_1_B
76 -1t1m_1_A
77 -1t1m_1_B
78 -1trj_1_B
79 -1trj_1_C
80 -1utd_1_1
81 -1utd_1_2
82 -1utd_1_3
83 -1utd_1_4
84 -1utd_1_5
85 -1utd_1_6
86 -1utd_1_7
87 -1utd_1_8
88 -1utd_1_9
89 -1utd_1_Z
90 -1uvi_1_D
91 -1uvi_1_E
92 -1uvi_1_F
93 -1uvj_1_D
94 -1uvj_1_E
95 -1uvj_1_F
96 -1uvn_1_B
97 -1uvn_1_D
98 -1uvn_1_F
99 -1vq6_1_4
100 -1vqn_1_4
101 -1vqo_1_4
102 -1vtm_1_R
103 -1vy7_1_AY_1-73
104 -1vy7_1_CY_1-73
105 -1x18_1_A
106 -1x18_1_B
107 -1x18_1_C
108 -1x18_1_D
109 -1x1l_1_A
110 -1x1l_1_A_1-132
111 -1xmo_1_W
112 -1xmq_1_W
113 -1xnq_1_W
114 -1xnr_1_W
115 -1xpo_1_G
116 -1xpo_1_H
117 -1xpo_1_J
118 -1xpo_1_K
119 -1xpo_1_L
120 -1xpo_1_M
121 -1xpr_1_G
122 -1xpr_1_H
123 -1xpr_1_J
124 -1xpr_1_K
125 -1xpr_1_L
126 -1xpr_1_M
127 -1xpu_1_G
128 -1xpu_1_H
129 -1xpu_1_J
130 -1xpu_1_K
131 -1xpu_1_L
132 -1xpu_1_M
133 -1y1y_1_P
134 -1ytu_1_D
135 -1ytu_1_F
136 -1zc8_1_A
137 -1zc8_1_A_1-59
138 -1zc8_1_B
139 -1zc8_1_C
140 -1zc8_1_F
141 -1zc8_1_G
142 -1zc8_1_H
143 -1zc8_1_I
144 -1zc8_1_J
145 -1zc8_1_Z
146 -1zc8_1_Z_1-93
147 -1zn0_1_C
148 -1zn1_1_B
149 -1zn1_1_B_1-59
150 -1zn1_1_C
151 -2a1r_1_C
152 -2a1r_1_D
153 -2a8v_1_D
154 -2atw_1_B
155 -2atw_1_D
156 -2az0_1_C
157 -2az0_1_D
158 -2az2_1_C
159 -2az2_1_D
160 -2b2d_1_S
161 -2f4v_1_Z
162 -2ftc_1_R
163 -2ftc_1_R_81-1466
164 -2fz2_1_D
165 -2ht1_1_J
166 -2ht1_1_K
167 -2iy3_1_B
168 -2iy3_1_B_9-105
169 -2ob7_1_A
170 -2ob7_1_A_10-319
171 -2ob7_1_D
172 -2ob7_1_D_1-132
173 -2om3_1_R
174 -2qqp_1_R
175 -2r1g_1_A
176 -2r1g_1_B
177 -2r1g_1_C
178 -2r1g_1_D
179 -2r1g_1_E
180 -2r1g_1_F
181 -2r1g_1_X
182 -2rdo_1_A
183 -2rdo_1_A_3-118
184 -2rdo_1_B
185 -2rdo_1_B_1-2904
186 -2tmv_1_R
187 -2uxb_1_X
188 -2uxc_1_Y
189 -2uxd_1_X
190 -2vaz_1_A
191 -2vaz_1_A_64-177
192 -2voo_1_C
193 -2voo_1_D
194 -2vrt_1_E
195 -2vrt_1_F
196 -2vrt_1_G
197 -2vrt_1_H
198 -2wj8_1_A
199 -2wj8_1_B
200 -2wj8_1_C
201 -2wj8_1_D
202 -2wj8_1_E
203 -2wj8_1_F
204 -2wj8_1_G
205 -2wj8_1_H
206 -2wj8_1_I
207 -2wj8_1_J
208 -2wj8_1_K
209 -2wj8_1_L
210 -2wj8_1_M
211 -2wj8_1_N
212 -2wj8_1_O
213 -2wj8_1_P
214 -2wj8_1_Q
215 -2wj8_1_R
216 -2wj8_1_S
217 -2wj8_1_T
218 -2x1a_1_B
219 -2x1f_1_B
220 -2xea_1_R
221 -2xnr_1_C
222 -2xpj_1_D
223 -2xs5_1_D
224 -2xs7_1_B
225 -2z9q_1_A
226 -2z9q_1_A_1-72
227 -2zde_1_E
228 -2zde_1_F
229 -2zde_1_G
230 -2zde_1_H
231 -3avt_1_T
232 -3b0u_1_A
233 -3b0u_1_B
234 -3bbv_1_Z
235 -3cd6_1_4
236 -3cma_1_5
237 -3cme_1_5
238 -3cw1_1_V
239 -3cw1_1_v_1-138
240 -3cw1_1_V_1-138
241 -3cw1_1_W
242 -3cw1_1_w_1-138
243 -3cw1_1_X
244 -3cw1_1_x_1-138
245 -3d2s_1_F
246 -3d2s_1_H
247 -3ep2_1_A
248 -3ep2_1_B
249 -3ep2_1_B_1-50
250 -3ep2_1_C
251 -3ep2_1_D
252 -3ep2_1_E
253 -3ep2_1_Y
254 -3ep2_1_Y_1-72
255 -3eq3_1_A
256 -3eq3_1_B
257 -3eq3_1_B_1-50
258 -3eq3_1_C
259 -3eq3_1_D
260 -3eq3_1_E
261 -3eq3_1_Y
262 -3eq3_1_Y_1-72
263 -3eq4_1_A
264 -3eq4_1_B
265 -3eq4_1_B_1-50
266 -3eq4_1_C
267 -3eq4_1_D
268 -3eq4_1_E
269 -3eq4_1_Y
270 -3eq4_1_Y_1-69
271 -3er8_1_F
272 -3er8_1_G
273 -3er8_1_H
274 -3er9_1_D
275 -3erc_1_G
276 -3gpq_1_E
277 -3gpq_1_F
278 -3ie1_1_E
279 -3ie1_1_F
280 -3ie1_1_G
281 -3ie1_1_H
282 -3iy8_1_A
283 -3iy8_1_A_1-540
284 -3iy9_1_A
285 -3iy9_1_A_498-1027
286 -3j06_1_R
287 -3j0l_1_A
288 -3j0l_1_B
289 -3j0l_1_C
290 -3j0l_1_D
291 -3j0l_1_F
292 -3j0l_1_H
293 -3j0o_1_A
294 -3j0o_1_B
295 -3j0o_1_C
296 -3j0o_1_D
297 -3j0o_1_F
298 -3j0o_1_H
299 -3j0p_1_A
300 -3j0p_1_C
301 -3j0p_1_D
302 -3j0p_1_F
303 -3j0p_1_H
304 -3j0q_1_A
305 -3j0q_1_C
306 -3j0q_1_D
307 -3j0q_1_F
308 -3j0q_1_H
309 -3j2k_1_0
310 -3j2k_1_1
311 -3j2k_1_2
312 -3j2k_1_3
313 -3j2k_1_4
314 -3j46_1_A
315 -3j46_1_P
316 -3j6b_1_E
317 -3j6x_1_IR
318 -3j6y_1_IR
319 -3j9m_1_U
320 -3j9y_1_V
321 -3jb7_1_M
322 -3jb7_1_T
323 -3jbu_1_B
324 -3jbu_1_V
325 -3jbv_1_B
326 -3jcj_1_G
327 -3jcj_1_V
328 -3jcn_1_V
329 -3jcr_1_H
330 -3jcr_1_H_1-115
331 -3jcr_1_M
332 -3jcr_1_M_1-141
333 -3jcr_1_N
334 -3jcr_1_N_1-107
335 -3koa_1_C
336 -3m7n_1_Z
337 -3m85_1_X
338 -3m85_1_Y
339 -3m85_1_Z
340 -3nma_1_B
341 -3nma_1_C
342 -3nvk_1_G
343 -3nvk_1_S
344 -3ok4_1_2
345 -3ok4_1_4
346 -3ok4_1_H
347 -3ok4_1_J
348 -3ok4_1_L
349 -3ok4_1_N
350 -3ok4_1_P
351 -3ok4_1_R
352 -3ok4_1_T
353 -3ok4_1_V
354 -3ok4_1_X
355 -3ok4_1_Z
356 -3ol6_1_D
357 -3ol6_1_H
358 -3ol6_1_L
359 -3ol6_1_P
360 -3ol7_1_D
361 -3ol7_1_H
362 -3ol7_1_L
363 -3ol7_1_P
364 -3ol8_1_D
365 -3ol8_1_H
366 -3ol8_1_L
367 -3ol8_1_P
368 -3ol9_1_D
369 -3ol9_1_H
370 -3ol9_1_L
371 -3ol9_1_P
372 -3olb_1_D
373 -3olb_1_H
374 -3olb_1_L
375 -3olb_1_P
376 -3p6y_1_Q
377 -3p6y_1_T
378 -3p6y_1_U
379 -3p6y_1_V
380 -3p6y_1_W
381 -3pdm_1_R
382 -3pf5_1_S
383 -3pgw_1_N
384 -3pgw_1_N_1-164
385 -3pgw_1_R
386 -3pgw_1_R_1-164
387 -3qsu_1_P
388 -3qsu_1_R
389 -3rtj_1_D
390 -3rzo_1_R
391 -3s4g_1_B
392 -3s4g_1_C
393 -3t1h_1_W
394 -3t1y_1_W
395 -3u2e_1_C
396 -3u2e_1_D
397 -3wzi_1_C
398 -486d_1_F
399 -486d_1_G
400 -4a3b_1_P
401 -4a3c_1_P
402 -4a3e_1_P
403 -4a3g_1_P
404 -4a3j_1_P
405 -4a3m_1_P
406 -4adx_1_0
407 -4adx_1_0_1-2925
408 -4adx_1_8
409 -4adx_1_9
410 -4adx_1_9_1-123
411 -4afy_1_C
412 -4afy_1_D
413 -4am3_1_D
414 -4am3_1_H
415 -4am3_1_I
416 -4b3r_1_W
417 -4b3s_1_W
418 -4b3t_1_W
419 -4ba2_1_R
420 -4bbl_1_Y
421 -4bbl_1_Z
422 -4csf_1_A
423 -4csf_1_C
424 -4csf_1_E
425 -4csf_1_G
426 -4csf_1_I
427 -4csf_1_K
428 -4csf_1_M
429 -4csf_1_O
430 -4csf_1_Q
431 -4csf_1_S
432 -4csf_1_U
433 -4csf_1_W
434 -4cxg_1_A
435 -4cxg_1_B
436 -4cxg_1_C
437 -4cxh_1_A
438 -4cxh_1_B
439 -4cxh_1_C
440 -4cxh_1_X
441 -4d61_1_J
442 -4dr4_1_V
443 -4dr5_1_V
444 -4dr6_1_B
445 -4dr6_1_V
446 -4dr7_1_B
447 -4dr7_1_V
448 -4dwa_1_D
449 -4e6b_1_A
450 -4e6b_1_B
451 -4e6b_1_E
452 -4e6b_1_F
453 -4ejt_1_G
454 -4eya_1_A
455 -4eya_1_B
456 -4eya_1_C
457 -4eya_1_D
458 -4eya_1_E
459 -4eya_1_F
460 -4eya_1_G
461 -4eya_1_H
462 -4eya_1_I
463 -4eya_1_J
464 -4eya_1_K
465 -4eya_1_L
466 -4eya_1_M
467 -4eya_1_N
468 -4eya_1_O
469 -4eya_1_P
470 -4eya_1_Q
471 -4eya_1_R
472 -4eya_1_S
473 -4eya_1_T
474 -4g0a_1_E
475 -4g0a_1_F
476 -4g0a_1_G
477 -4g0a_1_H
478 -4g7o_1_I
479 -4g7o_1_S
480 -4g9z_1_E
481 -4g9z_1_F
482 -4gkj_1_W
483 -4gkk_1_W
484 -4gv3_1_B
485 -4gv3_1_C
486 -4gv6_1_B
487 -4gv6_1_C
488 -4gv9_1_E
489 -4hor_1_X
490 -4hos_1_X
491 -4hot_1_X
492 -4ht9_1_E
493 -4i67_1_B
494 -4ii9_1_C
495 -4j7m_1_B
496 -4jzu_1_C
497 -4jzv_1_C
498 -4k4s_1_D
499 -4k4s_1_H
500 -4k4t_1_D
501 -4k4t_1_H
502 -4k4u_1_D
503 -4k4u_1_H
504 -4k4x_1_D
505 -4k4x_1_H
506 -4k4x_1_L
507 -4k4x_1_P
508 -4k4z_1_D
509 -4k4z_1_H
510 -4k4z_1_L
511 -4k4z_1_P
512 -4kzx_1_I
513 -4kzy_1_I
514 -4kzz_1_I
515 -4kzz_1_J
516 -4lj0_1_C
517 -4lj0_1_D
518 -4lj0_1_E
519 -4lq3_1_R
520 -4m7d_1_P
521 -4n2s_1_B
522 -4n48_1_D
523 -4n48_1_G
524 -4nia_1_1
525 -4nia_1_2
526 -4nia_1_3
527 -4nia_1_4
528 -4nia_1_5
529 -4nia_1_6
530 -4nia_1_7
531 -4nia_1_8
532 -4nia_1_A
533 -4nia_1_B
534 -4nia_1_C
535 -4nia_1_D
536 -4nia_1_E
537 -4nia_1_F
538 -4nia_1_G
539 -4nia_1_H
540 -4nia_1_I
541 -4nia_1_J
542 -4nia_1_K
543 -4nia_1_L
544 -4nia_1_M
545 -4nia_1_N
546 -4nia_1_O
547 -4nia_1_U
548 -4nia_1_W
549 -4nia_1_Z
550 -4nku_1_D
551 -4nku_1_H
552 -4oau_1_A
553 -4oav_1_A
554 -4oav_1_C
555 -4ohy_1_B
556 -4ohz_1_B
557 -4oi0_1_B
558 -4oi1_1_B
559 -4oq8_1_D
560 -4oq9_1_1
561 -4oq9_1_2
562 -4oq9_1_3
563 -4oq9_1_4
564 -4oq9_1_5
565 -4oq9_1_6
566 -4oq9_1_7
567 -4oq9_1_8
568 -4oq9_1_A
569 -4oq9_1_B
570 -4oq9_1_C
571 -4oq9_1_D
572 -4oq9_1_E
573 -4oq9_1_F
574 -4oq9_1_G
575 -4oq9_1_H
576 -4oq9_1_I
577 -4oq9_1_J
578 -4oq9_1_K
579 -4oq9_1_L
580 -4oq9_1_M
581 -4oq9_1_N
582 -4oq9_1_O
583 -4oq9_1_U
584 -4oq9_1_W
585 -4oq9_1_Z
586 -4peh_1_V
587 -4peh_1_W
588 -4peh_1_X
589 -4peh_1_Y
590 -4peh_1_Z
591 -4pei_1_V
592 -4pei_1_W
593 -4pei_1_X
594 -4pei_1_Y
595 -4pei_1_Z
596 -4qm6_1_C
597 -4qm6_1_D
598 -4qu6_1_B
599 -4qu7_1_U
600 -4qu7_1_V
601 -4qu7_1_X
602 -4qvc_1_G
603 -4qvd_1_H
604 -4rcj_1_B
605 -4s2x_1_B
606 -4s2y_1_B
607 -4tu0_1_F
608 -4tu0_1_G
609 -4udv_1_R
610 -4v42_1_AA
611 -4v42_1_AA_2-1520
612 -4v42_1_BA
613 -4v42_1_BA_1-2914
614 -4v42_1_BB
615 -4v42_1_BB_5-121
616 -4v47_1_A0
617 -4v47_1_A0_1-2904
618 -4v47_1_A9
619 -4v47_1_A9_3-118
620 -4v47_1_BA
621 -4v47_1_BA_1-1542
622 -4v48_1_A0
623 -4v48_1_A0_1-2904
624 -4v48_1_A6
625 -4v48_1_A6_1-73
626 -4v48_1_A9
627 -4v48_1_A9_3-118
628 -4v48_1_BA
629 -4v48_1_BA_1-1543
630 -4v4f_1_A0
631 -4v4f_1_A1
632 -4v4f_1_A2
633 -4v4f_1_A3
634 -4v4f_1_A4
635 -4v4f_1_A5
636 -4v4f_1_A6
637 -4v4f_1_A7
638 -4v4f_1_A8
639 -4v4f_1_A9
640 -4v4f_1_AZ
641 -4v4f_1_B0
642 -4v4f_1_B1
643 -4v4f_1_B2
644 -4v4f_1_B3
645 -4v4f_1_B4
646 -4v4f_1_B5
647 -4v4f_1_B6
648 -4v4f_1_B7
649 -4v4f_1_B8
650 -4v4f_1_B9
651 -4v4f_1_BZ
652 -4v4i_1_W
653 -4v4i_1_X
654 -4v4i_1_Y
655 -4v4i_1_Z
656 -4v4j_1_W
657 -4v4j_1_X
658 -4v4j_1_Y
659 -4v4j_1_Z
660 -4v5z_1_AA
661 -4v5z_1_AA_1-1563
662 -4v5z_1_AB
663 -4v5z_1_AC
664 -4v5z_1_AD
665 -4v5z_1_AE
666 -4v5z_1_AF
667 -4v5z_1_AG
668 -4v5z_1_AH
669 -4v5z_1_B0
670 -4v5z_1_B0_1-2902
671 -4v5z_1_B1
672 -4v5z_1_B1_2-125
673 -4v5z_1_BA
674 -4v5z_1_BB
675 -4v5z_1_BC
676 -4v5z_1_BD
677 -4v5z_1_BE
678 -4v5z_1_BF
679 -4v5z_1_BG
680 -4v5z_1_BH
681 -4v5z_1_BI
682 -4v5z_1_BJ
683 -4v5z_1_BK
684 -4v5z_1_BL
685 -4v5z_1_BM
686 -4v5z_1_BN
687 -4v5z_1_BO
688 -4v5z_1_BP
689 -4v5z_1_BQ
690 -4v5z_1_BR
691 -4v5z_1_BS
692 -4v5z_1_BT
693 -4v5z_1_BU
694 -4v5z_1_BV
695 -4v5z_1_BW
696 -4v5z_1_BX
697 -4v5z_1_BY
698 -4v5z_1_BY_2-113
699 -4v5z_1_BZ
700 -4v5z_1_BZ_1-70
701 -4v68_1_A0
702 -4v7e_1_AA
703 -4v7e_1_AB
704 -4v7e_1_AC
705 -4v7e_1_AD
706 -4v7e_1_AE
707 -4v7j_1_AV
708 -4v7j_1_AW
709 -4v7j_1_BV
710 -4v7j_1_BW
711 -4v7k_1_AV
712 -4v7k_1_AW
713 -4v7k_1_BV
714 -4v7k_1_BW
715 -4v8t_1_1
716 -4v8z_1_CX
717 -4v99_1_AC
718 -4v99_1_AH
719 -4v99_1_AM
720 -4v99_1_AR
721 -4v99_1_AW
722 -4v99_1_BC
723 -4v99_1_BH
724 -4v99_1_BM
725 -4v99_1_BR
726 -4v99_1_BW
727 -4v99_1_CC
728 -4v99_1_CH
729 -4v99_1_CM
730 -4v99_1_CR
731 -4v99_1_CW
732 -4v99_1_DC
733 -4v99_1_DH
734 -4v99_1_DM
735 -4v99_1_DR
736 -4v99_1_DW
737 -4v99_1_EC
738 -4v99_1_EH
739 -4v99_1_EM
740 -4v99_1_ER
741 -4v99_1_EW
742 -4v99_1_FC
743 -4v99_1_FH
744 -4v99_1_FM
745 -4v99_1_FR
746 -4v99_1_FW
747 -4v99_1_GC
748 -4v99_1_GH
749 -4v99_1_GM
750 -4v99_1_GR
751 -4v99_1_GW
752 -4v99_1_HC
753 -4v99_1_HH
754 -4v99_1_HM
755 -4v99_1_HR
756 -4v99_1_HW
757 -4v99_1_IC
758 -4v99_1_IH
759 -4v99_1_IM
760 -4v99_1_IR
761 -4v99_1_IW
762 -4v99_1_JC
763 -4v99_1_JH
764 -4v99_1_JM
765 -4v99_1_JR
766 -4v99_1_JW
767 -4v9e_1_AA
768 -4v9e_1_AG
769 -4v9e_1_AM
770 -4v9e_1_BA
771 -4v9e_1_BG
772 -4v9e_1_BM
773 -4w2e_1_W
774 -4w2e_1_X
775 -4w2h_1_CY_1-73
776 -4wkr_1_C
777 -4wt8_1_AB
778 -4wt8_1_BB
779 -4wt8_1_CS
780 -4wt8_1_DS
781 -4wti_1_P
782 -4wti_1_T
783 -4wtj_1_P
784 -4wtj_1_T
785 -4wtk_1_P
786 -4wtk_1_T
787 -4wtl_1_P
788 -4wtl_1_T
789 -4wtm_1_P
790 -4wtm_1_T
791 -4x4u_1_H
792 -4x62_1_B
793 -4x64_1_B
794 -4x65_1_B
795 -4x66_1_B
796 -4x9e_1_G
797 -4x9e_1_H
798 -4xbf_1_D
799 -4xln_1_Q
800 -4xln_1_T
801 -4xlr_1_Q
802 -4xlr_1_T
803 -4y4p_1_1W
804 -4y4p_1_1X
805 -4y4p_1_1Y
806 -4y4p_1_2W
807 -4y4p_1_2X
808 -4y4p_1_2Y
809 -4yln_1_3
810 -4yln_1_6
811 -4yln_1_9
812 -4ylo_1_3
813 -4ylo_1_6
814 -4ylo_1_9
815 -4yoe_1_E
816 -4z3s_1_1W
817 -4z3s_1_1X
818 -4z3s_1_1Y
819 -4z3s_1_2W
820 -4z3s_1_2X
821 -4z3s_1_2Y
822 -4z8c_1_1X
823 -4z8c_1_2X
824 -4zer_1_1X
825 -4zer_1_2X
826 -5a0v_1_F
827 -5a79_1_R
828 -5a7a_1_R
829 -5afi_1_V
830 -5afi_1_W
831 -5afi_1_Y
832 -5aj0_1_BV
833 -5aj0_1_BW
834 -5bud_1_D
835 -5bud_1_E
836 -5c0y_1_C
837 -5ceu_1_C
838 -5ceu_1_D
839 -5det_1_P
840 -5doy_1_1W
841 -5doy_1_1X
842 -5doy_1_1Y
843 -5doy_1_2W
844 -5doy_1_2X
845 -5doy_1_2Y
846 -5dto_1_B
847 -5e02_1_C
848 -5elk_1_R
849 -5els_1_I
850 -5elt_1_E
851 -5elt_1_F
852 -5f6c_1_C
853 -5f6c_1_E
854 -5f8k_1_1X
855 -5f8k_1_2X
856 -5fl8_1_X
857 -5fl8_1_Y
858 -5fl8_1_Z
859 -5flx_1_Z
860 -5g2x_1_A_595-692
861 -5gmf_1_E
862 -5gmf_1_F
863 -5gmf_1_G
864 -5gmf_1_H
865 -5gmg_1_C
866 -5gmg_1_D
867 -5gxi_1_B
868 -5h5u_1_H
869 -5hau_1_1W
870 -5hau_1_2W
871 -5hcp_1_1X
872 -5hcp_1_2X
873 -5hcq_1_1X
874 -5hcq_1_2X
875 -5hcr_1_1X
876 -5hcr_1_2X
877 -5hd1_1_1X
878 -5hd1_1_2X
879 -5hjz_1_C
880 -5hk0_1_F
881 -5hkc_1_C
882 -5i2d_1_K
883 -5i2d_1_V
884 -5ipl_1_3
885 -5ipm_1_3
886 -5ipn_1_3
887 -5it9_1_I
888 -5j4b_1_1W
889 -5j4b_1_1X
890 -5j4b_1_1Y
891 -5j4b_1_2W
892 -5j4b_1_2X
893 -5j4b_1_2Y
894 -5j4c_1_1W
895 -5j4c_1_1X
896 -5j4c_1_1Y
897 -5j4c_1_2W
898 -5j4c_1_2X
899 -5j4c_1_2Y
900 -5j8b_1_W
901 -5j8b_1_X
902 -5j8b_1_Y
903 -5jcs_1_X
904 -5jcs_1_Y
905 -5jcs_1_Z
906 -5jju_1_C
907 -5k77_1_V
908 -5k77_1_W
909 -5k77_1_X
910 -5k77_1_Y
911 -5k77_1_Z
912 -5k78_1_X
913 -5k78_1_Y
914 -5k8h_1_A
915 -5kal_1_Y
916 -5kal_1_Z
917 -5kcr_1_1X
918 -5kcs_1_1X
919 -5l3p_1_X
920 -5l3p_1_Y
921 -5lza_1_V
922 -5lzb_1_V
923 -5lzb_1_W
924 -5lzb_1_X
925 -5lzb_1_Y
926 -5lzc_1_V
927 -5lzc_1_W
928 -5lzc_1_X
929 -5lzc_1_Y
930 -5lzd_1_V
931 -5lzd_1_W
932 -5lzd_1_X
933 -5lzd_1_Y
934 -5lze_1_V
935 -5lze_1_W
936 -5lze_1_X
937 -5lze_1_Y
938 -5lzf_1_V
939 -5lzf_1_Y
940 -5lzs_1_II
941 -5lzy_1_HH
942 -5mc6_1_M
943 -5mc6_1_N
944 -5mfx_1_B
945 -5mgp_1_X
946 -5mmi_1_Z
947 -5mmj_1_A
948 -5mmm_1_Z
949 -5mq0_1_3
950 -5mrc_1_AA
951 -5mrc_1_BB
952 -5mre_1_AA
953 -5mre_1_BB
954 -5mrf_1_AA
955 -5mrf_1_BB
956 -5new_1_C
957 -5o1y_1_B
958 -5o2r_1_X
959 -5o3j_1_B
960 -5odv_1_A
961 -5odv_1_B
962 -5odv_1_C
963 -5odv_1_D
964 -5odv_1_E
965 -5odv_1_F
966 -5odv_1_G
967 -5odv_1_H
968 -5odv_1_I
969 -5odv_1_J
970 -5odv_1_K
971 -5odv_1_L
972 -5odv_1_M
973 -5odv_1_N
974 -5odv_1_O
975 -5odv_1_P
976 -5odv_1_Q
977 -5odv_1_R
978 -5odv_1_S
979 -5odv_1_T
980 -5odv_1_U
981 -5odv_1_V
982 -5odv_1_W
983 -5odv_1_X
984 -5sze_1_C
985 -5t2c_1_AN
986 -5tbw_1_SR
987 -5u4i_1_X
988 -5u4i_1_Y
989 -5u4j_1_X
990 -5u4j_1_Z
991 -5udi_1_B
992 -5udj_1_B
993 -5udk_1_B
994 -5udl_1_B
995 -5uef_1_C
996 -5uef_1_D
997 -5uh5_1_I
998 -5uh6_1_I
999 -5uh8_1_I
1000 -5uh9_1_I
1001 -5uhc_1_I
1002 -5uk4_1_U
1003 -5uk4_1_V
1004 -5uk4_1_W
1005 -5uk4_1_X
1006 -5uq7_1_X
1007 -5uq7_1_Y
1008 -5uq7_1_Z
1009 -5uq8_1_X
1010 -5uq8_1_Y
1011 -5uq8_1_Z
1012 -5vi5_1_Q
1013 -5vyc_1_I1
1014 -5vyc_1_I2
1015 -5vyc_1_I3
1016 -5vyc_1_I4
1017 -5vyc_1_I5
1018 -5vyc_1_I6
1019 -5w0m_1_H
1020 -5w0m_1_I
1021 -5w0m_1_J
1022 -5w4k_1_1W
1023 -5w4k_1_1X
1024 -5w4k_1_1Y
1025 -5w4k_1_2W
1026 -5w4k_1_2X
1027 -5w4k_1_2Y
1028 -5w5h_1_B
1029 -5w5h_1_D
1030 -5w5i_1_B
1031 -5w5i_1_D
1032 -5wdt_1_V
1033 -5wdt_1_W
1034 -5wdt_1_Y
1035 -5we4_1_V
1036 -5we4_1_W
1037 -5we4_1_Y
1038 -5we6_1_V
1039 -5we6_1_W
1040 -5we6_1_Y
1041 -5wf0_1_V
1042 -5wf0_1_W
1043 -5wf0_1_Y
1044 -5wfk_1_V
1045 -5wfk_1_W
1046 -5wfk_1_Y
1047 -5wfs_1_V
1048 -5wfs_1_W
1049 -5wfs_1_Y
1050 -5wis_1_1W
1051 -5wis_1_1X
1052 -5wis_1_1Y
1053 -5wis_1_2W
1054 -5wis_1_2X
1055 -5wis_1_2Y
1056 -5wit_1_1W
1057 -5wit_1_1X
1058 -5wit_1_1Y
1059 -5wit_1_2W
1060 -5wit_1_2X
1061 -5wit_1_2Y
1062 -5wnp_1_B
1063 -5wnt_1_B
1064 -5wnu_1_B
1065 -5wnv_1_B
1066 -5x21_1_I
1067 -5x22_1_I
1068 -5x22_1_S
1069 -5x70_1_E
1070 -5x70_1_G
1071 -5x8r_1_A
1072 -5y88_1_X
1073 -5yts_1_B
1074 -5ytv_1_B
1075 -5ytx_1_B
1076 -5z4a_1_B
1077 -5z4d_1_B
1078 -5z4j_1_B
1079 -5zeb_1_V
1080 -5zep_1_W
1081 -5zeu_1_A
1082 -5zeu_1_V
1083 -5zsa_1_C
1084 -5zsa_1_D
1085 -5zsb_1_C
1086 -5zsb_1_D
1087 -5zsc_1_C
1088 -5zsc_1_D
1089 -5zsd_1_C
1090 -5zsd_1_D
1091 -5zsl_1_D
1092 -5zsl_1_E
1093 -5zsn_1_D
1094 -5zsn_1_E
1095 -5zuu_1_G
1096 -5zuu_1_I
1097 -6a4e_1_B
1098 -6a4e_1_D
1099 -6a6l_1_D
1100 -6b6h_1_3
1101 -6bk8_1_I
1102 -6c4i_1_X
1103 -6c4i_1_Y
1104 -6cae_1_1W
1105 -6cae_1_1X
1106 -6cae_1_1Y
1107 -6cae_1_2W
1108 -6cae_1_2X
1109 -6cae_1_2Y
1110 -6cfj_1_1W
1111 -6cfj_1_1X
1112 -6cfj_1_1Y
1113 -6cfj_1_2W
1114 -6cfj_1_2X
1115 -6cfj_1_2Y
1116 -6d1v_1_C
1117 -6d2z_1_C
1118 -6d30_1_C
1119 -6dmn_1_B
1120 -6dmv_1_B
1121 -6do8_1_B
1122 -6do9_1_B
1123 -6doa_1_B
1124 -6dob_1_B
1125 -6doc_1_B
1126 -6dod_1_B
1127 -6doe_1_B
1128 -6dof_1_B
1129 -6dog_1_B
1130 -6doh_1_B
1131 -6doi_1_B
1132 -6doj_1_B
1133 -6dok_1_B
1134 -6dol_1_B
1135 -6dom_1_B
1136 -6don_1_B
1137 -6doo_1_B
1138 -6dop_1_B
1139 -6doq_1_B
1140 -6dor_1_B
1141 -6dos_1_B
1142 -6dot_1_B
1143 -6dou_1_B
1144 -6dov_1_B
1145 -6dow_1_B
1146 -6dox_1_B
1147 -6doz_1_B
1148 -6dp0_1_B
1149 -6dp1_1_B
1150 -6dp2_1_B
1151 -6dp3_1_B
1152 -6dp4_1_B
1153 -6dp5_1_B
1154 -6dp6_1_B
1155 -6dp7_1_B
1156 -6dp8_1_B
1157 -6dp9_1_B
1158 -6dpa_1_B
1159 -6dpb_1_B
1160 -6dpc_1_B
1161 -6dpd_1_B
1162 -6dpe_1_B
1163 -6dpf_1_B
1164 -6dpg_1_B
1165 -6dph_1_B
1166 -6dpi_1_B
1167 -6dpj_1_B
1168 -6dpk_1_B
1169 -6dpl_1_B
1170 -6dpm_1_B
1171 -6dpn_1_B
1172 -6dpo_1_B
1173 -6dpp_1_B
1174 -6dti_1_W
1175 -6dzi_1_H
1176 -6e0o_1_B
1177 -6e0o_1_C
1178 -6e4p_1_J
1179 -6e4p_1_K
1180 -6een_1_G
1181 -6een_1_H
1182 -6een_1_I
1183 -6enf_1_X
1184 -6enj_1_X
1185 -6enu_1_X
1186 -6eri_1_AX
1187 -6evj_1_M
1188 -6evj_1_N
1189 -6fqr_1_C
1190 -6ftg_1_U
1191 -6ftg_1_V
1192 -6ftg_1_W
1193 -6fti_1_Q
1194 -6fti_1_U
1195 -6fti_1_V
1196 -6fti_1_W
1197 -6ftj_1_U
1198 -6ftj_1_V
1199 -6ftj_1_W
1200 -6gc5_1_F
1201 -6gc5_1_G
1202 -6gc5_1_H
1203 -6gfw_1_R
1204 -6gwt_1_X
1205 -6gx6_1_B
1206 -6gxm_1_X
1207 -6gxn_1_X
1208 -6gxo_1_X
1209 -6gz3_1_BV
1210 -6gz3_1_BW
1211 -6gz4_1_BV
1212 -6gz4_1_BW
1213 -6gz5_1_BV
1214 -6gz5_1_BW
1215 -6h4n_1_W
1216 -6h58_1_W
1217 -6h58_1_WW
1218 -6ha1_1_X
1219 -6ha8_1_X
1220 -6hcj_1_Q3
1221 -6hcq_1_Q3
1222 -6hhq_1_SR
1223 -6htq_1_U
1224 -6htq_1_V
1225 -6htq_1_W
1226 -6hxx_1_AA
1227 -6hxx_1_AB
1228 -6hxx_1_AC
1229 -6hxx_1_AD
1230 -6hxx_1_AE
1231 -6hxx_1_AF
1232 -6hxx_1_AG
1233 -6hxx_1_AH
1234 -6hxx_1_AI
1235 -6hxx_1_AJ
1236 -6hxx_1_AK
1237 -6hxx_1_AL
1238 -6hxx_1_AM
1239 -6hxx_1_AN
1240 -6hxx_1_AO
1241 -6hxx_1_AP
1242 -6hxx_1_AQ
1243 -6hxx_1_AR
1244 -6hxx_1_AS
1245 -6hxx_1_AT
1246 -6hxx_1_AU
1247 -6hxx_1_AV
1248 -6hxx_1_AW
1249 -6hxx_1_AX
1250 -6hxx_1_AY
1251 -6hxx_1_AZ
1252 -6hxx_1_BA
1253 -6hxx_1_BB
1254 -6hxx_1_BC
1255 -6hxx_1_BD
1256 -6hxx_1_BE
1257 -6hxx_1_BF
1258 -6hxx_1_BG
1259 -6hxx_1_BH
1260 -6hxx_1_BI
1261 -6hyu_1_D
1262 -6i0t_1_B
1263 -6i0u_1_B
1264 -6i0v_1_B
1265 -6i2n_1_U
1266 -6i7o_1_2B
1267 -6i7o_1_L
1268 -6i7o_1_M
1269 -6i7o_1_MB
1270 -6i7o_1_N
1271 -6i7o_1_NB
1272 -6ij2_1_E
1273 -6ij2_1_F
1274 -6ij2_1_G
1275 -6ij2_1_H
1276 -6ip5_1_2M
1277 -6ip5_1_ZU
1278 -6ip5_1_ZY
1279 -6ip6_1_2M
1280 -6ip6_1_ZY
1281 -6ip6_1_ZZ
1282 -6ip8_1_2M
1283 -6ip8_1_ZY
1284 -6ip8_1_ZZ
1285 -6is0_1_C
1286 -6j7z_1_C
1287 -6k32_1_P
1288 -6k32_1_T
1289 -6kqd_1_I
1290 -6kqd_1_S
1291 -6kqe_1_I
1292 -6kql_1_I
1293 -6kr6_1_B
1294 -6ktc_1_V
1295 -6kug_1_B
1296 -6l74_1_I
1297 -6lkq_1_S
1298 -6lkq_1_T
1299 -6lkq_1_U
1300 -6lkq_1_W
1301 -6m6v_1_E
1302 -6m6v_1_F
1303 -6m6v_1_G
1304 -6m7k_1_B
1305 -6mkn_1_W
1306 -6mpf_1_W
1307 -6mpi_1_W
1308 -6n6a_1_D
1309 -6n6c_1_D
1310 -6n6d_1_D
1311 -6n6e_1_D
1312 -6n6f_1_D
1313 -6n6g_1_D
1314 -6n6h_1_D
1315 -6n6i_1_C
1316 -6n6i_1_D
1317 -6n6j_1_C
1318 -6n6j_1_D
1319 -6n6k_1_C
1320 -6n6k_1_D
1321 -6n9e_1_1X
1322 -6n9e_1_2W
1323 -6n9e_1_2X
1324 -6n9f_1_1X
1325 -6n9f_1_2X
1326 -6nd5_1_1W
1327 -6nd5_1_1X
1328 -6nd5_1_1Y
1329 -6nd5_1_2W
1330 -6nd5_1_2X
1331 -6nd5_1_2Y
1332 -6nd6_1_1W
1333 -6nd6_1_1X
1334 -6nd6_1_1Y
1335 -6nd6_1_2W
1336 -6nd6_1_2X
1337 -6nd6_1_2Y
1338 -6nu2_1_U
1339 -6nu3_1_U
1340 -6o6v_1_C
1341 -6o6v_1_D
1342 -6o6x_1_C
1343 -6o6x_1_D
1344 -6o75_1_C
1345 -6o75_1_D
1346 -6o78_1_E
1347 -6o79_1_C
1348 -6o7b_1_C
1349 -6o7b_1_D
1350 -6o7h_1_K
1351 -6o7i_1_I
1352 -6o7k_1_G
1353 -6o7k_1_V
1354 -6o8w_1_U
1355 -6o97_1_1W
1356 -6o97_1_1X
1357 -6o97_1_1Y
1358 -6o97_1_2W
1359 -6o97_1_2X
1360 -6o97_1_2Y
1361 -6o9j_1_V
1362 -6o9k_1_Y
1363 -6of1_1_1W
1364 -6of1_1_1X
1365 -6of1_1_1Y
1366 -6of1_1_2W
1367 -6of1_1_2X
1368 -6of1_1_2Y
1369 -6ogy_1_M
1370 -6ogy_1_N
1371 -6okk_1_G
1372 -6ole_1_T
1373 -6ole_1_U
1374 -6ole_1_V
1375 -6olf_1_T
1376 -6olf_1_U
1377 -6olf_1_V
1378 -6olg_1_BV
1379 -6oli_1_T
1380 -6oli_1_U
1381 -6oli_1_V
1382 -6olz_1_BV
1383 -6om0_1_T
1384 -6om0_1_U
1385 -6om0_1_V
1386 -6om7_1_T
1387 -6om7_1_U
1388 -6om7_1_V
1389 -6ov0_1_E
1390 -6ov0_1_F
1391 -6ov0_1_G
1392 -6ov0_1_H
1393 -6ovy_1_I
1394 -6ow3_1_I
1395 -6owl_1_B
1396 -6owl_1_C
1397 -6oy5_1_I
1398 -6oy6_1_I
1399 -6p71_1_I
1400 -6p7p_1_D
1401 -6p7p_1_E
1402 -6p7p_1_F
1403 -6p7q_1_D
1404 -6p7q_1_E
1405 -6p7q_1_F
1406 -6pb4_1_3
1407 -6pmi_1_3
1408 -6pmj_1_3
1409 -6ppn_1_A
1410 -6ppn_1_I
1411 -6q1h_1_D
1412 -6q1h_1_H
1413 -6q8y_1_M
1414 -6q8y_1_N
1415 -6qcs_1_M
1416 -6qdw_1_A
1417 -6qdw_1_B
1418 -6qdw_1_V
1419 -6qik_1_X
1420 -6qik_1_Y
1421 -6qt0_1_X
1422 -6qt0_1_Y
1423 -6qtz_1_X
1424 -6qtz_1_Y
1425 -6qx3_1_G
1426 -6r7b_1_D
1427 -6r7b_1_E
1428 -6r9m_1_B
1429 -6r9o_1_B
1430 -6r9p_1_B
1431 -6r9q_1_B
1432 -6r9r_1_D
1433 -6r9r_1_E
1434 -6raz_1_Y
1435 -6rcl_1_C
1436 -6ri5_1_X
1437 -6ri5_1_Y
1438 -6rt4_1_C
1439 -6rt4_1_D
1440 -6rt5_1_A
1441 -6rt5_1_E
1442 -6rt6_1_A
1443 -6rt6_1_E
1444 -6rt7_1_A
1445 -6rt7_1_E
1446 -6rzz_1_X
1447 -6rzz_1_Y
1448 -6s05_1_X
1449 -6s05_1_Y
1450 -6s0m_1_C
1451 -6sag_1_R
1452 -6sce_1_B
1453 -6scf_1_I
1454 -6scf_1_K
1455 -6scf_1_L
1456 -6scf_1_M
1457 -6skf_1_AA
1458 -6skg_1_AA
1459 -6spc_1_A
1460 -6spe_1_A
1461 -6sty_1_C
1462 -6sty_1_F
1463 -6sv4_1_2B
1464 -6sv4_1_2C
1465 -6sv4_1_MB
1466 -6sv4_1_MC
1467 -6sv4_1_N
1468 -6sv4_1_NB
1469 -6sv4_1_NC
1470 -6swa_1_Q
1471 -6swa_1_R
1472 -6swa_1_S
1473 -6szs_1_X
1474 -6t34_1_A
1475 -6t34_1_B
1476 -6t34_1_C
1477 -6t34_1_D
1478 -6t34_1_E
1479 -6t34_1_F
1480 -6t34_1_G
1481 -6t34_1_H
1482 -6t34_1_I
1483 -6t34_1_J
1484 -6t34_1_K
1485 -6t34_1_L
1486 -6t34_1_M
1487 -6t34_1_N
1488 -6t34_1_O
1489 -6t34_1_P
1490 -6t34_1_Q
1491 -6t34_1_R
1492 -6t34_1_S
1493 -6t83_1_1B
1494 -6t83_1_2B
1495 -6t83_1_3B
1496 -6t83_1_4B
1497 -6t83_1_6B
1498 -6t83_1_A
1499 -6t83_1_AA
1500 -6t83_1_BB
1501 -6t83_1_CA
1502 -6tb3_1_N
1503 -6th6_1_AA
1504 -6tnu_1_M
1505 -6tnu_1_N
1506 -6ty9_1_M
1507 -6tz1_1_N
1508 -6u6y_1_E
1509 -6u6y_1_F
1510 -6u6y_1_G
1511 -6u6y_1_H
1512 -6u9x_1_H
1513 -6u9x_1_K
1514 -6ucq_1_1X
1515 -6ucq_1_1Y
1516 -6ucq_1_2X
1517 -6ucq_1_2Y
1518 -6uej_1_B
1519 -6uo1_1_1W
1520 -6uo1_1_1X
1521 -6uo1_1_1Y
1522 -6uo1_1_2W
1523 -6uo1_1_2X
1524 -6uo1_1_2Y
1525 -6utw_1_333
1526 -6uu0_1_333
1527 -6uu1_1_333
1528 -6uu2_1_333
1529 -6uu3_1_333
1530 -6uu4_1_333
1531 -6uu6_1_333
1532 -6uuc_1_333
1533 -6uz7_1_8_2140-2827
1534 -6v39_1_SN1
1535 -6v39_1_V
1536 -6v3a_1_SN1
1537 -6v3a_1_V
1538 -6v3b_1_SN1
1539 -6v3e_1_SN1
1540 -6vm6_1_G
1541 -6vm6_1_H
1542 -6vm6_1_I
1543 -6vm6_1_J
1544 -6vm6_1_K
1545 -6vyt_1_Y
1546 -6vyu_1_Y
1547 -6vyw_1_Y
1548 -6vyx_1_Y
1549 -6vyy_1_Y
1550 -6vyz_1_Y
1551 -6vz2_1_Y
1552 -6vz3_1_Y
1553 -6vz5_1_Y
1554 -6vz7_1_Y
1555 -6w6l_1_T
1556 -6w6l_1_U
1557 -6w6l_1_V
1558 -6wan_1_G
1559 -6wan_1_H
1560 -6wan_1_I
1561 -6wan_1_J
1562 -6wan_1_K
1563 -6wan_1_L
1564 -6wox_1_I
1565 -6woy_1_I
1566 -6wre_1_D
1567 -6x1b_1_D
1568 -6x1b_1_F
1569 -6xqd_1_1X
1570 -6xqd_1_2X
1571 -6xqe_1_1X
1572 -6xqe_1_2X
1573 -6xz7_1_F
1574 -6xz7_1_G
1575 -6y69_1_W
1576 -6ybv_1_K
1577 -6ybv_1_W
1578 -6ys3_1_A
1579 -6ys3_1_B
1580 -6ys3_1_V
1581 -6ysr_1_W
1582 -6yss_1_W
1583 -6yst_1_W
1584 -6ysu_1_W
1585 -6yud_1_K
1586 -6yud_1_M
1587 -6yud_1_O
1588 -6yud_1_P
1589 -6yud_1_Q
1590 -6ywo_1_E
1591 -6ywo_1_F
1592 -6ywo_1_I
1593 -6ywo_1_K
1594 -6z1p_1_AA
1595 -6z1p_1_AB
1596 -6z1p_1_BA
1597 -6z1p_1_BB
1598 -6z8k_1_X
1599 -6zmw_1_W
1600 -6zvh_1_X
1601 -6zvi_1_D
1602 -6zvi_1_E
1603 -6zvi_1_H
1604 -7jql_1_1X
1605 -7jql_1_2X
1606 -7jqm_1_1X
1607 -7jqm_1_2X
1608 -7jyy_1_E
1609 -7jyy_1_F
1610 -7jz0_1_E
1611 -7jz0_1_F
1612 -7k00_1_5
1613 -7k00_1_B
1614 -1qzb_1_B_1-73
1615 -1qza_1_B_1-73
1616 -5zzm_1_M_3-118
1617 -5zzm_1_N_1-2904
1618 -3dg2_1_B_1-2904
1619 -3dg0_1_B_1-2904
1620 -3dg4_1_B_1-2904
1621 -3dg5_1_B_1-2904
1622 -3dg2_1_A_1-1542
1623 -3dg0_1_A_1-1542
1624 -3dg4_1_A_1-1542
1625 -3dg5_1_A_1-1542