Aglaé TABOT

Merge branch 'master' into stage_aglae

...@@ -23,3 +23,5 @@ scripts/*.sh ...@@ -23,3 +23,5 @@ scripts/*.sh
23 scripts/*.tar 23 scripts/*.tar
24 scripts/measure.py 24 scripts/measure.py
25 scripts/recompute_some_chains.py 25 scripts/recompute_some_chains.py
26 +scripts/convert_rna_jsons.py
27 +scripts/recompute_family.py
......
1 ############################################################################################ 1 ############################################################################################
2 +v 1.6 beta, August 2021
3 +
4 +Aglaé Tabot joins the development team. Khodor Hannoush leaves.
5 +
6 +FEATURE CHANGES
7 + - Distinct options --cmalign-opts and --cmalign-rrna-opts allow adapting the parameters for LSU and SSU families.
8 + The LSU and SSU are now aligned with Infernal options '--cpu 10 --mxsize 8192 --mxtau 0.1', which is slow,
9 + requires up to 100 GB of RAM, and yields a suboptimal alignment (tau=0.1 is quite bad), but is homogeneous with the other families.
10 + - The LSU and SSU therefore have defined cm_coords fields, and therefore distance matrices can be computed.
11 + - Distance matrices are computed on all available molecules of the family by default, but you can use statistics.py --non-redundant to only
12 + take the equivalence class representatives at a given resolution into account (new option). For storage reasons, rRNAs are always run in
13 + this mode (but this might change in the future: space required is 'only' ~300 GB).
14 + - We now provide for download the renumbered (standardised) 3D MMCIF files, the nucleotides being numbered by their "index_chain" in the database.
15 + - We now provide for download the sequences of the 3D chains aligned by Rfam family (without Rfam sequences, which have been removed).
16 + - statistics.py now computes histograms and a density estimation with Gaussian mixture models for a large set of geometric parameters,
17 + measured on the unmapped data at a given resolution threshold. The parameters include:
18 + * All atom bonded distances and torsion angles
19 + * Distances, flat angles and torsion angles in the Pyle/VFold model
20 + * Distances, flat angles and torsion angles in the HiRE-RNA model
21 + * Sequence-dependent geometric parameters of the basepairs for all non-canonical basepairs in the HiRE-RNA model.
22 + The data is saved as JSON files of parameters, and numerous figures are produced to illustrate the distributions.
23 + The numbers of Gaussians to use in the GMMs are hard-coded in geometric_stats.py after our first estimation. If you do not want to trust this estimation,
24 + you can ignore it with option --rescan-nmodes. An exploration of the number of Gaussians from 1 to 8 will be performed, and the best GMM will be kept.
25 +
26 +BUG CORRECTIONS
27 + - New code file geometric_stats.py
28 + - New automation script that starts from scratch
29 + - Many small fixes, leading to the support of many previously "known issues"
30 + - Performance tweaks
31 +
32 +TECHNICAL CHANGES
33 + - Switched to DSSR Pro.
34 + - Switched to esl-alimerge instead of cmalign --merge to merge alignments.
35 + - Tested successfully with Python 3.9.6 + BioPython 1.79.
36 + However, the production server still runs with Python 3.8.1 + BioPython 1.78.
37 +
38 +############################################################################################
2 v 1.5 beta, April 2021 39 v 1.5 beta, April 2021
3 40
4 FEATURE CHANGES 41 FEATURE CHANGES
......
1 MIT License 1 MIT License
2 2
3 -Copyright (c) 2019 Louis Becquey 3 +Copyright (c) 2019-2021 IBISC, Université Paris Saclay
4 4
5 Permission is hereby granted, free of charge, to any person obtaining a copy 5 Permission is hereby granted, free of charge, to any person obtaining a copy
6 of this software and associated documentation files (the "Software"), to deal 6 of this software and associated documentation files (the "Software"), to deal
......
...@@ -10,6 +10,7 @@ Contents: ...@@ -10,6 +10,7 @@ Contents:
10 * [Database tables documentation](doc/Database.md) 10 * [Database tables documentation](doc/Database.md)
11 * [FAQ](doc/FAQ.md) 11 * [FAQ](doc/FAQ.md)
12 * [Troubleshooting](#troubleshooting) 12 * [Troubleshooting](#troubleshooting)
13 +* [Known Issues and Feature Requests](doc/KnownIssues.md)
13 * [Contact](#contact) 14 * [Contact](#contact)
14 15
15 ## Cite us 16 ## Cite us
...@@ -18,15 +19,13 @@ Contents: ...@@ -18,15 +19,13 @@ Contents:
18 19
19 Additional relevant references: 20 Additional relevant references:
20 21
21 -The "ProteinNet" philosophy which inspired this work:
22 -* AlQuraishi, M. (2019b). **ProteinNet: A standardized data set for machine learning of protein structure.** *BMC Bioinformatics*, 20(1), 311
23 -
24 If you use our annotations by DSSR, you might want to cite: 22 If you use our annotations by DSSR, you might want to cite:
25 * Lu, X.-J.et al.(2015). **DSSR: An integrated software tool for dissecting the spatial structure of RNA.** *Nucleic Acids Research*, 43(21), e142–e142. 23 * Lu, X.-J.et al.(2015). **DSSR: An integrated software tool for dissecting the spatial structure of RNA.** *Nucleic Acids Research*, 43(21), e142–e142.
26 24
27 If you use our multiple sequence alignments and homology data, you might want to cite: 25 If you use our multiple sequence alignments and homology data, you might want to cite:
28 -* Pruesse, E. et al.(2012). **Sina: accurate high-throughput multiple sequence alignment of ribosomal RNA genes.** *Bioinformatics*, 28(14), 1823–1829
29 * Nawrocki, E. P. and Eddy, S. R. (2013). **Infernal 1.1: 100-fold faster RNA homology searches.** *Bioinformatics*, 29(22), 2933–2935. 26 * Nawrocki, E. P. and Eddy, S. R. (2013). **Infernal 1.1: 100-fold faster RNA homology searches.** *Bioinformatics*, 29(22), 2933–2935.
27 +* Pruesse, E. et al. (2012). **SINA: accurate high-throughput multiple sequence alignment of ribosomal RNA genes.** *Bioinformatics*, 28(14), 1823–1829
28 +
30 29
31 30
32 # What is RNANet ? 31 # What is RNANet ?
...@@ -39,7 +38,8 @@ Most interestingly, nucleotides have been renumered in a standardized way, and t ...@@ -39,7 +38,8 @@ Most interestingly, nucleotides have been renumered in a standardized way, and t
39 38
40 ## Methodology 39 ## Methodology
41 We use the Rfam mappings between 3D structures and known Rfam families, using the sequences that are known to belong to an Rfam family (hits provided in RF0XXXX.fasta files from Rfam). 40 We use the Rfam mappings between 3D structures and known Rfam families, using the sequences that are known to belong to an Rfam family (hits provided in RF0XXXX.fasta files from Rfam).
42 -Future versions might compute a real MSA-based clusering directly with Rfamseq ncRNA sequences, like ProteinNet does with protein sequences, but this requires a tool similar to jackHMMER in the Infernal software suite, which is not available yet. 41 +Future versions might compute a real MSA-based clustering directly with Rfamseq ncRNA sequences, like ProteinNet does with protein sequences, but this requires a tool similar to jackHMMER in the Infernal software suite, which is not available yet.
42 +Users interested in such approaches may check tools like RNAlien.
43 43
44 This script prepares the dataset from available public data in PDB, RNA 3D Hub, Rfam and SILVA. 44 This script prepares the dataset from available public data in PDB, RNA 3D Hub, Rfam and SILVA.
45 45
...@@ -48,15 +48,16 @@ This script prepares the dataset from available public data in PDB, RNA 3D Hub, ...@@ -48,15 +48,16 @@ This script prepares the dataset from available public data in PDB, RNA 3D Hub,
48 The script follows these steps: 48 The script follows these steps:
49 49
50 To gather structures: 50 To gather structures:
51 -* Gets a list of 3D structures containing RNA from BGSU's non-redundant list (but keeps the redundant structures /!\\), 51 +* Gets a list of 3D structures containing RNA from BGSU's non-redundant list (redundancy can be kept or eliminated, see command line option `--redundant`),
52 * Asks Rfam for mappings of these structures onto Rfam families (~50% of structures have a direct mapping, some more are inferred using the redundancy list) 52 * Asks Rfam for mappings of these structures onto Rfam families (~50% of structures have a direct mapping, some more are inferred using the redundancy list)
53 * Downloads the corresponding 3D structures (mmCIFs) 53 * Downloads the corresponding 3D structures (mmCIFs)
54 -* If desired, extracts the right chain portions that map onto an Rfam family to a separate mmCIF file 54 +* Standardizes the residue numbering from 1 to N, including missing residues (gaps)
55 +* If desired, extracts the renumbered chain portions that map onto an Rfam family to a separate mmCIF file
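The renumbering step listed above can be pictured with a minimal sketch. It assumes a chain is given as a list of `(original_number, nucleotide)` pairs sorted by original number, and it ignores insertion codes and other mmCIF subtleties. The function name `renumber_chain` and the gap symbol are illustrative assumptions, not taken from RNANet's code.

```python
def renumber_chain(residues, gap="-"):
    """Renumber residues from 1 to N, adding gap entries for missing residues.

    `residues` is a list of (original_number, nucleotide) tuples sorted by
    original_number. Hypothetical helper, not RNANet's actual implementation.
    """
    if not residues:
        return []
    renumbered = []
    index = 1
    previous = residues[0][0]
    for number, nucleotide in residues:
        # A jump in the original numbering is treated as unresolved residues:
        # fill it with gap entries so that indices stay contiguous.
        for _ in range(number - previous - 1):
            renumbered.append((index, gap))
            index += 1
        renumbered.append((index, nucleotide))
        index += 1
        previous = number
    return renumbered


if __name__ == "__main__":
    # Residues 7 and 8 are missing in this toy chain.
    print(renumber_chain([(5, "G"), (6, "C"), (9, "A")]))
```

With this convention, the position of a residue in the output plays the role of a standardized index, and gaps keep their place in the numbering.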
55 56
56 To compute homology information: 57 To compute homology information:
57 -* Extract the sequence for every 3D chain 58 +* Extracts the sequence of every 3D chain
58 * Downloads Rfamseq ncRNA sequence hits for the concerned Rfam families (or ARB databases of SSU or LSU sequences from SILVA for rRNAs) 59 * Downloads Rfamseq ncRNA sequence hits for the concerned Rfam families (or ARB databases of SSU or LSU sequences from SILVA for rRNAs)
59 -* Realigns Rfamseq hits and sequences from the 3D structures together to obtain a multiple sequence alignment for each Rfam family (using `cmalign --cyk`, except for ribosomal LSU and SSU, where SINA is used) 60 +* Realigns Rfamseq hits and sequences from the 3D structures together to obtain a multiple sequence alignment for each Rfam family (using `cmalign`, but SINA can be used for ribosomal LSU and SSU)
60 * Computes nucleotide frequencies at every position for each alignment 61 * Computes nucleotide frequencies at every position for each alignment
61 * Map each nucleotide of a 3D chain to its position in the corresponding family sequence alignment 62 * Map each nucleotide of a 3D chain to its position in the corresponding family sequence alignment
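The last two steps above (per-column nucleotide frequencies, and mapping a 3D chain onto alignment columns) can be illustrated with a small sketch. It assumes the alignment is a list of equal-length strings and that gaps are written `-` or `.`; the function names and the choice of alphabet are assumptions made for the example, not RNANet's actual code.

```python
from collections import Counter


def column_frequencies(alignment, alphabet="ACGU-"):
    """Return one dict per alignment column with the frequency of each symbol.

    Symbols outside `alphabet` (e.g. 'N' or '.') are not reported in this sketch.
    """
    if not alignment:
        return []
    n_seqs = len(alignment)
    frequencies = []
    for column in zip(*alignment):
        counts = Counter(symbol.upper() for symbol in column)
        frequencies.append({s: counts.get(s, 0) / n_seqs for s in alphabet})
    return frequencies


def map_chain_to_columns(aligned_chain, gaps="-."):
    """Return the 1-based alignment column of each nucleotide of an aligned chain."""
    return [col for col, symbol in enumerate(aligned_chain, start=1)
            if symbol not in gaps]


if __name__ == "__main__":
    toy_alignment = ["ACGU", "AC-U", "AGGU"]
    print(column_frequencies(toy_alignment)[1])
    print(map_chain_to_columns("AC-U"))
```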
62 63
...@@ -65,6 +66,15 @@ To compute 3D annotations: ...@@ -65,6 +66,15 @@ To compute 3D annotations:
65 66
66 Finally, export this data from the SQLite database into flat CSV files. 67 Finally, export this data from the SQLite database into flat CSV files.
67 68
69 +Statistical analysis of the structures:
70 +* Computes statistics about the amount of data from various resolutions and experimental methods (by RNA family)
71 +* Computes basic statistics about the frequency of (modified) nucleotides by chain and by family
72 +* Computes basic statistics about the frequencies of non-canonical interactions
73 +* Computes density estimations (using Gaussian mixtures) for various geometrical parameters like distances and torsion angles for different representations: all-atom, the Pyle/VFold model, and the HiRE-RNA model
74 +* Computes pairwise residue distance matrices for each chain, and average + std-dev by RNA family
75 +* Computes sequence identity matrices for each RNA family (based on the alignments)
76 +* Saves covariance models (Infernal .cm files) for each RNA family
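As an illustration of the sequence identity matrices mentioned in the list above, here is a minimal sketch. It assumes that identity between two aligned sequences is the fraction of matching positions among the columns where neither sequence has a gap; other definitions exist, and the one actually used by statistics.py may differ. Function names are hypothetical.

```python
def pairwise_identity(seq_a, seq_b, gaps="-."):
    """Fraction of identical positions among columns where both sequences are ungapped."""
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a in gaps or b in gaps:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a.upper() == b.upper():
            matches += 1
    return matches / compared if compared else 0.0


def identity_matrix(alignment):
    """Symmetric matrix (list of lists) of pairwise identities for an alignment."""
    n = len(alignment)
    return [[pairwise_identity(alignment[i], alignment[j]) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    for row in identity_matrix(["ACGU", "AC-U", "AGGA"]):
        print(row)
```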
77 +
68 ## Data provided 78 ## Data provided
69 79
70 We provide a couple of resources to exploit this dataset. You can download them on [EvryRNA](https://evryrna.ibisc.univ-evry.fr/evryrna/rnanet/rnanet_home). 80 We provide a couple of resources to exploit this dataset. You can download them on [EvryRNA](https://evryrna.ibisc.univ-evry.fr/evryrna/rnanet/rnanet_home).
......
1 -
2 # Warnings and errors in RNANet 1 # Warnings and errors in RNANet
3 2
4 Use Ctrl + F on this page to look for your error message in the list. 3 Use Ctrl + F on this page to look for your error message in the list.
...@@ -27,7 +26,7 @@ DSSR complains because the CIF structure does not seem to contain nucleotides. T ...@@ -27,7 +26,7 @@ DSSR complains because the CIF structure does not seem to contain nucleotides. T
27 26
28 * **Error downloading and/or extracting Rfam.cm !** : We cannot retrieve the Rfam covariance models file. RNANet tries to find it at ftp://ftp.ebi.ac.uk/pub/databases/Rfam/CURRENT/Rfam.cm.gz so, check that your network is not blocking the FTP protocol (port 21 is open on your network), and check that the address has not changed. If so, contact us so that we update RNANet with the correct address. 27 * **Error downloading and/or extracting Rfam.cm !** : We cannot retrieve the Rfam covariance models file. RNANet tries to find it at ftp://ftp.ebi.ac.uk/pub/databases/Rfam/CURRENT/Rfam.cm.gz so, check that your network is not blocking the FTP protocol (port 21 is open on your network), and check that the address has not changed. If so, contact us so that we update RNANet with the correct address.
29 28
30 -* **Something's wrong with the SQL database. Check mysql-rfam-public.ebi.ac.uk status and try again later. Not printing statistics.** : We cannot retrieve family statistics from Rfam public server. Check if you can connect to it by hand : `mysql -u rfamro -P 4497 -D Rfam -h mysql-rfam-public.ebi.ac.uk`. if not, check that the port 497 is opened on your network. 29 +* **Something's wrong with the SQL database. Check mysql-rfam-public.ebi.ac.uk status and try again later. Not printing statistics.** : We cannot retrieve family statistics from Rfam public server. Check if you can connect to it by hand : `mysql -u rfamro -P 4497 -D Rfam -h mysql-rfam-public.ebi.ac.uk`. If not, check that port 4497 is open on your network.
31 30
32 * **Error downloading RFXXXXX.fa.gz: {custom-error}** : We cannot reach the Rfam FTP server to download homologous sequences. We look in ftp://ftp.ebi.ac.uk/pub/databases/Rfam/CURRENT/fasta_files/ so, check if you can access it from your network (check that port 21 is opened on your network). Check if the address has changed and notify us. 31 * **Error downloading RFXXXXX.fa.gz: {custom-error}** : We cannot reach the Rfam FTP server to download homologous sequences. We look in ftp://ftp.ebi.ac.uk/pub/databases/Rfam/CURRENT/fasta_files/ so, check if you can access it from your network (check that port 21 is opened on your network). Check if the address has changed and notify us.
33 32
......
...@@ -7,6 +7,15 @@ In `cmalign` alignments, - means a nucleotide is missing compared to the covaria ...@@ -7,6 +7,15 @@ In `cmalign` alignments, - means a nucleotide is missing compared to the covaria
7 7
8 In the final filtered alignment that we provide for download, the same rule applies, but on top of that, some '.' are replaced by '-' when a gap in the 3D structure (a missing, unresolved nucleotide) is mapped to an insertion gap. 8 In the final filtered alignment that we provide for download, the same rule applies, but on top of that, some '.' are replaced by '-' when a gap in the 3D structure (a missing, unresolved nucleotide) is mapped to an insertion gap.
9 9
10 +* **What are the cmalign options for ?**
11 +
12 +According to Infernal's user guide, Infernal uses an HMM banding technique to accelerate alignment by default. It also takes care of aligning 3' or 5' truncated sequences correctly (and we have some).
13 +First, one can choose an algorithm, between `--optacc` (maximizing posterior probabilities, the default) and `--cyk` (maximizing likelihood).
14 +
15 +Then, the use of bands allows faster and more memory-efficient computation, at the price of losing the guarantee of finding the optimal alignment. Bands can be disabled using the `--nonbanded` option. A better idea is to control the threshold of probability mass considered negligible during HMM band calculation with the `--tau` parameter. Higher values of tau yield greater speedups and lower memory usage, but a greater chance of missing the optimal alignment. In practice, the algorithm explores several tau values (multiplying it by a factor of 2.0, starting from the original `--tau` value) until the DP matrix size falls below the threshold given by `--mxsize` (default 1028 Mb) or the value of `--maxtau` is reached (in which case the program fails). One can disable this exploration with option `--fixedtau`. The default value of `--tau` is 1e-7, and the default `--maxtau` is 0.05. Basically, you may decide on a value of `--mxsize` by dividing your available RAM by the number of cores used with cmalign. If necessary, you may use fewer cores than you have, using option `--cpu`.
16 +
17 +Finally, if using `--cyk --nonbanded --notrunc --noprob`, one can use the `--small` option to align using the divide-and-conquer CYK algorithm from Eddy 2002, which requires very little memory but a lot of time. The major drawback is that it requires `--notrunc` and `--noprob`, so we give up on the correct alignment of truncated sequences and on the computation of posterior probabilities.
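The tau exploration and the `--mxsize` rule of thumb described in this answer can be sketched as follows. This is only a toy model of the behaviour described in the text, not Infernal's implementation: `matrix_size_mb` is a hypothetical, caller-supplied function standing in for the DP matrix size that Infernal computes internally, and both function names are assumptions.

```python
def mxsize_per_core(available_ram_mb, n_cores):
    """Rule of thumb from the text: divide the available RAM by the cores used."""
    return available_ram_mb / n_cores


def explore_tau(matrix_size_mb, tau=1e-7, maxtau=0.05, mxsize=1028.0, factor=2.0):
    """Return the first tau for which the DP matrix fits under `mxsize` (in Mb).

    tau is multiplied by `factor` at each step; if it grows beyond `maxtau`
    before the matrix fits, the exploration fails with a RuntimeError.
    """
    while matrix_size_mb(tau) > mxsize:
        tau *= factor
        if tau > maxtau:
            raise RuntimeError("DP matrix still exceeds mxsize at maxtau")
    return tau


if __name__ == "__main__":
    # Toy size model (an assumption): the matrix shrinks as tau grows.
    def toy_size(tau):
        return 4000.0 * 1e-7 / tau

    print(mxsize_per_core(65536, 8))  # RAM budget per core, in Mb
    print(explore_tau(toy_size))      # first tau whose toy matrix fits
```

Passing a fixed tau with no exploration corresponds to what the text describes for `--fixedtau`.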
18 +
10 * **Why are there some gap-only columns in the alignment ?** 19 * **Why are there some gap-only columns in the alignment ?**
11 20
12 These columns are not completely gap-only, they contain at least one dash-gap '-'. This means an actual, physical nucleotide which should exist in the 3D structure should be located there. The previous and following nucleotides are **not** contiguous in space in 3D. 21 These columns are not completely gap-only, they contain at least one dash-gap '-'. This means an actual, physical nucleotide which should exist in the 3D structure should be located there. The previous and following nucleotides are **not** contiguous in space in 3D.
...@@ -31,5 +40,5 @@ We first remove the nucleotides whose number is outside the family mapping (if a ...@@ -31,5 +40,5 @@ We first remove the nucleotides whose number is outside the family mapping (if a
31 40
32 * **What are the versions of the dependencies you use ?** 41 * **What are the versions of the dependencies you use ?**
33 42
34 -`cmalign` is v1.1.4, `sina` is v1.6.0, `x3dna-dssr` is v1.9.9, Biopython is v1.78. 43 +`cmalign` is v1.1.4, `sina` is v1.6.0, `x3dna-dssr` is v2.3.2-2021jun29, Biopython is v1.78.
35 44
...\ No newline at end of file ...\ No newline at end of file
......
1 # Known Issues 1 # Known Issues
2 2
3 ## Annotation and numbering issues 3 ## Annotation and numbering issues
4 -* Some GDPs that are listed as HETATMs in the mmCIF files are not detected correctly to be real nucleotides. (e.g. 1e8o-E) 4 +* [SOLVED] Some GDPs that are listed as HETATMs in the mmCIF files are not detected correctly to be real nucleotides. (e.g. 1e8o-E)
5 * Some chains are truncated in different pieces with different chain names. Reason unknown (e.g. 6ztp-AX) 5 * Some chains are truncated in different pieces with different chain names. Reason unknown (e.g. 6ztp-AX)
6 -* Some chains are not correctly renamed A in the produced separate files (e.g. 1d4r-B) 6 +* [SOLVED] Some chains are not correctly renamed A in the produced separate files (e.g. 1d4r-B)
7 7
8 ## Alignment issues 8 ## Alignment issues
9 -* [SOLVED] Filtered alignments are shorter than the number of alignment columns saved to the SQL table `align_column` 9 +* [SOLVED] Chain names appear in triple in the FASTA header (e.g. 1d4r[1]-B 1d4r[1]-B 1d4r[1]-B)
10 -* Chain names appear in triple in the FASTA header (e.g. 1d4r[1]-B 1d4r[1]-B 1d4r[1]-B)
11 -
12 -## Technical running issues
13 -* [SOLVED] Files produced by Docker containers are owned by root and require root permissions to be read
14 -* [SOLVED] SQLite WAL files are not deleted properly
15 10
16 # Known feature requests 11 # Known feature requests
17 -* [DONE] Get filtered versions of the sequence alignments containing the 3D chains, publicly available for download 12 +* Automated annotation of detected Recurrent Interaction Networks (RINs), see http://carnaval.lri.fr/ .
18 -* [DONE] Get a consensus residue for each alignement column 13 +* Possibly, automated detection of HLs and ILs from the 3D Motif Atlas (BGSU). Maybe. Their own website already does the job.
19 -* [DONE] Get an option to limit the number of cores 14 +* Weight sequences in alignment to give more importance to rarer sequences
20 -* [DONE] Move to SILVA LSU release 138.1 15 +* Give both gap_percent and insertion_gap_percent
21 -* [UPCOMING] Automated annotation of detected Recurrent Interaction Networks (RINs), see http://carnaval.lri.fr/ .
22 -* [UPCOMING] Possibly, automated detection of HLs and ILs from the 3D Motif Atlas (BGSU). Maybe. Their own website already does the job.
23 -* [UPCOMING] Weight sequences in alignment to give more importance to rarer sequences
24 -* [UPCOMING] Give both gap_percent and insertion_gap_percent
25 * A field estimating the quality of the sequence alignment in table family. 16 * A field estimating the quality of the sequence alignment in table family.
26 * Possibly, more metrics about the alignments coming from Infernal. 17 * Possibly, more metrics about the alignments coming from Infernal.
27 * Run cmscan ourselves from the NDB instead of using Rfam-PDB mappings ? (Iff this actually makes a real difference, untested yet) 18 * Run cmscan ourselves from the NDB instead of using Rfam-PDB mappings ? (Iff this actually makes a real difference, untested yet)
28 * Use and save Infernal alignment bounds and truncation information 19 * Use and save Infernal alignment bounds and truncation information
20 +* Save if a chain is a representative or not in BGSU list, so that they can be filtered easily
21 +* Annotate unstructured regions (on a nucleotide basis)
22 +
23 +## Technical to-do list
24 +* `cmalign --merge` is now deprecated, we use `esl-alimerge` instead. But esl-alimerge is a single-core process. We should run the merges of alignments of different families in parallel to save some time [TODO].
......
1 -6ydp_1_AA_1176-2737
2 -6ydw_1_AA_1176-2737
3 -2z9q_1_A_1-72
4 -1ml5_1_b_5-121
5 -1ml5_1_a_1-2914
6 -3ep2_1_Y_1-72
7 -3eq3_1_Y_1-72
8 -4v48_1_A6_1-73
9 -1ml5_1_A_2-1520
10 -1qzb_1_B_1-73
11 -1qza_1_B_1-73
12 -1ls2_1_B_1-73
13 -1gsg_1_T_1-72
14 -7d1a_1_A_805-902
15 -7d0g_1_A_805-913
16 -7d0f_1_A_817-913
17 -3jcr_1_H_1-115
18 -1vy7_1_AY_1-73
19 -1vy7_1_CY_1-73
20 -4w2h_1_CY_1-73
21 -5zzm_1_M_3-118
22 -2rdo_1_A_3-118
23 -4v48_1_A9_3-118
24 -4v47_1_A9_3-118
25 -2ob7_1_A_10-319
26 -1x1l_1_A_1-130
27 -1zc8_1_Z_1-91
28 -2ob7_1_D_1-130
29 -4v42_1_BA_1-2914
30 -4v42_1_BB_5-121
31 -1r2x_1_C_1-58
32 -1r2w_1_C_1-58
33 -1eg0_1_L_1-56
34 -3dg2_1_A_1-1542
35 -3dg0_1_A_1-1542
36 -4v48_1_BA_1-1543
37 -4v47_1_BA_1-1542
38 -3dg4_1_A_1-1542
39 -3dg5_1_A_1-1542
40 -5zzm_1_N_1-2903
41 -2rdo_1_B_1-2904
42 -3dg2_1_B_1-2904
43 -3dg0_1_B_1-2904
44 -4v48_1_A0_1-2904
45 -4v47_1_A0_1-2904
46 -3dg4_1_B_1-2904
47 -3dg5_1_B_1-2904
48 -1eg0_1_O_1-73
49 -1zc8_1_A_1-59
50 -1jgq_1_A_2-1520
51 -4v42_1_AA_2-1520
52 -1jgo_1_A_2-1520
53 -1jgp_1_A_2-1520
54 -1mvr_1_D_1-59
55 -4c9d_1_D_29-1
56 -4c9d_1_C_29-1
57 -4adx_1_9_1-121
58 -1zn1_1_B_1-59
59 -1emi_1_B_1-108
60 -3iy9_1_A_498-1027
61 -3ep2_1_B_1-50
62 -3eq3_1_B_1-50
63 -3eq4_1_B_1-50
64 -3pgw_1_R_1-164
65 -3pgw_1_N_1-164
66 -3cw1_1_x_1-138
67 -3cw1_1_w_1-138
68 -3cw1_1_V_1-138
69 -3cw1_1_v_1-138
70 -2iy3_1_B_9-105
71 -3jcr_1_N_1-106
72 -2vaz_1_A_64-177
73 -2ftc_1_R_81-1466
74 -3jcr_1_M_1-141
75 -4v5z_1_B0_1-2902
76 -5g2x_1_A_595-692
77 -3iy8_1_A_1-540
78 -4v5z_1_BY_2-113
79 -4v5z_1_BZ_1-70
80 -4v5z_1_B1_2-123
81 -1mvr_1_B_1-96
82 -4adx_1_0_1-2923
83 -3eq4_1_Y_1-69
84 -7a5p_1_2_259-449
85 -6uz7_1_8_2140-2825
86 -4v5z_1_AA_1-1563
87 6cfj_1_1X 1 6cfj_1_1X
88 6cfj_1_2X 2 6cfj_1_2X
89 5hcq_1_1X 3 5hcq_1_1X
...@@ -196,7 +110,6 @@ ...@@ -196,7 +110,6 @@
196 5lzb_1_V 110 5lzb_1_V
197 6h58_1_W 111 6h58_1_W
198 6h58_1_WW 112 6h58_1_WW
199 -1eg0_1_O
200 5j8b_1_X 113 5j8b_1_X
201 4v7j_1_AV 114 4v7j_1_AV
202 4v7j_1_BV 115 4v7j_1_BV
...@@ -224,10 +137,6 @@ ...@@ -224,10 +137,6 @@
224 7k00_1_B 137 7k00_1_B
225 6ys3_1_A 138 6ys3_1_A
226 6qdw_1_A 139 6qdw_1_A
227 -5zzm_1_M
228 -2rdo_1_A
229 -4v48_1_A9
230 -4v47_1_A9
231 6hcj_1_Q3 140 6hcj_1_Q3
232 6hcq_1_Q3 141 6hcq_1_Q3
233 6o8w_1_U 142 6o8w_1_U
...@@ -295,7 +204,12 @@ ...@@ -295,7 +204,12 @@
295 6ucq_1_2Y 204 6ucq_1_2Y
296 4w2e_1_X 205 4w2e_1_X
297 6ucq_1_2X 206 6ucq_1_2X
207 +7n1p_1_DT
208 +7n2u_1_DT
298 6yss_1_W 209 6yss_1_W
210 +7n30_1_DT
211 +7n31_1_DT
212 +7n2c_1_DT
299 5afi_1_Y 213 5afi_1_Y
300 5uq8_1_Z 214 5uq8_1_Z
301 5wdt_1_Y 215 5wdt_1_Y
...@@ -321,18 +235,20 @@ ...@@ -321,18 +235,20 @@
321 4v4i_1_Y 235 4v4i_1_Y
322 5uq8_1_X 236 5uq8_1_X
323 5uq7_1_X 237 5uq7_1_X
324 -1jgq_1_A
325 -4v42_1_AA
326 -1jgo_1_A
327 -1jgp_1_A
328 4v4j_1_W 238 4v4j_1_W
329 4v4i_1_W 239 4v4i_1_W
330 -4v42_1_BA
331 4wt8_1_CS 240 4wt8_1_CS
332 4wt8_1_DS 241 4wt8_1_DS
333 4v4j_1_X 242 4v4j_1_X
334 4v4i_1_X 243 4v4i_1_X
335 -4v42_1_BB 244 +6lkq_1_S
245 +5h5u_1_H
246 +7d6z_1_F
247 +5lze_1_Y
248 +5lze_1_V
249 +5lze_1_X
250 +3jcj_1_G
251 +6o7k_1_G
336 6d30_1_C 252 6d30_1_C
337 6j7z_1_C 253 6j7z_1_C
338 3er9_1_D 254 3er9_1_D
...@@ -367,20 +283,11 @@ ...@@ -367,20 +283,11 @@
367 4oq9_1_1 283 4oq9_1_1
368 6rt5_1_A 284 6rt5_1_A
369 6rt5_1_E 285 6rt5_1_E
370 -4qu6_1_B
371 6lkq_1_T 286 6lkq_1_T
372 6ys3_1_B 287 6ys3_1_B
373 6qdw_1_B 288 6qdw_1_B
374 3jbv_1_B 289 3jbv_1_B
375 3jbu_1_B 290 3jbu_1_B
376 -5zzm_1_N
377 -2rdo_1_B
378 -3dg2_1_B
379 -3dg0_1_B
380 -4v48_1_A0
381 -4v47_1_A0
382 -3dg4_1_B
383 -3dg5_1_B
384 6do8_1_B 291 6do8_1_B
385 6dpi_1_B 292 6dpi_1_B
386 6dp9_1_B 293 6dp9_1_B
...@@ -437,25 +344,17 @@ ...@@ -437,25 +344,17 @@
437 6doc_1_B 344 6doc_1_B
438 6doe_1_B 345 6doe_1_B
439 6n6g_1_D 346 6n6g_1_D
440 -6lkq_1_S
441 -5h5u_1_H
442 -7d6z_1_F
443 -5lze_1_Y
444 -5lze_1_V
445 -5lze_1_X
446 -3jcj_1_G
447 -6o7k_1_G
448 -3dg2_1_A
449 -3dg0_1_A
450 -4v48_1_BA
451 -4v47_1_BA
452 -3dg4_1_A
453 -3dg5_1_A
454 4b3r_1_W 347 4b3r_1_W
455 4b3t_1_W 348 4b3t_1_W
456 4b3s_1_W 349 4b3s_1_W
350 +7b5k_1_X
457 5o2r_1_X 351 5o2r_1_X
458 5kcs_1_1X 352 5kcs_1_1X
353 +7n1p_1_PT
354 +7n2u_1_PT
355 +7n30_1_PT
356 +7n31_1_PT
357 +7n2c_1_PT
459 6zvk_1_E2 358 6zvk_1_E2
460 6zvk_1_H2 359 6zvk_1_H2
461 7a01_1_E2 360 7a01_1_E2
...@@ -549,15 +448,9 @@ ...@@ -549,15 +448,9 @@
549 6xzb_1_G2 448 6xzb_1_G2
550 6gz5_1_BW 449 6gz5_1_BW
551 6gz3_1_BW 450 6gz3_1_BW
552 -1qzb_1_B
553 -1qza_1_B
554 -1ls2_1_B
555 -3ep2_1_Y
556 -3eq3_1_Y
557 -4v48_1_A6
558 -2z9q_1_A
559 4hot_1_X 451 4hot_1_X
560 6d2z_1_C 452 6d2z_1_C
453 +7eh0_1_I
561 4tu0_1_F 454 4tu0_1_F
562 4tu0_1_G 455 4tu0_1_G
563 6r9o_1_B 456 6r9o_1_B
...@@ -572,37 +465,38 @@ ...@@ -572,37 +465,38 @@
572 6sv4_1_MB 465 6sv4_1_MB
573 7nrd_1_SM 466 7nrd_1_SM
574 6i7o_1_MB 467 6i7o_1_MB
575 -1gsg_1_T
576 6zvi_1_D 468 6zvi_1_D
577 6sv4_1_NB 469 6sv4_1_NB
578 6sv4_1_NC 470 6sv4_1_NC
579 6i7o_1_NB 471 6i7o_1_NB
580 -1ml5_1_A 472 +7nsq_1_V
473 +7nsp_1_V
581 6swa_1_Q 474 6swa_1_Q
582 6swa_1_R 475 6swa_1_R
583 -3j6x_1_IR
584 -3j6y_1_IR
585 6ole_1_T 476 6ole_1_T
586 6om0_1_T 477 6om0_1_T
587 6oli_1_T 478 6oli_1_T
588 6om7_1_T 479 6om7_1_T
589 6olf_1_T 480 6olf_1_T
590 6w6l_1_T 481 6w6l_1_T
482 +6tnu_1_M
483 +5mc6_1_M
484 +7nrc_1_SM
591 6tb3_1_N 485 6tb3_1_N
592 7b7d_1_SM 486 7b7d_1_SM
593 7b7d_1_SN 487 7b7d_1_SN
594 6tnu_1_N 488 6tnu_1_N
489 +7nrc_1_SN
595 7nrd_1_SN 490 7nrd_1_SN
596 6zot_1_C 491 6zot_1_C
492 +4qu6_1_B
597 2uxb_1_X 493 2uxb_1_X
598 2x1f_1_B 494 2x1f_1_B
599 2x1a_1_B 495 2x1a_1_B
600 -3ep2_1_D
601 -3eq3_1_D
602 -1eg0_1_M
603 -3eq4_1_D
604 5o1y_1_B 496 5o1y_1_B
605 -3jcr_1_H 497 +4kzy_1_I
498 +4kzz_1_I
499 +4kzx_1_I
606 6dzi_1_H 500 6dzi_1_H
607 5zeu_1_A 501 5zeu_1_A
608 6evj_1_N 502 6evj_1_N
...@@ -705,7 +599,6 @@ ...@@ -705,7 +599,6 @@
705 6ip6_1_ZZ 599 6ip6_1_ZZ
706 6uu3_1_333 600 6uu3_1_333
707 6uu1_1_333 601 6uu1_1_333
708 -1pn8_1_D
709 3er8_1_H 602 3er8_1_H
710 3er8_1_G 603 3er8_1_G
711 3er8_1_F 604 3er8_1_F
...@@ -744,9 +637,8 @@ ...@@ -744,9 +637,8 @@
744 4wtl_1_T 637 4wtl_1_T
745 4wtl_1_P 638 4wtl_1_P
746 1xnq_1_W 639 1xnq_1_W
747 -1x18_1_C 640 +7n2v_1_DT
748 -1x18_1_B 641 +4peh_1_Z
749 -1x18_1_D
750 1vq6_1_4 642 1vq6_1_4
751 4am3_1_D 643 4am3_1_D
752 4am3_1_H 644 4am3_1_H
...@@ -764,6 +656,38 @@ ...@@ -764,6 +656,38 @@
764 3rtj_1_D 656 3rtj_1_D
765 6ty9_1_M 657 6ty9_1_M
766 6tz1_1_N 658 6tz1_1_N
659 +6q1h_1_D
660 +6q1h_1_H
661 +6p7p_1_F
662 +6p7p_1_E
663 +6p7p_1_D
664 +6vm6_1_J
665 +6vm6_1_G
666 +6wan_1_K
667 +6wan_1_H
668 +6wan_1_G
669 +6wan_1_L
670 +6wan_1_I
671 +6ywo_1_F
672 +6wan_1_J
673 +4oau_1_A
674 +6ywo_1_E
675 +6ywo_1_K
676 +6vm6_1_I
677 +6vm6_1_H
678 +6ywo_1_I
679 +2a1r_1_C
680 +6m6v_1_F
681 +6m6v_1_E
682 +2a1r_1_D
683 +3gpq_1_E
684 +3gpq_1_F
685 +6o79_1_C
686 +6vm6_1_K
687 +6m6v_1_G
688 +6hyu_1_D
689 +1laj_1_R
690 +6ybv_1_K
767 6sce_1_B 691 6sce_1_B
768 6xl1_1_C 692 6xl1_1_C
769 6scf_1_I 693 6scf_1_I
...@@ -809,31 +733,20 @@ ...@@ -809,31 +733,20 @@
809 1y1y_1_P 733 1y1y_1_P
810 5zuu_1_I 734 5zuu_1_I
811 5zuu_1_G 735 5zuu_1_G
736 +7am2_1_R1
812 4peh_1_W 737 4peh_1_W
813 4peh_1_V 738 4peh_1_V
814 4peh_1_X 739 4peh_1_X
815 4peh_1_Y 740 4peh_1_Y
816 -4peh_1_Z 741 +7d8c_1_C
817 6mkn_1_W 742 6mkn_1_W
818 7kl3_1_B 743 7kl3_1_B
819 4cxg_1_C 744 4cxg_1_C
820 4cxh_1_C 745 4cxh_1_C
821 -1x1l_1_A
822 -1zc8_1_Z
823 -2ob7_1_D
824 -2ob7_1_A
825 4eya_1_E 746 4eya_1_E
826 4eya_1_F 747 4eya_1_F
827 4eya_1_Q 748 4eya_1_Q
828 4eya_1_R 749 4eya_1_R
829 -1qzc_1_B
830 -1t1o_1_B
831 -1mvr_1_C
832 -1t1m_1_B
833 -1t1o_1_C
834 -1t1m_1_A
835 -1t1o_1_A
836 -2r1g_1_B
837 4ht9_1_E 750 4ht9_1_E
838 6z1p_1_AB 751 6z1p_1_AB
839 6z1p_1_AA 752 6z1p_1_AA
...@@ -844,19 +757,14 @@ ...@@ -844,19 +757,14 @@
844 5uk4_1_W 757 5uk4_1_W
845 5uk4_1_U 758 5uk4_1_U
846 5f6c_1_E 759 5f6c_1_E
760 +7nwh_1_HH
847 4rcj_1_B 761 4rcj_1_B
848 1xnr_1_W 762 1xnr_1_W
849 -2agn_1_A
850 -2agn_1_C
851 -2agn_1_B
852 6e0o_1_C 763 6e0o_1_C
853 6o75_1_D 764 6o75_1_D
854 6o75_1_C 765 6o75_1_C
855 6e0o_1_B 766 6e0o_1_B
856 3j06_1_R 767 3j06_1_R
857 -1r2x_1_C
858 -1r2w_1_C
859 -1eg0_1_L
860 4eya_1_G 768 4eya_1_G
861 4eya_1_H 769 4eya_1_H
862 4eya_1_S 770 4eya_1_S
...@@ -866,8 +774,7 @@ ...@@ -866,8 +774,7 @@
866 1ibm_1_Z 774 1ibm_1_Z
867 4dr5_1_V 775 4dr5_1_V
868 4d61_1_J 776 4d61_1_J
869 -1trj_1_B 777 +7nwg_1_Q3
870 -1trj_1_C
871 5tbw_1_SR 778 5tbw_1_SR
872 6hhq_1_SR 779 6hhq_1_SR
873 6zvi_1_H 780 6zvi_1_H
...@@ -909,14 +816,8 @@ ...@@ -909,14 +816,8 @@
909 6ppn_1_I 816 6ppn_1_I
910 5flx_1_Z 817 5flx_1_Z
911 6eri_1_AX 818 6eri_1_AX
819 +7k5l_1_R
912 7d80_1_Y 820 7d80_1_Y
913 -1zc8_1_A
914 -1zc8_1_C
915 -1zc8_1_B
916 -1zc8_1_G
917 -1zc8_1_I
918 -1zc8_1_H
919 -1zc8_1_J
920 7du2_1_R 821 7du2_1_R
921 4v8z_1_CX 822 4v8z_1_CX
922 6kqe_1_I 823 6kqe_1_I
...@@ -930,7 +831,6 @@ ...@@ -930,7 +831,6 @@
930 4xlr_1_Q 831 4xlr_1_Q
931 6sty_1_C 832 6sty_1_C
932 6sty_1_F 833 6sty_1_F
933 -2xs5_1_D
934 3ok4_1_N 834 3ok4_1_N
935 3ok4_1_L 835 3ok4_1_L
936 3ok4_1_Z 836 3ok4_1_Z
...@@ -973,19 +873,17 @@ ...@@ -973,19 +873,17 @@
973 3ol7_1_H 873 3ol7_1_H
974 3ol8_1_L 874 3ol8_1_L
975 3ol8_1_P 875 3ol8_1_P
976 -1qzc_1_C
977 -1qzc_1_A
978 6yrq_1_E 876 6yrq_1_E
979 6yrq_1_H 877 6yrq_1_H
980 6yrq_1_G 878 6yrq_1_G
981 6yrq_1_F 879 6yrq_1_F
982 6yrb_1_C 880 6yrb_1_C
983 6yrb_1_D 881 6yrb_1_D
984 -1mvr_1_D
985 6gz5_1_BV 882 6gz5_1_BV
986 6gz4_1_BV 883 6gz4_1_BV
987 6gz3_1_BV 884 6gz3_1_BV
988 6fti_1_Q 885 6fti_1_Q
886 +7njc_1_B
989 4v7e_1_AB 887 4v7e_1_AB
990 4v7e_1_AE 888 4v7e_1_AE
991 4v7e_1_AD 889 4v7e_1_AD
...@@ -997,9 +895,7 @@ ...@@ -997,9 +895,7 @@
997 3t1h_1_W 895 3t1h_1_W
998 3t1y_1_W 896 3t1y_1_W
999 1xmo_1_W 897 1xmo_1_W
1000 -4adx_1_9
1001 6kr6_1_B 898 6kr6_1_B
1002 -1zn1_1_B
1003 6z8k_1_X 899 6z8k_1_X
1004 4csf_1_U 900 4csf_1_U
1005 4csf_1_Q 901 4csf_1_Q
...@@ -1025,7 +921,6 @@ ...@@ -1025,7 +921,6 @@
1025 2xpj_1_D 921 2xpj_1_D
1026 2vrt_1_H 922 2vrt_1_H
1027 2vrt_1_G 923 2vrt_1_G
1028 -1emi_1_B
1029 6r9m_1_B 924 6r9m_1_B
1030 4nia_1_C 925 4nia_1_C
1031 4nia_1_A 926 4nia_1_A
...@@ -1051,45 +946,23 @@ ...@@ -1051,45 +946,23 @@
1051 1uvn_1_F 946 1uvn_1_F
1052 1uvn_1_B 947 1uvn_1_B
1053 1uvn_1_D 948 1uvn_1_D
1054 -3iy9_1_A
1055 4wtk_1_T 949 4wtk_1_T
1056 4wtk_1_P 950 4wtk_1_P
1057 1vqn_1_4 951 1vqn_1_4
1058 4oav_1_C 952 4oav_1_C
1059 4oav_1_A 953 4oav_1_A
1060 -3ep2_1_E
1061 -3eq3_1_E
1062 -3eq4_1_E
1063 -3ep2_1_A
1064 -3eq3_1_A
1065 -3eq4_1_A
1066 -3ep2_1_C
1067 -3eq3_1_C
1068 -3eq4_1_C
1069 -3ep2_1_B
1070 -3eq3_1_B
1071 -3eq4_1_B
1072 4i67_1_B 954 4i67_1_B
1073 -3pgw_1_R
1074 -3pgw_1_N
1075 -3cw1_1_X
1076 -3cw1_1_W
1077 -3cw1_1_V
1078 -7b0y_1_A
1079 6k32_1_T 955 6k32_1_T
1080 6k32_1_P 956 6k32_1_P
1081 5mmj_1_A 957 5mmj_1_A
1082 5x8r_1_A 958 5x8r_1_A
1083 -2agn_1_E
1084 -2agn_1_D
1085 -4v5z_1_BD
1086 6yw5_1_AA 959 6yw5_1_AA
1087 6ywe_1_AA 960 6ywe_1_AA
1088 6ywy_1_AA 961 6ywy_1_AA
1089 6ywx_1_AA 962 6ywx_1_AA
1090 3nvk_1_G 963 3nvk_1_G
1091 3nvk_1_S 964 3nvk_1_S
1092 -2iy3_1_B 965 +1cwp_1_D
1093 1cwp_1_F 966 1cwp_1_F
1094 5z4j_1_B 967 5z4j_1_B
1095 5gmf_1_E 968 5gmf_1_E
...@@ -1129,7 +1002,6 @@ ...@@ -1129,7 +1002,6 @@
1129 4kzz_1_J 1002 4kzz_1_J
1130 7a09_1_F 1003 7a09_1_F
1131 5t2c_1_AN 1004 5t2c_1_AN
1132 -4v5z_1_BF
1133 3j6b_1_E 1005 3j6b_1_E
1134 4v4f_1_B6 1006 4v4f_1_B6
1135 4v4f_1_A5 1007 4v4f_1_A5
...@@ -1153,21 +1025,21 @@ ...@@ -1153,21 +1025,21 @@
1153 4v4f_1_B4 1025 4v4f_1_B4
1154 4v4f_1_A6 1026 4v4f_1_A6
1155 4v4f_1_B2 1027 4v4f_1_B2
1028 +7m4y_1_V
1029 +7m4x_1_V
1030 +6v3a_1_V
1031 +6v39_1_V
1156 5it9_1_I 1032 5it9_1_I
1157 7jqc_1_I 1033 7jqc_1_I
1158 5zsb_1_C 1034 5zsb_1_C
1159 5zsb_1_D 1035 5zsb_1_D
1160 5zsn_1_D 1036 5zsn_1_D
1161 5zsn_1_E 1037 5zsn_1_E
1162 -1cwp_1_D
1163 -3jcr_1_N
1164 6gfw_1_R 1038 6gfw_1_R
1165 -2vaz_1_A
1166 6zm6_1_X 1039 6zm6_1_X
1167 6zm5_1_X 1040 6zm5_1_X
1168 6zm6_1_W 1041 6zm6_1_W
1169 6zm5_1_W 1042 6zm5_1_W
1170 -4v5z_1_BP
1171 6n6e_1_D 1043 6n6e_1_D
1172 4g7o_1_I 1044 4g7o_1_I
1173 4g7o_1_S 1045 4g7o_1_S
...@@ -1177,11 +1049,9 @@ ...@@ -1177,11 +1049,9 @@
1177 5uh6_1_I 1049 5uh6_1_I
1178 6l74_1_I 1050 6l74_1_I
1179 5uh9_1_I 1051 5uh9_1_I
1180 -2ftc_1_R
1181 7a5j_1_X 1052 7a5j_1_X
1182 6sag_1_R 1053 6sag_1_R
1183 4udv_1_R 1054 4udv_1_R
1184 -2r1g_1_E
1185 5zsc_1_D 1055 5zsc_1_D
1186 5zsc_1_C 1056 5zsc_1_C
1187 6woy_1_I 1057 6woy_1_I
...@@ -1209,7 +1079,6 @@ ...@@ -1209,7 +1079,6 @@
1209 3m85_1_X 1079 3m85_1_X
1210 3m85_1_Z 1080 3m85_1_Z
1211 3m85_1_Y 1081 3m85_1_Y
1212 -1e8s_1_C
1213 5wnp_1_B 1082 5wnp_1_B
1214 5wnv_1_B 1083 5wnv_1_B
1215 5yts_1_B 1084 5yts_1_B
...@@ -1232,8 +1101,11 @@ ...@@ -1232,8 +1101,11 @@
1232 6ij2_1_E 1101 6ij2_1_E
1233 3u2e_1_D 1102 3u2e_1_D
1234 3u2e_1_C 1103 3u2e_1_C
1104 +7eh1_1_I
1235 5uef_1_C 1105 5uef_1_C
1236 5uef_1_D 1106 5uef_1_D
1107 +7eh2_1_R
1108 +7eh2_1_I
1237 4x4u_1_H 1109 4x4u_1_H
1238 4afy_1_D 1110 4afy_1_D
1239 6oy5_1_I 1111 6oy5_1_I
...@@ -1249,8 +1121,6 @@ ...@@ -1249,8 +1121,6 @@
1249 4k4s_1_H 1121 4k4s_1_H
1250 4k4t_1_H 1122 4k4t_1_H
1251 4k4t_1_D 1123 4k4t_1_D
1252 -1zn1_1_C
1253 -1zn0_1_C
1254 1xpu_1_G 1124 1xpu_1_G
1255 1xpu_1_L 1125 1xpu_1_L
1256 1xpr_1_L 1126 1xpr_1_L
...@@ -1275,6 +1145,7 @@ ...@@ -1275,6 +1145,7 @@
1275 6gc5_1_H 1145 6gc5_1_H
1276 6gc5_1_G 1146 6gc5_1_G
1277 1n1h_1_B 1147 1n1h_1_B
1148 +7n2v_1_PT
1278 4ohz_1_B 1149 4ohz_1_B
1279 6t83_1_6B 1150 6t83_1_6B
1280 4gv6_1_C 1151 4gv6_1_C
...@@ -1287,14 +1158,11 @@ ...@@ -1287,14 +1158,11 @@
1287 6qx3_1_G 1158 6qx3_1_G
1288 2xnr_1_C 1159 2xnr_1_C
1289 4gkj_1_W 1160 4gkj_1_W
1290 -4v5z_1_BC
1291 5y88_1_X 1161 5y88_1_X
1292 -4v5z_1_BB
1293 3j0o_1_H 1162 3j0o_1_H
1294 3j0l_1_H 1163 3j0l_1_H
1295 3j0p_1_H 1164 3j0p_1_H
1296 3j0q_1_H 1165 3j0q_1_H
1297 -4v5z_1_BH
1298 3j0o_1_F 1166 3j0o_1_F
1299 3j0l_1_F 1167 3j0l_1_F
1300 3j0p_1_F 1168 3j0p_1_F
...@@ -1309,7 +1177,6 @@ ...@@ -1309,7 +1177,6 @@
1309 3j0l_1_A 1177 3j0l_1_A
1310 3j0q_1_A 1178 3j0q_1_A
1311 3j0p_1_A 1179 3j0p_1_A
1312 -4v5z_1_BJ
1313 6ys3_1_V 1180 6ys3_1_V
1314 6qdw_1_V 1181 6qdw_1_V
1315 5hk0_1_F 1182 5hk0_1_F
...@@ -1345,14 +1212,10 @@ ...@@ -1345,14 +1212,10 @@
1345 5mrc_1_BB 1212 5mrc_1_BB
1346 5mre_1_BB 1213 5mre_1_BB
1347 5mrf_1_BB 1214 5mrf_1_BB
1348 -4v5z_1_BN
1349 3j46_1_P 1215 3j46_1_P
1350 -3jcr_1_M
1351 4e6b_1_A 1216 4e6b_1_A
1352 4e6b_1_B 1217 4e6b_1_B
1353 6a6l_1_D 1218 6a6l_1_D
1354 -4v5z_1_BS
1355 -4v8t_1_1
1356 1uvi_1_D 1219 1uvi_1_D
1357 1uvi_1_F 1220 1uvi_1_F
1358 1uvi_1_E 1221 1uvi_1_E
...@@ -1376,10 +1239,7 @@ ...@@ -1376,10 +1239,7 @@
1376 6ip5_1_2M 1239 6ip5_1_2M
1377 6ip6_1_2M 1240 6ip6_1_2M
1378 6qcs_1_M 1241 6qcs_1_M
1379 -486d_1_G 1242 +7b5k_1_Z
1380 -2r1g_1_C
1381 -486d_1_F
1382 -4v5z_1_B0
1383 4nia_1_O 1243 4nia_1_O
1384 4nia_1_J 1244 4nia_1_J
1385 4nia_1_K 1245 4nia_1_K
...@@ -1391,13 +1251,11 @@ ...@@ -1391,13 +1251,11 @@
1391 4oq9_1_F 1251 4oq9_1_F
1392 4oq9_1_L 1252 4oq9_1_L
1393 6r9q_1_B 1253 6r9q_1_B
1254 +7m4u_1_A
1394 6v3a_1_SN1 1255 6v3a_1_SN1
1395 6v3b_1_SN1 1256 6v3b_1_SN1
1396 6v39_1_SN1 1257 6v39_1_SN1
1397 6v3e_1_SN1 1258 6v3e_1_SN1
1398 -1pn7_1_C
1399 -1mj1_1_Q
1400 -1mj1_1_R
1401 4dr6_1_V 1259 4dr6_1_V
1402 6kql_1_I 1260 6kql_1_I
1403 4eya_1_M 1261 4eya_1_M
...@@ -1437,13 +1295,20 @@ ...@@ -1437,13 +1295,20 @@
1437 6ow3_1_I 1295 6ow3_1_I
1438 6ovy_1_I 1296 6ovy_1_I
1439 6oy6_1_I 1297 6oy6_1_I
1440 -4bbl_1_Y
1441 -4bbl_1_Z
1442 4qvd_1_H 1298 4qvd_1_H
1443 5gxi_1_B 1299 5gxi_1_B
1444 -3iy8_1_A 1300 +7n06_1_G
1445 -6tnu_1_M 1301 +7n06_1_H
1446 -5mc6_1_M 1302 +7n06_1_I
1303 +7n06_1_J
1304 +7n06_1_K
1305 +7n06_1_L
1306 +7n33_1_G
1307 +7n33_1_H
1308 +7n33_1_I
1309 +7n33_1_J
1310 +7n33_1_K
1311 +7n33_1_L
1447 5mc6_1_N 1312 5mc6_1_N
1448 4eya_1_O 1313 4eya_1_O
1449 4eya_1_P 1314 4eya_1_P
...@@ -1453,33 +1318,13 @@ ...@@ -1453,33 +1318,13 @@
1453 6htq_1_W 1318 6htq_1_W
1454 6htq_1_U 1319 6htq_1_U
1455 6uu6_1_333 1320 6uu6_1_333
1456 -6v3a_1_V
1457 -6v39_1_V
1458 5a0v_1_F 1321 5a0v_1_F
1459 3avt_1_T 1322 3avt_1_T
1460 6d1v_1_C 1323 6d1v_1_C
1461 4s2x_1_B 1324 4s2x_1_B
1462 4s2y_1_B 1325 4s2y_1_B
1463 5wnu_1_B 1326 5wnu_1_B
1464 -1zc8_1_F
1465 1vtm_1_R 1327 1vtm_1_R
1466 -4v5z_1_BA
1467 -4v5z_1_BE
1468 -4v5z_1_BG
1469 -4v5z_1_BI
1470 -4v5z_1_BK
1471 -4v5z_1_BM
1472 -4v5z_1_BL
1473 -4v5z_1_BV
1474 -4v5z_1_BO
1475 -4v5z_1_BQ
1476 -4v5z_1_BR
1477 -4v5z_1_BT
1478 -4v5z_1_BU
1479 -4v5z_1_BW
1480 -4v5z_1_BY
1481 -4v5z_1_BX
1482 -4v5z_1_BZ
1483 5elt_1_F 1328 5elt_1_F
1484 5elt_1_E 1329 5elt_1_E
1485 6xlj_1_R 1330 6xlj_1_R
...@@ -1492,11 +1337,11 @@ ...@@ -1492,11 +1337,11 @@
1492 6bk8_1_I 1337 6bk8_1_I
1493 4cxg_1_B 1338 4cxg_1_B
1494 4cxh_1_B 1339 4cxh_1_B
1495 -4v5z_1_B1
1496 5z4d_1_B 1340 5z4d_1_B
1497 6o78_1_E 1341 6o78_1_E
1498 6xa1_1_BV 1342 6xa1_1_BV
1499 6ha8_1_X 1343 6ha8_1_X
1344 +2xs5_1_D
1500 1m8w_1_E 1345 1m8w_1_E
1501 1m8w_1_F 1346 1m8w_1_F
1502 5udi_1_B 1347 5udi_1_B
...@@ -1525,11 +1370,13 @@ ...@@ -1525,11 +1370,13 @@
1525 3rzo_1_R 1370 3rzo_1_R
1526 2f4v_1_Z 1371 2f4v_1_Z
1527 1qln_1_R 1372 1qln_1_R
1373 +3cw1_1_X
1374 +3cw1_1_W
1375 +7b0y_1_A
1528 6ogy_1_M 1376 6ogy_1_M
1529 6ogy_1_N 1377 6ogy_1_N
1530 6uej_1_B 1378 6uej_1_B
1531 6ywy_1_BB 1379 6ywy_1_BB
1532 -1x18_1_A
1533 5ytx_1_B 1380 5ytx_1_B
1534 4g0a_1_H 1381 4g0a_1_H
1535 6r9p_1_B 1382 6r9p_1_B
...@@ -1559,11 +1406,6 @@ ...@@ -1559,11 +1406,6 @@
1559 5lzc_1_W 1406 5lzc_1_W
1560 5lzb_1_W 1407 5lzb_1_W
1561 3wzi_1_C 1408 3wzi_1_C
1562 -1mvr_1_E
1563 -1mvr_1_B
1564 -1mvr_1_A
1565 -4adx_1_0
1566 -4adx_1_8
1567 1n33_1_Z 1409 1n33_1_Z
1568 6dti_1_W 1410 6dti_1_W
1569 3d2s_1_F 1411 3d2s_1_F
...@@ -1572,12 +1414,7 @@ ...@@ -1572,12 +1414,7 @@
1572 5mre_1_AA 1414 5mre_1_AA
1573 5mrf_1_AA 1415 5mrf_1_AA
1574 7jhy_1_Z 1416 7jhy_1_Z
1575 -2r1g_1_A
1576 -2r1g_1_D
1577 -2r1g_1_F
1578 -3eq4_1_Y
1579 4wkr_1_C 1417 4wkr_1_C
1580 -2r1g_1_X
1581 4v99_1_EC 1418 4v99_1_EC
1582 4v99_1_AC 1419 4v99_1_AC
1583 4v99_1_BH 1420 4v99_1_BH
...@@ -1647,38 +1484,6 @@ ...@@ -1647,38 +1484,6 @@
1647 2xs7_1_B 1484 2xs7_1_B
1648 1n38_1_B 1485 1n38_1_B
1649 4qvc_1_G 1486 4qvc_1_G
1650 -6q1h_1_D
1651 -6q1h_1_H
1652 -6p7p_1_F
1653 -6p7p_1_E
1654 -6p7p_1_D
1655 -6vm6_1_J
1656 -6vm6_1_G
1657 -6wan_1_K
1658 -6wan_1_H
1659 -6wan_1_G
1660 -6wan_1_L
1661 -6wan_1_I
1662 -6ywo_1_F
1663 -6wan_1_J
1664 -4oau_1_A
1665 -6ywo_1_E
1666 -6ywo_1_K
1667 -6vm6_1_I
1668 -6vm6_1_H
1669 -6ywo_1_I
1670 -2a1r_1_C
1671 -6m6v_1_F
1672 -6m6v_1_E
1673 -2a1r_1_D
1674 -3gpq_1_E
1675 -3gpq_1_F
1676 -6o79_1_C
1677 -6vm6_1_K
1678 -6m6v_1_G
1679 -6hyu_1_D
1680 -1laj_1_R
1681 -6ybv_1_K
1682 6mpf_1_W 1487 6mpf_1_W
1683 6spc_1_A 1488 6spc_1_A
1684 6spe_1_A 1489 6spe_1_A
...@@ -1692,43 +1497,36 @@ ...@@ -1692,43 +1497,36 @@
1692 4g0a_1_E 1497 4g0a_1_E
1693 2b2d_1_S 1498 2b2d_1_S
1694 5hkc_1_C 1499 5hkc_1_C
1695 -4kzy_1_I
1696 -4kzz_1_I
1697 -4kzx_1_I
1698 1rmv_1_B 1500 1rmv_1_B
1699 4qu7_1_X 1501 4qu7_1_X
1700 4qu7_1_V 1502 4qu7_1_V
1701 4qu7_1_U 1503 4qu7_1_U
1702 -4v5z_1_AH
1703 -4v5z_1_AA
1704 -4v5z_1_AB
1705 -4v5z_1_AC
1706 -4v5z_1_AD
1707 -4v5z_1_AE
1708 -4v5z_1_AF
1709 -4v5z_1_AG
1710 6pmi_1_3 1504 6pmi_1_3
1711 6pmj_1_3 1505 6pmj_1_3
1712 5hjz_1_C 1506 5hjz_1_C
1713 -7nrc_1_SM 1507 +6ydp_1_AA_1176-2737
1714 -7nrc_1_SN 1508 +6ydw_1_AA_1176-2737
1715 -7am2_1_R1 1509 +1vy7_1_AY_1-73
1716 -7k5l_1_R 1510 +1vy7_1_CY_1-73
1717 -7b5k_1_X 1511 +4w2h_1_CY_1-73
1718 -7d8c_1_C 1512 +7d1a_1_A_805-902
1719 -7m4y_1_V 1513 +7d0g_1_A_805-913
1720 -7m4x_1_V 1514 +7d0f_1_A_817-913
1721 -7b5k_1_Z 1515 +7o7z_1_AH_144-220
1722 -7m4u_1_A 1516 +4c9d_1_D_29-1
1723 -7n06_1_G 1517 +4c9d_1_C_29-1
1724 -7n06_1_H 1518 +7aih_1_1_2400-2963
1725 -7n06_1_I 1519 +7aih_1_1_2984-3610
1726 -7n06_1_J 1520 +7ane_1_2_1904-2468
1727 -7n06_1_K 1521 +7ane_1_2_2489-3115
1728 -7n06_1_L 1522 +5g2x_1_A_595-692
1729 -7n33_1_G 1523 +7aor_1_2_2020-2579
1730 -7n33_1_H 1524 +7aor_1_2_2589-3210
1731 -7n33_1_I 1525 +7a5p_1_2_259-449
1732 -7n33_1_J 1526 +7aor_1_A_2020-2579
1733 -7n33_1_K 1527 +7aor_1_A_2589-3210
1734 -7n33_1_L 1528 +7am2_1_1_1904-2470
1529 +7am2_1_1_2491-3117
1530 +7ane_1_1_1904-2468
1531 +7ane_1_1_2489-3115
1532 +6uz7_1_8_2140-2825
......
...@@ -5,7 +5,7 @@ rm -rf latest_run.log errors.txt ...@@ -5,7 +5,7 @@ rm -rf latest_run.log errors.txt
5 5
6 # Run RNANet 6 # Run RNANet
7 bash -c 'time python3.8 ./RNAnet.py --3d-folder /home/lbecquey/Data/RNA/3D/ --seq-folder /home/lbecquey/Data/RNA/sequences/ -r 20.0 --no-homology --redundant --extract' > latest_run.log 2>&1 7 bash -c 'time python3.8 ./RNAnet.py --3d-folder /home/lbecquey/Data/RNA/3D/ --seq-folder /home/lbecquey/Data/RNA/sequences/ -r 20.0 --no-homology --redundant --extract' > latest_run.log 2>&1
8 -bash -c 'time python3.8 ./RNAnet.py --3d-folder /home/lbecquey/Data/RNA/3D/ --seq-folder /home/lbecquey/Data/RNA/sequences/ -r 20.0 --redundant --sina --extract -s --stats-opts="--wadley --distance-matrices" --archive' > latest_run.log 2>&1 8 +bash -c 'time python3.8 ./RNAnet.py --3d-folder /home/lbecquey/Data/RNA/3D/ --seq-folder /home/lbecquey/Data/RNA/sequences/ -r 20.0 --redundant --extract -s --stats-opts="-r 20.0 --wadley --hire-rna --distance-matrices" --archive' >> latest_run.log 2>&1
9 echo 'Compressing RNANet.db.gz...' >> latest_run.log 9 echo 'Compressing RNANet.db.gz...' >> latest_run.log
10 touch results/RNANet.db # update last modification date 10 touch results/RNANet.db # update last modification date
11 gzip -k /home/lbecquey/Projects/RNANet/results/RNANet.db # compress it 11 gzip -k /home/lbecquey/Projects/RNANet/results/RNANet.db # compress it
......
1 +# This script is meant to be run periodically as a cron job.
2 +# This one uses the --from-scratch argument, so everything is recomputed ! /!\
3 +# Run it once or twice a year; otherwise, the faster update runs should be enough.
4 +
5 +cd /home/lbecquey/Projects/RNANet
6 +rm -rf latest_run.log errors.txt known_issues.txt known_issues_reasons.txt
7 +
8 +# Run RNANet
9 +bash -c 'time python3.8 ./RNAnet.py --3d-folder /home/lbecquey/Data/RNA/3D/ --seq-folder /home/lbecquey/Data/RNA/sequences/ --from-scratch --ignore-issues -r 20.0 --no-homology --redundant --extract' > latest_run.log 2>&1
10 +bash -c 'time python3.8 ./RNAnet.py --3d-folder /home/lbecquey/Data/RNA/3D/ --seq-folder /home/lbecquey/Data/RNA/sequences/ --from-scratch --ignore-issues -r 20.0 --redundant --extract -s --stats-opts="-r 20.0 --wadley --hire-rna --distance-matrices" --archive' >> latest_run.log 2>&1
11 +echo 'Compressing RNANet.db.gz...' >> latest_run.log
12 +touch results/RNANet.db # update last modification date
13 +gzip -k /home/lbecquey/Projects/RNANet/results/RNANet.db # compress it
14 +rm -f results/RNANet.db-wal results/RNANet.db-shm # SQLite temporary files
15 +
16 +# Save the latest results
17 +export DATE=`date +%Y%m%d`
18 +echo "Creating new release in ./archive/ folder ($DATE)..." >> latest_run.log
19 +cp /home/lbecquey/Projects/RNANet/results/summary.csv /home/lbecquey/Projects/RNANet/archive/summary_latest.csv
20 +cp /home/lbecquey/Projects/RNANet/results/summary.csv "/home/lbecquey/Projects/RNANet/archive/summary_$DATE.csv"
21 +cp /home/lbecquey/Projects/RNANet/results/families.csv /home/lbecquey/Projects/RNANet/archive/families_latest.csv
22 +cp /home/lbecquey/Projects/RNANet/results/families.csv "/home/lbecquey/Projects/RNANet/archive/families_$DATE.csv"
23 +cp /home/lbecquey/Projects/RNANet/results/frequencies.csv /home/lbecquey/Projects/RNANet/archive/frequencies_latest.csv
24 +cp /home/lbecquey/Projects/RNANet/results/pair_types.csv /home/lbecquey/Projects/RNANet/archive/pair_types_latest.csv
25 +mv /home/lbecquey/Projects/RNANet/results/RNANet.db.gz /home/lbecquey/Projects/RNANet/archive/
26 +
27 +# Init Seafile synchronization between RNANet library and ./archive/ folder (just the first time !)
28 +# seaf-cli sync -l 8e082c6e-b9ed-4b2f-9279-de2177134c57 -s https://entrepot.ibisc.univ-evry.fr -u l****.b*****y@univ-evry.fr -p ****************** -d archive/
29 +
30 +# Sync in Seafile
31 +seaf-cli start >> latest_run.log 2>&1
32 +echo 'Waiting 15m for SeaFile synchronization...' >> latest_run.log
33 +sleep 15m
34 +echo `seaf-cli status` >> latest_run.log
35 +seaf-cli stop >> latest_run.log 2>&1
36 +echo 'We are '`date`', update completed.' >> latest_run.log
37 +
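The "Save the latest results" step of the script above copies each CSV twice, once under a `_latest` name and once under a date-stamped name. A minimal Python sketch of that pattern follows. The function name `archive_csv` and its parameters are hypothetical and not part of RNANet; only the `%Y%m%d` stamp format and the `<name>_latest.csv` / `<name>_<date>.csv` naming are taken from the script.

```python
import shutil
from datetime import date
from pathlib import Path


def archive_csv(src, archive_dir, stamp=None, dated=True):
    """Copy src to <archive_dir>/<stem>_latest<ext> and, if dated, <stem>_<stamp><ext>.

    Hypothetical helper: it mirrors the `cp` lines of the cron script, nothing more.
    """
    src = Path(src)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    # Same format as the shell command `date +%Y%m%d`.
    stamp = stamp or date.today().strftime("%Y%m%d")
    targets = [archive_dir / f"{src.stem}_latest{src.suffix}"]
    if dated:
        targets.append(archive_dir / f"{src.stem}_{stamp}{src.suffix}")
    for target in targets:
        shutil.copy(src, target)
    return targets
```

In the script, `summary.csv` and `families.csv` get both copies, while `frequencies.csv` and `pair_types.csv` only get the `_latest` one, which corresponds to `dated=False` in this sketch.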
...@@ -21,6 +21,6 @@ docker build -t rnanet:latest .. ...@@ -21,6 +21,6 @@ docker build -t rnanet:latest ..
21 rm x3dna-dssr 21 rm x3dna-dssr
22 22
23 # to run, use something like: 23 # to run, use something like:
24 -# docker run -v /home/persalteas/Data/RNA/3D/:/3D -v /home/persalteas/Data/RNA/sequences/:/sequences -v /home/persalteas/labo/:/runDir persalteas/rnanet [ additional options here ] 24 +# docker run -v /home/lbecquey/Data/RNA/3D/:/3D -v /home/lbecquey/Data/RNA/sequences/:/sequences -v /home/lbecquey/labo/:/runDir rnanet [ additional options here ]
25 # Without additional options, this runs a standard pass with known issues support, log output, and no statistics. The default resolution threshold is 4.0 Angstroms. 25 # Without additional options, this runs a standard pass with known issues support, log output, and no statistics. The default resolution threshold is 4.0 Angstroms.
26 26
......
...@@ -36,6 +36,6 @@ for fam in families: ...@@ -36,6 +36,6 @@ for fam in families:
36 36
37 # Now re run RNANet normally. 37 # Now re run RNANet normally.
38 command = ["python3.8", "./RNAnet.py", "--3d-folder", path_to_3D_data, "--seq-folder", path_to_seq_data, "-r", "20.0", 38 command = ["python3.8", "./RNAnet.py", "--3d-folder", path_to_3D_data, "--seq-folder", path_to_seq_data, "-r", "20.0",
39 - "--redundant", "--sina", "--extract", "-s", "--stats-opts=\"--wadley --distance-matrices\""] 39 + "--redundant", "--extract", "-s", "--stats-opts=\"-r 20.0 --wadley --hire-rna --distance-matrices\""]
40 print(' '.join(command)) 40 print(' '.join(command))
41 subprocess.run(command) 41 subprocess.run(command)
...\ No newline at end of file ...\ No newline at end of file
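One point worth checking in the command list above: `subprocess.run` called with a list does not go through a shell, so the escaped double quotes inside the `--stats-opts=...` element reach the child process as literal characters. In the bash scripts, the same quotes are consumed by the shell. The sketch below only demonstrates this general behaviour with a throwaway child process; whether `RNAnet.py` strips or tolerates the quotes is not visible in this diff, and the helper name `received_by_child` is hypothetical.

```python
import subprocess
import sys

# Tiny child program that prints its first argument exactly as received.
ECHO_ARG = "import sys; print(sys.argv[1])"


def received_by_child(arg):
    """Return the argument as seen by a child process started without a shell."""
    result = subprocess.run(
        [sys.executable, "-c", ECHO_ARG, arg],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.rstrip("\n")


quoted = "--stats-opts=\"-r 20.0 --wadley\""
plain = "--stats-opts=-r 20.0 --wadley"

print(received_by_child(quoted))  # the double quotes are still part of the value
print(received_by_child(plain))   # one argument, spaces preserved, no quotes
```

If the quotes turn out to be unwanted, passing the unquoted form as a single list element keeps the option value in one argument without any shell quoting.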
......
...@@ -3,8 +3,9 @@ import subprocess, os, sys ...@@ -3,8 +3,9 @@ import subprocess, os, sys
3 3
4 # Put a list of problematic chains here, they will be properly deleted and recomputed 4 # Put a list of problematic chains here, they will be properly deleted and recomputed
5 problems = [ 5 problems = [
6 - "1k73_1_A", 6 + "7nhm_1_A_1-2923",
7 - "1k73_1_B" 7 + "4wfa_1_X_1-2923",
8 + "4wce_1_X_1-2923"
8 ] 9 ]
9 10
10 # provide the path to your data folders, the RNANet.db file, and the RNANet.py file as arguments to this script 11 # provide the path to your data folders, the RNANet.db file, and the RNANet.py file as arguments to this script
...@@ -22,6 +23,7 @@ for p in problems: ...@@ -22,6 +23,7 @@ for p in problems:
22 23
23 # Remove the datapoints files and 3D files 24 # Remove the datapoints files and 3D files
24 subprocess.run(["rm", '-f', path_to_3D_data + f"/rna_mapped_to_Rfam/{p}.cif"]) 25 subprocess.run(["rm", '-f', path_to_3D_data + f"/rna_mapped_to_Rfam/{p}.cif"])
26 + subprocess.run(["rm", '-f', path_to_3D_data + f"/rna_only/{p}.cif"])
25 files = [ f for f in os.listdir(path_to_3D_data + "/datapoints") if p in f ] 27 files = [ f for f in os.listdir(path_to_3D_data + "/datapoints") if p in f ]
26 for f in files: 28 for f in files:
27 subprocess.run(["rm", '-f', path_to_3D_data + f"/datapoints/{f}"]) 29 subprocess.run(["rm", '-f', path_to_3D_data + f"/datapoints/{f}"])
...@@ -38,14 +40,14 @@ for p in problems: ...@@ -38,14 +40,14 @@ for p in problems:
38 print(' '.join(command)) 40 print(' '.join(command))
39 subprocess.run(command) 41 subprocess.run(command)
40 42
41 - command = ["python3.8", path_to_RNANet, "--3d-folder", path_to_3D_data, "--seq-folder", path_to_seq_data, "-r", "20.0", "--extract", "--only", p] 43 + command = ["python3.8", path_to_RNANet, "--3d-folder", path_to_3D_data, "--seq-folder", path_to_seq_data, "--redundant", "-r", "20.0", "--extract", "--only", p]
42 else: 44 else:
43 # Delete the chain from the database, and the associated nucleotides and re_mappings, using foreign keys 45 # Delete the chain from the database, and the associated nucleotides and re_mappings, using foreign keys
44 command = ["sqlite3", path_to_db, f"PRAGMA foreign_keys=ON; delete from chain where structure_id=\"{structure}\" and chain_name=\"{chain}\" and rfam_acc is null;"] 46 command = ["sqlite3", path_to_db, f"PRAGMA foreign_keys=ON; delete from chain where structure_id=\"{structure}\" and chain_name=\"{chain}\" and rfam_acc is null;"]
45 print(' '.join(command)) 47 print(' '.join(command))
46 subprocess.run(command) 48 subprocess.run(command)
47 49
48 - command = ["python3.8", path_to_RNANet, "--3d-folder", path_to_3D_data, "--seq-folder", path_to_seq_data, "-r", "20.0", "--no-homology", "--extract", "--only", p] 50 + command = ["python3.8", path_to_RNANet, "--3d-folder", path_to_3D_data, "--seq-folder", path_to_seq_data, "--redundant", "-r", "20.0", "--no-homology", "--extract", "--only", p]
49 51
50 # Re-run RNANet 52 # Re-run RNANet
51 os.chdir(os.path.dirname(os.path.realpath(path_to_db)) + '/../') 53 os.chdir(os.path.dirname(os.path.realpath(path_to_db)) + '/../')
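The script above deletes rows using a `structure` and a `chain` value for each entry of `problems`, but the hunk does not show how they are derived from a label. A minimal sketch of one way to split such labels follows. It assumes the layout `<structure>_<model>_<chain>[_<start>-<end>]`, inferred only from the chain names appearing in this diff (for example `1k73_1_A` and `7nhm_1_A_1-2923`). The function `parse_chain_label` is hypothetical and is not RNANet's own parser.

```python
import re

# Assumed layout, inferred from labels in this diff:
#   <structure>_<model>_<chain>[_<start>-<end>]
LABEL_RE = re.compile(
    r"^(?P<structure>[0-9a-z]{4})_(?P<model>\d+)_(?P<chain>[^_]+)"
    r"(?:_(?P<start>\d+)-(?P<end>\d+))?$"
)


def parse_chain_label(label):
    """Split a chain label into (structure, model, chain, span).

    span is None for unmapped chains, else a (start, end) tuple of ints.
    """
    m = LABEL_RE.match(label)
    if m is None:
        raise ValueError(f"unrecognised chain label: {label!r}")
    span = None
    if m["start"] is not None:
        span = (int(m["start"]), int(m["end"]))
    return m["structure"], int(m["model"]), m["chain"], span


print(parse_chain_label("7nhm_1_A_1-2923"))
print(parse_chain_label("6v3a_1_SN1"))
```

The sketch does not assume `start <= end`, since labels such as `4c9d_1_D_29-1` appear in the list of mapped chains.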
......