Louis BECQUEY

NR class representatives only for rRNA distance matrices

@@ -23,3 +23,5 @@ scripts/*.sh
 scripts/*.tar
 scripts/measure.py
 scripts/recompute_some_chains.py
+scripts/convert_rna_jsons.py
+scripts/recompute_family.py
@@ -8,6 +8,9 @@ FEATURE CHANGES
 The LSU and SSU are now aligned with Infernal options '--cpu 10 --mxsize 8192 --mxtau 0.1', which is slow,
 requires up to 100 GB of RAM, and yields a suboptimal alignment (tau=0.1 is quite bad), but is homogenous with the other families.
 - The LSU and SSU therefore have defined cm_coords fields, and therefore distance matrices can be computed.
+ - Distance matrices are computed on all available molecules of the family by default, but you can use statistics.py --non-redundant to only
+ take the equivalence class representatives at a given resolution into account (new option). For storage reasons, rRNAs are always run in
+ this mode (this might change in the future: the space required is "only" ~300 GB).
 - We now provide for download the renumbered (standardised) 3D MMCIF files, the nucleotides being numbered by their "index_chain" in the database.
 - We now provide for download the sequences of the 3D chains aligned by Rfam family (without Rfam sequences, which have been removed).
 - statistics.py now computes histograms and a density estimation with Gaussian mixture models for a large set of geometric parameters,
@@ -23,7 +26,7 @@ FEATURE CHANGES
 BUG CORRECTIONS
 - New code file geometric_stats.py
 - New automation script that starts from scratch
-- Many small fixes
+- Many small fixes, leading to support for many previously "known issues"
 - Performance tweaks

 TECHNICAL CHANGES
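The chain-selection rule described in the new changelog entry has two parts. By default, distance matrices use every available chain of a family. With `--non-redundant`, or always for rRNA families, only the equivalence class representatives are kept. Below is a minimal Python sketch of that rule under stated assumptions. It is not the code from this commit. Only the option name `--non-redundant` comes from the changelog. The functions `select_chains` and `build_parser`, the `representatives` set (standing in for the class representatives at the chosen resolution), the chain identifiers, and the `RRNA_FAMILIES` accessions are all hypothetical.

```python
import argparse

# Hypothetical, illustrative subset of rRNA family accessions (not exhaustive).
RRNA_FAMILIES = {"RF00177", "RF02541"}


def select_chains(family, family_chains, representatives, non_redundant):
    """Pick the chains of one family whose distance matrices are computed.

    family_chains   : iterable of chain identifiers belonging to `family`
    representatives : set of chain identifiers that represent their
                      equivalence class at the chosen resolution (assumed input)
    non_redundant   : value of the sketched --non-redundant flag
    """
    # rRNA families are forced into non-redundant mode for storage reasons.
    if non_redundant or family in RRNA_FAMILIES:
        return [c for c in family_chains if c in representatives]
    # Default behaviour: keep every available chain of the family.
    return list(family_chains)


def build_parser():
    """Hypothetical parser exposing only the --non-redundant switch."""
    parser = argparse.ArgumentParser(description="Distance matrix chain selection (sketch)")
    parser.add_argument("--non-redundant", action="store_true",
                        help="use only equivalence class representatives")
    return parser
```

For example, with the hypothetical chains `["1ABC_1_A", "2DEF_1_A", "3GHI_1_B"]` and representatives `{"1ABC_1_A", "3GHI_1_B"}`, a tRNA family (`RF00005`) keeps all three chains unless `--non-redundant` is given. An rRNA family keeps only the two representatives in either case.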