Louis BECQUEY

doc restructuration

...@@ -4,6 +4,7 @@ latest_run.log ...@@ -4,6 +4,7 @@ latest_run.log
4 results/ 4 results/
5 archive/ 5 archive/
6 logs/ 6 logs/
7 +doc/
7 data/ 8 data/
8 esl* 9 esl*
9 .vscode/ 10 .vscode/
......
1 +############################################################################################
2 +v 1.5 beta, April 2021
3 +
4 +FEATURE CHANGES
5 + - New option --stats-opts="..." allows passing options to the automatic run of statistics.py (when -s is used)
6 + - Removed support for 3'->5' Rfam hits, they are now completely ignored. They concern the opposite strand, which is unresolved in 3D.
7 + - Removed PyDCA, which is outdated and introduces dependency conflicts. Code may be adapted later.
8 + - A new column in align_column table, 'index_small_ali', gives the index of the nucleotide in the "3d only" alignment.
9 + - 3D distance matrices are now computed only for match positions (match to the covariance model)
10 +
11 +BUG CORRECTIONS
12 + - Corrected a bug which skipped angle conversions from degrees (DSSR) to radians if nucleotides were renumbered.
13 +
14 +############################################################################################
15 +v 1.4 beta, March 2021
16 +
17 +Khodor Hannoush joins the development of RNANet.
18 +
19 +FEATURE CHANGES
20 + - SINA is now used only if you pass the option --sina; Infernal is used by default, even for rRNAs.
21 + - A new option --cmalign-opts="..." lets you customize your cmalign runs, e.g. with --cyk. The default is no option.
22 + - RNANet makes use of PyDCA to compute DCA-related features on the alignments (descriptions to come in the Database.md)
23 + - statistics.py now fully supports the computation of 3D distance matrices, with average and standard deviation by RNA family
24 + - Now RNANet considers only the equivalence class representative structure by default. To consider all members of an equivalence
25 + class (like before), use the --redundant option.
26 +
27 +TECHNICAL CHANGES
28 + - cmalign is not run with --cyk anymore by default, and now requires huge amounts of RAM if launched with the default options.
29 + - Moving to a 60-core/128GB server for our internal runs.
30 +
31 +############################################################################################
1 v 1.3 beta, January 2021 32 v 1.3 beta, January 2021
2 33
3 The first uses of RNANet by people from outside the development team happened between this December. 34 The first uses of RNANet by people from outside the development team happened between this December.
......
1 # RNANet 1 # RNANet
2 2
3 Contents: 3 Contents:
4 -* [What is RNANet ?](#what-is-rnanet) 4 +* [What is RNANet ?](#what-is-rnanet-?)
5 -* [Install and run RNANet](INSTALL.md) 5 +* [Install and run RNANet](doc/INSTALL.md)
6 * [How to further filter the dataset](#how-to-further-filter-the-dataset) 6 * [How to further filter the dataset](#how-to-further-filter-the-dataset)
7 * [Filter on 3D structure resolution](#filter-on-3D-structure-resolution) 7 * [Filter on 3D structure resolution](#filter-on-3D-structure-resolution)
8 * [Filter on 3D structure publication date](#filter-on-3d-structure-publication-date) 8 * [Filter on 3D structure publication date](#filter-on-3d-structure-publication-date)
9 * [Filter to avoid chain redundancy when several mappings are available](#filter-to-avoid-chain-redundancy-when-several-mappings-are-available) 9 * [Filter to avoid chain redundancy when several mappings are available](#filter-to-avoid-chain-redundancy-when-several-mappings-are-available)
10 -* [Database tables documentation](Database.md) 10 +* [Database tables documentation](doc/Database.md)
11 -* [FAQ](FAQ.md) 11 +* [FAQ](doc/FAQ.md)
12 * [Troubleshooting](#troubleshooting) 12 * [Troubleshooting](#troubleshooting)
13 * [Contact](#contact) 13 * [Contact](#contact)
14 14
...@@ -32,9 +32,9 @@ If you use our multiple sequence alignments and homology data, you might want to ...@@ -32,9 +32,9 @@ If you use our multiple sequence alignments and homology data, you might want to
32 # What is RNANet ? 32 # What is RNANet ?
33 RNANet is a multiscale dataset of non-coding RNA structures, including sequences, secondary structures, non-canonical interactions, 3D geometrical descriptors, and sequence homology. 33 RNANet is a multiscale dataset of non-coding RNA structures, including sequences, secondary structures, non-canonical interactions, 3D geometrical descriptors, and sequence homology.
34 34
35 -It is available in machine-learning ready formats like CSV files per chain or an SQL database. 35 +It is available in machine-learning ready formats like CSV files (one per RNA chain) or as a SQL database.
36 36
37 -Most interestingly, nucleotides have been renumbered in a standardized way, and the 3D chains have been re-aligned with homologous sequences from the [Rfam](https://rfam.org/) database. 37 +Most interestingly, nucleotides have been renumbered in a standardized way, and the 3D chains have been re-aligned with homologous sequences and covariance models from the [Rfam](https://rfam.org/) database.
38 38
39 39
40 ## Methodology 40 ## Methodology
...@@ -68,12 +68,12 @@ Finally, export this data from the SQLite database into flat CSV files. ...@@ -68,12 +68,12 @@ Finally, export this data from the SQLite database into flat CSV files.
68 ## Data provided 68 ## Data provided
69 69
70 We provide a couple of resources to exploit this dataset. You can download them from [EvryRNA](https://evryrna.ibisc.univ-evry.fr/evryrna/rnanet/rnanet_home). 70 We provide a couple of resources to exploit this dataset. You can download them from [EvryRNA](https://evryrna.ibisc.univ-evry.fr/evryrna/rnanet/rnanet_home).
71 -* A series of tables in the SQLite3 database, see [the database documentation](Database.md) and [examples of useful queries](#how-to-further-filter-the-dataset), 71 +* A series of tables in the SQLite3 database, see [the database documentation](doc/Database.md) and [examples of useful queries](#how-to-further-filter-the-dataset),
72 * One CSV file per RNA chain, summarizing all the relevant information about it, 72 * One CSV file per RNA chain, summarizing all the relevant information about it,
73 * Filtered alignment files in FASTA format containing only the sequences with a 3D structure available in RNANet, but which have been aligned using all the homologous sequences of this family from Rfam or SILVA, 73 * Filtered alignment files in FASTA format containing only the sequences with a 3D structure available in RNANet, but which have been aligned using all the homologous sequences of this family from Rfam or SILVA,
74 * Additional statistics files about nucleotide frequencies, modified bases, basepair types within each chain or by RNA family. 74 * Additional statistics files about nucleotide frequencies, modified bases, basepair types within each chain or by RNA family.
75 75
76 -For now, we do not provide as public downloads the set of cleaned 3D structures nor the full alignments with Rfam sequences. If you need them, [recompute them](INSTALL.md) or ask us. 76 +For now, we do not provide as public downloads the set of cleaned 3D structures nor the full alignments with Rfam sequences. If you need them, [recompute them](doc/INSTALL.md) or ask us.
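
Before writing queries against the SQLite3 database, it can help to check which tables and columns your copy actually contains. The following is a minimal sketch using only Python's standard `sqlite3` module. The helper names `list_tables` and `column_names` are hypothetical, not part of RNANet. The demo runs on an in-memory stand-in with a single `align_column` table, not on the real schema, which is described in the database documentation.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in an open SQLite connection, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def column_names(conn, table):
    """Return the column names of `table`, in declaration order."""
    # Only accept names that really exist, since PRAGMA statements
    # cannot take bound parameters.
    if table not in list_tables(conn):
        raise ValueError(f"unknown table: {table}")
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]


if __name__ == "__main__":
    # Stand-in database for illustration only (assumption): replace
    # ":memory:" with the path to your downloaded database file.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE align_column (index_small_ali INTEGER)")
    for table in list_tables(conn):
        print(table, column_names(conn, table))
    conn.close()
```

With the real database, pass the path of the downloaded file to `sqlite3.connect` instead of `":memory:"` and drop the `CREATE TABLE` line.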
77 77
78 ## Updates 78 ## Updates
79 RNANet is updated monthly to take into account new structures proposed in the [BGSU Non-redundant lists](http://rna.bgsu.edu/rna3dhub/nrlist/). The monthly runs realign previous alignments with the new sequences using `esl-alimerge` from Infernal. 79 RNANet is updated monthly to take into account new structures proposed in the [BGSU Non-redundant lists](http://rna.bgsu.edu/rna3dhub/nrlist/). The monthly runs realign previous alignments with the new sequences using `esl-alimerge` from Infernal.
......