Louis BECQUEY

doc restructuration

......@@ -4,6 +4,7 @@ latest_run.log
results/
archive/
logs/
doc/
data/
esl*
.vscode/
......
############################################################################################
v 1.5 beta, April 2021
FEATURE CHANGES
- New option --stats-opts="..." passes options to the automatic run of statistics.py (when -s is used)
- Removed support for 3'->5' Rfam hits; they are now completely ignored. They concern the opposite strand, which is unresolved in 3D.
- Removed PyDCA, which is outdated and introduces dependency conflicts. Code may be adapted later.
- A new column in the align_column table, 'index_small_ali', gives the index of the nucleotide in the "3d only" alignment.
- 3D distance matrices are now computed only for match positions (match to the covariance model)
BUG CORRECTIONS
- Corrected a bug which skipped angle conversions from degrees (DSSR) to radians if nucleotides were renumbered.
############################################################################################
v 1.4 beta, March 2021
Khodor Hannoush joins the development of RNANet.
FEATURE CHANGES
- SINA is now used only if you pass the option --sina; Infernal is used by default, even for rRNAs.
- A new option --cmalign-opts="..." lets you customize your cmalign runs, e.g. with --cyk. The default is no option.
- RNANet makes use of PyDCA to compute DCA-related features on the alignments (descriptions to come in the Database.md)
- statistics.py now fully supports the computation of 3D distance matrices, with average and standard deviation by RNA family
- RNANet now considers only the representative structure of each equivalence class by default. To consider all members of an equivalence
class (like before), use the --redundant option.
TECHNICAL CHANGES
- cmalign is not run with --cyk anymore by default, and now requires huge amounts of RAM if launched with the default options.
- Moving to a 60-core/128GB server for our internal runs.
############################################################################################
v 1.3 beta, January 2021
The first uses of RNANet by people from outside the development team happened this December.
......
# RNANet
Contents:
* [What is RNANet ?](#what-is-rnanet)
* [Install and run RNANet](INSTALL.md)
* [What is RNANet ?](#what-is-rnanet-?)
* [Install and run RNANet](doc/INSTALL.md)
* [How to further filter the dataset](#how-to-further-filter-the-dataset)
* [Filter on 3D structure resolution](#filter-on-3d-structure-resolution)
* [Filter on 3D structure publication date](#filter-on-3d-structure-publication-date)
* [Filter to avoid chain redundancy when several mappings are available](#filter-to-avoid-chain-redundancy-when-several-mappings-are-available)
* [Database tables documentation](Database.md)
* [FAQ](FAQ.md)
* [Database tables documentation](doc/Database.md)
* [FAQ](doc/FAQ.md)
* [Troubleshooting](#troubleshooting)
* [Contact](#contact)
......@@ -32,9 +32,9 @@ If you use our multiple sequence alignments and homology data, you might want to
# What is RNANet ?
RNANet is a multiscale dataset of non-coding RNA structures, including sequences, secondary structures, non-canonical interactions, 3D geometrical descriptors, and sequence homology.
It is available in machine-learning ready formats like CSV files per chain or an SQL database.
It is available in machine-learning ready formats like CSV files (one per RNA chain) or as a SQL database.
Most interestingly, nucleotides have been renumbered in a standardized way, and the 3D chains have been re-aligned with homologous sequences from the [Rfam](https://rfam.org/) database.
Most interestingly, nucleotides have been renumbered in a standardized way, and the 3D chains have been re-aligned with homologous sequences and covariance models from the [Rfam](https://rfam.org/) database.
## Methodology
......@@ -68,12 +68,12 @@ Finally, export this data from the SQLite database into flat CSV files.
## Data provided
We provide a couple of resources to exploit this dataset. You can download them from [EvryRNA](https://evryrna.ibisc.univ-evry.fr/evryrna/rnanet/rnanet_home).
* A series of tables in the SQLite3 database, see [the database documentation](Database.md) and [examples of useful queries](#how-to-further-filter-the-dataset),
* A series of tables in the SQLite3 database, see [the database documentation](doc/Database.md) and [examples of useful queries](#how-to-further-filter-the-dataset),
* One CSV file per RNA chain, summarizing all the relevant information about it,
* Filtered alignment files in FASTA format containing only the sequences with a 3D structure available in RNANet, but which have been aligned using all the homologous sequences of this family from Rfam or SILVA,
* Additional statistics files about nucleotide frequencies, modified bases, basepair types within each chain or by RNA family.
For now, neither the set of cleaned 3D structures nor the full alignments with Rfam sequences are available as public downloads. If you need them, [recompute them](INSTALL.md) or ask us.
For now, neither the set of cleaned 3D structures nor the full alignments with Rfam sequences are available as public downloads. If you need them, [recompute them](doc/INSTALL.md) or ask us.
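As a starting point for exploring the SQLite3 database, the following minimal sketch lists every table with its column names, using only Python's standard `sqlite3` module. The function name `list_tables` is an illustration and not part of RNANet, and the sketch makes no assumption about which tables the file contains; refer to the database documentation for their meaning.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path):
    """Return {table_name: [column names]} for every table in an SQLite file.

    Hypothetical helper for illustration only; it is not part of RNANet.
    Note that sqlite3.connect() creates an empty file if db_path does not exist.
    """
    with closing(sqlite3.connect(db_path)) as conn:
        # sqlite_master is SQLite's built-in catalog of schema objects
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
        schema = {}
        for name in names:
            # PRAGMA table_info yields one row per column; field 1 is the column name
            columns = conn.execute(f'PRAGMA table_info("{name}")').fetchall()
            schema[name] = [col[1] for col in columns]
        return schema
```

Call `list_tables(path)` with the path to the database file you downloaded to see which tables and columns are available before writing your own queries.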
## Updates
RNANet is updated monthly to take into account new structures proposed in the [BGSU Non-redundant lists](http://rna.bgsu.edu/rna3dhub/nrlist/). The monthly runs realign previous alignments with the new sequences using `esl-alimerge` from Infernal.
......