Merge branch 'master' of https://github.com/persalteas/RNANet
Showing 1 changed file with 8 additions and 9 deletions
... | @@ -27,7 +27,7 @@ Contents: | ... | @@ -27,7 +27,7 @@ Contents: |
27 | # What it does | 27 | # What it does |
28 | The script follows these steps: | 28 | The script follows these steps: |
29 | * Gets a list of 3D structures containing RNA from BGSU's non-redundant list (but keeps the redundant structures /!\\), | 29 | * Gets a list of 3D structures containing RNA from BGSU's non-redundant list (but keeps the redundant structures /!\\), |
30 | -* Asks Rfam for mappings of these structures onto Rfam families (~ a half of structures have a direct mapping, some more are inferred using the redundancy list) | 30 | +* Asks Rfam for mappings of these structures onto Rfam families (~50% of structures have a direct mapping, some more are inferred using the redundancy list) |
31 | * Downloads the corresponding 3D structures (mmCIFs) | 31 | * Downloads the corresponding 3D structures (mmCIFs) |
32 | * If desired, extracts the right chain portions that map onto an Rfam family | 32 | * If desired, extracts the right chain portions that map onto an Rfam family |
33 | 33 | ||
... | @@ -35,7 +35,7 @@ Now, compute the features: | ... | @@ -35,7 +35,7 @@ Now, compute the features: |
35 | 35 | ||
36 | * Extract the sequence for every 3D chain | 36 | * Extract the sequence for every 3D chain |
37 | * Downloads Rfamseq ncRNA sequence hits for the concerned Rfam families | 37 | * Downloads Rfamseq ncRNA sequence hits for the concerned Rfam families |
38 | -* Realigns Rfamseq hits and sequences from the 3D structures together to obtain a multiple sequence alignment for each Rfam family (using cmalign, except for ribosomal LSU and SSU, where SINA is used) | 38 | +* Realigns Rfamseq hits and sequences from the 3D structures together to obtain a multiple sequence alignment for each Rfam family (using `cmalign --cyk`, except for ribosomal LSU and SSU, where SINA is used) |
39 | * Computes nucleotide frequencies at every position for each alignment | 39 | * Computes nucleotide frequencies at every position for each alignment |
40 | * For each aligned 3D chain, get the nucleotide frequencies in the corresponding RNA family for each residue | 40 | * For each aligned 3D chain, get the nucleotide frequencies in the corresponding RNA family for each residue |
41 | 41 | ||
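
Note on the "Computes nucleotide frequencies at every position for each alignment" step in the hunk above: the idea can be illustrated with a minimal Python sketch. This is not RNANet's actual code. The function name `column_frequencies`, the toy alignment, and the choice to ignore gaps and non-ACGU symbols are all assumptions made for illustration.

```python
from collections import Counter

ALPHABET = "ACGU"  # assumption: only the four canonical nucleotides are counted


def column_frequencies(alignment):
    """Return, for each alignment column, the relative frequency of A, C, G, U.

    Gaps and any other symbols are ignored; an all-gap column yields zeros.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    freqs = []
    for col in range(length):
        # Count every symbol in this column, then keep only the nucleotides.
        counts = Counter(seq[col].upper() for seq in alignment)
        total = sum(counts[n] for n in ALPHABET)
        freqs.append({n: (counts[n] / total if total else 0.0) for n in ALPHABET})
    return freqs


# Hypothetical toy alignment (four sequences, five columns, '-' is a gap).
toy_alignment = ["ACG-U", "ACGAU", "A-GAU", "UCGAU"]
frequencies = column_frequencies(toy_alignment)
```

For an aligned 3D chain, the per-residue descriptors would then be the entries of `frequencies` at the columns where that chain has a residue rather than a gap.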
... | @@ -49,12 +49,10 @@ Finally, export this data from the SQLite database into flat CSV files. | ... | @@ -49,12 +49,10 @@ Finally, export this data from the SQLite database into flat CSV files. |
49 | 49 | ||
50 | * `results/RNANet.db` is a SQLite database file containing several tables with all the information, which you can query yourself with your custom requests, | 50 | * `results/RNANet.db` is a SQLite database file containing several tables with all the information, which you can query yourself with your custom requests, |
51 | * `3D-folder-you-passed-in-option/datapoints/*` are flat text CSV files, one for one RNA chain mapped to one RNA family, gathering the per-position nucleotide descriptors, | 51 | * `3D-folder-you-passed-in-option/datapoints/*` are flat text CSV files, one for one RNA chain mapped to one RNA family, gathering the per-position nucleotide descriptors, |
52 | -* `results/RNANET_datapoints_latest.tar.gz` is a compressed archive of the above CSV files (only if you passed the --archive option) | 52 | +* `archive/RNANET_datapoints_{DATE}.tar.gz` is a compressed archive of the above CSV files (only if you passed the --archive option) |
53 | -* `path-to-3D-folder-you-passed-in-option/rna_mapped_to_Rfam` If you used the --extract option, this folder contains one mmCIF file per RNA chain mapped to one RNA family, without other chains, proteins (nor ions and ligands by default) | 53 | +* `path-to-3D-folder-you-passed-in-option/rna_mapped_to_Rfam` If you used the `--extract` option, this folder contains one mmCIF file per RNA chain mapped to one RNA family, without other chains, proteins (nor ions and ligands by default). If you used both `--extract` and `--no-homology`, this folder is called `rnaonly`. |
54 | -* `results/summary_latest.csv` summarizes information about the RNA chains | 54 | +* `results/summary.csv` summarizes information about the RNA chains |
55 | -* `results/families_latest.csv` summarizes information about the RNA families | 55 | +* `results/families.csv` summarizes information about the RNA families |
56 | - | ||
57 | -If you launch successive executions of RNANet, the previous tar.gz archive and the two summary CSV files are stored in the `results/archive/` folder. | ||
58 | 56 | ||
59 | Other folders are created and not deleted, which you might want to conserve to avoid re-computations in later runs: | 57 | Other folders are created and not deleted, which you might want to conserve to avoid re-computations in later runs: |
60 | 58 | ||
... | @@ -63,7 +61,8 @@ Other folders are created and not deleted, which you might want to conserve to a | ... | @@ -63,7 +61,8 @@ Other folders are created and not deleted, which you might want to conserve to a |
63 | * `path-to-3D-folder-you-passed-in-option/RNAcifs/` contains mmCIF structures directly downloaded from the PDB, which contain RNA chains, | 61 | * `path-to-3D-folder-you-passed-in-option/RNAcifs/` contains mmCIF structures directly downloaded from the PDB, which contain RNA chains, |
64 | * `path-to-3D-folder-you-passed-in-option/annotations/` contains the raw JSON annotation files of the previous mmCIF structures. You may find additional information into them which is not properly supported by RNANet yet. | 62 | * `path-to-3D-folder-you-passed-in-option/annotations/` contains the raw JSON annotation files of the previous mmCIF structures. You may find additional information into them which is not properly supported by RNANet yet. |
65 | 63 | ||
66 | -# How to run (on Linux x86-64 only) | 64 | +# How to run |
65 | +RNANet is available on Linux (x86-64) only. It could theoretically work on Mac using command line installation (*untested*). | ||
67 | 66 | ||
68 | ## Required computational resources | 67 | ## Required computational resources |
69 | - CPU: no requirements. The program is optimized for multi-core CPUs, you might want to use Intel Xeons, AMD Ryzens, etc. | 68 | - CPU: no requirements. The program is optimized for multi-core CPUs, you might want to use Intel Xeons, AMD Ryzens, etc. | ... | ... |
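
Note on the `results/RNANet.db` entry in the diff above: the README says the database can be queried with custom requests but does not list its tables here. A hedged way to discover the schema before writing queries is sketched below, using only Python's standard `sqlite3` module. The helper names `list_tables` and `table_columns` are hypothetical and not part of RNANet.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path):
    """Return the names of all tables in the SQLite file at db_path."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


def table_columns(db_path, table):
    """Return the column names of one table, in declaration order."""
    # PRAGMA statements cannot take bound parameters, so the identifier is
    # quoted by hand (embedded double quotes are doubled).
    quoted = '"' + table.replace('"', '""') + '"'
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in rows]
```

After a completed run, calling `list_tables("results/RNANet.db")` and then `table_columns(...)` on each returned name would show which tables and columns are available for custom queries.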