Nathalie BERNARD

Update the README to cover MFE and the E/F functions

Showing 1 changed file with 19 additions and 11 deletions
@@ -19,6 +19,7 @@ THEN
 OUTPUT:
 - A set of secondary structures from the Pareto front,
 - The list of known modules inserted in place in the corresponding structures
+- The positions of the nucleotides in contact with the protein, represented by asterisks (only if the motifs_28-05-2021.json library is used!)
 
 2/ The different models
 ==================================
@@ -28,7 +29,8 @@ Biorseo can be used with two modules datasets (yet):
 * Rna3Dmotifs (from the work of *Djelloul & Denise, 2008*)
 * The RNA 3D Motif Atlas of BGSU's RNA lab (*Petrov et al, 2013*, see http://rna.bgsu.edu/rna3dhub/motifs/)
 * CaRNAval 1.0 (*Reinharz et al, 2018*)
-* RNA-Bricks 2, RNAMC, CaRNAval 2.0, and others could theoretically be used, but are not supported (yet). You might write your own API.
+* /data/modules/ISAURE/motifs_28-05-2021.json, a library of motifs from RNA bound to a protein, from Isaure Chauvot de Beauchêne of the LORIA laboratory
+  (contact: isaure.chauvot-de-beauchene@loria.fr)
 
 PATTERN MATCHING STEP
 - Use **simple pattern matching**. Rna3Dmotifs modules are available with sequence information. We use regular expressions to find those known loops in your query. This is the approach of RNA-MoIP (*Reinharz et al, 2012*); we handle short components and wildcards the same way.
@@ -43,6 +45,8 @@ OBJECTIVE FUNCTIONS FOR THE MODULE INSERTION CRITERIA
 * **Function B** : weights a module by its number of components (strands) and penalizes it by the log_2 of its nucleotide size.
 * **Function C** : weights a module by its insertion site score (JAR3D or BayesPairing score).
 * **Function D** : weights a module by its number of components (strands) and insertion site score (JAR3D or BayesPairing score), and penalizes it by the log_2 of its nucleotide size.
+* **Function E** : weights a module by its nucleotides in contact with a protein, its number of occurrences, and the number of nucleotides in the module.
+* **Function F** : weights a module by its nucleotides in contact with a protein, its number of occurrences, and the number of nucleotides along the entire length of the RNA.
 
 3/ Installation
 ==================================
@@ -55,10 +59,10 @@ Check the file [INSTALL.md](INSTALL.md) for installation instructions.
 
 - If you **might expect a pseudoknot, or don't know**:
 * The most promising method is the use of direct pattern matching with Rna3Dmotifs and function A. But this method is sometimes subject to combinatorial explosion issues. If you have a long RNA or a large number of loops, don't use it. Example:
-`./biorseo.py -i PDB_00304.fa -O resultsFolder/ --rna3dmotifs --patternmatch --func A`
+`./biorseo.py -i PDB_00304.fa -O resultsFolder/ --rna3dmotifs --patternmatch --func A --MEA`
 
 * The use of the RNA 3D Motif Atlas placed by JAR3D and scored with function A is not subject to combinatorial issues, but performs a bit worse. It also returns fewer solutions. Example:
-`./biorseo.py -i PDB_00304.fa -O resultsFolder/ --3dmotifatlas --jar3d --func A
+`./biorseo.py -i PDB_00304.fa -O resultsFolder/ --3dmotifatlas --jar3d --func A --MEA`
 
 5/ List of Options
 ==================================
@@ -68,9 +72,9 @@ Usage: You must provide:
 2) a module type with --rna3dmotifs, --carnaval, --3dmotifatlas or --contacts
 3) one module placement method in { --patternmatch, --jar3d, --bayespairing }
 4) one scoring function with --func A, B, C, D, E or F
-
+5) one estimator, either --MEA or --MFE
 If you are not using the Docker image:
-5) --modules-path, --biorseo-dir and (--jar3d-exec or --bypdir)
+6) --modules-path, --biorseo-dir and (--jar3d-exec or --bypdir)
 
 Options:
 -h [ --help ] Print this help message
@@ -85,11 +89,15 @@ Options:
 -b [ --bayespairing ] Use BayesPairing2 to place modules in the sequence (requires --rna3dmotifs or --3dmotifatlas)
 -o [ --output=… ] File to summarize the results
 -O [ --outputf=… ] Folder where to output result and temp files
--f [ --func=… ] (A, B, C or D, default is B) Objective function to score module insertions:
+-f [ --func=… ] (A, B, C, D, E or F, default is B) Objective function to score module insertions:
 (A) insert big modules (B) insert light, high-order modules
-(c) insert modules which score well with the sequence
+(C) insert modules which score well with the sequence
 (D) insert light, high-order modules which score well with the sequence.
-C and D require cannot be used with --patternmatch.
+C and D cannot be used with --patternmatch.
+(E) and (F) insert modules with many nucleotides, many nucleotides in contact with a protein, and a large number of occurrences.
+(E) maximizes the number of contact nucleotides inside the module, while (F) maximizes the number of contact nucleotides along the entire length of the RNA.
+--MEA Use Maximum Expected Accuracy for the second objective
+--MFE Use Minimum Free Energy based on the formula of (*Legendre et al., 2018*) for the second objective
 -c [ --first-objective=… ] (default 1) Objective to solve in the mono-objective portions of the algorithm.
 (1) is the module objective given by --func, (2) is the expected accuracy of the structure.
 -l [ --limit=… ] (default 500) Number of solutions in the Pareto set from which
@@ -114,9 +122,9 @@ Options:
 BiORSEO from outside the docker image. Use the FULL path.
 
 Examples:
-biorseo.py -i myRNA.fa -O myResultsFolder/ --rna3dmotifs --patternmatch --func B
-biorseo.py -i myRNA.fa -O myResultsFolder/ --3dmotifatlas --jar3d --func B -l 800
-biorseo.py -i myRNA.fa -v --3dmotifatlas --bayespairing --func D
+biorseo.py -i myRNA.fa -O myResultsFolder/ --rna3dmotifs --patternmatch --func B --MEA
+biorseo.py -i myRNA.fa -O myResultsFolder/ --3dmotifatlas --jar3d --func B -l 800 --MEA
+biorseo.py -i myRNA.fa -v --3dmotifatlas --bayespairing --func D --MEA
 
 The allowed module/placement-method/function combinations are:
 
......
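
Review note: with this change, every invocation needs an estimator flag in addition to the module type, placement method and scoring function. The sketch below illustrates those rules as a small pre-flight check in Python. It is hypothetical and not part of BiORSEO: the names `check_args` and `find_func` are invented for illustration, and it only covers what the help text in this diff states (items 2 to 5 of the usage list and the rule that C and D cannot be combined with --patternmatch). It also assumes the long option names are used, and it treats --func as mandatory as the usage list does, even though the option description gives B as a default.

```python
# Hypothetical pre-flight check for a biorseo.py argument list.
# NOT part of BiORSEO: it only mirrors the rules quoted in the help text of this diff.

MODULE_TYPES = {"--rna3dmotifs", "--carnaval", "--3dmotifatlas", "--contacts"}
PLACEMENTS = {"--patternmatch", "--jar3d", "--bayespairing"}
ESTIMATORS = {"--MEA", "--MFE"}
FUNCTIONS = {"A", "B", "C", "D", "E", "F"}


def find_func(argv):
    """Return the value given to -f/--func, or None if the option is absent."""
    for i, arg in enumerate(argv):
        if arg in ("-f", "--func") and i + 1 < len(argv):
            return argv[i + 1]
        if arg.startswith("--func="):
            return arg.split("=", 1)[1]
    return None


def check_args(argv):
    """Return a list of human-readable problems; an empty list means no rule is violated."""
    problems = []

    # Usage items 2, 3 and 5: exactly one flag from each group (long names only).
    groups = (
        ("module type", MODULE_TYPES),
        ("placement method", PLACEMENTS),
        ("estimator", ESTIMATORS),
    )
    for label, group in groups:
        chosen = [arg for arg in argv if arg in group]
        if len(chosen) != 1:
            problems.append(f"expected exactly one {label}, got {len(chosen)}")

    # Usage item 4: a scoring function among A..F (treated as mandatory here).
    func = find_func(argv)
    if func not in FUNCTIONS:
        problems.append("expected --func with one of A, B, C, D, E or F")
    elif func in {"C", "D"} and "--patternmatch" in argv:
        # Restriction stated in the --func description.
        problems.append("functions C and D cannot be used with --patternmatch")

    return problems


if __name__ == "__main__":
    command = "-i myRNA.fa -O myResultsFolder/ --rna3dmotifs --patternmatch --func B --MEA"
    print(check_args(command.split()))
```

The check returns a list of messages rather than raising, so a wrapper script could report every missing flag at once before launching a long run.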