Showing 4 changed files with 115 additions and 73 deletions.

INSTALL.md (new file, 0 → 100644)
Installation
==================================
### CLONING
* Clone this git repository: `git clone https://github.com/persalteas/biorseo.git` and `cd biorseo`.
* Create folders for the modules you will use: `mkdir -p data/modules/`. If you plan to use several module sources, add subdirectories: `mkdir -p data/modules/DESC` and `mkdir -p data/modules/BGSU`.

### RNA3DMOTIFS DATA

If you use Rna3Dmotifs, you need RNA-MoIP's .DESC dataset: download it from [GitHub](https://github.com/McGill-CSB/RNAMoIP/blob/master/CATALOGUE.tgz) and put all the .desc files from the `Non_Redundant_DESC` folder into `./data/modules/DESC`. Alternatively, you can run Rna3Dmotifs' `catalog` program to build your own DESC module collection from updated 3D data (download [Rna3Dmotifs](https://rna3dmotif.lri.fr/Rna3Dmotif.tgz)). In that case too, move the resulting DESC files into `./data/modules/DESC`.

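The copy step can be scripted. The function below is a minimal sketch, assuming `CATALOGUE.tgz` has already been downloaded and contains a single folder named `Non_Redundant_DESC` (at any depth inside the archive). `install_desc_modules` is a hypothetical helper, not part of biorseo.

```bash
# Hypothetical helper (not part of biorseo): copy the DESC modules of
# RNA-MoIP's catalogue into the folder biorseo reads them from.
# Assumes the archive is already downloaded and holds one Non_Redundant_DESC folder.
install_desc_modules() {
    local archive="$1" dest="${2:-./data/modules/DESC}" tmp src
    tmp=$(mktemp -d) || return 1
    # Extract into a scratch folder; tar detects the compression when reading a file.
    if ! tar -xf "$archive" -C "$tmp"; then
        rm -rf "$tmp"
        return 1
    fi
    src=$(find "$tmp" -type d -name Non_Redundant_DESC | head -n 1)
    if [ -z "$src" ]; then
        echo "Non_Redundant_DESC not found in $archive" >&2
        rm -rf "$tmp"
        return 1
    fi
    mkdir -p "$dest"
    # Copy only the module files, whatever the case of their extension.
    find "$src" -maxdepth 1 -type f -iname '*.desc' -exec cp {} "$dest"/ \;
    rm -rf "$tmp"
}
```

For example: `install_desc_modules CATALOGUE.tgz ./data/modules/DESC`. Extracting into a temporary folder keeps the rest of the catalogue out of `./data/modules/DESC`.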
### THE RNA 3D MOTIF ATLAS DATA

If you did not already do it while installing JAR3D, get the latest version of the HL and IL module models from the [BGSU website](http://rna.bgsu.edu/data/jar3d/models/) and extract the Zip files. Put the HL and IL folders found inside the Zip files into `./data/modules/BGSU`. Only the latest Zip is required, not every version provided on the website.

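Once the Zip files are extracted, gathering the HL and IL folders can be sketched as follows. The hypothetical `collect_atlas_models` helper assumes everything was extracted under one folder and that this folder holds exactly one `HL` and one `IL` directory (that is, only the latest release was extracted); it stops rather than guessing when that is not the case.

```bash
# Hypothetical helper (not part of biorseo): copy the extracted HL/ and IL/
# model folders of the RNA 3D Motif Atlas into biorseo's module folder.
collect_atlas_models() {
    local src="$1" dest="${2:-./data/modules/BGSU}" kind found count
    mkdir -p "$dest" || return 1
    for kind in HL IL; do
        # Refuse ambiguous layouts (e.g. several releases extracted side by side).
        count=$(find "$src" -type d -name "$kind" | wc -l)
        if [ "$count" -ne 1 ]; then
            echo "expected exactly one $kind folder under $src, found $count" >&2
            return 1
        fi
        found=$(find "$src" -type d -name "$kind")
        cp -R "$found" "$dest"/ || return 1
    done
}
```

For example, `collect_atlas_models ~/Downloads/bgsu_models ./data/modules/BGSU`, where the source path is only an illustration.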
### DEPENDENCIES
- Make sure you have Python 3.5+, CMake, and a C++ compiler installed on your distribution. Please use a recent compiler (it's 2019): we use the C++17 standard, so the compilation will not work with, for example, Ubuntu 16's GCC 5.4. Tested with libstdc++-dev >= 6.0, so use GCC >= 6.0 or Clang >= 6.0.
- Install automake, libboost-program-options and libboost-filesystem.
- Download and install [IBM ILOG CPLEX Optimization Studio](https://www.ibm.com/analytics/cplex-optimizer). An academic account is required: the free version is too limited, so you must register as an academic (the academic version is free too).
- Download and install Eigen: get the latest Eigen archive from http://eigen.tuxfamily.org, unpack it, then build and install it. With version 3.3.7, for example:
```bash
wget http://bitbucket.org/eigen/eigen/get/3.3.7.tar.gz -O eigen_src.tar.gz
tar -xf eigen_src.tar.gz
cd eigen-eigen-323c052e1731   # folder name as extracted from this 3.3.7 archive
mkdir build
cd build
cmake ..
sudo make install
```
- Download and install NUPACK: register on [NUPACK's website](http://www.nupack.org/downloads/source), download the source, unpack it, then build and install it:
```bash
wget http://www.nupack.org/downloads/serve_file/nupack3.2.2.tar.gz
tar -xf nupack3.2.2.tar.gz
cd nupack3.2.2
mkdir build
cd build
cmake ..
make -j4
sudo make install
```
The installation is incomplete: some of the headers are not copied to /usr/local. Copy them manually:
```bash
cd ../..   # back to the folder that contains nupack3.2.2/
sudo mkdir -p /usr/local/include/nupack/thermo/
sudo cp nupack3.2.2/src/thermo/*.h /usr/local/include/nupack/thermo/
```
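The version requirements listed in this section can be checked before building. This is a sketch assuming GNU `sort -V` (coreutils) for the version comparison; `version_at_least` and `check_toolchain` are hypothetical helpers, and `CXX` defaults to g++ here (compilers other than GCC may report their version differently through `-dumpversion`).

```bash
# Hypothetical helpers (not part of biorseo).
# Succeeds if dotted version $1 >= $2, using GNU sort's version ordering.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Checks the C++ compiler (>= 6.0), Python (>= 3.5) and the presence of CMake.
check_toolchain() {
    local cxx="${CXX:-g++}" v
    v=$("$cxx" -dumpversion 2>/dev/null) || { echo "$cxx not found" >&2; return 1; }
    version_at_least "$v" 6.0 || { echo "$cxx $v is older than 6.0" >&2; return 1; }
    v=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])') || return 1
    version_at_least "$v" 3.5 || { echo "Python $v is older than 3.5" >&2; return 1; }
    command -v cmake >/dev/null 2>&1 || { echo "cmake not found" >&2; return 1; }
}
```

Run `check_toolchain && echo "toolchain OK"`. A plain string comparison would wrongly rank "10.2" below "6.0", which is why the sketch relies on `sort -V`.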
### OPTIONAL DEPENDENCIES FOR USE OF JAR3D
- Download and install RNAsubopt from the [ViennaRNA package](https://www.tbi.univie.ac.at/RNA/).
- Download and install a Java runtime (tested with Java 10).
- Download the latest JAR3D executable "*jar3d_releasedate.jar*" from [the BGSU website](http://rna.bgsu.edu/data/jar3d/models/).

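A quick way to confirm that the JAR3D prerequisites are reachable is to look them up on the `PATH`. This is a sketch with a hypothetical `require_commands` helper built on the POSIX `command -v` builtin; it reports every missing tool instead of stopping at the first one.

```bash
# Hypothetical helper (not part of biorseo): fail if any listed command
# cannot be found on the PATH, printing each missing one.
require_commands() {
    local missing=0 cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `require_commands RNAsubopt java && java -version` (Java prints its version on stderr). The JAR3D `.jar` file is not looked up on the `PATH`, so check separately that the downloaded file exists.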
### OPTIONAL DEPENDENCIES FOR USE OF BAYESPAIRING
- Download and install RNAfold from the [ViennaRNA package](https://www.tbi.univie.ac.at/RNA/) (if not already done at the previous step).
- Make sure you have Python 3.5+ with the packages networkx, numpy, regex, wrapt and biopython. You can install them with pip; you will need the python3-dev package to build them.
- Clone the latest BayesPairing Git repository and install it:
```bash
git clone http://jwgitlab.cs.mcgill.ca/sarrazin/rnabayespairing.git BayesPairing
cd BayesPairing
python3 -m pip install .   # make sure the package goes to the Python 3 interpreter
```

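The Python requirements can be checked by trying to import each package. A sketch with a hypothetical `check_python_modules` helper; note that biopython is imported under the name `Bio`, and the `PYTHON` variable (default `python3`) is an assumption of this sketch.

```bash
# Hypothetical helper (not part of biorseo): report every listed Python
# module that the interpreter cannot import.
check_python_modules() {
    local py="${PYTHON:-python3}" missing=0 mod
    for mod in "$@"; do
        if ! "$py" -c "import $mod" >/dev/null 2>&1; then
            echo "cannot import: $mod" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For BayesPairing: `check_python_modules networkx numpy regex wrapt Bio`.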
### BUILDING
* Edit the file `EditMe` to set the paths of the above dependencies and data. Fields that you will not use can be ignored (e.g. `bypdir` if you do not use BayesPairing). Example of my setup:
  * CPLEXDir="/opt/ibm/ILOG/CPLEX_Studio128_Student"
  * IEIGEN="/usr/local/include/eigen3"
  * INUPACK="/usr/local/include/nupack"
  * jar3dexec="/nhome/siniac/lbecquey/Software/jar3dbin/jar3d_2014-12-11.jar"
  * bypdir="/nhome/siniac/lbecquey/Software/BayesPairing/bayespairing/src"
  * biorseoDir="/nhome/siniac/lbecquey/Software/biorseo"
* If you are not using clang as your compiler, you may need to edit `Makefile`: for example, if you use g++, replace clang++ with g++.
* Build it: `make -j4`
* Check that the executable exists and runs: `./bin/biorseo --version`.

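A wrong path in `EditMe` may only show up later, during the build. The sketch below checks beforehand that every non-empty path exists. It assumes `EditMe` uses `NAME="value"` lines like those in the example setup, and `check_editme_paths` is a hypothetical helper.

```bash
# Hypothetical helper (not part of biorseo): check that each non-empty
# NAME="value" entry of EditMe points to an existing file or folder.
check_editme_paths() {
    local file="${1:-EditMe}" status=0 pairs name value
    [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
    # Turn lines of the form NAME="value" into "NAME value"; skip everything else.
    pairs=$(sed -n 's/^[[:space:]]*\([A-Za-z_][A-Za-z0-9_]*\)="\([^"]*\)".*$/\1 \2/p' "$file")
    while read -r name value; do
        [ -n "$name" ] || continue
        [ -n "$value" ] || continue   # empty field: treated as unused
        if [ ! -e "$value" ]; then
            echo "$name: $value does not exist" >&2
            status=1
        fi
    done <<EOF
$pairs
EOF
    return "$status"
}
```

For example: `check_editme_paths EditMe && make -j4`.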
### BAYESPAIRING USERS: PREPARE BAYESIAN NETWORKS
Run an example job once, so that BayesPairing builds the Bayesian networks of the modules:
```bash
cd BayesPairing/bayespairing/src
python3 parse_sequences.py -d rna3dmotif -seq ACACGGGGUAAGAGCUGAACGCAUCUAAGCUCGAAACCCACUUGGAAAAGAGACACCGCCGAGGUCCCGCGUACAAGACGCGGUCGAUAGACUCGGGGUGUGCGCGUCGAGGUAACGAGACGUUAAGCCCACGAGCACUAACAGACCAAAGCCAUCAU -ss ".................................................................((...............)xxxx(...................................................)xxx).............."
```
Use `-d rna3dmotif` or `-d 3dmotifatlas` depending on the module source you plan to use.
This step is quite long, but the Bayesian networks are then ready for all future runs.
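The preparation run can be wrapped so that the `-d` value and the inputs are checked before the long computation starts. This is a sketch: `prepare_bayespairing`, `BAYESPAIRING_SRC` and `DRY_RUN` are hypothetical names, and the length check assumes that BayesPairing expects a `-ss` string exactly as long as the `-seq` sequence.

```bash
# Hypothetical wrapper (not part of biorseo or BayesPairing) around the
# preparation run of parse_sequences.py.
prepare_bayespairing() {
    local dataset="$1" seq="$2" ss="$3"
    case "$dataset" in
        rna3dmotif|3dmotifatlas) ;;
        *) echo "dataset must be rna3dmotif or 3dmotifatlas, got: $dataset" >&2; return 1 ;;
    esac
    # Assumption: the structure string covers the sequence position by position.
    if [ "${#seq}" -ne "${#ss}" ]; then
        echo "sequence (${#seq} nt) and structure (${#ss} chars) lengths differ" >&2
        return 1
    fi
    if [ -n "$DRY_RUN" ]; then
        # Only print the command that would be run.
        echo python3 parse_sequences.py -d "$dataset" -seq "$seq" -ss "$ss"
        return 0
    fi
    ( cd "${BAYESPAIRING_SRC:-BayesPairing/bayespairing/src}" &&
      python3 parse_sequences.py -d "$dataset" -seq "$seq" -ss "$ss" )
}
```

With `DRY_RUN=1`, the function prints the command instead of running it, which is a cheap way to check the arguments before starting the long job.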
```diff
@@ -37,6 +37,7 @@ $(OBJECTS): $(OBJDIR)/%.o : $(SRCDIR)/%.cpp $(INCLUDES)
 	@echo "\033[00;32mCompiled "$<".\033[00m"
 
 doc: mainpdf supppdf
+	@echo "\033[00;32mLaTeX documentation rendered.\033[00m"
 
 mainpdf: doc/main_bioinformatics.tex doc/references.bib doc/bioinfo.cls doc/natbib.bst
 	cd doc; pdflatex -synctex=1 -interaction=nonstopmode -file-line-error main_bioinformatics
```
````diff
@@ -50,85 +50,45 @@ OBJECTIVE FUNCTIONS FOR THE MODULE INSERTION CRITERIA
 
 - If you **might expect a pseudoknot, or don't know**:
 * The most promising method is the use of direct pattern matching with Rna3Dmotifs and function B. But this method is sometimes subject to combinatorial explosion issues. If you have a long RNA or a large number of loops, don't use it. Example:
-`./bin/biorseo -s PDB_00304.fa --descfolder ./data/modules/DESC --type B -o PDB_00304.rawB `
+`./biorseo.py -i PDB_00304.fa --rna3dmotifs --patternmatch --func B`
 
 * The use of the RNA 3D Motif Atlas placed by JAR3D and scored with function B is not subject to combinatorial issues, but performs a bit worse. It also returns fewer solutions. Example:
-`./bin/biorseo -s PDB_00304.fa --jar3dcsv PDB_00304.sites.csv --type B -o PDB_00304.jar3dB`
+`./biorseo.py -i PDB_00304.fa --3dmotifatlas --jar3d --func B`
 
 
 4/ Installation
 ==================================
-### DEPENDENCIES
-- Make sure you have Python 3.5+, Cmake, and a C++ compiler installed on your distribution.
-- Install automake and libboost-filesystem.
-- Download and install [IBM ILOG Cplex optimization studio](https://www.ibm.com/analytics/cplex-optimizer), an academic account is required. The free version is too limited, you must register as academic. This is also free.
-- Download and install Eigen: Get the latest Eigen archive from http://eigen.tuxfamily.org. Unpack it, and install it.
-```bash
-wget http://bitbucket.org/eigen/eigen/get/3.3.7.tar.gz -O eigen_src.tar.gz
-tar -xf eigen_src.tar.gz
-cd eigen-eigen-323c052e1731
-mkdir build
-cd build
-cmake ..
-sudo make install
-```
-- Download and install NUPACK: Register on [Nupack's website](http://www.nupack.org/downloads/source), download the source, unpack it, build it, and install it:
-```bash
-wget http://www.nupack.org/downloads/serve_file/nupack3.2.2.tar.gz
-tar -xf nupack3.2.2.tar.gz
-cd nupack3.2.2
-mkdir build
-cd build
-cmake ..
-make -j4
-sudo make install
-```
-
-### OPTIONAL DEPENDENCIES FOR USE OF JAR3D
-- Download and install RNAsubopt from the [ViennaRNA package](https://www.tbi.univie.ac.at/RNA/).
-- Download and install Java runtime (Tested with Java 10)
-- Download the latest JAR3D executable "*jar3d_releasedate.jar*", and latest IL and HL models from [here](http://rna.bgsu.edu/data/jar3d/models/).
- Note that only the latest version is required (not all the versions provided in the folders).
-
-### OPTIONAL DEPENDENCIES FOR USE OF BAYESPAIRING
-- Download and install RNAfold from the [ViennaRNA package](https://www.tbi.univie.ac.at/RNA/).
-- Make sure you have Python 3.5+ with packages networkx, numpy, regex, wrapt and biopython
-- Clone the latest BayesPairing Git repo, and install it :
-```
-git clone http://jwgitlab.cs.mcgill.ca/sarrazin/rnabayespairing.git BayesPairing
-cd BayesPairing
-pip install .
-```
+Check the file INSTALL.md for installation instructions.
 
-### RNA3DMOTIFS DATA
-
-If you use Rna3Dmotifs, you need to get RNA-MoIP's .DESC dataset: download it from [GitHub](https://github.com/McGill-CSB/RNAMoIP/blob/master/CATALOGUE.tgz). Put all the .desc from the `Non_Redundant_DESC` folder into `./data/modules/DESC`. Otherwise, you also can run Rna3Dmotifs' `catalog` program to get your own DESC modules collection from updated 3D data (download [Rna3Dmotifs](https://rna3dmotif.lri.fr/Rna3Dmotif.tgz)). You also need to move the final DESC files into `./data/modules/DESC`.
-
-### THE RNA 3D MOTIF ATLAS DATA
-
-If not done during the installation of JAR3D, get the latest version of the HL and IL module models from the [BGSU website](http://rna.bgsu.edu/data/jar3d/models/) and extract the Zip files. Put the HL and IL folders into `./data/modules/BGSU`.
-
-### BUILDING
-* Clone this git repository : `git clone https://github.com/persalteas/biorseo.git` and `cd biorseo`.
-* Edit the file `EditMe` to set the paths of the above dependencies and data. Fileds that you will not use can be ignored (ex: bypdir if you do not use BayesPairing). Example of my setup:
- * CPLEXDir="/opt/ibm/ILOG/CPLEX_Studio128_Student"
- * IEIGEN="/usr/local/include/eigen3"
- * INUPACK="/usr/local/include/nupack"
- * jar3dexec="/nhome/siniac/lbecquey/Software/jar3dbin/jar3d_2014-12-11.jar"
- * ILmotifDir="/nhome/siniac/lbecquey/Data/RNA/motifs/BGSU/Matlab_results/IL/3.2/lib"
- * HLmotifDir="/nhome/siniac/lbecquey/Data/RNA/motifs/BGSU/Matlab_results/HL/3.2/lib"
- * descfolder="/nhome/siniac/lbecquey/Data/RNA/motifs/Rna3Dmotifs/No_Redondance_DESC/"
- * bypdir="/nhome/siniac/lbecquey/Software/BayesPairing/bayespairing/src"
- * biorseoDir="/nhome/siniac/lbecquey/Software/biorseo"
-* You might want to edit `Makefile` if you are not using clang as compiler. For example, if you use g++, replace clang++ by g++.
-* Build it: `make -j4`
-* The working executable file is `./bin/biorseo`.
-
-### BAYESPAIRING USERS: PREPARE BAYESIAN NETWORKS
-We run an example job for it to build the bayesian networks of our modules.
+5/ List of Options
+==================================
 ```
-cd rnabayespairing/src
-python3 parse_sequences.py -d rna3dmotif -seq ACACGGGGUAAGAGCUGAACGCAUCUAAGCUCGAAACCCACUUGGAAAAGAGACACCGCCGAGGUCCCGCGUACAAGACGCGGUCGAUAGACUCGGGGUGUGCGCGUCGAGGUAACGAGACGUUAAGCCCACGAGCACUAACAGACCAAAGCCAUCAU -ss ".................................................................((...............)xxxx(...................................................)xxx).............."
+-h [ --help ] Print this help message
+--version Print the program version
+-i [ --seq=… ] FASTA file with the query RNA sequence
+-p [ --patternmatch ] Use regular expressions to place modules in the sequence
+-j [ --jar3d ] Use JAR3D to place modules in the sequence (requires --3dmotifatlas)
+-b [ --bayespairing ] Use BayesPairing to place modules in the sequence
+-o [ --output=… ] Folder where output files are written
+-f [ --func=… ] (A, B, C or D, default is B) Objective function to score module insertions:
+ (A) insert big modules (B) insert light, high-order modules
+ (C) insert modules which score well with the sequence
+ (D) insert light, high-order modules which score well with the sequence.
+ C and D cannot be used with --patternmatch.
+-c [ --first-objective=… ] (default 1) Objective to solve in the mono-objective portions of the algorithm.
+ (1) is the module objective given by --func, (2) is the expected accuracy of the structure.
+-l [ --limit=… ] (default 500) Number of intermediate solutions in the Pareto set above which we give up the computation.
+-t [ --theta=… ] (default 0.001) Pairing-probability threshold used to decide whether a possible pairing is considered
+-n [ --disable-pseudoknots ] Add constraints to explicitly forbid the formation of pseudoknots
+-v [ --verbose ] Print what is happening to stdout
+--modules-path=… Path to the modules data.
+ The folder should contain modules in the DESC format as output by Djelloul & Denise's
+ 'catalog' program for use with --rna3dmotifs, or should contain the IL/ and HL/ folders from a release of
+ the RNA 3D Motif Atlas for use with --3dmotifatlas.
+ Consider placing these files on a fast I/O device (NVMe SSD, ...)
+
+Examples:
+biorseo.py -i myRNA.fa -o myResultsFolder/ --rna3dmotifs --patternmatch --func B
+biorseo.py -i myRNA.fa -o myResultsFolder/ --3dmotifatlas --jar3d --func B -l 800
+biorseo.py -i myRNA.fa --3dmotifatlas --bayespairing --func D
 ```
-Use `-d rna3dmotif` or `-d 3dmotifatlas` depending on the module source you are planning to use.
-This is a quite long step, but the bayesian networks will be ready for all the future uses.
````
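The examples in the options list process one sequence at a time. To process a whole folder of FASTA files, a loop along these lines could work. It is a sketch that only forwards flags listed in the options above; `run_biorseo_batch`, `BIORSEO` (path to the script, default `./biorseo.py`) and `DRY_RUN` are hypothetical names, not part of biorseo.

```bash
# Hypothetical batch runner (not part of biorseo): one results folder per
# input FASTA file; extra arguments are passed through to biorseo.py.
run_biorseo_batch() {
    local in_dir="$1" out_root="$2" fa name
    shift 2
    for fa in "$in_dir"/*.fa; do
        [ -e "$fa" ] || continue   # no .fa file: the glob stays unexpanded
        name=$(basename "$fa" .fa)
        if [ -n "$DRY_RUN" ]; then
            # Only print the command that would be run.
            echo "${BIORSEO:-./biorseo.py}" -i "$fa" -o "$out_root/$name/" "$@"
        else
            mkdir -p "$out_root/$name" || return 1
            "${BIORSEO:-./biorseo.py}" -i "$fa" -o "$out_root/$name/" "$@" || return 1
        fi
    done
}
```

For example: `run_biorseo_batch myRNAs/ results/ --3dmotifatlas --jar3d --func B`. Setting `DRY_RUN=1` first lists the commands without running them.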