Louis BECQUEY

Cosmetics & doc update

...@@ -13,6 +13,7 @@ doc/*.fdb_latexmk ...@@ -13,6 +13,7 @@ doc/*.fdb_latexmk
13 13
14 # Compiled Object files 14 # Compiled Object files
15 obj/* 15 obj/*
16 +doc/*.pdf
16 17
17 # Executables 18 # Executables
18 bin/* 19 bin/*
......
1 +Installation
2 +==================================
3 +### CLONING
4 +* Clone this Git repository: `git clone https://github.com/persalteas/biorseo.git`, then `cd biorseo`.
5 +* Create folders for the modules you will use: `mkdir -p data/modules/`. If you plan to use several module sources, add subdirectories: `mkdir -p data/modules/DESC` and `mkdir -p data/modules/BGSU`.
6 +
7 +### RNA3DMOTIFS DATA
8 +
9 +If you use Rna3Dmotifs, you need RNA-MoIP's .DESC dataset: download it from [GitHub](https://github.com/McGill-CSB/RNAMoIP/blob/master/CATALOGUE.tgz). Put all the .desc files from the `Non_Redundant_DESC` folder into `./data/modules/DESC`. Alternatively, you can run Rna3Dmotifs' `catalog` program to build your own DESC module collection from updated 3D data (download [Rna3Dmotifs](https://rna3dmotif.lri.fr/Rna3Dmotif.tgz)). In that case, also move the resulting DESC files into `./data/modules/DESC`.
10 +
11 +### THE RNA 3D MOTIF ATLAS DATA
12 +
13 +If not done during the installation of JAR3D, get the latest version of the HL and IL module models from the [BGSU website](http://rna.bgsu.edu/data/jar3d/models/) and extract the Zip files. Put the HL and IL folders from inside the Zip files into `./data/modules/BGSU`. Note that only the latest Zip is required.
14 +
15 +
16 +### DEPENDENCIES
17 +- Make sure you have Python 3.5+, CMake, and a C++ compiler installed on your distribution. Use a recent compiler: the code relies on the C++17 standard, so compilation fails with Ubuntu 16's GCC 5.4, for example. Tested with libstdc++-dev >= 6.0, so use GCC >= 6.0 or Clang >= 6.0.
18 +- Install automake, libboost-program-options and libboost-filesystem.
19 +- Download and install [IBM ILOG Cplex optimization studio](https://www.ibm.com/analytics/cplex-optimizer). The free version is too limited, so you must register for an academic account, which is also free.
20 +- Download and install Eigen: Get the latest Eigen archive from http://eigen.tuxfamily.org. Unpack it, and install it.
21 +```bash
22 +wget http://bitbucket.org/eigen/eigen/get/3.3.7.tar.gz -O eigen_src.tar.gz
23 +tar -xf eigen_src.tar.gz
24 +cd eigen-eigen-323c052e1731
25 +mkdir build
26 +cd build
27 +cmake ..
28 +sudo make install
29 +```
30 +- Download and install NUPACK: Register on [Nupack's website](http://www.nupack.org/downloads/source), download the source, unpack it, build it, and install it:
31 +```bash
32 +wget http://www.nupack.org/downloads/serve_file/nupack3.2.2.tar.gz
33 +tar -xf nupack3.2.2.tar.gz
34 +cd nupack3.2.2
35 +mkdir build
36 +cd build
37 +cmake ..
38 +make -j4
39 +sudo make install
40 +```
41 +The installation process is incomplete: some of the headers are not copied to /usr/local. Copy them manually (run this from the directory that contains `nupack3.2.2`):
42 +```
43 +sudo cp nupack3.2.2/src/thermo/*.h /usr/local/include/nupack/thermo/
44 +```
45 +### OPTIONAL DEPENDENCIES FOR USE OF JAR3D
46 +- Download and install RNAsubopt from the [ViennaRNA package](https://www.tbi.univie.ac.at/RNA/).
47 +- Download and install Java runtime (Tested with Java 10)
48 +- Download the latest JAR3D executable "*jar3d_releasedate.jar*" from [the BGSU website](http://rna.bgsu.edu/data/jar3d/models/).
49 +
50 +
51 +### OPTIONAL DEPENDENCIES FOR USE OF BAYESPAIRING
52 +- Download and install RNAfold from the [ViennaRNA package](https://www.tbi.univie.ac.at/RNA/) (if not already done at the previous step).
53 +- Make sure you have Python 3.5+ with the packages networkx, numpy, regex, wrapt and biopython. You can install them with pip; building them requires the python3-dev package.
54 +- Clone the latest BayesPairing Git repo, and install it:
55 +```
56 +git clone http://jwgitlab.cs.mcgill.ca/sarrazin/rnabayespairing.git BayesPairing
57 +cd BayesPairing
58 +pip install .
59 +```
60 +
61 +### BUILDING
62 +* Edit the file `EditMe` to set the paths of the above dependencies and data. Fields that you will not use can be ignored (e.g. `bypdir` if you do not use BayesPairing). Example from my setup:
63 + * CPLEXDir="/opt/ibm/ILOG/CPLEX_Studio128_Student"
64 + * IEIGEN="/usr/local/include/eigen3"
65 + * INUPACK="/usr/local/include/nupack"
66 + * jar3dexec="/nhome/siniac/lbecquey/Software/jar3dbin/jar3d_2014-12-11.jar"
67 + * bypdir="/nhome/siniac/lbecquey/Software/BayesPairing/bayespairing/src"
68 + * biorseoDir="/nhome/siniac/lbecquey/Software/biorseo"
69 +* You might want to edit `Makefile` if you are not using clang as your compiler. For example, if you use g++, replace `clang++` with `g++`.
70 +* Build it: `make -j4`
71 +* Check if the executable file exists: `./bin/biorseo --version`.
72 +
73 +### BAYESPAIRING USERS: PREPARE BAYESIAN NETWORKS
74 +Run an example job so that BayesPairing builds the Bayesian networks of the modules.
75 +```
76 +cd rnabayespairing/src
77 +python3 parse_sequences.py -d rna3dmotif -seq ACACGGGGUAAGAGCUGAACGCAUCUAAGCUCGAAACCCACUUGGAAAAGAGACACCGCCGAGGUCCCGCGUACAAGACGCGGUCGAUAGACUCGGGGUGUGCGCGUCGAGGUAACGAGACGUUAAGCCCACGAGCACUAACAGACCAAAGCCAUCAU -ss ".................................................................((...............)xxxx(...................................................)xxx).............."
78 +```
79 +Use `-d rna3dmotif` or `-d 3dmotifatlas` depending on the module source you are planning to use.
80 +This step takes quite a long time, but the Bayesian networks are then ready for all future runs.
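
The folder layout described in the CLONING and data steps of INSTALL.md can be verified before building. The following is a minimal sketch, not part of biorseo: the function name `check_modules_layout` and the returned keys are hypothetical, and it only assumes the layout stated above (`.desc` files in `data/modules/DESC`, and `HL`/`IL` folders in `data/modules/BGSU`).

```python
from pathlib import Path


def check_modules_layout(root="data/modules"):
    """Report which module sources look usable under `root`.

    Hypothetical helper, not part of biorseo: it only checks the folder
    layout described in the installation instructions.
    """
    root = Path(root)
    desc_dir = root / "DESC"
    bgsu_dir = root / "BGSU"
    # Rna3Dmotifs: at least one .desc file directly inside DESC/
    has_desc = desc_dir.is_dir() and any(desc_dir.glob("*.desc"))
    # RNA 3D Motif Atlas: both the HL/ and IL/ folders inside BGSU/
    has_atlas = (bgsu_dir / "HL").is_dir() and (bgsu_dir / "IL").is_dir()
    return {"rna3dmotifs": has_desc, "3dmotifatlas": has_atlas}


if __name__ == "__main__":
    # Run from the root of the cloned repository.
    for source, ok in check_modules_layout().items():
        print(f"{source}: {'found' if ok else 'missing'}")
```

Only the module source you plan to use needs to be reported as found.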
...@@ -37,6 +37,7 @@ $(OBJECTS): $(OBJDIR)/%.o : $(SRCDIR)/%.cpp $(INCLUDES) ...@@ -37,6 +37,7 @@ $(OBJECTS): $(OBJDIR)/%.o : $(SRCDIR)/%.cpp $(INCLUDES)
37 @echo "\033[00;32mCompiled "$<".\033[00m" 37 @echo "\033[00;32mCompiled "$<".\033[00m"
38 38
39 doc: mainpdf supppdf 39 doc: mainpdf supppdf
40 + @echo "\033[00;32mLaTeX documentation rendered.\033[00m"
40 41
41 mainpdf: doc/main_bioinformatics.tex doc/references.bib doc/bioinfo.cls doc/natbib.bst 42 mainpdf: doc/main_bioinformatics.tex doc/references.bib doc/bioinfo.cls doc/natbib.bst
42 cd doc; pdflatex -synctex=1 -interaction=nonstopmode -file-line-error main_bioinformatics 43 cd doc; pdflatex -synctex=1 -interaction=nonstopmode -file-line-error main_bioinformatics
......
...@@ -50,85 +50,45 @@ OBJECTIVE FUNCTIONS FOR THE MODULE INSERTION CRITERIA ...@@ -50,85 +50,45 @@ OBJECTIVE FUNCTIONS FOR THE MODULE INSERTION CRITERIA
50 50
51 - If you **might expect a pseudoknot, or don't know**: 51 - If you **might expect a pseudoknot, or don't know**:
52 * The most promising method is the use of direct pattern matching with Rna3Dmotifs and function B. But this method is sometimes subject to combinatorial explosion issues. If you have a long RNA or a large number of loops, don't use it. Example: 52 * The most promising method is the use of direct pattern matching with Rna3Dmotifs and function B. But this method is sometimes subject to combinatorial explosion issues. If you have a long RNA or a large number of loops, don't use it. Example:
53 - `./bin/biorseo -s PDB_00304.fa --descfolder ./data/modules/DESC --type B -o PDB_00304.rawB ` 53 + `./biorseo.py -i PDB_00304.fa --rna3dmotifs --patternmatch --func B`
54 54
55 * The use of the RNA 3D Motif Atlas placed by JAR3D and scored with function B is not subject to combinatorial issues, but performs a bit worse. It also returns fewer solutions. Example: 55 * The use of the RNA 3D Motif Atlas placed by JAR3D and scored with function B is not subject to combinatorial issues, but performs a bit worse. It also returns fewer solutions. Example:
56 - `./bin/biorseo -s PDB_00304.fa --jar3dcsv PDB_00304.sites.csv --type B -o PDB_00304.jar3dB` 56 + `./bin/biorseo -i PDB_00304.fa --3dmotifatlas --jar3d --func B`
57 57
58 58
59 4/ Installation 59 4/ Installation
60 ================================== 60 ==================================
61 -### DEPENDENCIES 61 +Check the file INSTALL.md for installation instructions.
62 -- Make sure you have Python 3.5+, Cmake, and a C++ compiler installed on your distribution.
63 -- Install automake and libboost-filesystem.
64 -- Download and install [IBM ILOG Cplex optimization studio](https://www.ibm.com/analytics/cplex-optimizer), an academic account is required. The free version is too limited, you must register as academic. This is also free.
65 -- Download and install Eigen: Get the latest Eigen archive from http://eigen.tuxfamily.org. Unpack it, and install it.
66 -```bash
67 -wget http://bitbucket.org/eigen/eigen/get/3.3.7.tar.gz -O eigen_src.tar.gz
68 -tar -xf eigen_src.tar.gz
69 -cd eigen-eigen-323c052e1731
70 -mkdir build
71 -cd build
72 -cmake ..
73 -sudo make install
74 -```
75 -- Download and install NUPACK: Register on [Nupack's website](http://www.nupack.org/downloads/source), download the source, unpack it, build it, and install it:
76 -```bash
77 -wget http://www.nupack.org/downloads/serve_file/nupack3.2.2.tar.gz
78 -tar -xf nupack3.2.2.tar.gz
79 -cd nupack3.2.2
80 -mkdir build
81 -cd build
82 -cmake ..
83 -make -j4
84 -sudo make install
85 -```
86 -
87 -### OPTIONAL DEPENDENCIES FOR USE OF JAR3D
88 -- Download and install RNAsubopt from the [ViennaRNA package](https://www.tbi.univie.ac.at/RNA/).
89 -- Download and install Java runtime (Tested with Java 10)
90 -- Download the latest JAR3D executable "*jar3d_releasedate.jar*", and latest IL and HL models from [here](http://rna.bgsu.edu/data/jar3d/models/).
91 - Note that only the latest version is required (not all the versions provided in the folders).
92 -
93 -### OPTIONAL DEPENDENCIES FOR USE OF BAYESPAIRING
94 -- Download and install RNAfold from the [ViennaRNA package](https://www.tbi.univie.ac.at/RNA/).
95 -- Make sure you have Python 3.5+ with packages networkx, numpy, regex, wrapt and biopython
96 -- Clone the latest BayesPairing Git repo, and install it :
97 -```
98 -git clone http://jwgitlab.cs.mcgill.ca/sarrazin/rnabayespairing.git BayesPairing
99 -cd BayesPairing
100 -pip install .
101 -```
102 62
103 -### RNA3DMOTIFS DATA 63 +5/ List of Options
104 - 64 +==================================
105 -If you use Rna3Dmotifs, you need to get RNA-MoIP's .DESC dataset: download it from [GitHub](https://github.com/McGill-CSB/RNAMoIP/blob/master/CATALOGUE.tgz). Put all the .desc from the `Non_Redundant_DESC` folder into `./data/modules/DESC`. Otherwise, you also can run Rna3Dmotifs' `catalog` program to get your own DESC modules collection from updated 3D data (download [Rna3Dmotifs](https://rna3dmotif.lri.fr/Rna3Dmotif.tgz)). You also need to move the final DESC files into `./data/modules/DESC`.
106 -
107 -### THE RNA 3D MOTIF ATLAS DATA
108 -
109 -If not done during the installation of JAR3D, get the latest version of the HL and IL module models from the [BGSU website](http://rna.bgsu.edu/data/jar3d/models/) and extract the Zip files. Put the HL and IL folders into `./data/modules/BGSU`.
110 -
111 -### BUILDING
112 -* Clone this git repository : `git clone https://github.com/persalteas/biorseo.git` and `cd biorseo`.
113 -* Edit the file `EditMe` to set the paths of the above dependencies and data. Fileds that you will not use can be ignored (ex: bypdir if you do not use BayesPairing). Example of my setup:
114 - * CPLEXDir="/opt/ibm/ILOG/CPLEX_Studio128_Student"
115 - * IEIGEN="/usr/local/include/eigen3"
116 - * INUPACK="/usr/local/include/nupack"
117 - * jar3dexec="/nhome/siniac/lbecquey/Software/jar3dbin/jar3d_2014-12-11.jar"
118 - * ILmotifDir="/nhome/siniac/lbecquey/Data/RNA/motifs/BGSU/Matlab_results/IL/3.2/lib"
119 - * HLmotifDir="/nhome/siniac/lbecquey/Data/RNA/motifs/BGSU/Matlab_results/HL/3.2/lib"
120 - * descfolder="/nhome/siniac/lbecquey/Data/RNA/motifs/Rna3Dmotifs/No_Redondance_DESC/"
121 - * bypdir="/nhome/siniac/lbecquey/Software/BayesPairing/bayespairing/src"
122 - * biorseoDir="/nhome/siniac/lbecquey/Software/biorseo"
123 -* You might want to edit `Makefile` if you are not using clang as compiler. For example, if you use g++, replace clang++ by g++.
124 -* Build it: `make -j4`
125 -* The working executable file is `./bin/biorseo`.
126 -
127 -### BAYESPAIRING USERS: PREPARE BAYESIAN NETWORKS
128 -We run an example job for it to build the bayesian networks of our modules.
129 ``` 65 ```
130 -cd rnabayespairing/src 66 +-h [ --help ] Print this help message
131 -python3 parse_sequences.py -d rna3dmotif -seq ACACGGGGUAAGAGCUGAACGCAUCUAAGCUCGAAACCCACUUGGAAAAGAGACACCGCCGAGGUCCCGCGUACAAGACGCGGUCGAUAGACUCGGGGUGUGCGCGUCGAGGUAACGAGACGUUAAGCCCACGAGCACUAACAGACCAAAGCCAUCAU -ss ".................................................................((...............)xxxx(...................................................)xxx).............." 67 +--version Print the program version
68 +-i [ --seq=… ] FASTA file with the query RNA sequence
69 +-p [ --patternmatch ] Use regular expressions to place modules in the sequence
70 +-j [ --jar3d ] Use JAR3D to place modules in the sequence (requires --3dmotifatlas)
71 +-b [ --bayespairing ] Use BayesPairing to place modules in the sequence
72 +-o [ --output=… ] Folder where to output files
73 +-f [ --func=… ] (A, B, C or D, default is B) Objective function to score module insertions:
74 + (A) insert big modules (B) insert light, high-order modules
75 + (C) insert modules which score well with the sequence
76 + (D) insert light, high-order modules which score well with the sequence.
77 + C and D cannot be used with --patternmatch.
78 +-c [ --first-objective=… ] (default 1) Objective to solve in the mono-objective portions of the algorithm.
79 + (1) is the module objective given by --func, (2) is the expected accuracy of the structure.
80 +-l [ --limit=… ] (default 500) Intermediate number of solutions in the Pareto set from which we give up the computation.
81 +-t [ --theta=… ] (default 0.001) Pairing-probability threshold to consider or not the possibility of pairing
82 +-n [ --disable-pseudoknots ] Add constraints to explicitly forbid the formation of pseudoknots
83 +-v [ --verbose ] Print what is happening to stdout
84 +--modules-path=… Path to the modules data.
85 + The folder should contain modules in the DESC format as output by Djelloul & Denise's
86 + 'catalog' program for use with --rna3dmotifs, or should contain the IL/ and HL/ folders from a release of
87 + the RNA 3D Motif Atlas for use with --3dmotifatlas.
88 + Consider placing these files on a fast I/O device (NVMe SSD, ...)
89 +
90 +Examples:
91 +biorseo.py -i myRNA.fa -o myResultsFolder/ --rna3dmotifs --patternmatch --func B
92 +biorseo.py -i myRNA.fa -o myResultsFolder/ --3dmotifatlas --jar3d --func B -l 800
93 +biorseo.py -i myRNA.fa --3dmotifatlas --bayespairing --func D
132 ``` 94 ```
133 -Use `-d rna3dmotif` or `-d 3dmotifatlas` depending on the module source you are planning to use.
134 -This is a quite long step, but the bayesian networks will be ready for all the future uses.
......
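
The option list in the README hunk states two compatibility rules: functions C and D cannot be combined with `--patternmatch`, and `--jar3d` requires `--3dmotifatlas`. The sketch below shows how such rules could be checked before launching a job. It is an illustration only, not code from `biorseo.py`: the function name, its parameters, and the message strings are assumptions.

```python
def check_option_combination(func="B", patternmatch=False, jar3d=False,
                             motifatlas=False):
    """Return a list of problems with a set of biorseo-style options.

    Hypothetical helper: it encodes only the rules written in the option
    list, not the actual checks performed by biorseo.py.
    """
    problems = []
    # --func accepts A, B, C or D (default B)
    if func not in ("A", "B", "C", "D"):
        problems.append("--func must be A, B, C or D")
    # C and D score modules against the sequence, which pattern matching
    # does not provide
    if patternmatch and func in ("C", "D"):
        problems.append("--func C and D cannot be used with --patternmatch")
    # JAR3D places modules from the RNA 3D Motif Atlas only
    if jar3d and not motifatlas:
        problems.append("--jar3d requires --3dmotifatlas")
    return problems


if __name__ == "__main__":
    # Mirrors the first README example: pattern matching with function B
    print(check_option_combination(func="B", patternmatch=True))
```

An empty list means the combination respects both documented rules.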