Louis BECQUEY

Cleaned benchmark.py and installation

FROM ubuntu:bionic
# You can pick a different Ubuntu version instead, depending on the version of the boost libraries
# that you use to compile biorseo.
#
# Typically, on the machine where you typed 'make', run:
# ls /usr/lib/libboost_filesystem.so.*
# This gives you the file name of your boost library, including the version number.
# Use the Docker base image of the Ubuntu release that has this version of boost in its apt sources.
FROM ubuntu:focal
# installing dependencies
# compiled biorseo
COPY . /biorseo/
# Install runtime dependencies
RUN apt-get update -yq && \
apt-get upgrade -y && \
apt-get install -y python3-dev python3-pip openjdk-11-jre libgsl23 libgslcblas0 libboost-program-options-dev libboost-filesystem-dev && \
apt-get install -y libboost-program-options-dev libboost-filesystem-dev && \
rm -rf /var/lib/apt/lists/*
# compiled biorseo
COPY . /biorseo
# ViennaRNA installer
ADD "https://www.tbi.univie.ac.at/RNA/download/ubuntu/ubuntu_18_04/viennarna_2.4.14-1_amd64.deb" /
# jar3d archive
ADD http://rna.bgsu.edu/data/jar3d/models/jar3d_2014-12-11.jar /
# install codes
RUN dpkg -i /viennarna_2.4.14-1_amd64.deb && \
apt-get install -f && \
\
pip3 install networkx numpy regex wrapt biopython /biorseo/BayesPairing && \
\
cd / && \
rm -rf /biorseo/BayesPairing /ViennaRNA-2.4.13 /ViennaRNA-2.4.13.tar.gz
WORKDIR /biorseo
\ No newline at end of file
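The comment at the top of the Dockerfile suggests matching the base image to the boost version found on the build machine. A minimal Python sketch of that check, assuming the library lives under `/usr/lib` as in the comment. The helper names `boost_version_from_filename` and `installed_boost_versions` are hypothetical and not part of biorseo:

```python
import glob
import os
import re


def boost_version_from_filename(path):
    """Return the version suffix of a versioned boost .so file name, or None."""
    match = re.search(r"\.so\.(\d+(?:\.\d+)*)$", os.path.basename(path))
    return match.group(1) if match else None


def installed_boost_versions(pattern="/usr/lib/libboost_filesystem.so.*"):
    """List the boost versions matched by the same glob as the 'ls' command."""
    versions = {boost_version_from_filename(p) for p in glob.glob(pattern)}
    return sorted(v for v in versions if v is not None)


if __name__ == "__main__":
    print(installed_boost_versions())
```

On some systems the library sits in an architecture-specific subfolder of `/usr/lib`, so the glob pattern may need adjusting.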
......
......@@ -14,10 +14,9 @@ sudo apt update && sudo apt install docker-ce docker-ce-cli containerd.io
```
### Download and install the RNA motifs data files:
* Move your JSON-formatted or CSV-formatted files containing motifs into the folder.
* If you use Rna3Dmotifs, you need to get RNA-MoIP's .DESC dataset: download it from [GitHub](https://github.com/McGill-CSB/RNAMoIP/blob/master/CATALOGUE.tgz). Put all the .desc files from the `Non_Redundant_DESC` folder into `./data/modules/DESC`. Alternatively, you can run Rna3Dmotifs' `catalog` program to build your own DESC module collection from updated 3D data (download [Rna3Dmotifs](https://rna3dmotif.lri.fr/Rna3Dmotif.tgz)). In that case, you also need to move the final DESC files into `./data/modules/DESC`.
* Get the latest version of the HL and IL module models from the [BGSU website](http://rna.bgsu.edu/data/jar3d/models/) and extract the Zip files. Put the HL and IL folders from inside the Zip files into `./data/modules/BGSU`. Note that only the latest Zip is required.
### Download the docker image from Docker Hub
`docker pull persalteas/biorseo:latest`
......@@ -31,9 +30,9 @@ $ docker run
persalteas/biorseo
yourexamplejobcommandhere
```
You can replace \`pwd\` with the full path of the biorseo/ root folder. Here we launch the biorseo image with 4 volumes: the first gives BiORSEO access to the module files, the second gives it access to your input file(s), the third holds your trained BayesPairing, and the last receives the result files of your job. Assuming you place your input file `myFastaFile.fa` in the `data/fasta` folder, an example job command is `./biorseo.py -i /biorseo/data/fasta/myFastaFile.fa --rna3dmotifs --patternmatch --func B`, so the full run command would be:
You can replace \`pwd\` with the full path of the biorseo/ root folder. Here we launch the biorseo image with 4 volumes: the first gives BiORSEO access to the module files, the second gives it access to your input file(s), the third holds your trained BayesPairing, and the last receives the result files of your job. Assuming you place your input file `myFastaFile.fa` in the `data/fasta` folder, an example job command is `./biorseo.py -i /biorseo/data/fasta/myFastaFile.fa --rna3dmotifs --func B`, so the full run command would be:
```
$ docker run -v `pwd`/data/modules:/modules -v `pwd`/data/fasta:/biorseo/data/fasta -v `pwd`/results:/biorseo/results persalteas/biorseo ./biorseo.py -i /biorseo/data/fasta/applications.fa --rna3dmotifs --patternmatch --func B
$ docker run -v `pwd`/data/modules:/modules -v `pwd`/data/fasta:/biorseo/data/fasta -v `pwd`/results:/biorseo/results persalteas/biorseo ./bin/biorseo -s /biorseo/data/fasta/applications.fa --descfolder /biorseo/data/modules/DESC --func B -v
```
Note that the paths to the input and output files are paths *inside the Docker container*, and those paths are mounted to folders of the host machine with -v options.
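The host-to-container mapping can be made explicit with a small Python sketch that assembles the same `docker run` command. The function name `build_docker_command` is hypothetical, and the three mounts are copied from the example command above. Nothing is executed; the command is only printed:

```python
import os
import shlex


def build_docker_command(root, job, image="persalteas/biorseo"):
    """Assemble a 'docker run' argument list with the three mounts of the example."""
    mounts = [
        # (folder on the host, path seen inside the container)
        (os.path.join(root, "data", "modules"), "/modules"),
        (os.path.join(root, "data", "fasta"), "/biorseo/data/fasta"),
        (os.path.join(root, "results"), "/biorseo/results"),
    ]
    command = ["docker", "run"]
    for host_dir, container_dir in mounts:
        command += ["-v", f"{host_dir}:{container_dir}"]
    return command + [image] + list(job)


if __name__ == "__main__":
    job = ["./bin/biorseo", "-s", "/biorseo/data/fasta/applications.fa",
           "--descfolder", "/biorseo/data/modules/DESC", "--func", "B", "-v"]
    print(shlex.join(build_docker_command(os.getcwd(), job)))
```

Every path in `job` is a container path. Only the left-hand side of each `-v` pair refers to the host.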
......@@ -83,12 +82,11 @@ If you use Rna3Dmotifs, you need to get RNA-MoIP's .DESC dataset: download it fr
* Check if the executable file exists: `./bin/biorseo --version`.
### RUN BIORSEO
Now you can run biorseo.py. However, since you are not inside the Docker environment, you MUST provide the options that tell it where jar3d or BayesPairing are located, for example:
Now you can run biorseo. However, since you are not inside the Docker environment, you MUST provide the options that tell it where jar3d or BayesPairing are located, for example:
```
$ ./biorseo.py
-i ./data/fasta/applications.fa
-O ./results/
--rna3dmotifs --patternmatch --func B
--biorseo-dir /FULL/path/to/the/root/biorseo/dir
--modules-path=./data/modules/DESC
$ ./bin/biorseo
-s ./data/fasta/applications.fa
-o result.bi
--func B
--descfolder=./data/modules/DESC
```
......
......@@ -273,15 +273,15 @@ class RNA:
# print(filename, "not found !")
def load_biokop_results(self):
filename = outputDir+"PK/"+basename+".biok"
filename = outputDir+"PK/"+self.basename+".biok"
if path.isfile(filename):
rna = open(filename, "r")
lines = rna.readlines()
rna.close()
for i in range(2, len(lines)):
ss = lines[i].split(' ')[0].split('\t')[0]
method.predictions.append(ss)
method.ninsertions.append(lines[i].count('+'))
ss = lines[i].split('\t')[0]
self.get_method("Biokop-mode").predictions.append(ss)
self.get_method("Biokop-mode").ninsertions.append(0)
def load_results(self, include_noPK=False):
if "Biokop-mode" in self.meth_idx.keys():
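The parsing done in `load_biokop_results` can be isolated as a standalone sketch. It assumes, as the loop in the hunk suggests, that the first two lines of a `.biok` file are skipped and that each remaining line starts with a structure followed by a tab. The function names are hypothetical, and the exact file layout is an assumption:

```python
def parse_biokop_lines(lines):
    """Return the first tab-separated field of each result line.

    The first two lines are skipped, like the loop that starts at index 2.
    """
    structures = []
    for line in lines[2:]:
        field = line.rstrip("\n").split("\t")[0]
        if field:  # skipping blank lines is an addition of this sketch
            structures.append(field)
    return structures


def read_biokop_file(filename):
    """Read a result file with a context manager so it is always closed."""
    with open(filename, "r") as handle:
        return parse_biokop_lines(handle.readlines())


if __name__ == "__main__":
    sample = ["header\n", "ACGUACGU\n", "((....))\t-3.2\n", "(......)\t-1.0\n"]
    print(parse_biokop_lines(sample))
```

Unlike the code in the hunk, this version strips the trailing newline before splitting, so a line without a tab does not carry `\n` into the stored structure.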
......@@ -905,28 +905,24 @@ if __name__ == '__main__':
colors = [
'#911eb4', #purple
'#000075', #navy
'#ffe119', '#ffe119', # yellow
'#e6194B', '#e6194B', #red
'#3cb44b', '#3cb44b', #green
'#4363d8', '#4363d8', #blue
'#3cb44b', '#3cb44b', #green
]
def plot_best_MCCs(x_noPK_fully, x_PK_fully, x_pseudobase_fully):
print("Best MCCs...")
labels = [
"Biokop-mode\n", "RNAsubopt",
"$f_{1A}$", "$f_{1B}$",
"$f_{1A}$", "$f_{1B}$",
"Biokop\nmode", "RNA\nsubopt",
"$f_{1A}$", "$f_{1B}$",
"$f_{1A}$", "$f_{1B}$",
"$f_{1A}$", "$f_{1B}$",
]
fig, axes = plt.subplots(nrows=3, ncols=1, figsize=(10,5), dpi=150)
fig, axes = plt.subplots(nrows=3, ncols=1, figsize=(7,5), dpi=150)
fig.suptitle(" \n ")
fig.subplots_adjust(left=0.1, right=0.97, top=0.83, bottom=0.05)
fig.subplots_adjust(left=0.18, right=0.97, top=0.83, bottom=0.05)
# Line 1 : no Pseudoknots
......@@ -949,10 +945,10 @@ if __name__ == '__main__':
axes[0].set_ylabel("(A)\nmax MCC\n(%d RNAs)" % (len(x_noPK_fully[0])), fontsize=12)
# Line 2 : Pseudoknots supported
xpos = [ 0 ] + [ i for i in range(4,20) ]
xpos = [ 0 ] + [ 1+i for i in range(1, len(x_PK_fully)) ]
vplot = axes[1].violinplot(x_PK_fully, showmeans=False, showmedians=False, showextrema=False,
points=len(x_PK_fully[0]), positions=xpos)
for patch, color in zip(vplot['bodies'], colors[:1] + colors[4:]):
for patch, color in zip(vplot['bodies'], [colors[0]] + colors[2:]):
patch.set_facecolor(color)
patch.set_edgecolor(color)
patch.set_alpha(0.5)
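The position and color lists in this hunk are now derived from the number of datasets instead of being hard-coded. A minimal sketch of the same arithmetic in plain Python, with hypothetical helper names and no matplotlib dependency:

```python
def violin_positions(n_datasets):
    """x positions for n datasets: the first at 0, the rest from 2 onward (slot 1 stays empty)."""
    return [0] + [1 + i for i in range(1, n_datasets)]


def violin_colors(colors):
    """Keep the first color, skip the second, keep the rest."""
    return [colors[0]] + colors[2:]


if __name__ == "__main__":
    print(violin_positions(7))
    print(violin_colors(["c0", "c1", "c2", "c3"]))
```

Both lists should have as many entries as there are datasets: matplotlib expects one position per violin, and `zip` stops silently at the shorter of the bodies and colors.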
......@@ -986,13 +982,13 @@ if __name__ == '__main__':
for ax in axes:
ax.set_ylim((0.0, 1.01))
ax.set_xlim((-1, 20))
ax.set_xlim((-1, 8))
yticks = [ i/10 for i in range(0, 11, 2) ]
ax.set_yticks(yticks)
for y in yticks:
ax.axhline(y=y, color="grey", linestyle="--", linewidth=1)
ax.tick_params(top=False, bottom=False, labeltop=False, labelbottom=False)
ax.set_xticks([i for i in range(20)])
ax.set_xticks([i for i in range(8)])
axes[0].tick_params(top=True, bottom=False, labeltop=True, labelbottom=False)
axes[0].set_xticklabels(labels)
for i, tick in enumerate(axes[0].xaxis.get_major_ticks()):
......@@ -1006,9 +1002,9 @@ if __name__ == '__main__':
# Figure : number of solutions
print("Number of solutions...")
plt.figure(figsize=(9,2.5), dpi=80)
plt.figure(figsize=(5,3), dpi=80)
plt.suptitle(" \n ")
plt.subplots_adjust(left=0.05, right=0.97, top=0.6, bottom=0.05)
plt.subplots_adjust(left=0.1, right=0.97, top=0.72, bottom=0.05)
xpos = [ x for x in range(len(n)) ]
for y in [ 10*x for x in range(8) ]:
plt.axhline(y=y, color="grey", linestyle="-", linewidth=0.5)
......@@ -1019,24 +1015,15 @@ if __name__ == '__main__':
patch.set_edgecolor(color)
patch.set_alpha(0.5)
labels = [
"Biokop",
"RNAsubopt","RNA-MoIP\n1by1", "RNA-MoIP\nchunk",
"Biokop\nmode", "RNA\nsubopt",
"$f_{1A}$", "$f_{1B}$",
"$f_{1A}$", "$f_{1B}$", "$f_{1C}$", "$f_{1D}$",
"$f_{1A}$", "$f_{1B}$", "$f_{1C}$", "$f_{1D}$",
"$f_{1A}$", "$f_{1B}$", "$f_{1C}$", "$f_{1D}$",
"$f_{1A}$", "$f_{1B}$",
"$f_{1A}$", "$f_{1B}$"
]
plt.xlim((-1,20))
plt.xlim((-1,8))
plt.tick_params(top=False, bottom=False, labeltop=False, labelbottom=False)
plt.xticks([ i for i in range(len(labels))], labels)
plt.tick_params(top=True, bottom=False, labeltop=True, labelbottom=False)
for i, tick in enumerate(plt.gca().xaxis.get_major_ticks()):
if i<4: # Reduce size of RNA-MoIP labels to stay readable
# tick.label2.set_fontsize(8)
tick.label2.set_rotation(90)
else:
tick.label2.set_fontsize(12)
plt.yticks([ 20*x for x in range(3) ])
plt.ylim((0,40))
plt.savefig("number_of_solutions.png")
......@@ -1044,11 +1031,11 @@ if __name__ == '__main__':
# Figure : max number of insertions and ratio
fig, axes = plt.subplots(nrows=2, ncols=1, figsize=(10,4), dpi=80)
fig.suptitle(" \n ")
fig.subplots_adjust(left=0.09, right=0.99, top=0.7, bottom=0.05)
fig.subplots_adjust(left=0.09, right=0.99, top=0.85, bottom=0.05)
# Figure : max inserted
print("Max inserted...")
xpos = [ x for x in range(18) ]
xpos = [ x for x in range(len(max_i)) ]
axes[0].set_yticks([ 5*x for x in range(3) ])
for y in [ 2*x for x in range(7) ]:
axes[0].axhline(y=y, color="grey", linestyle="-", linewidth=0.5)
......@@ -1061,14 +1048,13 @@ if __name__ == '__main__':
# Figure : insertion ratio
print("Ratio of insertions...")
xpos = [ 0 ] + [ x for x in range(2, 1+len(r)) ]
axes[1].set_ylim((-0.01, 1.01))
yticks = [ 0, 0.5, 1.0 ]
axes[1].set_yticks(yticks)
for y in yticks:
axes[1].axhline(y=y, color="grey", linestyle="-", linewidth=0.5)
vplot = axes[1].violinplot(r, showmeans=False, showmedians=False, showextrema=False, points=len(r[0]), positions=xpos)
for patch, color in zip(vplot['bodies'], [colors[2]] + colors[4:]):
for patch, color in zip(vplot['bodies'], colors[2:]):
patch.set_facecolor(color)
patch.set_edgecolor(color)
patch.set_alpha(0.5)
......@@ -1078,21 +1064,15 @@ if __name__ == '__main__':
labels = labels[2:]
for ax in axes:
ax.set_xlim((-1,18))
ax.set_xlim((-1,6))
ax.tick_params(top=False, bottom=False, labeltop=False, labelbottom=False)
ax.set_xticks([ i for i in range(18)])
ax.set_xticks([ i for i in range(6)])
axes[0].tick_params(top=True, bottom=False, labeltop=True, labelbottom=False)
axes[0].set_xticklabels(labels)
for i, tick in enumerate(axes[0].xaxis.get_major_ticks()):
if i<2: # Reduce size of RNA-MoIP labels to stay readable
# tick.label2.set_fontsize(9)
tick.label2.set_rotation(90)
else:
tick.label2.set_fontsize(12)
tick.label2.set_fontsize(12)
plot_best_MCCs(x_noPK_fully, x_PK_fully, x_pseudobase_fully)
plt.savefig("best_MCCs.png")
plot_more_info()
plt.savefig("detailed_stats.png")
compare_subopt_MoIP()
plt.savefig("compare_subopt_MOIP.png")
......
......@@ -3,16 +3,13 @@
echo "WARNING: The purpose of this file is to document how the docker image was built.";
echo "You cannot execute it directly, because of licensing reasons. Please get your own:";
echo "- CPLEX academic version: cplex_installer_12.8_Student.bin";
echo "- Nupack header files: nupack_3.2.2.tar.gz";
exit 0;
cd ../
THISDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
####################################################### Dependencies ##############################################################
sudo apt install -y clang-7 cmake make automake libboost-program-options-dev libboost-filesystem-dev openjdk-11-jre
sudo update-alternatives --install /usr/bin/clang++ clang++ /usr/bin/clang++-7 100
sudo update-alternatives --install /usr/bin/clang clang /usr/bin/clang-7 100
sudo apt install -y make automake libgsl-dev libmpfr-dev libeigen3-dev libboost-program-options-dev libboost-filesystem-dev
# CPLEX: only to build biorseo
# HERE YOU SHOULD GET YOUR OWN cplex_installer_12.8_Student.bin ! I am not allowed to share mine anymore.
......@@ -20,39 +17,20 @@ chmod +x cplex_installer_12.8_Student.bin
printf "4\n\n1\n\n\n\n\n" | sudo ./cplex_installer_12.8_Student.bin
rm cplex_installer_12.8_Student.bin
# Eigen: only to build biorseo (no need to give it to the docker image)
wget http://bitbucket.org/eigen/eigen/get/3.3.7.tar.gz -O eigen_src.tar.gz
tar -xf eigen_src.tar.gz
cd eigen-eigen-323c052e1731
mkdir build
cd build
cmake ..
sudo make install
cd ../..
rm -rf eigen_src.tar.gz eigen-eigen-323c052e1731
# Nupack: only to build biorseo (no need to give it to the docker image)
#curl -u yourname@yourUni.com:yourPassword http://www.nupack.org/downloads/serve_file/nupack3.2.2.tar.gz --output nupack3.2.2.tar.gz
tar -xf nupack3.2.2.tar.gz
cd nupack3.2.2
mkdir build
cd build
cmake ..
make -j8
# ViennaRNA (to build Biorseo with libRNA)
wget https://www.tbi.univie.ac.at/RNA/download/sourcecode/2_5_x/ViennaRNA-2.5.0.tar.gz
tar xzf ViennaRNA-2.5.0.tar.gz
cd ViennaRNA-2.5.0
./configure
make -j 8
sudo make install
cd ../..
sudo cp nupack3.2.2/src/thermo/*.h /usr/local/include/nupack/thermo/
rm -rf nupack3.2.2.tar.gz nupack3.2.2/
# BayesPairing: install on the docker image (done by the Dockerfile)
git clone http://jwgitlab.cs.mcgill.ca/sarrazin/rnabayespairing.git BayesPairing
######################################################### Build Biorseo ###########################################################
# build here, install later on the docker image (done by the Dockerfile)
mkdir -p results
make -j 8
make clean
rm -rf doc/ obj/
rm -rf obj/ figures/
######################################################## Build Docker container ##################################################
# Execute the Dockerfile and build the image
......