Cleaned benchmark.py and installation

Louis BECQUEY
Commit b01c7f7763d75741a535492200221a7fbba36b82 (b01c7f77), 1 parent: d73e4a4c
Showing 5 changed files with 53 additions and 102 deletions
Dockerfile
INSTALL.md
benchmark.py
data/modules/ISAURE/benchmark_16-06-2021.json
scripts/build_BiORSEO_docker_image_ubuntu18.sh
--- a/Dockerfile
+++ b/Dockerfile
- FROM ubuntu:bionic
+ # You can pick another Ubuntu version instead, according to the version of the Boost libraries
+ # that you used to compile biorseo.
+ #
+ # Typically, on the machine where you ran 'make', check:
+ # ls /usr/lib/libboost_filesystem.so.*
+ # This gives you the file name of your Boost library, including the version number.
+ # Use the Docker base image of the Ubuntu release that ships this Boost version in its apt sources.
+ FROM ubuntu:focal
 
- # installing dependencies
+ # compiled biorseo
+ COPY . /biorseo/
+ 
+ # Install runtime dependencies
 RUN apt-get update -yq && \
     apt-get upgrade -y && \
-     apt-get install -y python3-dev python3-pip openjdk-11-jre libgsl23 libgslcblas0 libboost-program-options-dev libboost-filesystem-dev && \
+     apt-get install -y libboost-program-options-dev libboost-filesystem-dev && \
     rm -rf /var/lib/apt/lists/*
 
- # compiled biorseo
- COPY . /biorseo 
- # ViennaRNA installer
- ADD "https://www.tbi.univie.ac.at/RNA/download/ubuntu/ubuntu_18_04/viennarna_2.4.14-1_amd64.deb" /
- # jar3d archive
- ADD http://rna.bgsu.edu/data/jar3d/models/jar3d_2014-12-11.jar /
- 
- # install codes
- RUN dpkg -i /viennarna_2.4.14-1_amd64.deb && \
-     apt-get install -f          && \
-     \
-     pip3 install networkx numpy regex wrapt biopython /biorseo/BayesPairing && \
-     \
-     cd / && \
-     rm -rf /biorseo/BayesPairing /ViennaRNA-2.4.13 /ViennaRNA-2.4.13.tar.gz
 WORKDIR /biorseo
\ No newline at end of file
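
The comment added at the top of the Dockerfile asks the reader to match the base image to the Boost version found on the build machine. A minimal Python sketch of that check is given here. The helper names and the version-to-image table are assumptions and not part of this commit; the table lists the Boost versions believed to ship in the stock apt sources of each release and should be verified before relying on it.

```python
import glob
import re

# Assumed mapping from Boost version to Docker base image (verify against the apt sources).
BOOST_TO_UBUNTU = {
    "1.65.1": "ubuntu:bionic",
    "1.71.0": "ubuntu:focal",
    "1.74.0": "ubuntu:jammy",
}


def boost_version(library_filename):
    """Extract 'X.Y.Z' from a name such as 'libboost_filesystem.so.1.71.0'."""
    match = re.search(r"\.so\.(\d+(?:\.\d+)+)$", library_filename)
    return match.group(1) if match else None


def suggest_base_image(library_filename):
    """Return the assumed base image for this library file, or None if unknown."""
    return BOOST_TO_UBUNTU.get(boost_version(library_filename))


if __name__ == "__main__":
    # Same pattern as the 'ls' command quoted in the Dockerfile comment.
    for name in glob.glob("/usr/lib/libboost_filesystem.so.*"):
        print(name, "->", suggest_base_image(name))
```

If the function returns None, the installed Boost version is not in the table and the base image has to be chosen by hand.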
--- a/INSTALL.md
+++ b/INSTALL.md
@@ -14,10 +14,9 @@ sudo apt update && sudo apt install docker-ce docker-ce-cli containerd.io
 ```
 
 ### Download and install the RNA motifs data files:
+ * Move your JSON-formatted or CSV-formatted motif files into the folder.
 * If you use Rna3Dmotifs, you need to get RNA-MoIP's .DESC dataset: download it from [GitHub](https://github.com/McGill-CSB/RNAMoIP/blob/master/CATALOGUE.tgz). Put all the .desc from the `Non_Redundant_DESC` folder into `./data/modules/DESC`. Otherwise, you also can run Rna3Dmotifs' `catalog` program to get your own DESC modules collection from updated 3D data (download [Rna3Dmotifs](https://rna3dmotif.lri.fr/Rna3Dmotif.tgz)). You also need to move the final DESC files into `./data/modules/DESC`.
 
- * Get the latest version of the HL and IL module models from the [BGSU website](http://rna.bgsu.edu/data/jar3d/models/) and extract the Zip files. Put the HL and IL folders from inside the Zip files into `./data/modules/BGSU`. Note that only the latest Zip is required.
- 
 ### Download the docker image from Docker Hub
 `docker pull persalteas/biorseo:latest`
 
@@ -31,9 +30,9 @@ $ docker run
 persalteas/biorseo 
 yourexamplejobcommandhere
 ```
- You can replace \`pwd\` by the full path of the biorseo/ root folder. Here we launch the biorseo image with 4 volumes : A first to give BiORSEO access to the module files, a second to give it access to your input file(s), a third for your trained BayesPairing, and a last for it to output the result files of your job. Considering you place your input file 'MyFastaFile.fa' into the `data/fasta` folder, an example job command can be ` ./biorseo.py -i /biorseo/data/fasta/myFastaFile.fa  --rna3dmotifs --patternmatch --func B`, so the full run command would be 
+ You can replace \`pwd\` by the full path of the biorseo/ root folder. Here we launch the biorseo image with 4 volumes: a first to give BiORSEO access to the module files, a second to give it access to your input file(s), a third for your trained BayesPairing, and a last for it to output the result files of your job. Assuming you place your input file 'MyFastaFile.fa' into the `data/fasta` folder, an example job command can be `./bin/biorseo -s /biorseo/data/fasta/MyFastaFile.fa --descfolder /biorseo/data/modules/DESC --func B -v`, so the full run command would be 
 ```
- $ docker run -v `pwd`/data/modules:/modules -v `pwd`/data/fasta:/biorseo/data/fasta -v `pwd`/results:/biorseo/results persalteas/biorseo ./biorseo.py -i /biorseo/data/fasta/applications.fa --rna3dmotifs --patternmatch --func B
+ $ docker run -v `pwd`/data/modules:/modules -v `pwd`/data/fasta:/biorseo/data/fasta -v `pwd`/results:/biorseo/results persalteas/biorseo ./bin/biorseo -s /biorseo/data/fasta/applications.fa --descfolder /biorseo/data/modules/DESC --func B -v
 ```
 
 Note that the paths to the input and output files are paths *inside the Docker container*, and those paths are mounted to folders of the host machine with -v options.
@@ -83,12 +82,11 @@ If you use Rna3Dmotifs, you need to get RNA-MoIP's .DESC dataset: download it fr
 * Check if the executable file exists: `./bin/biorseo --version`.
 
 ### RUN BIORSEO
- Now you can run biorseo.py, but, as you are not into the Docker environment, you MUST provide the options to tell it the jar3d or BayesPairing locations, for example:
+ Now you can run biorseo but, as you are not in the Docker environment, you MUST provide the options that tell it where the module files are located, for example:
 ```
- $ ./biorseo.py 
- -i ./data/fasta/applications.fa 
- -O ./results/
- --rna3dmotifs --patternmatch --func B 
- --biorseo-dir /FULL/path/to/the/root/biorseo/dir
- --modules-path=./data/modules/DESC 
+ $ ./bin/biorseo
+ -s ./data/fasta/applications.fa 
+ -o result.bi
+ --func B 
+ --descfolder=./data/modules/DESC 
 ```
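
The `docker run` line documented in INSTALL.md can also be assembled programmatically, which avoids quoting mistakes when the root folder changes. The sketch below only reuses the image name, mount points and flags that appear in the diff above; the function name and its parameters are hypothetical.

```python
import os
import shlex
from pathlib import PurePosixPath


def build_docker_command(root, fasta_name, func="B"):
    """Build the argument list for the documented 'docker run' invocation.

    'root' is the biorseo/ root folder on the host; 'fasta_name' is a file
    located in <root>/data/fasta. Both names are illustrative.
    """
    root = PurePosixPath(root)
    return [
        "docker", "run",
        "-v", f"{root / 'data' / 'modules'}:/modules",
        "-v", f"{root / 'data' / 'fasta'}:/biorseo/data/fasta",
        "-v", f"{root / 'results'}:/biorseo/results",
        "persalteas/biorseo",
        "./bin/biorseo",
        "-s", f"/biorseo/data/fasta/{fasta_name}",
        "--descfolder", "/biorseo/data/modules/DESC",
        "--func", func,
        "-v",
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(build_docker_command(os.getcwd(), "applications.fa")))
```

The list form can be passed directly to `subprocess.run` once the printed command looks right.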
--- a/benchmark.py
+++ b/benchmark.py
@@ -273,15 +273,15 @@ class RNA:
 		#     print(filename, "not found !")
 
 	def load_biokop_results(self):
- 		filename = outputDir+"PK/"+basename+".biok"
+ 		filename = outputDir+"PK/"+self.basename+".biok"
 		if path.isfile(filename):
 			rna = open(filename, "r")
 			lines = rna.readlines()
 			rna.close()
 			for i in range(2, len(lines)):
- 				ss = lines[i].split(' ')[0].split('\t')[0]
- 				method.predictions.append(ss)
- 				method.ninsertions.append(lines[i].count('+'))
+ 				ss = lines[i].split('\t')[0]
+ 				self.get_method("Biokop-mode").predictions.append(ss)
+ 				self.get_method("Biokop-mode").ninsertions.append(0)
 
 	def load_results(self, include_noPK=False):
 		if "Biokop-mode" in self.meth_idx.keys():
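
The parsing done by `load_biokop_results` can be isolated into a small pure function, which makes it testable without the `RNA` class. This is a sketch under assumptions: it supposes, as the loop above does, that the first two lines of a `.biok` file are headers and that each following line starts with the secondary structure followed by a tab. The function name and the sample lines are hypothetical, and unlike the committed code it skips empty lines.

```python
def parse_biokop_lines(lines):
    """Return the structure strings found in the lines of a .biok file.

    Assumes two header lines, then one 'structure<TAB>other fields' entry per line.
    """
    structures = []
    for line in lines[2:]:
        first_field = line.rstrip("\n").split("\t")[0]
        if first_field:  # ignore blank lines (the committed code does not)
            structures.append(first_field)
    return structures


if __name__ == "__main__":
    # Hypothetical file content, for illustration only.
    sample = [">example\n", "GGGAAACCC\n", "(((...)))\t-1.5\n", ".........\t0.0\n", "\n"]
    print(parse_biokop_lines(sample))
```

Reading the file with `with open(filename) as f: lines = f.readlines()` would also close it automatically if an exception occurs.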
@@ -905,28 +905,24 @@ if __name__ == '__main__':
 	colors = [
 		'#911eb4', #purple
 		'#000075', #navy
- 		'#ffe119', '#ffe119', # yellow
 		'#e6194B', '#e6194B', #red
- 		'#3cb44b', '#3cb44b', #green
 		'#4363d8', '#4363d8', #blue
+ 		'#3cb44b', '#3cb44b', #green
 	]
 
 	def plot_best_MCCs(x_noPK_fully, x_PK_fully, x_pseudobase_fully):
 
 		print("Best MCCs...")
 		labels = [
- 			"Biokop-mode\n", "RNAsubopt",
- 			"$f_{1A}$", "$f_{1B}$",
- 			"$f_{1A}$", "$f_{1B}$",
+ 			"Biokop\nmode", "RNA\nsubopt",
 			"$f_{1A}$", "$f_{1B}$",
 			"$f_{1A}$", "$f_{1B}$",
 			"$f_{1A}$", "$f_{1B}$",
 		]
 
- 		fig, axes = plt.subplots(nrows=3, ncols=1, figsize=(10,5), dpi=150)
+ 		fig, axes = plt.subplots(nrows=3, ncols=1, figsize=(7,5), dpi=150)
 		fig.suptitle(" \n ")
- 		fig.subplots_adjust(left=0.1, right=0.97, top=0.83, bottom=0.05)
- 
+ 		fig.subplots_adjust(left=0.18, right=0.97, top=0.83, bottom=0.05)
 
 
 		# Line 1 : no Pseudoknots
@@ -949,10 +945,10 @@ if __name__ == '__main__':
 		axes[0].set_ylabel("(A)\nmax MCC\n(%d RNAs)" % (len(x_noPK_fully[0])), fontsize=12)
 
 		# Line 2 : Pseudoknots supported
- 		xpos = [ 0 ] + [ i for i in range(4,20) ]
+ 		xpos = [ 0 ] + [ 1+i for i in range(1, len(x_PK_fully)) ]
 		vplot = axes[1].violinplot(x_PK_fully, showmeans=False, showmedians=False, showextrema=False,
 								   points=len(x_PK_fully[0]), positions=xpos)
- 		for patch, color in zip(vplot['bodies'], colors[:1] + colors[4:]):
+ 		for patch, color in zip(vplot['bodies'], [colors[0]] + colors[2:]):
 			patch.set_facecolor(color)
 			patch.set_edgecolor(color)
 			patch.set_alpha(0.5)
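
The new `xpos` and color expressions in this hunk both skip index 1, so the second panel leaves an empty slot where the first panel shows its second label. The sketch below reproduces the two expressions with the `colors` list as it stands after this commit. The helper names are hypothetical, and the count of 7 series is an assumption inferred from the 8 tick labels.

```python
# Color list as defined after this commit.
colors = [
    '#911eb4',             # purple
    '#000075',             # navy
    '#e6194B', '#e6194B',  # red
    '#4363d8', '#4363d8',  # blue
    '#3cb44b', '#3cb44b',  # green
]


def violin_positions(n_series):
    """Positions used for the pseudoknot panel: 0, then 2, 3, ... (slot 1 stays empty)."""
    return [0] + [1 + i for i in range(1, n_series)]


def pseudoknot_colors(palette):
    """Colors paired with those positions: the first color, then everything from index 2."""
    return [palette[0]] + palette[2:]


if __name__ == "__main__":
    n_series = 7  # assumed number of data series in x_PK_fully
    for pos, color in zip(violin_positions(n_series), pseudoknot_colors(colors)):
        print(pos, color)
```

Because positions and colors are derived from the data length and the palette rather than hard-coded ranges, removing a method from the benchmark no longer requires editing the indices by hand.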
@@ -986,13 +982,13 @@ if __name__ == '__main__':
 
 		for ax in axes:
 			ax.set_ylim((0.0, 1.01))
- 			ax.set_xlim((-1, 20))
+ 			ax.set_xlim((-1, 8))
 			yticks = [ i/10 for i in range(0, 11, 2) ]
 			ax.set_yticks(yticks)
 			for y in yticks:
 				ax.axhline(y=y, color="grey", linestyle="--", linewidth=1)
 			ax.tick_params(top=False, bottom=False, labeltop=False, labelbottom=False)
- 			ax.set_xticks([i for i in range(20)])
+ 			ax.set_xticks([i for i in range(8)])
 		axes[0].tick_params(top=True, bottom=False, labeltop=True, labelbottom=False)
 		axes[0].set_xticklabels(labels)
 		for i, tick in enumerate(axes[0].xaxis.get_major_ticks()):
@@ -1006,9 +1002,9 @@ if __name__ == '__main__':
 
 		# Figure : number of solutions
 		print("Number of solutions...")
- 		plt.figure(figsize=(9,2.5), dpi=80)
+ 		plt.figure(figsize=(5,3), dpi=80)
 		plt.suptitle(" \n ")
- 		plt.subplots_adjust(left=0.05, right=0.97, top=0.6, bottom=0.05)
+ 		plt.subplots_adjust(left=0.1, right=0.97, top=0.72, bottom=0.05)
 		xpos = [ x for x in range(len(n)) ]
 		for y in [ 10*x for x in range(8) ]:
 			plt.axhline(y=y, color="grey", linestyle="-", linewidth=0.5)
@@ -1019,24 +1015,15 @@ if __name__ == '__main__':
 			patch.set_edgecolor(color)
 			patch.set_alpha(0.5)
 		labels = [
- 			"Biokop",
- 			"RNAsubopt","RNA-MoIP\n1by1", "RNA-MoIP\nchunk",
+ 			"Biokop\nmode", "RNA\nsubopt",
 			"$f_{1A}$", "$f_{1B}$",
- 			"$f_{1A}$", "$f_{1B}$", "$f_{1C}$", "$f_{1D}$",
- 			"$f_{1A}$", "$f_{1B}$", "$f_{1C}$", "$f_{1D}$",
- 			"$f_{1A}$", "$f_{1B}$", "$f_{1C}$", "$f_{1D}$",
+ 			"$f_{1A}$", "$f_{1B}$", 
 			"$f_{1A}$", "$f_{1B}$"
 		]
- 		plt.xlim((-1,20))
+ 		plt.xlim((-1,8))
 		plt.tick_params(top=False, bottom=False, labeltop=False, labelbottom=False)
 		plt.xticks([ i for i in range(len(labels))], labels)
 		plt.tick_params(top=True, bottom=False, labeltop=True, labelbottom=False)
- 		for i, tick in enumerate(plt.gca().xaxis.get_major_ticks()):
- 			if i<4: # Reduce size of RNA-MoIP labels to stay readable
- 				# tick.label2.set_fontsize(8)
- 				tick.label2.set_rotation(90)
- 			else:
- 				tick.label2.set_fontsize(12)
 		plt.yticks([ 20*x for x in range(3) ])
 		plt.ylim((0,40))
 		plt.savefig("number_of_solutions.png")
@@ -1044,11 +1031,11 @@ if __name__ == '__main__':
 		# Figure : max number of insertions and ratio
 		fig, axes = plt.subplots(nrows=2, ncols=1, figsize=(10,4), dpi=80)
 		fig.suptitle(" \n ")
- 		fig.subplots_adjust(left=0.09, right=0.99, top=0.7, bottom=0.05)
+ 		fig.subplots_adjust(left=0.09, right=0.99, top=0.85, bottom=0.05)
 		
 		# Figure : max inserted
 		print("Max inserted...")
- 		xpos = [ x for x in range(18) ]
+ 		xpos = [ x for x in range(len(max_i)) ]
 		axes[0].set_yticks([ 5*x for x in range(3) ])
 		for y in [ 2*x for x in range(7) ]:
 			axes[0].axhline(y=y, color="grey", linestyle="-", linewidth=0.5)
@@ -1061,14 +1048,13 @@ if __name__ == '__main__':
 
 		# Figure : insertion ratio
 		print("Ratio of insertions...")
- 		xpos = [ 0 ] + [ x for x in range(2, 1+len(r)) ]
 		axes[1].set_ylim((-0.01, 1.01))
 		yticks = [ 0, 0.5, 1.0 ]
 		axes[1].set_yticks(yticks)
 		for y in yticks:
 			axes[1].axhline(y=y, color="grey", linestyle="-", linewidth=0.5)
 		vplot = axes[1].violinplot(r, showmeans=False, showmedians=False, showextrema=False, points=len(r[0]), positions=xpos)
- 		for patch, color in zip(vplot['bodies'], [colors[2]] + colors[4:]):
+ 		for patch, color in zip(vplot['bodies'], colors[2:]):
 			patch.set_facecolor(color)
 			patch.set_edgecolor(color)
 			patch.set_alpha(0.5)
@@ -1078,21 +1064,15 @@ if __name__ == '__main__':
 
 		labels = labels[2:]
 		for ax in axes:
- 			ax.set_xlim((-1,18))
+ 			ax.set_xlim((-1,6))
 			ax.tick_params(top=False, bottom=False, labeltop=False, labelbottom=False)
- 			ax.set_xticks([ i for i in range(18)])
+ 			ax.set_xticks([ i for i in range(6)])
 		axes[0].tick_params(top=True, bottom=False, labeltop=True, labelbottom=False)
 		axes[0].set_xticklabels(labels)
 		for i, tick in enumerate(axes[0].xaxis.get_major_ticks()):
- 			if i<2: # Reduce size of RNA-MoIP labels to stay readable
- 				# tick.label2.set_fontsize(9)
- 				tick.label2.set_rotation(90)
- 			else:
- 				tick.label2.set_fontsize(12)
+ 			tick.label2.set_fontsize(12)
 
 	plot_best_MCCs(x_noPK_fully, x_PK_fully, x_pseudobase_fully)
 	plt.savefig("best_MCCs.png")
 	plot_more_info()
 	plt.savefig("detailed_stats.png")
- 	compare_subopt_MoIP()
- 	plt.savefig("compare_subopt_MOIP.png")
--- a/data/modules/ISAURE/benchmark_16-06-2021.json deleted 100644 → 0
+++ b/data/modules/ISAURE/benchmark_16-06-2021.json deleted 100644 → 0
--- a/scripts/build_BiORSEO_docker_image_ubuntu18.sh
+++ b/scripts/build_BiORSEO_docker_image_ubuntu18.sh
@@ -3,16 +3,13 @@
 echo "WARNING: The purpose of this file is to document how the docker image was built.";
 echo "You cannot execute it directly, because of licensing reasons. Please get your own:";
 echo "- CPLEX academic version: cplex_installer_12.8_Student.bin";
- echo "- Nupack header files: nupack_3.2.2.tar.gz";
 exit 0;
 
 cd ../
 THISDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
 
 ####################################################### Dependencies ##############################################################
- sudo apt install -y clang-7 cmake make automake libboost-program-options-dev libboost-filesystem-dev openjdk-11-jre
- sudo update-alternatives --install /usr/bin/clang++ clang++ /usr/bin/clang++-7 100
- sudo update-alternatives --install /usr/bin/clang clang /usr/bin/clang-7 100
+ sudo apt install -y make automake libgsl-dev libmpfr-dev libeigen3-dev libboost-program-options-dev libboost-filesystem-dev
 
 # CPLEX: only to build biorseo
 # HERE YOU SHOULD GET YOUR OWN cplex_installer_12.8_Student.bin ! I am not allowed to share mine anymore.
@@ -20,39 +17,20 @@ chmod +x cplex_installer_12.8_Student.bin
 printf "4\n\n1\n\n\n\n\n" | sudo ./cplex_installer_12.8_Student.bin
 rm cplex_installer_12.8_Student.bin
 
- # Eigen: only to build biorseo (no need to give it to the docker image)
- wget http://bitbucket.org/eigen/eigen/get/3.3.7.tar.gz -O eigen_src.tar.gz
- tar -xf eigen_src.tar.gz
- cd eigen-eigen-323c052e1731
- mkdir build
- cd build
- cmake ..
- sudo make install
- cd ../..
- rm -rf eigen_src.tar.gz eigen-eigen-323c052e1731
- 
- # Nupack: only to build biorseo (no need to give it to the docker image)
- #curl -u yourname@yourUni.com:yourPassword http://www.nupack.org/downloads/serve_file/nupack3.2.2.tar.gz --output nupack3.2.2.tar.gz
- tar -xf nupack3.2.2.tar.gz
- cd nupack3.2.2
- mkdir build
- cd build
- cmake ..
- make -j8
+ # ViennaRNA (to build Biorseo with libRNA)
+ wget https://www.tbi.univie.ac.at/RNA/download/sourcecode/2_5_x/ViennaRNA-2.5.0.tar.gz
+ tar xzf ViennaRNA-2.5.0.tar.gz
+ cd ViennaRNA-2.5.0
+ ./configure
+ make -j 8
 sudo make install
- cd ../..
- sudo cp nupack3.2.2/src/thermo/*.h /usr/local/include/nupack/thermo/
- rm -rf nupack3.2.2.tar.gz nupack3.2.2/
- 
- # BayesPairing: install on the docker image (done by the Dockerfile)
- git clone http://jwgitlab.cs.mcgill.ca/sarrazin/rnabayespairing.git BayesPairing
 
 ######################################################### Build Biorseo ###########################################################
 # build here, install later on the docker image (done by the Dockerfile)
 mkdir -p results
 make -j 8
 make clean
- rm -rf doc/ obj/
+ rm -rf obj/ figures/
 
 ######################################################## Build Docker container ##################################################
 # Execute the Dockerfile and build the image