Louis BECQUEY

Removed support for BayesPairing & Jar3d

......@@ -18,27 +18,6 @@ sudo apt update && sudo apt install docker-ce docker-ce-cli containerd.io
* Get the latest version of the HL and IL module models from the [BGSU website](http://rna.bgsu.edu/data/jar3d/models/) and extract the Zip files. Put the HL and IL folders from inside the Zip files into `./data/modules/BGSU`. Note that only the latest Zip is required.
### Install and train BayesPairing
To use Bayespairing, you need to install it on the host machine to train it (build bayesian networks for every RNA motif in your database). It is already installed in the Docker container, but not trained, so you need to train it on your data and tell docker to mount it (see the docker run command below).
Make sure you have Python 3.5+ with packages networkx, numpy, regex, wrapt and biopython. You can install them with pip, on Linux you will need the python3-dev package to build them.
On Windows, on Mac : script coming soon
On Linux:
```
$ sudo -H pip3 install --upgrade pip
$ sudo -H pip3 install networkx numpy regex wrapt biopython
$ git clone http://jwgitlab.cs.mcgill.ca/sarrazin/rnabayespairing.git BayesPairing
$ cd BayesPairing
$ sudo -H pip3 install .
$ cd bayespairing/src
$ python3 parse_sequences.py -d rna3dmotif -seq ACACGGGGUAAGAGCUGAACGCAUCUAAGCUCGAAACCCACUUGGAAAAGAGACACCGCCGAGGUCCCGCGUACAAGACGCGGUCGAUAGACUCGGGGUGUGCGCGUCGAGGUAACGAGACGUUAAGCCCACGAGCACUAACAGACCAAAGCCAUCAU -ss ".................................................................((...............)xxxx(...................................................)xxx).............."
$ python3 parse_sequences.py -d 3dmotifatlas -seq ACACGGGGUAAGAGCUGAACGCAUCUAAGCUCGAAACCCACUUGGAAAAGAGACACCGCCGAGGUCCCGCGUACAAGACGCGGUCGAUAGACUCGGGGUGUGCGCGUCGAGGUAACGAGACGUUAAGCCCACGAGCACUAACAGACCAAAGCCAUCAU -ss ".................................................................((...............)xxxx(...................................................)xxx).............."
```
The training is quite long, but has to be run only once.
### Download the docker image from Docker Hub
`docker pull persalteas/biorseo:latest`
......@@ -48,14 +27,13 @@ Use the following command to run the docker image:
$ docker run
-v `pwd`/data/modules:/modules
-v `pwd`/data/fasta:/biorseo/data/fasta
-v `pwd`/BayesPairing/bayespairing:/byp
-v `pwd`/results:/biorseo/results
persalteas/biorseo
yourexamplejobcommandhere
```
You can replace \`pwd\` by the full path of the biorseo/ root folder. Here we launch the biorseo image with 4 volumes : A first to give BiORSEO access to the module files, a second to give it access to your input file(s), a third for your trained BayesPairing, and a last for it to output the result files of your job. Considering you place your input file 'MyFastaFile.fa' into the `data/fasta` folder, an example job command can be ` ./biorseo.py -i /biorseo/data/fasta/myFastaFile.fa --rna3dmotifs --patternmatch --func B`, so the full run command would be
```
$ docker run -v `pwd`/data/modules:/modules -v `pwd`/data/fasta:/biorseo/data/fasta -v `pwd`/BayesPairing/bayespairing:/byp -v `pwd`/results:/biorseo/results persalteas/biorseo ./biorseo.py -i /biorseo/data/fasta/applications.fa --rna3dmotifs --patternmatch --func B
$ docker run -v `pwd`/data/modules:/modules -v `pwd`/data/fasta:/biorseo/data/fasta -v `pwd`/results:/biorseo/results persalteas/biorseo ./biorseo.py -i /biorseo/data/fasta/applications.fa --rna3dmotifs --patternmatch --func B
```
Note that the paths to the input and output files are paths *inside the Docker container*, and those paths are mounted to folders of the host machine with -v options.
......@@ -68,16 +46,12 @@ Option 2 : Compile and Install from source (without docker, Linux only)
* Create folders for the modules you will use: `mkdir -p data/modules/`. If you plan to use several module sources, add subdirectories :
```bash
mkdir -p data/modules/DESC
mkdir -p data/modules/BGSU
mkdir -p data/modules/RIN
mkdir -p data/modules/ISAURE/Motifs_derniere_version
mkdir -p data/modules/DESC
mkdir -p data/modules/JSON
```
### RNA3DMOTIFS DATA
If you use Rna3Dmotifs, you need to get RNA-MoIP's .DESC dataset: download it from [GitHub](https://github.com/McGill-CSB/RNAMoIP/blob/master/CATALOGUE.tgz). Put all the .desc from the `Non_Redundant_DESC` folder into `./data/modules/DESC`. Otherwise, you also can run Rna3Dmotifs' `catalog` program to get your own DESC modules collection from updated 3D data (download [Rna3Dmotifs](https://rna3dmotif.lri.fr/Rna3Dmotif.tgz)). You also need to move the final DESC files into `./data/modules/DESC`.
### THE RNA 3D MOTIF ATLAS DATA
Get the latest version of the HL and IL module models from the [BGSU website](http://rna.bgsu.edu/data/jar3d/models/) and extract the Zip files. Put the HL and IL folders from inside the Zip files into `./data/modules/BGSU`. Note that only the latest Zip is required.
......@@ -91,52 +65,23 @@ python3 Install_CaRNAval_RINs.py
```
If you do not have the unzip command, download and extract manually the [CaRNAval dataset](http://carnaval.lri.fr/carnaval_dataset.zip) and place the files `RIN.py` and `CaRNAval_1_as_dictionnary.nxpickled` in the folder `data/modules/RIN/`, and run the python script.
### CONTACTS DATA
### RNA3DMOTIFS DATA (DEPRECATED)
If you use contacts, you need to put the motifs.json of Isaure in `data/modules/ISAURE`.
If you use Rna3Dmotifs, you need to get RNA-MoIP's .DESC dataset: download it from [GitHub](https://github.com/McGill-CSB/RNAMoIP/blob/master/CATALOGUE.tgz). Put all the .desc from the `Non_Redundant_DESC` folder into `./data/modules/DESC`. Otherwise, you also can run Rna3Dmotifs' `catalog` program to get your own DESC modules collection from updated 3D data (download [Rna3Dmotifs](https://rna3dmotif.lri.fr/Rna3Dmotif.tgz)). You also need to move the final DESC files into `./data/modules/DESC`.
### DEPENDENCIES
- Make sure you have Python 3.7+ and a C++ compiler (tested with GCC and clang) installed on your distribution. Use a recent one, we use the 2017 C++ standard. The compilation will not work with Ubuntu 16's GCC 5.4 for example.
- Install automake, libeigen3-dev, libboost-program-options-dev and libboost-filesystem-dev, or equivalent packages in your distribution (Eigen 3 and Boost headers).
- Download and install the [ViennaRNA package](https://www.tbi.univie.ac.at/RNA/). We are interested in the libRNA library to build Biorseo first, and then, Jar3D and BayesPairing may use some of the executables (RNAsubopt, RNAfold).
- If you use the pre-complied packages, you need both the "Core" package and the development files.
- If you compile it from source, extract the archive (`tar -xvzf ViennaRNA-2.4.15.tar.gz`), go into the folder (`cd ViennaRNA-2.4.15`), configure and build it (`./configure` and then `make -j 4`) and finally install it with root permissions (`sudo make install`). Everything should be fine. This takes ~15 min. Optionnally, you may want to install optional libraries GSL and MPFR before (libgsl-dev and libmpfr-dev on Ubuntu), for better performance.
- Download and install the [ViennaRNA package](https://www.tbi.univie.ac.at/RNA/). We use their libRNA C++ API to build Biorseo.
- If you use the pre-complied packages, you need the "Core" package and the development files.
- If you compile it from source, extract the archive (`tar -xvzf ViennaRNA-2.5.0.tar.gz`), go into the folder (`cd ViennaRNA-2.5.0`), configure and build it (`./configure` and then `make -j 4`) and finally install it with root permissions (`sudo make install`). Everything should be fine. This takes ~15 min. Optionnally, you may want to install optional libraries GSL and MPFR before (libgsl-dev and libmpfr-dev on Ubuntu), for better performance.
- Download and install [IBM ILOG Cplex optimization studio](https://www.ibm.com/analytics/cplex-optimizer), through their [academic initiative](https://www.ibm.com/academic/home). The Student and the Community Edition versions are fine, but the free version is too limited. Registering as academic is free. We actually don't use the Studio, but we use the development files to compile Biorseo.
### OPTIONAL DEPENDENCIES FOR USE OF JAR3D
- Download and install Java runtime (Tested with Java 14)
- Download the latest JAR3D executable "*jar3d_releasedate.jar*" from [the BGSU website](http://rna.bgsu.edu/data/jar3d/models/).
### OPTIONAL DEPENDENCIES FOR USE OF BAYESPAIRING
- Make sure you can `import RNA` in your favorite python3.X. Otherwise, there might be a ViennaRNA installation issue.
- Depending on your distribution, this might require that you add the /usr/local/lib/python3.X/site-packages/ folder, or the libRNA.so library, to your $PYTHONPATH variable. To do so, edit your `~/.bashrc` file and write `export PYTHONPATH="$PYTHONPATH:/usr/local/lib/python3.X/site-packages"` at the end of the file, replacing 3.X by your actual version of Python. Then close and re-open the terminal.
- On Ubuntu/Debian distros, which have stupid conventions and ignore the site-packages/ folder, you need to move the python package to the dist-packages/ folder. E.g., if you compiled ViennaRNA from source, `sudo mv /usr/local/lib/python3.X/site-packages/RNA /usr/local/lib/python3.X/dist-packages/`.
- Make sure you have Python 3.6+ with pip (packages python3-pip and python3-dev on most distros)
- Clone the latest BayesPairing 2 Git repo, and install it :
```
git clone http://jwgitlab.cs.mcgill.ca/sarrazin/rnabayespairing2.git BayesPairing2
cd BayesPairing2
python3 -m pip install .
cd ..
```
### BUILDING
* You might want to edit `Makefile` if you did not install CPLEX or the libRNA in the default location. Please update the top variables $ICONCERT, $ICPLEX, $LCONCERT, and $LCPLEX with the correct locations.
* Build it: `make -j4`
* Check if the executable file exists: `./bin/biorseo --version`.
### BAYESPAIRING USERS: PREPARE BAYESIAN NETWORKS
We run an example job for it to build the bayesian networks of our modules.
```
cd BayesPairing2/bayespairing/src
python3 parse_sequences.py -seq "UUUUUUAAGGAAGAUCUGGCCUUCCCACAAGGGAAGGCCAAAGAAUUUCCUU" -t 4 -d rna3dmotif -ss "......(((((((.((((((((((((....)))))))))..))).)))))))"
```
Use `-d rna3dmotif`, `-d 3dMotifAtlas_ALL` depending on the module source you are planning to use.
This is a quite long step, but the bayesian networks will be ready for all the future uses.
### RUN BIORSEO
Now you can run biorseo.py, but, as you are not into the Docker environment, you MUST provide the options to tell it the jar3d or BayesPairing locations, for example:
```
......@@ -146,5 +91,4 @@ $ ./biorseo.py
--rna3dmotifs --patternmatch --func B
--biorseo-dir /FULL/path/to/the/root/biorseo/dir
--modules-path=./data/modules/DESC
--jar3d-exec=./jar3d_releasedate.jar OR --bypdir=./BayesPairing2/bayespairing/src
```
......
#!/usr/bin/python3
# coding=utf-8
import getopt, multiprocessing, subprocess, sys
import matplotlib.pyplot as plt
from scipy import stats
from os import path, makedirs, getcwd, chdir, devnull, remove, walk
from matplotlib import colors
from math import sqrt
from multiprocessing import cpu_count, Manager
from shutil import move
from ast import literal_eval
# Parse options
try:
cmd_opts, cmd_args = getopt.getopt( sys.argv[1:],
"bc:f:hi:jl:no:O:pt:v",
[ "verbose","rna3dmotifs","3dmotifatlas","carnaval","contacts","jar3d","bayespairing","patternmatch","func=", "MFE", "MEA",
"help","version","seq=","modules-path=", "jar3d-exec=", "bypdir=", "biorseo-dir=", "first-objective=","output=","theta=",
"interrupt-limit=", "outputf="])
except getopt.GetoptError as err:
print(err)
sys.exit(2)
m = Manager()
running_stats = m.list()
running_stats.append(0) # n_launched
running_stats.append(0) # n_finished
running_stats.append(0) # n_skipped
ignored_nt_dict = {}
def is_canonical_nts(seq):
"""Checks if a nucleotide is ACGU, and stores it in a dictionnary if it's not,
with an occurrence count."""
for c in seq[:-1]:
if c not in "ACGU":
if c in ignored_nt_dict.keys():
ignored_nt_dict[c] += 1
else:
ignored_nt_dict[c] = 1
return False
return True
def absolutize_path(p, directory=False):
if p[0] != '/':
p = getcwd() + '/' + p
if directory and p[-1] != '/':
p = p + '/'
return p
class NoDaemonProcess(multiprocessing.Process):
@property
def daemon(self):
return False
@daemon.setter
def daemon(self, value):
pass
class NoDaemonContext(type(multiprocessing.get_context())):
Process = NoDaemonProcess
class MyPool(multiprocessing.pool.Pool):
# We sub-class multiprocessing.pool.Pool instead of multiprocessing.Pool
# because the latter is only a wrapper function, not a proper class.
def __init__(self, *args, **kwargs):
kwargs['context'] = NoDaemonContext()
super(MyPool, self).__init__(*args, **kwargs)
class Loop:
"""Just a data structure to store module informations."""
def __init__(self, header, subsequence, looptype, position):
self.header = header
self.seq = subsequence
self.type = looptype
self.position = position
class InsertionSite:
"""An interval of an RNA sequence where a particular module could be inserted.
The philosophy of Biorseo is : if this portion can fold like this module,
then it may be a loop of the corresponding type."""
def __init__(self, loop, csv_line):
# BEWARE : jar3d csv output is crap because of java's locale settings.
# On french OSes, it uses commas to delimit the fields AND as floating point delimiters !!
# Parse with caution, and check what the csv output files look like on your system...
info = csv_line.split(',')
self.loop = loop # the Loop object that has been searched with jar3d
# position of the loop's components, so the motif's ones, in the query sequence.
self.position = loop.position
# Motif model identifier of the RNA 3D Motif Atlas
self.atlas_id = info[2]
# alignment score of the subsequence to the motif model
self.score = int(float(info[4]))
# should the motif model be inverted to fit the sequence ?
self.rotation = int(info[-2])
def __lt__(self, other):
"""compare two insertion sites scores"""
return self.score < other.score
def __gt__(self, other):
"""compare two insertion sites scores"""
return self.score > other.score
class Job:
"""A class to store the properties of a tool execution, in order to run similar jobs in parallel."""
def __init__(self, command=None, function=None, estimator=None, args=None, how_many_in_parallel=0, priority=1, timeout=None):
self.cmd_ = command
self.func_ = function
self.estimator_ = estimator
self.args_ = args
self.priority_ = priority
self.timeout_ = timeout
if not how_many_in_parallel:
self.nthreads = cpu_count()
elif how_many_in_parallel == -1:
self.nthreads = cpu_count() - 1
else:
self.nthreads = how_many_in_parallel
class RNA:
"""Just a data structure gathering header, sequence and length."""
def __init__(self, header, seq):
self.seq_ = seq
self.header = header
self.length = len(seq)
class BiorseoInstance:
"""A run of the biorseo tool, to predict one or several RNA sequences' folding(s),
including all the necessary previous run of other tools."""
def __init__(self, opts):
# set default run type options
self.type = "dpm" # direct pattern mathcing
self.modules = "desc" # ...with Rna3dMotifs "DESC" modules
self.func = 'B' # ...and function B
self.estimator = 'b' # ...and estimator MEA"""
self.forward_options = [] # options to pass to the C++ biorseo
self.jobcount = 0
self.joblist = []
# set default file output locations
self.finalname = ""
self.outputf = "" # A folder where to output the computation files
if path.exists("/biorseo/results"): # docker image default
self.outputf = "/biorseo/results"
self.output = "" # A file to store the solutions
# set default data input locations
self.mode = 0 # default is single sequence mode
self.inputfile = ""
self.jar3d_exec = "/jar3d_2014-12-11.jar"
self.bypdir = "/byp/src" # Docker image locations
self.HL_motif_dir = "/modules/BGSU/HL/3.2/lib"
self.IL_motif_dir = "/modules/BGSU/IL/3.2/lib"
self.desc_folder = "/modules/DESC"
self.rin_folder = "/modules/RIN/Subfiles"
self.json_folder = "/modules/ISAURE"
self.biorseo_dir = "/biorseo"
self.run_dir = path.dirname(path.realpath(__file__))
self.temp_dir = "temp/"
count = 0
for opt, arg in opts:
if opt == "-h" or opt == "--help":
print( "Biorseo, Bi-Objective RNA Structure Efficient Optimizer\n"
"Bio-objective integer linear programming framework to predict RNA secondary structures by including known RNA modules.\n"
"Developped by Louis Becquey (louis.becquey@univ-evry.fr), 2018-2020\n\n")
print("Usage:\tYou must provide:\n\t1) a FASTA input file with -i,"
"\n\t2) a module type with --rna3dmotifs, --carnaval, --contacts or --3dmotifatlas"
"\n\t3) one module placement method in { --patternmatch, --jar3d, --bayespairing }"
"\n\t4) one scoring function with --func A, B, C or D"
"\n\t5) one estimator between MFE or MEA"
"\n\n\tIf you are not using the Docker image: \n\t6) --modules-path, --biorseo-dir and (--jar3d-exec or --bypdir)")
print()
print("Options:")
print("-h [ --help ]\t\t\tPrint this help message")
print("--version\t\t\tPrint the program version")
print("-i [ --seq=… ]\t\t\tFASTA file with the query RNA sequence")
print("--rna3dmotifs\t\t\tUse DESC modules from Djelloul & Denise, 2008")
print("--carnaval\t\t\tUse RIN modules from Reinharz & al, 2018")
print("--3dmotifatlas\t\t\tUse the HL and IL loops from BGSU's 3D Motif Atlas (updated)")
print("--contacts\t\t\tUse .json motif from Isaure")
print("-p [ --patternmatch ]\t\tUse regular expressions to place modules in the sequence (requires --rna3dmotifs or --carnaval)")
print("-j [ --jar3d ]\t\t\tUse JAR3D to place modules in the sequence (requires --3dmotifatlas)")
print("-b [ --bayespairing ]\t\tUse BayesPairing2 to place modules in the sequence (requires --rna3dmotifs or --3dmotifatlas)")
print("-o [ --output=… ]\t\tFile to summarize the results")
print("-O [ --outputf=… ]\t\tFolder where to output result and temp files")
print("-f [ --func=… ]\t\t\t(A, B, C or D, default is B)"
" Objective function to score module insertions:\n\t\t\t\t (A) insert big modules (B) insert light, high-order modules"
"\n\t\t\t\t (c) insert modules which score well with the sequence\n\t\t\t\t (D) insert light, high-order modules which score well with the sequence."
"\n\t\t\t\t C and D require cannot be used with --patternmatch.")
print("-e [ --MFE ]\t\t\tUse the estimator MFE(Minimum Free Energy)")
print("-a [ --MEA ]\t\t\tUse the estimator MEA(Maximum Expected Energy)")
print("-c [ --first-objective=… ]\t(default 1) Objective to solve in the mono-objective portions of the algorithm."
"\n\t\t\t\t (1) is the module objective given by --func, (2) is the expected accuracy of the structure.")
print("-l [ --limit=… ]\t\t(default 500) Number of solutions in the Pareto set from which"
"\n\t\t\t\t we give up the computation, before eliminating secondary structure doublons.")
print("-t [ --theta=… ]\t\t(default 0.001) Pairing-probability threshold to consider or not the possibility of pairing")
print("-n [ --disable-pseudoknots ]\tAdd constraints to explicitly forbid the formation of pseudoknots")
print("-v [ --verbose ]\t\tPrint what is happening to stdout")
print("--modules-path=…\t\tPath to the modules data.\n\t\t\t\t The folder should contain modules in the DESC format as output by Djelloul & Denise's"
"\n\t\t\t\t 'catalog' program for use with --rna3dmotifs, or the IL/ and HL/ folders\n\t\t\t\t from a release of the RNA 3D Motif Atlas "
"for use with --3dmotifatlas, or the\n\t\t\t\t data/modules/RIN/Subfiles/ folder for use with --carnaval.\n\t\t\t\t Consider placing these files on a fast I/O device (NVMe SSD, ...)")
print("--jar3d-exec=…\t\t\tPath to the jar3d executable.\n\t\t\t\t Default is /jar3d_2014-12-11.jar, you should use this option if you run"
"\n\t\t\t\t BiORSEO from outside the docker image.")
print("--bypdir=…\t\t\tPath to the BayesParing src folder.\n\t\t\t\t Default is /byp/src, you should use this option if you run"
"\n\t\t\t\t BiORSEO from outside the docker image.")
print("--biorseo-dir=…\t\t\tPath to the BiORSEO root directory.\n\t\t\t\t Default is /biorseo, you should use this option if you run"
"\n\t\t\t\t BiORSEO from outside the docker image. Use the FULL path.")
print("\nExamples:")
print("biorseo.py -i myRNA.fa -O myResultsFolder/ --rna3dmotifs --patternmatch --func B --MEA")
print("biorseo.py -i myRNA.fa -O myResultsFolder/ --3dmotifatlas --jar3d --func B --MEA -l 800")
print("biorseo.py -i myRNA.fa -v --3dmotifatlas --bayespairing --func D --MEA")
print("\nThe allowed module/placement-method/function combinations are:\n")
print(" --patternmatch --bayespairing --jar3d")
print("--rna3dmotifs A. B. A. B. C. D.")
print("--3dmotifatlas A. B. C. D. A. B. C. D.")
print("--carnaval A. B.")
print("--contacts A. B. E. F.\n")
sys.exit()
elif opt == "-i" or opt == "--seq":
self.inputfile = arg
elif opt == "-O" or opt == "--outputf":
self.outputf = absolutize_path(arg, directory=True) # output folder
elif opt == "-o" or opt == "--output":
self.output = absolutize_path(arg) # output file
elif opt == "-f" or opt == "--func":
if arg in ['A', 'B', 'C', 'D', 'E', 'F']:
self.func = arg
else:
raise "Unknown scoring function " + arg
elif opt == "-p" or opt == "--patternmatch":
self.type = "dpm"
elif opt == "-j" or opt == "--jar3d":
self.type = "jar3d"
elif opt == "-b" or opt == "--bayespairing":
self.type = "byp"
elif opt == "--carnaval":
self.modules = "rin"
elif opt == "--contacts":
self.modules = "json"
elif opt == "--rna3dmotifs":
self.modules = "desc"
elif opt == "--3dmotifatlas":
self.modules = "bgsu"
elif opt == "--modules-path":
self.HL_motif_dir = absolutize_path(arg, directory=True) + "HL/3.2/lib"
self.IL_motif_dir = absolutize_path(arg, directory=True) + "IL/3.2/lib"
self.desc_folder = absolutize_path(arg, directory=True)
self.rin_folder = absolutize_path(arg, directory=True)
self.json_folder = absolutize_path(arg, directory=True)
print("Looking for modules in", arg)
elif opt == "--jar3d-exec":
self.jar3d_exec = absolutize_path(arg)
print("Using ", arg)
elif opt == "--bypdir":
self.bypdir = absolutize_path(arg, directory=True)
print("Using trained BayesPairing in", arg)
elif opt == "--biorseo-dir":
self.biorseo_dir = absolutize_path(arg, directory=True)
elif opt == "--version":
subprocess.run([self.biorseo_dir+"bin/biorseo", "--version"])
exit(0)
elif opt == "-l" or opt == "--interrupt-limit":
self.forward_options.append("-l")
self.forward_options.append(arg)
elif opt == "-a" or opt == "--MFA":
if count == 0:
self.forward_options.append("-a")
count = 1
else:
raise "Choose only one estimator between MEA or MFE !"
elif opt == "-e" or opt == "--MFE":
if count == 0:
self.estimator = "a"
self.forward_options.append("-e")
count = 1
else:
raise "Choose only one estimator between MEA or MFE !"
elif opt == "-v" or opt == "--verbose":
self.forward_options.append("-v")
elif opt == "-n" or opt == "--disable-pseudoknots":
self.forward_options.append("-n")
elif opt == "-t" or opt == "--theta":
self.forward_options.append("-t")
self.forward_options.append(arg)
elif opt == "-c" or opt == "--first-objective":
self.forward_options.append("-c")
self.forward_options.append(arg)
# Check the argument combination is OK
self.check_args()
if self.outputf != "":
print("saving files to", self.outputf)
# create jobs
self.list_jobs()
# run them
self.execute_jobs()
# locate the results at the right place
if self.output != "":
subprocess.run(["mv", self.temp_dir+self.finalname.split('/')[-1], self.output], check=True)
if self.outputf != "":
for src_dir, _, files in walk(self.temp_dir):
dst_dir = src_dir.replace(self.temp_dir, self.outputf, 1)
if not path.exists(dst_dir):
makedirs(dst_dir)
for file_ in files:
src_file = path.join(src_dir, file_)
dst_file = path.join(dst_dir, file_)
if path.exists(dst_file):
# in case of the src and dst are the same file
if path.samefile(src_file, dst_file):
continue
remove(dst_file)
move(src_file, dst_dir)
subprocess.run(["rm", "-rf", self.temp_dir], check=True) # remove the temp folder
def check_args(self):
"""Checks that the argument combination passed by user is a realistic project"""
warning = "ERROR: The argument list you passed contains errors:"
issues = False
if self.modules == "desc" and self.type == "jar3d":
issues = True
print(warning)
print("/!\\ Using jar3d requires the 3D Motif Atlas modules. Use --3dmotifatlas instead of --rna3dmotifs or --carnaval.")
if (self.modules == "desc" or self.modules == "rin" or self.modules == "bgsu") and (self.func == 'E' or self.func == 'F'):
issues = True
print(warning)
print("/!\\ Functions E and F are only compatible with the contacts library from Isaure.")
if self.modules == "rin" and self.type != "dpm":
issues = True
print(warning)
print("/!\\ CaRNAval does not support placement tools (yet), or scoring tools. Please use it with --patternmatch, not --jar3d nor --bayespairing.")
if self.modules == "json" and (self.type != "dpm" or self.func == 'C' or self.func == 'D'):
issues = True
print(warning)
print("/!\\ Contacts does not support placement tools (yet). Please use it with --patternmatch, not --jar3d nor --bayespairing.")
if self.modules == "bgsu" and self.type == "dpm":
issues = True
print(warning)
print("/!\\ Cannot place the Atlas loops by direct pattern matching. Please use a dedicated tool --jar3d or --bayespairing to do so.")
if issues:
print("\nUsage:\tYou must provide:\n\t1) a FASTA input file with -i,\n\t2) one module type in { --rna3dmotifs, --carnaval, --3dmotifatlas, --contacts }"
"\n\t3) one module placement method in { --patternmatch, --jar3d, --bayespairing }\n\t4) one scoring function with --func A, B, C or D"
"\n\t4) one estimator for the second objective with --MFE or --MEA"
"\n\n\tIf you are not using the Docker image: \n\t6) --modules-path, --biorseo-dir and (--jar3d-exec or --bypdir)")
print("\nThe allowed module/placement-method/function combinations are:\n")
print(" --patternmatch --bayespairing --jar3d")
print("--rna3dmotifs A. B. A. B. C. D.")
print("--3dmotifatlas A. B. C. D. A. B. C. D.")
print("--carnaval A. B.")
print("--contacts A. B. E. F.\n")
exit(1)
def enumerate_loops(self, s):
"""Lists all the loop positions in a dot-bracket notation"""
def resort(unclosedLoops):
loops.insert(len(loops)-1-unclosedLoops, loops[-1])
loops.pop(-1)
opened = []
openingStart = []
closingStart = []
loops = []
loopsUnclosed = 0
consecutiveOpenings = []
if s[0] == '(':
consecutiveOpenings.append(1)
consecutiveClosings = 0
lastclosed = -1
previous = ''
for i in range(len(s)):
# If we arrive on an unpaired segment
if s[i] == '.':
if previous == '(':
openingStart.append(i-1)
if previous == ')':
closingStart.append(i-1)
# Opening basepair
if s[i] == '(':
if previous == '(':
consecutiveOpenings[-1] += 1
else:
consecutiveOpenings.append(1)
if previous == ')':
closingStart.append(i-1)
# We have something like (...(
if len(openingStart) and openingStart[-1] == opened[-1]:
# Create a new loop starting with this component.
loops.append([(openingStart[-1], i)])
openingStart.pop(-1)
loopsUnclosed += 1
# We have something like )...( or even )(
if len(closingStart) and closingStart[-1] == lastclosed:
# Append a component to existing multiloop
loops[-1].append((closingStart[-1], i))
closingStart.pop(-1)
opened.append(i)
# Closing basepair
if s[i] == ')':
if previous == ')':
consecutiveClosings += 1
else:
consecutiveClosings = 1
# This is not supposed to happen in real data, but whatever.
if previous == '(':
openingStart.append(i-1)
# We have something like (...) or ()
if len(openingStart) and openingStart[-1] == opened[-1]:
# Create a new loop, and save it as already closed (HL)
loops.append([(openingStart[-1], i)])
openingStart.pop(-1)
resort(loopsUnclosed)
# We have something like )...)
if len(closingStart) and closingStart[-1] == lastclosed:
# Append a component to existing multiloop and close it.
loops[-1].append((closingStart[-1], i))
closingStart.pop(-1)
loopsUnclosed -= 1
resort(loopsUnclosed)
if i+1 < len(s):
if s[i+1] != ')': # We are on something like: ).
# an openingStart has not been correctly detected, like in ...((((((...)))...)))
if consecutiveClosings < consecutiveOpenings[-1]:
# Create a new loop (uncompleted)
loops.append([(opened[-2], opened[-1])])
loopsUnclosed += 1
# We just completed an HL+stem, like ...(((...))).., we can forget its info
if consecutiveClosings == consecutiveOpenings[-1]:
consecutiveClosings = 0
consecutiveOpenings.pop(-1)
else: # There are still several basepairs to remember, forget only the processed ones, keep the others
consecutiveOpenings[-1] -= consecutiveClosings
consecutiveClosings = 0
else: # We are on something like: ))
# we are on an closingStart that cannot be correctly detected, like in ...(((...(((...))))))
if consecutiveClosings == consecutiveOpenings[-1]:
# Append a component to the uncomplete loop and close it.
loops[-1].append((i, i+1))
loopsUnclosed -= 1
resort(loopsUnclosed)
# Forget the info about the processed stem.
consecutiveClosings = 0
consecutiveOpenings.pop(-1)
opened.pop(-1)
lastclosed = i
previous = s[i]
# print(i,"=",s[i],"\t", "consec. Op=", consecutiveOpenings,"Cl=",consecutiveClosings)
return loops
def launch_JAR3D_worker(self, loop):
"""Launches a jar3d search of specific loop types (IL or HL) on a RNA subsequence (fasta file)"""
# write motif to a file
modulefolder = self.temp_dir + loop.header[1:] + '/'
if not path.exists(modulefolder):
makedirs(modulefolder)
filename = modulefolder + loop.header[1:]+".fasta"
fasta = open(filename, 'w')
fasta.write('>'+loop.header+'\n'+loop.seq+'\n')
fasta.close()
# Launch Jar3D on it
if loop.type == 'h':
cmd = ["java", "-jar", self.jar3d_exec, loop.header[1:]+".fasta", self.HL_motif_dir+"/all.txt",
loop.header[1:]+".HLloop.csv", loop.header[1:]+".HLseq.csv"]
else:
cmd = ["java", "-jar", self.jar3d_exec, loop.header[1:]+".fasta", self.IL_motif_dir+"/all.txt",
loop.header[1:]+".ILloop.csv", loop.header[1:]+".ILseq.csv"]
with open(self.temp_dir + "log_of_the_run.sh", 'a') as logfile:
logfile.write(' '.join(cmd) + '\n')
chdir(modulefolder)
with open(devnull, 'w') as nowhere:
subprocess.run(cmd, stdout=nowhere)
chdir(self.biorseo_dir)
# Retrieve results
insertion_sites = []
if loop.type == 'h':
capstype = "HL"
else:
capstype = "IL"
csv = open(modulefolder + loop.header[1:] +".%sseq.csv" % capstype, 'r')
l = csv.readline()
while l:
if "true" in l:
insertion_sites.append(InsertionSite(loop, l))
l = csv.readline()
csv.close()
return insertion_sites
def launch_JAR3D(self, seq_, basename):
"""Identify loops in a RNA sequence from RNAsubopt results and search
for 3D motif Atlas modules to place on it using jar3d."""
rnasubopt_preds = []
# Extracting probable loops from RNA-subopt structures
rna = open(self.temp_dir + basename + ".subopt", "r")
lines = rna.readlines()
rna.close()
for i in range(2, len(lines)):
ss = lines[i].split(' ')[0]
if ss not in rnasubopt_preds:
rnasubopt_preds.append(ss)
HLs = []
ILs = []
for ss in rnasubopt_preds:
loop_candidates = self.enumerate_loops(ss)
for loop_candidate in loop_candidates:
if len(loop_candidate) == 1 and loop_candidate not in HLs:
HLs.append(loop_candidate)
if len(loop_candidate) == 2 and loop_candidate not in ILs:
ILs.append(loop_candidate)
# Retrieve subsequences corresponding to the possible loops
loops = []
for i, l in enumerate(HLs):
loops.append(Loop(">HL%d" % (i+1), seq_[l[0][0]-1:l[0][1]], "h", l))
for i, l in enumerate(ILs):
loops.append(Loop(">IL%d" % (i+1), seq_[l[0][0]-1:l[0][1]]+'*'+seq_[l[1][0]-1:l[1][1]], "i", l))
# Scanning loop subsequences against motif database
pool = MyPool(processes=cpu_count())
insertion_sites = [ x for y in pool.map(self.launch_JAR3D_worker, loops) for x in y ]
pool.close()
pool.join()
insertion_sites.sort(reverse=True)
# Writing results to CSV file
c = 0
resultsfile = open(self.biorseo_dir + "/" + self.temp_dir+basename+".sites.csv", "w")
resultsfile.write("Motif,Rotation,Score,Start1,End1,Start2,End2\n")
for site in insertion_sites:
if site.score > 10:
c += 1
string = "FOUND with score %d:\t\t possible insertion of motif " % site.score + site.atlas_id
if site.rotation:
string += " (reversed)"
string += (" on " + site.loop.header + " at positions")
resultsfile.write(site.atlas_id+',' + str(bool(site.rotation))+",%d" % site.score+',')
positions = [','.join([str(y) for y in x]) for x in site.position]
if len(positions) == 1:
positions.append("-,-")
resultsfile.write(','.join(positions)+'\n')
resultsfile.close()
def launch_BayesPairing(self, module_type, seq_, header_):
"""DEPRECATED: now we are using Bayespairing 2.
Searches for module occurrences in the provided RNA sequence using BayesPairing 1."""
# Run BayePairing
cmd = ["python3", "parse_sequences.py", "-seq", self.biorseo_dir + '/' + self.temp_dir + header_ + ".fa", "-d", module_type, "-interm", "1"]
logfile = open(self.biorseo_dir + "/" + self.temp_dir + "log_of_the_run.sh", 'a')
logfile.write(" ".join(cmd))
logfile.write("\n")
logfile.close()
chdir(self.bypdir)
out = subprocess.check_output(cmd).decode('utf-8')
BypLog = out.split('\n')
# Extract results from command output to file
idx = 0
l = BypLog[idx]
while l[:3] != "PUR":
idx += 1
l = BypLog[idx]
insertion_sites = [x for x in literal_eval(l.split(":")[1][1:])]
if module_type == "rna3dmotif":
rna = open(self.biorseo_dir + "/" + self.temp_dir + header_ + ".byp.csv", "w")
else:
rna = open(self.biorseo_dir + "/" + self.temp_dir + header_ + ".bgsubyp.csv", "w")
rna.write("Motif,Score,Start1,End1,Start2,End2...\n")
for i, module in enumerate(insertion_sites):
if len(module):
for (score, positions, _) in zip(*[iter(module)]*3):
pos = []
q = -2
for p in positions:
if p-q > 1:
pos.append(q)
pos.append(p)
q = p
pos.append(q)
rna.write(module_type+str(i)+','+str(int(score)))
for (p, q) in zip(*[iter(pos[1:])]*2):
if q > p:
rna.write(','+str(p)+','+str(q))
rna.write('\n')
rna.close()
def launch_BayesPairing2(self, module_type, seq_, header_):
"""Search for module occurrences in the provided RNA sequence using BayesPairing 2"""
# Run BayesPairing 2
if module_type=="rna3dmotif":
BP2_type = "rna3dmotif"
else:
BP2_type = "3dMotifAtlas_ALL"
cmd = ["python3", "parse_sequences.py", "-seq", self.biorseo_dir + '/' + self.temp_dir + header_ + ".fa", "-samplesize", "1000", "-d", BP2_type ]
logfile = open(self.temp_dir + "log_of_the_run.sh", 'a')
logfile.write(" ".join(cmd))
logfile.write("\n")
logfile.close()
chdir(self.bypdir)
out = subprocess.check_output(cmd).decode('utf-8')
Byp2Log = out.splitlines()
# remove what is not in the original input
Byp2Log.pop(0)
Byp2Log.pop(0)
Byp2Log.pop()
Byp2Log.pop()
# remove the 2 first lines of output
Byp2Log.pop(0)
Byp2Log.pop(0)
# Extract results from command line output to file
lines = []
for i in range(len(Byp2Log)):
line = Byp2Log[i].replace("|", ' ').replace(",", ' ').split()
if line != []:
if "=" in line[0]: #skip the "| MODULE N HITS PERCENTAGE |" part
break
module_sequence = line.pop().split("&") #remove the sequence
if line != []:
new_line = [line[0], line[1]]
num_comp = 0
position_index = 2
while num_comp < len(module_sequence):
len_comp = 0
comp = module_sequence[num_comp]
element = line[position_index].split("-")
new_line.append(element[0])
if position_index >= len(line):
print("!!! Skipped one BP2 result : positions not matching sequence length\n")
new_line = []
break
while len_comp < len(comp):
element = line[position_index].split("-")
if len(element)==1:
len_comp += 1
else:
len_comp += int(element[1]) - int(element[0]) + 1
position_index += 1
new_line.append(element[-1])
num_comp += 1
if new_line != [] :
lines.append(new_line)
if module_type=="rna3dmotif":
rna = open(self.biorseo_dir + "/" + self.temp_dir + header_ + ".byp2.csv", "w")
else:
rna = open(self.biorseo_dir + "/" + self.temp_dir + header_ + ".bgsubyp2.csv", "w")
rna.write("Motif,Score,Start1,End1,Start2,End2...\n")
for line in lines:
rna.write(module_type)
for i in range(len(line)-1):
rna.write(line[i] + ",")
rna.write(line[-1] + "\n")
rna.close()
def execute_job(self, j):
"""Execute the command or function stored in a Job class object."""
running_stats[0] += 1
if j.cmd_ is not None:
logfile = open(self.biorseo_dir + "/" + self.temp_dir + "log_of_the_run.sh", 'a')
logfile.write(" ".join(j.cmd_))
logfile.write("\n")
logfile.close()
print("["+str(running_stats[0]+running_stats[2]) +
'/'+str(self.jobcount)+"]\t"+" ".join(j.cmd_))
r = subprocess.run(j.cmd_, timeout=j.timeout_)
elif j.func_ is not None:
print("["+str(running_stats[0]+running_stats[2])+'/'+str(self.jobcount) +
"]\t"+j.func_.__name__+'('+", ".join([a for a in j.args_])+')')
try:
if j.args_ is not None:
r = j.func_(*j.args_)
else:
r = j.func_()
except Exception as e:
r = 1
print("\033[31m", e, "\033[0m")
pass
running_stats[1] += 1
return r
def execute_jobs(self):
"""Groups job by priority and ability to be run in parallel,
and runs them."""
jobs = {}
self.jobcount = len(self.joblist)
for job in self.joblist:
if job.priority_ not in jobs.keys():
jobs[job.priority_] = {}
if job.nthreads not in jobs[job.priority_].keys():
jobs[job.priority_][job.nthreads] = []
jobs[job.priority_][job.nthreads].append(job)
nprio = max(jobs.keys())
if len(jobs) > 1 :
for i in range(1,nprio+1):
if not len(jobs[i].keys()): continue
# check the thread numbers
different_thread_numbers = [ n for n in jobs[i].keys() ]
different_thread_numbers.sort()
for n in different_thread_numbers:
bunch = jobs[i][n]
if not len(bunch): continue
pool = MyPool(processes=n)
pool.map(self.execute_job, bunch)
pool.close()
pool.join()
else:
for job in self.joblist:
self.execute_job(job)
def list_jobs(self):
"""Determines the required tool runs we need before running the C++ Biorseo,
and creates a job list to run."""
if self.outputf != "":
subprocess.run(["mkdir", "-p", self.outputf]) # Create the output folder
subprocess.run(["mkdir", "-p", self.temp_dir]) # Create the temp folder
# Read fasta file, which can contain one or several RNAs
RNAcontainer = []
print("loading file %s..." % self.inputfile)
db = open(self.inputfile, "r")
c = 0
header = ""
seq = ""
while True:
l = db.readline()
if l == "":
break
c += 1
c = c % 2
if c == 1:
if header != "": # This is our second RNA in the fasta file
self.mode = 1 # we switch to batch mode
header = l[1:-1]
if c == 0:
seq = l.upper()
if is_canonical_nts(seq):
header = header.replace('/', '_').replace('\'','').replace('(','').replace(')','').replace(' ','_').replace('>','')
if (header != ""):
RNAcontainer.append(RNA(header, seq))
if not path.isfile(self.temp_dir + header + ".fa"):
rna = open(self.temp_dir + header + ".fa", "w")
rna.write(">" + header +'\n')
rna.write(seq +'\n')
rna.close()
db.close()
for nt, number in ignored_nt_dict.items():
print("ignored %d sequences because of char %c" % (number, nt))
tot = len(RNAcontainer)
print("Loaded %d RNAs." % (tot))
# define job list
for instance in RNAcontainer:
executable = self.biorseo_dir + "bin/biorseo"
fastafile = self.temp_dir+instance.header+".fa"
method_type = ""
priority = 1
if self.type == "jar3d":
ext = ".jar3d"
method_type = "--jar3dcsv"
csv = self.temp_dir + instance.header + ".sites.csv"
# RNAsubopt
self.joblist.append(Job(command=["RNAsubopt", "-i", fastafile, "--outfile="+ instance.header + ".subopt"], priority=1))
self.joblist.append(Job(command=["mv", instance.header + ".subopt", self.temp_dir], priority=2))
# JAR3D
self.joblist.append(Job(function=self.launch_JAR3D, args=[instance.seq_, instance.header], priority=3, how_many_in_parallel=1))
priority = 4
if self.type == "byp":
method_type = "--bayespaircsv"
if self.modules == "desc":
ext = ".byp2"
csv = self.temp_dir + instance.header + ".byp2.csv"
self.joblist.append(Job(function=self.launch_BayesPairing2, args=["rna3dmotif", instance.seq_, instance.header], how_many_in_parallel=1, priority=1))
elif self.modules == "bgsu":
ext = ".bgsubyp2"
csv = self.temp_dir + instance.header + ".bgsubyp2.csv"
self.joblist.append(Job(function=self.launch_BayesPairing2, args=["3dmotifatlas", instance.seq_, instance.header], how_many_in_parallel=1, priority=1))
priority = 2
if self.type == "dpm":
if self.modules == "desc":
method_type = "--descfolder"
csv = self.desc_folder
ext = ".desc_pm"
elif self.modules == "rin":
method_type = "--rinfolder"
csv = self.rin_folder
ext = ".rin_pm"
elif self.modules == "json":
method_type = "--jsonfolder"
csv = self.json_folder
ext = ".json_pm"
command = [ executable, "-s", fastafile ]
if method_type:
command += [ method_type, csv ]
if self.estimator == 'a':
self.finalname = self.temp_dir + instance.header + ext + self.func + "_MFE"
else:
self.finalname = self.temp_dir + instance.header + ext + self.func + "_MEA"
command += ["-o", self.finalname, "--function", self.func]
command += self.forward_options
self.joblist.append(Job(command=command, priority=priority, timeout=3600, how_many_in_parallel=3))
BiorseoInstance(cmd_opts)
......@@ -1142,178 +1142,9 @@ void MOIP::allowed_motifs_from_rin(args_of_parallel_func arg_struct)
}
}
//Check if the sequence is a rna sequence (ATGC) and replace T by U or remove modified nucleotide if necessary
string check_motif_sequence(string seq) {
std::transform(seq.begin(), seq.end(), seq.begin(), ::toupper);
for (int i = seq.size(); i >= 0; i--) {
if(seq[i] == 'T') {
seq[i] = 'U';
} else if (!(seq [i] == 'A' || seq [i] == 'U' || seq [i] == '&'
|| seq [i] == 'G' || seq [i] == 'C')) {
seq = seq.erase(i,1);
}
}
return seq;
}
// Based on the 2d structure find all positions of the pairings.
vector<Link> search_pairing(string& struc, vector<Component>& v) {
vector<Link> vec;
stack<uint> parentheses;
stack<uint> crochets;
stack<uint> accolades;
stack<uint> chevrons;
uint count = 0;
uint debut = v[count].pos.first;
uint gap = 0;
for (uint i = 0; i < struc.size(); i++) {
if (struc[i] == '(') {
parentheses.push(i + debut + gap - count);
} else if (struc[i] == ')') {
Link l;
l.nts.first = parentheses.top();
l.nts.second = i + debut + gap - count;
vec.push_back(l);
parentheses.pop();
} else if (struc[i] == '[') {
crochets.push(i + debut + gap - count);
} else if (struc[i] == ']') {
Link l;
l.nts.first = crochets.top();
l.nts.second = i + debut + gap - count;
vec.push_back(l);
crochets.pop();
} else if (struc[i] == '{') {
accolades.push(i + debut + gap - count);
} else if (struc[i] == '}') {
Link l;
l.nts.first = accolades.top();
l.nts.second = i + debut + gap - count;
vec.push_back(l);
accolades.pop();
} else if (struc[i] == '<') {
chevrons.push(i + debut + gap - count);
} else if (struc[i] == '>') {
Link l;
l.nts.first = chevrons.top();
l.nts.second = i + debut + gap - count;
vec.push_back(l);
chevrons.pop();
} else if (struc[i] == '&') {
count ++;
gap += v[count].pos.first - v[count - 1].pos.second - 1;
}
}
return vec;
}
size_t count_contacts(string contacts) {
size_t count = 0;
for (uint i = 0; i < contacts.size(); i++) {
if (contacts[i] == '*') {
count++;
}
}
return count;
}
uint find_max_occurrences (string filepath) {
uint max = 0;
std::ifstream in = std::ifstream(filepath);
json js = json::parse(in);
string contacts_id;
for(auto it = js.begin(); it != js.end(); ++it) {
contacts_id = it.key();
for(auto it2 = js[contacts_id].begin(); it2 != js[contacts_id].end(); ++it2) {
string test = it2.key();
if (!test.compare("occurences")) {
uint occ = it2.value();
if (occ > max) {
max = occ;
}
}
}
}
return max;
}
uint find_max_sequence (string filepath) {
uint max = 0;
std::ifstream in = std::ifstream(filepath);
json js = json::parse(in);
string contacts_id;
string seq;
for(auto it = js.begin(); it != js.end(); ++it) {
contacts_id = it.key();
for(auto it2 = js[contacts_id].begin(); it2 != js[contacts_id].end(); ++it2) {
string test = it2.key();
if (!test.compare("sequence")) {
seq = it2.value();
uint size = seq.size();
if (size > max) {
max = size;
}
}
}
}
return max;
}
vector<string> find_components(string sequence, string delimiter) {
vector<string> list;
string seq = sequence;
string subseq;
uint fin = 0;
while(seq.find(delimiter) != string::npos) {
fin = seq.find(delimiter);
subseq = seq.substr(0, fin);
seq = seq.substr(fin + 1);
list.push_back(subseq); // new component sequence
}
if (!seq.empty()) {
list.push_back(seq);
}
return list;
}
vector<uint> find_contacts(vector<string>& struc2d, vector<Component>& v) {
vector<uint> positions;
string delimiter = "*";
uint debut;
for (uint i = 0; i < v.size(); i++) {
debut = v[i].pos.first;
uint pos = struc2d[i].find(delimiter, 0);
while(pos != string::npos && pos <= struc2d[i].size())
{
positions.push_back(pos + debut);
pos = struc2d[i].find(delimiter, pos+1);
}
}
return positions;
}
void MOIP::allowed_motifs_from_json(args_of_parallel_func arg_struct, vector<pair<uint, char>> errors_id)
{
/*
Searches where to place some JSON motifs in the RNA
*/
// Searches where to place some JSON motifs in the RNA
path jsonfile = arg_struct.motif_file;
mutex& posInsertionSites_access = arg_struct.posInsertionSites_mutex;
......@@ -1389,7 +1220,7 @@ void MOIP::allowed_motifs_from_json(args_of_parallel_func arg_struct, vector<pai
if (verbose_) cout << "\t> Considering motif JSON " << contacts_id << "\t" << seq << ", " << struct2d << " ";
Motif temp_motif = Motif(v, contacts_id, nb_contacts, tx_occurrences);
temp_motif.links_ = search_pairing(struct2d, v);
temp_motif.links_ = build_motif_pairs(struct2d, v);
temp_motif.pos_contacts = find_contacts(component_contacts, v);
// Check if the motif can be inserted, checking the basepairs probabilities and theta
......@@ -1421,4 +1252,4 @@ void MOIP::allowed_motifs_from_json(args_of_parallel_func arg_struct, vector<pai
component_sequences.clear();
}
}
}
\ No newline at end of file
......
......@@ -11,16 +11,6 @@ using namespace boost::filesystem;
using namespace std;
using json = nlohmann::json;
struct recursive_directory_range {
typedef recursive_directory_iterator iterator;
recursive_directory_range(path p) : p_(p) {}
iterator begin() { return recursive_directory_iterator(p_); }
iterator end() { return recursive_directory_iterator(); }
path p_;
};
Motif::Motif(void) {}
Motif::Motif(const vector<Component>& v, string PDB) : comp(v), PDBID(PDB)
......@@ -275,89 +265,9 @@ char Motif::is_valid_RIN(const string& rinfile)
return (char) 0;
}
//check that there are as many opening parentheses as closing ones
bool checkSecondaryStructure(string struc)
{
stack<uint> parentheses;
stack<uint> crochets;
stack<uint> accolades;
stack<uint> chevrons;
for (uint i = 0; i < struc.length(); i++)
{
if (struc[i] != '(' && struc[i] != ')'
&& struc[i] != '.' && struc[i] != '&'
&& struc[i] != '[' && struc[i] != ']'
&& struc[i] != '{' && struc[i] != '}'
&& struc[i] != '<' && struc[i] != '>') {
return false;
} else {
for (uint i = 0; i < struc.size(); i++) {
if (struc[i] == '(') {
parentheses.push(i);
} else if (struc[i] == ')') {
if (!parentheses.empty())
parentheses.pop();
else return false;
} else if (struc[i] == '[') {
crochets.push(i);
} else if (struc[i] == ']') {
if (!crochets.empty())
crochets.pop();
else return false;
} else if (struc[i] == '{') {
accolades.push(i);
} else if (struc[i] == '}') {
if (!accolades.empty())
accolades.pop();
else return false;
} else if (struc[i] == '<') {
chevrons.push(i);
} else if (struc[i] == '>') {
if (!chevrons.empty())
chevrons.pop();
else return false;
}
}
}
}
return (parentheses.empty() && crochets.empty() && accolades.empty() && chevrons.empty());
}
//count the number of nucleotide in the motif sequence
size_t count_nucleotide(string& seq) {
size_t count = 0;
for(uint i = 0; i < seq.size(); i++) {
char c = seq.at(i);
if (c != '&') {
count++;
}
}
return count;
}
//count the numbre of '&' in the motif sequence
size_t count_delimiter(string& seq) {
size_t count = 0;
for(uint i = 0; i < seq.size(); i++) {
char c = seq.at(i);
if (c == '&') {
count++;
}
}
return count;
}
vector<pair<uint,char>> Motif::is_valid_JSON(const string& jsonfile)
{
// /!\ returns 0 if no errors
// returns 0 if no errors
std::ifstream motif;
motif = std::ifstream(jsonfile);
json js = json::parse(motif);
......@@ -366,6 +276,7 @@ vector<pair<uint,char>> Motif::is_valid_JSON(const string& jsonfile)
uint fin = 0;
std::string keys[6] = {"contacts", "occurences", "pdb", "pfam", "sequence", "struct2d"};
// Iterating over Motifs
for (auto i = js.begin(); i != js.end(); ++i) {
int j = 0;
......@@ -373,21 +284,20 @@ vector<pair<uint,char>> Motif::is_valid_JSON(const string& jsonfile)
size_t size;
string complete_seq;
string contacts;
//cout << id << ": " << endl;
// Iterating over json keys
for (auto it = js[id].begin(); it != js[id].end(); ++it) {
string test = it.key();
//std::cout << "test: " << test << endl;
if (test.compare(keys[j]))
// it.key() contains the key, and it.value() the field's value.
string curr_key = it.key();
if (curr_key.compare(keys[j]))
{
//std::cout << "error header : keys[" << j << "]: " << keys[j] << " vs test: " << test << endl;
//std::cout << "error header : keys[" << j << "]: " << keys[j] << " vs curr_key: " << curr_key << endl;
errors_id.push_back(make_pair(stoi(id), 'd'));
//return 'd';
}
else if(!test.compare(keys[5])) // This is the secondary structure field
else if(!curr_key.compare(keys[5])) // This is the secondary structure field
{
//std::cout << "struct2d: " << it.value() << endl;
string ss = it.value();
if (ss.empty()) {
errors_id.push_back(make_pair(stoi(id), 'f'));
......@@ -400,14 +310,12 @@ vector<pair<uint,char>> Motif::is_valid_JSON(const string& jsonfile)
break;
}
}
else if(!test.compare(keys[0])) // This is the contacts field
else if(!curr_key.compare(keys[0])) // This is the contacts field
{
//std::cout << "struct2d: " << it.value() << endl;
contacts = it.value();
}
else if (!test.compare(keys[4])) // This is the sequence field
else if (!curr_key.compare(keys[4])) // This is the sequence field
{
//std::cout << "sequence: " << it.value() << "\n";
string seq = it.value();
size = count_nucleotide(seq);
complete_seq = seq;
......@@ -462,6 +370,245 @@ vector<pair<uint,char>> Motif::is_valid_JSON(const string& jsonfile)
return errors_id;
}
//count the number of nucleotide in the motif sequence
size_t count_nucleotide(string& seq) {
size_t count = 0;
for(uint i = 0; i < seq.size(); i++) {
char c = seq.at(i);
if (c != '&') {
count++;
}
}
return count;
}
//count the number of '&' in the motif sequence
size_t count_delimiter(string& seq) {
size_t count = 0;
for(uint i = 0; i < seq.size(); i++) {
char c = seq.at(i);
if (c == '&') {
count++;
}
}
return count;
}
//count the number of '*' in the motif
size_t count_contacts(string& contacts) {
size_t count = 0;
for (uint i = 0; i < contacts.size(); i++) {
if (contacts[i] == '*') {
count++;
}
}
return count;
}
//Check if the sequence is a rna sequence (ATGC) and replace T by U or remove modified nucleotide if necessary
string check_motif_sequence(string seq) {
std::transform(seq.begin(), seq.end(), seq.begin(), ::toupper);
for (int i = seq.size(); i >= 0; i--) {
if(seq[i] == 'T') {
seq[i] = 'U';
} else if (!(seq [i] == 'A' || seq [i] == 'U' || seq [i] == '&'
|| seq [i] == 'G' || seq [i] == 'C')) {
seq = seq.erase(i,1);
}
}
return seq;
}
//check that there are as many opening parentheses as closing ones
bool checkSecondaryStructure(string struc)
{
stack<uint> parentheses;
stack<uint> crochets;
stack<uint> accolades;
stack<uint> chevrons;
for (uint i = 0; i < struc.length(); i++)
{
if (struc[i] != '(' && struc[i] != ')'
&& struc[i] != '.' && struc[i] != '&'
&& struc[i] != '[' && struc[i] != ']'
&& struc[i] != '{' && struc[i] != '}'
&& struc[i] != '<' && struc[i] != '>') {
return false;
} else {
for (uint i = 0; i < struc.size(); i++) {
if (struc[i] == '(') {
parentheses.push(i);
} else if (struc[i] == ')') {
if (!parentheses.empty())
parentheses.pop();
else return false;
} else if (struc[i] == '[') {
crochets.push(i);
} else if (struc[i] == ']') {
if (!crochets.empty())
crochets.pop();
else return false;
} else if (struc[i] == '{') {
accolades.push(i);
} else if (struc[i] == '}') {
if (!accolades.empty())
accolades.pop();
else return false;
} else if (struc[i] == '<') {
chevrons.push(i);
} else if (struc[i] == '>') {
if (!chevrons.empty())
chevrons.pop();
else return false;
}
}
}
}
return (parentheses.empty() && crochets.empty() && accolades.empty() && chevrons.empty());
}
// Converts a dot-bracket motif secondary structure to vector of Link
vector<Link> build_motif_pairs(string& struc, vector<Component>& v) {
vector<Link> vec;
stack<uint> parentheses;
stack<uint> crochets;
stack<uint> accolades;
stack<uint> chevrons;
uint count = 0;
uint debut = v[count].pos.first;
uint gap = 0;
for (uint i = 0; i < struc.size(); i++) {
if (struc[i] == '(') {
parentheses.push(i + debut + gap - count);
} else if (struc[i] == ')') {
Link l;
l.nts.first = parentheses.top();
l.nts.second = i + debut + gap - count;
vec.push_back(l);
parentheses.pop();
} else if (struc[i] == '[') {
crochets.push(i + debut + gap - count);
} else if (struc[i] == ']') {
Link l;
l.nts.first = crochets.top();
l.nts.second = i + debut + gap - count;
vec.push_back(l);
crochets.pop();
} else if (struc[i] == '{') {
accolades.push(i + debut + gap - count);
} else if (struc[i] == '}') {
Link l;
l.nts.first = accolades.top();
l.nts.second = i + debut + gap - count;
vec.push_back(l);
accolades.pop();
} else if (struc[i] == '<') {
chevrons.push(i + debut + gap - count);
} else if (struc[i] == '>') {
Link l;
l.nts.first = chevrons.top();
l.nts.second = i + debut + gap - count;
vec.push_back(l);
chevrons.pop();
} else if (struc[i] == '&') {
count ++;
gap += v[count].pos.first - v[count - 1].pos.second - 1;
}
}
return vec;
}
uint find_max_occurrences(string& filepath) {
uint max = 0;
std::ifstream in = std::ifstream(filepath);
json js = json::parse(in);
string contacts_id;
for(auto it = js.begin(); it != js.end(); ++it) {
contacts_id = it.key();
for(auto it2 = js[contacts_id].begin(); it2 != js[contacts_id].end(); ++it2) {
string test = it2.key();
if (!test.compare("occurences")) {
uint occ = it2.value();
if (occ > max) {
max = occ;
}
}
}
}
return max;
}
uint find_max_sequence(string& filepath) {
uint max = 0;
std::ifstream in = std::ifstream(filepath);
json js = json::parse(in);
string contacts_id;
string seq;
for(auto it = js.begin(); it != js.end(); ++it) {
contacts_id = it.key();
for(auto it2 = js[contacts_id].begin(); it2 != js[contacts_id].end(); ++it2) {
string test = it2.key();
if (!test.compare("sequence")) {
seq = it2.value();
uint size = seq.size();
if (size > max) {
max = size;
}
}
}
}
return max;
}
vector<string> find_components(string& sequence, string delimiter) {
vector<string> list;
string seq = sequence;
string subseq;
uint fin = 0;
while(seq.find(delimiter) != string::npos) {
fin = seq.find(delimiter);
subseq = seq.substr(0, fin);
seq = seq.substr(fin + 1);
list.push_back(subseq); // new component sequence
}
if (!seq.empty()) {
list.push_back(seq);
}
return list;
}
vector<uint> find_contacts(vector<string>& struc2d, vector<Component>& v) {
vector<uint> positions;
string delimiter = "*";
uint debut;
for (uint i = 0; i < v.size(); i++) {
debut = v[i].pos.first;
uint pos = struc2d[i].find(delimiter, 0);
while(pos != string::npos && pos <= struc2d[i].size())
{
positions.push_back(pos + debut);
pos = struc2d[i].find(delimiter, pos+1);
}
}
return positions;
}
bool is_desc_insertible(const string& descfile, const string& rna)
{
std::ifstream motif;
......
......@@ -86,6 +86,18 @@ vector<Motif> load_json_folder(const string& path, const string& r
vector<vector<Component>> find_next_ones_in(string rna, uint offset, vector<string>& vc);
vector<vector<Component>> json_find_next_ones_in(string rna, uint offset, vector<string>& vc);
// utilities for Json motifs
size_t count_nucleotide(string&);
size_t count_delimiter(string&);
size_t count_contacts(string&);
string check_motif_sequence(string);
bool checkSecondaryStructure(string);
vector<Link> build_motif_pairs(string&, vector<Component>&);
uint find_max_occurrences(string&);
uint find_max_sequence(string&);
vector<string> find_components(string&, string);
vector<uint> find_contacts(vector<string>&, vector<Component>&);
// utilities to compare secondary structures:
bool operator==(const Motif& m1, const Motif& m2);
bool operator!=(const Motif& m1, const Motif& m2);
......
......@@ -74,67 +74,57 @@ int main(int argc, char* argv[])
("help,h", "Print the help message")
("version", "Print the program version")
("seq,s", po::value<string>(&inputName)->required(), "Fasta file containing the RNA sequence")
("descfolder,d", po::value<string>(&motifs_path_name), "A folder containing modules in .desc format, as produced by Djelloul & Denise's catalog program")
("rinfolder,x", po::value<string>(&motifs_path_name), "A folder containing CaRNAval's RINs in .txt format, as produced by script transform_caRNAval_pickle.py")
("jsonfolder,a", po::value<string>(&motifs_path_name), "A folder containing a custom motif library in .json format")
("jar3dcsv,j", po::value<string>(&motifs_path_name), "A file containing the output of JAR3D's search for motifs in the sequence, as produced by biorseo.py")
("bayespaircsv,b", po::value<string>(&motifs_path_name), "A file containing the output of BayesPairing's search for motifs in the sequence, as produced by biorseo.py")
("first-objective,c", po::value<unsigned int>(&MOIP::obj_to_solve_)->default_value(2), "Objective to solve in the mono-objective portions of the algorithm")
("descfolder,d", po::value<string>(&motifs_path_name), "A folder containing modules in .desc format, as produced by Djelloul & Denise's catalog program (deprecated)")
("rinfolder,r", po::value<string>(&motifs_path_name), "A folder containing CaRNAval's RINs in .txt format, as produced by script transform_caRNAval_pickle.py")
("jsonfolder,j", po::value<string>(&motifs_path_name), "A folder containing a custom motif library in .json format")
("pre-placed,x", po::value<string>(&motifs_path_name), "A CSV file providing motif insertion sites obtained with another tool.")
("function,f", po::value<char>(&obj_function_nbr)->default_value('B'),
"(A, B, C, D, E or F) Objective function to score module insertions:\n"
" (A) insert big modules\n (B) light, high-order modules\n"
" (C) well-scored modules\n (D) light, high-order, well-scored\n modules\n"
" (E, F) insert big modules with many\n contacts with proteins, different\n ponderations.\n"
" C and D require position scores\n provided by --pre-placed.\n"
" E and F require protein-contact\n information and should be\n used only with --jsonfolder.")
("mfe,E", "Minimize stacking energies\n (leads to MFE extimator)")
("mea,A", "(default) Maximize expected accuracy\n (leads to MEA estimator)")
("first-objective,c", po::value<unsigned int>(&MOIP::obj_to_solve_)->default_value(2),
"(1 or 2) Objective to solve in the mono-objective portions of the algorithm.\n"
" (1) is the module objective,\n given by --function\n"
" (2) is energy-based objective,\n either MFE or MEA")
("output,o", po::value<string>(&outputName), "A file to summarize the computation results")
("theta,t", po::value<float>(&theta_p_threshold)->default_value(1e-3, "0.001"), "Pairing probability threshold to consider or not the possibility of pairing")
("function,f", po::value<char>(&obj_function_nbr)->default_value('B'), "What objective function to use to include motifs: square of motif size in nucleotides like "
"RNA-MoIP (A), light motif size + high number of components (B), site score (C), light motif size + site score + high number of components (D)")
("mfe,E", "Minimize stacking energies as second criteria (should lead to MFE structure)")
("mea,A", "(default) Maximize expected accuracy as second criteria (should lead to MEA structure)")
("disable-pseudoknots,n", "Add constraints forbidding the formation of pseudoknots")
("limit,l", po::value<unsigned int>(&MOIP::max_sol_nbr_)->default_value(500), "Intermediate number of solutions in the Pareto set above which we give up the calculation.")
("verbose,v", "Print what is happening to stdout");
("verbose,v", "Print what is happening to console");
po::variables_map vm;
po::store(po::parse_command_line(argc, argv, desc), vm);
basename = remove_ext(inputName.c_str(), '.', '/');
try {
po::store(po::parse_command_line(argc, argv, desc), vm); // can throw
po::notify(vm);
if (vm.count("help") or vm.count("-h")) {
cout << "Biorseo, bio-objective integer linear programming framework to predict RNA secondary "
"structures by including known RNA modules."
<< endl
<< "developped by Louis Becquey, 2018-2021" << endl
<< "Lénaïc Durand, 2019" << endl
<< "Nathalie Bernard, 2021" << endl
<< endl
<< desc << endl;
cout << "Biorseo, Bi-Objective RNA Structure Efficient Optimizer" << endl
<< "Bio-objective integer linear programming framework to predict RNA secondary structures by including known RNA modules." << endl
<< "Developped by Louis Becquey, 2018-2021\nLénaïc Durand, 2019\nNathalie Bernard, 2021" << endl << endl
<< "Usage:\tYou must provide:\n\t1) a FASTA input file with -s," << endl
<< "\t2) a module type with --rna3dmotifs, --carnaval, --contacts or --pre-placed," << endl
<< "\t3) one module-based scoring function with --func A, B, C, D, E or F," << endl
<< "\t4) one energy-based scoring function with --mfe or --mea," << endl
<< "\t5) how to display results: in console (-v), or in a result file (-o)." << endl
<< endl
<< desc << endl;
return EXIT_SUCCESS;
}
if (vm.count("version")) {
cout << "Biorseo v2.1, dockerized, November 2021" << endl;
return EXIT_SUCCESS;
}
po::notify(vm); // throws on error, so do after help in case there are any problems
if (vm.count("mfe")) mea_or_mfe = 'a';
if (vm.count("mea")) mea_or_mfe = 'b';
if (vm.count("verbose")) verbose = true;
if (vm.count("disable-pseudoknots")) MOIP::allow_pk_ = false;
if (!vm.count("jar3dcsv") and !vm.count("bayespaircsv") and !vm.count("descfolder") and !vm.count("rinfolder") and !vm.count("jsonfolder")) {
cerr << "\033[31mYou must provide at least one of --descfolder, --rinfolder, --jsonfolder, --jar3dcsv or --bayespaircsv.\033[0m See --help "
"for more information."
<< endl;
return EXIT_FAILURE;
}
if ((vm.count("descfolder") or vm.count("jsonfolder") or vm.count("rinfolder")) and (obj_function_nbr == 'C' or obj_function_nbr == 'D')) {
cerr << "\033[31mYou must provide --jar3dcsv or --bayespaircsv to use --function C or --function D.\033[0m See "
"--help for more information."
<< endl;
return EXIT_FAILURE;
}
po::notify(vm); // throws on error, so do after help in case there are any problems
} catch (po::error& e) {
cerr << "ERROR: \033[31m" << e.what() << "\033[0m" << endl;
cerr << desc << endl;
......@@ -158,7 +148,7 @@ int main(int argc, char* argv[])
myRNA = RNA(fa->name(), fa->seq(), verbose);
if (verbose) cout << "\t> " << inputName << " successfuly loaded (" << myRNA.get_RNA_length() << " nt)" << endl;
// load CSV file
// check motif folder exists
if (access(motifs_path_name.c_str(), F_OK) == -1) {
cerr << "\033[31m" << motifs_path_name << " not found\033[0m" << endl;
return EXIT_FAILURE;
......@@ -167,16 +157,14 @@ int main(int argc, char* argv[])
/* FIND PARETO SET */
string source;
if (vm.count("jar3dcsv"))
source = "jar3dcsv";
else if (vm.count("bayespaircsv"))
source = "bayespaircsv";
else if (vm.count("rinfolder"))
if (vm.count("rinfolder"))
source = "rinfolder";
else if (vm.count("descfolder"))
source = "descfolder";
else if (vm.count("jsonfolder"))
source = "jsonfolder";
else if (vm.count("pre-placed"))
source = "csvfile";
else
cerr << "ERR: no source of modules provided !" << endl;
......@@ -254,22 +242,12 @@ int main(int argc, char* argv[])
if (verbose) cout << "Saving structures to " << outputName << "..." << endl;
outfile.open(outputName);
outfile << fa->name() << endl << fa->seq() << endl;
for (uint i = 0; i < myMOIP.get_n_solutions(); i++) {
outfile << myMOIP.solution(i).to_string() << endl << structure_with_contacts(myMOIP.solution(i)) << endl;
string str1 = myMOIP.solution(i).to_string();
// Check if the best score for obj2 is actually included in the results
// string delimiter = "\t";
// string obj2 = str1.substr(str1.find(delimiter) + delimiter.size());
// obj2 = obj2.substr(0, obj2.find(delimiter));
// if (obj2.compare(best_score) == 0)
// flag = true;*/
}
// if (!flag)
// cout << "\033[1m\033[31mBest score not found for " << outputName << " !\033[0m" << endl;
// else
// cout << "OK for "<< outputName << "!" << endl;
outfile.close();
}
......
__'CRYSTAL_STRUCTURE_OF_A_TIGHT-BINDING_GLUTAMINE_TRNA_BOUND_TO_GLUTAMINE_AMINOACYL_TRNA_SYNTHETASE_'_(PDB_00376)
GGGGUAUCGCCAAGCGGUAAGGCACCGGAUUCUGAUUCCGGAGGUCGAGGUUCGAAUCCUCGUACCCCAGCCA
(((((((((([..)))....(((.(((((.({..).))))).])))((((.(.}.).)))))))))))..... + JSON432 + JSON615 32.0000000 19.2280288
................****.................................................****
(((((((((([..)))....(((.((((((....).))))).])))((((.(...).)))))))))))..... + JSON169 + JSON432 + JSON615 48.0000000 19.2123884
................****..........****...................................****
(((((((.(([.....))..(((.(((((.({..).))))).])))((((.(.}.).)))))))))))..... + JSON322 + JSON478 52.0000000 18.3718588
............****................**...................................****
((((((((((...)))....(([.((((((....).)))))...))((((.(...).))))))))))).]... + JSON169 + JSON615 + JSON763 57.0000000 18.3583187
................****..........****.......***...........................**
(((((((((([..)))....(((.((((((....).))))).])))(((.........))))))))))..... + JSON169 + JSON386 + JSON432 + JSON615 64.0000000 18.2180422
................****..........****...............****................****
(((((([(((...))).)..(((.(((((.(...).))))).){))((((.(...).))))]})))))..... + JSON322 + JSON471 80.0000000 17.8060915
........**.....................***................*...........*........**
(((((([((([..))).)..((..((((((....).))))).]{))((((.(...).))))]})))))..... + JSON169 + JSON432 + JSON471 96.0000000 17.7717258
........**............*.......****......................*.....*......****
(((((.((((...)))....(((.(((((.[.....))))).){))((((.(..]).)))))})))))..... + JSON154 + JSON432 + JSON471 105.0000000 17.5888835
........**......**.............****.....................*.....*......****
(((((.(((([..)))....((..((((((....).))))).][))((((.(...).)))))])))))..... + JSON169 + JSON432 + JSON471 + JSON615 112.0000000 17.5772109
........**......****..*.......****......................*.....*......****
(((((.((((...)))....(((.((((([[.....))))).){))(((.....]]..))))})))))..... + JSON154 + JSON386 + JSON432 + JSON471 121.0000000 16.5955778
........**......**.............****..............****...*.....*......****
(((((.(((([..)))....((..((((((....).))))).][))(((.........))))])))))..... + JSON169 + JSON386 + JSON432 + JSON471 + JSON615 128.0000000 16.5828646
........**......****..*.......****...............****...*.....*......****
(((((.(.(([.........{)).(((((.(...).))))).][(}[[[[.[..)].]]]])])))))..... + JSON322 + JSON471 + JSON478 + JSON615 132.0000000 15.2549277
........**..********...........***................*...........*......****
(((((((.............(...((((((....).)))))....)(((.........))))))))))..... + JSON169 + JSON386 + JSON432 + JSON432 + JSON473 + JSON478 + JSON615 141.0000000 14.8831569
........************.**.......****.......****....****...............*****
((((((..(((.........))).(((((.......)))))....(((((.(...).)))))))))))..... + JSON38 2916.0000000 14.7452026
*******.*******......*****.....******............................********
((((((..(((.........))).(((((.......)))))....((((.........))))))))))..... + JSON38 + JSON386 2932.0000000 13.7508564
*******.*******......*****.....******............****............********
((((((..(((.........))).(((((.......)))))....((....(...)....))))))))..... + JSON38 + JSON473 2941.0000000 11.8254010
*******.*******......*****.....******..........****...*..........********
(((((((.............(...((((((....).)))))....)(((.........))))))))))..... + JSON169 + JSON386 + JSON432 + JSON432 + JSON473 + JSON478 + JSON615 141.0000000 14.8831569
........************.**.......****.......****....****..*.............****
(((((.(.(([.........{)).(((((.(...).))))).][(}[[[[.[..)].]]]])])))))..... + JSON322 + JSON471 + JSON478 + JSON615 132.0000000 15.2549277
........**..********...........***......................*.....*......****
(((((.(((([..)))....((..((((((....).))))).][))((((.(...).)))))])))))..... + JSON169 + JSON432 + JSON471 + JSON615 112.0000000 17.5772109
........**......****..*.......****................*...........*......****
(((((.((((...)))....(((.(((((.[.....))))).){))((((.(..]).)))))})))))..... + JSON154 + JSON432 + JSON471 105.0000000 17.5888835
........**......**.............****...............*...........*......****
(((((([((([..))).)..((..((((((....).))))).]{))((((.(...).))))]})))))..... + JSON169 + JSON432 + JSON471 96.0000000 17.7717258
........**............*.......****................*...........*......****
(((((([(((...))).)..(((.(((((.(...).))))).){))((((.(...).))))]})))))..... + JSON322 + JSON471 80.0000000 17.8060915
........**.....................***......................*.....*........**
(((((((((([..)))....(((.(((((.({..).))))).])))((((.(.}.).)))))))))))..... + JSON322 + JSON615 32.0000000 19.2280288
................****............**.....................................**
__'CRYSTAL_STRUCTURE_OF_A_TIGHT-BINDING_GLUTAMINE_TRNA_BOUND_TO_GLUTAMINE_AMINOACYL_TRNA_SYNTHETASE_'_(PDB_00376)
GGGGUAUCGCCAAGCGGUAAGGCACCGGAUUCUGAUUCCGGAGGUCGAGGUUCGAAUCCUCGUACCCCAGCCA
(((((((((([..)))....(((.(((((.({..).))))).])))((((.(.}.).)))))))))))..... + JSON322 + JSON615 1.5000000 19.2280288
................****............**.....................................**
(((((((((([..)))....(((.(((((.(...).))))).])))((((.(...).)))))))))))..... + JSON322 + JSON615 1.5000200 19.2269926
................****............**.....................................**
(((((((.(([.....))..(((.(((((.({..).))))).])))((((.(.}.).)))))))))))..... + JSON322 + JSON478 1.7737056 18.3718588
............****................**...................................****
((((((((((...)))....(([.((((((....).)))))...))((((.(...).))))))))))).]... + JSON169 + JSON615 + JSON763 1.8613531 18.3583187
................****..........****.......***...........................**
.((((((((([..)))....(((.(((((.({..).))))).])))((((.(.}.).))))))))))...... + JSON322 + JSON432 + JSON615 2.0000000 18.2455939
................****............**.................................******
(((((((((({..)))....(((.((((([([..).))))).})))(((....].]..))))))))))..... + JSON322 + JSON386 + JSON615 2.0000200 18.2347086
................****............**...............****..................**
(((((([(((...))).)..(((.(((((.(...).))))).){))((((.(...).))))]})))))..... + JSON322 + JSON471 3.0000000 17.8060915
........**.....................***......................*.....*........**
(((((.((((...)))....(((.(((((.(...).))))).)[))((((.(...).)))))])))))..... + JSON322 + JSON471 + JSON615 3.5000000 17.6115766
........**......****...........***......................*.....*........**
.((((.((((...)))....(((.(((((.(...).))))).)[))((((.(...).)))))]))))...... + JSON322 + JSON432 + JSON471 + JSON615 4.0000000 16.6291417
........**......****...........***......................*.....*....******
.((((.((((...)))....(((.((((([(...).))))).){))(((......]..))))}))))...... + JSON322 + JSON386 + JSON432 + JSON471 + JSON615 4.5000000 15.6358360
........**......****...........***...............****...*.....*....******
(((((.(.(([.........{)).(((((<(...).))))).][(}[[[.....)>..]]])])))))..... + JSON322 + JSON386 + JSON471 + JSON478 + JSON615 4.7737056 14.2616220
........**..********...........***...............****...*.....*......****
.((((.((((...)))....(((.(((((.(...).)))))[){))(....(...)...]))}))))...... + JSON322 + JSON432 + JSON471 + JSON473 + JSON615 4.8613531 13.7313706
........**......****...........***.............****...*.*.....*....******
.((((.((.....[[)....(]].((((([(...).)))))..{[)(((.....]]..))))}))))...... + JSON322 + JSON386 + JSON432 + JSON432 + JSON471 + JSON615 5.0000000 13.7177100
........****.**.****...........***...............****...*.....*....******
(((((.(.((..........[)).(((((.(...).)))))(({(][....[..)]..))])})))))..... + JSON322 + JSON471 + JSON473 + JSON478 + JSON615 5.1350587 12.3759713
........**..********...........***.............****.....*.....*.....*****
.((((.....(..[[)....(]]..(((([(...).))))...{[)(((.....]]..))).}))))...... + JSON322 + JSON386 + JSON432 + JSON471 + JSON615 + JSON632 5.2737056 11.9911280
......****...**.****...**......***...............****...*.....*....******
.((((.((.....[[)....(]].(((((.(...).)))))[[{.)(....(...)..]]))}))))...... + JSON322 + JSON432 + JSON432 + JSON471 + JSON473 + JSON615 5.3613531 11.8157013
........****.**.****...........***.............****...*.*.....*....******
.((....(.....[[)....(]].((((([(...).))))).{{[)(((.....]]..))).}.}))...... + JSON322 + JSON386 + JSON432 + JSON432 + JSON471 + JSON615 + JSON928 5.5000000 11.2444565
...****.****.**.****...........***...............****...*.....*....******
.(........(..[[)....(]].((((([(...).))))).{{[)(((.....]]..))).}..})...... + JSON322 + JSON386 + JSON432 + JSON433 + JSON471 + JSON615 + JSON632 5.7737056 10.1268257
..********...**.****...........***...............****...*.....***..******
.((....(.....[[)....(]].(((((.(...).)))))[[{[)(....(..])..]]).}..))...... + JSON322 + JSON432 + JSON432 + JSON471 + JSON473 + JSON615 + JSON928 5.8613531 9.3610813
...****.****.**.****...........***.............****.....*.....**...******
.((....(.....[[)....(]].((.[.[(...)....)).{{[)(((.....]].]))).}.}))...... + JSON156 + JSON322 + JSON386 + JSON432 + JSON432 + JSON471 + JSON615 + JSON928 6.0000000 8.2749650
...****.****.**.****...........***.****..........****...*.....*....******
.(........(..[[)....(]].(((((.(...).)))))[[{.)(....(...)..]]).}...)...... + JSON322 + JSON432 + JSON433 + JSON471 + JSON473 + JSON615 + JSON632 6.1350587 8.2098083
..********...**.****...........***.............****...*.*.....***..******
.(........(..[[)....(]].((.[.[(...)....)).{{[)(((.....]].]))).}..})...... + JSON156 + JSON322 + JSON386 + JSON432 + JSON433 + JSON471 + JSON615 + JSON632 6.2737056 7.1573341
..********...**.****...........***.****..........****...*.....***..******
.((....(.....[[)....(]].((.[..(...)....)){{<[)(....(..]).]}}).>..))...... + JSON156 + JSON322 + JSON432 + JSON432 + JSON471 + JSON473 + JSON615 + JSON928 6.3613531 6.3915897
...****.****.**.****...........***.****........****.....*.....**...******
.(........(..[[)....(]].((.[..(...)....)){{<.)(....(...).]}}).>...)...... + JSON156 + JSON322 + JSON432 + JSON433 + JSON471 + JSON473 + JSON615 + JSON632 6.6350587 5.2403168
..********...**.****...........***.****........****...*.*.....***..******
.((((.((((...)))....((..(((((.({..).)))))[[<))(....(.}.)..]]))>))))...... + JSON322 + JSON432 + JSON471 + JSON473 + JSON615 4.8613531 13.7314599
........**......****..*.........**.............****...*.*.....*....******
.((((.((((...)))....(((.(((((.(...).))))).)[))((((.(...).)))))]))))...... + JSON322 + JSON432 + JSON471 + JSON615 4.0000000 16.6291417
........**......****...........***................*...........*....******
(((((.((((...)))....(((.(((((.(...).))))).)[))((((.(...).)))))])))))..... + JSON322 + JSON471 + JSON615 3.5000000 17.6115766
........**......****...........***................*...........*........**
(((((([(((...))).)..(((.(((((.(...).))))).){))((((.(...).))))]})))))..... + JSON322 + JSON471 3.0000000 17.8060915
........**.....................***................*...........*........**
__'CRYSTAL_STRUCTURE_OF_A_TIGHT-BINDING_GLUTAMINE_TRNA_BOUND_TO_GLUTAMINE_AMINOACYL_TRNA_SYNTHETASE_'_(PDB_00376)
GGGGUAUCGCCAAGCGGUAAGGCACCGGAUUCUGAUUCCGGAGGUCGAGGUUCGAAUCCUCGUACCCCAGCCA
(((((((((([..)))....(((.(((((.({..).))))).])))((((.(.}.).)))))))))))..... 0.0000000 19.2280288
.........................................................................
(((((((((([..)))....(((.(((((.({..).))))).])))((((.(.}.).)))))))))))..... + JSON322 0.0000000 19.2280288
................................**.....................................**