Showing 7 changed files with 9 additions and 5 deletions
... | @@ -94,6 +94,8 @@ The detailed list of options is below: | ... | @@ -94,6 +94,8 @@ The detailed list of options is below: |
94 | -h [ --help ] Print this help message | 94 | -h [ --help ] Print this help message |
95 | --version Print the program version | 95 | --version Print the program version |
96 | 96 | ||
97 | +-f [ --full-inference ] Infer new 3D->family mappings even if Rfam already provides some. Yields more copies of chains | ||
98 | + mapped to different families. | ||
97 | -r 4.0 [ --resolution=4.0 ] Maximum 3D structure resolution to consider an RNA chain. | 99 | -r 4.0 [ --resolution=4.0 ] Maximum 3D structure resolution to consider an RNA chain. |
98 | -s Run statistics computations after completion | 100 | -s Run statistics computations after completion |
99 | --extract Extract the portions of 3D RNA chains to individual mmCIF files. | 101 | --extract Extract the portions of 3D RNA chains to individual mmCIF files. |
... | @@ -105,7 +107,7 @@ The detailed list of options is below: | ... | @@ -105,7 +107,7 @@ The detailed list of options is below: |
105 | RNAcifs/ Full structures containing RNA, in mmCIF format | 107 | RNAcifs/ Full structures containing RNA, in mmCIF format |
106 | rna_mapped_to_Rfam/ Extracted 'pure' RNA chains | 108 | rna_mapped_to_Rfam/ Extracted 'pure' RNA chains |
107 | datapoints/ Final results in CSV file format. | 109 | datapoints/ Final results in CSV file format. |
108 | ---seq-folder=… Path to a folder to store the sequence and alignment files. | 110 | +--seq-folder=… Path to a folder to store the sequence and alignment files. Subfolders will be: |
109 | rfam_sequences/fasta/ Compressed hits to Rfam families | 111 | rfam_sequences/fasta/ Compressed hits to Rfam families |
110 | realigned/ Sequences, covariance models, and alignments by family | 112 | realigned/ Sequences, covariance models, and alignments by family |
111 | --no-homology Do not try to compute PSSMs and do not align sequences. | 113 | --no-homology Do not try to compute PSSMs and do not align sequences. |
... | @@ -117,11 +119,12 @@ The detailed list of options is below: | ... | @@ -117,11 +119,12 @@ The detailed list of options is below: |
117 | --update-homologous Re-download Rfam and SILVA databases, realign all families, and recompute all CSV files | 119 | --update-homologous Re-download Rfam and SILVA databases, realign all families, and recompute all CSV files |
118 | --from-scratch Delete database, local 3D and sequence files, and known issues, and recompute. | 120 | --from-scratch Delete database, local 3D and sequence files, and known issues, and recompute. |
119 | --archive Create a tar.gz archive of the datapoints text files, and update the link to the latest archive | 121 | --archive Create a tar.gz archive of the datapoints text files, and update the link to the latest archive |
122 | +--no-logs Do not save per-chain logs of the numbering modifications | ||
120 | ``` | 123 | ``` |
121 | 124 | ||
122 | Typical usage: | 125 | Typical usage: |
123 | ``` | 126 | ``` |
124 | -nohup bash -c 'time ~/Projects/RNANet/RNAnet.py --3d-folder ~/Data/RNA/3D/ --seq-folder ~/Data/RNA/sequences -s --archive' & | 127 | +nohup bash -c 'time ~/Projects/RNANet/RNAnet.py --3d-folder ~/Data/RNA/3D/ --seq-folder ~/Data/RNA/sequences -s' & |
125 | ``` | 128 | ``` |
126 | 129 | ||
127 | ## Post-computation task: estimate quality | 130 | ## Post-computation task: estimate quality | ... | ... |
... | @@ -11,7 +11,7 @@ | ... | @@ -11,7 +11,7 @@ |
11 | # - Use a specialised database (SILVA) : better alignments (we guess?), but two kinds of jobs | 11 | # - Use a specialised database (SILVA) : better alignments (we guess?), but two kinds of jobs |
12 | # - Use cmalign --small everywhere (homogeneity) | 12 | # - Use cmalign --small everywhere (homogeneity) |
13 | # Moreover, --small requires --nonbanded --cyk, which means the output alignment is the optimally scored one. | 13 | # Moreover, --small requires --nonbanded --cyk, which means the output alignment is the optimally scored one. |
14 | -# To date, we trust Infernal as the best tool to realign RNA. Is it ? | 14 | +# To date, we trust Infernal as the best tool to realign ncRNA. Is it ? |
15 | 15 | ||
16 | # Contact: louis.becquey@univ-evry.fr (PhD student), fariza.tahi@univ-evry.fr (PI) | 16 | # Contact: louis.becquey@univ-evry.fr (PhD student), fariza.tahi@univ-evry.fr (PI) |
17 | 17 | ||
... | @@ -28,7 +28,7 @@ pd.set_option('display.max_rows', None) | ... | @@ -28,7 +28,7 @@ pd.set_option('display.max_rows', None) |
28 | LSU_set = ["RF00002", "RF02540", "RF02541", "RF02543", "RF02546"] # From Rfam CLAN 00112 | 28 | LSU_set = ["RF00002", "RF02540", "RF02541", "RF02543", "RF02546"] # From Rfam CLAN 00112 |
29 | SSU_set = ["RF00177", "RF02542", "RF02545", "RF01959", "RF01960"] # From Rfam CLAN 00111 | 29 | SSU_set = ["RF00177", "RF02542", "RF02545", "RF01959", "RF01960"] # From Rfam CLAN 00111 |
30 | 30 | ||
31 | -with sqlite3.connect("results/RNANet.db") as conn: | 31 | +with sqlite3.connect(os.getcwd()+"/results/RNANet.db") as conn: |
32 | df = pd.read_sql("SELECT rfam_acc, max_len, nb_total_homol, comput_time, comput_peak_mem FROM family;", conn) | 32 | df = pd.read_sql("SELECT rfam_acc, max_len, nb_total_homol, comput_time, comput_peak_mem FROM family;", conn) |
33 | 33 | ||
34 | to_remove = [ f for f in df.rfam_acc if f in LSU_set+SSU_set ] | 34 | to_remove = [ f for f in df.rfam_acc if f in LSU_set+SSU_set ] |
... | @@ -74,7 +74,7 @@ ax.set_ylabel("Maximum length of sequences ") | ... | @@ -74,7 +74,7 @@ ax.set_ylabel("Maximum length of sequences ") |
74 | ax.set_zlabel("Computation time (s)") | 74 | ax.set_zlabel("Computation time (s)") |
75 | 75 | ||
76 | plt.subplots_adjust(wspace=0.4) | 76 | plt.subplots_adjust(wspace=0.4) |
77 | -plt.savefig("results/cmalign_jobs_performance.png") | 77 | +plt.savefig(os.getcwd()+"/results/cmalign_jobs_performance.png") |
78 | 78 | ||
79 | # # ======================================================== | 79 | # # ======================================================== |
80 | # # Linear Regression of max_mem as function of max_length | 80 | # # Linear Regression of max_mem as function of max_length | ... | ... |
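
The last two hunks replace relative paths such as `"results/RNANet.db"` with `os.getcwd()+"/results/..."`. Both forms resolve against the current working directory, so the change mainly makes that dependency explicit. A minimal sketch of the same idea using `os.path.join` follows. The helper `results_path` and its `base_dir` parameter are hypothetical names introduced only for this illustration and are not part of RNANet; the temporary directory stands in for a real working directory.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def results_path(filename, base_dir=None):
    """Return the path to `filename` inside <base_dir>/results/.

    Hypothetical helper: `base_dir` defaults to the current working
    directory, which mirrors os.getcwd()+"/results/..." in the diff.
    """
    if base_dir is None:
        base_dir = os.getcwd()
    return os.path.join(base_dir, "results", filename)


# Demonstration in a throwaway directory, so that no real database is touched.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "results"))
    db_path = results_path("RNANet.db", base_dir=tmp)

    # `with sqlite3.connect(...)` only manages the transaction; closing()
    # is what actually closes the connection at the end of the block.
    with closing(sqlite3.connect(db_path)) as conn:
        conn.execute("CREATE TABLE family (rfam_acc TEXT)")
        conn.commit()

    db_created = os.path.exists(db_path)
```

Using `os.path.join` avoids hard-coding the `/` separator, and an optional `base_dir` argument would let the script run from a directory other than the project root. Whether that flexibility is wanted here is a design choice for the authors.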