Showing 7 changed files with 10 additions and 6 deletions
@@ -94,6 +94,8 @@ The detailed list of options is below:
94 | -h [ --help ] Print this help message | 94 | -h [ --help ] Print this help message |
95 | --version Print the program version | 95 | --version Print the program version |
96 | 96 | ||
97 | +-f [ --full-inference ] Infer new 3D->family mappings even if Rfam already provides some. Yields more copies of chains | ||
98 | + mapped to different families. | ||
97 | -r 4.0 [ --resolution=4.0 ] Maximum 3D structure resolution to consider a RNA chain. | 99 | -r 4.0 [ --resolution=4.0 ] Maximum 3D structure resolution to consider a RNA chain. |
98 | -s Run statistics computations after completion | 100 | -s Run statistics computations after completion |
99 | --extract Extract the portions of 3D RNA chains to individual mmCIF files. | 101 | --extract Extract the portions of 3D RNA chains to individual mmCIF files. |
@@ -105,7 +107,7 @@ The detailed list of options is below:
105 | RNAcifs/ Full structures containing RNA, in mmCIF format | 107 | RNAcifs/ Full structures containing RNA, in mmCIF format |
106 | rna_mapped_to_Rfam/ Extracted 'pure' RNA chains | 108 | rna_mapped_to_Rfam/ Extracted 'pure' RNA chains |
107 | datapoints/ Final results in CSV file format. | 109 | datapoints/ Final results in CSV file format. |
108 | ---seq-folder=… Path to a folder to store the sequence and alignment files. | 110 | +--seq-folder=… Path to a folder to store the sequence and alignment files. Subfolders will be: |
109 | rfam_sequences/fasta/ Compressed hits to Rfam families | 111 | rfam_sequences/fasta/ Compressed hits to Rfam families |
110 | realigned/ Sequences, covariance models, and alignments by family | 112 | realigned/ Sequences, covariance models, and alignments by family |
111 | --no-homology Do not try to compute PSSMs and do not align sequences. | 113 | --no-homology Do not try to compute PSSMs and do not align sequences. |
@@ -117,11 +119,12 @@ The detailed list of options is below:
117 | --update-homologous Re-download Rfam and SILVA databases, realign all families, and recompute all CSV files | 119 | --update-homologous Re-download Rfam and SILVA databases, realign all families, and recompute all CSV files |
118 | --from-scratch Delete database, local 3D and sequence files, and known issues, and recompute. | 120 | --from-scratch Delete database, local 3D and sequence files, and known issues, and recompute. |
119 | --archive Create a tar.gz archive of the datapoints text files, and update the link to the latest archive | 121 | --archive Create a tar.gz archive of the datapoints text files, and update the link to the latest archive |
122 | +--no-logs Do not save per-chain logs of the numbering modifications | ||
120 | ``` | 123 | ``` |
121 | 124 | ||
122 | Typical usage: | 125 | Typical usage: |
123 | ``` | 126 | ``` |
124 | -nohup bash -c 'time ~/Projects/RNANet/RNAnet.py --3d-folder ~/Data/RNA/3D/ --seq-folder ~/Data/RNA/sequences -s --archive' & | 127 | +nohup bash -c 'time ~/Projects/RNANet/RNAnet.py --3d-folder ~/Data/RNA/3D/ --seq-folder ~/Data/RNA/sequences -s' & |
125 | ``` | 128 | ``` |
126 | 129 | ||
127 | ## Post-computation task: estimate quality | 130 | ## Post-computation task: estimate quality |
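The README hunks document two new options, `-f [ --full-inference ]` and `--no-logs`, and drop `--archive` from the typical-usage command (the option itself stays documented). As a hypothetical illustration only, and not RNAnet.py's actual argument parsing (which this diff does not show), the sketch below declares a few of these flags with Python's standard `argparse` module. The `prog` value and the `dest` name `stats` for `-s` are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical sketch: declare some of the README options with argparse.

    This is NOT RNAnet.py's own parser; it only mirrors the flag names,
    short forms, and the 4.0 resolution default listed in the README.
    """
    parser = argparse.ArgumentParser(prog="RNAnet.py")
    # New in this diff: allow extra 3D->family mappings beyond Rfam's own.
    parser.add_argument("-f", "--full-inference", action="store_true",
                        help="Infer new 3D->family mappings even if Rfam already provides some.")
    parser.add_argument("-r", "--resolution", type=float, default=4.0,
                        help="Maximum 3D structure resolution to consider a RNA chain.")
    # dest="stats" is an assumed name; a bare "-s" would otherwise be stored as "s".
    parser.add_argument("-s", dest="stats", action="store_true",
                        help="Run statistics computations after completion.")
    parser.add_argument("--archive", action="store_true",
                        help="Create a tar.gz archive of the datapoints text files.")
    # New in this diff: argparse stores this switch as args.no_logs.
    parser.add_argument("--no-logs", action="store_true",
                        help="Do not save per-chain logs of the numbering modifications.")
    return parser


if __name__ == "__main__":
    # Roughly the updated "Typical usage" switches, without the folder options.
    args = build_parser().parse_args(["-s", "--resolution=3.0"])
    print(args.stats, args.resolution, args.full_inference, args.no_logs)
```

With `action="store_true"`, omitting a flag leaves the corresponding switch off, and `-r` falls back to the 4.0 default given in the README.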
@@ -11,7 +11,7 @@
11 | # - Use a specialised database (SILVA) : better alignments (we guess?), but two kind of jobs | 11 | # - Use a specialised database (SILVA) : better alignments (we guess?), but two kind of jobs |
12 | # - Use cmalign --small everywhere (homogeneity) | 12 | # - Use cmalign --small everywhere (homogeneity) |
13 | # Moreover, --small requires --nonbanded --cyk, which means the output alignement is the optimally scored one. | 13 | # Moreover, --small requires --nonbanded --cyk, which means the output alignement is the optimally scored one. |
14 | -# To date, we trust Infernal as the best tool to realign RNA. Is it ? | 14 | +# To date, we trust Infernal as the best tool to realign ncRNA. Is it ? |
15 | 15 | ||
16 | # Contact: louis.becquey@univ-evry.fr (PhD student), fariza.tahi@univ-evry.fr (PI) | 16 | # Contact: louis.becquey@univ-evry.fr (PhD student), fariza.tahi@univ-evry.fr (PI) |
17 | 17 | ||
@@ -28,7 +28,7 @@ pd.set_option('display.max_rows', None)
28 | LSU_set = ["RF00002", "RF02540", "RF02541", "RF02543", "RF02546"] # From Rfam CLAN 00112 | 28 | LSU_set = ["RF00002", "RF02540", "RF02541", "RF02543", "RF02546"] # From Rfam CLAN 00112 |
29 | SSU_set = ["RF00177", "RF02542", "RF02545", "RF01959", "RF01960"] # From Rfam CLAN 00111 | 29 | SSU_set = ["RF00177", "RF02542", "RF02545", "RF01959", "RF01960"] # From Rfam CLAN 00111 |
30 | 30 | ||
31 | -with sqlite3.connect("results/RNANet.db") as conn: | 31 | +with sqlite3.connect(os.getcwd()+"/results/RNANet.db") as conn: |
32 | df = pd.read_sql("SELECT rfam_acc, max_len, nb_total_homol, comput_time, comput_peak_mem FROM family;", conn) | 32 | df = pd.read_sql("SELECT rfam_acc, max_len, nb_total_homol, comput_time, comput_peak_mem FROM family;", conn) |
33 | 33 | ||
34 | to_remove = [ f for f in df.rfam_acc if f in LSU_set+SSU_set ] | 34 | to_remove = [ f for f in df.rfam_acc if f in LSU_set+SSU_set ] |
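The `to_remove` line in this hunk keeps the Rfam families that belong to the large- and small-subunit rRNA clans. Because the `if` clause of a list comprehension runs once per element, `LSU_set+SSU_set` builds a new concatenated list and scans it linearly for every accession. A minimal sketch of the same filtering with the union computed once as a set follows; the family lists are copied from the hunk, while `split_rrna` and the sample accessions are hypothetical, and what the script does with `to_remove` afterwards is not shown in this diff.

```python
# Family lists copied from the hunk above (Rfam clans 00112 and 00111).
LSU_set = ["RF00002", "RF02540", "RF02541", "RF02543", "RF02546"]
SSU_set = ["RF00177", "RF02542", "RF02545", "RF01959", "RF01960"]

# Build the union once; set membership is O(1) on average,
# instead of a linear scan of a freshly concatenated list.
RRNA_FAMILIES = frozenset(LSU_set) | frozenset(SSU_set)


def split_rrna(rfam_accs):
    """Hypothetical helper: return (rRNA families to remove, families to keep)."""
    to_remove = [f for f in rfam_accs if f in RRNA_FAMILIES]
    to_keep = [f for f in rfam_accs if f not in RRNA_FAMILIES]
    return to_remove, to_keep


if __name__ == "__main__":
    # Sample accessions; in the script they would come from df.rfam_acc.
    removed, kept = split_rrna(["RF00005", "RF00177", "RF02541", "RF00001"])
    print(removed)  # ['RF00177', 'RF02541']
    print(kept)     # ['RF00005', 'RF00001']
```

With pandas, the vectorized equivalent would be a boolean mask such as `df.rfam_acc.isin(RRNA_FAMILIES)`.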
@@ -74,7 +74,7 @@ ax.set_ylabel("Maximum length of sequences ")
74 | ax.set_zlabel("Computation time (s)") | 74 | ax.set_zlabel("Computation time (s)") |
75 | 75 | ||
76 | plt.subplots_adjust(wspace=0.4) | 76 | plt.subplots_adjust(wspace=0.4) |
77 | -plt.savefig("results/cmalign_jobs_performance.png") | 77 | +plt.savefig(os.getcwd()+"/results/cmalign_jobs_performance.png") |
78 | 78 | ||
79 | # # ======================================================== | 79 | # # ======================================================== |
80 | # # Linear Regression of max_mem as function of max_length | 80 | # # Linear Regression of max_mem as function of max_length |
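Two hunks in this file replace the relative paths `"results/RNANet.db"` and `"results/cmalign_jobs_performance.png"` with `os.getcwd()+"/results/..."`. A relative path is already resolved against the process's current working directory, so as long as the working directory does not change in between, both spellings name the same file: the change makes the path absolute, but the script still has to be launched from the folder that contains `results/`. The sketch below shows a portable variant built with `os.path.join` instead of a hard-coded `/`, with an optional step that creates `results/` before anything is written there. The helper `results_path` and its parameters are hypothetical.

```python
import os


def results_path(filename, base_dir=None, create=False):
    """Hypothetical helper: return <base_dir>/results/<filename>.

    base_dir defaults to os.getcwd(), the same directory a bare relative
    path such as "results/RNANet.db" is resolved against.
    With create=True, the results/ folder is created first, so that a later
    write into it (e.g. a saved figure) does not fail on a missing directory.
    """
    if base_dir is None:
        base_dir = os.getcwd()
    folder = os.path.join(base_dir, "results")
    if create:
        os.makedirs(folder, exist_ok=True)
    return os.path.join(folder, filename)


if __name__ == "__main__":
    # Same file as the relative path, as long as the CWD has not changed.
    print(results_path("RNANet.db") == os.path.abspath(os.path.join("results", "RNANet.db")))
    # To anchor paths to the script's own location instead of the CWD, one could pass
    # base_dir=os.path.dirname(os.path.abspath(__file__)).
```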