Showing
6 changed files
with
434 additions
and
65 deletions
... | @@ -68,12 +68,13 @@ To help you design your own SQL requests, we provide a description of the databa | ... | @@ -68,12 +68,13 @@ To help you design your own SQL requests, we provide a description of the databa |
68 | * `puckering`: Conformation of the ribose cycle (10 classes depending on the phase_angle value) | 68 | * `puckering`: Conformation of the ribose cycle (10 classes depending on the phase_angle value) |
69 | 69 | ||
70 | ## Table `align_column`, for positions in multiple sequence alignments | 70 | ## Table `align_column`, for positions in multiple sequence alignments |
71 | -* `column_id`: A unique identifier | ||
72 | * `rfam_acc`: The family's MSA the column belongs to | 71 | * `rfam_acc`: The family's MSA the column belongs to |
73 | * `index_ali`: Position of the column in the alignment (starts at 1) | 72 | * `index_ali`: Position of the column in the alignment (starts at 1) |
73 | +* `cm_coord`: Position of the column in the Rfam covariance model of the family (starts at 1). The value is NULL in portions that are insertions compared to the model. | ||
74 | * `freq_A`, `freq_C`, `freq_G`, `freq_U`, `freq_other`: Nucleotide frequencies in the alignment at this position | 74 | * `freq_A`, `freq_C`, `freq_G`, `freq_U`, `freq_other`: Nucleotide frequencies in the alignment at this position |
75 | * `gap_percent`: The frequencies of gaps at this position in the alignment (between 0.0 and 1.0) | 75 | * `gap_percent`: The frequencies of gaps at this position in the alignment (between 0.0 and 1.0) |
76 | * `consensus`: A consensus character (ACGUN or '-') summarizing the column, if we can. If >75% of the sequences are gaps at this position, the gap is picked as consensus. Otherwise, A/C/G/U is chosen if >50% of the non-gap positions are A/C/G/U. Otherwise, N is the consensus. | 76 | * `consensus`: A consensus character (ACGUN or '-') summarizing the column, if we can. If >75% of the sequences are gaps at this position, the gap is picked as consensus. Otherwise, A/C/G/U is chosen if >50% of the non-gap positions are A/C/G/U. Otherwise, N is the consensus. |
77 | +* `cons_sec_struct`: A consensus secondary structure for the RNAs of the family, obtained from the Infernal alignement. The structure is well-nested (no pseudoknots) and the possible symbols are '.' (unpaired) or '(' and ')' (paired). The field is NULL in portions that are insertions compared to the Rfam model of the family, meaning that their is no consensus on the structure. | ||
77 | 78 | ||
78 | There always is an entry, for each family (rfam_acc), with index_ali = 0; gap_percent = 1.0; and nucleotide frequencies set to 0.0. This entry is used when the nucleotide frequencies cannot be determined because of local alignment issues. | 79 | There always is an entry, for each family (rfam_acc), with index_ali = 0; gap_percent = 1.0; and nucleotide frequencies set to 0.0. This entry is used when the nucleotide frequencies cannot be determined because of local alignment issues. |
79 | 80 | ... | ... |
... | @@ -17,12 +17,12 @@ | ... | @@ -17,12 +17,12 @@ |
17 | * [DONE] Get filtered versions of the sequence alignments containing the 3D chains, publicly available for download | 17 | * [DONE] Get filtered versions of the sequence alignments containing the 3D chains, publicly available for download |
18 | * [DONE] Get a consensus residue for each alignement column | 18 | * [DONE] Get a consensus residue for each alignement column |
19 | * [DONE] Get an option to limit the number of cores | 19 | * [DONE] Get an option to limit the number of cores |
20 | +* [DONE] Move to SILVA LSU release 138.1 | ||
20 | * [UPCOMING] Automated annotation of detected Recurrent Interaction Networks (RINs), see http://carnaval.lri.fr/ . | 21 | * [UPCOMING] Automated annotation of detected Recurrent Interaction Networks (RINs), see http://carnaval.lri.fr/ . |
21 | * [UPCOMING] Possibly, automated detection of HLs and ILs from the 3D Motif Atlas (BGSU). Maybe. Their own website already does the job. | 22 | * [UPCOMING] Possibly, automated detection of HLs and ILs from the 3D Motif Atlas (BGSU). Maybe. Their own website already does the job. |
22 | * [UPCOMING] Weight sequences in alignment to give more importance to rarer sequences | 23 | * [UPCOMING] Weight sequences in alignment to give more importance to rarer sequences |
23 | * [UPCOMING] Give both gap_percent and insertion_gap_percent | 24 | * [UPCOMING] Give both gap_percent and insertion_gap_percent |
24 | * A field estimating the quality of the sequence alignment in table family. | 25 | * A field estimating the quality of the sequence alignment in table family. |
25 | * Possibly, more metrics about the alignments coming from Infernal. | 26 | * Possibly, more metrics about the alignments coming from Infernal. |
26 | -* Move to SILVA LSU release 138 | ||
27 | * Run cmscan ourselves from the NDB instead of using Rfam-PDB mappings ? (Iff this actually makes a real difference, untested yet) | 27 | * Run cmscan ourselves from the NDB instead of using Rfam-PDB mappings ? (Iff this actually makes a real difference, untested yet) |
28 | * Use and save Infernal alignment bounds and truncation information | 28 | * Use and save Infernal alignment bounds and truncation information | ... | ... |
... | @@ -862,10 +862,10 @@ class Downloader: | ... | @@ -862,10 +862,10 @@ class Downloader: |
862 | try: | 862 | try: |
863 | print(f"Downloading {unit} from SILVA...", end='', flush=True) | 863 | print(f"Downloading {unit} from SILVA...", end='', flush=True) |
864 | if unit == "LSU": | 864 | if unit == "LSU": |
865 | - subprocess.run(["wget", "-nv", "http://www.arb-silva.de/fileadmin/arb_web_db/release_132/ARB_files/SILVA_132_LSURef_07_12_17_opt.arb.gz", | 865 | + subprocess.run(["wget", "-nv", "https://www.arb-silva.de/fileadmin/arb_web_db/release_138_1/ARB_files/SILVA_138.1_LSURef_opt.arb.gz", |
866 | "-O", path_to_seq_data + "realigned/LSU.arb.gz"]) | 866 | "-O", path_to_seq_data + "realigned/LSU.arb.gz"]) |
867 | else: | 867 | else: |
868 | - subprocess.run(["wget", "-nv", "http://www.arb-silva.de/fileadmin/silva_databases/release_138/ARB_files/SILVA_138_SSURef_05_01_20_opt.arb.gz", | 868 | + subprocess.run(["wget", "-nv", "https://www.arb-silva.de/fileadmin/arb_web_db/release_138_1/ARB_files/SILVA_138.1_SSURef_opt.arb.gz", |
869 | "-O", path_to_seq_data + "realigned/SSU.arb.gz"]) | 869 | "-O", path_to_seq_data + "realigned/SSU.arb.gz"]) |
870 | except: | 870 | except: |
871 | warn(f"Error downloading the {unit} database from SILVA", error=True) | 871 | warn(f"Error downloading the {unit} database from SILVA", error=True) |
... | @@ -1379,11 +1379,11 @@ class Pipeline: | ... | @@ -1379,11 +1379,11 @@ class Pipeline: |
1379 | for r in results: | 1379 | for r in results: |
1380 | align = AlignIO.read(path_to_seq_data + "realigned/" + r[0] + "++.afa", "fasta") | 1380 | align = AlignIO.read(path_to_seq_data + "realigned/" + r[0] + "++.afa", "fasta") |
1381 | nb_3d_chains = len([1 for r in align if '[' in r.id]) | 1381 | nb_3d_chains = len([1 for r in align if '[' in r.id]) |
1382 | - if r[0] in SSU_set: # SSU v138 is used | 1382 | + if r[0] in SSU_set: # SSU v138.1 is used |
1383 | - nb_homologs = 2225272 # source: https://www.arb-silva.de/documentation/release-138/ | 1383 | + nb_homologs = 2224740 # source: https://www.arb-silva.de/documentation/release-1381/ |
1384 | nb_total_homol = nb_homologs + nb_3d_chains | 1384 | nb_total_homol = nb_homologs + nb_3d_chains |
1385 | - elif r[0] in LSU_set: # LSU v132 is used | 1385 | + elif r[0] in LSU_set: # LSU v138.1 is used |
1386 | - nb_homologs = 198843 # source: https://www.arb-silva.de/documentation/release-132/ | 1386 | + nb_homologs = 227331 # source: https://www.arb-silva.de/documentation/release-1381/ |
1387 | nb_total_homol = nb_homologs + nb_3d_chains | 1387 | nb_total_homol = nb_homologs + nb_3d_chains |
1388 | else: | 1388 | else: |
1389 | nb_total_homol = len(align) | 1389 | nb_total_homol = len(align) |
... | @@ -1433,6 +1433,32 @@ class Pipeline: | ... | @@ -1433,6 +1433,32 @@ class Pipeline: |
1433 | p.join() | 1433 | p.join() |
1434 | exit(1) | 1434 | exit(1) |
1435 | 1435 | ||
1436 | + def extractCMs(self): | ||
1437 | + """ | ||
1438 | + Extracts Rfam CMs of the families for which we have 3D structures and compresses | ||
1439 | + them for later use with cmscan. | ||
1440 | + """ | ||
1441 | + | ||
1442 | + # Retrieve the list of families and write them to a "keys file" | ||
1443 | + allfams = [] | ||
1444 | + with sqlite3.connect(runDir+"/results/RNANet.db", timeout=10.0) as conn: | ||
1445 | + conn.execute('pragma journal_mode=wal') | ||
1446 | + allfams = sql_ask_database(conn, "SELECT DISTINCT rfam_acc FROM chain ORDER BY rfam_acc ASC;") | ||
1447 | + allfams = [ r[0]+"\n" for r in allfams ] | ||
1448 | + if not len(allfams): | ||
1449 | + return | ||
1450 | + with open(runDir + "/results/available_families.txt", "w") as f: | ||
1451 | + f.writelines(allfams) | ||
1452 | + | ||
1453 | + # Extract the families from Rfam.cm | ||
1454 | + os.makedirs(runDir + "/results/cm/", exist_ok=True) | ||
1455 | + subprocess.run(["cmfetch", "-f", "-o", runDir + "/results/cm/RNANet.cm", path_to_seq_data + "Rfam.cm", runDir + "/results/available_families.txt"]) | ||
1456 | + os.remove(runDir + "/results/available_families.txt") | ||
1457 | + | ||
1458 | + # Compress the cm database for use with cmscan | ||
1459 | + subprocess.run(["rm", "-f", runDir + "/results/cm/RNANet.cm.i1p", runDir + "/results/cm/RNANet.cm.i1i", runDir + "/results/cm/RNANet.cm.i1m", runDir + "/results/cm/RNANet.cm.i1f"]) | ||
1460 | + subprocess.run(["cmpress", runDir + "/results/cm/RNANet.cm"]) | ||
1461 | + | ||
1436 | def output_results(self): | 1462 | def output_results(self): |
1437 | """Produces CSV files, archive them, and additional metadata files | 1463 | """Produces CSV files, archive them, and additional metadata files |
1438 | 1464 | ||
... | @@ -1665,8 +1691,8 @@ def sql_define_tables(conn): | ... | @@ -1665,8 +1691,8 @@ def sql_define_tables(conn): |
1665 | pair_count_tSS SMALLINT, | 1691 | pair_count_tSS SMALLINT, |
1666 | pair_count_other SMALLINT, | 1692 | pair_count_other SMALLINT, |
1667 | UNIQUE (structure_id, chain_name, rfam_acc), | 1693 | UNIQUE (structure_id, chain_name, rfam_acc), |
1668 | - FOREIGN KEY(structure_id) REFERENCES structure(pdb_id), | 1694 | + FOREIGN KEY(structure_id) REFERENCES structure(pdb_id) ON DELETE CASCADE, |
1669 | - FOREIGN KEY(rfam_acc) REFERENCES family(rfam_acc) | 1695 | + FOREIGN KEY(rfam_acc) REFERENCES family(rfam_acc) ON DELETE CASCADE |
1670 | ); | 1696 | ); |
1671 | CREATE TABLE IF NOT EXISTS nucleotide ( | 1697 | CREATE TABLE IF NOT EXISTS nucleotide ( |
1672 | chain_id INT, | 1698 | chain_id INT, |
... | @@ -1721,6 +1747,7 @@ def sql_define_tables(conn): | ... | @@ -1721,6 +1747,7 @@ def sql_define_tables(conn): |
1721 | CREATE TABLE IF NOT EXISTS align_column ( | 1747 | CREATE TABLE IF NOT EXISTS align_column ( |
1722 | rfam_acc CHAR(7) NOT NULL, | 1748 | rfam_acc CHAR(7) NOT NULL, |
1723 | index_ali INT NOT NULL, | 1749 | index_ali INT NOT NULL, |
1750 | + cm_coord INT, | ||
1724 | freq_A REAL, | 1751 | freq_A REAL, |
1725 | freq_C REAL, | 1752 | freq_C REAL, |
1726 | freq_G REAL, | 1753 | freq_G REAL, |
... | @@ -1728,12 +1755,17 @@ def sql_define_tables(conn): | ... | @@ -1728,12 +1755,17 @@ def sql_define_tables(conn): |
1728 | freq_other REAL, | 1755 | freq_other REAL, |
1729 | gap_percent REAL, | 1756 | gap_percent REAL, |
1730 | consensus CHAR(1), | 1757 | consensus CHAR(1), |
1758 | + cons_sec_struct CHAR(1), | ||
1731 | PRIMARY KEY (rfam_acc, index_ali), | 1759 | PRIMARY KEY (rfam_acc, index_ali), |
1732 | - FOREIGN KEY(rfam_acc) REFERENCES family(rfam_acc) | 1760 | + FOREIGN KEY(rfam_acc) REFERENCES family(rfam_acc) ON DELETE CASCADE |
1733 | ); | 1761 | ); |
1734 | """) | 1762 | """) |
1735 | conn.commit() | 1763 | conn.commit() |
1736 | 1764 | ||
1765 | + # Prepare the WAL files while we're in single thread, otherwise it sometimes fails | ||
1766 | + # at the first access in WAL mode | ||
1767 | + conn.execute("pragma journal_mode=wal") | ||
1768 | + | ||
1737 | @trace_unhandled_exceptions | 1769 | @trace_unhandled_exceptions |
1738 | def sql_ask_database(conn, sql, warn_every=10): | 1770 | def sql_ask_database(conn, sql, warn_every=10): |
1739 | """ | 1771 | """ |
... | @@ -1966,12 +1998,12 @@ def work_infer_mappings(update_only, allmappings, fullinference, codelist) -> li | ... | @@ -1966,12 +1998,12 @@ def work_infer_mappings(update_only, allmappings, fullinference, codelist) -> li |
1966 | sel_5_to_3 = (inferred_mappings['pdb_start'] < inferred_mappings['pdb_end']) | 1998 | sel_5_to_3 = (inferred_mappings['pdb_start'] < inferred_mappings['pdb_end']) |
1967 | thisfam_5_3 = (inferred_mappings['rfam_acc'] == rfam) & sel_5_to_3 | 1999 | thisfam_5_3 = (inferred_mappings['rfam_acc'] == rfam) & sel_5_to_3 |
1968 | thisfam_3_5 = (inferred_mappings['rfam_acc'] == rfam) & (sel_5_to_3 == False) | 2000 | thisfam_3_5 = (inferred_mappings['rfam_acc'] == rfam) & (sel_5_to_3 == False) |
1969 | - print() | 2001 | + # print() |
1970 | - warn(f"Found mappings to {rfam} in both directions on the same interval, keeping only the 5'->3' one.") | 2002 | + # warn(f"Found mappings to {rfam} in both directions on the same interval, keeping only the 5'->3' one.") |
1971 | else: | 2003 | else: |
1972 | warn(f"There are mappings for {rfam} in both directions:", error=True) | 2004 | warn(f"There are mappings for {rfam} in both directions:", error=True) |
1973 | print(inferred_mappings) | 2005 | print(inferred_mappings) |
1974 | - exit(1) | 2006 | + # exit(1) |
1975 | 2007 | ||
1976 | # Compute consensus for chains in 5' -> 3' sense | 2008 | # Compute consensus for chains in 5' -> 3' sense |
1977 | if len(inferred_mappings[thisfam_5_3]): | 2009 | if len(inferred_mappings[thisfam_5_3]): |
... | @@ -2297,8 +2329,10 @@ def work_realign(rfam_acc): | ... | @@ -2297,8 +2329,10 @@ def work_realign(rfam_acc): |
2297 | 2329 | ||
2298 | # Align the new sequences | 2330 | # Align the new sequences |
2299 | with open(new_ali_path, 'w') as o: | 2331 | with open(new_ali_path, 'w') as o: |
2300 | - p1 = subprocess.run(["cmalign", path_to_seq_data + f"realigned/{rfam_acc}.cm", | 2332 | + p1 = subprocess.run(["cmalign", "--nonbanded", "--ifile", path_to_seq_data + f"realigned/{rfam_acc}.ins", |
2301 | - path_to_seq_data + f"realigned/{rfam_acc}_new.fa"], | 2333 | + "--sfile", path_to_seq_data + f"realigned/{rfam_acc}.tsv", |
2334 | + path_to_seq_data + f"realigned/{rfam_acc}.cm", | ||
2335 | + path_to_seq_data + f"realigned/{rfam_acc}_new.fa"], | ||
2302 | stdout=o, stderr=subprocess.PIPE) | 2336 | stdout=o, stderr=subprocess.PIPE) |
2303 | notify("Aligned new sequences together") | 2337 | notify("Aligned new sequences together") |
2304 | 2338 | ||
... | @@ -2337,8 +2371,8 @@ def work_realign(rfam_acc): | ... | @@ -2337,8 +2371,8 @@ def work_realign(rfam_acc): |
2337 | # Alignment does not exist yet. We need to compute it from scratch. | 2371 | # Alignment does not exist yet. We need to compute it from scratch. |
2338 | print(f"\t> Aligning {rfam_acc} sequences together (cmalign) ...", end='', flush=True) | 2372 | print(f"\t> Aligning {rfam_acc} sequences together (cmalign) ...", end='', flush=True) |
2339 | 2373 | ||
2340 | - p = subprocess.run(["cmalign", "--small", "--cyk", "--noprob", "--nonbanded", "--notrunc", | 2374 | + p = subprocess.run(["cmalign", "--nonbanded", '-o', path_to_seq_data + f"realigned/{rfam_acc}++.stk", |
2341 | - '-o', path_to_seq_data + f"realigned/{rfam_acc}++.stk", | 2375 | + "--ifile", path_to_seq_data + f"realigned/{rfam_acc}.ins", "--sfile", path_to_seq_data + f"realigned/{rfam_acc}.tsv", |
2342 | path_to_seq_data + f"realigned/{rfam_acc}.cm", | 2376 | path_to_seq_data + f"realigned/{rfam_acc}.cm", |
2343 | path_to_seq_data + f"realigned/{rfam_acc}++.fa"], | 2377 | path_to_seq_data + f"realigned/{rfam_acc}++.fa"], |
2344 | stdout=subprocess.DEVNULL, stderr=subprocess.PIPE) | 2378 | stdout=subprocess.DEVNULL, stderr=subprocess.PIPE) |
... | @@ -2532,6 +2566,30 @@ def work_pssm_remap(f): | ... | @@ -2532,6 +2566,30 @@ def work_pssm_remap(f): |
2532 | 2566 | ||
2533 | setproctitle(f"RNAnet.py work_pssm_remap({f}) saving") | 2567 | setproctitle(f"RNAnet.py work_pssm_remap({f}) saving") |
2534 | 2568 | ||
2569 | + # Get back the information of match/insertion states from the STK file | ||
2570 | + alignstk = AlignIO.read(path_to_seq_data + "realigned/" + f + "++.stk", "stockholm") | ||
2571 | + consensus_2d = alignstk.column_annotations["secondary_structure"] | ||
2572 | + del alignstk | ||
2573 | + cm_coord = 1 | ||
2574 | + cm_coords = [] | ||
2575 | + cm_2d = [] | ||
2576 | + for x in consensus_2d: | ||
2577 | + if x == ".": | ||
2578 | + cm_coords.append(None) | ||
2579 | + cm_2d.append(None) | ||
2580 | + else: | ||
2581 | + cm_coords.append(cm_coord) | ||
2582 | + if x in "[(<{": | ||
2583 | + cm_2d.append("(") | ||
2584 | + elif x in "])>}": | ||
2585 | + cm_2d.append(")") | ||
2586 | + elif x in ",_-:": | ||
2587 | + cm_2d.append(".") | ||
2588 | + else: | ||
2589 | + warn("Unsupported WUSS secondary structure symbol : "+x) | ||
2590 | + cm_2d.append(".") | ||
2591 | + cm_coord += 1 | ||
2592 | + | ||
2535 | # Save the re_mappings | 2593 | # Save the re_mappings |
2536 | conn = sqlite3.connect(runDir + '/results/RNANet.db', timeout=20.0) | 2594 | conn = sqlite3.connect(runDir + '/results/RNANet.db', timeout=20.0) |
2537 | conn.execute('pragma journal_mode=wal') # Allow multiple other readers to ask things while we execute this writing query | 2595 | conn.execute('pragma journal_mode=wal') # Allow multiple other readers to ask things while we execute this writing query |
... | @@ -2550,20 +2608,20 @@ def work_pssm_remap(f): | ... | @@ -2550,20 +2608,20 @@ def work_pssm_remap(f): |
2550 | conn.commit() | 2608 | conn.commit() |
2551 | 2609 | ||
2552 | # Save the useful columns in the database | 2610 | # Save the useful columns in the database |
2553 | - data = [(f, j) + tuple(pssm_info[:,j-1]) + (consensus[j-1],) for j in sorted(columns_to_save)] | 2611 | + data = [(f, j, cm_coords[j-1]) + tuple(pssm_info[:,j-1]) + (consensus[j-1], cm_2d[j-1]) for j in sorted(columns_to_save)] |
2554 | - sql_execute(conn, """INSERT INTO align_column (rfam_acc, index_ali, freq_A, freq_C, freq_G, freq_U, freq_other, gap_percent, consensus) | 2612 | + sql_execute(conn, """INSERT INTO align_column (rfam_acc, index_ali, cm_coord, freq_A, freq_C, freq_G, freq_U, freq_other, gap_percent, consensus, cons_sec_struct) |
2555 | - VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?) ON CONFLICT(rfam_acc, index_ali) DO | 2613 | + VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) ON CONFLICT(rfam_acc, index_ali) DO |
2556 | - UPDATE SET freq_A=excluded.freq_A, freq_C=excluded.freq_C, freq_G=excluded.freq_G, freq_U=excluded.freq_U, | 2614 | + UPDATE SET cm_coord=excluded.cm_coord, freq_A=excluded.freq_A, freq_C=excluded.freq_C, freq_G=excluded.freq_G, freq_U=excluded.freq_U, |
2557 | - freq_other=excluded.freq_other, gap_percent=excluded.gap_percent, consensus=excluded.consensus;""", many=True, data=data) | 2615 | + freq_other=excluded.freq_other, gap_percent=excluded.gap_percent, consensus=excluded.consensus, cons_sec_struct=excluded.cons_sec_struct;""", many=True, data=data) |
2558 | # Add an unknown values column, with index_ali 0 (for nucleotides unsolved in 3D giving a gap '-' but found facing letter in the alignment) | 2616 | # Add an unknown values column, with index_ali 0 (for nucleotides unsolved in 3D giving a gap '-' but found facing letter in the alignment) |
2559 | - sql_execute(conn, f"""INSERT OR IGNORE INTO align_column (rfam_acc, index_ali, freq_A, freq_C, freq_G, freq_U, freq_other, gap_percent, consensus) | 2617 | + sql_execute(conn, f"""INSERT OR IGNORE INTO align_column (rfam_acc, index_ali, cm_coord, freq_A, freq_C, freq_G, freq_U, freq_other, gap_percent, consensus, cons_sec_struct) |
2560 | - VALUES (?, 0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, '-');""", data=(f,)) | 2618 | + VALUES (?, 0, NULL, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, '-', "NULL");""", data=(f,)) |
2561 | # Save the number of "used columns" to table family ( = the length of the alignment if it was composed only of the RNANet chains) | 2619 | # Save the number of "used columns" to table family ( = the length of the alignment if it was composed only of the RNANet chains) |
2562 | sql_execute(conn, f"UPDATE family SET ali_filtered_len = ? WHERE rfam_acc = ?;", data=(len(columns_to_save), f)) | 2620 | sql_execute(conn, f"UPDATE family SET ali_filtered_len = ? WHERE rfam_acc = ?;", data=(len(columns_to_save), f)) |
2563 | conn.close() | 2621 | conn.close() |
2564 | 2622 | ||
2565 | ########################################################################################## | 2623 | ########################################################################################## |
2566 | - # Saving the filtered alignement with only the saved positinos | 2624 | + # Saving the filtered alignement with only the saved positions |
2567 | ########################################################################################## | 2625 | ########################################################################################## |
2568 | 2626 | ||
2569 | setproctitle(f"RNAnet.py work_pssm_remap({f}) filtering alignment") | 2627 | setproctitle(f"RNAnet.py work_pssm_remap({f}) filtering alignment") |
... | @@ -2600,8 +2658,8 @@ def work_save(c, homology=True): | ... | @@ -2600,8 +2658,8 @@ def work_save(c, homology=True): |
2600 | conn.execute('pragma journal_mode=wal') | 2658 | conn.execute('pragma journal_mode=wal') |
2601 | if homology: | 2659 | if homology: |
2602 | df = pd.read_sql_query(f""" | 2660 | df = pd.read_sql_query(f""" |
2603 | - SELECT index_chain, old_nt_resnum, nt_position, nt_name, nt_code, nt_align_code, | 2661 | + SELECT index_chain, old_nt_resnum, nt_position, nt_name, nt_code, nt_align_code, cm_coord, |
2604 | - is_A, is_C, is_G, is_U, is_other, freq_A, freq_C, freq_G, freq_U, freq_other, gap_percent, consensus, dbn, | 2662 | + is_A, is_C, is_G, is_U, is_other, freq_A, freq_C, freq_G, freq_U, freq_other, gap_percent, consensus, cons_sec_struct, dbn, |
2605 | paired, nb_interact, pair_type_LW, pair_type_DSSR, alpha, beta, gamma, delta, epsilon, zeta, epsilon_zeta, | 2663 | paired, nb_interact, pair_type_LW, pair_type_DSSR, alpha, beta, gamma, delta, epsilon, zeta, epsilon_zeta, |
2606 | chi, bb_type, glyco_bond, form, ssZp, Dp, eta, theta, eta_prime, theta_prime, eta_base, theta_base, | 2664 | chi, bb_type, glyco_bond, form, ssZp, Dp, eta, theta, eta_prime, theta_prime, eta_base, theta_base, |
2607 | v0, v1, v2, v3, v4, amplitude, phase_angle, puckering FROM | 2665 | v0, v1, v2, v3, v4, amplitude, phase_angle, puckering FROM |
... | @@ -2682,6 +2740,7 @@ if __name__ == "__main__": | ... | @@ -2682,6 +2740,7 @@ if __name__ == "__main__": |
2682 | work_save(c, homology=False) | 2740 | work_save(c, homology=False) |
2683 | print("Completed.") | 2741 | print("Completed.") |
2684 | exit(0) | 2742 | exit(0) |
2743 | + | ||
2685 | 2744 | ||
2686 | # At this point, structure, chain and nucleotide tables of the database are up to date. | 2745 | # At this point, structure, chain and nucleotide tables of the database are up to date. |
2687 | # (Modulo some statistics computed by statistics.py) | 2746 | # (Modulo some statistics computed by statistics.py) |
... | @@ -2716,6 +2775,7 @@ if __name__ == "__main__": | ... | @@ -2716,6 +2775,7 @@ if __name__ == "__main__": |
2716 | idxQueue = thr_idx_mgr.Queue() | 2775 | idxQueue = thr_idx_mgr.Queue() |
2717 | 2776 | ||
2718 | pp.remap() | 2777 | pp.remap() |
2778 | + pp.extractCMs() | ||
2719 | 2779 | ||
2720 | # At this point, the align_column and re_mapping tables are up-to-date. | 2780 | # At this point, the align_column and re_mapping tables are up-to-date. |
2721 | 2781 | ... | ... |
known_issues.txt
0 → 100644
1 | +2z9q_1_A_1-72 | ||
2 | +1ml5_1_b_5-121 | ||
3 | +1ml5_1_a_1-2914 | ||
4 | +3ep2_1_Y_1-72 | ||
5 | +3eq3_1_Y_1-72 | ||
6 | +4v48_1_A6_1-73 | ||
7 | +1ml5_1_A_2-1520 | ||
8 | +1qzb_1_B_1-73 | ||
9 | +1qza_1_B_1-73 | ||
10 | +1ls2_1_B_1-73 | ||
11 | +1gsg_1_T_1-72 | ||
12 | +3jcr_1_H_1-115 | ||
13 | +1vy7_1_AY_1-73 | ||
14 | +1vy7_1_CY_1-73 | ||
15 | +4w2h_1_CY_1-73 | ||
16 | +5zzm_1_M_3-118 | ||
17 | +2rdo_1_A_3-118 | ||
18 | +4v48_1_A9_3-118 | ||
19 | +4v47_1_A9_3-118 | ||
20 | +2ob7_1_A_10-319 | ||
21 | +1x1l_1_A_1-132 | ||
22 | +1zc8_1_Z_1-93 | ||
23 | +2ob7_1_D_1-132 | ||
24 | +4v42_1_BB_5-121 | ||
25 | +4v42_1_BA_1-2914 | ||
26 | +1r2x_1_C_1-58 | ||
27 | +1r2w_1_C_1-58 | ||
28 | +1eg0_1_L_1-56 | ||
29 | +5zzm_1_N_1-2904 | ||
30 | +2rdo_1_B_1-2904 | ||
31 | +3dg2_1_B_1-2904 | ||
32 | +3dg0_1_B_1-2904 | ||
33 | +4v48_1_A0_1-2904 | ||
34 | +4v47_1_A0_1-2904 | ||
35 | +3dg4_1_B_1-2904 | ||
36 | +3dg5_1_B_1-2904 | ||
37 | +3dg2_1_A_1-1542 | ||
38 | +3dg0_1_A_1-1542 | ||
39 | +4v48_1_BA_1-1543 | ||
40 | +4v47_1_BA_1-1542 | ||
41 | +3dg4_1_A_1-1542 | ||
42 | +3dg5_1_A_1-1542 | ||
43 | +1eg0_1_O_1-73 | ||
44 | +1zc8_1_A_1-59 | ||
45 | +1mvr_1_D_1-61 | ||
46 | +4adx_1_9_1-123 | ||
47 | +1zn1_1_B_1-59 | ||
48 | +1jgq_1_A_2-1520 | ||
49 | +4v42_1_AA_2-1520 | ||
50 | +1jgo_1_A_2-1520 | ||
51 | +1jgp_1_A_2-1520 | ||
52 | +1emi_1_B_1-108 | ||
53 | +3iy9_1_A_498-1027 | ||
54 | +3ep2_1_B_1-50 | ||
55 | +3eq3_1_B_1-50 | ||
56 | +3eq4_1_B_1-50 | ||
57 | +3pgw_1_R_1-164 | ||
58 | +3pgw_1_N_1-164 | ||
59 | +3cw1_1_x_1-138 | ||
60 | +3cw1_1_w_1-138 | ||
61 | +3cw1_1_V_1-138 | ||
62 | +3cw1_1_v_1-138 | ||
63 | +2iy3_1_B_9-105 | ||
64 | +3jcr_1_N_1-107 | ||
65 | +2vaz_1_A_64-177 | ||
66 | +2ftc_1_R_81-1466 | ||
67 | +3jcr_1_M_1-141 | ||
68 | +4v5z_1_B0_1-2902 | ||
69 | +5g2x_1_A_595-692 | ||
70 | +3iy8_1_A_1-540 | ||
71 | +4v5z_1_BY_2-113 | ||
72 | +4v5z_1_BZ_1-70 | ||
73 | +4v5z_1_B1_2-125 | ||
74 | +4adx_1_0_1-2925 | ||
75 | +1mvr_1_B_3-96 | ||
76 | +3eq4_1_Y_1-69 | ||
77 | +6uz7_1_8_2140-2827 | ||
78 | +4v5z_1_AA_1-1563 |
known_issues_reasons.txt
0 → 100644
1 | +2z9q_1_A_1-72 | ||
2 | +DSSR warning 2z9q.json: no nucleotides found. Ignoring 2z9q_1_A_1-72. | ||
3 | + | ||
4 | +1ml5_1_b_5-121 | ||
5 | +Could not find nucleotides of chain b in annotation 1ml5.json. Either there is a problem with 1ml5 mmCIF download, or the bases are not resolved in the structure. Delete it and retry. | ||
6 | + | ||
7 | +1ml5_1_a_1-2914 | ||
8 | +Could not find nucleotides of chain a in annotation 1ml5.json. Either there is a problem with 1ml5 mmCIF download, or the bases are not resolved in the structure. Delete it and retry. | ||
9 | + | ||
10 | +3ep2_1_Y_1-72 | ||
11 | +DSSR warning 3ep2.json: no nucleotides found. Ignoring 3ep2_1_Y_1-72. | ||
12 | + | ||
13 | +3eq3_1_Y_1-72 | ||
14 | +DSSR warning 3eq3.json: no nucleotides found. Ignoring 3eq3_1_Y_1-72. | ||
15 | + | ||
16 | +4v48_1_A6_1-73 | ||
17 | +DSSR warning 4v48.json: no nucleotides found. Ignoring 4v48_1_A6_1-73. | ||
18 | + | ||
19 | +1ml5_1_A_2-1520 | ||
20 | +Could not find nucleotides of chain A in annotation 1ml5.json. Either there is a problem with 1ml5 mmCIF download, or the bases are not resolved in the structure. Delete it and retry. | ||
21 | + | ||
22 | +1qzb_1_B_1-73 | ||
23 | +DSSR warning 1qzb.json: no nucleotides found. Ignoring 1qzb_1_B_1-73. | ||
24 | + | ||
25 | +1qza_1_B_1-73 | ||
26 | +DSSR warning 1qza.json: no nucleotides found. Ignoring 1qza_1_B_1-73. | ||
27 | + | ||
28 | +1ls2_1_B_1-73 | ||
29 | +DSSR warning 1ls2.json: no nucleotides found. Ignoring 1ls2_1_B_1-73. | ||
30 | + | ||
31 | +1gsg_1_T_1-72 | ||
32 | +DSSR warning 1gsg.json: no nucleotides found. Ignoring 1gsg_1_T_1-72. | ||
33 | + | ||
34 | +3jcr_1_H_1-115 | ||
35 | +DSSR warning 3jcr.json: no nucleotides found. Ignoring 3jcr_1_H_1-115. | ||
36 | + | ||
37 | +1vy7_1_AY_1-73 | ||
38 | +Sequence is too short. (< 5 resolved nts) | ||
39 | + | ||
40 | +1vy7_1_CY_1-73 | ||
41 | +Sequence is too short. (< 5 resolved nts) | ||
42 | + | ||
43 | +4w2h_1_CY_1-73 | ||
44 | +Sequence is too short. (< 5 resolved nts) | ||
45 | + | ||
46 | +5zzm_1_M_3-118 | ||
47 | +DSSR warning 5zzm.json: no nucleotides found. Ignoring 5zzm_1_M_3-118. | ||
48 | + | ||
49 | +2rdo_1_A_3-118 | ||
50 | +DSSR warning 2rdo.json: no nucleotides found. Ignoring 2rdo_1_A_3-118. | ||
51 | + | ||
52 | +4v48_1_A9_3-118 | ||
53 | +DSSR warning 4v48.json: no nucleotides found. Ignoring 4v48_1_A9_3-118. | ||
54 | + | ||
55 | +4v47_1_A9_3-118 | ||
56 | +DSSR warning 4v47.json: no nucleotides found. Ignoring 4v47_1_A9_3-118. | ||
57 | + | ||
58 | +2ob7_1_A_10-319 | ||
59 | +DSSR warning 2ob7.json: no nucleotides found. Ignoring 2ob7_1_A_10-319. | ||
60 | + | ||
61 | +1x1l_1_A_1-132 | ||
62 | +DSSR warning 1x1l.json: no nucleotides found. Ignoring 1x1l_1_A_1-132. | ||
63 | + | ||
64 | +1zc8_1_Z_1-93 | ||
65 | +DSSR warning 1zc8.json: no nucleotides found. Ignoring 1zc8_1_Z_1-93. | ||
66 | + | ||
67 | +2ob7_1_D_1-132 | ||
68 | +DSSR warning 2ob7.json: no nucleotides found. Ignoring 2ob7_1_D_1-132. | ||
69 | + | ||
70 | +4v42_1_BB_5-121 | ||
71 | +Could not find nucleotides of chain BB in annotation 4v42.json. Either there is a problem with 4v42 mmCIF download, or the bases are not resolved in the structure. Delete it and retry. | ||
72 | + | ||
73 | +4v42_1_BA_1-2914 | ||
74 | +Could not find nucleotides of chain BA in annotation 4v42.json. Either there is a problem with 4v42 mmCIF download, or the bases are not resolved in the structure. Delete it and retry. | ||
75 | + | ||
76 | +1r2x_1_C_1-58 | ||
77 | +DSSR warning 1r2x.json: no nucleotides found. Ignoring 1r2x_1_C_1-58. | ||
78 | + | ||
79 | +1r2w_1_C_1-58 | ||
80 | +DSSR warning 1r2w.json: no nucleotides found. Ignoring 1r2w_1_C_1-58. | ||
81 | + | ||
82 | +1eg0_1_L_1-56 | ||
83 | +DSSR warning 1eg0.json: no nucleotides found. Ignoring 1eg0_1_L_1-56. | ||
84 | + | ||
85 | +5zzm_1_N_1-2904 | ||
86 | +DSSR warning 5zzm.json: no nucleotides found. Ignoring 5zzm_1_N_1-2904. | ||
87 | + | ||
88 | +2rdo_1_B_1-2904 | ||
89 | +DSSR warning 2rdo.json: no nucleotides found. Ignoring 2rdo_1_B_1-2904. | ||
90 | + | ||
91 | +3dg2_1_B_1-2904 | ||
92 | +DSSR warning 3dg2.json: no nucleotides found. Ignoring 3dg2_1_B_1-2904. | ||
93 | + | ||
94 | +3dg0_1_B_1-2904 | ||
95 | +DSSR warning 3dg0.json: no nucleotides found. Ignoring 3dg0_1_B_1-2904. | ||
96 | + | ||
97 | +4v48_1_A0_1-2904 | ||
98 | +DSSR warning 4v48.json: no nucleotides found. Ignoring 4v48_1_A0_1-2904. | ||
99 | + | ||
100 | +4v47_1_A0_1-2904 | ||
101 | +DSSR warning 4v47.json: no nucleotides found. Ignoring 4v47_1_A0_1-2904. | ||
102 | + | ||
103 | +3dg4_1_B_1-2904 | ||
104 | +DSSR warning 3dg4.json: no nucleotides found. Ignoring 3dg4_1_B_1-2904. | ||
105 | + | ||
106 | +3dg5_1_B_1-2904 | ||
107 | +DSSR warning 3dg5.json: no nucleotides found. Ignoring 3dg5_1_B_1-2904. | ||
108 | + | ||
109 | +3dg2_1_A_1-1542 | ||
110 | +DSSR warning 3dg2.json: no nucleotides found. Ignoring 3dg2_1_A_1-1542. | ||
111 | + | ||
112 | +3dg0_1_A_1-1542 | ||
113 | +DSSR warning 3dg0.json: no nucleotides found. Ignoring 3dg0_1_A_1-1542. | ||
114 | + | ||
115 | +4v48_1_BA_1-1543 | ||
116 | +DSSR warning 4v48.json: no nucleotides found. Ignoring 4v48_1_BA_1-1543. | ||
117 | + | ||
118 | +4v47_1_BA_1-1542 | ||
119 | +DSSR warning 4v47.json: no nucleotides found. Ignoring 4v47_1_BA_1-1542. | ||
120 | + | ||
121 | +3dg4_1_A_1-1542 | ||
122 | +DSSR warning 3dg4.json: no nucleotides found. Ignoring 3dg4_1_A_1-1542. | ||
123 | + | ||
124 | +3dg5_1_A_1-1542 | ||
125 | +DSSR warning 3dg5.json: no nucleotides found. Ignoring 3dg5_1_A_1-1542. | ||
126 | + | ||
127 | +1eg0_1_O_1-73 | ||
128 | +DSSR warning 1eg0.json: no nucleotides found. Ignoring 1eg0_1_O_1-73. | ||
129 | + | ||
130 | +1zc8_1_A_1-59 | ||
131 | +DSSR warning 1zc8.json: no nucleotides found. Ignoring 1zc8_1_A_1-59. | ||
132 | + | ||
133 | +1mvr_1_D_1-61 | ||
134 | +DSSR warning 1mvr.json: no nucleotides found. Ignoring 1mvr_1_D_1-61. | ||
135 | + | ||
136 | +4adx_1_9_1-123 | ||
137 | +DSSR warning 4adx.json: no nucleotides found. Ignoring 4adx_1_9_1-123. | ||
138 | + | ||
139 | +1zn1_1_B_1-59 | ||
140 | +DSSR warning 1zn1.json: no nucleotides found. Ignoring 1zn1_1_B_1-59. | ||
141 | + | ||
142 | +1jgq_1_A_2-1520 | ||
143 | +Could not find nucleotides of chain A in annotation 1jgq.json. Either there is a problem with 1jgq mmCIF download, or the bases are not resolved in the structure. Delete it and retry. | ||
144 | + | ||
145 | +4v42_1_AA_2-1520 | ||
146 | +Could not find nucleotides of chain AA in annotation 4v42.json. Either there is a problem with 4v42 mmCIF download, or the bases are not resolved in the structure. Delete it and retry. | ||
147 | + | ||
148 | +1jgo_1_A_2-1520 | ||
149 | +Could not find nucleotides of chain A in annotation 1jgo.json. Either there is a problem with 1jgo mmCIF download, or the bases are not resolved in the structure. Delete it and retry. | ||
150 | + | ||
151 | +1jgp_1_A_2-1520 | ||
152 | +Could not find nucleotides of chain A in annotation 1jgp.json. Either there is a problem with 1jgp mmCIF download, or the bases are not resolved in the structure. Delete it and retry. | ||
153 | + | ||
154 | +1emi_1_B_1-108 | ||
155 | +DSSR warning 1emi.json: no nucleotides found. Ignoring 1emi_1_B_1-108. | ||
156 | + | ||
157 | +3iy9_1_A_498-1027 | ||
158 | +DSSR warning 3iy9.json: no nucleotides found. Ignoring 3iy9_1_A_498-1027. | ||
159 | + | ||
160 | +3ep2_1_B_1-50 | ||
161 | +DSSR warning 3ep2.json: no nucleotides found. Ignoring 3ep2_1_B_1-50. | ||
162 | + | ||
163 | +3eq3_1_B_1-50 | ||
164 | +DSSR warning 3eq3.json: no nucleotides found. Ignoring 3eq3_1_B_1-50. | ||
165 | + | ||
166 | +3eq4_1_B_1-50 | ||
167 | +DSSR warning 3eq4.json: no nucleotides found. Ignoring 3eq4_1_B_1-50. | ||
168 | + | ||
169 | +3pgw_1_R_1-164 | ||
170 | +DSSR warning 3pgw.json: no nucleotides found. Ignoring 3pgw_1_R_1-164. | ||
171 | + | ||
172 | +3pgw_1_N_1-164 | ||
173 | +DSSR warning 3pgw.json: no nucleotides found. Ignoring 3pgw_1_N_1-164. | ||
174 | + | ||
175 | +3cw1_1_x_1-138 | ||
176 | +DSSR warning 3cw1.json: no nucleotides found. Ignoring 3cw1_1_x_1-138. | ||
177 | + | ||
178 | +3cw1_1_w_1-138 | ||
179 | +DSSR warning 3cw1.json: no nucleotides found. Ignoring 3cw1_1_w_1-138. | ||
180 | + | ||
181 | +3cw1_1_V_1-138 | ||
182 | +DSSR warning 3cw1.json: no nucleotides found. Ignoring 3cw1_1_V_1-138. | ||
183 | + | ||
184 | +3cw1_1_v_1-138 | ||
185 | +DSSR warning 3cw1.json: no nucleotides found. Ignoring 3cw1_1_v_1-138. | ||
186 | + | ||
187 | +2iy3_1_B_9-105 | ||
188 | +DSSR warning 2iy3.json: no nucleotides found. Ignoring 2iy3_1_B_9-105. | ||
189 | + | ||
190 | +3jcr_1_N_1-107 | ||
191 | +DSSR warning 3jcr.json: no nucleotides found. Ignoring 3jcr_1_N_1-107. | ||
192 | + | ||
193 | +2vaz_1_A_64-177 | ||
194 | +DSSR warning 2vaz.json: no nucleotides found. Ignoring 2vaz_1_A_64-177. | ||
195 | + | ||
196 | +2ftc_1_R_81-1466 | ||
197 | +DSSR warning 2ftc.json: no nucleotides found. Ignoring 2ftc_1_R_81-1466. | ||
198 | + | ||
199 | +3jcr_1_M_1-141 | ||
200 | +DSSR warning 3jcr.json: no nucleotides found. Ignoring 3jcr_1_M_1-141. | ||
201 | + | ||
202 | +4v5z_1_B0_1-2902 | ||
203 | +DSSR warning 4v5z.json: no nucleotides found. Ignoring 4v5z_1_B0_1-2902. | ||
204 | + | ||
205 | +5g2x_1_A_595-692 | ||
206 | +Sequence is too short. (< 5 resolved nts) | ||
207 | + | ||
208 | +3iy8_1_A_1-540 | ||
209 | +DSSR warning 3iy8.json: no nucleotides found. Ignoring 3iy8_1_A_1-540. | ||
210 | + | ||
211 | +4v5z_1_BY_2-113 | ||
212 | +DSSR warning 4v5z.json: no nucleotides found. Ignoring 4v5z_1_BY_2-113. | ||
213 | + | ||
214 | +4v5z_1_BZ_1-70 | ||
215 | +DSSR warning 4v5z.json: no nucleotides found. Ignoring 4v5z_1_BZ_1-70. | ||
216 | + | ||
217 | +4v5z_1_B1_2-125 | ||
218 | +DSSR warning 4v5z.json: no nucleotides found. Ignoring 4v5z_1_B1_2-125. | ||
219 | + | ||
220 | +4adx_1_0_1-2925 | ||
221 | +DSSR warning 4adx.json: no nucleotides found. Ignoring 4adx_1_0_1-2925. | ||
222 | + | ||
223 | +1mvr_1_B_3-96 | ||
224 | +DSSR warning 1mvr.json: no nucleotides found. Ignoring 1mvr_1_B_3-96. | ||
225 | + | ||
226 | +3eq4_1_Y_1-69 | ||
227 | +DSSR warning 3eq4.json: no nucleotides found. Ignoring 3eq4_1_Y_1-69. | ||
228 | + | ||
229 | +6uz7_1_8_2140-2827 | ||
230 | +Could not find nucleotides of chain 8 in annotation 6uz7.json. Either there is a problem with 6uz7 mmCIF download, or the bases are not resolved in the structure. Delete it and retry. | ||
231 | + | ||
232 | +4v5z_1_AA_1-1563 | ||
233 | +DSSR warning 4v5z.json: no nucleotides found. Ignoring 4v5z_1_AA_1-1563. | ||
234 | + |
... | @@ -958,6 +958,8 @@ def par_distance_matrix(filelist, f, label, consider_all_atoms, s): | ... | @@ -958,6 +958,8 @@ def par_distance_matrix(filelist, f, label, consider_all_atoms, s): |
958 | 958 | ||
959 | @trace_unhandled_exceptions | 959 | @trace_unhandled_exceptions |
960 | def get_avg_std_distance_matrix(f, consider_all_atoms, multithread=False): | 960 | def get_avg_std_distance_matrix(f, consider_all_atoms, multithread=False): |
961 | + np.seterr(divide='ignore') # ignore division by zero issues | ||
962 | + | ||
961 | if consider_all_atoms: | 963 | if consider_all_atoms: |
962 | label = "base" | 964 | label = "base" |
963 | else: | 965 | else: |
... | @@ -1024,44 +1026,38 @@ def get_avg_std_distance_matrix(f, consider_all_atoms, multithread=False): | ... | @@ -1024,44 +1026,38 @@ def get_avg_std_distance_matrix(f, consider_all_atoms, multithread=False): |
1024 | p.join() | 1026 | p.join() |
1025 | exit(1) | 1027 | exit(1) |
1026 | 1028 | ||
1027 | - with warnings.catch_warnings(record=True) as w: | 1029 | + # Calculation of the average matrix |
1028 | - warnings.simplefilter("error") | 1030 | + avg = np.divide(avg, counts, where=counts>0, out=np.full_like(avg, np.NaN)) # Ultrafancy way to take avg/counts or NaN if counts is 0 |
1029 | - | 1031 | + |
1030 | - # Calculation of the average matrix | 1032 | + np.savetxt(runDir + '/results/distance_matrices/' + f + '_'+ label + '/' + f + '_average.csv' , avg, delimiter=",", fmt="%.3f") |
1031 | - try: | 1033 | + |
1032 | - avg = avg/counts | 1034 | + fig, ax = plt.subplots() |
1033 | - assert (counts >0).all() | 1035 | + im = ax.imshow(avg) |
1034 | - except RuntimeWarning as e: | 1036 | + cbar = ax.figure.colorbar(im, ax=ax) |
1035 | - # there will be division by 0 if count is 0 at some position = gap only column | 1037 | + cbar.ax.set_ylabel("Angströms", rotation=-90, va="bottom") |
1036 | - pass | 1038 | + ax.set_title(f"Average distance between {f} residues (Angströms)") |
1037 | - np.savetxt(runDir + '/results/distance_matrices/' + f + '_'+ label + '/' + f + '_average.csv' , avg, delimiter=",", fmt="%.3f") | 1039 | + fig.tight_layout() |
1038 | - | 1040 | + fig.savefig(runDir + '/results/distance_matrices/' + f + '_'+ label + '/' + f + '_average.png', dpi=300) |
1039 | - fig, ax = plt.subplots() | 1041 | + plt.close() |
1040 | - im = ax.imshow(avg) | ||
1041 | - cbar = ax.figure.colorbar(im, ax=ax) | ||
1042 | - cbar.ax.set_ylabel("Angströms", rotation=-90, va="bottom") | ||
1043 | - ax.set_title(f"Average distance between {f} residues (Angströms)") | ||
1044 | - fig.tight_layout() | ||
1045 | - fig.savefig(runDir + '/results/distance_matrices/' + f + '_'+ label + '/' + f + '_average.png', dpi=300) | ||
1046 | - plt.close() | ||
1047 | 1042 | ||
1048 | - # Calculation of the standard deviation matrix by the Huygens theorem | 1043 | + # Calculation of the standard deviation matrix by the Huygens theorem |
1049 | - try: | 1044 | + std = np.divide(std, counts, where=counts>0, out=np.full_like(std, np.NaN)) |
1050 | - std = np.sqrt(std/counts - np.power(avg/counts, 2)) | 1045 | + mask = np.invert(np.isnan(std)) |
1051 | - assert (counts >0).all() | 1046 | + value = std[mask] - np.power(avg[mask], 2) |
1052 | - except RuntimeWarning as e: | 1047 | + if ((value[value<0] < -1e-2).any()): |
1053 | - # there will be division by 0 if count is 0 at some position = gap only column | 1048 | + warn("Erasing very negative variance value !") |
1054 | - pass | 1049 | + value[value<0] = 0.0 # floating point problems ! |
1055 | - np.savetxt(runDir + '/results/distance_matrices/' + f + '_'+ label + '/' + f + '_stdev.csv' , std, delimiter=",", fmt="%.3f") | 1050 | + std[mask] = np.sqrt(value) |
1056 | - | 1051 | + np.savetxt(runDir + '/results/distance_matrices/' + f + '_'+ label + '/' + f + '_stdev.csv' , std, delimiter=",", fmt="%.3f") |
1057 | - fig, ax = plt.subplots() | 1052 | + |
1058 | - im = ax.imshow(std) | 1053 | + fig, ax = plt.subplots() |
1059 | - cbar = ax.figure.colorbar(im, ax=ax) | 1054 | + im = ax.imshow(std) |
1060 | - cbar.ax.set_ylabel("Angströms", rotation=-90, va="bottom") | 1055 | + cbar = ax.figure.colorbar(im, ax=ax) |
1061 | - ax.set_title(f"Standard deviation of distances between {f} residues (Angströms)") | 1056 | + cbar.ax.set_ylabel("Angströms", rotation=-90, va="bottom") |
1062 | - fig.tight_layout() | 1057 | + ax.set_title(f"Standard deviation of distances between {f} residues (Angströms)") |
1063 | - fig.savefig(runDir + '/results/distance_matrices/' + f + '_'+ label + '/' + f + '_std.png', dpi=300) | 1058 | + fig.tight_layout() |
1064 | - plt.close() | 1059 | + fig.savefig(runDir + '/results/distance_matrices/' + f + '_'+ label + '/' + f + '_std.png', dpi=300) |
1060 | + plt.close() | ||
1065 | 1061 | ||
1066 | # Save log | 1062 | # Save log |
1067 | with open(runDir + '/results/distance_matrices/' + f + '_'+ label + '/' + f + '.log', 'a') as logfile: | 1063 | with open(runDir + '/results/distance_matrices/' + f + '_'+ label + '/' + f + '.log', 'a') as logfile: | ... | ... |
-
Please register or login to post a comment