A first round of designs may be just a stepping stone towards a second generation. We can learn from it and steer the next generation towards our final aim.
Note
All the examples here generate new sequences. Once those sequences exist, a call to DesignFrame.make_resfile() creates the residue files (resfiles) that can be provided to Rosetta through ReadResfile to guide the design process.
We do not call the method in this tutorial because it generates files that cannot be shown here.
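Since the resfiles themselves are not shown, the following is a minimal sketch of what such a file could look like for one designed sequence. It is not the output of DesignFrame.make_resfile(): the helper sketch_resfile, the NATAA header and the toy sequences are assumptions chosen for illustration, and only the general Rosetta resfile layout (default command, start, then one "residue chain command" line per position) is relied upon.

```python
def sketch_resfile(reference, design, chain, shift=1):
    """Build resfile-style text for the positions where design differs
    from reference. Hypothetical helper, not part of rstoolbox.

    shift is the residue number of the first position of the sequence.
    """
    # Default behaviour for unlisted residues: keep the native amino acid.
    lines = ["NATAA", "start"]
    for i, (wt, aa) in enumerate(zip(reference, design)):
        if wt != aa:
            # PIKAA restricts the position to the listed amino acid(s).
            lines.append(f"{i + shift} {chain} PIKAA {aa}")
    return "\n".join(lines)


# Toy sequences (made up); two positions differ.
resfile_text = sketch_resfile("ACDEFG", "AKDEYG", "C", shift=32)
print(resfile_text)
```

With a shift of 32, the two differing positions are written as residues 33 and 36 of chain C.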
As in Sequence Analysis, we need to load a reference with get_sequence_and_structure().
Note
Throughout the process, the chainID of the decoy of interest is passed explicitly several times. The library can handle decoys with multiple chains (designed or not), so each analysis must be told which sequence to work on.
In [1]: import rstoolbox as rs
...: import pandas as pd
...: import matplotlib.pyplot as plt
...: import seaborn as sns
...: pd.set_option('display.width', 1000)
...: pd.set_option('display.max_columns', 500)
...: pd.set_option("display.max_seq_items", 3)
...: baseline = rs.io.get_sequence_and_structure('../rstoolbox/tests/data/2pw9C.pdb')
...: baseline.get_sequence('C')
...: baseline.add_reference_sequence('C', baseline.get_sequence('C'))
...: baseline.add_reference_shift('C', 32)
...:
Again, we are mimicking Sequence Analysis.
In [2]: rules = {'scores_ignore': ['fa_*', 'niccd_*', 'hbond_*', 'lk_ball_wtd', 'pro_close', 'dslf_fa13', 'C_ni_rmsd_threshold',
...: 'omega', 'p_aa_pp', 'yhh_planarity', 'ref', 'rama_prepro', 'time'],
...: 'sequence': 'C',
...: 'labels': ['MOTIF', 'SSE03', 'SSE05']}
...: df = rs.io.parse_rosetta_file('../rstoolbox/tests/data/input_ssebig.minisilent.gz', rules)
...: df.add_reference_sequence('C', baseline.get_sequence('C'))
...: df.add_reference_shift('C', 32)
...: df.head(3)
...:
Out[2]:
score ALIGNRMSD BUNS COMPRRMSD C_ni_mtcontacts C_ni_rmsd C_ni_trials MOTIFRMSD cav_vol driftRMSD finalRMSD packstat C_ni_rmsd_type description lbl_MOTIF lbl_SSE03 lbl_SSE05 sequence_C
0 -64.070 0.608 12.0 7.585 4.0 3.301 1.0 0.957 66.602 0.083 3.323 0.544 no_motif nubinitio_wauto_18326_2pw9C_0001_0001 [C] [C] [C] TTWIKFFAGGTLVEEFEYSSVNWEEIEKRAWKKLGRWKKAEEGDLMIVYPDGKVVSWA
1 -70.981 0.639 12.0 2.410 8.0 1.423 1.0 0.737 0.000 0.094 1.395 0.552 no_motif nubinitio_wauto_18326_2pw9C_0002_0001 [C] [C] [C] NTWSTNILNGHPKITLLVEERGAEEIHLEWLKKQGLRKKAEENVYTTKLPNGAVKVYG
2 -43.863 0.480 8.0 4.279 6.0 2.110 1.0 0.819 93.641 0.110 2.106 0.575 no_motif nubinitio_wauto_18326_2pw9C_0003_0001 [C] [C] [C] PRWFIAMGDGVIWEIVLGSEQNLEEIAKKGLKRRGLYKKAEESIYTIIYPDGIAHTFG
We’ve seen multiple ways to identify and view mutations in Sequence Analysis. Now suppose we have identified a good decoy candidate and want to try every possible back mutation. In essence, we ask whether a less mutated decoy would perform as well as the one found.
For this, we take the best-scored decoy and call generate_wt_reversions() on the residues belonging to strands 3 (SSE03) and 5 (SSE05):
In [3]: kres = df.get_label('SSE03', 'C').values[0] + df.get_label('SSE05', 'C').values[0]
...: ds = df.sort_values('score').iloc[0].generate_wt_reversions('C', kres)
...: ds.shape[0]
...:
Out[3]: 16384
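The count of 16384 equals 2**14, which is consistent with 14 selected positions differing from the reference, each of them either kept as designed or reverted to wild type. The sketch below illustrates that combinatorial idea on toy data; the function wt_reversions and the sequences are hypothetical and are not the library's implementation.

```python
from itertools import product


def wt_reversions(reference, design, positions):
    """Enumerate every keep/revert combination over the mutated positions.

    positions are 1-based indices into the sequences. Hypothetical helper
    written for illustration only.
    """
    # Only positions that actually differ from the reference can be reverted.
    mutated = [p for p in positions if reference[p - 1] != design[p - 1]]
    variants = []
    for choice in product((False, True), repeat=len(mutated)):
        seq = list(design)
        for pos, revert in zip(mutated, choice):
            if revert:
                seq[pos - 1] = reference[pos - 1]
        variants.append("".join(seq))
    return variants


reference = "ACDEFG"  # toy wild-type sequence
design = "AKDEYW"     # toy design with three mutations
variants = wt_reversions(reference, design, [2, 5, 6])
print(len(variants))  # 2**3 = 8
```

The number of variants doubles with each mutated position, so restricting the selection to the regions of interest (here, two strands) keeps the set manageable.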
In [4]: ds.head(4)