Comparing features between populations is one of the best ways to benchmark different conditions in an experiment.
This example works with two populations designed in the same manner; the only difference is that one was designed in the presence of a binder while the other was designed alone.
Some scores are ignored when loading the data to keep the output readable; this detail is not mentioned again in the tutorial. Similarly, only a sample of the 20K sequences available in each decoy population is used.
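The list of ignored scores in the loading call includes a wildcard entry such as `fa_*`. How rstoolbox resolves these entries internally is not shown here; the sketch below only illustrates the idea, assuming shell-style wildcards as implemented by Python's standard `fnmatch` module. The helper `drop_ignored` and the example column names are hypothetical.

```python
from fnmatch import fnmatchcase


def drop_ignored(columns, patterns):
    """Return the column names that match none of the ignore patterns."""
    return [c for c in columns
            if not any(fnmatchcase(c, p) for p in patterns)]


# Hypothetical score names, used only for illustration.
columns = ['score', 'fa_atr', 'fa_rep', 'time', 'rama', 'pro_close']
ignore = ['time', 'fa_*', 'pro_close']

kept = drop_ignored(columns, ignore)
print(kept)
```

Under this assumption, a single `fa_*` entry removes every column whose name starts with `fa_`, which keeps the ignore list short.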
In [1]: import rstoolbox as rs
...: import pandas as pd
...: import numpy as np
...: pd.set_option('display.width', 1000)
...: pd.set_option('display.max_columns', 500)
...: pd.set_option("display.max_seq_items", 3)
...: binder = rs.io.parse_rosetta_file('../rstoolbox/tests/data/compare/binder.mini.gz', {'sequence': 'B', 'scores_ignore': ['time', 'fa_*', 'dslf_fa13', 'yhh_planarity', 'pro_close']}).sample(frac=0.05)
...: binder.columns.values
...:
Out[1]:
array(['score', 'hbond_sr_bb', 'hbond_lr_bb', 'hbond_bb_sc', 'hbond_sc',
'rama', 'omega', 'p_aa_pp', 'ref', 'BUNS', 'B_ni_mtcontacts',
'B_ni_rmsd', 'B_ni_rmsd_threshold', 'B_ni_trials', 'GRMSD2Target',
'GRMSD2Template', 'LRMSD2Target', 'LRMSDH2Target',
'LRMSDLH2Target', 'cav_vol', 'design_score', 'packstat',
'rmsd_drift', 'description', 'sequence_B'], dtype=object)
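Note that `.sample(frac=0.05)` draws a random 5% of the rows, so the selected decoys change on every run. As a minimal sketch, assuming the parsed object behaves like a regular pandas `DataFrame` for sampling, passing `random_state` makes the subset reproducible. The data below is synthetic and only stands in for a real decoy population.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a decoy population (hypothetical values).
rng = np.random.default_rng(0)
decoys = pd.DataFrame({
    'score': rng.normal(-300, 20, size=1000),
    'packstat': rng.uniform(0.4, 0.8, size=1000),
})

# frac=0.05 keeps 5% of the rows; random_state fixes which rows are drawn.
subset_a = decoys.sample(frac=0.05, random_state=42)
subset_b = decoys.sample(frac=0.05, random_state=42)

print(len(subset_a))
print(subset_a.index.equals(subset_b.index))
```

Fixing the seed is useful when the numbers or plots derived from the sample need to match between runs; leaving it out, as in the tutorial, is fine for a quick look at the data.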
In [2]: nobinder = rs.io.parse_rosetta_file('../rstoolbox/tests/data/compare/nobinder.mini.gz', {'sequence': 'B', 'scores_ignore': ['time', 'fa_*', 'dslf_fa13', 'yhh_planarity', 'pro_close']}).sample(frac=0.05)
...: nobinder.columns.values
...: