rstoolbox.io.parse_rosetta_file

rstoolbox.io.parse_rosetta_file(filename, description=None, multi=False)

Reads a Rosetta score or silent file and returns the design population as a DesignFrame.

By default, it reads all score columns except positional scores (such as per-residue ddg). The user can also specify scores to ignore.
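
The default behaviour can be pictured with a minimal sketch over plain score records. It is not rstoolbox's implementation: `parse_score_lines` and its `ignore` argument are hypothetical. The sketch assumes the usual Rosetta layout, in which score records are whitespace-separated lines starting with `SCORE:` and the first such line holds the column names. It makes no attempt to recognise positional scores.

```python
def parse_score_lines(lines, ignore=()):
    """Collect Rosetta 'SCORE:' records into one dict per design.

    Hypothetical helper: the first 'SCORE:' line is the header, every later
    one is a design. Columns listed in `ignore` are dropped.
    """
    header = None
    records = []
    for line in lines:
        if not line.startswith("SCORE:"):
            continue                      # skip non-score records (e.g. in silent files)
        fields = line.split()[1:]         # drop the 'SCORE:' tag itself
        if header is None:
            header = fields               # first SCORE: line holds the column names
            continue
        if fields == header:
            continue                      # header repeated, e.g. in concatenated files
        record = {}
        for name, value in zip(header, fields):
            if name in ignore:
                continue
            if name == "description":
                record[name] = value      # design tag: always kept as text
                continue
            try:
                record[name] = float(value)
            except ValueError:
                record[name] = value      # non-numeric score values stay as text
        records.append(record)
    return records
```

The design tag column (`description` in Rosetta output) is kept as text; every other value is converted to float when possible.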

When working with silent files, extra information can be retrieved, such as sequence and secondary structure data, residue labels or positional scores. These options are explained in detail in the tutorial: reading Rosetta.
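
As an illustration of the extra data a silent file carries, the sketch below pulls one-letter sequences out of `ANNOTATED_SEQUENCE:` records. It assumes the usual layout of such a record (the annotated sequence followed by the design tag, whitespace-separated) and bracketed residue-type annotations. `sequences_from_silent` is a hypothetical helper, not part of rstoolbox, which exposes sequences through the `sequence` key of the description (see usage case (3)).

```python
import re

# Bracketed residue-type annotations, e.g. "[MET:NtermProteinFull]".
_ANNOTATION = re.compile(r"\[[^\]]*\]")


def sequences_from_silent(lines):
    """Map each design tag to its plain one-letter sequence (hypothetical helper)."""
    sequences = {}
    for line in lines:
        if not line.startswith("ANNOTATED_SEQUENCE:"):
            continue
        fields = line.split()
        annotated, tag = fields[1], fields[-1]   # assumed layout: record, sequence, tag
        sequences[tag] = _ANNOTATION.sub("", annotated)
    return sequences
```

Stripping the annotations keeps the one-letter codes while discarding terminus and variant labels.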

Some basic usage cases:

import rstoolbox.io

# (1) The default scenario: read the scores from a single file.
df = rstoolbox.io.parse_rosetta_file("silentfile")

# (2) Read from multiple files. All files are assumed to start with
# the given prefix.
df = rstoolbox.io.parse_rosetta_file("silentfile", multi=True)

# (3) Get all scores and the sequence of each design.
description = {'sequence': 'A'}
df = rstoolbox.io.parse_rosetta_file("silentfile", description)

# (4) Get only total_score and RMSD, and rename total_score to score.
description = {'scores': ['total_score', 'RMSD'], 'scores_rename': {'total_score': 'score'}}
df = rstoolbox.io.parse_rosetta_file("silentfile", description)

Parameters:
  • filename (Union[str, list]) – file name, file pattern to search or list of files.
  • description (Union[str, dict]) – Parsing rules (optional). Either a dictionary describing the rules or the name of a file containing such a dictionary. The dictionary definition is explained in the tutorial: reading Rosetta.
  • multi (bool) – When True, indicates that data is read from multiple files.
Returns:

DesignFrame.

Raises:
  • IOError – if filename cannot be found.
  • IOError – if the filename pattern (multi=True) matches no files.
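
The filename forms, the multi flag and the two IOError conditions above fit together roughly as in the following minimal sketch, which also shows a description given either inline or as a file name. It is not rstoolbox's code: `gather_files` and `load_description` are hypothetical helpers. A string passed with `multi=True` is assumed to be a prefix expanded with `glob` (as usage case (2) suggests), and a description file is assumed to be JSON, which is only one plausible format.

```python
import glob
import json
import os


def gather_files(filename, multi=False):
    """Resolve `filename` into a list of existing files (hypothetical helper).

    Assumption: with multi=True a string is used as a prefix and expanded
    with glob, mirroring usage case (2); rstoolbox's own matching may differ.
    """
    if isinstance(filename, (list, tuple)):
        files = list(filename)                      # explicit list of files
    elif multi:
        files = sorted(glob.glob(filename + "*"))   # every file with this prefix
    else:
        files = [filename]                          # a single file name
    if not files:
        raise IOError("{}: no files found".format(filename))
    for path in files:
        if not os.path.isfile(path):
            raise IOError("{}: file not found".format(path))
    return files


def load_description(description=None):
    """Return the parsing rules as a dict (hypothetical helper).

    Assumption: a string names a JSON file holding the rules dictionary.
    """
    if description is None:
        return {}                  # no rules: default behaviour
    if isinstance(description, dict):
        return description         # rules given inline
    with open(description) as handle:
        return json.load(handle)   # rules stored in a file
```

Keeping file resolution separate from parsing means both error conditions are raised before any file is read.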

Example

In [1]: from rstoolbox.io import parse_rosetta_file
   ...: import pandas as pd
   ...: pd.set_option('display.width', 1000)
   ...: pd.set_option('display.max_columns', 500)
   ...: df = parse_rosetta_file("../rstoolbox/tests/data/input_2seq.minisilent.gz")
   ...: df.head(2)
   ...: 
Out[1]: 
     score    fa_atr   fa_rep   fa_sol  fa_intra_rep  fa_elec  pro_close  hbond_sr_bb  hbond_lr_bb  hbond_bb_sc  hbond_sc  dslf_fa13    rama   omega   fa_dun  p_aa_pp  yhh_planarity     ref  BUNS  B_ni_mtcontacts  B_ni_rmsd  B_ni_rmsd_threshold  B_ni_trials  GRMSD2Target  GRMSD2Template  LRMSD2Target  LRMSDH2Target  LRMSDLH2Target  cav_vol  design_score  packstat  rmsd_drift    time                     description
0 -206.678 -1510.021  268.657  853.020  2.921        -145.015  5.825     -150.177     -2.452       -13.326      -36.936    0.0       -28.499  42.312  551.509 -23.219   0.000         -21.277  22.0  57.0             0.568      5.0                  1.0          1.976         1.927           4.404         4.055          2.49            387.371 -255.445       0.633     1.677       3194.0  test_3lhp_binder_labeled_00001
1 -214.362 -1490.968  267.328  824.258  3.019        -133.421  6.018     -151.609     -2.452       -12.584      -33.021    0.0       -29.998  38.315  545.056 -23.612   0.003         -20.693  14.0  54.0             0.333      5.0                  1.0          2.659         2.417           4.469         4.124          2.73            332.657 -264.239       0.577     2.240       3210.0  test_3lhp_binder_labeled_00002