One of the key advantages of rstoolbox is the ability to control the amount
and type of data that is loaded from a silent/score file. This control is managed
through a definition, a dictionary that describes the type of data to be loaded.
Note
definition is meant to be applied to parse_rosetta_file().
As of now, there are 12 different options that can be combined into a definition:
definition term | description |
---|---|
scores | Basic selection of the scores to store. Default is all scores. |
scores_ignore | Selection of specific scores to ignore. |
scores_rename | Rename some score names to others. |
scores_by_residue | Collect per-residue scores into a single array value. |
scores_missing | Names of scores that might be missing in some decoys. |
naming | Use the decoy identifier’s name to create extra score terms. |
sequence | Pick sequence data from the silent file. |
structure | Pick structural data from the silent file. |
psipred | Pick PSIPRED data from the silent file. |
dihedrals | Retrieve dihedral data from the silent file. |
labels | Retrieve residue labels from the silent file. |
graft_ranges | When using the MotifGraftMover, multi-columns will be created when more than one segment is grafted. Provide here the number of segments. |
Tip
definition can be passed directly as a dictionary, or it can be saved as a
JSON or YAML file and loaded from there.
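As a minimal sketch of the file-based route, the standard-library json module can write a definition to disk and read it back into an identical dictionary. The file name definition.json and the temporary directory are assumptions chosen for illustration. Only the round trip is shown here; how the file is then handed to parse_rosetta_file() is not covered by this sketch.

```python
import json
import os
import tempfile

# The same kind of dictionary that would be passed to parse_rosetta_file().
definition = {'scores': ['score', 'packstat', 'description']}

# Hypothetical location for the saved definition (illustration only).
path = os.path.join(tempfile.mkdtemp(), 'definition.json')

# Save the definition as JSON.
with open(path, 'w') as handle:
    json.dump(definition, handle, indent=2)

# Load it back; the result is a plain dictionary again.
with open(path) as handle:
    loaded = json.load(handle)
```

Keeping the definition in a file makes it easy to reuse the same selection across several analyses.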
The scores term is the most basic parameter, and refers to the regular scores in the
silent/score file. It allows selecting just the scores that are wanted for the analysis.
There are three main ways to define scores. The first is to provide a list naming the
scores of interest:
{'scores': ['score', 'packstat', 'description']}
The second is to pass an asterisk string if all scores are wanted (this is the default value for this parameter):
{'scores': '*'}
The third is to pass a minus sign, which ignores all scores:
{'scores': '-'}
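The three forms can be summarised with a small stand-alone sketch. The helper select_scores below is hypothetical and is not part of rstoolbox; it only mimics the selection rules described above on a plain list of score names, and the score names themselves are made up for illustration.

```python
def select_scores(available, spec='*'):
    """Hypothetical helper (not rstoolbox code) mimicking the 'scores' term."""
    if spec == '*':
        # Asterisk: keep every score (the default).
        return list(available)
    if spec == '-':
        # Minus sign: ignore all scores.
        return []
    if isinstance(spec, str):
        raise ValueError("expected a list of names, '*' or '-'")
    # List: keep only the named scores, in the order of `available`.
    return [name for name in available if name in spec]


# Illustrative score names, assumed for this example.
available = ['score', 'fa_atr', 'packstat', 'description']

picked = select_scores(available, ['score', 'packstat', 'description'])
everything = select_scores(available, '*')
nothing = select_scores(available, '-')
```

Here picked holds the three named scores, everything holds all four names, and nothing is an empty list.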
Example
In [1]: from rstoolbox.io import parse_rosetta_file
...: import pandas as pd
...: pd.set_option('display.width', 1000)
...: ifile = '../rstoolbox/tests/data/input_2seq.minisilent.gz'
...: definition1 = {'scores': ['score', 'packstat', 'description']}
...: df = parse_rosetta_file(ifile, definition1)
...: df.head()
...:
Out[1]:
score packstat description
0 -206.678 0.633 test_3lhp_binder_labeled_00001
1 -214.362 0.577 test_3lhp_binder_labeled_00002
2 -203.582 0.568 test_3lhp_binder_labeled_00003
3 -213.779 0.614 test_3lhp_binder_labeled_00004
4 -213.972 0.591 test_3lhp_binder_labeled_00005
In [2]: definition2 = {'scores': '*'}
...: df1 = parse_rosetta_file(ifile, definition2)
...: df2 = parse_rosetta_file(ifile)
...: df1.head()
...: