rstoolbox.utils.split_values(df, keys)
Reshape the data to aid plotting of multiple comparable scores.
Note
This may reshape the data so that a single decoy is repeated across multiple rows.
The dictionary that needs to be provided to split the data container has three main keys:
keep
: Identifies the columns to keep (they cannot be any of the columns being split). If not provided, all columns are kept.
split
: List of columns to split. Each entry is a tuple: the first element is the name of the column to split, and the remaining elements are the values used to identify it.
names
: Names of the new columns. The first is the name of the column that receives the values; the rest are the names of the columns that hold the identifiers.
Parameters: | |
---|---|
Returns: | Altered Data container. |
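The reshape described by the three keys can be approximated in plain pandas. The sketch below is not rstoolbox's implementation: `split_values_sketch` is a hypothetical helper written only from the description above, and it assumes that omitting `keep` means keeping every column that is not being split. The sample values are the first two decoys of the example that follows.

```python
import pandas as pd


def split_values_sketch(df, keys):
    """Hypothetical pandas-only approximation of the described reshape."""
    split_columns = [entry[0] for entry in keys['split']]
    # Assumption: without 'keep', every non-split column is carried over.
    keep = keys.get('keep')
    if keep is None:
        keep = [col for col in df.columns if col not in split_columns]
    # First name labels the value column, the rest label the identifiers.
    value_name = keys['names'][0]
    id_names = keys['names'][1:]

    pieces = []
    for entry in keys['split']:
        column, identifiers = entry[0], entry[1:]
        # Kept columns plus one split column, renamed to the shared name.
        piece = df[list(keep) + [column]].rename(columns={column: value_name})
        # Tag every row with the identifier(s) given for this column.
        piece = piece.assign(**dict(zip(id_names, identifiers)))
        pieces.append(piece)
    # Stack the pieces: each original row appears once per split column.
    return pd.concat(pieces, ignore_index=True)


df = pd.DataFrame({
    'score': [-206.678, -214.362],
    'GRMSD2Target': [1.976, 2.659],
    'GRMSD2Template': [1.927, 2.417],
    'description': ['test_3lhp_binder_labeled_00001',
                    'test_3lhp_binder_labeled_00002'],
})
keys = {'split': [('GRMSD2Target', 'grmsdTr'), ('GRMSD2Template', 'grmsdTp')],
        'names': ['rmsd', 'rmsd_type']}
long_df = split_values_sketch(df, keys)
print(long_df)
```

With two split columns, each decoy ends up on two rows, one per `rmsd_type`, which is the repetition the note above warns about.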
Example
In [1]: from rstoolbox.io import parse_rosetta_file
   ...: from rstoolbox.utils import split_values
   ...: import pandas as pd
   ...: pd.set_option('display.width', 1000)
   ...: pd.set_option('display.max_columns', 500)
   ...: ifile = '../rstoolbox/tests/data/input_2seq.minisilent.gz'
   ...: scorel = ['score', 'GRMSD2Target', 'GRMSD2Template', 'LRMSD2Target',
   ...:           'LRMSDH2Target', 'LRMSDLH2Target', 'description']
   ...: df = parse_rosetta_file(ifile, {'scores': scorel})
   ...: df
   ...:
Out[1]:
score GRMSD2Target GRMSD2Template LRMSD2Target LRMSDH2Target LRMSDLH2Target description
0 -206.678 1.976 1.927 4.404 4.055 2.490 test_3lhp_binder_labeled_00001
1 -214.362 2.659 2.417 4.469 4.124 2.730 test_3lhp_binder_labeled_00002
2 -203.582 2.026 1.607 5.208 4.598 2.907 test_3lhp_binder_labeled_00003
3 -213.779 2.407 2.047 5.728 4.866 3.002 test_3lhp_binder_labeled_00004
4 -213.972 2.245 1.907 3.787 3.258 2.692 test_3lhp_binder_labeled_00005
5 -195.138 2.581 2.453 5.021 4.127 2.473 test_3lhp_binder_labeled_00006
In [2]: split1 = {'split': [('GRMSD2Target', 'grmsdTr'), ('GRMSD2Template', 'grmsdTp'),
   ...:                     ('LRMSD2Target', 'lrmsdTp'), ('LRMSDH2Target', 'lrmsdh2'),
   ...:                     ('LRMSDLH2Target', 'lrmsdlh2')],
   ...:           'names': ['rmsd', 'rmsd_type']}
   ...: split_values(df, split1)
   ...: