rstoolbox.analysis.positional_sequence_similarity(df, seqID=None, ref_seq=None, key_residues=None, matrix='BLOSUM62')
Per-position identity and similarity against a reference_sequence.
Given a data container with a set of sequences, it evaluates the percentage of
identities and similarities that the whole set has against a reference_sequence.
It does so by sequence position rather than by individual sequence.
In a way, this generates an extreme simplification of a SequenceFrame.
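The per-position idea described above can be sketched in plain Python. This is a minimal illustration and not the rstoolbox implementation: the helper names `is_positive` and `positional_similarity` are hypothetical, and the small `POSITIVE_PAIRS` set is a toy stand-in for a full substitution matrix such as BLOSUM62 (it lists only a few residue pairs that score positively there).

```python
# Toy stand-in for a substitution matrix (assumption): a handful of residue
# pairs that have a positive score in BLOSUM62. A real run would consult
# the full matrix instead.
POSITIVE_PAIRS = {
    frozenset("IL"), frozenset("IV"), frozenset("LV"),
    frozenset("KR"), frozenset("DE"),
}


def is_positive(a, b):
    """Identical residues, or pairs listed above, count as similar."""
    return a == b or frozenset((a, b)) in POSITIVE_PAIRS


def positional_similarity(sequences, reference):
    """Return one (position, identity, positive) row per reference position.

    Hypothetical helper: all sequences are assumed to have the same length
    as the reference, and values are fractions of the whole set.
    """
    rows = []
    for i, ref_res in enumerate(reference):
        column = [seq[i] for seq in sequences]  # residues of the set at position i
        identity = sum(res == ref_res for res in column) / len(column)
        positive = sum(is_positive(res, ref_res) for res in column) / len(column)
        rows.append((i + 1, identity, positive))  # positions are 1-based
    return rows


# Made-up sequences, for illustration only.
designs = ["MKLDE", "MRIDE", "MKVEE", "AKLDD"]
table = positional_similarity(designs, "MKLDE")
for position, identity, positive in table:
    print(position, identity, positive)
```

Each row summarizes a column of the alignment rather than a single sequence, which is why the result has one entry per position regardless of how many sequences the set contains.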
Parameters:
Returns:
Raises:
Example
In [1]: from rstoolbox.io import parse_rosetta_file
   ...: from rstoolbox.analysis import positional_sequence_similarity
   ...: import pandas as pd
   ...: pd.set_option('display.width', 1000)
   ...: pd.set_option('display.max_columns', 500)
   ...: df = parse_rosetta_file("../rstoolbox/tests/data/input_2seq.minisilent.gz",
   ...:                         {'scores': ['score'], 'sequence': 'B'})
   ...: df.add_reference_sequence('B', df.get_sequence('B').values[0])
   ...: df = positional_sequence_similarity(df.iloc[1:], 'B')
   ...: df.head()
   ...:
Out[1]:
   identity_perc  positive_perc
1            0.4            0.4
2            0.2            1.0
3            0.8            0.8
4            1.0            1.0
5            1.0            1.0