rstoolbox.analysis.sequential_frequencies(df, seqID, query='sequence', seqType='protein', cleanExtra=True, cleanUnused=-1)

Generates a SequenceFrame for the frequencies of the sequences in the DesignFrame with the seqID identifier.
If there is a reference_sequence for this seqID, it will also be attached to the SequenceFrame.
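As a rough illustration of what "frequencies of the sequences" means here, the sketch below counts, for each position, how often each letter occurs across a set of equal-length sequences. It is not rstoolbox's implementation: the helper name `position_frequencies`, the `ALPHABET` ordering, and the toy sequences are all assumptions made for this example, and it uses plain dictionaries rather than a SequenceFrame.

```python
from collections import Counter

# Hypothetical column ordering; the real SequenceFrame chooses its own columns.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def position_frequencies(sequences, alphabet=ALPHABET):
    """Return one dict per position mapping each letter to its relative frequency.

    All sequences are assumed to have the same length.
    """
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")

    table = []
    # zip(*sequences) walks the sequences column by column (position by position).
    for column in zip(*sequences):
        counts = Counter(column)
        total = len(column)
        table.append({letter: counts.get(letter, 0) / total for letter in alphabet})
    return table


# Toy sequences invented for this sketch, not taken from any rstoolbox data file.
toy_designs = ["ACD", "ACE", "GCD", "ACD"]
freqs = position_frequencies(toy_designs)
print(freqs[0]["A"], freqs[0]["G"])
```

Each entry of `freqs` plays the role of one row in the frequency table: the values in a row add up to 1 as long as every letter in that column belongs to the chosen alphabet.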
All letters in the sequence will be capitalized. All symbols that do not belong to string.ascii_uppercase will be transformed to “*”, as this is the symbol recognized by the substitution matrices as a gap.
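The normalization described above can be sketched in a few lines of standard-library Python. The helper name `normalize_sequence` is hypothetical and only mirrors the rule as stated in the text (capitalize first, then replace anything outside `string.ascii_uppercase` with "*"); the library's own code may differ in detail.

```python
import string


def normalize_sequence(sequence):
    """Upper-case the sequence and replace every non A-Z symbol with '*'."""
    return "".join(
        char if char in string.ascii_uppercase else "*"
        for char in sequence.upper()
    )


# Lower-case letters are capitalized; '-' and '1' are not in A-Z, so they become '*'.
print(normalize_sequence("mkt-aX1"))  # prints "MKT*AX*"
```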
This function is directly accessible through some DesignFrame methods.
See also
DesignFrame.sequence_frequencies()
DesignFrame.sequence_bits()
DesignFrame.structure_frequencies()
DesignFrame.structure_bits()
Example
In [1]: from rstoolbox.io import parse_rosetta_file
   ...: from rstoolbox.analysis import sequential_frequencies
   ...: import pandas as pd
   ...: pd.set_option('display.width', 1000)
   ...: pd.set_option('display.max_columns', 500)
   ...: df = parse_rosetta_file("../rstoolbox/tests/data/input_2seq.minisilent.gz",
   ...:                         {'scores': ['score'], 'sequence': 'AB'})
   ...: df = sequential_frequencies(df, 'B')
   ...: df.head()
   ...:
Out[1]:
     C    D    S    Q         K    I         P    T    F    N    G    H    L         R         W    A    V    E    Y    M
1  0.0  0.0  0.0  0.0  0.166667  0.0  0.333333  0.5  0.0  0.0  0.0  0.0  0.0  0.000000  0.000000  0.0  0.0  0.0  0.0  0.0
2  0.0  0.0  0.0  0.0  0.666667  0.0  0.000000  0.0  0.0  0.0  0.0  0.0  0.0  0.333333  0.000000  0.0  0.0  0.0  0.0  0.0
3  0.0  0.0  0.0  0.0  0.000000  0.0  0.833333  0.0  0.0  0.0  0.0  0.0  0.0  0.000000  0.166667  0.0  0.0  0.0  0.0  0.0
4  0.0  0.0  0.0  0.0  0.000000  0.0  0.000000  0.0  0.0  0.0  0.0  0.0  0.0  0.000000  0.000000  0.0  0.0  1.0  0.0  0.0
5  0.0  0.0  0.0  0.0  0.000000  0.0  0.000000  0.0  0.0  0.0  0.0  0.0  0.0  0.000000  0.000000  0.0  0.0  1.0  0.0  0.0
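
To read the Out[1] table above: each row is a sequence position and each column a residue type, so every row should add up to 1. The sketch below copies only the non-zero cells from that output into a plain dictionary (a simplification for illustration, not a SequenceFrame), checks the row sums, and picks the most frequent residue at each position.

```python
# Non-zero entries transcribed from the Out[1] table; all other cells are 0.0.
frequencies = {
    1: {"K": 0.166667, "P": 0.333333, "T": 0.5},
    2: {"K": 0.666667, "R": 0.333333},
    3: {"P": 0.833333, "W": 0.166667},
    4: {"E": 1.0},
    5: {"E": 1.0},
}

# Each row is a distribution over residue types, so it sums to 1
# (up to the rounding of the printed values).
for position, row in frequencies.items():
    assert abs(sum(row.values()) - 1.0) < 1e-5

# Most frequent residue at each of the five positions shown.
consensus = "".join(
    max(row, key=row.get) for _, row in sorted(frequencies.items())
)
print(consensus)  # prints "TKPEE"
```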