rstoolbox.analysis.sequence_similarity(df, seqID, key_residues=None, matrix='BLOSUM62')

Evaluate the sequence similarity between each decoy and the reference_sequence for a given seqID.
Sequence similarity is measured with a substitution matrix (BLOSUM62 by default). This means similar but non-identical substitutions can be evaluated as well as identities.
The function returns the input data container with the following new columns:
New Column | Data Content |
---|---|
<matrix>_<seqID>_raw | Score obtained by applying <matrix> |
<matrix>_<seqID>_perc | Score obtained by applying <matrix>, divided by the score of reference_sequence against itself |
<matrix>_<seqID>_identity | Total identity matches |
<matrix>_<seqID>_positive | Total positive matches according to <matrix> |
<matrix>_<seqID>_negative | Total negative matches according to <matrix> |
<matrix>_<seqID>_ali | Representation of aligned residues |
<matrix>_<seqID>_per_res | Per-position score of applying <matrix> |
The matrix name in each new column is written in lowercase (for example, blosum62).
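Below is a minimal, self-contained sketch of how the values in the table could be computed for a single decoy. It is not rstoolbox's implementation. It makes three assumptions:

- The two sequences are already aligned and have equal length, with no gaps.
- A position counts as positive when its matrix score is above zero and as negative otherwise.
- The ali string shows the residue letter for an identity, `+` for any other positive substitution and `.` for everything else.

These conventions are inferred from the example output further down this page, not from the library source. BLOSUM62_SUBSET, pair_score and similarity_columns are hypothetical names. The matrix is only a five-residue excerpt of BLOSUM62.

```python
# Minimal sketch (not rstoolbox's implementation) of the per-decoy columns,
# assuming two pre-aligned, equal-length sequences without gaps.

# Hypothetical excerpt of BLOSUM62 covering only T, R, K, P and E.
# One triangle is stored; pair_score makes the lookup order-independent.
BLOSUM62_SUBSET = {
    ("T", "T"): 5, ("R", "R"): 5, ("K", "K"): 5, ("P", "P"): 7, ("E", "E"): 5,
    ("T", "R"): -1, ("T", "K"): -1, ("T", "P"): -1, ("T", "E"): -1,
    ("R", "K"): 2, ("R", "P"): -2, ("R", "E"): 0,
    ("K", "P"): -1, ("K", "E"): 1,
    ("P", "E"): -1,
}


def pair_score(matrix, a, b):
    """Substitution score for residues a and b (KeyError if not covered)."""
    return matrix[(a, b)] if (a, b) in matrix else matrix[(b, a)]


def similarity_columns(decoy, reference, seq_id, matrix, matrix_name="BLOSUM62"):
    """Return a dict shaped like the columns described in the table above."""
    if len(decoy) != len(reference):
        raise ValueError("sequences must be pre-aligned and of equal length")
    per_res = [pair_score(matrix, d, r) for d, r in zip(decoy, reference)]
    # Reference scored against itself: the denominator of the _perc column.
    self_score = sum(pair_score(matrix, r, r) for r in reference)
    # Identity -> residue letter; other positive score -> '+'; else '.'
    ali = "".join(
        d if d == r else ("+" if s > 0 else ".")
        for d, r, s in zip(decoy, reference, per_res)
    )
    prefix = f"{matrix_name.lower()}_{seq_id}_"  # matrix name in lowercase
    return {
        prefix + "raw": sum(per_res),
        prefix + "perc": sum(per_res) / self_score,
        prefix + "identity": sum(d == r for d, r in zip(decoy, reference)),
        prefix + "positive": sum(s > 0 for s in per_res),
        prefix + "negative": sum(s <= 0 for s in per_res),
        prefix + "ali": ali,
        prefix + "per_res": per_res,
    }


cols = similarity_columns("PKPEE", "TRPEE", "B", BLOSUM62_SUBSET)
print(cols)
```

The K/R pair in this toy input is not an identity, but it scores +2 and is therefore counted as positive. This is the kind of similarity the substitution matrix adds on top of plain identity. If Biopython is available, `Bio.Align.substitution_matrices.load("BLOSUM62")` is one way to obtain the full matrix. The dictionary lookup in pair_score would then need adapting to that object.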
Tip
If key_residues are provided, scoring is restricted to those positions. However, nothing in the column names indicates that the evaluation is partial. Keep this in mind in any downstream analysis.
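The small illustration below shows how restricting the evaluation changes the number reported under an unchanged column name. It only sums a per-position score list over a selection and does not reproduce how rstoolbox applies key_residues internally. The restricted_score helper and the 1-based position convention are assumptions made for illustration. The ten scores are the first entries of decoy 0's blosum62_B_per_res in the example output below.

```python
# Illustration only: sum per-position scores over an optional selection.
# Assumption: positions are 1-based (hypothetical convention; rstoolbox's
# own key_residues handling may differ).


def restricted_score(per_res, key_residues=None):
    """Sum all per-position scores, or only those at the selected positions."""
    if key_residues is None:
        return sum(per_res)
    return sum(per_res[pos - 1] for pos in key_residues)


# First ten per-position BLOSUM62 scores of decoy 0 in the example output.
per_res = [-1, 2, 7, 5, 5, 4, -1, 0, 0, 4]

full = restricted_score(per_res)                # all ten positions
partial = restricted_score(per_res, [3, 4, 5])  # positions 3-5 only
print(full, partial)  # 25 17
```

Both numbers would end up in a column named like blosum62_B_raw, so you need to record the selection that produced each value separately.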
Running this function multiple times (for example, with different key_residues selections) adds suffixes to the columns listed above, following pandas' merge naming logic (_x, _y, _z, …).
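One way to keep track of which run produced which columns is to rename them immediately after each call, before calling the function again. The helper below, tag_similarity_columns, is hypothetical and not part of rstoolbox. It uses only pandas' `DataFrame.rename` and the <matrix>_<seqID>_ naming pattern described above. The approach assumes the merge-style suffixes appear only when column names collide.

```python
import pandas as pd


def tag_similarity_columns(df, matrix, seq_id, tag):
    """Append _<tag> to every column created for <matrix> and <seq_id>.

    Hypothetical helper, not part of rstoolbox. It relies only on the
    <matrix>_<seqID>_ naming pattern (matrix name in lowercase).
    """
    prefix = f"{matrix.lower()}_{seq_id}_"
    mapping = {col: f"{col}_{tag}" for col in df.columns if col.startswith(prefix)}
    return df.rename(columns=mapping)


# Toy frame with two columns shaped like sequence_similarity output
# (values copied from the example output).
df = pd.DataFrame({
    "score": [-214.362, -203.582],
    "blosum62_B_raw": [183, 154],
    "blosum62_B_perc": [0.288189, 0.242520],
})
df = tag_similarity_columns(df, "BLOSUM62", "B", "full")
print(list(df.columns))
# ['score', 'blosum62_B_raw_full', 'blosum62_B_perc_full']
```

A tag such as "full" or "core" then records the selection in the column name itself, which the library does not do for key_residues.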
Example
In [1]: from rstoolbox.io import parse_rosetta_file
...: from rstoolbox.analysis import sequence_similarity
...: import pandas as pd
...: pd.set_option('display.width', 1000)
...: pd.set_option('display.max_columns', 500)
...: df = parse_rosetta_file("../rstoolbox/tests/data/input_2seq.minisilent.gz",
...: {'scores': ['score'], 'sequence': 'B'})
...: df.add_reference_sequence('B', df.get_sequence('B').values[0])
...: df = sequence_similarity(df.iloc[1:], 'B')
...: df.head()
...:
Out[1]:
score sequence_B blosum62_B_raw blosum62_B_identity blosum62_B_positive blosum62_B_negative blosum62_B_ali blosum62_B_per_res blosum62_B_perc
0 -214.362 PKPEEAMREAYKLIKKYMLKAQKEAQEEWERMRRTDGTKEEKDMFPEKMIAQALRAIGEIFNAYYWAFLKLQEFKKYPSVRWEEQEEARKRLKIMMKIGAEWAREIAREMKERIKR 183 41 68 48 .+PEEA...A++L.+..M.K..+E.+.EWE..+R....+EE+DM.PE+MIA.ALRAIGEIFNA.+...L++++.+K.P+...E+.+E.+K....+.......A....++.+E+.++ [-1, 2, 7, 5, 5, 4, -1, 0, 0, 4, 2, 2, 4, -1, 1, -3, -2, 5, -2, 5, 0, -2, 1, 5, -2, 2, 0, 5, 11, 5, -3, -1, 2, 5, 0, 0, -2, -2, 2, 5, 5, 2, 6, 5, 0, 7, 5, 2, 5, 4, 4, -1, 4, 4, 5, 4, 4, 6, 5, 4, 6, 6, 4, -2, 2, -2, -1, -3, 4, 1, 2, 2, 1, -3, 2, 5, -2, 7, 1, 0, -2, -3, 5, 1, 0, 1, 5, -1, 2, 5, 0, -1, -3, -3, 1, -1, -2, -1, -2, 0, ...] 0.288189
1 -203.582 TKPEEMAREAYKRMLKALKQGEEEMKRMYEQMKKGVDSKEERDMEPEKMIAIALRAIGELFNAWMKALRHMKELRKLGTSGPKEEEKHWRWIFELHRWAGEEIQRAAEIQERKARW 154 39 65 51 T+PEE....A++....A+++G.EE.+R.+E..K+....+EERDM.PE+MIA.ALRAIGE+FNA..+....M++.RK...+G.++.++..+..+++..+.G.......+....K.R. [5, 2, 7, 5, 5, -1, -1, 0, 0, 4, 2, 2, -2, -1, -3, -3, 4, 2, 2, 1, 6, -3, 5, 5, -2, 1, 5, -2, 2, 5, -2, -1, 5, 2, 0, -2, -1, 0, 2, 5, 5, 5, 6, 5, -3, 7, 5, 2, 5, 4, 4, -1, 4, 4, 5, 4, 4, 6, 5, 2, 6, 6, 4, -3, -1, 1, -1, -2, -2, 0, 5, 1, 1, -3, 5, 5, -3, -2, 0, 1, 6, -1, 1, 1, -2, 1, 1, 0, -3, 2, -3, -1, 1, 1, 2, -2, -2, 2, -3, 6, ...] 0.242520
2 -213.779 TKPEEWARWAYKEHLKMAEKHRKEMEIEWEELKRRDGKEEEKDMWPERMIAMALRAIGELFNHHMYAEMRAKEEKKKPEAKTEEARRARREIMKYHHEAGRLIEEAMRRLMERHKK 178 42 63 53 T+PEE....A++.......K..+E.E.EWE..KR.....EE+DM.PERMIA.ALRAIGE+FN......+..++E+K.P.A..E+.+..++E..K..+..G.+....+++..E+.+K [5, 2, 7, 5, 5, -3, -1, 0, -3, 4, 2, 2, -3, -2, -3, -3, -1, -1, 0, 5, -2, -3, 1, 5, -2, 5, -3, 5, 11, 5, -3, -2, 5, 5, -1, 0, -2, -2, 0, 5, 5, 2, 6, 5, -2, 7, 5, 5, 5, 4, 4, -1, 4, 4, 5, 4, 4, 6, 5, 2, 6, 6, -2, 0, -1, -1, -1, 0, 2, 0, -1, 1, 1, 5, 2, 5, 0, 7, 0, 4, -2, -1, 5, 1, -1, 2, 0, -1, 2, 2, 5, -1, -1, 5, -1, -2, 2, -2, -3, 6, ...] 0.280315
3 -213.972 KKWEEMMREAERQGKEYAQKAWKEALLEWKWMRKRPVTEEMKDMAPEWMIAAALRAIGEHFNIYWQQKLEHEKLRKIPNVPEEELEKGKEELKRIEEEAARMAEKYMQELRKKMES 208 47 67 49 .+.EE....A.R..+...+K.W+E...EW+W.++.....E.+DM.PE.MIAAALRAIGE.FN..WQ.+LE.EK.RK.PN..EE++++.K+E..+I......MA..++++.R+K... [-1, 2, -4, 5, 5, -1, -1, 0, 0, 4, -3, 5, -2, 0, 1, -3, -2, -1, 1, 5, 0, 11, 1, 5, -2, -3, -2, 5, 11, 1, 11, -1, 2, 2, -1, -1, -2, -2, 0, 5, -2, 2, 6, 5, -1, 7, 5, -3, 5, 4, 4, 4, 4, 4, 5, 4, 4, 6, 5, -3, 6, 6, -1, -2, 11, 5, 0, 2, 4, 5, -2, 5, 5, -3, 5, 5, -3, 7, 6, 0, -2, 5, 5, 1, 2, 1, 1, -2, 5, 1, 5, -1, -3, 2, 4, -1, -2, -2, -3, 0, ...] 0.327559
4 -195.138 PRPEEMARFAKEEMHKHEEKAYREFLLEYELAIRKNPTEEPKDMQPEWAIAAALRAIGEIFNQWMYHLLEIRKENGSSHTRYEEREKYRKLAKRLHEEAAKEIWKFMHEAMRRFES 101 35 52 64 .RPEE....A.........K.+.E...E+E...R.+...E.+DM.PE..IAAALRAIGEIFN......LE+.KE..+.+...E+.++.+K.A.++..........++.+...+... [-1, 5, 7, 5, 5, -1, -1, 0, -3, 4, -3, 0, -3, -1, 0, -3, -2, -2, 0, 5, 0, 2, 0, 5, -1, -3, -2, 5, 2, 5, -2, -3, -3, 5, -1, 1, -1, -2, 0, 5, -1, 2, 6, 5, -2, 7, 5, -3, -1, 4, 4, 4, 4, 4, 5, 4, 4, 6, 5, 4, 6, 6, -1, -3, -1, -1, -2, -2, 4, 5, 1, 0, 5, 5, 0, -2, 1, -1, 1, 0, -2, -2, 5, 1, -1, 1, 1, -1, 2, 5, -3, 4, -3, 2, 2, -2, -2, -2, -3, 0, ...] 0.159055
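The printed values can be cross-checked against the column definitions. Because <matrix>_<seqID>_perc is the raw score divided by the reference's self-score, raw / perc should give the same number for every decoy. The snippet uses only numbers copied from the output above.

```python
# Values copied from the example output above (decoys 0-4).
raw = [183, 154, 178, 208, 101]
perc = [0.288189, 0.242520, 0.280315, 0.327559, 0.159055]
identity = [41, 39, 42, 47, 35]
positive = [68, 65, 63, 67, 52]
negative = [48, 51, 53, 49, 64]

# perc = raw / self_score, so raw / perc should recover one constant:
# the reference sequence scored against itself.
self_scores = [round(r / p) for r, p in zip(raw, perc)]
print(self_scores)  # [635, 635, 635, 635, 635]

# positive + negative gives the same total for every decoy.
totals = [p + n for p, n in zip(positive, negative)]
print(totals)  # [116, 116, 116, 116, 116]

# Identities never exceed positives (every BLOSUM62 diagonal entry is > 0).
assert all(i <= p for i, p in zip(identity, positive))
```

Every row gives 635, which would be the BLOSUM62 score of reference sequence B against itself. The constant positive + negative total suggests that each position is counted exactly once. Zero-scoring positions, shown as `.` in the ali column, apparently fall under negative.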