DesignSeries.generate_mutants_from_matrix(seqID, matrix, count, key_residues=None, limit_refseq=False)
From a provided positional frequency matrix, generates count random variants.
It takes into account the individual frequency assigned to each residue type at each position. It does not generate the highest-scoring sequence possible under the matrix; instead, it picks a residue at random at each position, weighted by the frequencies for that position.
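The frequency-weighted sampling described above can be illustrated with a minimal, standard-library sketch. It is not the rstoolbox implementation: the names `sample_variant` and `sample_variants`, the list-of-dicts matrix layout, and the assumption that `key_residues` holds 1-based positions restricting which positions are resampled are all hypothetical choices made for illustration.

```python
import random


def sample_variant(sequence, matrix, key_residues=None, rng=random):
    """Draw one variant of `sequence`.

    `matrix[i]` is assumed to map residue type -> frequency for position i + 1.
    Only the (1-based) positions in `key_residues` are resampled; when it is
    None, every position is resampled.
    """
    positions = key_residues or range(1, len(sequence) + 1)
    variant = list(sequence)
    for pos in positions:
        freqs = matrix[pos - 1]
        residues = list(freqs)
        weights = [freqs[res] for res in residues]
        # Weighted random pick: likely residues are favored, but the most
        # frequent one is not guaranteed to be chosen.
        variant[pos - 1] = rng.choices(residues, weights=weights, k=1)[0]
    return "".join(variant)


def sample_variants(sequence, matrix, count, key_residues=None, seed=None):
    """Draw `count` independent variants (hypothetical helper)."""
    rng = random.Random(seed)
    return [sample_variant(sequence, matrix, key_residues, rng)
            for _ in range(count)]


if __name__ == "__main__":
    # Toy 3-position matrix over a two-letter alphabet, for illustration only.
    toy_matrix = [
        {"A": 0.9, "C": 0.1},
        {"A": 0.5, "C": 0.5},
        {"A": 0.2, "C": 0.8},
    ]
    for variant in sample_variants("CCA", toy_matrix, 5, key_residues=[1, 3]):
        print(variant)
```

Because each position is drawn independently, repeated calls can return the same sequence more than once; whether the library deduplicates its output is not stated here.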
For each DesignSeries, it will generate a DesignFrame in which the original sequence becomes the reference_sequence, inheriting the reference_shift.
Warning
This is a computationally expensive function. Take this into consideration when running it.
Each DesignFrame will have the following structure:
Column | Data Content |
---|---|
description | Identifier of the mutant |
sequence_<seqID> | Sequence content |
pssm_score_<seqID> | Score obtained by applying matrix |
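How the pssm_score_<seqID> value is derived from the matrix is not spelled out here. As a rough illustration only, the sketch below assumes the score is the sum, over all positions, of the matrix entry for the residue found at that position. The function name `score_by_matrix` and the list-of-dicts matrix layout are hypothetical and do not represent the rstoolbox API.

```python
def score_by_matrix(sequence, matrix):
    """Sum the matrix value of each residue at its position.

    Assumption: `matrix[i]` maps residue type -> value for position i + 1,
    and residues missing from a row contribute 0.
    """
    if len(sequence) != len(matrix):
        raise ValueError("matrix must have one row per sequence position")
    return sum(row.get(res, 0.0) for res, row in zip(sequence, matrix))


if __name__ == "__main__":
    # Toy 2-position matrix, for illustration only.
    toy_matrix = [{"A": 0.7, "C": 0.3}, {"A": 0.2, "C": 0.8}]
    print(score_by_matrix("AC", toy_matrix))
```

Under this assumption, a variant that carries higher-frequency residues at more positions receives a higher score.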
Parameters: |
Returns: |
Raises: |
See also
DesignFrame.generate_mutant_variants()
DesignFrame.score_by_pssm()
DesignSeries.generate_mutant_variants()
DesignSeries.score_by_pssm()
Example
In [1]: from rstoolbox.io import parse_rosetta_file
...: from rstoolbox.tests.helper import random_frequency_matrix
...: import pandas as pd
...: pd.set_option('display.width', 1000)
...: pd.set_option('display.max_columns', 500)
...: df = parse_rosetta_file("../rstoolbox/tests/data/input_2seq.minisilent.gz",
...:                         {'scores': ['score', 'description'], 'sequence': 'B'})
...: df.add_reference_sequence('B', df.get_sequence('B').values[0])
...: matrix = random_frequency_matrix(len(df.get_reference_sequence('B')), 0)
...: key_res = [3,5,8,12,15,19,25,27]
...: mutants = df.iloc[1].generate_mutants_from_matrix('B', matrix, 5, key_res)
...: mutants[0].identify_mutants('B')
...:
Out[1]:
description sequence_B pssm_score_B mutants_B mutant_positions_B mutant_count_B
0 test_3lhp_binder_labeled_00002_v0001 PKMEDAMYEAYSLIMKYMHKAQKEGQMEWERMRRTDGTKEEKDMFPEKMIAQALRAIGEIFNAYYWAFLKLQEFKKYPSVRWEEQEEARKRLKIMMKIGAEWAREIAREMKERIKR 6.699162 P3M,E5D,R8Y,K12S,K15M,L19H,A25G,E27M 3,5,8,12,15,19,25,27 8
1 test_3lhp_binder_labeled_00002_v0002 PKTEEAMIEAYDLIEKYMCKAQKEVQPEWERMRRTDGTKEEKDMFPEKMIAQALRAIGEIFNAYYWAFLKLQEFKKYPSVRWEEQEEARKRLKIMMKIGAEWAREIAREMKERIKR 6.663401 P3T,R8I,K12D,K15E,L19C,A25V,E27P 3,8,12,15,19,25,27 7
2 test_3lhp_binder_labeled_00002_v0003 PKMEDAMGEAYTLIRKYMDKAQKEKQTEWERMRRTDGTKEEKDMFPEKMIAQALRAIGEIFNAYYWAFLKLQEFKKYPSVRWEEQEEARKRLKIMMKIGAEWAREIAREMKERIKR 6.607991 P3M,E5D,R8G,K12T,K15R,L19D,A25K,E27T 3,5,8,12,15,19,25,27 8
3 test_3lhp_binder_labeled_00002_v0004 PKSEYAMDEAYDLIIKYMAKAQKECQEEWERMRRTDGTKEEKDMFPEKMIAQALRAIGEIFNAYYWAFLKLQEFKKYPSVRWEEQEEARKRLKIMMKIGAEWAREIAREMKERIKR 6.294339 P3S,E5Y,R8D,K12D,K15I,L19A,A25C 3,5,8,12,15,19,25 7
4 test_3lhp_binder_labeled_00002_v0005 PKKEEAMSEAYQLIVKYMVKAQKEHQEEWERMRRTDGTKEEKDMFPEKMIAQALRAIGEIFNAYYWAFLKLQEFKKYPSVRWEEQEEARKRLKIMMKIGAEWAREIAREMKERIKR 6.644130 P3K,R8S,K12Q,K15V,L19V,A25H 3,8,12,15,19,25 6