rstoolbox.components.DesignFrame

class rstoolbox.components.DesignFrame(*args, **kwargs)

The DesignFrame extends the pandas DataFrame with functionality that makes it easier to analyse sets of design decoys.

When filled through the functions provided by this library, each row represents a decoy and each column holds one of the scores attached to it.

As a rule, it is assumed that the object:

  1. has a column named description that stores the identifier of the corresponding decoy.
  2. holds sequences (a design decoy might be composed of multiple chains) in columns named sequence_<seqID>.

These two assumptions are easy to meet when casting a DataFrame into the class, and several functions of the library depend on them.
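The naming convention above can be illustrated without the library. The following is a minimal plain-Python sketch, not rstoolbox code: the helper names `follows_convention` and `available_sequences` are hypothetical, and the second one only mimics the kind of answer get_available_sequences() is described as giving.

```python
# Illustrative sketch only: these helpers are hypothetical and are NOT part of rstoolbox.
SEQUENCE_PREFIX = "sequence_"


def follows_convention(columns):
    """Return True when a 'description' column is present (assumption 1)."""
    return "description" in columns


def available_sequences(columns):
    """Return the seqIDs implied by columns named sequence_<seqID> (assumption 2)."""
    return sorted(
        name[len(SEQUENCE_PREFIX):]
        for name in columns
        if name.startswith(SEQUENCE_PREFIX)
    )


# Column names of a made-up table holding a two-chain design.
columns = ["description", "score", "sequence_A", "sequence_B"]
print(follows_convention(columns))
print(available_sequences(columns))
```

A table whose columns are renamed to follow this pattern before casting would satisfy both assumptions.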

Note

These assumptions are automatically fulfilled when the data container is loaded through parse_rosetta_file(). To obtain sequence information, it is necessary to request that particular data, as described in tutorial: reading Rosetta.

The DesignFrame contains four extra attributes (accessible through the appropriate functions):

  1. reference_sequence: A reference sequence can be added for each seqID present in the DesignFrame. By adding this sequence, other functions of the library can add that information to their calculations.
  2. reference_structure: A reference secondary structure can be added for each seqID present in the DesignFrame. By adding this structure, other functions of the library can add that information to their calculations.
  3. reference_shift: A reference shift can be added for each seqID present in the DesignFrame. In short, this is the number of the first residue of the protein in the source PDB, which allows working with the right numbering. By default, this value is 1 for every seqID. Alternatively, a list of numbers can be assigned as reference_shift, which is useful when the original structure does not have a continuous numbering scheme.
  4. source_files: The object stores the source files from which it has been loaded (as long as it is loaded with parse_rosetta_file()). This information can be used to extract the structures from the silent files.
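The two forms of reference_shift described above can be pictured with a small stand-alone sketch. It is an illustration of the idea under stated assumptions, not the library's implementation: the function name `pdb_number` is hypothetical, and the list form is assumed to hold one PDB number per residue.

```python
# Illustrative sketch only: pdb_number is a hypothetical helper, NOT an rstoolbox function.
def pdb_number(position, shift=1):
    """Map a 1-based sequence position to the numbering of the source PDB.

    shift is either an int (the PDB number of the first residue) or a list
    with the PDB number of every residue (assumed form for discontinuous numbering).
    """
    if position < 1:
        raise ValueError("position is 1-based")
    if isinstance(shift, int):
        return position + shift - 1
    return shift[position - 1]


# Continuous numbering that starts at residue 20 in the source PDB.
print(pdb_number(5, shift=20))

# Discontinuous numbering: residues 6 to 9 are missing in the source PDB.
print(pdb_number(4, shift=[3, 4, 5, 10, 11]))
```

With the default shift of 1, a position maps to itself, which matches the default described in the list.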

Getters

get_id() Return identifier data for the design(s).
get_available_sequences() List which sequence identifiers are available in the data container.
get_sequence(seqID[, key_residues]) Return the sequence data for seqID available in the container.
get_available_structures() List which structure identifiers are available in the data container.
get_structure(seqID[, key_residues]) Return the structure data for seqID available in the container.
get_available_structure_predictions() List which structure prediction identifiers are available in the data container.
get_structure_prediction(seqID[, key_residues]) Return the structure prediction(s) data.
get_sequential_data(query, seqID) Provides data on the requested query.
get_dihedrals(seqID[, key_residues]) Return the dihedrals data for phi-psi available in the container.
get_phi(seqID[, key_residues]) Return the phi angle for seqID available in the container.
get_psi(seqID[, key_residues]) Return the psi angle for seqID available in the container.
get_available_labels() List which labels are available in the data container.
get_label(label[, seqID]) Return the content(s) of the labels of interest as a Selection for a given sequence.

Reference Data

has_reference_sequence(seqID) Checks if there is a reference_sequence for seqID.
add_reference_sequence(seqID, sequence) Add a reference_sequence attached to chain seqID.
get_reference_sequence(seqID[, key_residues]) Get the reference_sequence attached to chain seqID.
has_reference_structure(seqID) Checks if there is a reference_structure for seqID.
add_reference_structure(seqID, structure) Add a reference_structure attached to chain seqID.
get_reference_structure(seqID[, key_residues]) Get the reference_structure attached to chain seqID.
add_reference_shift(seqID, shift[, shift_labels]) Add a reference_shift attached to a chain seqID.
get_reference_shift(seqID) Get a reference_shift attached to a particular seqID.
get_available_references() List which decoy chain identifiers have some kind of reference data.
add_reference(seqID[, sequence, structure, …]) Single access to add_reference_sequence(), add_reference_structure() and add_reference_shift().
transfer_reference(df) Transfer reference data from one container to another.
delete_reference(seqID[, shift_labels]) Remove all reference data regarding a particular seqID.

Source Files

add_source_file(file) Adds a source_file to the DesignFrame.
add_source_files(files) Adds several source_file entries to the DesignFrame.
get_source_files() Get source_file stored in the data container.
has_source_files() Checks if there are source files added.
replace_source_files(files) Replaces source_file of the DesignFrame.

Frequencies

sequence_bits(seqID[, seqType, cleanExtra, …]) Create a bit-based SequenceFrame.
sequence_distance(seqID[, other]) Compute the identity-based sequence distance between the selected decoys.
sequence_frequencies(seqID[, seqType, …]) Create a frequency-based SequenceFrame.
structure_bits(seqID[, seqType, cleanExtra, …]) Create a bit-based SequenceFrame for secondary structure assignment.
structure_frequencies(seqID[, seqType, …]) Create a frequency-based SequenceFrame for secondary structure assignment.
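To make the notions of a frequency-based table and an identity distance concrete, here is a minimal plain-Python sketch. It does not call rstoolbox and does not reproduce a SequenceFrame: `positional_frequencies` and `identity_distance` are hypothetical names, and taking the distance as the count of mismatched positions is an assumption.

```python
# Illustrative sketch only: hypothetical helpers, NOT the rstoolbox implementation.
from collections import Counter


def positional_frequencies(sequences):
    """Return, for each position, the fraction of sequences carrying each residue."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    table = []
    for column in zip(*sequences):  # iterate position by position
        counts = Counter(column)
        table.append({res: n / len(sequences) for res, n in counts.items()})
    return table


def identity_distance(seq_a, seq_b):
    """Count the positions at which two equal-length sequences differ (assumed definition)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


# Made-up decoy sequences for a single seqID.
decoys = ["MKTA", "MKSA", "MRTA", "MKTA"]
frequencies = positional_frequencies(decoys)
print(frequencies[1])
print(identity_distance(decoys[0], decoys[1]))
```

The same counting applies to secondary structure strings, where the alphabet is the set of structure codes instead of amino acids.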

Mutation Methods

identify_mutants(seqID) Assess mutations of each decoy for sequence seqID against the reference_sequence.
get_identified_mutants() List for which sequence identifiers mutants have been calculated.
get_mutation_count(seqID) Return the number of mutation positions for seqID available in the container.
get_mutation_positions(seqID) Return the mutation positions data for seqID available in the container.
get_mutations(seqID) Return the mutations data for seqID available in the container.
get_sequence_with(seqID, selection[, …]) Selects those decoys with a particular set of residue matches.
generate_mutant_variants(seqID, mutations[, …]) Expands selected decoy sequences generating all the provided mutant combinations.
generate_mutants_from_matrix(seqID, matrix, …) From a provided positional frequency matrix, generates count random variants.
generate_wt_reversions(seqID[, key_residues]) Generate all variants that revert decoy sequences to the reference_sequence.
make_resfile(seqID, header, filename[, write]) Generate a Rosetta resfile to match the design’s sequence assuming the reference_sequence as the starting point.
apply_resfile(seqID, filename[, rscript, …]) Apply a generated Rosetta resfile to the decoy.
score_by_pssm(seqID, matrix) Score sequences according to a provided PSSM matrix.
view_mutants_alignment(seqID[, …]) Generates a formatted alignment view of the mutations in Jupyter Notebooks.
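The idea behind comparing a decoy against a reference_sequence, and behind reverting its mutations, can be sketched in plain Python. This is an illustration under stated assumptions rather than the library's code: `find_mutations` and `wt_reversions` are hypothetical names, the `<reference><position><decoy>` label format is an assumption, and so is the choice to return every non-empty combination of reversions.

```python
# Illustrative sketch only: hypothetical helpers, NOT the rstoolbox implementation.
from itertools import combinations


def find_mutations(reference, decoy, shift=1):
    """List the differences to the reference as labels such as 'T3S' (assumed format)."""
    if len(reference) != len(decoy):
        raise ValueError("sequences must have the same length")
    return [
        f"{ref}{index + shift}{new}"
        for index, (ref, new) in enumerate(zip(reference, decoy))
        if ref != new
    ]


def wt_reversions(reference, decoy):
    """Return every variant obtained by reverting one or more mutated positions."""
    mutated = [i for i, (r, d) in enumerate(zip(reference, decoy)) if r != d]
    variants = []
    for size in range(1, len(mutated) + 1):
        for subset in combinations(mutated, size):
            residues = list(decoy)
            for i in subset:
                residues[i] = reference[i]  # restore the reference residue
            variants.append("".join(residues))
    return variants


# Made-up reference and decoy sequences.
reference = "MKTAYIAK"
decoy = "MKSAYIGK"
mutations = find_mutations(reference, decoy)
print(mutations, len(mutations))
print(wt_reversions(reference, decoy))
```

The `shift` argument plays the role of an integer reference_shift, so that the reported positions follow the numbering of the source PDB.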

Miscellaneous

clean_rosetta_suffix() Remove the numerical suffix that Rosetta adds to the output identifiers.
retrieve_sequences_from_pdbs([prefix, dropna]) Obtain sequence data related to the decoys through their Rosetta-generated PDB files.
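As an illustration of what removing the numerical suffix means, the sketch below strips a single trailing `_<digits>` group from an identifier. It is not the library's code: the helper name `clean_suffix` is hypothetical, and the exact pattern that clean_rosetta_suffix() removes is an assumption.

```python
# Illustrative sketch only: clean_suffix is a hypothetical helper, NOT an rstoolbox function.
import re

# Assumed pattern: one underscore followed by digits at the very end of the identifier.
_SUFFIX = re.compile(r"_\d+$")


def clean_suffix(identifier):
    """Remove one trailing numerical suffix such as '_0001' from an identifier."""
    return _SUFFIX.sub("", identifier)


# Made-up decoy identifiers.
for name in ["design_0001", "run3_design_0042", "design"]:
    print(name, "->", clean_suffix(name))
```

Identifiers without a numerical suffix are returned unchanged, so applying the helper to already-clean names is harmless.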