This is the list of dedicated objects provided to manage design data. They can be called through rstoolbox.components.
Selection ([selection]) |
Complex management of residue selection from a sequence. |
SelectionContainer (*args) |
Helper class to manage representation of selectors in pandas. |
DesignSeries (*args, **kwargs) |
The DesignSeries extends the Series, adding some functionalities to improve its usability in the analysis of a single design decoy. |
DesignFrame (*args, **kwargs) |
The DesignFrame extends the DataFrame, adding some functionalities to improve its usability in the analysis of sets of design decoys. |
SequenceFrame (*args, **kwargs) |
Per position frequency occurrence for a set of decoys. |
FragmentFrame (*args, **kw) |
Data container for Fragment data. |
Helper functions to read/write direct sequence information. They can be called through rstoolbox.io.
read_fasta (filename[, expand, multi, defchain]) |
Reads one or more FASTA files and returns the appropriate object containing the requested data: the DesignFrame. |
write_fasta (df, seqID[, separator, …]) |
Writes fasta files of the selected decoys. |
write_clustalw (df, seqID[, filename]) |
Write sequences of selected designs as a CLUSTALW alignment. |
write_mutant_alignments (df, seqID[, filename]) |
Writes a text file containing only the positions changed with respect to the reference_sequence. |
read_hmmsearch (filename) |
Read output from hmmsearch or hmmscan. |
pymol_mutant_selector (df) |
Generate selectors for the mutations in target decoys. |
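As a rough illustration of the kind of data these readers handle, the following standard-library sketch parses FASTA text into identifier/sequence pairs. It is not the implementation of read_fasta, which returns a DesignFrame; the function name parse_fasta and the example records are hypothetical.

```python
from io import StringIO


def parse_fasta(handle):
    """Return a list of (identifier, sequence) tuples from a FASTA stream.

    Hypothetical helper for illustration only; it is not part of rstoolbox.
    """
    records = []
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                records.append((header, "".join(chunks)))
            fields = line[1:].split()
            header = fields[0] if fields else ""  # keep only the identifier
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up example input with one multi-line record.
example = StringIO(">design_1 first decoy\nMKTAY\nIAKQR\n>design_2\nMKSAYIAKQR\n")
records = parse_fasta(example)
```

Each record could then become one row of a table keyed by the decoy identifier, which is the general shape a DesignFrame is described as holding.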
Helper functions to read/write outputs of programs based on protein structure. They can be called through rstoolbox.io.
parse_master_file (filename[, max_rmsd, …]) |
Load data obtained from a MASTER search. |
Helper functions to read/write data generated with Rosetta. They can be called through rstoolbox.io.
parse_rosetta_file (filename[, description, …]) |
Reads a Rosetta score or silent file and returns the design population in a DesignFrame. |
parse_rosetta_json (filename) |
Read a JSON-formatted Rosetta score file. |
parse_rosetta_pdb (filename[, keep_weights, …]) |
Read the POSE_ENERGIES_TABLE from a Rosetta output PDB file. |
parse_rosetta_contacts (filename) |
Read a residue contact file as generated by ContactMapMover. |
parse_rosetta_fragments (filename[, source]) |
Read a Rosetta fragment-file and return the appropriate FragmentFrame. |
write_rosetta_fragments (df[, frag_size, …]) |
Writes a Rosetta fragment-file (new format) from an appropriate FragmentFrame. |
write_fragment_sequence_profiles (df[, …]) |
Write a sequence profile from FragmentFrame to load into Rosetta’s SeqprofConsensus. |
get_sequence_and_structure (pdbfile[, …]) |
Provided a PDB file, it will run a small RosettaScript to capture its sequence and structure, i.e. |
make_structures (df[, outdir, tagsfilename, …]) |
Extract the selected decoys (if any). |
Helper functions to read/write data generated through wet lab experiments. They can be called through rstoolbox.io.
read_SPR (filename) |
Reads Surface Plasmon Resonance data. |
read_CD (dirname[, prefix, invert_temp, …]) |
Read Circular Dichroism data for multiple temperatures. |
read_MALS (filename[, mmfile]) |
Read Multi-Angle Light Scattering data. |
read_fastq (filename[, seqID]) |
Reads a FASTQ file and stores the ID together with the sequence. |
Helper functions for sequence analysis. They can be called through rstoolbox.analysis.
sequential_frequencies (df, seqID[, query, …]) |
Generates a SequenceFrame for the frequencies of the sequences in the DesignFrame with seqID identifier. |
sequence_similarity (df, seqID[, …]) |
Evaluate the sequence similarity between each decoy and the reference_sequence for a given seqID. |
positional_sequence_similarity (df[, seqID, …]) |
Per position identity and similarity against a reference_sequence. |
binary_similarity (df, seqID[, key_residues, …]) |
Binary profile for each design sequence against the reference_sequence. |
binary_overlap (df, seqID[, key_residues, matrix]) |
Overlap the binary similarity representation of all decoys in a DesignFrame. |
positional_enrichment (df, other, seqID) |
Calculates per-residue enrichment from sequences in the first DesignFrame with respect to the second. |
positional_structural_count (df[, seqID, …]) |
Percentage of secondary structure types for each sequence position of all decoys. |
positional_structural_identity (df[, seqID, …]) |
Per position evaluation of how many times the provided data matches the expected reference_structure . |
secondary_structure_percentage (df, seqID[, …]) |
Calculate the percentage of the different secondary structure types. |
selector_percentage (df, seqID, key_residues) |
Calculate the percentage coverage of a Selection over the sequence. |
label_percentage (df, seqID, label) |
Calculate the percentage coverage of a label over the sequence. |
label_sequence (df, seqID, label[, complete]) |
Gets the sequence of a label. |
cumulative (values[, bins, max_count, …]) |
Generates, for a given list of values, its cumulative distribution values. |
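To make the idea of per-position frequencies concrete, here is a minimal standard-library sketch of the underlying calculation over a set of equal-length sequences. It is only an assumption about the general technique, not the code behind sequential_frequencies or SequenceFrame; the function name positional_frequencies and the toy sequences are hypothetical.

```python
from collections import Counter


def positional_frequencies(sequences):
    """Return one {residue: frequency} dict per sequence position.

    Hypothetical helper for illustration only; it is not part of rstoolbox.
    """
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    total = len(sequences)
    table = []
    # zip(*sequences) walks the alignment column by column.
    for column in zip(*sequences):
        counts = Counter(column)
        table.append({res: n / total for res, n in counts.items()})
    return table


# Four toy decoys that differ at positions 2 and 3.
freqs = positional_frequencies(["MKT", "MRT", "MKS", "MKT"])
```

Here `freqs[1]` holds the residue frequencies observed at the second position, which is the kind of per-position information the heatmap and logo plots visualise.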
Once the data is loaded in the different components, it is ready to use with any
plotting library, but some special plotting alternatives are offered through rstoolbox.plot.
multiple_distributions (df, fig, grid[, …]) |
Automatically plot boxplot distributions for multiple score types of the decoy population. |
sequence_frequency_plot (df, seqID, ax[, …]) |
Makes a heatmap subplot into the provided axis showing the sequence distribution of each residue type for each position. |
logo_plot (df, seqID[, refseq, key_residues, …]) |
Generates full figure classic LOGO plots. |
logo_plot_in_axis (df, seqID, ax[, refseq, …]) |
Generates classic LOGO plot in a given axis. |
positional_sequence_similarity_plot (df, ax) |
Generates a plot covering the number of identity and positive matches from a population of designs to a reference sequence according to a substitution matrix. |
per_residue_matrix_score_plot (df, seqID, ax) |
Plot a linear representation of the scoring obtained by applying a substitution matrix. |
positional_structural_similarity_plot (df, ax) |
Generates a bar plot for positional prevalence of secondary structure elements. |
plot_fragments (small_frags, large_frags, …) |
Plot RMSD quality of a pair of FragmentFrame objects in two provided axes. |
plot_fragment_profiles (fig, small_frags, …) |
Plots a full summary of FragmentFrame quality with sequence and expected secondary structure match. |
plot_alignment (df, seqID, ax[, line_break, …]) |
Make an image representing the alignment of sequences with highlights on mutant positions. |
plot_ramachandran (df, seqID, fig[, grid, …]) |
Generates a Ramachandran plot in RAMPAGE style. |
plot_ramachandran_single (df, seqID, ax[, …]) |
Plot only one of the 4 Ramachandran plots in RAMPAGE format. |
plot_dssp_vs_psipred (df, seqID, ax) |
Generates a horizontal heatmap showing differences in psipred predictions to dssp assignments. |
Plot data obtained from experimental procedures. Accessible through rstoolbox.plot.
plot_96wells ([cdata, sdata, bdata, bcolors, …]) |
Plot data of a 96 well plate into an equivalent-shaped plot. |
plot_thermal_melt (df, ax[, linecolor, …]) |
Plot Thermal Melt data. |
plot_MALS (df, ax[, uvcolor, lscolor, …]) |
Plot Multi-Angle Light Scattering data. |
plot_CD (df, ax[, color, wavelengths, sample]) |
Plot Circular Dichroism data. |
plot_SPR (df, ax[, datacolor, fitcolor, …]) |
Plot Surface Plasmon Resonance data. |
Special functions to help personalise your plot easily can be loaded through rstoolbox.utils.
format_Ipython () |
Ensure monospace representation of DataFrame in Jupyter Notebooks. |
highlight (row, selection[, color, …]) |
Highlight rows in Jupyter Notebooks that match the given index. |
use_qgrid (df, **kwargs) |
Create a QgridWidget object from the qgrid library in Jupyter Notebooks. |
add_left_title (ax, title, **kwargs) |
Add a centered title on the left of the selected axis. |
add_right_title (ax, title, **kwargs) |
Add a centered title on the right of the selected axis. |
add_top_title (ax, title, **kwargs) |
Add a centered title on top of the selected axis. |
edit_legend_text (ax, labels[, title]) |
Change the labels and title of a legend. |
add_white_to_cmap ([color, cmap, n_colors]) |
Generate a new colormap with white as first instance. |
color_variant (color[, brightness_offset]) |
Make a color darker or lighter. |
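The idea of darkening or lightening a color with a brightness offset can be sketched as follows. This is one plausible way to do it with the standard library, and it may differ from what color_variant does; the function name shift_brightness and the clamping behaviour are assumptions.

```python
def shift_brightness(hex_color, offset):
    """Add `offset` to each RGB channel of a '#rrggbb' color, clamped to 0-255.

    Hypothetical helper for illustration only; it is not part of rstoolbox.
    """
    if len(hex_color) != 7 or not hex_color.startswith("#"):
        raise ValueError("expected a color such as '#1f77b4'")
    # Split the string into its red, green and blue channels.
    channels = [int(hex_color[i:i + 2], 16) for i in (1, 3, 5)]
    shifted = [min(255, max(0, channel + offset)) for channel in channels]
    return "#" + "".join(f"{channel:02x}" for channel in shifted)


darker = shift_brightness("#1f77b4", -40)   # negative offsets darken
lighter = shift_brightness("#1f77b4", 40)   # positive offsets lighten
```

A derived shade like this is handy when a plot needs a matching pair of colors, for example one for raw data and one for a fitted curve.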
Functions aimed to help assess a design population in the context of known protein structures.
load_refdata (ref[, homology]) |
Load the predefined reference data from cath, scop, scop2 or chain. |
make_redundancy_table ([precalculated, select]) |
Query into the PDB to retrieve the pre-calculated homology tables. |
plot_in_context (df, fig, grid, refdata[, …]) |
Plot position of decoys in a background reference dataset. |
distribution_quality (df, refdata, values, …) |
Locate the quantile position of each putative DesignSeries in a list of score distributions. |
Special functions to help transform your data can be loaded through rstoolbox.utils.
add_column (df, name, value) |
Adds a new column to the DataFrame with the given value. |
split_values (df, keys) |
Reshape the data to aid plotting of multiple comparable scores. |
split_dataframe_rows (df, column_selectors[, …]) |
Given a dataframe in which certain columns are lists, it splits these lists making new rows in the DataFrame out of itself. |
report (df) |
Cast basic sequence count into pdb count for the appropriate columns. |
concat_fragments (fragment_list) |
Combine multiple FragmentFrame objects. |
Get the RosettaScripts that are called by different functions of the library with rstoolbox.utils.
baseline ([minimize]) |
RosettaScript to calculate DSSP secondary structure and phi-psi angles. |
mutations ([seqID]) |
RosettaScript to execute a RESFILE. |
Special functions to help obtain data from Next Generation Sequencing data. They can be loaded through rstoolbox.utils.
translate_dna_sequence (sequence) |
Translates DNA to protein. |
translate_3frames (sequence[, matches]) |
Translates DNA to protein trying all possible frames. |
adapt_length (seqlist, start, stop[, inclusive]) |
Pick only the sequence between the provided pattern tags. |
sequencing_enrichment (indata[, enrichment, …]) |
Retrieve data from multiple NGS files. |
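Translating a DNA read in all three forward frames can be sketched with the standard genetic code as shown below. This is a generic illustration, not the implementation of translate_dna_sequence or translate_3frames; the function names, the use of "X" for unknown codons, and the example read are assumptions.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered with
# bases T, C, A, G and the third base varying fastest.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate DNA to protein; '*' marks stops, 'X' marks unknown codons.

    Hypothetical helper for illustration only; it is not part of rstoolbox.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # drop an incomplete trailing codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


def translate_three_frames(dna):
    """Return the translations starting at offsets 0, 1 and 2."""
    return [translate(dna[offset:]) for offset in range(3)]


# Made-up read: ATG GCC AAG TAA in frame 0.
frames = translate_three_frames("ATGGCCAAGTAA")
```

With sequencing reads, the frame whose translation contains the expected flanking pattern would usually be the one to keep before trimming the read to the region of interest.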
These functions are only of interest if you plan on writing new functionalities in rstoolbox.
io.open_rosetta_file (filename[, multi, …]) |
Internal function; reads through a Rosetta silent file and yields only the lines that the library knows how to parse. |
components.get_selection (key_residues, seqID) |
Internal function; global management and casting of Selection. |
utils.make_rosetta_app_path (application) |
Provided the expected Rosetta application, add path and suffix. |
tests.helper.random_frequency_matrix (size[, …]) |
Generate a random frequency matrix. |
tests.helper.random_proteins (size, count) |
Generate random protein sequences. |
tests.helper.random_fastq (sequence, …) |
Generate a requested number of fastq files. |