rstoolbox.components.DesignFrame.sequence_bits

DesignFrame.sequence_bits(seqID, seqType='protein', cleanExtra=True, cleanUnused=False)

Create a bit-based SequenceFrame.

Generates a SequenceFrame holding the information content, in bits, of the sequences in the DesignFrame identified by seqID. If a reference_sequence exists for this seqID, it is also attached to the SequenceFrame.

Bits are calculated as described in http://www.genome.org/cgi/doi/10.1101/gr.849004:

\[ R_{seq} = S_{max} - S_{obs} = \log_2 N - \left( -\sum_{n=1}^{N} p_n \log_2 p_n \right) \]
Where:
  • N is the total number of options (4: DNA/RNA; 20: PROTEIN).
  • pn is the observed frequency of the symbol n.
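
The formula can be illustrated with a minimal standalone sketch. This is not the rstoolbox implementation: the function names column_bits and bits_per_position are hypothetical, the input is assumed to be a list of equal-length sequences, and only the total Rseq per position is computed. How the library distributes that value across the symbols of a SequenceFrame is not shown here.

```python
import math
from collections import Counter


def column_bits(symbols, n_options=20):
    """Return Rseq = log2(N) - Sobs for the symbols observed at one position.

    n_options is N in the formula: 4 for DNA/RNA, 20 for protein.
    """
    counts = Counter(symbols)
    total = sum(counts.values())
    # Sobs is the Shannon entropy of the observed symbol frequencies.
    # Symbols that never occur contribute nothing, so they are simply skipped.
    s_obs = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return math.log2(n_options) - s_obs


def bits_per_position(sequences, n_options=20):
    """Apply column_bits to every position of equal-length sequences."""
    return [column_bits(column, n_options) for column in zip(*sequences)]


# Hypothetical DNA example: position 1 is fully conserved,
# position 2 shows all four nucleotides equally often.
dna = ["AC", "AG", "AT", "AA"]
bits = bits_per_position(dna, n_options=4)
```

With this input, the conserved first position reaches the maximum of log2(4) = 2 bits, while the uniformly distributed second position carries 0 bits.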
Parameters:
  • seqID (str) – Identifier of the sequence of interest.
  • seqType (str) – Type of sequence: protein, dna, rna.
  • cleanExtra (bool) – Remove from the SequenceFrame the non-regular amino/nucleic acids if they are empty for all positions.
  • cleanUnused (int) – Remove from the SequenceFrame the regular amino/nucleic acids if their frequency is equal to or below this value. Default is -1, so nothing is deleted.
Returns:

SequenceFrame