rstoolbox.utils.translate_3frames(sequence, matches=None)
Translates DNA to protein trying all possible frames.
Tests the three possible reading frames. Which one is returned depends on
matches: when matches is None, it returns the longest sequence up to a
stop codon; otherwise, it returns the longest sequence that contains the
required match protein sequences.
All provided matches need to be found; if no frame contains all of them,
it returns an empty str.
| Parameters: | |
|---|---|
| Returns: | |
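The selection logic described above can be illustrated with a minimal, standalone sketch. This is not the rstoolbox implementation: the names translate_until_stop and translate_3frames_sketch are hypothetical, the standard genetic code is assumed, matches are treated as plain substrings, and each frame is assumed to be translated only up to its first stop codon.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order (assumption:
# the standard table is the one that applies to the input sequence).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(("".join(c) for c in product(BASES, repeat=3)), AMINO))


def translate_until_stop(dna):
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        # Unknown codons (e.g. containing N) are rendered as X in this sketch.
        residue = CODON_TABLE.get(dna[i:i + 3].upper(), "X")
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def translate_3frames_sketch(sequence, matches=None):
    """Hypothetical re-creation of the documented frame selection."""
    # One translation per forward reading frame (offsets 0, 1 and 2).
    frames = [translate_until_stop(sequence[offset:]) for offset in range(3)]
    # Longest translation first.
    frames.sort(key=len, reverse=True)
    if matches is None:
        return frames[0]
    # Return the longest frame that contains every requested match.
    for protein in frames:
        if all(match in protein for match in matches):
            return protein
    return ""


print(translate_3frames_sketch("ATGAAATGGCCC"))                  # MKWP
print(translate_3frames_sketch("ATGAAATGGCCC", matches=["EM"]))  # EMA
```

In this toy input, the first frame reads MKWP, the second frame starts with a stop codon and yields an empty translation, and the third frame reads EMA, so requesting the match EM selects the shorter third frame.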
Example 1: No matches - Prepend nucleotides to change reading frame
In [1]: from rstoolbox.io import read_fastq
...: from rstoolbox.utils import translate_3frames
...: import pandas as pd
...: pd.set_option('display.width', 1000)
...: pd.set_option('display.max_columns', 500)
...: df = read_fastq("../rstoolbox/tests/data/cdk2_rand_001.fasq.gz")
...: df.iloc[0]['sequence_A']
...:
Out[1]: 'GGTGCGTCGTACTTTATGCAGATCCCCCATAGGCGCATGTCAGTATTCGGTATCGCCAAAGTGCACGCTCGTCACAAGCACTTAACAGGTGAGGTGGTAGCTCTTAAGAAAATACGCCTGTTCCAACCAGAACCAGGGCCGATCATGGTCAAGCCGAATATGTGTCCCTACTACTATGAATGGATTGGAAAGCGTAATCAACTGGATTCCTTTGCGCCCTGCATATCGTGTAAGATAAAGAAACGTGACACAAAGGTGAGGGGGGTTTGTTTTCATAATAGCGCAATACATTGTAAAAGTTATCGGTGCGTCGATCAAATCTTCTGCGGTTGTATAAAATGGATGATGATGGGCCGCGATTGTGAGGGGCAGGGGGAATCTCAGAATAATACGGATATAGGGGGTCCAACGGGATGTGATATCAATTGGCGAACATGTCATTTTACAGAACTTCGACATGACTGTGAAAACTGGCAAAGCGTCATCTGCAGTACTCATCACATATGTACGATGGGCCATATCGACCAGACTTCTGCTTCGGAGACCCAGGACTGGGATTCCTTTCAATGGGTGATGCTCCGATACATCCACGGCGAACAGAAGAAATATAGCATTCAGTTGGGCAATTGGGATGCTAAACAGGCAGTCAACATGCATAGACAGGAGCTGAAGGTGCTTGTGAAGAAGCGCCACGAGGAAGGCAAGATTTGTGCATGCTGCGTAATGTCACACATCGGTGTCGAAATTTCATTCTTTGGCAAGCGCTCACAGAGATTTCAGAGCGAATTTATGCAACATTGGGTGGCAAACTTCGCTATGAAGTTCAAATTTAGGAATATAGGTTGGCCACACACATCGTGGACCCAGCTCGCTGCACTGGGGGGTTGGGAGGGCTGGCACAAACCCGGGACT'
In [2]: translate_3frames('AT' + df.iloc[0]['sequence_A'])