rstoolbox.utils.translate_3frames

rstoolbox.utils.translate_3frames(sequence, matches=None)

Translates DNA to protein trying all possible frames.

Tests the three possible reading frames and picks one to return according to matches. When matches is None, it returns the longest sequence up to a stop codon. Otherwise, it returns the longest sequence that contains the required protein sequence matches. If no frame satisfies the matches, it returns an empty str.

All provided matches need to be found in a frame for it to be returned.

Parameters:
  • sequence (str) – DNA sequence
  • matches (list of str) – protein sequence patterns to match; defaults to None
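Returns:

str – the translated protein sequence, or an empty str when matches are given and no frame contains them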
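
The selection logic described above can be sketched in plain Python. This is an illustration only, not the rstoolbox implementation: the names CODON_TABLE, translate_frame and translate_3frames_sketch are hypothetical, the standard genetic code is assumed, matches are treated as plain substrings, and each frame is assumed to be cut at its first stop codon. The library may differ on any of these points.

```python
from itertools import product

# Standard genetic code, built in TCAG order (assumption: the standard table applies).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate_frame(sequence, offset):
    """Translate from ``offset`` up to the first stop codon or the end."""
    protein = []
    for i in range(offset, len(sequence) - 2, 3):
        # Unknown codons (e.g. containing N) become 'X' in this sketch.
        aa = CODON_TABLE.get(sequence[i:i + 3].upper(), "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def translate_3frames_sketch(sequence, matches=None):
    """Hypothetical re-creation of the documented frame selection."""
    candidates = [translate_frame(sequence, offset) for offset in range(3)]
    if matches is not None:
        # Keep only the frames in which every match is found.
        candidates = [p for p in candidates if all(m in p for m in matches)]
    # Longest remaining translation; empty str when nothing qualifies.
    return max(candidates, key=len, default="")


print(translate_3frames_sketch("ATGGTGCGTCG"))
print(translate_3frames_sketch("ATGGTGCGTCG", ["GAS"]))
print(translate_3frames_sketch("ATGGTGCGTCG", ["GAS", "FFG"]))
```

When several frames tie on length, this sketch keeps the one with the smallest offset; how the library breaks ties is not stated here.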

Example 1: No matches - Prepend nucleotides to change reading frame

In [1]: from rstoolbox.io import read_fastq
   ...: from rstoolbox.utils import translate_3frames
   ...: import pandas as pd
   ...: pd.set_option('display.width', 1000)
   ...: pd.set_option('display.max_columns', 500)
   ...: df = read_fastq("../rstoolbox/tests/data/cdk2_rand_001.fasq.gz")
   ...: df.iloc[0]['sequence_A']
   ...: 
Out[1]: 'GGTGCGTCGTACTTTATGCAGATCCCCCATAGGCGCATGTCAGTATTCGGTATCGCCAAAGTGCACGCTCGTCACAAGCACTTAACAGGTGAGGTGGTAGCTCTTAAGAAAATACGCCTGTTCCAACCAGAACCAGGGCCGATCATGGTCAAGCCGAATATGTGTCCCTACTACTATGAATGGATTGGAAAGCGTAATCAACTGGATTCCTTTGCGCCCTGCATATCGTGTAAGATAAAGAAACGTGACACAAAGGTGAGGGGGGTTTGTTTTCATAATAGCGCAATACATTGTAAAAGTTATCGGTGCGTCGATCAAATCTTCTGCGGTTGTATAAAATGGATGATGATGGGCCGCGATTGTGAGGGGCAGGGGGAATCTCAGAATAATACGGATATAGGGGGTCCAACGGGATGTGATATCAATTGGCGAACATGTCATTTTACAGAACTTCGACATGACTGTGAAAACTGGCAAAGCGTCATCTGCAGTACTCATCACATATGTACGATGGGCCATATCGACCAGACTTCTGCTTCGGAGACCCAGGACTGGGATTCCTTTCAATGGGTGATGCTCCGATACATCCACGGCGAACAGAAGAAATATAGCATTCAGTTGGGCAATTGGGATGCTAAACAGGCAGTCAACATGCATAGACAGGAGCTGAAGGTGCTTGTGAAGAAGCGCCACGAGGAAGGCAAGATTTGTGCATGCTGCGTAATGTCACACATCGGTGTCGAAATTTCATTCTTTGGCAAGCGCTCACAGAGATTTCAGAGCGAATTTATGCAACATTGGGTGGCAAACTTCGCTATGAAGTTCAAATTTAGGAATATAGGTTGGCCACACACATCGTGGACCCAGCTCGCTGCACTGGGGGGTTGGGAGGGCTGGCACAAACCCGGGACT'

In [2]: translate_3frames('AT' + df.iloc[0]['sequence_A'])
Out[2]: 'GASYFMQIPHRRMSVFGIAKVHARHKHLTGEVVALKKIRLFQPEPGPIMVKPNMCPYYYEWIGKRNQLDSFAPCISCKIKKRDTKVRGVCFHNSAIHCKSYRCVDQIFCGCIKWMMMGRDCEGQGESQNNTDIGGPTGCDINWRTCHFTELRHDCENWQSVICSTHHICTMGHIDQTSASETQDWDSFQWVMLRYIHGEQKKYSIQLGNWDAKQAVNMHRQELKVLVKKRHEEGKICACCVMSHIGVEISFFGKRSQRFQSEFMQHWVANFAMKFKFRNIGWPHTSWTQLAALGGWEGWHKPGT'

Example 2: With matches - Prepend nucleotides to change reading frame

In [3]: from rstoolbox.io import read_fastq
   ...: from rstoolbox.utils import translate_3frames
   ...: import pandas as pd
   ...: pd.set_option('display.width', 1000)
   ...: pd.set_option('display.max_columns', 500)
   ...: df = read_fastq("../rstoolbox/tests/data/cdk2_rand_001.fasq.gz")
   ...: matches = ['GAS', 'FFG']
   ...: translate_3frames('AT' + df.iloc[0]['sequence_A'], matches)
   ...: 
Out[3]: 'GASYFMQIPHRRMSVFGIAKVHARHKHLTGEVVALKKIRLFQPEPGPIMVKPNMCPYYYEWIGKRNQLDSFAPCISCKIKKRDTKVRGVCFHNSAIHCKSYRCVDQIFCGCIKWMMMGRDCEGQGESQNNTDIGGPTGCDINWRTCHFTELRHDCENWQSVICSTHHICTMGHIDQTSASETQDWDSFQWVMLRYIHGEQKKYSIQLGNWDAKQAVNMHRQELKVLVKKRHEEGKICACCVMSHIGVEISFFGKRSQRFQSEFMQHWVANFAMKFKFRNIGWPHTSWTQLAALGGWEGWHKPGT'

In [4]: translate_3frames('AT' + df.iloc[1]['sequence_A'], matches)
Out[4]: ''