rstoolbox.utils.adapt_length(seqlist, start, stop, inclusive=False)
Keep only the part of each sequence that lies between the provided pattern tags.
When inclusive is False and the boundary tags are not found, the original sequence is returned, on the assumption that the tags lie outside the boundaries of the retrieved sequence.
When inclusive is True and the boundary tags are not found, an empty sequence is returned for that position, because the tags themselves were requested and could not be retrieved.
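The two fallback rules above can be illustrated with a short plain-Python sketch. This is a hypothetical re-implementation (`adapt_length_sketch` is not part of rstoolbox). It assumes that the region runs from the first start tag to the first stop tag after it, and that inclusive controls whether the tags themselves are kept; the library may resolve repeated or overlapping tags differently.

```python
from typing import Iterable, List


def adapt_length_sketch(seqlist: Iterable[str], start: str, stop: str,
                        inclusive: bool = False) -> List[str]:
    """Hypothetical sketch of the documented adapt_length behaviour.

    Assumption: the region spans from the first `start` tag to the first
    `stop` tag found after it. Not the rstoolbox implementation.
    """
    result = []
    for seq in seqlist:
        i = seq.find(start)
        # Only look for the stop tag after the end of the start tag.
        j = seq.find(stop, i + len(start)) if i >= 0 else -1
        if i < 0 or j < 0:
            # Tags missing: keep the whole sequence (tags assumed out of range),
            # or return an empty string when the tags themselves were requested.
            result.append("" if inclusive else seq)
        elif inclusive:
            # Keep the boundary tags in the output.
            result.append(seq[i:j + len(stop)])
        else:
            # Keep only what lies strictly between the tags.
            result.append(seq[i + len(start):j])
    return result
```

With toy sequences and the same bounds used in the example, `adapt_length_sketch(["GASYFMQFFGKR", "HGMPITNC"], "GAS", "FFG")` gives `["YFMQ", "HGMPITNC"]`, while passing `inclusive=True` gives `["GASYFMQFFG", ""]`: the second sequence lacks both tags, so it is either kept whole or emptied.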
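Parameters:
seqlist: the sequences to trim (the example below passes a NumPy array of protein sequence strings).
start: tag marking the beginning of the region to keep.
stop: tag marking the end of the region to keep.
inclusive: if True, the boundary tags are kept in the returned sequences.
Returns:
the adapted sequences, one per input sequence.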
Example
In [1]: from rstoolbox.io import read_fastq
...: from rstoolbox.utils import translate_dna_sequence
...: from rstoolbox.utils import adapt_length
...: import pandas as pd
...: pd.set_option('display.width', 1000)
...: pd.set_option('display.max_columns', 500)
...: df = read_fastq("../rstoolbox/tests/data/cdk2_rand_001.fasq.gz")
...: df['sequence_A'] = df.apply(lambda row: translate_dna_sequence(row['sequence_A']),
...: axis=1)
...: bounds = ['GAS', 'FFG']
...: df['sequence_A'].values[:5]
...:
Out[1]:
array(['GASYFMQIPHRRMSVFGIAKVHARHKHLTGEVVALKKIRLFQPEPGPIMVKPNMCPYYYEWIGKRNQLDSFAPCISCKIKKRDTKVRGVCFHNSAIHCKSYRCVDQIFCGCIKWMMMGRDCEGQGESQNNTDIGGPTGCDINWRTCHFTELRHDCENWQSVICSTHHICTMGHIDQTSASETQDWDSFQWVMLRYIHGEQKKYSIQLGNWDAKQAVNMHRQELKVLVKKRHEEGKICACCVMSHIGVEISFFGKRSQRFQSEFMQHWVANFAMKFKFRNIGWPHTSWTQLAALGGWEGWHKPGT',
'HGMPITNCPSDRYDRLEHMCVRTYLTGEVVALKKIRLHVQDMAHTLDHTLDHMKWAQSFRNGLMYSEHRGHCAYPVCSLRSSTVVRWTMVVEYPFWHTALWKPIQGTKVLMIGTRKNCVIQMLMRFETRANENTACPNTNFTDGGERCWCCACRFCKHEMLQHIEEKQIDITDWCLFMSQRQVRFKWVVLRLWLDTPIKTSSAVGIGSTNGATDNFEGCSWDTMALEYGSQEHNNCPVDIRDRLEFQDDGGLRNLNPSTDIYPYEMTLFLFMIKKYTFVRCEVNLDCQMRPEWIGDAL',
'GASKYCPRARIQCEHYQEAFVCQTIITLTGEVVALKKIRLMFFEQSAEMLKQRMHGHHMGDDRRGWEYVSCWWCYAIHRWIHHSHFHEIRQETVTILGEYIRITCDQYLCKFKFAEVIRDAFVGMECITAKKKSQNKRNGIQYMTTASVALTQWHQVGLFTNVNQLDINQMTDSAREANFTPIYWIKQDCFLKTPYQNYEATVFQTADIWCRHEAECWDHQTWDWPNPLTQFCEEHKPSDVNGLENYRVFYFDWAFHKAILCHVKDMAQPFALRVFDEGCFWRCQVEQDYTLIPESWKCVTPGT',
'MANNCDPAMEEVMLRCFGLDRCGLLTGEVVALKKIRLHEYIHQLWMSSYQSHNAHKTRYSTCHSQEQVCWQCDVFICAWCDQTFLVYTVNAYDYCCWWRKRCLNEGTTFKMPVAVPNWPYQLTHLCEEAISMDFGMGNWMCEHHEEWLHYSHMNMCFCFFTQWIQEYEDYQPDMCIDVNQQCTVMGDYKEIPIECKVQAYIARCLIYPIVKAGTHTFHGVGFPPGWGQGDKFHWNHKMSKFPGGEIMKTVYVVCQISGPPMQNEYRYLKPSNTLQNMSWYDGNHTSLSVASGWEDFFT',
'GASFATLQKDKPPVMDTPKHCDAKKTWLTGEVVALKKIRLRPPCTSLGQRYVAIDIKHAVYSHKQHRSVMDIFHTLYFGKKWAIRVKEADSVREQTAWDFWWNWKHINQTCGFEDEVIHPNQMCMRILNNTKRDLLFQWPVVSCVKRVHIRIQTAFRFYACIVGFPYEPKMDIRTQICRGTIEEWFRFDVYRERIWRNEMYSQSEKSHNCWNNKNTQMCAPNMLKGSHNACKQSRHHWNAMDIRIHDELRIVGSPYYQTISYRVYNLNTPVKKSRNSGTARGHWHVGNHLKLRHDIYDCCAPGT'],
dtype=object)
In [2]: adapt_length(df['sequence_A'].values[:5], bounds[0], bounds[1])