rstoolbox.io.read_fastq(filename, seqID='A')
Reads a FASTQ file and stores the ID together with the sequence.
The default generated DesignFrame will contain two columns:
Column Name | Data Content |
---|---|
description | Sequence identifier. |
sequence_<chain> | Sequence content. |
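Parameters: | filename: FASTQ file to read. seqID: chain identifier that sets the <chain> suffix of the sequence column (default 'A'). |
---|---|
Returns: | DesignFrame |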
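The two-column layout described above can be illustrated without rstoolbox. The following is a minimal sketch, not the library's implementation: `parse_fastq` and `open_maybe_gzip` are hypothetical helpers, and the sketch assumes the common four-line FASTQ record layout (header, sequence, `+` separator, quality string) and that the identifier is the header text up to the first space.

```python
import gzip
import io


def parse_fastq(handle, seq_id="A"):
    """Collect identifier/sequence pairs from a FASTQ text stream.

    Hypothetical helper for illustration only; not part of rstoolbox.
    Assumes four lines per record and ignores the quality string.
    """
    column = "sequence_{}".format(seq_id)
    rows = []
    while True:
        header = handle.readline()
        if not header:
            break  # end of stream
        header = header.strip()
        if not header:
            continue  # skip blank lines between or after records
        if not header.startswith("@"):
            raise ValueError("malformed FASTQ header: {!r}".format(header))
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        handle.readline()  # quality string, ignored
        # Assumption: the identifier is the header text up to the first space.
        identifier = header[1:].partition(" ")[0]
        rows.append({"description": identifier, column: sequence})
    return rows


def open_maybe_gzip(path):
    """Open a plain or gzip-compressed text file, chosen by extension."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


# In-memory demo data, invented for this sketch.
demo = io.StringIO("@read_1\nACGT\n+\nIIII\n@read_2 extra\nTTGA\n+\nIIII\n")
records = parse_fastq(demo)
```

Each dictionary in `records` corresponds to one row, with a `description` key and a `sequence_A` key; passing a different `seq_id` changes only the suffix of the sequence key. A list of such dictionaries can be handed to `pandas.DataFrame` to obtain a comparable table.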
Example
In [1]: from rstoolbox.io import read_fastq
...: import pandas as pd
...: pd.set_option('display.width', 1000)
...: pd.set_option('display.max_columns', 500)
...: df = read_fastq("../rstoolbox/tests/data/cdk2_rand_001.fasq.gz")
...: df.head(8)
...:
Out[1]:
description sequence_A
0 cdk2_v0008 GGTGCGTCGTACTTTATGCAGATCCCCCATAGGCGCATGTCAGTATTCGGTATCGCCAAAGTGCACGCTCGTCACAAGCACTTAACAGGTGAGGTGGTAGCTCTTAAGAAAATACGCCTGTTCCAACCAGAACCAGGGCCGATCATGGTCAAGCCGAATATGTGTCCCTACTACTATGAATGGATTGGAAAGCGTAATCAACTGGATTCCTTTGCGCCCTGCATATCGTGTAAGATAAAGAAACGTGACACAAAGGTGAGGGGGGTTTGTTTTCATAATAGCGCAATACATTGTAAAAGTTATCGGTGCGTCGATCAAATCTTCTGCGGTTGTATAAAATGGATGATGATGGGCCGCGATTGTGAGGGGCAGGGGGAATCTCAGAATAATACGGATATAGGGGGTCCAACGGGATGTGATATCAATTGGCGAACATGTCATTTTACAGAACTTCGACATGACTGTGAAAACTGGCAAAGCGTCATCTGCAGTACTCATCACATATGTACGATGGGCCATATCGACCAGACTTCTGCTTCGGAGACCCAGGACTGGGATTCCTTTCAATGGGTGATGCTCCGATACATCCACGGCGAACAGAAGAAATATAGCATTCAGTTGGGCAATTGGGATGCTAAACAGGCAGTCAACATGCATAGACAGGAGCTGAAGGTGCTTGTGAAGAAGCGCCACGAGGAAGGCAAGATTTGTGCATGCTGCGTAATGTCACACATCGGTGTCGAAATTTCATTCTTTGGCAAGCGCTCACAGAGATTTCAGAGCGAATTTATGCAACATTGGGTGGCAAACTTCGCTATGAAGTTCAAATTTAGGAATATAGGTTGGCCACACACATCGTGGACCCAGCTCGCTGCACTGGGGGGTTGGGAGGGCTGGCACAAACCCGGGACT
1 cdk2_v0004 CATGGGATGCCAATCACAAACTGCCCGAGCGACCGATATGACCGACTTGAGCACATGTGTGTCCGCACATATCTGACTGGGGAGGTGGTGGCACTTAAAAAGATTCGGCTCCACGTGCAAGACATGGCCCATACGTTGGATCATACATTAGACCATATGAAGTGGGCGCAGTCTTTCCGTAACGGGTTGATGTACTCTGAACATCGGGGGCACTGTGCCTATCCTGTATGCTCCCTGAGATCCTCGACCGTAGTCAGGTGGACGATGGTTGTAGAATACCCCTTTTGGCACACCGCCTTATGGAAGCCCATTCAAGGCACGAAGGTGTTAATGATCGGGACGCGTAAAAACTGCGTGATCCAAATGTTAATGAGGTTCGAAACGAGGGCAAACGAAAACACAGCCTGTCCCAATACTAACTTTACTGATGGTGGCGAACGTTGTTGGTGTTGTGCTTGTCGGTTTTGTAAGCATGAGATGCTGCAGCATATAGAGGAGAAACAGATAGATATCACAGATTGGTGCCTGTTTATGAGTCAACGACAAGTAAGATTCAAATGGGTTGTACTCAGGCTCTGGTTAGATACTCCTATAAAGACAAGTTCAGCCGTAGGTATCGGCTCGACTAACGGGGCAACCGACAATTTCGAGGGGTGCAGTTGGGACACGATGGCCCTTGAGTATGGATCGCAAGAGCATAATAATTGCCCCGTTGACATTAGAGATAGACTGGAGTTTCAAGACGATGGCGGGCTGAGGAACCTAAATCCTAGTACTGACATATATCCCTACGAAATGACCCTTTTCTTGTTTATGATTAAAAAGTATACCTTTGTAAGATGTGAGGTTAATCTTGATTGCCAGATGAGACCAGAATGGATTGGTGATGCCTTG
2 cdk2_v0001 GGTGCGTCGAAATATTGTCCTCGGGCGCGGATTCAGTGCGAGCATTACCAGGAAGCGTTCGTTTGTCAGACGATAATAACATTAACTGGGGAGGTCGTTGCACTGAAAAAAATCAGACTGATGTTCTTTGAACAAAGCGCCGAGATGCTAAAACAAAGAATGCACGGGCATCATATGGGAGATGATCGAAGGGGCTGGGAATATGTTTCGTGCTGGTGGTGCTACGCCATCCATCGGTGGATCCATCATTCTCACTTTCATGAAATTCGTCAAGAAACTGTAACAATACTGGGGGAATACATTAGAATCACGTGCGATCAATATTTGTGCAAGTTCAAGTTTGCGGAGGTTATTCGAGATGCGTTTGTGGGGATGGAATGTATCACCGCGAAGAAAAAGTCGCAGAACAAAAGAAACGGAATACAGTATATGACTACAGCCAGTGTCGCGCTAACGCAATGGCACCAAGTAGGACTTTTCACTAACGTTAACCAACTTGACATTAATCAAATGACCGATTCCGCTCGAGAGGCTAACTTTACGCCTATTTATTGGATCAAACAGGACTGTTTCTTAAAGACACCATATCAGAACTACGAGGCTACGGTCTTCCAGACCGCAGACATTTGGTGCCGTCATGAGGCTGAATGCTGGGATCACCAAACATGGGATTGGCCCAATCCGCTAACCCAATTTTGTGAAGAACACAAGCCCAGCGATGTTAACGGGCTCGAGAATTATAGGGTCTTTTACTTTGACTGGGCATTCCATAAGGCTATACTCTGCCATGTCAAAGACATGGCACAACCGTTCGCTCTACGGGTATTCGACGAAGGCTGCTTTTGGCGATGTCAGGTTGAACAAGATTATACCCTCATCCCCGAGAGCTGGAAGTGCGTGACCCCCGGGACT
3 cdk2_v0005 ATGGCTAACAATTGTGATCCAGCAATGGAAGAGGTCATGCTACGATGCTTCGGCCTGGATAGATGCGGGTTACTCACAGGAGAGGTCGTGGCTCTTAAAAAAATAAGATTACACGAATACATCCACCAACTATGGATGAGCAGTTATCAGAGTCATAATGCGCACAAAACACGTTACTCAACTTGTCACTCCCAAGAACAGGTGTGTTGGCAATGTGATGTGTTCATATGTGCCTGGTGTGACCAAACATTCTTAGTCTATACCGTGAATGCTTATGATTATTGCTGCTGGTGGAGGAAAAGGTGCCTGAACGAGGGCACAACTTTCAAGATGCCCGTAGCAGTCCCAAACTGGCCCTACCAGCTAACACACCTTTGTGAGGAAGCAATCTCCATGGACTTTGGAATGGGGAATTGGATGTGTGAGCATCATGAAGAGTGGTTACACTATAGTCACATGAACATGTGCTTCTGTTTTTTTACACAGTGGATACAAGAATATGAAGACTACCAACCCGACATGTGTATAGATGTTAATCAACAATGTACGGTCATGGGCGATTACAAAGAAATACCTATTGAATGTAAAGTGCAGGCCTACATAGCTAGGTGCTTGATCTATCCCATTGTTAAAGCTGGGACACACACTTTCCATGGGGTAGGCTTTCCACCCGGTTGGGGTCAGGGTGATAAGTTCCATTGGAACCATAAGATGTCCAAGTTCCCTGGGGGGGAGATTATGAAAACGGTGTATGTCGTATGTCAAATCTCGGGGCCTCCTATGCAAAACGAATACCGGTACCTCAAGCCTTCAAATACGCTCCAGAACATGTCTTGGTATGATGGCAATCATACTTCATTGTCAGTCGCGAGCGGATGGGAAGACTTCTTCACT
4 cdk2_v0006 GGTGCGTCGTTCGCCACGTTACAAAAAGATAAACCGCCGGTCATGGACACTCCTAAGCATTGTGATGCAAAAAAGACATGGCTAACCGGGGAAGTGGTGGCCTTGAAAAAAATCAGGCTACGACCTCCGTGTACATCTCTTGGACAGAGATACGTCGCTATTGACATAAAACACGCAGTTTACAGCCATAAGCAGCATAGGTCAGTCATGGACATCTTTCACACGCTGTATTTTGGCAAAAAGTGGGCCATTCGCGTTAAAGAGGCAGACTCCGTGAGGGAACAGACAGCCTGGGATTTTTGGTGGAACTGGAAGCACATCAATCAGACGTGCGGTTTCGAGGACGAAGTTATACATCCAAACCAAATGTGCATGCGCATATTGAACAACACGAAACGGGACTTACTTTTCCAATGGCCTGTCGTGTCATGTGTGAAACGAGTCCATATCCGTATCCAAACCGCATTTCGTTTCTATGCGTGTATAGTAGGGTTTCCATATGAGCCCAAGATGGATATACGAACCCAAATATGTCGTGGAACGATCGAGGAGTGGTTCCGATTCGATGTCTATAGGGAACGTATTTGGAGAAATGAAATGTATTCACAGTCTGAGAAGAGTCACAATTGTTGGAACAATAAAAATACCCAAATGTGCGCTCCCAACATGTTAAAGGGTTCTCACAATGCCTGCAAACAATCTAGGCATCATTGGAACGCAATGGATATAAGGATCCACGACGAACTACGGATCGTAGGTAGTCCATATTATCAAACGATAAGCTACCGAGTGTACAACCTAAACACTCCAGTCAAAAAAAGTCGTAACAGCGGGACCGCTCGAGGCCATTGGCATGTTGGAAATCATCTAAAGTTACGGCACGATATATACGATTGTTGCGCTCCCGGGACT
5 cdk2_v0003 CTGTGGTTATGGATGGAAATTGGTTGGCGGCATAGGTGGCAATATAAAAGTGTGGACAATCAGGCTCCTATGTTGACGGGTGAAGTCGTGGCACTAAAAAAAATAAGGCTAGGCCATCATGAACAGCCGTCTAAGCAGCTAGAGCCCGAGATCGATATGGTTATGCTTCAAATAGACCACCGGTGTCACCTTCGGGTCGAGGATCACTACATAGGTCACAAAGATGGAATATCGCAATTCCCTCGTCAACCACTTGCGTGTTCCGTAAACAATCAAAAACTACACGACAGGGATCGGGACTGCTGTCACGATCTCCAAAAAATGAATTTGGTCGCTCTGATTAGTGAGCCGGCGCATGGCATGATCAACATGCGGTGTATGCCCATTCGAAGTCGATATGATCGAGATAACCCTACAGACTGTTCGACCATAGCAACTCCATCTGATAATGTTCAGCCAAACAAAGGTGCAGGAACCACACCTTATGGGCCAGAAATGTTAGAGGATCATTGGCCGTTGTGGAAGAGAACTACACGGTACGAGTGTCACTGGCACACAGATTGCGAACTTAAAAGAGACCCATGCGGCCCCCCCTTTTGGTATGCCCAACATGGTGGGTATGCGAGGTGGCAGGCAGGTTCCCTCACTTGGTGCCACGTGGATAACGAAGAATGCGCGAAAAACATTGACGGCGAATCTAAATGGCATCGCCTGATGATTCCACCGCAGGTACGGCTCTTTAAATTCCCAAAGCTGTCGCCTTGGCCGGTTCGGGTTTGTAATCCTCCAGTAAGTGGTCTATTTCCCTTAGAATTTCAAGAACGGACTGATGAATATATGCAAGTTTACGCCGGATTCGACTTAGCAATGGGCACCAACATGCAGAAGCGATAC
6 cdk2_v0001 GGTGCGTCGAAATATTGTCCTCGGGCGCGGATTCAGTGCGAGCATTACCAGGAAGCGTTCGTTTGTCAGACGATAATAACATTAACTGGGGAGGTCGTTGCACTGAAAAAAATCAGACTGATGTTCTTTGAACAAAGCGCCGAGATGCTAAAACAAAGAATGCACGGGCATCATATGGGAGATGATCGAAGGGGCTGGGAATATGTTTCGTGCTGGTGGTGCTACGCCATCCATCGGTGGATCCATCATTCTCACTTTCATGAAATTCGTCAAGAAACTGTAACAATACTGGGGGAATACATTAGAATCACGTGCGATCAATATTTGTGCAAGTTCAAGTTTGCGGAGGTTATTCGAGATGCGTTTGTGGGGATGGAATGTATCACCGCGAAGAAAAAGTCGCAGAACAAAAGAAACGGAATACAGTATATGACTACAGCCAGTGTCGCGCTAACGCAATGGCACCAAGTAGGACTTTTCACTAACGTTAACCAACTTGACATTAATCAAATGACCGATTCCGCTCGAGAGGCTAACTTTACGCCTATTTATTGGATCAAACAGGACTGTTTCTTAAAGACACCATATCAGAACTACGAGGCTACGGTCTTCCAGACCGCAGACATTTGGTGCCGTCATGAGGCTGAATGCTGGGATCACCAAACATGGGATTGGCCCAATCCGCTAACCCAATTTTGTGAAGAACACAAGCCCAGCGATGTTAACGGGCTCGAGAATTATAGGGTCTTTTACTTTGACTGGGCATTCCATAAGGCTATACTCTGCCATGTCAAAGACATGGCACAACCGTTCGCTCTACGGGTATTCGACGAAGGCTGCTTTTGGCGATGTCAGGTTGAACAAGATTATACCCTCATCCCCGAGAGCTGGAAGTGCGTGACCCCCGGGACT
7 cdk2_v0012 CTCGGGAACCAAATCGCTGCGGCTCCGTCCGTTGCTCCTCTGCGGGCGGATCTCGAAAACCGACCTCGGCCATTGACGGGTGAGGTCGTTGCTCTGAAGAAAATTAGGTTACGCTGTATGCAGTTCGGGTCGATGTATCAAAAACCCGAACAGGCAGGGTGGATGGCCCCCCGTTACCATTATTTTTATCAACAACATTGTTGGATCGAAATGATCGCCGCGGAGCGCATGGGGAAGGCCGATGAAGATGTAAATTGGTCCGTGGTTTGGTTTTATAGACACAGCGAATATTGGAATTCCGTATTTTACATCTTCCAATGTATGTGCGAACACATAGGTGGCATCCACGCCCAGGGAGGACAATTAACCATGGAAAATGCAATTACCAATGAAGAACCCTTCGCGAAGTGGGGTAATAGTATGACGGTAGCTCATACTAAACTCAAGGCATGGAACGCCATGATGGATGTAGGAGAGGACCACTGCGACTTTCAGTTTAACCAAGACGTGAGAGGGCAAATAACCCTGACGGAGGAGGCAGTCCACACCATGTTTGGAATATTCAAATGGATACTATGGTACATGTGGGGTGCTATGCAATGGAGGAAACACTCGGTTTACGCGGTTAAGGAGCAATACTGGGATTGCCGCGAGGAAATGGATTGGATGTGCACTTGGATGCATAAATGTTGGAAAATGTACTGGTCGGATTGGTGGGGCAAGGAAGAGCTCGAATTCCGATCTGACGCCAAGCCTATCACGAAAATGATGATGTGTTGGATCCTACCGTACTCCCTACACGAGGATACCAGGCAGGTTTTTTCCCCTGCAAACTGTATTTGGTTCTGTGGGAGTGTATACTTGCAAAAGATATGGAAGATGCACAAGGACGCT