rstoolbox.utils.sequencing_enrichment(indata, enrichment=None, bounds=None, matches=None, seqID='A')
Retrieve data from multiple NGS files.
Reads data from multiple files and attaches each file to two conditions: a primary one (key1) and a secondary one (key2).
For instance, let’s assume that the data were obtained by selecting sequences with two
different binders, each at three different concentrations; we would define an
indata dictionary such as:
{'binder1': {'conc1': 'file1.fastq', 'conc2': 'file2.fastq', 'conc3': 'file3.fastq'},
'binder2': {'conc1': 'file4.fastq', 'conc2': 'file5.fastq', 'conc3': 'file6.fastq'}}
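As a rough illustration of how the two keys relate to the result, the sketch below walks a nested indata dictionary and produces one label per file by joining key1 and key2 with an underscore, which matches the column names (binder1_conc1, binder1_conc2, ...) visible in the example output on this page. flatten_indata is a hypothetical helper written only for this illustration; it is not part of rstoolbox and makes no claim about how the library reads the files internally.

```python
# Hypothetical helper (not part of rstoolbox): list the (label, file) pairs
# described by a nested indata dictionary of the form {key1: {key2: file}}.
def flatten_indata(indata):
    pairs = []
    for key1, conditions in indata.items():
        for key2, filename in conditions.items():
            # Assumption: labels follow the "<key1>_<key2>" pattern seen
            # in the example output columns.
            pairs.append(("{}_{}".format(key1, key2), filename))
    return pairs


indata = {
    'binder1': {'conc1': 'file1.fastq', 'conc2': 'file2.fastq', 'conc3': 'file3.fastq'},
    'binder2': {'conc1': 'file4.fastq', 'conc2': 'file5.fastq', 'conc3': 'file6.fastq'},
}

labels = flatten_indata(indata)
for label, filename in labels:
    print(label, filename)
```

Listing the labels this way can help confirm, before any file is read, that every file is attached to the intended pair of conditions.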
Also, for each binder we could decide to calculate the enrichment between any two
concentrations; we can do that by defining an enrichment dictionary such as:
{'binder1': ['conc1', 'conc3'],
'binder2': ['conc1', 'conc3']}
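Because the enrichment dictionary refers back to keys of indata, it may be worth checking the two for consistency before calling the function. The following is a minimal sketch of such a check, assuming that each enrichment entry must name exactly two key2 values that exist under the same key1 in indata. check_enrichment is a hypothetical helper, not part of rstoolbox, and the library may apply its own validation.

```python
# Hypothetical helper (not part of rstoolbox): report inconsistencies between
# an enrichment dictionary and the indata dictionary it refers to.
def check_enrichment(indata, enrichment):
    problems = []
    for key1, pair in enrichment.items():
        if key1 not in indata:
            problems.append("unknown key1: {!r}".format(key1))
            continue
        # Assumption: enrichment is always defined between exactly two conditions.
        if len(pair) != 2:
            problems.append("{!r} needs exactly two key2 values".format(key1))
        for key2 in pair:
            if key2 not in indata[key1]:
                problems.append("unknown key2 {!r} for {!r}".format(key2, key1))
    return problems


indata = {
    'binder1': {'conc1': 'file1.fastq', 'conc2': 'file2.fastq', 'conc3': 'file3.fastq'},
    'binder2': {'conc1': 'file4.fastq', 'conc2': 'file5.fastq', 'conc3': 'file6.fastq'},
}
enrichment = {'binder1': ['conc1', 'conc3'],
              'binder2': ['conc1', 'conc3']}

# An empty list means every enrichment entry points at existing conditions.
print(check_enrichment(indata, enrichment))
```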
Example
(We skip printing the sequence column to ease visibility of the differences)
In [1]: from rstoolbox.io import read_fastq
...: from rstoolbox.utils import sequencing_enrichment
...: import pandas as pd
...: pd.set_option('display.width', 1000)
...: pd.set_option('display.max_columns', 20)
...: indat = {'binder1': {'conc1': '../rstoolbox/tests/data/cdk2_rand_001.fasq.gz',
...:                      'conc2': '../rstoolbox/tests/data/cdk2_rand_002.fasq.gz',
...:                      'conc3': '../rstoolbox/tests/data/cdk2_rand_003.fasq.gz'},
...:          'binder2': {'conc1': '../rstoolbox/tests/data/cdk2_rand_004.fasq.gz',
...:                      'conc2': '../rstoolbox/tests/data/cdk2_rand_005.fasq.gz',
...:                      'conc3': '../rstoolbox/tests/data/cdk2_rand_006.fasq.gz'}}
...: df = sequencing_enrichment(indat)
...: df[[_ for _ in df.columns if _ != 'sequence_A']].head()
...:
Out[1]:
   description  binder1_conc1  binder1_conc2  binder1_conc3  binder2_conc1  binder2_conc2  binder2_conc3  len
0            0            4.0            1.0            0.0            1.0            0.0            3.0  304
1            1            4.0            2.0            1.0            2.0            1.0            0.0  304
2            2            3.0            2.0            4.0            1.0            1.0            1.0  304
3            3            3.0            1.0            1.0            1.0            0.0            3.0  304
4            4            3.0            0.0            1.0            2.0            2.0            1.0  298
In [2]: enrich = {'binder1': ['conc1', 'conc3'],
...: 'binder2': ['conc1', 'conc3']}
...: df = sequencing_enrichment(indat, enrich)
...: df[[_ for _ in df.columns if _ != 'sequence_A']].head()
...: