Correlate Metagenomic and Metatranscriptomic Data
GitHub repository: speeding-up-science-workshops/speeding-up-sci-correlation/
Summary
Visualization code from the 2nd "speeding up science" workshop. This repository contains code to plot and calculate the correlation (linear regression) between metagenomic and metatranscriptomic sequencing results acquired from the same sample.
The Binder includes an example that compares gene abundances (DNA) with their expression levels (RNA). The data are from one Mediterranean site in the TARA Oceans project (https://science.sciencemag.org/content/348/6237/1261359.long). The metagenomic data come from sample accession SAMEA2619782, and the metatranscriptomic data come from SAMEA2619784.
Quick Start
- Click the Jupyter notebook file (Correlation_dna_rna.ipynb) to open the interactive user interface.
- Run the notebook with the included example, or upload your own data files with the Upload button in the upper right corner of the Binder homepage. See below for examples of input files.
- Execute a code cell by pressing Ctrl+Enter or by clicking the Run button at the top of the notebook.
Example Input
1. A count table containing genes found in both DNA and RNA sequencing results.
| | Gene | DNA | RNA |
|---|---|---|---|
| 0 | TOBG-MED-1076_1101 | 3.57863 | 12.9926 |
| 1 | TOBG-MED-1076_1116 | 0.71486 | 4.03726 |
| 2 | TOBG-MED-1076_1131 | 7.72704 | 5.45492 |
| 3 | TOBG-MED-1076_1151 | 2.85944 | 15.5723 |
| 4 | TOBG-MED-1076_1195 | 8.81305 | 12.3797 |
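A minimal sketch of loading the count table with pandas. The tab-separated layout, the hypothetical `load_counts` helper, and the rule that "found in both" means a strictly positive DNA and RNA value are all assumptions for illustration; the README does not specify the file format the notebook expects.

```python
import io

import pandas as pd

# Hypothetical tab-separated export of the example count table; the real file
# name and delimiter used by the notebook are not specified in this README.
COUNTS_TSV = """Gene\tDNA\tRNA
TOBG-MED-1076_1101\t3.57863\t12.9926
TOBG-MED-1076_1116\t0.71486\t4.03726
TOBG-MED-1076_1131\t7.72704\t5.45492
TOBG-MED-1076_1151\t2.85944\t15.5723
TOBG-MED-1076_1195\t8.81305\t12.3797
"""


def load_counts(handle) -> pd.DataFrame:
    """Read a Gene/DNA/RNA table and keep genes detected in both data types."""
    counts = pd.read_csv(handle, sep="\t")
    missing = {"Gene", "DNA", "RNA"} - set(counts.columns)
    if missing:
        raise ValueError(f"count table is missing columns: {sorted(missing)}")
    # Assumption: "found in both" means a strictly positive value in each column.
    detected = (counts["DNA"] > 0) & (counts["RNA"] > 0)
    return counts.loc[detected].reset_index(drop=True)


counts = load_counts(io.StringIO(COUNTS_TSV))
print(counts)
```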
2. An annotation table. A "nan" KO_ID means that the gene has no match in the KEGG database; these genes are removed.
| | Gene | KO_ID |
|---|---|---|
| 0 | TOBG-MED-1076_1019 | nan |
| 1 | TOBG-MED-1076_1027 | nan |
| 2 | TOBG-MED-1076_1028 | K04084 |
| 3 | TOBG-MED-1076_1032 | nan |
| 4 | TOBG-MED-1076_1038 | K05540 |
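Removing genes without a KEGG match and then joining KO IDs onto the count table might look like the pandas sketch below. The helper names `keep_kegg_matches` and `annotate` are hypothetical, and the tab-separated input is an assumption. `pd.read_csv` treats the literal string "nan" as missing by default, so `dropna` removes those rows.

```python
import io

import pandas as pd

# The five example rows of the annotation table as a hypothetical TSV file.
ANNOTATION_TSV = """Gene\tKO_ID
TOBG-MED-1076_1019\tnan
TOBG-MED-1076_1027\tnan
TOBG-MED-1076_1028\tK04084
TOBG-MED-1076_1032\tnan
TOBG-MED-1076_1038\tK05540
"""


def keep_kegg_matches(annotations: pd.DataFrame) -> pd.DataFrame:
    """Drop genes whose KO_ID is missing, i.e. genes with no KEGG match."""
    return annotations.dropna(subset=["KO_ID"]).reset_index(drop=True)


def annotate(counts: pd.DataFrame, annotations: pd.DataFrame) -> pd.DataFrame:
    """Attach KO IDs to the count table; the inner join drops unmatched genes."""
    return counts.merge(keep_kegg_matches(annotations), on="Gene", how="inner")


# read_csv converts the string "nan" to a missing value by default.
annotations = pd.read_csv(io.StringIO(ANNOTATION_TSV), sep="\t")
matched = keep_kegg_matches(annotations)
print(matched)
```

The count and annotation excerpts shown in this README share no genes, so `annotate` should be applied to the full tables rather than to these excerpts.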
3. A KEGG Orthology (KO) table. Because a single KO ID can belong to multiple pathways, you will need to curate this table manually, choosing the category (Category1) for each KO ID.
| | KO_ID | Category1 |
|---|---|---|
| 0 | K00360 | Nitrogen metabolism |
| 1 | K00362 | Nitrogen metabolism |
| 2 | K00363 | Nitrogen metabolism |
| 3 | K00366 | Nitrogen metabolism |
| 4 | K00367 | Nitrogen metabolism |
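Because a KO ID can belong to several pathways, checking for KO IDs listed more than once helps confirm the manual curation before categories are attached to genes. This is a hypothetical sketch (`multi_category_kos` and `add_categories` are illustrative names), not part of the notebook.

```python
import pandas as pd


def multi_category_kos(ko_table: pd.DataFrame) -> pd.DataFrame:
    """Return all rows whose KO_ID appears more than once and so needs curation."""
    repeated = ko_table.duplicated(subset="KO_ID", keep=False)
    return ko_table.loc[repeated].sort_values(["KO_ID", "Category1"])


def add_categories(annotated: pd.DataFrame, ko_table: pd.DataFrame) -> pd.DataFrame:
    """Attach Category1 to annotated genes once every KO_ID is listed only once."""
    repeated = multi_category_kos(ko_table)
    if not repeated.empty:
        kos = ", ".join(sorted(repeated["KO_ID"].unique()))
        raise ValueError(f"curate these KO IDs to a single category first: {kos}")
    # Inner join keeps only genes whose KO ID appears in the curated table.
    return annotated.merge(ko_table, on="KO_ID", how="inner")


# The five example rows of the KEGG Orthology table.
ko_table = pd.DataFrame(
    {
        "KO_ID": ["K00360", "K00362", "K00363", "K00366", "K00367"],
        "Category1": ["Nitrogen metabolism"] * 5,
    }
)
print(multi_category_kos(ko_table).empty)  # True: each example KO ID appears once
```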
Example Output
The figure below is a static example of the output. The figure generated by the notebook is interactive: you can hover over each dot and line to see its annotation.
Authors
- Zhengyao "Zeya" Xue, GitHub ID @zeyaxue and ORCID
- Michael D. Lee, GitHub ID @AstrobioMike and ORCID