PC_ENRICH: Enrichment visualization
-----------------------------------------
For a small genome (e.g., yeast), the sequencing depth is generally enough (> 10-fold). In such cases, the genome-wide ChIP/Input enrichment distribution is informative because the technical and biological bias in high throughput sequencing can be minimized.
Here, we show an example according to the sample script ``sample.yeast.sh``, which can be found in the "tutorial" directory.
Downloading data
+++++++++++++++++++++++++++++++
Here, we use the data of replication analysis (Repli-seq) for *S. cerevisiae*, which can be treated in the same manner as ChIP-seq.
The original paper is: `Origin Association of Sld3, Sld7, and Cdc45 Proteins Is a Key Step for Determination of Origin-Firing Timing `_
The CRAM-format map files can be downloaded from our Google Drive account:
- `YST1019_Gal_0min-n2-k1.sort.cram `_
- `YST1019_Gal_60min-n2-k1.sort.cram `_
- `YST1019_Raf_0min-n2-k1.sort.cram `_
- `YST1019_Raf_60min-n2-k1.sort.cram `_
- `YST1053_Gal_0min-n2-k1.sort.cram `_
- `YST1053_Gal_60min-n2-k1.sort.cram `_
Parse2wig
++++++++++++++++++++++++++++++
The command below generates a bigWig data for the six CRAM files::
gt=../data/genometable/genometable.sacCer3.txt
mptable=../data/mptable/mptable.UCSC.sacCer3.50mer.flen150.txt
for cell in YST1019_Gal YST1019_Raf YST1053_Gal; do
for min in 0min 60min; do
cram=${cell}_${min}-n2-k1.sort.cram
parse2wig+ -i $cram -o ${cell}_${min} --gt $gt --mptable $mptable -n GR
done
done
Generating the enrichment distribution
++++++++++++++++++++++++++++++++++++++++++
To generate a PDF file of the enrichment distribution for *S. cerevisiae* with the gene annotation, type::
$ dir=parse2wigdir+
$ gene=../data/S_cerevisiae/SGD_features.tab
$ drompa+ PC_ENRICH \
-i $dir/YST1019_Gal_60min.100.bw,$dir/YST1019_Gal_0min.100.bw,YST1019_Gal,,,200 \
-i $dir/YST1019_Raf_60min.100.bw,$dir/YST1019_Raf_0min.100.bw,YST1019_Raf,,,200 \
-i $dir/YST1053_Gal_60min.100.bw,$dir/YST1053_Gal_0min.100.bw,YST1053_Gal,,,200 \
-o drompa-yeast --gt $gt -g $gene --gftype 2 \
--scale_ratio 1 --ls 200 --sm 10 --lpp 3
.. figure:: img/drompa_yeast.jpg
:width: 600px
:align: center
:alt: Alternate
Generation of the enrichment distribution of *S. cerevisiae*.
Supply the ``--ars`` option to visualize the DNA replication origin (ARS) available for *S. cerevisiae* and *S. pombe*.
The annotation data can be obtained from `OriDB `_.::
$ dir=parse2wigdir+
$ ars=../data/S_cerevisiae/ARS-oriDB_scer.txt
$ drompa+ PC_ENRICH \
-i $dir/YST1019_Gal_60min.100.bw,$dir/YST1019_Gal_0min.100.bw,YST1019_Gal,,,200 \
-i $dir/YST1019_Raf_60min.100.bw,$dir/YST1019_Raf_0min.100.bw,YST1019_Raf,,,200 \
-i $dir/YST1053_Gal_60min.100.bw,$dir/YST1053_Gal_0min.100.bw,YST1053_Gal,,,200 \
-o drompa-yeast-ARS --gt $gt --ars $ars \
--scale_ratio 1 --ls 200 --sm 10 --lpp 3
.. figure:: img/drompa_yeast-ARS.jpg
:width: 600px
:align: center
:alt: Alternate
Visualization of the DNA replication origin available for *S. cerevisiae*.
To check the enrichment level accurately, specify the number of y-axis memories and y-axis height using the ``--bn`` and ``--ystep`` options, respectively::
$ dir=parse2wigdir+
$ ars=../data/S_cerevisiae/ARS-oriDB_scer.txt
$ drompa+ PC_ENRICH \
-i $dir/YST1019_Gal_60min.100.bw,$dir/YST1019_Gal_0min.100.bw,YST1019_Gal,,,200 \
-i $dir/YST1019_Raf_60min.100.bw,$dir/YST1019_Raf_0min.100.bw,YST1019_Raf,,,200 \
-i $dir/YST1053_Gal_60min.100.bw,$dir/YST1053_Gal_0min.100.bw,YST1053_Gal,,,200 \
-o drompa-yeast-detail --gt $gt --ars $ars \
--scale_ratio 1 --ls 200 --sm 10 --lpp 3 \
--bn 5 --ystep 10
.. figure:: img/drompa-yeast-detail.jpg
:width: 600px
:align: center
:alt: Alternate
Checking the enrichment level by specifying the number of y-axis memories and y-axis height.
Highlight peaks
+++++++++++++++++++++++++
With the ``--callpeak`` option, **PC_ENRICH** mode highlights in red the bins containing ChIP/Input enrichments above the enrichment threshold (2.0 by default)::
$ dir=parse2wigdir+
$ ars=../data/S_cerevisiae/ARS-oriDB_scer.txt
$ drompa+ PC_ENRICH \
-i $dir/YST1019_Gal_60min.100.bw,$dir/YST1019_Gal_0min.100.bw,YST1019_Gal,,,200 \
-i $dir/YST1019_Raf_60min.100.bw,$dir/YST1019_Raf_0min.100.bw,YST1019_Raf,,,200 \
-i $dir/YST1053_Gal_60min.100.bw,$dir/YST1053_Gal_0min.100.bw,YST1053_Gal,,,200 \
--callpeak \
-o drompa-yeast-ARS-peak1 --gt $gt --ars $ars \
--scale_ratio 1 --ls 200 --sm 10 --lpp 3
.. figure:: img/drompa_yeast-ARS-peak1.jpg
:width: 600px
:align: center
:alt: Alternate
Highlighting peaks for the default enrichment threshold.
In Fig. 3.12, the difference of replicated regions between the samples is more pronounced.
To change the enrichment threshold, supply ``--ethre`` as follows::
$ dir=parse2wigdir+
$ ars=../data/S_cerevisiae/ARS-oriDB_scer.txt
$ drompa+ PC_ENRICH \
-i $dir/YST1019_Gal_60min.100.bw,$dir/YST1019_Gal_0min.100.bw,YST1019_Gal,,,200 \
-i $dir/YST1019_Raf_60min.100.bw,$dir/YST1019_Raf_0min.100.bw,YST1019_Raf,,,200 \
-i $dir/YST1053_Gal_60min.100.bw,$dir/YST1053_Gal_0min.100.bw,YST1053_Gal,,,200 \
--callpeak --ethre 1.5 \
-o drompa-yeast-ARS-peak2 --gt $gt --ars $ars \
--scale_ratio 1 --ls 200 --sm 10 --lpp 3
.. figure:: img/drompa_yeast-ARS-peak2.jpg
:width: 600px
:align: center
:alt: Alternate
Highlighting peaks for a specified enrichment threshold.
Log-ratio distribution
+++++++++++++++++++++++++
Log-scaled ChIP/Input enrichment can be visualized by supplying ``--showratio 2``::
$ dir=parse2wigdir+
$ ars=../data/S_cerevisiae/ARS-oriDB_scer.txt
$ drompa+ PC_ENRICH \
-i $dir/YST1019_Gal_60min.100.bw,$dir/YST1019_Gal_0min.100.bw,YST1019_Gal,,,200 \
-i $dir/YST1019_Raf_60min.100.bw,$dir/YST1019_Raf_0min.100.bw,YST1019_Raf,,,200 \
-i $dir/YST1053_Gal_60min.100.bw,$dir/YST1053_Gal_0min.100.bw,YST1053_Gal,,,200 \
-o drompa-yeast-log2ratio \
--gt $gt --ars $ars \
--showratio 2 --scale_ratio 2 \
--ls 200 --sm 10 --bn 4 --lpp 3 \
--chr I
where ``--chr I`` is supplied to generate the PDF file for chrI only. ``--bn 4`` is supplied to increase the number of y-axis memories.
.. figure:: img/drompa-yeast-log2ratio.jpg
:width: 600px
:align: center
:alt: Alternate
Visualization of log-scaled enrichment.
In this mode, ``--scale_ratio`` indicates the base of the logarithm. To use log10, specify ``--scale_ratio 10``::
$ dir=parse2wigdir+
$ ars=../data/S_cerevisiae/ARS-oriDB_scer.txt
$ drompa+ PC_ENRICH \
-i $dir/YST1019_Gal_60min.100.bw,$dir/YST1019_Gal_0min.100.bw,YST1019_Gal,,,200 \
-i $dir/YST1019_Raf_60min.100.bw,$dir/YST1019_Raf_0min.100.bw,YST1019_Raf,,,200 \
-i $dir/YST1053_Gal_60min.100.bw,$dir/YST1053_Gal_0min.100.bw,YST1053_Gal,,,200 \
-o drompa-yeast-log10ratio \
--gt $gt --ars $ars \
--showratio 2 --scale_ratio 10 \
--ls 200 --sm 10 --bn 4 --lpp 3 \
--chr I
.. figure:: img/drompa-yeast-log10ratio.jpg
:width: 600px
:align: center
:alt: Alternate
Visualization of log-scaled enrichment for log10.
Use the ``--callpeak`` option to change colors between >1 and <1::
$ dir=parse2wigdir+
$ ars=../data/S_cerevisiae/ARS-oriDB_scer.txt
$ drompa+ PC_ENRICH \
-i $dir/YST1019_Gal_60min.100.bw,$dir/YST1019_Gal_0min.100.bw,YST1019_Gal,,,200 \
-i $dir/YST1019_Raf_60min.100.bw,$dir/YST1019_Raf_0min.100.bw,YST1019_Raf,,,200 \
-i $dir/YST1053_Gal_60min.100.bw,$dir/YST1053_Gal_0min.100.bw,YST1053_Gal,,,200 \
-o drompa-yeast-log2ratio2 \
--gt $gt --ars $ars \
--showratio 2 --scale_ratio 2 \
--ls 200 --sm 10 --bn 4 --lpp 3 \
--callpeak \
--chr I
.. figure:: img/drompa-yeast-log2ratio2.jpg
:width: 600px
:align: center
:alt: Alternate
Visualization of log-scaled enrichment using the ``--callpeak`` option.