3.2. PC_ENRICH: Enrichment visualization
For a small genome (e.g., yeast), the sequencing depth is generally sufficient (> 10-fold coverage). In such cases, the genome-wide ChIP/Input enrichment distribution is informative because the technical and biological biases in high-throughput sequencing can be minimized.
Here, we show an example based on the sample script sample.yeast.sh, which can be found in the “tutorial” directory.
3.2.1. Downloading data
Here, we use replication analysis (Repli-seq) data for S. cerevisiae, which can be treated in the same manner as ChIP-seq data. The original paper is: Origin Association of Sld3, Sld7, and Cdc45 Proteins Is a Key Step for Determination of Origin-Firing Timing
The CRAM-format map files can be downloaded from our Google Drive account:
3.2.2. Parse2wig
The commands below generate bigWig files from the six CRAM files:
gt=../data/genometable/genometable.sacCer3.txt
mptable=../data/mptable/mptable.UCSC.sacCer3.50mer.flen150.txt
for cell in YST1019_Gal YST1019_Raf YST1053_Gal; do
    for min in 0min 60min; do
        cram=${cell}_${min}-n2-k1.sort.cram
        parse2wig+ -i $cram -o ${cell}_${min} --gt $gt --mptable $mptable -n GR
    done
done
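Before moving on to PC_ENRICH, it can help to confirm that all six bigWig files were written. The following is a minimal sketch rather than part of DROMPAplus: the helper names expected_bigwigs and check_bigwigs are hypothetical, and the output directory parse2wigdir+ and the .100.bw suffix are assumed from the PC_ENRICH commands later in this section.

```shell
# Sketch: list the bigWig files the loop above is expected to produce.
# Assumption: outputs are named <dir>/<cell>_<min>.100.bw, as used by the
# PC_ENRICH commands in this section.
expected_bigwigs() {
    dir=${1:-parse2wigdir+}
    for cell in YST1019_Gal YST1019_Raf YST1053_Gal; do
        for min in 0min 60min; do
            printf '%s/%s_%s.100.bw\n' "$dir" "$cell" "$min"
        done
    done
}

# Print the number of expected files that are missing or empty,
# naming each one on standard error.
check_bigwigs() {
    missing=0
    for bw in $(expected_bigwigs "$1"); do
        if [ ! -s "$bw" ]; then
            echo "missing or empty: $bw" >&2
            missing=$((missing + 1))
        fi
    done
    echo "$missing"
}
```

Running check_bigwigs parse2wigdir+ after the loop should print 0 if every file is present and non-empty.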
3.2.3. Generating the enrichment distribution
To generate a PDF file of the enrichment distribution for S. cerevisiae with the gene annotation, type:
$ dir=parse2wigdir+
$ gene=../data/S_cerevisiae/SGD_features.tab
$ drompa+ PC_ENRICH \
-i $dir/YST1019_Gal_60min.100.bw,$dir/YST1019_Gal_0min.100.bw,YST1019_Gal,,,200 \
-i $dir/YST1019_Raf_60min.100.bw,$dir/YST1019_Raf_0min.100.bw,YST1019_Raf,,,200 \
-i $dir/YST1053_Gal_60min.100.bw,$dir/YST1053_Gal_0min.100.bw,YST1053_Gal,,,200 \
-o drompa-yeast --gt $gt -g $gene --gftype 2 \
--scale_ratio 1 --ls 200 --sm 10 --lpp 3
Fig. 3.10 Generation of the enrichment distribution of S. cerevisiae.
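The three -i arguments in the command above differ only in the sample name, so they can also be generated in a loop. The following is a minimal sketch, not part of the DROMPAplus distribution: sample_arg and build_enrich_cmd are hypothetical helper names, and the field layout of each -i argument (including the trailing ,,,200) is copied verbatim from the command above. The function only prints the beginning of the command line so that it can be inspected; the remaining options (--gt, -g, and so on) still have to be appended.

```shell
# Build one "-i" value for a sample.
# $1 = directory with the bigWig files, $2 = sample name
sample_arg() {
    printf '%s/%s_60min.100.bw,%s/%s_0min.100.bw,%s,,,200' \
        "$1" "$2" "$1" "$2" "$2"
}

# Print (do not run) the start of a PC_ENRICH command line.
# $1 = directory with the bigWig files, $2 = output prefix
build_enrich_cmd() {
    dir=$1
    out=$2
    set -- drompa+ PC_ENRICH
    for cell in YST1019_Gal YST1019_Raf YST1053_Gal; do
        set -- "$@" -i "$(sample_arg "$dir" "$cell")"
    done
    set -- "$@" -o "$out"
    printf '%s\n' "$*"
}

# Dry run: show the generated command line.
build_enrich_cmd parse2wigdir+ drompa-yeast
```

Printing the command instead of executing it keeps the sketch safe to try on a machine where drompa+ is not installed.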
Supply the --ars option to visualize the DNA replication origins (ARSs), which are available for S. cerevisiae and S. pombe.
The annotation data can be obtained from OriDB:
$ dir=parse2wigdir+
$ ars=../data/S_cerevisiae/ARS-oriDB_scer.txt
$ drompa+ PC_ENRICH \
-i $dir/YST1019_Gal_60min.100.bw,$dir/YST1019_Gal_0min.100.bw,YST1019_Gal,,,200 \
-i $dir/YST1019_Raf_60min.100.bw,$dir/YST1019_Raf_0min.100.bw,YST1019_Raf,,,200 \
-i $dir/YST1053_Gal_60min.100.bw,$dir/YST1053_Gal_0min.100.bw,YST1053_Gal,,,200 \
-o drompa-yeast-ARS --gt $gt --ars $ars \
--scale_ratio 1 --ls 200 --sm 10 --lpp 3
Fig. 3.11 Visualization of the DNA replication origin available for S. cerevisiae.
To read the enrichment level more precisely, specify the number of y-axis tick marks (called "memories" in the option descriptions) and the y-axis height using the --bn and --ystep options, respectively:
$ dir=parse2wigdir+
$ ars=../data/S_cerevisiae/ARS-oriDB_scer.txt
$ drompa+ PC_ENRICH \
-i $dir/YST1019_Gal_60min.100.bw,$dir/YST1019_Gal_0min.100.bw,YST1019_Gal,,,200 \
-i $dir/YST1019_Raf_60min.100.bw,$dir/YST1019_Raf_0min.100.bw,YST1019_Raf,,,200 \
-i $dir/YST1053_Gal_60min.100.bw,$dir/YST1053_Gal_0min.100.bw,YST1053_Gal,,,200 \
-o drompa-yeast-detail --gt $gt --ars $ars \
--scale_ratio 1 --ls 200 --sm 10 --lpp 3 \
--bn 5 --ystep 10
Fig. 3.12 Checking the enrichment level by specifying the number of y-axis tick marks and the y-axis height.
3.2.4. Highlight peaks
With the --callpeak option, PC_ENRICH mode highlights in red the bins whose ChIP/Input enrichment is above the enrichment threshold (2.0 by default):
$ dir=parse2wigdir+
$ ars=../data/S_cerevisiae/ARS-oriDB_scer.txt
$ drompa+ PC_ENRICH \
-i $dir/YST1019_Gal_60min.100.bw,$dir/YST1019_Gal_0min.100.bw,YST1019_Gal,,,200 \
-i $dir/YST1019_Raf_60min.100.bw,$dir/YST1019_Raf_0min.100.bw,YST1019_Raf,,,200 \
-i $dir/YST1053_Gal_60min.100.bw,$dir/YST1053_Gal_0min.100.bw,YST1053_Gal,,,200 \
--callpeak \
-o drompa-yeast-ARS-peak1 --gt $gt --ars $ars \
--scale_ratio 1 --ls 200 --sm 10 --lpp 3
Fig. 3.13 Highlighting peaks for the default enrichment threshold.
In Fig. 3.13, the differences in replicated regions among the samples are more pronounced.
To change the enrichment threshold, supply --ethre as follows:
$ dir=parse2wigdir+
$ ars=../data/S_cerevisiae/ARS-oriDB_scer.txt
$ drompa+ PC_ENRICH \
-i $dir/YST1019_Gal_60min.100.bw,$dir/YST1019_Gal_0min.100.bw,YST1019_Gal,,,200 \
-i $dir/YST1019_Raf_60min.100.bw,$dir/YST1019_Raf_0min.100.bw,YST1019_Raf,,,200 \
-i $dir/YST1053_Gal_60min.100.bw,$dir/YST1053_Gal_0min.100.bw,YST1053_Gal,,,200 \
--callpeak --ethre 1.5 \
-o drompa-yeast-ARS-peak2 --gt $gt --ars $ars \
--scale_ratio 1 --ls 200 --sm 10 --lpp 3
Fig. 3.14 Highlighting peaks for a specified enrichment threshold.
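The effect of lowering the threshold from 2.0 to 1.5 can be illustrated with a toy calculation. This is only a sketch under stated assumptions: the bin values are invented, flag_bins and toy_bins are hypothetical helpers, and the plain ChIP/Input division here ignores whatever normalization and smoothing DROMPAplus applies before comparing a bin against --ethre.

```shell
# Flag bins whose ChIP/Input ratio is above a threshold.
# $1 = enrichment threshold; stdin = "bin chip input" per line.
# Bins with zero Input are skipped to avoid division by zero.
flag_bins() {
    awk -v thre="$1" '$3 > 0 {
        ratio = $2 / $3
        printf "%s\t%.2f\t%s\n", $1, ratio, (ratio > thre ? "highlight" : "-")
    }'
}

# Invented example values (bin name, ChIP signal, Input signal).
toy_bins() {
    printf '%s\n' 'bin1 30 10' 'bin2 18 10' 'bin3 8 10'
}

echo "threshold 2.0"
toy_bins | flag_bins 2.0
echo "threshold 1.5"
toy_bins | flag_bins 1.5
```

With these values, only bin1 (ratio 3.00) is flagged at 2.0, whereas bin2 (ratio 1.80) is flagged as well at 1.5.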
3.2.5. Log-ratio distribution
Log-scaled ChIP/Input enrichment can be visualized by supplying --showratio 2:
$ dir=parse2wigdir+
$ ars=../data/S_cerevisiae/ARS-oriDB_scer.txt
$ drompa+ PC_ENRICH \
-i $dir/YST1019_Gal_60min.100.bw,$dir/YST1019_Gal_0min.100.bw,YST1019_Gal,,,200 \
-i $dir/YST1019_Raf_60min.100.bw,$dir/YST1019_Raf_0min.100.bw,YST1019_Raf,,,200 \
-i $dir/YST1053_Gal_60min.100.bw,$dir/YST1053_Gal_0min.100.bw,YST1053_Gal,,,200 \
-o drompa-yeast-log2ratio \
--gt $gt --ars $ars \
--showratio 2 --scale_ratio 2 \
--ls 200 --sm 10 --bn 4 --lpp 3 \
--chr I
Here, --chr I restricts the PDF output to chrI only, and --bn 4 increases the number of y-axis tick marks.
Fig. 3.15 Visualization of log-scaled enrichment.
In this mode, --scale_ratio indicates the base of the logarithm. To use log10, specify --scale_ratio 10:
$ dir=parse2wigdir+
$ ars=../data/S_cerevisiae/ARS-oriDB_scer.txt
$ drompa+ PC_ENRICH \
-i $dir/YST1019_Gal_60min.100.bw,$dir/YST1019_Gal_0min.100.bw,YST1019_Gal,,,200 \
-i $dir/YST1019_Raf_60min.100.bw,$dir/YST1019_Raf_0min.100.bw,YST1019_Raf,,,200 \
-i $dir/YST1053_Gal_60min.100.bw,$dir/YST1053_Gal_0min.100.bw,YST1053_Gal,,,200 \
-o drompa-yeast-log10ratio \
--gt $gt --ars $ars \
--showratio 2 --scale_ratio 10 \
--ls 200 --sm 10 --bn 4 --lpp 3 \
--chr I
Fig. 3.16 Visualization of log-scaled enrichment for log10.
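To make the choice of base concrete, the same enrichment ratios can be converted to log2 and log10 by hand. The following is an arithmetic sketch only; log_ratio is a hypothetical helper and is unrelated to how DROMPAplus computes the plotted values internally.

```shell
# Convert a ratio to a logarithm of the given base.
# awk's log() is the natural logarithm, so divide by log(base).
# $1 = ChIP/Input ratio, $2 = base of the logarithm
log_ratio() {
    awk -v r="$1" -v b="$2" 'BEGIN { printf "%.3f\n", log(r) / log(b) }'
}

# Columns: ratio, log2(ratio), log10(ratio)
for ratio in 0.5 1 2 4; do
    printf '%s\t%s\t%s\n' "$ratio" \
        "$(log_ratio "$ratio" 2)" "$(log_ratio "$ratio" 10)"
done
```

Ratios above 1 give positive values and ratios below 1 give negative values in either base; only the scale of the y-axis changes.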
Use the --callpeak option to draw enrichment above 1 and enrichment below 1 in different colors:
$ dir=parse2wigdir+
$ ars=../data/S_cerevisiae/ARS-oriDB_scer.txt
$ drompa+ PC_ENRICH \
-i $dir/YST1019_Gal_60min.100.bw,$dir/YST1019_Gal_0min.100.bw,YST1019_Gal,,,200 \
-i $dir/YST1019_Raf_60min.100.bw,$dir/YST1019_Raf_0min.100.bw,YST1019_Raf,,,200 \
-i $dir/YST1053_Gal_60min.100.bw,$dir/YST1053_Gal_0min.100.bw,YST1053_Gal,,,200 \
-o drompa-yeast-log2ratio2 \
--gt $gt --ars $ars \
--showratio 2 --scale_ratio 2 \
--ls 200 --sm 10 --bn 4 --lpp 3 \
--callpeak \
--chr I
Fig. 3.17 Visualization of log-scaled enrichment using the --callpeak option.