3.2. PC_ENRICH: Enrichment visualization

For a small genome (e.g., yeast), the sequencing depth is usually sufficient (> 10-fold coverage). In such cases, the genome-wide ChIP/Input enrichment distribution is informative because the technical and biological biases of high-throughput sequencing can be minimized.
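As a rough check of the 10-fold criterion, the depth can be estimated as (number of mapped reads × read length) / genome size. The sketch below is only an illustration: fold_coverage is a hypothetical helper, not part of DROMPAplus, the read count and read length are made-up values, and 12.1 Mb is an approximate size for the S. cerevisiae genome.

```shell
# Hypothetical helper (not a DROMPAplus command): estimate fold coverage.
fold_coverage() {
    # $1: number of mapped reads, $2: read length (bp), $3: genome size (bp)
    awk -v n="$1" -v len="$2" -v g="$3" 'BEGIN { printf "%.1f\n", n * len / g }'
}

# Assumed example values: 5 million 50-bp reads on a ~12.1-Mb genome.
fold_coverage 5000000 50 12100000   # prints 20.7
```

With these assumed numbers the estimate is well above 10-fold; the actual read count should be taken from the mapping statistics of each sample.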

The example below follows the sample script sample.yeast.sh, which can be found in the “tutorial” directory.

3.2.1. Downloading data

Here, we use replication analysis (Repli-seq) data for S. cerevisiae, which can be processed in the same manner as ChIP-seq data. The original paper is: “Origin Association of Sld3, Sld7, and Cdc45 Proteins Is a Key Step for Determination of Origin-Firing Timing”

The CRAM-format map files can be downloaded from our Google Drive account:

3.2.2. Parse2wig

The commands below generate a bigWig file for each of the six CRAM files:

gt=../data/genometable/genometable.sacCer3.txt
mptable=../data/mptable/mptable.UCSC.sacCer3.50mer.flen150.txt
for cell in YST1019_Gal YST1019_Raf YST1053_Gal; do
   for min in 0min 60min; do
       cram=${cell}_${min}-n2-k1.sort.cram
       parse2wig+ -i $cram  -o ${cell}_${min} --gt $gt --mptable $mptable -n GR
   done
done
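Before running the loop, it may help to confirm that all six CRAM files are present in the working directory. The following is a minimal sketch that reuses the file-name pattern from the loop above; expected_crams is a hypothetical helper introduced only for this illustration.

```shell
# Hypothetical helper: list the six CRAM file names used in the loop above.
expected_crams() {
    for cell in YST1019_Gal YST1019_Raf YST1053_Gal; do
        for min in 0min 60min; do
            echo "${cell}_${min}-n2-k1.sort.cram"
        done
    done
}

# Report every expected file that is absent from the current directory.
missing=0
for f in $(expected_crams); do
    if [ ! -f "$f" ]; then
        echo "missing: $f" >&2
        missing=$((missing + 1))
    fi
done
echo "$missing of 6 CRAM files missing"
```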

3.2.3. Generating the enrichment distribution

To generate a PDF file of the enrichment distribution for S. cerevisiae with the gene annotation, type:

$ dir=parse2wigdir+
$ gene=../data/S_cerevisiae/SGD_features.tab
$ drompa+ PC_ENRICH \
      -i $dir/YST1019_Gal_60min.100.bw,$dir/YST1019_Gal_0min.100.bw,YST1019_Gal,,,200 \
      -i $dir/YST1019_Raf_60min.100.bw,$dir/YST1019_Raf_0min.100.bw,YST1019_Raf,,,200 \
      -i $dir/YST1053_Gal_60min.100.bw,$dir/YST1053_Gal_0min.100.bw,YST1053_Gal,,,200 \
      -o drompa-yeast --gt $gt -g $gene --gftype 2 \
      --scale_ratio 1 --ls 200 --sm 10 --lpp 3

Fig. 3.10 Generation of the enrichment distribution of S. cerevisiae.
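The three -i arguments above differ only in the sample label, so they can also be assembled in a loop. This is a sketch rather than part of the sample script: pair_spec is a hypothetical helper, and the trailing fields (,,,200) are copied verbatim from the command above.

```shell
dir=parse2wigdir+

# Hypothetical helper: print one "ChIP,Input,label,,,200" value for a sample label.
pair_spec() {
    printf '%s/%s_60min.100.bw,%s/%s_0min.100.bw,%s,,,200\n' \
        "$dir" "$1" "$dir" "$1" "$1"
}

# Collect "-i <value>" pairs in the positional parameters.
set --
for cell in YST1019_Gal YST1019_Raf YST1053_Gal; do
    set -- "$@" -i "$(pair_spec "$cell")"
done

# Print the arguments for inspection, one per line.
printf '%s\n' "$@"
```

Once the printed values look right, "$@" can be placed after drompa+ PC_ENRICH in place of the three hand-written -i lines, followed by the remaining options shown above.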

Supply the --ars option to visualize DNA replication origins (ARS), which are available for S. cerevisiae and S. pombe. The annotation data can be obtained from OriDB:

$ dir=parse2wigdir+
$ ars=../data/S_cerevisiae/ARS-oriDB_scer.txt
$ drompa+ PC_ENRICH \
      -i $dir/YST1019_Gal_60min.100.bw,$dir/YST1019_Gal_0min.100.bw,YST1019_Gal,,,200 \
      -i $dir/YST1019_Raf_60min.100.bw,$dir/YST1019_Raf_0min.100.bw,YST1019_Raf,,,200 \
      -i $dir/YST1053_Gal_60min.100.bw,$dir/YST1053_Gal_0min.100.bw,YST1053_Gal,,,200 \
      -o drompa-yeast-ARS --gt $gt --ars $ars \
      --scale_ratio 1 --ls 200 --sm 10 --lpp 3

Fig. 3.11 Visualization of the DNA replication origins (ARS) available for S. cerevisiae.

To read the enrichment level more precisely, specify the number of y-axis tick marks and the y-axis height using the --bn and --ystep options, respectively:

$ dir=parse2wigdir+
$ ars=../data/S_cerevisiae/ARS-oriDB_scer.txt
$ drompa+ PC_ENRICH \
      -i $dir/YST1019_Gal_60min.100.bw,$dir/YST1019_Gal_0min.100.bw,YST1019_Gal,,,200 \
      -i $dir/YST1019_Raf_60min.100.bw,$dir/YST1019_Raf_0min.100.bw,YST1019_Raf,,,200 \
      -i $dir/YST1053_Gal_60min.100.bw,$dir/YST1053_Gal_0min.100.bw,YST1053_Gal,,,200 \
      -o drompa-yeast-detail --gt $gt --ars $ars \
      --scale_ratio 1 --ls 200 --sm 10 --lpp 3 \
      --bn 5 --ystep 10

Fig. 3.12 Checking the enrichment level by specifying the number of y-axis tick marks and the y-axis height.

3.2.4. Highlight peaks

With the --callpeak option, PC_ENRICH mode highlights in red the bins containing ChIP/Input enrichments above the enrichment threshold (2.0 by default):

$ dir=parse2wigdir+
$ ars=../data/S_cerevisiae/ARS-oriDB_scer.txt
$ drompa+ PC_ENRICH \
      -i $dir/YST1019_Gal_60min.100.bw,$dir/YST1019_Gal_0min.100.bw,YST1019_Gal,,,200 \
      -i $dir/YST1019_Raf_60min.100.bw,$dir/YST1019_Raf_0min.100.bw,YST1019_Raf,,,200 \
      -i $dir/YST1053_Gal_60min.100.bw,$dir/YST1053_Gal_0min.100.bw,YST1053_Gal,,,200 \
      --callpeak \
      -o drompa-yeast-ARS-peak1 --gt $gt --ars $ars \
      --scale_ratio 1 --ls 200 --sm 10 --lpp 3

Fig. 3.13 Highlighting peaks for the default enrichment threshold.

In Fig. 3.13, the difference in replicated regions between the samples is more pronounced. To change the enrichment threshold, supply --ethre as follows:

$ dir=parse2wigdir+
$ ars=../data/S_cerevisiae/ARS-oriDB_scer.txt
$ drompa+ PC_ENRICH \
      -i $dir/YST1019_Gal_60min.100.bw,$dir/YST1019_Gal_0min.100.bw,YST1019_Gal,,,200 \
      -i $dir/YST1019_Raf_60min.100.bw,$dir/YST1019_Raf_0min.100.bw,YST1019_Raf,,,200 \
      -i $dir/YST1053_Gal_60min.100.bw,$dir/YST1053_Gal_0min.100.bw,YST1053_Gal,,,200 \
      --callpeak --ethre 1.5 \
      -o drompa-yeast-ARS-peak2 --gt $gt --ars $ars \
      --scale_ratio 1 --ls 200 --sm 10 --lpp 3

Fig. 3.14 Highlighting peaks for a specified enrichment threshold.
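The effect of the threshold can be illustrated on a toy table of per-bin read counts. The sketch below is not how drompa+ computes enrichment internally; flag_bins is a hypothetical helper, the counts are made up, and the strict comparison reflects the wording "above the enrichment threshold".

```shell
# Hypothetical helper: mark bins whose ChIP/Input ratio exceeds a threshold.
flag_bins() {
    # stdin: "bin_start chip_count input_count" per line; $1: enrichment threshold
    # Bins with zero Input are skipped to avoid division by zero.
    awk -v t="$1" '$3 > 0 {
        r = $2 / $3
        printf "%s\t%.2f\t%s\n", $1, r, (r > t ? "highlight" : "-")
    }'
}

# Made-up counts for three 100-bp bins (ratios 1.60, 2.25, and 0.90).
toy_bins() {
    printf '%s\n' '0 32 20' '100 45 20' '200 18 20'
}

toy_bins | flag_bins 2.0   # only the bin at 100 is marked
toy_bins | flag_bins 1.5   # the bins at 0 and 100 are marked
```

Lowering the threshold from 2.0 to 1.5 marks one additional bin in this toy input, which mirrors why more regions are highlighted with --ethre 1.5.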

3.2.5. Log-ratio distribution

Log-scaled ChIP/Input enrichment can be visualized by supplying --showratio 2:

$ dir=parse2wigdir+
$ ars=../data/S_cerevisiae/ARS-oriDB_scer.txt
$ drompa+ PC_ENRICH \
      -i $dir/YST1019_Gal_60min.100.bw,$dir/YST1019_Gal_0min.100.bw,YST1019_Gal,,,200 \
      -i $dir/YST1019_Raf_60min.100.bw,$dir/YST1019_Raf_0min.100.bw,YST1019_Raf,,,200 \
      -i $dir/YST1053_Gal_60min.100.bw,$dir/YST1053_Gal_0min.100.bw,YST1053_Gal,,,200 \
      -o drompa-yeast-log2ratio \
      --gt $gt --ars $ars \
      --showratio 2 --scale_ratio 2 \
      --ls 200 --sm 10 --bn 4 --lpp 3 \
      --chr I

Here, --chr I restricts the output PDF to chrI, and --bn 4 increases the number of y-axis tick marks.


Fig. 3.15 Visualization of log-scaled enrichment.

In this mode, --scale_ratio indicates the base of the logarithm. To use log10, specify --scale_ratio 10:

$ dir=parse2wigdir+
$ ars=../data/S_cerevisiae/ARS-oriDB_scer.txt
$ drompa+ PC_ENRICH \
      -i $dir/YST1019_Gal_60min.100.bw,$dir/YST1019_Gal_0min.100.bw,YST1019_Gal,,,200 \
      -i $dir/YST1019_Raf_60min.100.bw,$dir/YST1019_Raf_0min.100.bw,YST1019_Raf,,,200 \
      -i $dir/YST1053_Gal_60min.100.bw,$dir/YST1053_Gal_0min.100.bw,YST1053_Gal,,,200 \
      -o drompa-yeast-log10ratio \
      --gt $gt --ars $ars \
      --showratio 2 --scale_ratio 10 \
      --ls 200 --sm 10 --bn 4 --lpp 3 \
      --chr I

Fig. 3.16 Visualization of log10-scaled enrichment.
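To see how the choice of base changes the plotted values, the same ChIP/Input ratio can be converted with different bases using log_b(r) = ln(r) / ln(b). The sketch below is a standalone calculation with made-up ratios; log_ratio is a hypothetical helper and does not call drompa+.

```shell
# Hypothetical helper: logarithm of a ChIP/Input ratio in an arbitrary base.
log_ratio() {
    # $1: ChIP/Input ratio (must be > 0), $2: base of the logarithm
    awk -v r="$1" -v b="$2" 'BEGIN { printf "%.3f\n", log(r) / log(b) }'
}

log_ratio 4 2      # prints 2.000
log_ratio 4 10     # prints 0.602
log_ratio 0.5 2    # prints -1.000
```

A four-fold enrichment therefore appears as 2 on a log2 axis but only about 0.6 on a log10 axis, and depleted bins (ratio below 1) take negative values in either base.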

Use the --callpeak option to draw bins with an enrichment ratio above 1 and bins with a ratio below 1 in different colors:

$ dir=parse2wigdir+
$ ars=../data/S_cerevisiae/ARS-oriDB_scer.txt
$ drompa+ PC_ENRICH \
      -i $dir/YST1019_Gal_60min.100.bw,$dir/YST1019_Gal_0min.100.bw,YST1019_Gal,,,200 \
      -i $dir/YST1019_Raf_60min.100.bw,$dir/YST1019_Raf_0min.100.bw,YST1019_Raf,,,200 \
      -i $dir/YST1053_Gal_60min.100.bw,$dir/YST1053_Gal_0min.100.bw,YST1053_Gal,,,200 \
      -o drompa-yeast-log2ratio2 \
      --gt $gt --ars $ars \
      --showratio 2 --scale_ratio 2 \
      --ls 200 --sm 10 --bn 4 --lpp 3 \
      --callpeak \
      --chr I

Fig. 3.17 Visualization of log-scaled enrichment using the --callpeak option.