Oct 19, 2008

ChIP-Seq

I have been working quite hard on ChIP-Seq applications for Illumina (Solexa) data.


These boil down to four basic functions:
  • peak calling - taking sequence reads aligned to a reference genome and counting the number of hits per genome interval, subtracting background or a control lane, smoothing, cutting off shoulders, splitting double peaks, and coming up with some statistic that suggests that the peaks are real vs. false positives
  • annotation - finding the location of peaks on the genome as compared to known features, especially the transcription start sites of known genes
  • visualization - looking at peaks in one of the genome browsers 
  • motif detection - finding patterns of common bases within the peaks, comparing these patterns with known transcription factor binding sites
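
To make the peak calling step a bit more concrete, here is a minimal Python sketch of just the counting and background subtraction parts. It is only an illustration, not how any of the packages below actually do it. The input shape (a list of `(chrom, start)` tuples), the function names, the 100 bp bin size, and the simple scaled subtraction are all assumptions of mine.

```python
from collections import Counter


def bin_counts(reads, bin_size=100):
    """Count reads per fixed-width genome interval.

    `reads` is an iterable of (chrom, start) tuples; this input shape is an
    assumption made for illustration.
    """
    counts = Counter()
    for chrom, start in reads:
        counts[(chrom, start // bin_size)] += 1
    return counts


def subtract_control(chip, control, scale=1.0):
    """Subtract scaled control-lane counts from ChIP counts, flooring at zero."""
    return {key: max(0.0, n - scale * control.get(key, 0)) for key, n in chip.items()}


def candidate_bins(enriched, min_count):
    """Return the bins whose background-subtracted count reaches min_count."""
    return sorted(key for key, n in enriched.items() if n >= min_count)


# Toy data: three ChIP reads pile up in chr1 bin 1, one control read lands there too.
chip_reads = [("chr1", 120), ("chr1", 150), ("chr1", 180), ("chr1", 910), ("chr2", 40)]
control_reads = [("chr1", 130), ("chr2", 45)]

chip = bin_counts(chip_reads)
control = bin_counts(control_reads)
enriched = subtract_control(chip, control)
peaks = candidate_bins(enriched, min_count=2)
print(peaks)
```

A real peak caller would go on to smooth the counts, trim shoulders, split double peaks, and attach a statistic to each candidate; none of that is attempted here.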

We have evaluated quite a few different pieces of software that provide one or more of these functions:

CisGenome
"An integrated software system for analyzing ChIP-chip and ChIP-seq data"
Ji H, Jiang H, Ma W, Johnson DS, Myers RM, Wong WH.


FindPeaks
BC Cancer Agency: FindPeaks
This is a good peak finder, easy to use, with a reasonable statistical model (based on comparing your genome-mapped data against a Monte Carlo random distribution of tags).
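
The general idea of comparing against a Monte Carlo random distribution of tags can be sketched as follows. This is not FindPeaks' own code or model; it is a toy version that assumes tags fall uniformly at random along a genome of a given length and asks how often a random bin reaches a given height. The function names and parameters are hypothetical.

```python
import random
from collections import Counter


def simulate_bin_counts(n_tags, genome_length, bin_size, rng):
    """Scatter n_tags uniformly at random and return the count in every bin."""
    n_bins = (genome_length + bin_size - 1) // bin_size
    counts = Counter(rng.randrange(genome_length) // bin_size for _ in range(n_tags))
    return [counts.get(i, 0) for i in range(n_bins)]


def empirical_false_positive_rate(height, n_tags, genome_length, bin_size,
                                  n_trials=50, seed=0):
    """Fraction of randomly generated bins whose count reaches `height`."""
    rng = random.Random(seed)
    hits = 0
    total = 0
    for _ in range(n_trials):
        counts = simulate_bin_counts(n_tags, genome_length, bin_size, rng)
        hits += sum(1 for c in counts if c >= height)
        total += len(counts)
    return hits / total


# With 1000 tags in 1000 bins, tall bins should be much rarer than occupied ones.
rate_low = empirical_false_positive_rate(1, 1000, 100_000, 100)
rate_high = empirical_false_positive_rate(5, 1000, 100_000, 100)
print(rate_low, rate_high)
```

A peak height whose simulated rate is very small is unlikely to arise by chance under this (simplistic) uniform background assumption.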

SISSRs (Site Identification from Short Sequence Reads)
Makes use of +/- strand information in ChIP-Seq reads to identify transcription factor binding sites to within a few tens of base pairs.
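
Why does strand information help? Reads on the + strand tend to pile up just upstream of a binding site and reads on the - strand just downstream, so the site sits roughly where the net strand count flips from positive to negative. Below is a simplified Python sketch of that idea only; it is not the SISSRs algorithm itself. The input shape (`(position, strand)` pairs for a single chromosome), the window size, and the `max_gap_windows` parameter are assumptions for illustration.

```python
def strand_transitions(reads, window=20, max_gap_windows=3):
    """Return approximate positions where the net strand count flips from + to -.

    `reads` is an iterable of (position, strand) pairs on one chromosome,
    with strand given as "+" or "-"; this input shape is hypothetical.
    """
    # Net count per window: +1 for each plus-strand read, -1 for each minus-strand read.
    net = {}
    for pos, strand in reads:
        w = pos // window
        net[w] = net.get(w, 0) + (1 if strand == "+" else -1)

    sites = []
    windows = sorted(net)
    for left, right in zip(windows, windows[1:]):
        close_enough = right - left <= max_gap_windows
        if close_enough and net[left] > 0 and net[right] < 0:
            # Midpoint between the centers of the two windows.
            sites.append(((left + right + 1) * window) // 2)
    return sites


reads = [(100, "+"), (105, "+"), (110, "+"), (150, "-"), (155, "-"), (158, "-")]
print(strand_transitions(reads))  # -> [130]
```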


MACS (Model-based Analysis of ChIP-Seq)
Written by Yong Zhang and Tao Liu from the lab of Shirley Liu at Harvard.

QuEST
C++ program (requires a C++ compiler), written by Anton Valouev in the Sidow lab at Stanford.
"Genome-wide analysis of transcription factor binding sites based on ChIP-Seq data", Valouev et al., Nature Methods 5, 829-834 (2008)



Peak finder and visualization via the UCSC Genome Browser

Integrative Genomics Viewer (IGV), from the Broad Institute of MIT and Harvard
Note the alignment processor, which creates tag counts from next-gen aligned reads (such as Eland output files).

Web-based peak calling at the Swiss Institute of Bioinformatics


ChIPDiff - identification of differential histone modification sites by comparison of two ChIP-Seq libraries prepared from different tissues (various cell types, stages, or environmental responses). Uses a Hidden Markov Model to identify differences in ChIP tag counts.
http://bioinformatics.oxfordjournals.org/cgi/content/full/24/20/2344
Available from the Genome Institute of Singapore


3 comments:

Habib Hamidi Travel and Services said...

Thank you Stuart. Just what I was looking for.

Anamika said...

Dear sir,

I am working with ChIP-seq data. I have tried SISSRs, QuEST, MACS, and SICER.

My problem is that I am not able to recognize the file types. I have several file formats, all of them ChIP-seq data, but I don't know whether all of these files can be used with all of the software I mentioned above. Please let me know what kind of data each of these is.

I know ChIP-seq data is usually in the following format:

chr4 130135336 130135360 U0 0 -
chr1 110547319 110547343 U0 0 -
chr2 71081880 71081904 U0 0 +

I used SISSRs for such files (BED files).

Now there are other formats as well, such as:

1. E2H2.aligned.txt
chr13 81419432 81419468 + 205E9.6.559265 2
chr11 44462781 44462817 + 205E9.6.559267 0

2. densities.txt

chr1 25 -1
chr1 50 -1
chr1 75 -1
chr1 100 -1
chr1 125 -1
chr1 150 -1
chr1 175 -1
chr1 200 -1

3. chip3034_multi_hg18.txt

AGAGTGTTTCAAACCTGCTCCATGAA 13000 13
AGACGAAGTCTCACTCTGTCACCCAG 13000 164
ATTCCATTCCACTCTGTTCCATTCCA 11953 24

I used this file format for QuEST.

4. BED file

chr1 454 489 CCTAACCCTAACCCTCGCGGTACCCTCAGCCGGCC 0 + - - 0,0,255
chr1 512 547 TTTCGGTGGTACTCTGAAGGCGGAGCACAGTTCTC 0 - - - 255,0,0

5. BAM files (these files do not open on my system)

6. BED files
6 38662156 38662189 +
8 102050882 102050916 +
16 16805607 16805640 -

7. BED files

chr1 564621 564687 . 0 . 5.575970 3.58854 -1
chr1 569893 569962 . 0 . 7.441230 6.19321 -1
chr1 712868 713455 . 0 . 11.857200 11.4429 -1


8. peaks.txt

chr1 6216808 6219103 985 186 5.29979577395856 799 1.34744732317805e-129
chr6 158010381 158011325 686 65 10.5893955160332 621 1.43057401891788e-129

9. BED file

chr1 5319 6069
chr1 15612 16329

10. BED file

chr14 68535052 68535087 Neg2 1 - 68535052 68535087 153,255,153
chr10 72774109 72774144 Neg3 1 - 72774109 72774144 153,255,153

msoumya.info said...

Thank you for sharing your idea. Can you tell me where the Chip sequence operation is used?