These boil down to four basic functions:
- peak calling - taking sequence reads aligned to a reference genome and counting the number of hits per genome interval, subtracting background or a control lane, smoothing, cutting off shoulders, splitting double peaks, and computing a statistic that indicates whether the peaks are real or false positives
- annotation - finding the location of peaks on the genome as compared to known features, especially the transcription start sites of known genes
- visualization - looking at peaks in one of the genome browsers
- motif detection - finding patterns of common bases within the peaks, comparing these patterns with known transcription factor binding sites
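The counting and background-subtraction steps of peak calling can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any of the packages below work: the function names, the fixed 100 bp bin size, and the `min_excess` cutoff are all hypothetical, and real peak callers add smoothing, peak splitting, and a proper statistical test.

```python
from collections import Counter

def bin_counts(reads, bin_size=100):
    """Count reads per (chrom, bin index).

    Each read is a (chrom, start, end, strand) tuple; a read is assigned
    to the bin that contains its midpoint.
    """
    counts = Counter()
    for chrom, start, end, _strand in reads:
        midpoint = (start + end) // 2
        counts[(chrom, midpoint // bin_size)] += 1
    return counts

def call_enriched_bins(chip_reads, control_reads, bin_size=100, min_excess=3):
    """Return bins where ChIP count minus control count is >= min_excess.

    min_excess is an arbitrary illustrative cutoff, not a statistic.
    """
    chip = bin_counts(chip_reads, bin_size)
    control = bin_counts(control_reads, bin_size)
    enriched = []
    for (chrom, idx), n in chip.items():
        excess = n - control.get((chrom, idx), 0)
        if excess >= min_excess:
            enriched.append((chrom, idx * bin_size, (idx + 1) * bin_size, excess))
    return sorted(enriched)
```

Calling `call_enriched_bins(chip_reads, control_reads)` returns a sorted list of `(chrom, bin_start, bin_end, excess)` tuples, which could then be passed to the annotation step.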
We have evaluated quite a few software packages that cover one or more of these functions:
CisGenome
"An integrated software system for analyzing ChIP-chip and ChIP-seq data"
Ji H, Jiang H, Ma W, Johnson DS, Myers RM, Wong WH.
FindPeaks
This is a good peak finder, easy to use, with a reasonable statistical model (based on comparing your genome-mapped data against a Monte Carlo random distribution of tags).
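The general idea of comparing observed data against a Monte Carlo random distribution of tags can be sketched as follows. This is not FindPeaks' actual model, only a minimal illustration of the technique: the function names, the fixed non-overlapping windows, and the uniform placement of random tags on a single sequence are all assumptions.

```python
import random
from collections import Counter

def height_histogram(positions, window):
    """Map window height h to the number of windows containing h tags."""
    per_window = Counter(pos // window for pos in positions)
    return Counter(per_window.values())

def simulated_fdr(observed_positions, genome_length, window, threshold,
                  n_sims=20, seed=0):
    """Estimate the false discovery rate for windows with >= threshold tags.

    The same number of tags is placed uniformly at random n_sims times;
    the FDR estimate is (mean random windows passing) / (observed windows
    passing). Returns None when no observed window passes the threshold.
    """
    rng = random.Random(seed)
    obs_hist = height_histogram(observed_positions, window)
    observed = sum(c for h, c in obs_hist.items() if h >= threshold)
    if observed == 0:
        return None
    random_total = 0
    for _ in range(n_sims):
        sim = [rng.randrange(genome_length) for _ in observed_positions]
        hist = height_histogram(sim, window)
        random_total += sum(c for h, c in hist.items() if h >= threshold)
    expected = random_total / n_sims
    return min(1.0, expected / observed)
```

Raising `threshold` until the estimated FDR falls below an acceptable level is one simple way to pick a peak-height cutoff.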
SISSRs (Site Identification from Short Sequence Reads)
Makes use of +/- strand information in ChIP-Seq reads to precisely identify transcription factor binding sites within a few tens of base pairs.
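Why strand information helps: reads on the + strand tend to pile up just upstream of a binding site and reads on the - strand just downstream, so the site lies where the net (plus minus minus) tag count switches from positive to negative. The sketch below illustrates that idea only; it is a simplification and not the SISSRs implementation, and the function name, the `window` size, and the `max_gap` limit are hypothetical.

```python
def strand_transitions(reads, window=20, max_gap=200):
    """Find candidate binding sites on one chromosome.

    reads: iterable of (position, strand) with strand '+' or '-'.
    Returns positions midway between a net-positive window and the next
    net-negative window, if the two are at most max_gap bp apart.
    """
    net = {}
    for pos, strand in reads:
        idx = pos // window
        net[idx] = net.get(idx, 0) + (1 if strand == "+" else -1)

    sites = []
    prev = None  # (window index, net count) of the last window with nonzero net
    for idx in sorted(net):
        if net[idx] == 0:
            continue
        if prev is not None and prev[1] > 0 and net[idx] < 0:
            left_center = prev[0] * window + window // 2
            right_center = idx * window + window // 2
            if right_center - left_center <= max_gap:
                sites.append((left_center + right_center) // 2)
        prev = (idx, net[idx])
    return sites
```

A transition in the opposite direction (negative to positive) is ignored, since that pattern is not what a bound factor is expected to produce.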
MACS
Written by Yong Zhang and Tao Liu from the lab of Shirley Liu at Harvard
QuEST
C++ program (requires a C++ compiler); author: Anton Valouev in the Sidow lab at Stanford
Genome-wide analysis of transcription factor binding sites based on ChIP-Seq data, Valouev, et al. Nature Methods 5, 829-834 (2008)
Wold Lab software suite (@ Caltech)
Peak finder and visualization via the UCSC Genome Browser
Integrative Genomics Viewer (IGV) from the Broad Institute of MIT and Harvard
Note the alignment processor that creates tag counts from next-gen aligned reads (such as ELAND output files)
Web-based peak calling at the Swiss Institute of Bioinformatics
ChIPDiff - identification of differential histone modification sites by comparison of two ChIP-Seq libraries prepared from different tissues (various cell types, stages, or environmental responses). Uses a Hidden Markov Model to identify differences in ChIP tag counts.
http://bioinformatics.oxfordjournals.org/cgi/content/full/24/20/2344
Available from the Genome Institute of Singapore
4 comments:
Thank you Stuart. Just what I was looking for.
Dear sir,
I am working with ChIP-seq data. I have tried SISSRS, QuEST, MACS, and SICER.
My problem is that I am not able to recognize the files. I have several file formats, all of them ChIP-seq data, but I don't know whether all of these files can be used with all of the software I mentioned above. Please let me know what kind of data each of these is.
I know ChIP-seq data is always present in the following format:
chr4 130135336 130135360 U0 0 -
chr1 110547319 110547343 U0 0 -
chr2 71081880 71081904 U0 0 +
I used SISSRS for such files (bed files).
Now there are other formats also, like:
1 E2H2.aligned.txt
chr13 81419432 81419468 + 205E9.6.559265 2
chr11 44462781 44462817 + 205E9.6.559267 0
2.densities.txt
chr1 25 -1
chr1 50 -1
chr1 75 -1
chr1 100 -1
chr1 125 -1
chr1 150 -1
chr1 175 -1
chr1 200 -1
3.chip3034_multi_hg18.txt
AGAGTGTTTCAAACCTGCTCCATGAA 13000 13
AGACGAAGTCTCACTCTGTCACCCAG 13000 164
ATTCCATTCCACTCTGTTCCATTCCA 11953 24
I used this file format for QuEST.
4.bed file
chr1 454 489 CCTAACCCTAACCCTCGCGGTACCCTCAGCCGGCC 0 + - - 0,0,255
chr1 512 547 TTTCGGTGGTACTCTGAAGGCGGAGCACAGTTCTC 0 - - - 255,0,0
5.bam files (these files are not opening on my system)
6.bed files
6 38662156 38662189 +
8 102050882 102050916 +
16 16805607 16805640 -
7.bed files
chr1 564621 564687 . 0 . 5.575970 3.58854 -1
chr1 569893 569962 . 0 . 7.441230 6.19321 -1
chr1 712868 713455 . 0 . 11.857200 11.4429 -1
8.peaks.txt
chr1 6216808 6219103 985 186 5.29979577395856 799 1.34744732317805e-129
chr6 158010381 158011325 686 65 10.5893955160332 621 1.43057401891788e-129
9.bed file
chr1 5319 6069
chr1 15612 16329
10.bed file
chr14 68535052 68535087 Neg2 1 - 68535052 68535087 153,255,153
chr10 72774109 72774144 Neg3 1 - 72774109 72774144 153,255,153
Thank you for sharing your idea. Can you tell me where the Chip sequence operation is used?
Great reading