<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-4457216402399127579</id><updated>2012-01-12T04:54:42.338-05:00</updated><category term='Introduction'/><category term='sequencing'/><category term='consumer genetic tests'/><category term='clinical genetics'/><title type='text'>Next-Gen Sequencing</title><subtitle type='html'>A working guide to the rapidly developing world of Next-Generation DNA sequencing, with an emphasis on bioinformatics</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://nextgenseq.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://nextgenseq.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Stuart Brown</name><uri>http://www.blogger.com/profile/14602560263535951430</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://3.bp.blogspot.com/_kxDtwcQ6hpg/SPvHPX70nBI/AAAAAAAAAAM/yhErXuQENQA/S220/browns02.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>30</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-4457216402399127579.post-244481811097220083</id><published>2011-12-15T16:08:00.000-05:00</published><updated>2011-12-15T16:08:42.737-05:00</updated><title type='text'>Job Opening: Sequencing Informatics 
Scientist</title><content type='html'>One of my very good informatics people is leaving at the end of the year, so I have a job vacancy to fill in our Sequencing Informatics unit (funded as an Institutional core to support our Next-Gen Sequencing Lab and our investigators, not from any one grant). I want someone with either a Master's and some experience with Next-Gen sequencing informatics, or a PhD (bioinformatics or computer science, or something similar) who is looking for a more stable, service-oriented position, rather than the usual highly competitive postdoc. &amp;nbsp;There will be opportunity for both collaborative and independent work on various projects, and publications are expected. UNIX/Perl/Java skills are necessary.&lt;br /&gt;&lt;br /&gt;The job previously involved informatics support for 454 sequencing, but that turned out to be less than 30% of the actual work. Looking forward, our Microbiome work will be done mostly on Illumina, bacterial genomes on Illumina... you get the idea. 
Send CVs to stuart.brown@gmail.com&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4457216402399127579-244481811097220083?l=nextgenseq.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nextgenseq.blogspot.com/feeds/244481811097220083/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4457216402399127579&amp;postID=244481811097220083' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/244481811097220083'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/244481811097220083'/><link rel='alternate' type='text/html' href='http://nextgenseq.blogspot.com/2011/12/job-opening-sequencing-informatics.html' title='Job Opening: Sequencing Informatics Scientist'/><author><name>Stuart Brown</name><uri>http://www.blogger.com/profile/14602560263535951430</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://3.bp.blogspot.com/_kxDtwcQ6hpg/SPvHPX70nBI/AAAAAAAAAAM/yhErXuQENQA/S220/browns02.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4457216402399127579.post-3281395983780415461</id><published>2011-10-25T11:45:00.001-04:00</published><updated>2011-10-25T11:46:28.162-04:00</updated><title type='text'>Sequence Squeeze</title><content type='html'>The storage of NGS data has reached and passed the critical point. The owner of a HiSeq machine can expect to generate hundreds of terabytes per year. Even more critical than the current large data volumes is the trend over the next few years - sequencing output will grow, and its cost will fall, much faster than hard drive capacity improves. 
Current trends show the doubling of drive capacity (at a constant cost) every 18 months, but the doubling of sequencing output (also at constant cost) every 5 months. So you can expect to pay 3X more for NGS data storage every year.&lt;br /&gt;&lt;br /&gt;The Pistoia Alliance, a trade group that includes most of the big Pharma companies and a bunch of software/informatics companies (but no sequencing machine vendors), has proposed a "Sequence Squeeze" challenge with a prize of $15,000&amp;nbsp;&lt;span class="Apple-style-span" style="font-family: 'Myriad Pro', Myriad, 'MS Trebuchet', 'Trebuchet MS', Trebuchet, Helvetica, sans-serif; font-size: 14px; line-height: 23px;"&gt;for the best novel open-source NGS compression algorithm. Nice.&amp;nbsp;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Myriad Pro', Myriad, 'MS Trebuchet', 'Trebuchet MS', Trebuchet, Helvetica, sans-serif; font-size: medium;"&gt;&lt;span class="Apple-style-span" style="font-size: 14px; line-height: 23px;"&gt;&lt;a href="http://www.sequencesqueeze.org/index.html"&gt;www.sequencesqueeze.org&lt;/a&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Myriad Pro', Myriad, 'MS Trebuchet', 'Trebuchet MS', Trebuchet, Helvetica, sans-serif; font-size: medium;"&gt;&lt;span class="Apple-style-span" style="font-size: 14px; line-height: 23px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Myriad Pro', Myriad, 'MS Trebuchet', 'Trebuchet MS', Trebuchet, Helvetica, sans-serif; font-size: medium;"&gt;&lt;span class="Apple-style-span" style="font-size: 14px; line-height: 23px;"&gt;I think the basic outline of a solution has already been published in this paper by&amp;nbsp;&lt;span class="Apple-style-span" style="font-family: arial, helvetica, sans-serif; font-size: 12px; line-height: 18px;"&gt;&lt;a 
href="http://www.ncbi.nlm.nih.gov/pubmed?term=%22Hsi-Yang%20Fritz%20M%22%5BAuthor%5D" style="border-bottom-color: initial; border-bottom-style: initial; border-bottom-width: 0px; color: #333333; text-decoration: underline;"&gt;Hsi-Yang Fritz&lt;/a&gt;,&amp;nbsp;&lt;a href="http://www.ncbi.nlm.nih.gov/pubmed?term=%22Leinonen%20R%22%5BAuthor%5D" style="border-bottom-color: initial; border-bottom-style: initial; border-bottom-width: 0px; color: #333333; text-decoration: underline;"&gt;Leinonen&lt;/a&gt;,&amp;nbsp;&lt;a href="http://www.ncbi.nlm.nih.gov/pubmed?term=%22Cochrane%20G%22%5BAuthor%5D" style="border-bottom-color: initial; border-bottom-style: initial; border-bottom-width: 0px; color: #333333; text-decoration: underline;"&gt;Cochrane&lt;/a&gt;, and&amp;nbsp;&lt;a href="http://www.ncbi.nlm.nih.gov/pubmed?term=%22Birney%20E%22%5BAuthor%5D" style="border-bottom-color: initial; border-bottom-style: initial; border-bottom-width: 0px; color: #333333; text-decoration: underline;"&gt;Birney&lt;/a&gt;:&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Myriad Pro', Myriad, 'MS Trebuchet', 'Trebuchet MS', Trebuchet, Helvetica, sans-serif; font-size: medium;"&gt;&lt;span class="Apple-style-span" style="font-size: 14px; line-height: 23px;"&gt;&lt;span class="Apple-style-span" style="font-family: arial, helvetica, sans-serif; font-size: 12px; line-height: 18px;"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Myriad Pro', Myriad, 'MS Trebuchet', 'Trebuchet MS', Trebuchet, Helvetica, sans-serif; font-size: medium;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Myriad Pro', Myriad, 'MS Trebuchet', 'Trebuchet MS', Trebuchet, Helvetica, sans-serif; font-size: medium;"&gt;&lt;h1 style="font-size: 1.3333em; font-weight: bold; line-height: 1.125em; margin-bottom: 0.375em; margin-left: 0px; margin-right: 0px; margin-top: 
0.375em;"&gt;&lt;span class="highlight"&gt;Efficient&lt;/span&gt;&amp;nbsp;&lt;span class="highlight"&gt;storage&lt;/span&gt;&amp;nbsp;of&amp;nbsp;&lt;span class="highlight"&gt;high&lt;/span&gt;&amp;nbsp;&lt;span class="highlight"&gt;throughput&lt;/span&gt;&amp;nbsp;&lt;span class="highlight"&gt;DNA&lt;/span&gt;&amp;nbsp;&lt;span class="highlight"&gt;sequencing&lt;/span&gt;&amp;nbsp;&lt;span class="highlight"&gt;data&lt;/span&gt;&amp;nbsp;using&amp;nbsp;&lt;span class="highlight"&gt;reference-based&lt;/span&gt;&amp;nbsp;&lt;span class="highlight"&gt;compression&lt;/span&gt;.&lt;/h1&gt;&lt;/span&gt;&lt;br /&gt;&lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/21245279"&gt;http://www.ncbi.nlm.nih.gov/pubmed/21245279&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Their basic idea is to reduce the amount of data stored that exactly reproduces a Reference Genome. Why store the same invariant data over and over again? Just save the interesting differences, and the quality scores near these differences.&lt;br /&gt;&lt;br /&gt;First align all reads to a Reference Genome, then compress high quality reads (all bases Q&amp;gt;20) that perfectly match the Reference down to just a start position and a length. For Illumina reads, all the read lengths are the same, so that value just needs to be saved once for the entire data file. The aligned reads are sorted and indexed, so the position of each read can be marked just as an increment from the previous read. Groups of identical reads can be replaced by a count.&lt;br /&gt;&lt;br /&gt;For reads that do not perfectly match the Ref. Genome, there may still be stretches of high quality matching bases. These can be represented by a set of start-stop coordinates with respect to the read start position, then an efficient formula to store differences for non-matching bases and the qualities of surrounding bases. 
&amp;nbsp;Many such variant summaries already exist.&lt;br /&gt;&lt;br /&gt;Another interesting idea is to use many different Reference Genomes (for humans), and match each sample to the most similar Reference. This might reduce the number of common variants observed by anywhere from 2X to 10X.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4457216402399127579-3281395983780415461?l=nextgenseq.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nextgenseq.blogspot.com/feeds/3281395983780415461/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4457216402399127579&amp;postID=3281395983780415461' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/3281395983780415461'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/3281395983780415461'/><link rel='alternate' type='text/html' href='http://nextgenseq.blogspot.com/2011/10/sequence-squeeze.html' title='Sequence Squeeze'/><author><name>Stuart Brown</name><uri>http://www.blogger.com/profile/14602560263535951430</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://3.bp.blogspot.com/_kxDtwcQ6hpg/SPvHPX70nBI/AAAAAAAAAAM/yhErXuQENQA/S220/browns02.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4457216402399127579.post-3873097829931683630</id><published>2011-10-05T10:29:00.000-04:00</published><updated>2011-10-05T10:29:57.250-04:00</updated><title type='text'>Exome: A Goldrush in Clinical Sequencing</title><content type='html'>
&lt;!--StartFragment--&gt;  &lt;br /&gt;&lt;div class="MsoNormal" style="text-align: left;"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;A number of new companies have recently been created, or have refocused their primary business effort on opportunities in clinical sequencing and personalized medicine. This area has received a lot of speculative attention in the past few years, but the recent development of “Exome” sequencing technology has suddenly made it a practical area for commercial investment.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;There are several challenges that must be overcome to make DNA sequencing a clinically relevant tool:&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoListParagraphCxSpFirst" style="mso-list: l0 level1 lfo1; text-indent: -.25in;"&gt;&lt;/div&gt;&lt;ul&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;&lt;span style="mso-bidi-font-family: Cambria;"&gt;&lt;span style="mso-list: Ignore;"&gt;1)&lt;span style="font: 7.0pt &amp;quot;Times New Roman&amp;quot;;"&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;the cost of the assay, which includes sample collection from the patient, sample preparation, and operation of the DNA sequencing machine&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;!--[if !supportLists]--&gt; &lt;!--[if !supportLineBreakNewLine]--&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;&lt;br style="mso-special-character: line-break;" /&gt; 
&lt;!--[endif]--&gt;&lt;/span&gt;&lt;br /&gt;&lt;div class="MsoListParagraphCxSpMiddle" style="mso-list: l0 level1 lfo1; text-indent: -.25in;"&gt;&lt;/div&gt;&lt;ul&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;&lt;span style="mso-bidi-font-family: Cambria;"&gt;&lt;span style="mso-list: Ignore;"&gt;2)&lt;span style="font: 7.0pt &amp;quot;Times New Roman&amp;quot;;"&gt;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;/span&gt;&lt;/span&gt;bioinformatics to identify sequence variants in the patient’s DNA&amp;nbsp;&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;!--[if !supportLists]--&gt; &lt;!--[if !supportLineBreakNewLine]--&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;&lt;br style="mso-special-character: line-break;" /&gt; &lt;!--[endif]--&gt;&lt;/span&gt;&lt;br /&gt;&lt;div class="MsoListParagraphCxSpLast" style="mso-list: l0 level1 lfo1; text-indent: -.25in;"&gt;&lt;/div&gt;&lt;ul&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;&lt;span style="mso-bidi-font-family: Cambria;"&gt;&lt;span style="mso-list: Ignore;"&gt;3)&lt;span style="font: 7.0pt &amp;quot;Times New Roman&amp;quot;;"&gt;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;/span&gt;&lt;/span&gt;filtering and interpretation of sequence variants for clinical relevance – i.e., identifying variants that provide information that directly impacts disease treatment decisions.&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;!--[if !supportLists]--&gt;&lt;br /&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;Exome sequencing addresses all three of these challenges. The exome is defined as the protein-coding exons of genes, which make up approximately 50 Mb (megabases) of the human genome – about 1.5% of the entire genome. New sample preparation reagents make it possible to capture this portion of the genome in a single step in a single tube for less than $100. 
The current Illumina HiSeq sequencing machine produces about 20 Gbp per lane for about $1500, which is equivalent to 400X coverage of the exome. Since current bioinformatics methods require only 50-100X coverage for optimal discovery of sequence variants, this allows 4 to 8 samples to be multiplexed into a single lane. Therefore exome sequencing can be used to scan all of a patient’s genes for under $500 in sequencing and sample preparation costs. The $1000 genome is available right now. &lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;Since the exome is a much smaller amount of sequence than the entire genome, and it is focused on the best characterized regions, the task of identifying variants is simplified. The problem of false positives is reduced both by the smaller extent of sequence and by the deeper coverage (≥50X). The challenge of interpretation is also greatly reduced since exons are by definition protein coding. All exon sequence variants can be characterized as changing amino acids or not (or creating frameshifts &amp;amp;/or stop codons), and the likely impact on a protein of an amino acid change can be assessed by a number of existing algorithms. Most genes can be further characterized by existing knowledge about protein function such as metabolic and regulatory pathways, as well as databases of clinical genetic and pharmacogenetic information.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;Since the technical ability to perform exome sequencing and basic discovery of sequence variants is available to anyone with a HiSeq machine (and a few skilled bioinformaticians), companies are currently trying to distinguish themselves with the clinical interpretation that they can offer. 
Some companies are skipping the sequencing entirely and focusing solely on the interpretation of clinical sequence data. &lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;• &lt;b style="mso-bidi-font-weight: normal;"&gt;Ambry Genetics&lt;/b&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;Ambry Genetics is the first laboratory to provide CLIA-approved exome services for applications in clinical diagnostics along with clinical interpretation and classification of variant data. The expert bioinformatics team makes Clinical Diagnostic Exome™ possible with a robust data analysis pipeline for Mendelian disease discovery.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;• &lt;b style="mso-bidi-font-weight: normal;"&gt;Knome&lt;/b&gt; Offers Whole-Genome Sequencing, Interpretation for $5K&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;Founder: George Church&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="color: #222222;"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;KnomeSelect, a targeted sequencing service that covers the exome, costs $24,500 for individuals. A comparative analysis of genomes includes a short list of suspect variants, genes and networks. Custom desktop software is provided for further analysis, including KnomeFinder for candidate variant discovery and KnomePathways for finding gene-gene interactions and gene networks. 
The company recently opened up its services to scientists interested in sequencing exomes or genomes of small numbers of humans as part of research studies.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;• &lt;b style="mso-bidi-font-weight: normal;"&gt;Personalis&lt;/b&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="line-height: 15.0pt; margin-bottom: .25in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;Founders: &lt;span style="color: #222222;"&gt;Stanford founders are Russ Altman, chair of the bioengineering department; Euan Ashley, director of the Stanford Center for Inherited Cardiovascular Disease; Atul Butte, chief of the division of systems medicine at the department of pediatrics; and Michael Snyder, chair of the genetics department and director of the Stanford Center for Genomics and Personalized Medicine. John West, the former CEO of Solexa, is the new firm's CEO.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;“&lt;span style="color: #222222;"&gt;Its core capability will be the medical interpretation of human genomes. Personalis expects to work closely with a variety of sequencing technology and service providers — including Illumina, Complete Genomics, and others.” &lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="color: #222222;"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;• &lt;b style="mso-bidi-font-weight: normal;"&gt;Omica&lt;/b&gt; is a new startup company. It has developed and published the VAAST system for annotating sequence variants. 
VAAST is a probabilistic search tool that identifies disease-causing variants in genome sequence data. It combines elements of existing amino acid substitution and aggregative approaches, which increases accuracy and makes it easy to use. The tool can score both coding and non-coding variants, and evaluate rare and common variants. The platform, to be used for clinical annotations of both whole genomes and more targeted data such as exomes or gene panels, is currently in beta testing with several undisclosed collaborators. Besides VAAST, which generates disease candidate lists, the Omica service will also include annotation tools that will provide additional information about the role of the genes. Users can submit a genome sequence, and the service overlays the clinical annotations on it. It also has an interface that can relate variants to diseases.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;• &lt;b style="mso-bidi-font-weight: normal;"&gt;GenomeQuest&lt;/b&gt; is a provider of cloud-based computing solutions for analysis of Next-Generation sequencing data. The GQ-Dx product analyzes and reports comprehensive genomic information about variations and changes in genes and proteins to improve disease treatment. 
The workflow can be used for Whole-Genome, Whole-Exome, and selected Gene Panels including:&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="margin-left: .5in;"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;- Automated transfer of raw data from sequencing machines&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="margin-left: .5in;"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;- Alignment of the reads against reference genomes&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="margin-left: .5in;"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;- Variant detection and annotation&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="margin-left: .5in;"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;- Mapping and documentation of variants against known inherited and somatic mutations&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="margin-left: .5in;"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;- Integration with other clinical data systems such as Electronic Health Records and therapy protocols to create a comprehensive patient diagnostic record&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;Designed for academic research laboratories, diagnostics labs, IVD manufacturers, and pharmaceutical companion diagnostic groups, GQ-Dx is already being used in clinical research. In collaboration with GenomeQuest, pathologists at Beth Israel Deaconess Medical Center, a teaching hospital of Harvard Medical School, are developing “clinical grade” annotation methods and databases for cancer diagnoses. 
GenomeQuest has also created a GeneTests-based diagnostic panel that generates a comprehensive report on disease susceptibility, diagnosis, and treatment on more than 2,000 disorders from a single, whole-genome sequence of a patient.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;• &lt;b style="mso-bidi-font-weight: normal;"&gt;Foundation Medicine&lt;/b&gt; has narrowed the focus even further. They provide diagnostic exome sequencing of 300 cancer related genes on FFPE tumor samples submitted by clinical pathologists. They sequence these 300 genes to very deep coverage (500X) to allow detection of rare somatic variants in heterogeneous tumor tissue. The selected gene set is intended to include only genes with directly disease related functions that impact cancer treatment decisions. The test is intended to replace many different single-gene diagnostic tests currently on the market.&lt;span style="mso-spacerun: yes;"&gt;&amp;nbsp; &lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="line-height: 16.0pt; margin-bottom: 10.0pt; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;b&gt;&lt;span style="color: #272727; font-family: Helvetica;"&gt;• &lt;/span&gt;&lt;span style="color: #272727;"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;23andMe&lt;/span&gt;&lt;/span&gt;&lt;/b&gt;&lt;span style="color: #272727;"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt; has started a pilot program that offers full exome sequencing for $999. 
While the company’s regular personal genome service uses Illumina genotyping arrays with around 1 million SNPs (single nucleotide polymorphisms), the exome service sequences around 50 million DNA bases at 80X coverage.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="line-height: 16.0pt; margin-bottom: 10.0pt; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;span style="color: #272727;"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;Customers will get the raw data, without any additional reports, so it will only be useful to people who actually know how to handle this raw genetic data. 23andMe plans to eventually add a limited set of tools and content that utilize exome sequence data.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="color: #272727;"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;23andMe is not the first company to offer whole-genome sequencing to consumers, but it is the first to do so at a sub-$1000 price point. 
For hardcore bioscientists who know their way around raw genetic data, this is as good a deal as you can currently get.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;!--EndFragment--&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4457216402399127579-3873097829931683630?l=nextgenseq.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nextgenseq.blogspot.com/feeds/3873097829931683630/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4457216402399127579&amp;postID=3873097829931683630' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/3873097829931683630'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/3873097829931683630'/><link rel='alternate' type='text/html' href='http://nextgenseq.blogspot.com/2011/10/exome-goldrush-in-clinical-sequencing.html' title='Exome: A Goldrush in Clinical Sequencing'/><author><name>Stuart Brown</name><uri>http://www.blogger.com/profile/14602560263535951430</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://3.bp.blogspot.com/_kxDtwcQ6hpg/SPvHPX70nBI/AAAAAAAAAAM/yhErXuQENQA/S220/browns02.jpg'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4457216402399127579.post-5836235249603331945</id><published>2011-09-29T16:12:00.000-04:00</published><updated>2011-09-29T16:12:05.017-04:00</updated><title type='text'>Foundation Medicine grabs for the low-hanging fruit of NGS cancer diagnostics</title><content type='html'>I was at the CHI&amp;nbsp;&lt;span class="Apple-style-span" style="font-family: Consolas; font-size: 15px;"&gt;APPLYING NEXT-GENERATION SEQUENCING conference in Providence RI, 
where I&lt;/span&gt;&amp;nbsp;heard an extremely interesting presentation from a new genomics company called Foundation Medicine. This company plans to offer a clinical diagnostic test based on very deep sequencing of all exons from about 300 cancer-related genes. They will sequence directly from pathologist's FFPE blocks using Illumina HiSeq to a depth of 500 to 1000X.&lt;br /&gt;&lt;br /&gt;Here is a recent poster they presented at ASCO, but the information at the CHI conference was updated and more in depth.&lt;br /&gt;&lt;a href="http://www.foundationmedicine.com/pdfs/ASCO%20Poster%202011%20handout%20-%20FINAL.pdf"&gt;ASCO poster&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Here is why I think this is very important. First, this test will include all genes that are currently being tested for any type of cancer (BRCA1&amp;amp;2, KRAS, BRAF, HER2, EGFR, etc), but it will cover all exons and offer greater diagnostic sensitivity for mutations present in low abundance in heterogeneous samples, which may suffer from mixed tumor and normal tissue, multiple clones, mixed aneuploidy, etc. It will likely also contain the majority of known pharmacogenomic genes. So this one test could put all the other providers of cancer-related genetic tests out of business.&lt;br /&gt;&lt;br /&gt;It is also very important that the test is highly targeted only at "actionable" genes.&amp;nbsp;Foundation Med. plans to deliver a report for each patient (in 14 days) that lists all mutations observed in the diagnostic genes, as well as some key items drawn from the literature, clinical trials, and a curated knowledge base about treatments relevant to those genes. In the presentation, COO Kevin Krenitsky said that they typically found 2-3 mutated genes per patient. 
This is an amount of data that the oncologist or pathologist can reasonably be expected to deal with, rather than the hundreds to thousands of mutated genes with questionable to zero clinical implications that will be produced by whole genome sequencing.&lt;br /&gt;&lt;br /&gt;Another interesting discovery reported by Foundation Med. was that in a small number of cases (perhaps 5%), they found mutations in genes that are associated with a different type of cancer. This suggests the use of a non-traditional drug, possibly in combination with other more typical therapies, as an individualized treatment for that one patient. There are currently about 30 drugs for which genetic information can aid in treatment decisions, but this is clearly an area of intense development. Foundation Med. can easily modify its test to include any relevant new genes. We are clearly heading to the point where every cancer patient will benefit from an individualized genomics workup.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4457216402399127579-5836235249603331945?l=nextgenseq.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nextgenseq.blogspot.com/feeds/5836235249603331945/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4457216402399127579&amp;postID=5836235249603331945' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/5836235249603331945'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/5836235249603331945'/><link rel='alternate' type='text/html' href='http://nextgenseq.blogspot.com/2011/09/foundation-medicine-grabs-for-low.html' title='Foundation Medicine grabs for the low-hanging fruit of NGS cancer 
diagnostics'/><author><name>Stuart Brown</name><uri>http://www.blogger.com/profile/14602560263535951430</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://3.bp.blogspot.com/_kxDtwcQ6hpg/SPvHPX70nBI/AAAAAAAAAAM/yhErXuQENQA/S220/browns02.jpg'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4457216402399127579.post-6583389421099316872</id><published>2011-07-18T16:22:00.000-04:00</published><updated>2011-07-18T16:22:01.951-04:00</updated><title type='text'>GWAS vs Exome Sequencing</title><content type='html'>I learned something interesting today about the SNP arrays used for GWAS. There has been a lot of discussion about the nature of mutations/alleles discovered by GWAS in terms of the "common disease: common variant" hypothesis. It is clear that SNP arrays are designed to cover common variants - alleles that are present in at least 2% of the human population (or at least of some population). Conversely, genome sequencing studies tend to focus on rare variants. In fact, a number of recent studies show that major diseases such as cancer and autism tend to be associated with novel, very severe mutations in coding regions of genes.&lt;br /&gt;&lt;br /&gt;Now this is the interesting part. We took a look at the intersection between the Illumina 2.5M SNP array and the regions targeted by the Agilent SureSelect exon enrichment kit. It turns out that only about 90K of the Illumina SNPs are in the exon regions. This matches up with Illumina's own annotation file showing that more than 80% of the SNPs on the array are intronic or intergenic. &amp;nbsp;My human genetics colleague suggests that the SNP array targets sequence variants (alleles) with small effects, while the exon sequencing strategy targets mutations with large effects. 
So we can't really replace the SNP array with exome sequencing; they are looking at completely different things.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4457216402399127579-6583389421099316872?l=nextgenseq.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nextgenseq.blogspot.com/feeds/6583389421099316872/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4457216402399127579&amp;postID=6583389421099316872' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/6583389421099316872'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/6583389421099316872'/><link rel='alternate' type='text/html' href='http://nextgenseq.blogspot.com/2011/07/gwas-vs-exome-sequencing.html' title='GWAS vs Exome Sequencing'/><author><name>Stuart Brown</name><uri>http://www.blogger.com/profile/14602560263535951430</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://3.bp.blogspot.com/_kxDtwcQ6hpg/SPvHPX70nBI/AAAAAAAAAAM/yhErXuQENQA/S220/browns02.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4457216402399127579.post-100489810657643322</id><published>2011-06-28T16:05:00.001-04:00</published><updated>2011-06-28T16:36:29.330-04:00</updated><title type='text'>The False Discovery of Mutations by Sequencing</title><content type='html'>&amp;nbsp;&amp;nbsp; &amp;nbsp; I am amazed by the success reported in recent papers finding mutations by Next-Gen Sequencing in rare genetic diseases and cancer. In our lab, the sequence data for SNPs is messy and difficult to interpret. 
The basic problem is that&amp;nbsp;NGS data, particularly Illumina data in our case, contains a moderate level of sequencing errors. We get somewhere between 0.5% and 1% errors in our sequence reads from the GAII and HiSeq machines. This is not bad for many practical purposes (ChIPseq and RNAseq experiments have no trouble with this data) and this error level "is within specified operating parameters" according to Illumina Tech support. The errors are not random: they occur much more frequently at the ends of long (100 bp) reads. Some types of errors are systematic in all Illumina sequencing (&lt;a href="http://onlinelibrary.wiley.com/doi/10.1111/j.1541-0420.2009.01353.x/full"&gt;A&amp;gt;T miscalls are most common&lt;/a&gt;), and other types of errors are common to a particular sample, run, or lane of sequence data. Also, when you are screening billions of bases, looking for mutations, rare overlaps of errors will occur.&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; So if sequence data contains errors, and the point of your experiment is to find mutations, then when you find a difference between your data and the reference genome (a variant), you had better make doubly sure that the difference is real. There is a lot of software designed to separate real mutations (SNPs) from the random sequence errors. The basic idea is to first filter out bad, low-quality bases using the built-in quality scores produced by the sequencer. Second, require that multiple reads show the same variant, and that the fraction of reads showing the variant makes sense in your experiment: 40-60% might be good for a heterozygous allele in a human germline sample, while 10% or less might make sense if you are screening for a rare variant in a sample from a mixed population of cells. 
&amp;nbsp;Also, it is usually wise to filter out all common SNPs in the dbSNP database - we assume that these are not cancer-causing, and they have a high likelihood of being present in healthy germline cells as well as tumor cells.&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; We have used the SNP calling tools in the Illumina CASAVA software, the &lt;a href="http://maq.sourceforge.net/maq-man.shtml"&gt;MAQ&lt;/a&gt; software package, similar tools in &lt;a href="http://samtools.sourceforge.net/"&gt;SAMtools&lt;/a&gt;, and recently the &lt;a href="http://www.broadinstitute.org/gsa/wiki/index.php/The_Genome_Analysis_Toolkit"&gt;GATK toolkit&lt;/a&gt;. In all cases, it is possible to tweak parameters to get a stringent set of predicted mutations, filtering out low-quality bases, low-frequency mutations, and SNPs that are near other types of genomic problems such as insertion/deletion sites, repetitive sequence, etc. Using their own tools, Illumina has published data showing a false positive detection rate of 2.89% (&lt;a href="http://www.illumina.com/truseq/truth_in_data/sbl_comparison/false_positive_and_false_negative_data/false_positive_rates.ilmn"&gt;Illumina FP Rate&lt;/a&gt;). &amp;nbsp;Under many experimental designs, validating 97% of your predicted mutations would be excellent.&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; Unfortunately, our medical scientists don't want predicted SNPs vs. an arbitrary reference genome. They want to find mutations in cancer cells vs. the normal cells (germline or wild type) of the same patient. This is where all the tools seem to fall apart. When we run the same SNP detection tools on two NGS samples, and then look for the mutations that are unique to the tumor vs the wild type (WT), we get a list of garbage, thousands of lines long. We get stupid positions with 21% variant allele detected in tumor and 19% variant in WT. 
Or we get positions where a variant at 80% allele frequency is not called as a SNP in WT because 2 out of 80 reads have a one-base deletion near that base. So the stringent settings on our SNP discovery software create &lt;b&gt;FALSE NEGATIVES&lt;/b&gt; where we miss real SNPs in the WT genome, which then show up as tumor-specific mutations in our SNP discovery pipeline.&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &lt;a href="http://www.nyuinformatics.org/people/staff/zuojian-tang"&gt;Zuojian Tang&lt;/a&gt; is creating a post-SNP data filter that imposes a sanity check on the data based on allele frequencies. We are trying out various parameters, but something like a minimum of 40% variant in the tumor and less than 5% variant in the WT narrows the list of tumor-specific mutations down to a manageable number that could be validated by PCR or &lt;a href="http://www.sequenom.com/home/products---services/genetic-analysis/massarray-analyzer-4/"&gt;Sequenom&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4457216402399127579-100489810657643322?l=nextgenseq.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nextgenseq.blogspot.com/feeds/100489810657643322/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4457216402399127579&amp;postID=100489810657643322' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/100489810657643322'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/100489810657643322'/><link rel='alternate' type='text/html' href='http://nextgenseq.blogspot.com/2011/06/false-discovery-of-mutations-by.html' title='The False Discovery of Mutations by Sequencing'/><author><name>Stuart 
Brown</name><uri>http://www.blogger.com/profile/14602560263535951430</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://3.bp.blogspot.com/_kxDtwcQ6hpg/SPvHPX70nBI/AAAAAAAAAAM/yhErXuQENQA/S220/browns02.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4457216402399127579.post-5415744483713166315</id><published>2011-06-06T15:36:00.001-04:00</published><updated>2011-06-06T15:38:46.606-04:00</updated><title type='text'>Involve Bioinformatics in design of every experiment... please</title><content type='html'>Two interesting projects came through our informatics group last week, both in the 'data drop' mode where the investigator asks for help to analyze data as it comes off of the sequencers. I have noted many times before that our informatics effort is much greater on the poorly designed and failed experiments. &lt;br /&gt;&lt;br /&gt;Experiment #1 was a seemingly standard SNP detection using exome sequences with 100 bp paired-end &amp;nbsp;reads on Illumina HiSeq (Agilent SureSelect capture) - the entire thing done by a private sequencing contractor. The contractor also supplied SNP calls using Illumina CASAVA software. Our job was simply to find overlaps between the SNP calls for various samples and controls, and to annotate the SNPs with genomic information (coding or non-coding, conservative mutations, biological pathways, etc). &amp;nbsp;However, we have an obsession with QC data, which the vendor was very reluctant to supply. It turns out that these sequencing reads have a 1.5% error rate, while our internal sequencing lab generates 0.5% error. We also see 10K novel SNPs in each sample with only minimal overlap across samples (a red flag for me). More QC data is extracted from the vendor, and now we see a steep increase in error at the ends of reads. 
So we wish to trim all reads down by 10-25% and re-call SNPs - extract more files from vendor 3x (Illumina requires a LOT of runtime and intermediate files in order to run CASAVA for SNP calling).&lt;br /&gt;&lt;br /&gt;Meanwhile, Experiment #2 is an RNAseq project where the investigator is interested in alternative splicing. We analyzed one earlier data set with 50bp reads with only moderate success. It seems that very deep coverage is needed to get valid data for alt-splicing, especially when levels of a poorly expressed isoform are suspected to change by a small amount due to biological treatment. The investigator saw some published results suggesting that paired-end RNAseq data would provide more information about splicing isoforms. So, WITHOUT a bioinformatics consult, they sent an existing sample (created for 50bp single end sequencing) to the lab for 100 bp paired-end sequencing. This data came out of our pipeline with more than 20% error and a strange mix of incorrectly oriented read pairs (facing outward rather than inward). After a few days of head scratching and escalating levels of Illumina bioinformatics tech support, we have an explanation. A 225 bp library fragment contains 130 bp of primers and adapters. Thus the insert has an average size of about 95 bp. Some are shorter! &amp;nbsp;Thus, our 100 cycle reads go off the far end of most sequences, adding 5 or more bases of adapter sequence where the alignment software is expecting genomic sequence. In addition, the paired ends overlap more than 100% - so the start of one read is inside the end of the other. Thus they map in the opposite orientation, with an insert size of 5-10 bp. Our best effort to analyze this data will involve chopping all reads back to 36 bp and repeating the paired-end analysis. So that was 3 days of bioinformatics analysis time not so well spent on forensic QC. &lt;br /&gt;&lt;br /&gt;Now we are looking back to Experiment #1 and wondering about insert sizes in that library. 
What if that library's insert size was about 110 or 120 bp (perhaps with a sizeable tail of much smaller fragments), and a fraction of the reads also ran off into the adapter, adding mismatched bases at the ends of alignments, and thus jacking up the overall error rate?&lt;br /&gt;&lt;br /&gt;Two conclusions: 1) talk to bioinformatics BEFORE you build your sequencing libraries&lt;br /&gt;2) if you want something done right, do it yourself.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4457216402399127579-5415744483713166315?l=nextgenseq.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nextgenseq.blogspot.com/feeds/5415744483713166315/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4457216402399127579&amp;postID=5415744483713166315' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/5415744483713166315'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/5415744483713166315'/><link rel='alternate' type='text/html' href='http://nextgenseq.blogspot.com/2011/06/involve-bioinformatics-in-design-of.html' title='Involve Bioinformatics in design of every experiment... 
please'/><author><name>Stuart Brown</name><uri>http://www.blogger.com/profile/14602560263535951430</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://3.bp.blogspot.com/_kxDtwcQ6hpg/SPvHPX70nBI/AAAAAAAAAAM/yhErXuQENQA/S220/browns02.jpg'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4457216402399127579.post-7123260517996370976</id><published>2011-05-19T11:26:00.002-04:00</published><updated>2011-05-19T11:45:24.241-04:00</updated><title type='text'>$10K bioinformatics on thousand dollar genome</title><content type='html'>It is now possible to get 100x coverage of the exome sequence for a cancer sample (or any other type of human genomic sample) on one lane of an Illumina HiSeq machine. With the SureSelect 50 Mb exome kit, it still costs quite a bit more than one thousand dollars to get this data, but it is getting close. At maximum yield, it might currently be possible to multiplex 4 samples into a single lane and still get 100x coverage of each. This will certainly be true when planned upgrades to the HiSeq machine are available. &lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Illumina provides some nice software (called CASAVA) that is typically run at the default settings by Core labs and sequencing outsourcing companies. This software gives high-quality genome alignments and pretty good SNP calls - useful for many purposes. However, real-world research needs are often not satisfied with default automated bioinformatics analysis. Narrowing down hundreds of thousands of SNP calls to the few real disease-related mutations is difficult hands-on work for skilled bioinformaticians. Today in my lab group, we are fighting with false negatives: SNPs that were present but not called in the germline sample, leading to false identification of mutations unique to the tumor. 
It looks like we will have to re-run the SNP detection software many times with small changes in various parameters to optimize specificity vs. sensitivity in each sample. Investigators may sub-contract this type of work to the lab that does the sequencing, they may have skilled bioinformaticians in their lab group, or they may hire bioinformatics consultants. In any case, $1K of sequence data may cost more than $10K for analysis. &lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4457216402399127579-7123260517996370976?l=nextgenseq.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nextgenseq.blogspot.com/feeds/7123260517996370976/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4457216402399127579&amp;postID=7123260517996370976' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/7123260517996370976'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/7123260517996370976'/><link rel='alternate' type='text/html' href='http://nextgenseq.blogspot.com/2011/05/10k-bioinformatics-on-thousand-dollar.html' title='$10K bioinformatics on thousand dollar genome'/><author><name>Stuart Brown</name><uri>http://www.blogger.com/profile/14602560263535951430</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://3.bp.blogspot.com/_kxDtwcQ6hpg/SPvHPX70nBI/AAAAAAAAAAM/yhErXuQENQA/S220/browns02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4457216402399127579.post-754014510549468881</id><published>2011-02-15T13:24:00.007-05:00</published><updated>2011-02-15T15:05:50.393-05:00</updated><title type='text'>Archiving NGS 
data</title><content type='html'>Anyone who has worked with NextGen sequence data quickly gains an appreciation for the difficulties associated with long-term data storage. The current 'state of the art,' at least for Illumina machines, involves saving some fairly raw data files such as fastq text to the NCBI Sequence Read Archive (SRA).  &lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.ncbi.nlm.nih.gov/Traces/sra/static/SRA_Submission_Guidelines.pdf"&gt;SRA_Submission_Guidelines.pdf&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Our GAIIx is producing about 30 million reads per lane, which gives files of 8-10 GB (72 cycles) per lane in either qseq (completely unfiltered) or fastq (quality scored) format. If we max out two runs per week, that is about 140 GB of raw sequence data per Illumina machine per week. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;There has been some recent discussion about the possibility of phasing out the SRA at NCBI. &lt;/div&gt;&lt;div&gt;[see this &lt;a href="http://phylogenomics.blogspot.com/2011/02/though-i-generally-love-ncbi.html?showComment=1297359206321#c7246439301219289057"&gt;post&lt;/a&gt; which claims to be a memo from NCBI director David Lipman: "&lt;span class="Apple-style-span" style="font-family: 'Trebuchet MS', Verdana, Arial, sans-serif; font-size: 13px; color: rgb(51, 51, 51); line-height: 18px; "&gt;The Sequence Read Archive (SRA) will also be phased out over the next 12 months."&lt;/span&gt;]&lt;/div&gt;&lt;div&gt;If cost cutting is truly necessary for our national biomedical research infrastructure, I can see why the raw SRA data, which is growing at an awkwardly rapid rate, might be judged to have less value than the highly used databases of GenBank non-redundant nucleotide, GEO, etc. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I think that it is interesting to turn this discussion around and ask why are we archiving all of this raw sequence data? 
The trivial argument is that "journals require open access to raw data as a condition of publication."  But that argument ignores the more interesting question: What is the 'raw data' for a sequencing project? No one is loading Illumina (or SOLiD or 454) image data into public archives. The impracticality of saving multiple terabytes of image data for each run made that approach moot a couple of years ago. We are saving raw qseq or fastq files right now because our methods for basecalling and SNP calling (and indel/translocation/copy number calling) are imprecise. I have seen data analysts go back into primary sequence reads for a single sample and find a SNP that was not called because a few reads had below-threshold quality scores. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;If we consider the actual "useful" data content of an NGS run on a single sample, the landscape looks quite different. ChIP-seq is our most common NGS application. The useful data from a ChIP-seq run is actually just a set of genome positions where read starts are mapped. At most, this is 20-30 million positions. In actuality, 30% of reads are not mapped, and another 10-50% are duplicates (multiple reads that map to the exact same position), so the final data set might be compressed to about 10 million genomic loci with a read count at each spot. After sorting and indexing, this information could be efficiently stored in a very compact file. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;RNA sequencing is becoming increasingly popular. Our clients are typically not interested in the sequence data itself, only in gene expression counts - essentially the same data as produced by a microarray. However, there are some cool new applications that look at alternative splicing, so we may have to keep the actual sequence reads on hand for a while longer. 
&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Human (and mouse) SNP/indel/CNV detection is another popular NGS application. We are only really interested in the variants. However, SNP calling software requires both the numbers of reads with reference vs. variant bases and the quality scores for each basecall. Some software also uses context-dependent quality metrics, such as distance from other SNPs, distance from indels, etc. Given the highly diverse collection of existing SNP detection software, and the likelihood of new software development, it seems impossible to compress this class of data to a set of variant calls and discard the raw reads. This is very unfortunate, since typical variant detection projects use anything from 20x to 50x coverage of the genome. So we are storing 150 GB of raw sequence data in order to track a few million bytes worth of actual variation in the genome of each research sample. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Other applications, such as de novo genome sequencing of new organisms, or metagenomic sequencing of environmental or medical samples, will not be easily compressed. Fortunately, these data are currently archived in places other than the SRA. 
&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4457216402399127579-754014510549468881?l=nextgenseq.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nextgenseq.blogspot.com/feeds/754014510549468881/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4457216402399127579&amp;postID=754014510549468881' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/754014510549468881'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/754014510549468881'/><link rel='alternate' type='text/html' href='http://nextgenseq.blogspot.com/2011/02/archiving-ngs-data.html' title='Archiving NGS data'/><author><name>Stuart Brown</name><uri>http://www.blogger.com/profile/14602560263535951430</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://3.bp.blogspot.com/_kxDtwcQ6hpg/SPvHPX70nBI/AAAAAAAAAAM/yhErXuQENQA/S220/browns02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4457216402399127579.post-5295184608733511882</id><published>2011-01-24T11:24:00.004-05:00</published><updated>2011-01-24T12:08:23.057-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sequencing'/><category scheme='http://www.blogger.com/atom/ns#' term='clinical genetics'/><title type='text'>Genetic Disease Diagnostics with NGS</title><content type='html'>A couple of recent papers demonstrate a significant opportunity for the use of NextGen Sequencing in the diagnosis of genetic disease. 
&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif; font-size: 14px; color: rgb(34, 34, 34); line-height: 18px; "&gt;Dennis Lo et al., at &lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Lucida Grande', arial, helvetica, sans-serif; font-size: 13px; color: rgb(51, 51, 51); line-height: 16px; "&gt;The Chinese University of Hong Kong, have published results for an NGS fetal genetic diagnostic test based on recovery of fragments of fetal DNA from the mother's blood plasma. Preliminary results show that complete coverage of the fetal diploid genome is possible &lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Lucida Grande', arial, helvetica, sans-serif; font-size: 13px; color: rgb(51, 51, 51); line-height: 16px; "&gt;[&lt;a href="http://stm.sciencemag.org/content/2/61/61ra91.abstract"&gt;Science Translational Medicine&lt;/a&gt;] at a resolution that allows for differentiation of heterozygous vs. homozygous mutations in disease genes; and also that aneuploidy, such as trisomy 21, can be detected with high specificity and sensitivity [&lt;a href="http://www.bmj.com/content/342/bmj.c7401.full"&gt;British Medical Journal&lt;/a&gt;]. The key benefit of this approach is that it can be done non-invasively from a simple blood draw from the mother, so it avoids the relatively high incidence of pregnancy complications created by amniocentesis or chorionic villus sampling procedures. 
&lt;/span&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Lucida Grande', arial, helvetica, sans-serif; font-size: 13px; color: rgb(51, 51, 51); line-height: 16px; "&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Lucida Grande', arial, helvetica, sans-serif; font-size: 13px; color: rgb(51, 51, 51); line-height: 16px; "&gt;Meanwhile, the lab of Stephen Kingsmore at the US &lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif; font-size: 14px; color: rgb(34, 34, 34); line-height: 18px; "&gt;National Center for Genome Resources reported results of a targeted sequencing carrier screen for a total of 448 severe (rare) recessive genetic diseases [&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif; font-size: 14px; color: rgb(34, 34, 34); line-height: 18px; "&gt;&lt;em style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; padding-top: 0px; padding-right: 0px; padding-bottom: 0px; padding-left: 0px; border-top-width: 0px; border-right-width: 0px; border-bottom-width: 0px; border-left-width: 0px; border-style: initial; border-color: initial; font-weight: inherit; font-style: italic; font-size: 14px; font-family: inherit; vertical-align: baseline; "&gt;&lt;a href="http://stm.sciencemag.org/content/3/65/65ra4.abstract"&gt;Science Translational Medicine&lt;/a&gt;&lt;/em&gt;&lt;span style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; padding-top: 0px; padding-right: 0px; padding-bottom: 0px; padding-left: 0px; border-top-width: 0px; border-right-width: 0px; border-bottom-width: 0px; border-left-width: 0px; border-style: initial; border-color: initial; font-weight: inherit; font-size: 14px; font-family: inherit; vertical-align: baseline; "&gt;]&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif; 
font-size: 14px; color: rgb(34, 34, 34); line-height: 18px; "&gt;. This work is particularly significant because the screen is designed to work in multiplex, allowing for a potential total cost per patient below $500 (less than $1 per disease screened). While each disease is rare in isolation, the combined screen shows an average of 2.8 mutations per individual tested in the proof-of-concept phase of the study. &lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif; font-size: 14px; color: rgb(34, 34, 34); line-height: 18px; "&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif; font-size: 14px; color: rgb(34, 34, 34); line-height: 18px; "&gt;Taken together, these advances suggest that routine clinical applications of NGS will soon be practical, attractive, and economically feasible for large numbers of healthy people (pregnant women and marriage-minded couples). This is great news for NGS equipment vendors, and also suggests a software engineering opportunity for the development of much more robust bioinformatics pipelines for processing this data and including it in electronic medical records. At the same time, I am worried that the lab folks may be progressing much more rapidly than the thinking in the ELSI (ethical, legal, and social implications) community. What kind of databases will be created when every pregnancy and every marriage license is associated with gigabyte files of deep sequencing data? This issue is all the more problematic because disease carrier testing and Down syndrome screening are already so widely accepted. Changing prenatal tests to use sequencing in order to reduce complications in pregnancy, and adding pre-conception tests for diseases that were previously thought to be too rare to merit widespread screening, are non-controversial medical advances. 
The downside might come from the unintentional discovery of other genetic information, the availability to law enforcement and other organizations of large files of genetic information on every person, etc. &lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif; font-size: 14px; color: rgb(34, 34, 34); line-height: 18px; "&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4457216402399127579-5295184608733511882?l=nextgenseq.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nextgenseq.blogspot.com/feeds/5295184608733511882/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4457216402399127579&amp;postID=5295184608733511882' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/5295184608733511882'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/5295184608733511882'/><link rel='alternate' type='text/html' href='http://nextgenseq.blogspot.com/2011/01/genetic-disease-diagnostics-with-ngs.html' title='Genetic Disease Diagnostics with NGS'/><author><name>Stuart Brown</name><uri>http://www.blogger.com/profile/14602560263535951430</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://3.bp.blogspot.com/_kxDtwcQ6hpg/SPvHPX70nBI/AAAAAAAAAAM/yhErXuQENQA/S220/browns02.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4457216402399127579.post-1418837235759653767</id><published>2010-12-06T10:37:00.004-05:00</published><updated>2010-12-06T10:54:16.266-05:00</updated><title type='text'>"NGS without bioinformatics 
expertise"</title><content type='html'>This showed up in my inbox today:&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"   style="  ;font-family:arial, sans-serif;font-size:9.16667px;"&gt;&lt;a href="https://mail.nyumc.org/owa/redir.aspx?C=86086e7790ec4e48bf9f91cfb805ee4d&amp;amp;URL=http%3a%2f%2fwww.google.com%2furl%3fsa%3dX%26q%3dhttp%3a%2f%2fwww.pharmalive.com%2fNews%2findex.cfm%253Farticleid%253D747369%2526categoryid%253D20%26ct%3dga%26cad%3dCAEQAhgAIAAoATAAOABA-fHo5wRIAVAAWABiAmVu%26cd%3daOU5DXGvEXg%26usg%3dAFQjCNHDn5OoE9TrmJILDKuIbqf_0T_kJg" target="_blank" style="color: rgb(17, 17, 204); "&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;Integromics launches SeqSolve, a Next Generation Sequencing functional &lt;/span&gt;&lt;b&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;...&lt;/span&gt;&lt;/b&gt;&lt;/a&gt;&lt;br /&gt;&lt;span style="font-size:-1;"&gt;&lt;a href="https://mail.nyumc.org/owa/UrlBlockedError.aspx" target="_blank" style="text-decoration: none; color: rgb(119, 119, 119); "&gt;PharmaLive.com (press release)&lt;/a&gt;&lt;br /&gt;SeqSolve is the first NGS analysis software on the market specifically developed for data interpretation without requiring &lt;b&gt;bioinformatics&lt;/b&gt; expertise. &lt;b&gt;...&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:arial, sans-serif;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:arial, sans-serif;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;Might be a bit presumptuous, but I would guess that a lot of scientists would like to have someone with "bioinformatics expertise" do data analysis for their Next-Gen Sequencing projects. 
But hey, if someone wants to spend $10-$50K on sequencing, but doesn't want an expert to look at the data, good luck with that. Our "Sequencing Informatics Group" at NYU is now taking outside consulting work for all types of NGS bioinformatics projects. It is sort of like fixing your Porsche - you can go to the Foreign Car specialist mechanic, or you can go to Pep Boys and buy some spark plugs and a wrench kit. &lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:arial, sans-serif;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:arial, sans-serif;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;We (Ross Smith) have been developing our own visualization toolkit for NGS. The latest version allows us to integrate RNAseq and ChIPseq with RefSeq or other annotation. The net result is much more accurate assignment of TF or histone modification sites to genes, and the ability to clearly see which of multiple TSS are actually being used in a particular sample/cell type. It is very beautiful. 
&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:arial, sans-serif;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:arial, sans-serif;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 400px; height: 207px;" src="http://2.bp.blogspot.com/_kxDtwcQ6hpg/TP0EatRgn5I/AAAAAAAAABM/kan7Y6U4Kho/s400/CA9-2.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5547595172866465682" /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:arial, sans-serif;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4457216402399127579-1418837235759653767?l=nextgenseq.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nextgenseq.blogspot.com/feeds/1418837235759653767/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4457216402399127579&amp;postID=1418837235759653767' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/1418837235759653767'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/1418837235759653767'/><link rel='alternate' type='text/html' href='http://nextgenseq.blogspot.com/2010/12/this-showed-up-in-my-inbox-today.html' title='&quot;NGS without bioinformatics expertise&quot;'/><author><name>Stuart 
Brown</name><uri>http://www.blogger.com/profile/14602560263535951430</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://3.bp.blogspot.com/_kxDtwcQ6hpg/SPvHPX70nBI/AAAAAAAAAAM/yhErXuQENQA/S220/browns02.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_kxDtwcQ6hpg/TP0EatRgn5I/AAAAAAAAABM/kan7Y6U4Kho/s72-c/CA9-2.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4457216402399127579.post-4526333115589734616</id><published>2010-09-25T17:43:00.002-04:00</published><updated>2010-09-25T17:58:53.230-04:00</updated><title type='text'>ChIPseq QC</title><content type='html'>I am speaking at the CHI Next-Gen Sequencing conf in Providence 9.26.2010 (Sunday short course).  My topic is going to be about the role of bioinformatics in QC for NG sequencing, with examples from ChIPseq, where I have the most experience.&lt;br /&gt;&lt;br /&gt;My main point is that the informatics team works hardest on experiments that produce poor data - or where the data contradict the investigator's expectations.  When the experiment is beautiful, then you can use your automated (or semi-automated) pipeline, and hand over the analyzed data with a standard report. For a transcription factor type ChIPseq, the standard result is a set of peaks with p-value and fold change vs. an input DNA, annotated by distance to the nearest gene's Transcription Start Site. If pressed, we can deliver this about 2 days after the sequencing run is complete.  For an epigenomics type ChIPseq (histone methylation, acetylation, etc) we deliver both peaks vs input DNA and some type of fold-change for each peak comparing one biological condition vs. another.&lt;br /&gt;&lt;br /&gt;However, we spend a lot more time squabbling about runs with high PCR duplication, weird artifacts, low yield, peaks in the input DNA lane, etc. 
To deal with this, we have been developing a variety of tools to quantify overall data quality in a ChIPseq run. We are looking at the overall clustering of mapped reads on the reference genome (average spacing of adjacent/overlapping reads), as well as coverage at various depths. Some of these metrics make interesting graphs, but we have not completely pinned down their predictive power for understanding the data.&lt;br /&gt;&lt;br /&gt;We have recently been playing with selecting sets of genes based on external data  such as gene expression values from microarray or RNAseq experiments,  and looking at the aggregate profile of reads mapped near the TSS of groups of genes that are upregulated, downregulated, unchanged, etc. By combining reads for a bunch of genes, we get smoother curves and you can actually say fairly clearly that upregulated genes have (or do NOT have) a change in histone methylation near the TSS as compared with downregulated or unchanged genes.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4457216402399127579-4526333115589734616?l=nextgenseq.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nextgenseq.blogspot.com/feeds/4526333115589734616/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4457216402399127579&amp;postID=4526333115589734616' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/4526333115589734616'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/4526333115589734616'/><link rel='alternate' type='text/html' href='http://nextgenseq.blogspot.com/2010/09/chipseq-qc.html' title='ChIPseq QC'/><author><name>Stuart 
Brown</name><uri>http://www.blogger.com/profile/14602560263535951430</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://3.bp.blogspot.com/_kxDtwcQ6hpg/SPvHPX70nBI/AAAAAAAAAAM/yhErXuQENQA/S220/browns02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4457216402399127579.post-8126836150212274538</id><published>2010-03-29T15:04:00.003-04:00</published><updated>2010-03-29T15:10:56.182-04:00</updated><title type='text'>Correlation does not equal causation</title><content type='html'>I am not a biostatistician, but I can play one on Blogger.  Attention science news writers:  "Correlation does not equal causation."  Can we institute a simple electroshock penalty for those who cite such fraudulent statistics? &lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Check this out:&lt;/div&gt;&lt;div&gt;"&lt;span class="Apple-style-span" style="font-family: arial, helvetica, sans; font-size: 24px; line-height: 36px; "&gt;new study finds that older women who use multivitamins may be more likely than non-users to develop breast cancer"&lt;/span&gt;&lt;/div&gt;&lt;div&gt;http://www.reuters.com/article/idUSTRE62S4F520100329&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Is there any chance of a sampling bias in this study?  Any chance at all that the population of elderly women who take multivitamins may be different in any health parameters from the population that finds no need for such supplements?   Furthermore, is there any chance that the study subjects taking the vitamins are more likely to be examined more frequently and therefore more likely to have cancer detected? 
&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This reminds me of the brilliant study (cited from the grocery store tabloids by my sister-in-law) that diet soda makes you fat, because a study found that people who drink diet soda were more likely to be overweight than those who do not drink it. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4457216402399127579-8126836150212274538?l=nextgenseq.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nextgenseq.blogspot.com/feeds/8126836150212274538/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4457216402399127579&amp;postID=8126836150212274538' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/8126836150212274538'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/8126836150212274538'/><link rel='alternate' type='text/html' href='http://nextgenseq.blogspot.com/2010/03/correlation-does-not-equal-causation.html' title='Correlation does not equal causation'/><author><name>Stuart Brown</name><uri>http://www.blogger.com/profile/14602560263535951430</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://3.bp.blogspot.com/_kxDtwcQ6hpg/SPvHPX70nBI/AAAAAAAAAAM/yhErXuQENQA/S220/browns02.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4457216402399127579.post-5642117203845065841</id><published>2010-03-22T22:40:00.003-04:00</published><updated>2010-03-22T22:50:26.012-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='consumer genetic tests'/><title 
type='text'>Esther Dyson disrespects geneticists</title><content type='html'>&lt;span class="Apple-style-span" style="font-family: georgia, 'times new roman', times, serif; font-size: 10px; color: rgb(51, 51, 51); line-height: 15px; "&gt;&lt;div class="columnGroup  first" style="width: auto !important; margin-bottom: 12px; clear: both; margin-right: 7px; margin-left: 10px; "&gt;&lt;div class="articleBody" style="margin-top: 1.5em; margin-bottom: 1.7em; "&gt;&lt;p style="margin-top: 0px; margin-right: 0px; margin-bottom: 1em; margin-left: 0px; font-size: 1.5em; line-height: 1.467em; color: rgb(0, 0, 0); "&gt;I was reading the NY Times on Sunday (3/21/2010). I do that. In the Business section there was an article about the relatively poor sales of the direct-to-consumer genetic tests offered by the startup company 23andMe. Other competitors in this market have done even worse. Several scientists interviewed in the article said basically that there is very little predictive medical value to these SNP profile tests. Then they interviewed Esther Dyson, who is apparently on the board of directors of 23andMe. She provided this clunker of a quote:&lt;/p&gt;&lt;p style="margin-top: 0px; margin-right: 0px; margin-bottom: 1em; margin-left: 0px; font-size: 1.5em; line-height: 1.467em; color: rgb(0, 0, 0); "&gt; Ms. Dyson called it “appallingly paternalistic,” to think consumers could not interpret genetic information without help of a doctor. “People can understand statistics about baseball,” she said, “and I think they ought to understand statistics about genetics.”&lt;/p&gt;&lt;p style="margin-top: 0px; margin-right: 0px; margin-bottom: 1em; margin-left: 0px; font-size: 1.5em; line-height: 1.467em; color: rgb(0, 0, 0); "&gt;Does this not trivialize all of the work of medical geneticists, biostatisticians, and bioinformaticians? Is our work really no more challenging than interpreting baseball statistics? Gee, thanks, Ms. Dyson. 
&lt;/p&gt;&lt;p style="margin-top: 0px; margin-right: 0px; margin-bottom: 1em; margin-left: 0px; font-size: 1.5em; line-height: 1.467em; color: rgb(0, 0, 0); "&gt;&lt;br /&gt;&lt;/p&gt;&lt;div&gt;&lt;span class="Apple-style-span"   style="font-size:130%;color:#000000;"&gt;&lt;span class="Apple-style-span" style="font-size: 15px; line-height: 22px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;nyt_correction_bottom&gt;&lt;div class="articleCorrection" style="margin-bottom: 2.8em; "&gt;&lt;/div&gt;&lt;/nyt_correction_bottom&gt;&lt;nyt_update_bottom&gt;&lt;/nyt_update_bottom&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="columnGroup  " style="width: auto !important; margin-bottom: 12px; clear: both; margin-right: 7px; margin-left: 10px; "&gt;&lt;div class="articleFooter"&gt;&lt;div class="articleMeta"&gt;&lt;div class="opposingFloatControl wrap" style="display: block; "&gt;&lt;div class="element1" style="float: left; "&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4457216402399127579-5642117203845065841?l=nextgenseq.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nextgenseq.blogspot.com/feeds/5642117203845065841/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4457216402399127579&amp;postID=5642117203845065841' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/5642117203845065841'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/5642117203845065841'/><link rel='alternate' type='text/html' href='http://nextgenseq.blogspot.com/2010/03/esther-dyson-disrespects-geneticists.html' title='Esther Dyson disrespects 
geneticists'/><author><name>Stuart Brown</name><uri>http://www.blogger.com/profile/14602560263535951430</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://3.bp.blogspot.com/_kxDtwcQ6hpg/SPvHPX70nBI/AAAAAAAAAAM/yhErXuQENQA/S220/browns02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4457216402399127579.post-3526041440052693528</id><published>2009-05-08T08:45:00.003-04:00</published><updated>2009-05-08T11:08:27.506-04:00</updated><title type='text'>Targeted Resequencing</title><content type='html'>Targeted &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_0"&gt;Resequencing&lt;/span&gt; is one area of the DNA sequencing landscape that has not yet been revolutionized by Next-Gen technologies.&lt;br /&gt;&lt;br /&gt;Targeted &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_1"&gt;resequencing&lt;/span&gt; typically investigates a few genes (or a few dozen) across large populations. The largest portion of the effort involves lots of &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_2"&gt;PCR&lt;/span&gt; to collect all the &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_3"&gt;exons&lt;/span&gt; — or in some projects entire gene regions, then  sequencing  each amplification product, while keeping track of which &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_4"&gt;PCR&lt;/span&gt; product comes from which individual. 
Even a small project - 10 genes, with 10 &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_5"&gt;exons&lt;/span&gt; each, on 100 individuals means 10,000 &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_6"&gt;PCR&lt;/span&gt; reactions, and 10,000 sequencing reactions (while keeping accurate track of 10,000 different DNA fragments and avoiding cross-contamination).&lt;br /&gt;&lt;br /&gt;The Next-Gen approach would amplify the genomic regions in larger chunks, combine all of the chunks from one individual together, then run the library prep protocol (fragment, attach linkers, etc).  So how does this play out in reality?&lt;br /&gt;&lt;br /&gt;I read a paper in Genome Biology yesterday (&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_7"&gt;Harismendy&lt;/span&gt; &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_8"&gt;et&lt;/span&gt; &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_9"&gt;al&lt;/span&gt; &lt;a href="http://genomebiology.com/2009/10/3/R32"&gt;http://genomebiology.com/2009/10/3/R32&lt;/a&gt;) about targeted sequencing. They looked at six genes, which were covered by 28 large  &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_10"&gt;PCR&lt;/span&gt; &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_11"&gt;amplicons&lt;/span&gt; (all &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_12"&gt;exons&lt;/span&gt; plus some &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_13"&gt;introns&lt;/span&gt;) which ranged in size from 3 to 14 &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_14"&gt;kb&lt;/span&gt;, for a total of 266 &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_15"&gt;kb&lt;/span&gt; of genomic DNA. 
These &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_16"&gt;PCR&lt;/span&gt; products were then combined, and used in the sample prep protocols for 454, &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_17"&gt;ABI&lt;/span&gt; SOLiD, and &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_18"&gt;Illumina&lt;/span&gt; GA sequencing.  The same genes also were sequenced by standard Sanger methods using 273 short &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_19"&gt;PCR&lt;/span&gt; reactions (88 &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_20"&gt;kb&lt;/span&gt;).&lt;br /&gt;&lt;br /&gt;Overall, the &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_21"&gt;NG&lt;/span&gt; seq methods showed distinct bias favoring the ends of &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_22"&gt;PCR&lt;/span&gt; products, and required very high coverage (34-fold, 110-fold and 101-fold for Roche 454, &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_23"&gt;Illumina&lt;/span&gt; GA, and &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_24"&gt;ABI&lt;/span&gt; &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_25"&gt;SOLiD&lt;/span&gt;, respectively) to achieve a 10% false positive rate - false negative rates were much lower.&lt;br /&gt;&lt;br /&gt;Let's talk about costs.  Sanger sequencing costs from $3-10 per sample. I've got an Internet offer here for $4 per reaction, so let's use that for this study:&lt;br /&gt;&lt;br /&gt;Sanger:   $4 x 273 =  $1092 per individual&lt;br /&gt;&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_26"&gt;Illumina&lt;/span&gt; is about $1000 per sample plus about $300 per sample for the library prep kit.&lt;br /&gt;&lt;br /&gt;So I think they are about the same.&lt;br /&gt;&lt;br /&gt;However, the Next Gen methods come out far ahead if you multiplex a group of individuals together in the same sequencing reaction. 
This is not possible with Sanger methods since the sequence is read from the average of a large number of molecules. Then the question becomes how deep can you multiplex while still producing enough reads from each individual research subject to achieve the depth of coverage needed?  Our &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_27"&gt;Illumina&lt;/span&gt; &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_28"&gt;GAII&lt;/span&gt; currently produces about 2 million (usable) 35 &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_29"&gt;bp&lt;/span&gt; reads per lane, but we are &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_30"&gt;ramping&lt;/span&gt; up toward 5 million 50 &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_31"&gt;bp&lt;/span&gt; reads with the latest upgrades.   &lt;br /&gt;&lt;br /&gt;2 M  X  35 &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_32"&gt;bp&lt;/span&gt; = 70 M bases&lt;br /&gt;5 M  X  50 &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_33"&gt;bp&lt;/span&gt; = 250 M bases&lt;br /&gt;&lt;br /&gt;So for 250 &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_34"&gt;kb&lt;/span&gt; X  100x coverage = 25 M &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_35"&gt;bp&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;So it looks like the current generation of &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_36"&gt;NG&lt;/span&gt; machines does have a cost advantage over Sanger methods if you include 8, 10, or 12 X  multiplexing. Improved accuracy and reduced sampling bias (sample prep methods) could bring down the coverage requirements and increase the advantage of &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_37"&gt;NG&lt;/span&gt; methods.&lt;br /&gt;&lt;br /&gt;I'd really like to hear some other opinions about this issue. 
We are writing several grant proposals for projects like these and I need some convincing arguments.&lt;br /&gt;&lt;br /&gt;—Stuart&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4457216402399127579-3526041440052693528?l=nextgenseq.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nextgenseq.blogspot.com/feeds/3526041440052693528/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4457216402399127579&amp;postID=3526041440052693528' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/3526041440052693528'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/3526041440052693528'/><link rel='alternate' type='text/html' href='http://nextgenseq.blogspot.com/2009/05/targeted-resequencing.html' title='Targeted Resequencing'/><author><name>Stuart Brown</name><uri>http://www.blogger.com/profile/14602560263535951430</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://3.bp.blogspot.com/_kxDtwcQ6hpg/SPvHPX70nBI/AAAAAAAAAAM/yhErXuQENQA/S220/browns02.jpg'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4457216402399127579.post-465352133144764724</id><published>2008-11-19T10:04:00.004-05:00</published><updated>2008-11-20T11:00:32.407-05:00</updated><title type='text'>Metagenomics of the effects of antibiotics on the human gut</title><content type='html'>&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;a href="http://biology.plosjournals.org/perlserv/?request=get-document&amp;amp;doi=10.1371%2Fjournal.pbio.0060280&amp;amp;ct=1&amp;amp;SESSID=dd86d748dd6a153ff711b9ea44203c7f"&gt;The Pervasive Effects of an Antibiotic on 
the Human Gut Microbiota, as Revealed by Deep 16S rRNA Sequencing&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Dethlefsen L, Huse S, Sogin ML, Relman DA&lt;br /&gt;PLoS Biology Vol. 6, No. 11, e280 doi:10.1371/journal.pbio.0060280&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;A paper in PLOS Biology from the Relman lab investigates the effect of a treatment with the antibiotic ciprofloxacin on the bacteria in the intestine. They collected over 7,000 full-length 16S rDNA sequences (1100-1400 bp) by Sanger sequencing and over 900,000 reads (~250 bp) from 454 sequencing of the V3 and the V6 regions. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;There are many important results in this paper, but it is particularly relevant that 454 sequencing reveals more taxonomic variation with greater stability than traditional sequencing. In my own work, I have found that sequence variants that occur only once in the experiment cannot be used to differentiate samples. Deep sequencing reveals more taxa, and also reduces the frequency of singletons. A rare sequence variant (OTU) that occurs only once in the ~7000 full-length sequences occurs about 65 times in the 454 data set, providing more than enough "probability of detection" to be used for comparisons between samples. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;"This set of 7,208 sequences is among the largest datasets of full-length 16S rRNA sequences from the human microbiota (or any environment), the rarefaction curves for V6 and V3 tag pyrosequencing eventually rise higher and display more curvature toward the horizontal than the OTU0.01 curve. 
These features show that a single run of the [454] FLX sequencer targeting V6 or V3 tags from the human gut microbiota can reveal more taxa, and capture a larger proportion of the detectable taxa, than a more extensive effort directed toward full-length 16S rRNA clone sequencing."&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;img src="http://biology.plosjournals.org/archive/1545-7885/6/11/figure/10.1371_journal.pbio.0060280.g003-M.jpg" width="600" height="376" alt="journal-pbio-0060280-g003" border="0" /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4457216402399127579-465352133144764724?l=nextgenseq.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nextgenseq.blogspot.com/feeds/465352133144764724/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4457216402399127579&amp;postID=465352133144764724' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/465352133144764724'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/465352133144764724'/><link rel='alternate' type='text/html' href='http://nextgenseq.blogspot.com/2008/11/metagenomics-of-effects-of-antibiotics.html' title='Metagenomics of the effects of antibiotics on the human gut'/><author><name>Stuart Brown</name><uri>http://www.blogger.com/profile/14602560263535951430</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://3.bp.blogspot.com/_kxDtwcQ6hpg/SPvHPX70nBI/AAAAAAAAAAM/yhErXuQENQA/S220/browns02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4457216402399127579.post-2196854592174654153</id><published>2008-11-12T12:51:00.002-05:00</published><updated>2008-11-12T13:01:10.716-05:00</updated><title type='text'>CisGenome new software for Chip-Seq</title><content type='html'>&lt;span class="Apple-style-span" style="color: rgb(0, 51, 0);"&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;CisGenome&lt;/span&gt;&lt;/span&gt; - just published in Nov. Nature Biotechnology.&lt;div&gt;&lt;a href="https://www.box.net/shared/h8kj80u5vo"&gt;An integrated software system for analyzing ChIP-chip and ChIP-seq data.&lt;/a&gt;&lt;br /&gt;Ji H, Jiang H, Ma W, Johnson DS, Myers RM, Wong WH.&lt;br /&gt;Nat Biotechnol. 2008 Nov;26(11):1293-300.&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;div&gt;A full-function integrated bioinformatics suite for ChIP-chip and ChIP-Seq including peak-finding, FDR control for single samples, subtraction of control lane, visualization and annotation of peaks on known genomes, and Motif finding.  Functional GUI on Windows and Mac. Wow. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Software website here:  CisGenome&lt;/div&gt;&lt;/div&gt;&lt;a href="http://www.biostat.jhsph.edu/~hji/cisgenome/index.htm"&gt;http://www.biostat.jhsph.edu/~hji/cisgenome/index.htm&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;Abstract:&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;pre&gt;We present CisGenome, a software system for analyzing genome-wide chromatin immunoprecipitation (ChIP) data. CisGenome&lt;br /&gt;is designed to meet all basic needs of ChIP data analyses, including visualization, data normalization, peak detection, false&lt;br /&gt;discovery rate computation, gene-peak association, and sequence and motif analysis. 
In addition to implementing previously&lt;br /&gt;published ChIP–microarray (ChIP-chip) analysis methods, the software contains statistical methods designed specifically&lt;br /&gt;for ChIP sequencing (ChIP-seq) data obtained by coupling ChIP with massively parallel sequencing. The modular design of&lt;br /&gt;CisGenome enables it to support interactive analyses through a graphic user interface as well as customized batch-mode&lt;br /&gt;computation for advanced data mining. A built-in browser allows visualization of array images, signals, gene structure,&lt;br /&gt;conservation, and DNA sequence and motif information. We demonstrate the use of these tools by a comparative analysis of&lt;br /&gt;ChIP-chip and ChIP-seq data for the transcription factor NRSF/REST, a study of ChIP-seq analysis with or without a negative&lt;br /&gt;control sample, and an analysis of a new motif in Nanog- and Sox2-binding regions.&lt;br /&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4457216402399127579-2196854592174654153?l=nextgenseq.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nextgenseq.blogspot.com/feeds/2196854592174654153/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4457216402399127579&amp;postID=2196854592174654153' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/2196854592174654153'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/2196854592174654153'/><link rel='alternate' type='text/html' href='http://nextgenseq.blogspot.com/2008/11/cisgenome-new-software-for-chip-seq.html' title='CisGenome new software for Chip-Seq'/><author><name>Stuart 
Brown</name><uri>http://www.blogger.com/profile/14602560263535951430</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://3.bp.blogspot.com/_kxDtwcQ6hpg/SPvHPX70nBI/AAAAAAAAAAM/yhErXuQENQA/S220/browns02.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4457216402399127579.post-1712882055878141023</id><published>2008-10-28T09:25:00.005-04:00</published><updated>2008-10-28T10:40:17.388-04:00</updated><title type='text'>Gene-Boosted Assembly</title><content type='html'>&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;a href="http://www.cbcb.umd.edu/~salzberg/"&gt;Steven Salzberg&lt;/a&gt;&lt;/span&gt; describes a method for &lt;span class="Apple-style-span" style="font-style: italic;"&gt;de novo&lt;/span&gt; assembly of a bacterial genome (&lt;span class="Apple-style-span" style="font-style: italic;"&gt;Pseudomonas&lt;/span&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt; aeruginosa&lt;/span&gt; strain PAb1 = 6.2 MB) from a set of 33 bp Solexa fragments, using two closely related strains as reference sequences, and "boosting" assembly using predicted protein coding regions.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1000186"&gt;PLOS Computational Biology 4(9), Sept 26, 2008&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;Salzberg SL, Sommer DD, Puiu D, Lee VT (2008) Gene-Boosted Assembly of a Novel Bacterial Genome from Very Short Reads. PLoS Comput Biol 4(9): e1000186. doi:10.1371/journal.pcbi.1000186&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The &lt;a href="http://amos.sourceforge.net/index.html"&gt;AMOS&lt;/a&gt; assembler used in this project employs several different software modules and a considerable amount of hands-on effort. 
&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://amos.sourceforge.net/docs/pipeline/AMOScmp.html"&gt;AMOScmp&lt;/a&gt; is a comparative alignment tool - it aligns short reads to a &lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;similar&lt;/span&gt;&lt;/span&gt; reference genome, and then builds contigs. This avoids the challenge of all-vs-all assembly for &lt;span class="Apple-style-span" style="font-style: italic;"&gt;de novo&lt;/span&gt; genome sequencing projects. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://amos.sourceforge.net/docs/pipeline/minimus.html"&gt;Minimus &lt;/a&gt;is a highly stringent assembler that uses Smith-Waterman alignments to identify overlaps between reads.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Contigs were then scanned for protein coding sequences using a combination of Glimmer and BLAST. The &lt;a href="http://amos.sourceforge.net/docs/pipeline/abba.html"&gt;ABBA&lt;/a&gt; program uses protein coding information - especially at the ends of contigs and singletons - to close gaps.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Velvet was also used to independently assemble all the reads into contigs, then &lt;a href="http://mummer.sourceforge.net/"&gt;MUMmer&lt;/a&gt; was used to combine contigs and fill gaps. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt;  &lt;/span&gt;==================&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This method is not going to work for every &lt;span class="Apple-style-span" style="font-style: italic;"&gt;de novo&lt;/span&gt; sequencing problem, but we are going to try something similar for some new Plasmodium and Trichomonas species. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;All software from the Salzberg lab at the Univ. 
of Maryland is freely available here:&lt;/div&gt;&lt;div&gt;&lt;a href="http://cbcb.umd.edu/software/"&gt;http://cbcb.umd.edu/software/&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;and a page describing the Short Read Assembly methods here:&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.cbcb.umd.edu/research/SR-assembly.shtml"&gt;http://www.cbcb.umd.edu/research/SR-assembly.shtml&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;blockquote&gt;&lt;/blockquote&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4457216402399127579-1712882055878141023?l=nextgenseq.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nextgenseq.blogspot.com/feeds/1712882055878141023/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4457216402399127579&amp;postID=1712882055878141023' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/1712882055878141023'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/1712882055878141023'/><link rel='alternate' type='text/html' href='http://nextgenseq.blogspot.com/2008/10/gene-boosted-assembly.html' title='Gene-Boosted Assembly'/><author><name>Stuart Brown</name><uri>http://www.blogger.com/profile/14602560263535951430</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://3.bp.blogspot.com/_kxDtwcQ6hpg/SPvHPX70nBI/AAAAAAAAAAM/yhErXuQENQA/S220/browns02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4457216402399127579.post-8970356782015376108</id><published>2008-10-20T16:08:00.007-04:00</published><updated>2008-11-04T14:54:04.262-05:00</updated><title type='text'>Public Chip-Seq Data</title><content type='html'>Here are some Chip-Seq data sets that have been published and are out there in the public domain.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.broad.mit.edu/node/681"&gt;Broad Institute&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;NHLBI&lt;/div&gt;&lt;div&gt;Jothi et al, - &lt;a href="http://dir.nhlbi.nih.gov/papers/lmi/epigenomes/sissrs/"&gt;Site Identification from Short Sequence Reads&lt;/a&gt;&lt;/div&gt;&lt;div&gt;Barski et al - &lt;a href="http://dir.nhlbi.nih.gov/papers/lmi/epigenomes/hgTcell.html"&gt;High-Resolution Profiling of Histone Methylations&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Valouev et al, Sidow lab @ Stanford, &lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt;  &lt;/span&gt;&lt;a href="http://mendel.stanford.edu/sidowlab/downloads/quest/"&gt;sample data to  validate QuEST software&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Robertson et al, 2007, Nature Methods  4(8) 651-7.&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.bcgsc.ca/data/chipseq"&gt;Eland processed sequence reads and FindPeaks output&lt;/a&gt; for Stat1 and FoxA2 transcription factors&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?Db=gds&amp;amp;Cmd=DetailsSearch&amp;amp;Term=Chip%2DSeq%5BAll+Fields%5D"&gt;NCBI GEO&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a 
href="http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=table&amp;amp;f=study&amp;amp;m=data&amp;amp;s=study"&gt;NCBI Short Read Archive&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4457216402399127579-8970356782015376108?l=nextgenseq.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nextgenseq.blogspot.com/feeds/8970356782015376108/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4457216402399127579&amp;postID=8970356782015376108' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/8970356782015376108'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/8970356782015376108'/><link rel='alternate' type='text/html' href='http://nextgenseq.blogspot.com/2008/10/public-chip-seq-data.html' title='Public Chip-Seq Data'/><author><name>Stuart Brown</name><uri>http://www.blogger.com/profile/14602560263535951430</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://3.bp.blogspot.com/_kxDtwcQ6hpg/SPvHPX70nBI/AAAAAAAAAAM/yhErXuQENQA/S220/browns02.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4457216402399127579.post-6630524789693365174</id><published>2008-10-20T14:38:00.008-04:00</published><updated>2008-11-21T14:54:20.649-05:00</updated><title type='text'>File Formats</title><content type='html'>&lt;blockquote&gt;&lt;/blockquote&gt;What is it with bioinformatics people and file formats?!&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Why is it so bloody hard to produce and agree on a single standard to represent sequence data 
(with quality scores) and a standard for sequence reads aligned on a reference genome? With so many formats, we are all spending exponential amounts of time writing converters between all possible combinations. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Here are some of the file formats that I've dealt with in the past couple of weeks:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:large;"&gt;&lt;span class="Apple-style-span" style="color: rgb(102, 0, 0);"&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;SEQUENCE FORMATS&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://maq.sourceforge.net/fastq.shtml"&gt;FASTQ&lt;/a&gt;&lt;/div&gt;&lt;div&gt;Sequence plus Phred quality score encoded as single ascii bytes&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style=" white-space: pre; font-family:-webkit-monospace;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;span class="Apple-style-span" style="color: rgb(255, 102, 102);"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style=" white-space: pre; font-family:-webkit-monospace;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;span class="Apple-style-span" style="color: rgb(255, 102, 102);"&gt;@NCYC361-11a03.q1k bases 1 to 1576&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style=" white-space: pre; font-family:-webkit-monospace;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;span class="Apple-style-span" style="color: rgb(255, 102, 102);"&gt;GCGTGCCCGAAAAAATGCTTTTGGAGCCGCGCGTGAAAT&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style=" white-space: pre; font-family:-webkit-monospace;"&gt;&lt;span 
class="Apple-style-span"  style="font-size:small;"&gt;&lt;span class="Apple-style-span" style="color: rgb(255, 102, 102);"&gt;+NCYC361-11a03.q1k bases 1 to 1576&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style=" white-space: pre; font-family:-webkit-monospace;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;span class="Apple-style-span" style="color: rgb(255, 102, 102);"&gt;!)))))****(((***%%((((*(((+,**(((+**+,-&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style=" white-space: pre; font-family:-webkit-monospace;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;span class="Apple-style-span" style="color: rgb(255, 102, 102);"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;Solexa/Illumina FASTQ like thing...&lt;/span&gt;&lt;/div&gt;&lt;div&gt;s_*_sequence.txt&lt;/div&gt;&lt;pre&gt;&lt;span class="Apple-style-span" style="color: rgb(255, 102, 102);"&gt;@HWI-EAS305_3-30gf5aaxx:8:1:415:1852&lt;br /&gt;GTTAGATTTTGTGTAACTTGCATGTAATGTTAAAA&lt;br /&gt;+HWI-EAS305_3-30gf5aaxx:8:1:415:1852&lt;br /&gt;YYYYYYYYYYYYVYYYYYYVYYYYYYYYVYVVTUU&lt;br /&gt;@HWI-EAS305_3-30gf5aaxx:8:1:187:1286&lt;br /&gt;GTTACACTGAAAAACAAATTCGTTGGAAACGGGAT&lt;br /&gt;+HWI-EAS305_3-30gf5aaxx:8:1:187:1286&lt;br /&gt;YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTVVVV&lt;br /&gt;@HWI-EAS305_3-30gf5aaxx:8:1:202:440&lt;br /&gt;GTGAAAAATGAGAAATGCACACTGAAGGACCTGGA&lt;br /&gt;+HWI-EAS305_3-30gf5aaxx:8:1:202:440&lt;br /&gt;YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYVVUVV&lt;/span&gt;&lt;/pre&gt;&lt;pre&gt;&lt;span class="Apple-style-span" style="color: rgb(255, 102, 102);"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/pre&gt;&lt;pre&gt;&lt;span class="Apple-style-span"  style="  white-space: normal; font-family:Georgia;"&gt;&lt;span class="Apple-style-span"  
style="font-size:medium;"&gt;s_*_eland_extended.txt&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;pre&gt;&lt;span class="Apple-style-span"  style="  white-space: normal; font-family:Georgia;"&gt;&lt;span class="Apple-style-span"  style="font-size:medium;"&gt;Solexa output format from Eland extended&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;pre&gt;&lt;span class="Apple-style-span" style="color: rgb(255, 102, 102);"&gt;&gt;HWI-EAS305_3-30gf5aaxx:8:1:63:487      GGAGGTAGAGGTATATGGCAAGAAAACTGAAAATC     NM      -&lt;br /&gt;&gt;HWI-EAS305_3-30gf5aaxx:8:1:415:1852    GTTAGATTTTGTGTAACTTGCATGTAATGTTAAAA     3:1:0   chr14.fa:35121238F35,35121282F35,35121326F32T1T,351&lt;br /&gt;21354F4T30&lt;br /&gt;&gt;HWI-EAS305_3-30gf5aaxx:8:1:187:1286    GTTACACTGAAAAACAAATTCGTTGGAAACGGGAT     0:4:5   chr6.fa:103599157R16C17A,chr2.fa:98502709R16C18,985&lt;br /&gt;02829R6A9C18,98505080F4AC29,98505200F1A14C18,98505320F16C18,98506416R16C13C2CA,98506537R16C18,chrX.fa:139917587R16C2A13CA&lt;br /&gt;&gt;HWI-EAS305_3-30gf5aaxx:8:1:202:440     GTGAAAAATGAGAAATGCACACTGAAGGACCTGGA     3:87:58 chr2.fa:98503100F33T1,98506780F35,98507265F35&lt;br /&gt;&gt;HWI-EAS305_3-30gf5aaxx:8:1:359:505     TATTCAATTTACATACTCTGGCTTTGCCAACATTT     1:0:0   chr9.fa:31339651R35&lt;br /&gt;&gt;HWI-EAS305_3-30gf5aaxx:8:1:1290:135    TTGATTGTATAGTAGGGGTGAAATGGAATTTTATC     1:0:1   chrM.fa:14790R35&lt;br /&gt;&gt;HWI-EAS305_3-30gf5aaxx:8:1:627:596     GTGATTTTGAAAGTTGTAGATTGTGTGTTTGTGAT     NM      -&lt;br /&gt;&gt;HWI-EAS305_3-30gf5aaxx:8:1:379:298     GACGTGAAATATGGCGAGGAAAACTGAAAAAGGTG     31:56:28        -&lt;/span&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;s_*_eland_multi.txt&lt;div&gt;Solexa output format from Eland multi&lt;/div&gt;&lt;pre&gt;&lt;span class="Apple-style-span" style="color: rgb(255, 102, 102);"&gt;&gt;HWI-EAS305_3-30gf5aaxx:8:1:414:208     GTAAACTATCAATAAAATAATTTGTTACTCTGTAT     20:7:0&lt;br /&gt;&gt;HWI-EAS305_3-30gf5aaxx:8:1:59:857      TAAATTGTCCACCTTTTTCAGTTTTCCTCGCTATA     0:0:35&lt;br 
/&gt;&gt;HWI-EAS305_3-30gf5aaxx:8:1:1414:307    GAGAAAACTGTAAATAAAGGTAAATGAGAAAAAAA     NM&lt;br /&gt;&gt;HWI-EAS305_3-30gf5aaxx:8:1:330:1758    GGTAAAGTCCACTAAGGAAAAGAAAGAAACAATGT     1:0:0   chr7.fa:97764095R0&lt;br /&gt;&gt;HWI-EAS305_3-30gf5aaxx:8:1:576:127     GAAGTCAATCTTATGAGTTATTAGGATGGCTACTC     0:7:255 chr7.fa:111867683F1,chr12.fa:51788781R1,115833262F1&lt;br /&gt;,chr6.fa:21403822R1,89734675R1,89780759R1,chrX.fa:15525553R1&lt;br /&gt;&gt;HWI-EAS305_3-30gf5aaxx:8:1:88:1045     GTTTCTCATTTTCCATGATTTTCAGTTTTCTTGCC     66:110:72&lt;br /&gt;&gt;HWI-EAS305_3-30gf5aaxx:8:1:939:613     TACTTTACTTTCTAGGGAATGTTCACTTCTAAGTG     1:0:0   chr1.fa:150051845R0&lt;/span&gt;&lt;/pre&gt;&lt;div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;s_*_sorted.txt&lt;/div&gt;&lt;div&gt;filtered eland_extended alignments w/ quality  scores and genome positions&lt;/div&gt;&lt;pre&gt;&lt;span class="Apple-style-span" style="color: rgb(255, 102, 102);"&gt;HWI-EAS305      3-30gf5aaxx     8       66      580     1584                    AGTATGGGTATCGGTTGGTGCAGAGAACTACTGCA     YYYYYYYYYYYYYYYYYYY&lt;br /&gt;YYYYYVYYYYYVVUVU        chr10.fa                3001045 F       35      11&lt;br /&gt;HWI-EAS305      3-30gf5aaxx     8       100     534     1062                    ATTTTCAGGTTGGAGTGACTCGCTAAAACAGCCAA     YYYYYYYYYYYYYYYYYYY&lt;br /&gt;YYYYYYYYYYYTVVVV        chr10.fa                3002892 R       35      29&lt;br /&gt;HWI-EAS305      3-30gf5aaxx     8       59      199     495                     CCACATGCTGTGGCAAAGCCCTTCTGAGCGGGGCG     YYYYTYYYYYYYYYYYRYY&lt;br /&gt;YYYYYYYYYYYTVUVV        chr10.fa                3008958 F       34A     20&lt;br /&gt;HWI-EAS305      3-30gf5aaxx     8       76      779     1406                    AGATGTACAAATGCTCCTCAGATGTTTGTGTCATA     YYYYYYYYYYYYYYYYYYY&lt;br /&gt;YYYYYYYYYYYVVVVV        chr10.fa                3009290 F       35      3&lt;br /&gt;HWI-EAS305      3-30gf5aaxx     8       83      547     1480                    
ATCCAAACAGTTACACAAAGTTTTGAGAACATTAT     YYYYYYYYYYYYYYYYYYY&lt;br /&gt;YYYYYYYYYYYVVVVV &lt;/span&gt;&lt;/pre&gt;&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:large;"&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;span class="Apple-style-span" style="color: rgb(102, 0, 0);"&gt;GENOME ALIGNMENT FORMATS&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.isrec.isb-sib.ch/chipseq/sga_specs.html"&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;SGA&lt;/span&gt;&lt;/a&gt; ('Simplified' Genome Annotation)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;GFF  &lt;/span&gt;(General Feature Format)&lt;/div&gt;&lt;div&gt;&lt;a href="http://genome.ucsc.edu/FAQ/FAQformat#format3"&gt;UCSC Genome Browser&lt;/a&gt; &lt;/div&gt;&lt;div&gt;&lt;a href="http://www.sanger.ac.uk/Software/formats/GFF/"&gt;Sanger&lt;/a&gt;&lt;/div&gt;&lt;pre&gt;&lt;span class="Apple-style-span"  style="  white-space: normal;font-family:Georgia;"&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;EXAMPLE:&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;pre&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 204, 0);"&gt;track name=regulatory description="TeleGene(tm) Regulatory Regions"&lt;br /&gt;chr22 TeleGene enhancer 1000000 1001000 500 + . touch1&lt;br /&gt;chr22 TeleGene promoter 1010000 1010100 900 + . touch1&lt;br /&gt;chr22 TeleGene promoter 1020000 1020000 800 - . 
touch2&lt;/span&gt;&lt;/pre&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.isrec.isb-sib.ch/ssa/ssa_tutorial.html#FPS"&gt;FPS &lt;/a&gt;(Functional Position Set)&lt;/div&gt;&lt;div&gt;Native format for Eukaryotic Promoter Database&lt;/div&gt;&lt;br /&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;EXAMPLE:&lt;/span&gt;&lt;pre&gt;F&lt;span class="Apple-style-span" style="color: rgb(51, 204, 0);"&gt;P   Pv snRNA U1         :+S  EM:J03563.1          1+       352; 17001.098&lt;br /&gt;FP   Ath snRNA U2.5      :+S  EM:AL353994.1        1-     73709; 24016.116&lt;br /&gt;FP   Ath snRNA U5        :+S  EM:X13012.1          1+       678; 23040.&lt;br /&gt;FP   Ta histone H3       :+S  EM:X00937.1          1+       186; 07001.&lt;/span&gt;&lt;/pre&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://genome.ucsc.edu/goldenPath/help/wiggle.html"&gt;WIG&lt;/a&gt; (Wiggle)&lt;/div&gt;&lt;div&gt;UCSC Genome Browser track format&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;EXAMPLE&lt;/span&gt;&lt;/div&gt;&lt;pre&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 204, 0);"&gt;track type=wiggle_0 name="Bed Format" description="BED format" \&lt;br /&gt;visibility=full color=200,100,0 altColor=0,100,200 priority=20&lt;br /&gt;chr19 59302000 59302300 -1.0&lt;br /&gt;chr19 59302300 59302600 -0.75&lt;br /&gt;chr19 59302600 59302900 -0.50&lt;/span&gt;&lt;/pre&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://genome.ucsc.edu/FAQ/FAQformat#format1"&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;BED&lt;/span&gt;&lt;/a&gt;&lt;/div&gt;&lt;div&gt;UCSC Genome Browser&lt;/div&gt;&lt;div&gt;&lt;pre&gt;Example:&lt;br /&gt;Here's an example of an annotation track that uses a complete BED definition:&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 204, 0);"&gt;track name=pairedReads 
description="Clone Paired Reads" useScore=1&lt;br /&gt;chr22 1000 5000 cloneA 960 + 1000 5000 0 2 567,488, 0,3512&lt;br /&gt;chr22 2000 6000 cloneB 900 - 2000 6000 0 2 433,399, 0,3601&lt;/span&gt;&lt;/pre&gt;&lt;pre&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 204, 0);"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/pre&gt;&lt;pre&gt;&lt;span class="Apple-style-span"  style="font-size:large;"&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 51, 255);"&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;ALN&lt;span class="Apple-style-span" style="color: rgb(0, 0, 0); font-weight: normal; "&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;pre&gt;&lt;span class="Apple-style-span"  style="font-size:large;"&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 51, 255);"&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;span class="Apple-style-span" style="color: rgb(0, 0, 0); font-weight: normal; "&gt;Alignment format for &lt;a href="http://www.biostat.jhsph.edu/~hji/cisgenome/index_files/tutorial_chipseq.htm"&gt;CisGenome&lt;/a&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;pre&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 204, 0);"&gt;chr1&lt;/span&gt;[tab]&lt;span class="Apple-style-span" style="color: rgb(51, 204, 0);"&gt;359077&lt;/span&gt;[tab]&lt;span class="Apple-style-span" style="color: rgb(51, 204, 0);"&gt;F&lt;br /&gt;chr1&lt;/span&gt;[tab]&lt;span class="Apple-style-span" style="color: rgb(51, 204, 0);"&gt;376890&lt;/span&gt;[tab]&lt;span class="Apple-style-span" style="color: rgb(51, 204, 0);"&gt;R&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;….&lt;br /&gt;&lt;br /&gt;column1 = chromosome where the read is aligned;&lt;br /&gt;column2 = coordinate where the read is aligned;&lt;br /&gt;column3 = ‘F’ or ‘+’: if the read is aligned to the forward strand of the genome assembly;&lt;br /&gt;         ‘R’ or ‘-’: if the read is aligned to the reverse 
complement strand of the genome.&lt;span class="Apple-style-span" style="color: rgb(51, 204, 0);"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4457216402399127579-6630524789693365174?l=nextgenseq.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nextgenseq.blogspot.com/feeds/6630524789693365174/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4457216402399127579&amp;postID=6630524789693365174' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/6630524789693365174'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/6630524789693365174'/><link rel='alternate' type='text/html' href='http://nextgenseq.blogspot.com/2008/10/file-formats.html' title='File Formats'/><author><name>Stuart Brown</name><uri>http://www.blogger.com/profile/14602560263535951430</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://3.bp.blogspot.com/_kxDtwcQ6hpg/SPvHPX70nBI/AAAAAAAAAAM/yhErXuQENQA/S220/browns02.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4457216402399127579.post-2324147853841770877</id><published>2008-10-19T22:28:00.011-04:00</published><updated>2008-12-05T12:55:39.504-05:00</updated><title type='text'>Service Providers</title><content type='html'>Next-Gen Sequencing as a service (you don't need half a million $$ to play this game)&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.agencourt.com/services/nextgen/"&gt;Agencourt&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;454, 
ABI SOLiD, many related services, located in Beverly MA (USA)&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.seqwright.com/services/AB_SOLiD.php"&gt;SeqWright&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;454, ABI SOLiD, located in Houston, Texas&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://cofactorgenomics.com/"&gt;Cofactor Genomics&lt;/a&gt;&lt;/div&gt;&lt;div&gt;Illumina GA II, can buy a single lane w/ multiplex primers, located in St. 
Louis MO&lt;br /&gt;&lt;a href="http://www.in-sequence.com/issues/2_47/features/151028-1.html"&gt;Article in "In Sequence"&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.fasteris.com/"&gt;Fasteris&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;Illumina Genome Analyzer, located in Geneva Switzerland&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.gatc-biotech.com/en/sequencing/large_scale_sequencing/genome_sequencing.php"&gt;GATC Biotech&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;Illumina, 454, and ABI, located in Germany&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.geneservice.co.uk/services/sequencing/"&gt;GeneService&lt;/a&gt;&lt;/div&gt;&lt;div&gt;Illumina Genome Analyzer, custom bioinformatics&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4457216402399127579-2324147853841770877?l=nextgenseq.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nextgenseq.blogspot.com/feeds/2324147853841770877/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4457216402399127579&amp;postID=2324147853841770877' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/2324147853841770877'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/2324147853841770877'/><link rel='alternate' type='text/html' href='http://nextgenseq.blogspot.com/2008/10/service-providers.html' title='Service Providers'/><author><name>Stuart 
Brown</name><uri>http://www.blogger.com/profile/14602560263535951430</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://3.bp.blogspot.com/_kxDtwcQ6hpg/SPvHPX70nBI/AAAAAAAAAAM/yhErXuQENQA/S220/browns02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4457216402399127579.post-5806463410786866921</id><published>2008-10-19T22:05:00.011-04:00</published><updated>2008-10-21T10:55:54.608-04:00</updated><title type='text'>Applications</title><content type='html'>This is where the real action is. New applications for Chip-Seq technology are developing with the unlimited creativity of the worldwide scientific community.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="color: rgb(0, 51, 0);"&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;Genome Resequencing&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.1000genomes.org/page.php"&gt;The 1000 Genomes Project&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.genome.gov/26524516"&gt;NHGRI&lt;/a&gt; &lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.personalgenomes.org/"&gt;Personal Genome Project&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="color: rgb(0, 51, 0);"&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;SNP discovery/detection&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="color: rgb(0, 51, 0);"&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;a href="http://nextgenseq.blogspot.com/2008/10/chip-seq.html"&gt;Chip-Seq&lt;/a&gt;&lt;/span&gt; &lt;/span&gt;(transcription factor studies, epigenetics)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="color: rgb(0, 51, 0);"&gt;&lt;span 
class="Apple-style-span" style="font-weight: bold;"&gt;RNA-Seq&lt;/span&gt;&lt;/span&gt; (transcriptome, digital gene expression)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;span class="Apple-style-span" style="color: rgb(0, 51, 0);"&gt;MetaGenomics&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://nihroadmap.nih.gov/hmp/"&gt;Human Microbiome Project&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="color: rgb(0, 51, 0);"&gt;&lt;a href="http://www.nap.edu/catalog.php?record_id=11902"&gt;The New Science of Metagenomics&lt;/a&gt; (National Academies Press - free PDF)&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;table style="BORDER-RIGHT: black 1px solid; PADDING-RIGHT: 0px; BORDER-TOP: black 1px solid; PADDING-LEFT: 0px; PADDING-BOTTOM: 0px; MARGIN: 0px; BORDER-LEFT: black 1px solid; WIDTH: 158px; PADDING-TOP: 0px; BORDER-BOTTOM: black 1px solid; HEIGHT: 200px; line-height: 10px; background-color: #ffffff;" cellspacing="0" cellpadding="0" width="158" border="0" height="200"&gt;&lt;tbody&gt;&lt;/tbody&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="PADDING-RIGHT: 0px; PADDING-LEFT: 0px; PADDING-BOTTOM: 0px; MARGIN: 0px; WIDTH: 129px; PADDING-TOP: 0px; HEIGHT: 38px;" valign="top" align="right" width="129" colspan="2" height="38"&gt;&lt;img style="VERTICAL-ALIGN: top; WIDTH: 129px; HEIGHT: 38px; border: none; margin: 0; padding: 0;" height="38" alt="" src="http://images.nap.edu/images/widgetdisplay_nap1.gif" width="129" border="0" align="top" /&gt;&lt;/td&gt;&lt;td style="VERTICAL-ALIGN: top; WIDTH: 29px; BACKGROUND-COLOR: #990000; padding: 0; margin: 0;" valign="top" align="left" width="29" bgcolor="#990000" rowspan="4"&gt;&lt;img height="200" alt="" src="http://images.nap.edu/images/widgetdisplay_nap2.gif" width="29" valign="top" style="VERTICAL-ALIGN: top; WIDTH: 29px; border: none; margin: 0; padding: 
0;" /&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td valign="top" width="130" style="width: 130px; padding: 0; margin: 0; vertical-align: top; text-align: center; height: 50px;"&gt;&lt;div style="PADDING-RIGHT: 5px; PADDING-LEFT: 0px; PADDING-BOTTOM: 10px; WIDTH: 115px; PADDING-TOP: 2px; HEIGHT: 50px; VERTICAL-ALIGN: top; margin: 0; "&gt;&lt;!-- insert book title --&gt;&lt;a style="PADDING-RIGHT: 0px; PADDING-LEFT: 0px; PADDING-BOTTOM: 0px; MARGIN: 0px; FONT: 11/13px arial; COLOR: #990000; PADDING-TOP: 0px; text-decoration: underline;" href="http://www.nap.edu/openbook.php?record_id=11902&amp;amp;utm_source=Network&amp;amp;utm_medium=Widgetv2&amp;amp;utm_content=v2&amp;amp;utm_campaign=Widget" target="_blank"&gt;The New Science of Metagenomics:  Revealing the Secrets of  ...&lt;/a&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="middle" width="130"&gt;&lt;!-- insert tinycov image --&gt;&lt;img style="BORDER-RIGHT: #000 1px solid; BORDER-TOP: #000 1px solid; BORDER-LEFT: #000 1px solid; BORDER-BOTTOM: #000 1px solid; padding: 0; margin: 0" alt="" src="http://images.nap.edu/images/cover.php?id=11902&amp;amp;type=tinycov" width="70" /&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td style="PADDING-RIGHT: 2px; PADDING-LEFT: 2px; PADDING-BOTTOM: 2px; PADDING-TOP: 2px" valign="top" align="middle"&gt;&lt;span style="PADDING-RIGHT: 0px; PADDING-LEFT: 0px; PADDING-BOTTOM: 0px; MARGIN: 0px; FONT: 11px arial; PADDING-TOP: 0pxcolor:#000000;"&gt;Read this FREE online!&lt;br /&gt;&lt;a style="PADDING-RIGHT: 0px; PADDING-LEFT: 0px; PADDING-BOTTOM: 0px; MARGIN: 0px; FONT: 9px arial; COLOR: #990000; PADDING-TOP: 0px; text-decoration: underline;" href="http://www.nap.edu/openbook.php?record_id=11902&amp;amp;utm_source=Network&amp;amp;utm_medium=Widgetv2&amp;amp;utm_content=v2&amp;amp;utm_campaign=Widget"&gt;Full Book&lt;/a&gt; | &lt;a style="PADDING-RIGHT: 0px; PADDING-LEFT: 0px; PADDING-BOTTOM: 0px; MARGIN: 0px; FONT: 9px arial; COLOR: #990000; PADDING-TOP: 0px; text-decoration: 
underline;" href="http://www.nap.edu/nap-cgi/execsumm.cgi?record_id=11902&amp;amp;utm_source=Network&amp;amp;utm_medium=Widgetv2&amp;amp;utm_content=v2&amp;amp;utm_campaign=Widget"&gt;PDF Summary&lt;/a&gt; | &lt;a style="PADDING-RIGHT: 0px; PADDING-LEFT: 0px; PADDING-BOTTOM: 0px; MARGIN: 0px; FONT: 9px arial; COLOR: #990000; PADDING-TOP: 0px; text-decoration: underline;" href="http://dels.nas.edu/dels/rpt_briefs/metagenomics_brief_final.pdf?utm_source=Network&amp;amp;utm_medium=Widgetv2&amp;amp;utm_content=v2&amp;amp;utm_campaign=Widget"&gt;PDF Report Brief&lt;/a&gt; | &lt;a style="PADDING-RIGHT: 0px; PADDING-LEFT: 0px; PADDING-BOTTOM: 0px; MARGIN: 0px; FONT: 9px arial; COLOR: #990000; PADDING-TOP: 0px; text-decoration: underline;" href="http://media.nap.edu/podcasts/nax18metagenomi.mp3?utm_source=Network&amp;amp;utm_medium=Widgetv2&amp;amp;utm_content=v2&amp;amp;utm_campaign=Widget"&gt;Podcast&lt;/a&gt;&lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;Mapping translocation breakpoints&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://genome.cshlp.org/cgi/content/short/18/7/1143"&gt;Chen et al, Genome Research 18:1143-1149, 2008.&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4457216402399127579-5806463410786866921?l=nextgenseq.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nextgenseq.blogspot.com/feeds/5806463410786866921/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4457216402399127579&amp;postID=5806463410786866921' title='0 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/4457216402399127579/posts/default/5806463410786866921'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/5806463410786866921'/><link rel='alternate' type='text/html' href='http://nextgenseq.blogspot.com/2008/10/applications.html' title='Applications'/><author><name>Stuart Brown</name><uri>http://www.blogger.com/profile/14602560263535951430</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://3.bp.blogspot.com/_kxDtwcQ6hpg/SPvHPX70nBI/AAAAAAAAAAM/yhErXuQENQA/S220/browns02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4457216402399127579.post-1101789846843420255</id><published>2008-10-19T20:57:00.009-04:00</published><updated>2008-10-28T11:23:42.129-04:00</updated><title type='text'>Commercial Software</title><content type='html'>A large number of vendors are developing or adapting products to serve the Next-Gen Sequencing market. I will try to collect as much info as possible here, with a very brief description of each product's functions. We can add comments or review pages for each. 
&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.geospiza.com/"&gt;Geospiza FinchLab Next-Gen&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;A LIMS and analysis system for complete lab management workflow&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.genologics.com/index.php"&gt;&lt;span class="Apple-style-span" style="color: rgb(0, 102, 0);"&gt;GenoLogics&lt;/span&gt;&lt;/a&gt; &lt;a href="http://www.genologics.com/files/u2/AppNote_NextGen_v7LoRes.pdf"&gt;Geneus&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;A LIMS that tracks sample submission, automates analysis pipeline commands, and keeps track of the resulting data.&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.clcbio.com/index.php?id=1240"&gt;CLC Genomics Workbench&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;Both desktop and server-based NG genomics tools: assembly, Chip-Seq, transcriptome&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.genomatix.de/nextgensequencing.html"&gt;Genomatix&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;Solutions for transcript mapping (digital gene expression) and Chip-Seq, with an emphasis on mapping sequence tags to annotated genome regions (TF binding sites).&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.genomequest.com/"&gt;GenomeQuest&lt;/a&gt;&lt;/div&gt;&lt;div&gt;A full-service bioinformatics pipeline and consultancy offered as an online service, for labs that have a next-gen machine but no bioinformatics support.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;DNA*&lt;/div&gt;&lt;div&gt;&lt;span 
class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;&lt;a href="http://www.dnastar.com/products/SMGA.php"&gt;SeqMan NGen&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;desktop solution for assembly of NG data.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://softgenetics.com/NextGENe.html"&gt;NextGene&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;from Softgenetics&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;de novo assembly, SNP detection, and transcriptome/digital gene expression&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4457216402399127579-1101789846843420255?l=nextgenseq.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nextgenseq.blogspot.com/feeds/1101789846843420255/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4457216402399127579&amp;postID=1101789846843420255' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/1101789846843420255'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/1101789846843420255'/><link rel='alternate' type='text/html' href='http://nextgenseq.blogspot.com/2008/10/commercial-software.html' title='Commercial Software'/><author><name>Stuart Brown</name><uri>http://www.blogger.com/profile/14602560263535951430</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://3.bp.blogspot.com/_kxDtwcQ6hpg/SPvHPX70nBI/AAAAAAAAAAM/yhErXuQENQA/S220/browns02.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4457216402399127579.post-7544938451912706374</id><published>2008-10-19T20:47:00.004-04:00</published><updated>2008-10-20T00:36:05.021-04:00</updated><title type='text'>Conferences</title><content type='html'>&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Upcoming conferences (and links to content from recent conferences, if I can find any)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Cambridge Healthtech Inst: &lt;a href="http://www.healthtech.com/seq/overview.aspx"&gt;Next Generation Sequencing&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;Meets every 6 months; the next one is in March '09, San Diego CA&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://metagenomics.calit2.net/"&gt;Metagenomics 2008&lt;/a&gt;&lt;/div&gt;&lt;div&gt;Nov 3-7, Calit2, UC San Diego, CA&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4457216402399127579-7544938451912706374?l=nextgenseq.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nextgenseq.blogspot.com/feeds/7544938451912706374/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4457216402399127579&amp;postID=7544938451912706374' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/7544938451912706374'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/7544938451912706374'/><link rel='alternate' type='text/html' 
href='http://nextgenseq.blogspot.com/2008/10/conferences.html' title='Conferences'/><author><name>Stuart Brown</name><uri>http://www.blogger.com/profile/14602560263535951430</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://3.bp.blogspot.com/_kxDtwcQ6hpg/SPvHPX70nBI/AAAAAAAAAAM/yhErXuQENQA/S220/browns02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4457216402399127579.post-3556227135422638058</id><published>2008-10-19T20:42:00.008-04:00</published><updated>2008-10-27T15:10:30.634-04:00</updated><title type='text'>Magazines &amp; Articles</title><content type='html'>These are news articles, editorials, etc. about Next-Gen Sequencing. &lt;div&gt;Published items that are not official refereed journal articles.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.in-sequence.com/issues/"&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 51, 0);"&gt;In Sequence&lt;/span&gt;&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;"The Inside Road on Genome Sequencing"&lt;br /&gt;&lt;/div&gt;&lt;div&gt;a GenomeWeb publication&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.bioinform.com/issues/"&gt;BioInform&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;&lt;a href="http://www.bioinform.com/issues/12_42/features/150226-1.html"&gt;Short Read Sequence Software&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;span class="Apple-style-span" style="color: rgb(0, 51, 0);"&gt;Nature&lt;/span&gt;&lt;/span&gt;: Big Data special &lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space: pre; "&gt;&lt;span 
class="Apple-style-span" style="color: rgb(0, 0, 0);"&gt;&lt;a href="http://www.nature.com/news/specials/bigdata/index.html" style="text-decoration: none;"&gt; &lt;/a&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="text-decoration: underline;"&gt;&lt;a href="http://www.nature.com/news/specials/bigdata/index.html" style="text-decoration: none;"&gt;http://www.nature.com/news/specials/bigdata&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="text-decoration: underline;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="text-decoration: underline;"&gt;Nature Biotechnology&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="text-decoration: underline;"&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt;&lt;a href="http://www.nature.com/nbt/focus/sequencing/"&gt; &lt;/a&gt;&lt;/span&gt;&lt;a href="http://www.nature.com/nbt/focus/sequencing/"&gt;Focus on Next-Generation Sequencing&lt;/a&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="text-decoration: underline;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="text-decoration: underline;"&gt;&lt;span class="Apple-style-span" style="color: rgb(0, 51, 0);"&gt;HHMI Bulletin&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="text-decoration: underline;"&gt;&lt;a href="http://www.hhmi.org/bulletin/aug2008/chronicle/sequencing.html"&gt;Next Generation Sequencing&lt;/a&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="text-decoration: underline;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="text-decoration: underline;"&gt;&lt;span class="Apple-style-span" style="color: rgb(0, 51, 0);"&gt;Venture Beat&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span 
class="Apple-style-span" style="text-decoration: underline;"&gt;&lt;a href="http://venturebeat.com/2008/02/15/qa-with-mdvs-bill-ericson-on-pacbios-origin-why-gattaca-isnt-our-future-throwing-out-your-statins-and-more/"&gt;Interview with Bill Ericson&lt;/a&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="text-decoration: underline;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="text-decoration: underline;"&gt;BioIT World&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="text-decoration: underline;"&gt;&lt;a href="http://www.bio-itworld.com/headlines/2008/march/applied-biosystems-sequences-60-thousand-dollar-human-genome.html"&gt;ABI and the $60K Human Genome&lt;/a&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="text-decoration: underline;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="text-decoration: underline;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="text-decoration: underline;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="text-decoration: underline;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4457216402399127579-3556227135422638058?l=nextgenseq.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nextgenseq.blogspot.com/feeds/3556227135422638058/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4457216402399127579&amp;postID=3556227135422638058' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/3556227135422638058'/><link rel='self' 
type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/3556227135422638058'/><link rel='alternate' type='text/html' href='http://nextgenseq.blogspot.com/2008/10/magazine-articles.html' title='Magazines &amp; Articles'/><author><name>Stuart Brown</name><uri>http://www.blogger.com/profile/14602560263535951430</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://3.bp.blogspot.com/_kxDtwcQ6hpg/SPvHPX70nBI/AAAAAAAAAAM/yhErXuQENQA/S220/browns02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4457216402399127579.post-2490235665974161161</id><published>2008-10-19T19:53:00.013-04:00</published><updated>2008-11-12T12:12:50.036-05:00</updated><title type='text'>Chip-Seq</title><content type='html'>I have been working quite hard on &lt;span class="Apple-style-span" style="font-weight: bold;"&gt;Chip-Seq&lt;/span&gt; applications for Illumina (Solexa) data.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;These boil down to four basic functions:&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;span class="Apple-style-span" style="color: rgb(255, 102, 0);"&gt;peak calling&lt;/span&gt; &lt;/span&gt;- taking sequence reads aligned to a reference genome and counting the number of hits per genome interval, subtracting background or a control lane, smoothing, cutting off shoulders, splitting double peaks, and coming up with some statistic that suggests that the peaks are real vs. 
false positives&lt;/li&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="font-weight: bold; "&gt;&lt;span class="Apple-style-span" style="color: rgb(255, 102, 0);"&gt;annotation&lt;/span&gt;&lt;/span&gt; - finding the location of peaks on the genome as compared to known features, especially the transcription start sites of known genes&lt;/li&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;span class="Apple-style-span" style="color: rgb(255, 102, 0);"&gt;visualization&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="color: rgb(255, 102, 0);"&gt; &lt;/span&gt;- looking at peaks in one of the genome browsers &lt;/li&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;span class="Apple-style-span" style="color: rgb(255, 102, 0);"&gt;motif detection&lt;/span&gt;&lt;/span&gt; - finding patterns of common bases within the peaks, comparing these patterns with known transcription factor binding sites&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;We have evaluated quite a few different pieces of software that supply various of these functions:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;span class="Apple-style-span" style="color: rgb(0, 51, 0);"&gt;&lt;a href="http://www.biostat.jhsph.edu/~hji/cisgenome/"&gt;CisGenome&lt;/a&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;"An integrated software system for analyzing ChIP-chip and ChIP-seq data"&lt;/div&gt;&lt;div&gt;Ji H, Jiang H, Ma W, Johnson DS, Myers RM, Wong WH.&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="https://www.box.net/shared/h8kj80u5vo"&gt;Nat Biotechnol. 
2008 Nov;26(11):1293-300.&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;span class="Apple-style-span" style="color: rgb(0, 51, 0);"&gt;FindPeaks&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;BC Cancer Agency: &lt;a href="http://www.bcgsc.ca/platform/bioinfo/software/findpeaks"&gt;&lt;span class="Apple-style-span" style="color: rgb(0, 51, 0);"&gt;FindPeaks&lt;/span&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;&lt;a href="http://vancouvershortr.wiki.sourceforge.net/"&gt;Vancouver Short Read Analysis Package&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This is a good peak finder, easy to use, with a reasonable statistical model (based on comparison of your genome-mapped data vs. a Monte Carlo random distribution of tags)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.rajajothi.com/sissrs/"&gt;SISSRS&lt;/a&gt; (Site Identification from Short Sequence Reads)&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;Makes use of +/- strand information in Chip-Seq reads to identify transcription factor binding sites to within a few tens of base pairs. &lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://nar.oxfordjournals.org/cgi/content/full/36/16/5221"&gt;Jothi R, Cuddapah S, Barski A, Cui K, Zhao K. Genome-wide identification of in vivo protein-DNA binding sites from ChIP-Seq data. Nucleic Acids Res. 2008 Sep;36(16):5221-31. 
&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://liulab.dfci.harvard.edu/MACS/"&gt;&lt;span class="Apple-style-span" style="color: rgb(0, 51, 0);"&gt;MACS:&lt;/span&gt; Model-based Analysis of ChIP-Seq&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://genomebiology.com/2008/9/9/R137#B16"&gt;Genome Biology article&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;Written by Yong Zhang and Tao Liu from the lab of Shirley Liu at Harvard&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://mendel.stanford.edu/sidowlab/downloads/quest/"&gt;QuEST&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;C++ program (requires a C++ compiler), written by Anton Valouev in the Sidow lab at Stanford&lt;br /&gt;&lt;/div&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 0, 153);"&gt;&lt;a href="http://www.nature.com/nmeth/journal/v5/n9/full/nmeth.1246.html"&gt;Genome-wide analysis of transcription factor binding sites based on ChIP-Seq data, Valouev, et al. 
Nature Methods 5, 829-834 (2008)&lt;/a&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://woldlab.caltech.edu/html/software"&gt;Wold Lab software suite&lt;/a&gt; (@ Caltech)&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;&lt;a href="http://woldlab.caltech.edu/html/chipseq_peak_finder"&gt;&lt;span class="Apple-style-span" style="color: rgb(0, 51, 0);"&gt;ChipSeq Peak Finder&lt;/span&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://code.google.com/p/genetrack/"&gt;&lt;span class="Apple-style-span" style="color: rgb(0, 51, 0);"&gt;GeneTrack&lt;/span&gt;&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;Peak finder and visualization via UCSC Genome Browser&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Broad Institute Integrative Genomics Viewer&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;&lt;a href="http://www.broad.mit.edu/igv/"&gt;&lt;span class="Apple-style-span" style="color: rgb(0, 51, 0);"&gt;MIT IGV&lt;/span&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Note the alignment processor, which creates tag counts from Next-Gen aligned reads (such as Eland output files)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.isrec.isb-sib.ch/chipseq/"&gt;Chip-Seq Analysis Server&lt;/a&gt;&lt;/div&gt;&lt;div&gt;Web-based peak calling at the Swiss Institute of Bioinformatics&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;ChIPDiff&lt;/span&gt; - identification of &lt;span class="Apple-style-span" style="color: rgb(0, 0, 153);"&gt;differential histone modification sites&lt;/span&gt; by comparison of two ChIP-Seq libraries prepared from 
different tissues (various cell types, stages, or environmental responses). Uses a Hidden Markov Model to identify differences in ChIP tag counts.&lt;/div&gt;&lt;a href="http://bioinformatics.oxfordjournals.org/cgi/content/full/24/20/2344"&gt;http://bioinformatics.oxfordjournals.org/cgi/content/full/24/20/2344&lt;/a&gt;&lt;br /&gt;&lt;div&gt;Available from the Genome Institute of Singapore&lt;/div&gt;&lt;div&gt;&lt;a href="http://cmb.gis.a-star.edu.sg/ChIPSeq/tools.htm"&gt;http://cmb.gis.a-star.edu.sg/ChIPSeq/tools.htm&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4457216402399127579-2490235665974161161?l=nextgenseq.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nextgenseq.blogspot.com/feeds/2490235665974161161/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4457216402399127579&amp;postID=2490235665974161161' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/2490235665974161161'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/2490235665974161161'/><link rel='alternate' type='text/html' href='http://nextgenseq.blogspot.com/2008/10/chip-seq.html' title='Chip-Seq'/><author><name>Stuart Brown</name><uri>http://www.blogger.com/profile/14602560263535951430</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://3.bp.blogspot.com/_kxDtwcQ6hpg/SPvHPX70nBI/AAAAAAAAAAM/yhErXuQENQA/S220/browns02.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4457216402399127579.post-1906738048651644851</id><published>2008-10-19T18:22:00.027-04:00</published><updated>2008-10-27T16:34:58.756-04:00</updated><title type='text'>Software</title><content type='html'>Software for basic next-gen sequencing operations.&lt;div&gt;Each of the commercial vendors has their own proprietary software, so we will emphasize the open source.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;A great page about &lt;a href="http://seqanswers.com/forums/showthread.php?t=43"&gt;Next-Gen software on SeqAnswers&lt;/a&gt;&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;span class="Apple-style-span" style="color: rgb(102, 0, 0);"&gt;Image Processing&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;span class="Apple-style-span"  style="font-size:large;"&gt;&lt;span class="Apple-style-span" style="color: rgb(102, 0, 0);"&gt;Basecalling&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://bioinformatics.bc.edu/marthlab/Software"&gt;PyroBayes&lt;/a&gt;&lt;/div&gt;&lt;div&gt;alternative base calling for 454 sequencer with improved quality scores. 
Developed by the Marth lab at Boston College&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://sgenomics.org/swift/"&gt;Swift&lt;/a&gt;&lt;/div&gt;&lt;div&gt;Open source primary data analysis for next-gen DNA sequencers&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:large;"&gt;&lt;span class="Apple-style-span" style="color: rgb(102, 0, 0);"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;span class="Apple-style-span"  style="font-size:large;"&gt;&lt;span class="Apple-style-span" style="color: rgb(102, 0, 0);"&gt;Alignment to a Reference Sequence&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;span class="Apple-style-span" style="color: rgb(102, 0, 204);"&gt;Maq&lt;/span&gt;   &lt;/span&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;Mapping and Assembly with Quality&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;&lt;a href="http://maq.sourceforge.net/"&gt;http://maq.sourceforge.net&lt;/a&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;a href="http://bioinformatics.bc.edu/marthlab/Software"&gt;MOSAIK&lt;/a&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;Gapped alignments to reference genome, another from the Marth lab at Boston College&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.novocraft.com/products.html"&gt;Novoalign&lt;/a&gt; from Novocraft in Kuala 
Lumpur, Malaysia&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;SOAP (Short Oligonucleotide Alignment Program)&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt;  &lt;/span&gt;GNU General Public License&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt;  &lt;/span&gt;&lt;span class="Apple-style-span"  style="font-size:medium;"&gt;from the Bioinformatics Dept of the Beijing Genomics Institute&lt;/span&gt;&lt;span class="Apple-style-span"  style="font-size:medium;"&gt;&lt;span class="Apple-style-span"  style="  white-space: pre; font-family:-webkit-monospace;"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="  white-space: pre; font-family:-webkit-monospace;"&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt;&lt;span class="Apple-style-span"  style="font-size:medium;"&gt; &lt;/span&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt;&lt;span class="Apple-style-span"  style="font-size:medium;"&gt; &lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;a href="http://soap.genomics.org.cn/SOAP_paper.pdf"&gt;&lt;span class="Apple-style-span"  style="font-size:medium;"&gt;Ruiqiang Li et al. SOAP: short oligonucleotide alignment program. Bioinformatics. 2008 24: 713-714&lt;/span&gt;&lt;/a&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://rulai.cshl.edu/rmap/"&gt;RMAP/RMAPQ&lt;/a&gt;&lt;/div&gt;&lt;div&gt;Maps sequence reads to genomic location, variable number of mismatches and overall quality cutoff score. By Andrew D.
Smith and Zhenyu Xuan in the Zhang lab at Cold Spring Harbor.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.bioinformaticssolutions.com/products/zoom"&gt;&lt;span class="Apple-style-span" style="color: rgb(0, 51, 0);"&gt;ZOOM&lt;/span&gt;&lt;/a&gt; (Zillions Of Oligos Mapped)&lt;/div&gt;&lt;div&gt;Product of the Michael Zhang lab at Cold Spring Harbor&lt;/div&gt;&lt;div&gt;&lt;a href="http://bioinformatics.oxfordjournals.org/cgi/content/abstract/24/21/2431"&gt;Bioinformatics&lt;/a&gt; paper&lt;/div&gt;&lt;div&gt;"Zoom is freely available to non-commercial users at &lt;a href="http://www.bioinformaticssolutions.com/products/zoom"&gt;http://bioinfor.com/zoom&lt;/a&gt;"&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;span class="Apple-style-span"  style="font-size:large;"&gt;&lt;span class="Apple-style-span" style="color: rgb(102, 0, 0);"&gt;de novo&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;span class="Apple-style-span"  style="font-size:large;"&gt;&lt;span class="Apple-style-span" style="color: rgb(102, 0, 0);"&gt; Assembly&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold; "&gt;&lt;span class="Apple-style-span"  style="font-size:medium;"&gt;Edena  (Exact &lt;/span&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span"  style="font-size:medium;"&gt;de novo&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span"  style="font-size:medium;"&gt; Assembler)&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;a 
href="http://www.genomic.ch/edena.php"&gt;http://www.genomic.ch/edena.php&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="  white-space: pre; font-family:'-webkit-monospace';"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;De novo bacterial genome sequencing: millions of very short reads assembled on a desktop computer.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="  white-space: pre; font-family:'-webkit-monospace';"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;D. Hernandez, P. François, L. Farinelli, M. Osteras, and J. Schrenzel.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="  white-space: pre; font-family:'-webkit-monospace';"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;a href="http://genome.cshlp.org/cgi/content/abstract/genome;18/5/802"&gt;Genome Research. 18:802-809, 2008.&lt;/a&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;Velvet  &lt;/span&gt;&lt;span class="Apple-style-span" style=""&gt;(GPL, 64 bit Linux)&lt;a href="http://www.ebi.ac.uk/~zerbino/velvet/"&gt;&lt;/a&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;a href="http://www.ebi.ac.uk/~zerbino/velvet/"&gt;http://www.ebi.ac.uk/~zerbino/velvet&lt;/a&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style=" white-space: pre; font-family:'-webkit-monospace';"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;Velvet: algorithms for de novo short read assembly using de Bruijn graphs.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style=" white-space: pre; font-family:'-webkit-monospace';"&gt;&lt;span class="Apple-style-span"  
style="font-size:small;"&gt;D.R. Zerbino and E. Birney. &lt;a href="http://genome.cshlp.org/cgi/content/full/18/5/821"&gt;Genome Research 18:821-829.&lt;/a&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;SHARCGS &lt;/span&gt;&lt;span class="Apple-style-span" style=""&gt;(SHort read Assembler based on Robust Contig extension for Genome Sequencing)&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;a href="http://sharcgs.molgen.mpg.de/"&gt;http://sharcgs.molgen.mpg.de/index.shtml&lt;/a&gt;&lt;span class="Apple-style-span"  style=" white-space: pre; font-family:-webkit-monospace;"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span"  style=" white-space: pre; font-family:-webkit-monospace;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;Dohm JC, Lottaz C, Borodina T, Himmelbauer H&lt;/span&gt;&lt;span class="Apple-style-span"  style=" white-space: normal; font-family:Georgia;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;SHARCGS, a fast and highly accurate short-read assembly &lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span"  style=" white-space: pre; font-family:-webkit-monospace;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;algorithm &lt;/span&gt;&lt;span class="Apple-style-span"  style=" white-space: normal; font-family:Georgia;"&gt;&lt;span class="Apple-style-span"  style=" white-space: pre; font-family:'-webkit-monospace';"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;for &lt;span class="Apple-style-span" style="font-style: italic;"&gt;de novo&lt;/span&gt; genomic sequencing.&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span"  style="  font-weight: bold; white-space: pre; font-family:-webkit-monospace;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt; &lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="  font-weight: bold; white-space: pre; font-family:-webkit-monospace;"&gt;&lt;a href="http://genome.cshlp.org/cgi/content/full/17/11/1697"&gt;&lt;span class="Apple-style-span"  style="font-size:medium;"&gt;Genome Res. 2007 17: 1697-1706&lt;/span&gt;&lt;/a&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;ALLPATHS&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-weight: normal;"&gt;ALLPATHS: De novo assembly of whole-genome shotgun microreads&lt;br /&gt;Jonathan Butler, Iain MacCallum, Michael Kleber, Ilya A. Shlyakhter, Matthew K. Belmonte, Eric S. Lander, Chad Nusbaum, and David B. 
Jaffe&lt;br /&gt;Genome Res. 2008 18: 810-820.&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;SSAKE  &lt;span class="Apple-style-span" style="font-weight: normal; "&gt;(GNU General Public License, written in Perl)&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.bcgsc.ca/platform/bioinfo/software/ssake"&gt;http://www.bcgsc.ca/platform/bioinfo/software/ssake&lt;/a&gt;&lt;/div&gt;The Short Sequence Assembly by K-mer search and 3' read Extension (SSAKE) is a genomics application for aggressively assembling millions of short nucleotide sequences by progressively searching for perfect 3'-most k-mers using a DNA prefix tree.&lt;br /&gt;&lt;pre&gt;René L Warren, Granger G Sutton, Steven JM Jones, Robert A Holt. 2007.&lt;br /&gt;Assembling millions of short DNA sequences using SSAKE. &lt;/pre&gt;&lt;pre&gt;&lt;a href="http://bioinformatics.oxfordjournals.org/cgi/content/full/23/4/500"&gt;Bioinformatics.
23:500-501.&lt;/a&gt;&lt;/pre&gt;&lt;pre&gt;&lt;br /&gt;&lt;/pre&gt;&lt;pre&gt;&lt;span class="Apple-style-span"  style="  font-weight: bold; white-space: normal; font-family:Georgia;"&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;SNP Discovery&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;div&gt;&lt;a href="http://bioinformatics.bc.edu/marthlab/Software"&gt;PolyBayes&lt;/a&gt;&lt;/div&gt;&lt;div&gt;Another from the Marth Lab&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;span class="Apple-style-span"  style="font-size:large;"&gt;Genome Viewers&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:medium;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:medium;"&gt;&lt;a href="http://bioinformatics.bc.edu/marthlab/EagleView"&gt;EagleView&lt;/a&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;Another tool from the Marth Lab&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://genoviz.sourceforge.net/"&gt;Integrated Genome Browser&lt;/a&gt;&lt;/div&gt;&lt;div&gt;from Affymetrix &amp;amp; GenoViz&lt;/div&gt;&lt;div&gt;some &lt;a href="http://bioserver.hci.utah.edu/BioInfo/index.php/Software:IGB"&gt;IGB tips&lt;/a&gt; from the Huntsman Cancer Inst.
@ Univ of Utah&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.broad.mit.edu/igv/"&gt;MIT Integrative Genomics Viewer&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;span class="Apple-style-span"  style="font-size:large;"&gt;LIMS&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://sourceforge.net/projects/solexatools"&gt;Solexa Tools&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4457216402399127579-1906738048651644851?l=nextgenseq.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nextgenseq.blogspot.com/feeds/1906738048651644851/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4457216402399127579&amp;postID=1906738048651644851' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/1906738048651644851'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/1906738048651644851'/><link rel='alternate' type='text/html' href='http://nextgenseq.blogspot.com/2008/10/software.html' title='Software'/><author><name>Stuart Brown</name><uri>http://www.blogger.com/profile/14602560263535951430</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://3.bp.blogspot.com/_kxDtwcQ6hpg/SPvHPX70nBI/AAAAAAAAAAM/yhErXuQENQA/S220/browns02.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4457216402399127579.post-5285546055357691271</id><published>2008-10-19T18:19:00.012-04:00</published><updated>2008-11-05T14:17:32.963-05:00</updated><title type='text'>Next-Gen Blogs</title><content type='html'>These are some of the good folks posting blogs full of useful Next-Gen sequencing information:&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="color: rgb(102, 0, 0);"&gt;INDEPENDENT BLOGS:&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;SEQanswers&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;"The next-generation sequencing community"&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;&lt;a href="http://seqanswers.com/"&gt;http://seqanswers.com&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://groups.google.com/group/solexa"&gt;Solexa Google Group&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.fejes.ca/blog.html"&gt;fejes.ca&lt;/a&gt;&lt;/div&gt;&lt;div&gt;Anthony Fejes is a grad student in bioinformatics at UBC in Vancouver.&lt;/div&gt;&lt;div&gt;He works on FindPeaks and the Vancouver Short Read Analysis Package.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;MassGenomics&lt;/span&gt;&lt;/div&gt;&lt;div&gt;"Medical Genomics in the post-genome era"&lt;/div&gt;&lt;div&gt;&lt;a href="http://massgenomics.wordpress.com/"&gt;http://massgenomics.wordpress.com&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: 
bold;"&gt;Genetic Future:&lt;/span&gt; Daniel MacArthur&lt;/div&gt;&lt;div&gt;&lt;a href="http://scienceblogs.com/geneticfuture/"&gt;http://scienceblogs.com/geneticfuture&lt;/a&gt;&lt;/div&gt;&lt;pre&gt;&lt;span class="Apple-style-span"  style="font-size:medium;"&gt;The genetic and evolutionary basis of human variation, &lt;/span&gt;&lt;/pre&gt;&lt;pre&gt;&lt;span class="Apple-style-span"  style="font-size:medium;"&gt;and the companies trying to sell you information about your genome.&lt;/span&gt;&lt;/pre&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;Systems Biology &amp;amp; Bioinformatics&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://lurena.vox.com/library/post/chipseq-and-epigenetics.html"&gt;http://lurena.vox.com&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://phylogenomics.blogspot.com/"&gt;The Tree of Life&lt;/a&gt;&lt;/div&gt;&lt;div&gt;Jonathan Eisen writes about a lot more than Next-Gen Sequencing, but his blog is a must-read for everyone in bioinformatics, genomics, and evolutionary biology&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="color: rgb(102, 0, 0);"&gt;VENDOR BLOGS:&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.clcngs.com/"&gt;CLC Bio NG Seq&lt;/a&gt;  &lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;(developer of CLC Genomics Workbench)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Geospiza&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;&lt;a href="http://www.geospiza.com/finchtalk/"&gt;FinchTalk&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="color: rgb(102, 0, 0);"&gt;DISCUSSION 
LIST:&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="https://stat.ethz.ch/pipermail/bioc-sig-sequencing/"&gt;Bioconductor short-read sequencing mailing list&lt;/a&gt;&lt;div&gt;&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4457216402399127579-5285546055357691271?l=nextgenseq.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nextgenseq.blogspot.com/feeds/5285546055357691271/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4457216402399127579&amp;postID=5285546055357691271' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/5285546055357691271'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/5285546055357691271'/><link rel='alternate' type='text/html' href='http://nextgenseq.blogspot.com/2008/10/nex-gen-blogs.html' title='Next-Gen Blogs'/><author><name>Stuart Brown</name><uri>http://www.blogger.com/profile/14602560263535951430</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://3.bp.blogspot.com/_kxDtwcQ6hpg/SPvHPX70nBI/AAAAAAAAAAM/yhErXuQENQA/S220/browns02.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4457216402399127579.post-6888821370010108903</id><published>2008-10-19T17:53:00.004-04:00</published><updated>2008-10-20T00:13:36.840-04:00</updated><title type='text'>Next Gen Sequencing Vendors</title><content
type='html'>&lt;div&gt;A list of the vendors of next-generation, high-throughput DNA sequencing machines. &lt;/div&gt;&lt;div&gt;No editorializing here; there will be comment pages for each of these. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;span class="Apple-style-span"  style="font-size:large;"&gt;The Big Three:&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;454 (Roche)&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;&lt;a href="http://www.454.com/"&gt;www.454.com&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Illumina (Solexa)&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;&lt;a href="http://www.illumina.com/pages.ilmn?ID=204"&gt;www.illumina.com&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;ABI SOLID&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;&lt;a href="http://marketing.appliedbiosystems.com/mk/get/SOLID_KNOWLEDGE_LANDING"&gt;www.appliedbiosystems.com&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;span class="Apple-style-span"  style="font-size:large;"&gt;The New New Thing:&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style=" font-weight: bold;font-size:18px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:medium;"&gt;Helicos&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;&lt;a href="http://www.helicosbio.com/"&gt;www.helicosbio.com&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Pacific
Biosciences&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;&lt;a href="http://www.pacificbiosciences.com/"&gt;www.pacificbiosciences.com&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;[plans to ship first unit in 2010]&lt;/div&gt;&lt;div&gt;&lt;a href="http://venturebeat.com/2008/02/10/pacific-bio-lifts-the-veil-on-its-high-speed-genome-sequencing-effort/"&gt;An analysis of PacBio by David Hamilton&lt;/a&gt;, Feb 10, 2008&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The Polonator (Dover Systems aka George Church &amp;amp; Co.)&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;&lt;a href="http://www.polonator.org/"&gt;www.polonator.org&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Intelligent Biosystems&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;&lt;a href="http://www.intelligentbiosystems.com/"&gt;www.intelligentbiosystems.com&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;VisiGen Biotechnologies&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;&lt;a href="http://visigenbio.com/"&gt;www.visigenbio.com&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4457216402399127579-6888821370010108903?l=nextgenseq.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nextgenseq.blogspot.com/feeds/6888821370010108903/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='http://www.blogger.com/comment.g?blogID=4457216402399127579&amp;postID=6888821370010108903' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/6888821370010108903'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/6888821370010108903'/><link rel='alternate' type='text/html' href='http://nextgenseq.blogspot.com/2008/10/next-gen-sequencing-vendors.html' title='Next Gen Sequencing Vendors'/><author><name>Stuart Brown</name><uri>http://www.blogger.com/profile/14602560263535951430</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://3.bp.blogspot.com/_kxDtwcQ6hpg/SPvHPX70nBI/AAAAAAAAAAM/yhErXuQENQA/S220/browns02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4457216402399127579.post-8306453839601052859</id><published>2008-10-19T17:19:00.003-04:00</published><updated>2008-10-19T17:26:47.039-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Introduction'/><title type='text'></title><content type='html'>Greetings ultra-sequencers and bioinformatics geeks.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The community of next-generation sequencing and its supporting bioinformatics is developing very rapidly, but also fragmenting into many factions. 
It has become very difficult for anyone to keep track of what is going on in the many different technologies, software development projects, and the various commentators.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I am going to try to make this blog a catchall where people can keep up on news from the technology vendors, survey progress in all sequencing related software, and keep up to date on relevant journal publications.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I can't do it all myself, so everyone is welcome to send in notes on relevant material as soon as they notice it.  Contributors may be invited to become co-authors. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;—Cheers,&lt;/div&gt;&lt;div&gt;Stuart Brown&lt;/div&gt;&lt;div&gt;NYU Bioinformatics Core&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4457216402399127579-8306453839601052859?l=nextgenseq.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nextgenseq.blogspot.com/feeds/8306453839601052859/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4457216402399127579&amp;postID=8306453839601052859' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/8306453839601052859'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4457216402399127579/posts/default/8306453839601052859'/><link rel='alternate' type='text/html' href='http://nextgenseq.blogspot.com/2008/10/greetings-ultra-sequencers-and.html' title=''/><author><name>Stuart Brown</name><uri>http://www.blogger.com/profile/14602560263535951430</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://3.bp.blogspot.com/_kxDtwcQ6hpg/SPvHPX70nBI/AAAAAAAAAAM/yhErXuQENQA/S220/browns02.jpg'/></author><thr:total>0</thr:total></entry></feed>
