Many new papers and C2 XL chemistry. Company projected to double the throughput in Q2, and lots of interest from researchers and sequencing centers. Best long read sequencing platform, producing almost finished-quality genomes.
Sentiment: Strong Buy
PacBio demonstrated that errors can be corrected with CCS (circular consensus sequence) reads, and long reads can be very useful for de novo assembly of repeat-rich genomes. Long reads are also critical for identifying large rearrangements, for methylation studies and for other applications. PacBio is gaining market share and is already eating into the 454, LifeTech and Illumina pie.
Sentiment: Strong Buy
Entering the era of bacterial epigenomics with single molecule real time DNA sequencing (PacBio) Curr. Op. Microbiol
DNA modifications, such as methylation, guide numerous critical biological processes, yet epigenetic information has not routinely been collected as part of DNA sequence analyses. Recently, the development of single molecule real time (SMRT) DNA sequencing has enabled detection of modified nucleotides (e.g. 6mA, 4mC, 5mC) in parallel with acquisition of primary sequence data, based on analysis of the kinetics of DNA synthesis reactions. In bacteria, genome-wide mapping of methylated and unmethylated loci is now feasible. This technological advance sets the stage for comprehensive, mechanistic assessment of the effects of bacterial DNA methyltransferases (MTases), which are ubiquitous, extremely diverse, and largely uncharacterized, on gene expression, chromosome structure, chromosome replication, and other fundamental biological processes. SMRT sequencing also enables detection of damaged DNA and has the potential to uncover novel DNA modifications.
Even a good Twitter mention from across the Pond...
"@PacBio For a system with long read lengths well suited for microbes especially for outbreak tracking and ID of microbial resistance genes"
The previous complete post has been deleted, but here are some relevant passages from Dr. Brian Krueger's blog "Labspaces" (Google it):
"A coming of age for PacBio and long read sequencing? #AGBT13
...I think everyone in attendance today was overwhelmed by a stunning talk from PacBio and the dramatic advancements of their long read technology.
...Jonas Korlach from Pacific Biosciences took the stage to give a talk on the progress of assembling genomes using their long read sequencing technology. If you’ve been in the sequencing game for any amount of time you’ve definitely heard of PacBio and their early promise of accurate, real-time, single molecule, long read sequencing technology and their subsequent absolute failure in rapidly delivering those promises. PacBio has spent the last 2 years trying to dig itself out of a hole by aggressively working with early adopters to fix their original lapses in quality. In a talk earlier in the week (Slides!), Michael Schatz, another plant geneticist and PacBio early adopter, declared, “You don’t have to worry about the errors anymore.” This is in part because of major improvements in PacBio chemistry and the introduction of a new, more accurate polymerase. PacBio has consistently doubled its average read length over the last two years and has made gains on the error correction front. Last year, Schatz published a paper in Nature Biotechnology which showed that highly accurate Illumina or 454 short reads could be used to error correct PacBio long reads to generate the most accurate long read data available today. Korlach followed up on this short read correction scheme by showing that the Illumina/454 step can now be eliminated completely and researchers can use the short reads generated during the PacBio run to error correct the much longer reads in a process called Hierarchical Genome Assembly Process (HGAP), with a base accuracy of 99.99965%. That’s a far cry from the early real world production numbers of 85%! Korlach then supported these tech numbers with an astounding amount of real world data. Granted, all of the de novo sequencing presented was done in pathogenic organisms or bacteria with small circular genomes, but no one can dispute how impressive the data looked during the presentation.
The big question now is: has PacBio finally weathered the storm and can it overcome its previous reputation as a failed sequencing company? Time will really only tell. It’s hard to predict winners and losers in this space, especially in light of the coming sequestration and dwindling research budgets. PacBio may be peaking at the wrong time. With the threat of Moleculo long read technology on the horizon, major sequencing labs may hold out on purchasing PacBio RS systems. Why invest $700,000 in a sequencer if you can get “good enough” long reads off of your HiSeqs?"
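For anyone who wants to put the two accuracy figures quoted above (99.99965% after HGAP versus roughly 85% in early production) on the usual sequencing quality scale, here is a minimal sketch. It uses the standard Phred definition, Q = -10 * log10(error rate). The 5 Mb genome length is only an assumed, typical bacterial size for illustration and does not come from the blog post, and the function names are my own.

```python
import math


def phred_quality(accuracy: float) -> float:
    """Convert a per-base accuracy (strictly between 0 and 1) to a Phred score."""
    if not 0.0 < accuracy < 1.0:
        raise ValueError("accuracy must be strictly between 0 and 1")
    error_rate = 1.0 - accuracy
    return -10.0 * math.log10(error_rate)


def expected_errors(accuracy: float, genome_length_bp: int) -> float:
    """Expected number of wrong bases across a genome of the given length."""
    return (1.0 - accuracy) * genome_length_bp


# Accuracy figures as quoted in the blog post.
CONSENSUS_ACCURACY = 0.9999965   # 99.99965% after HGAP
RAW_ACCURACY = 0.85              # early real-world production reads

# Assumption for illustration only: a 5 Mb bacterial genome.
ASSUMED_GENOME_BP = 5_000_000

for label, accuracy in [("HGAP consensus", CONSENSUS_ACCURACY),
                        ("early raw reads", RAW_ACCURACY)]:
    q = phred_quality(accuracy)
    errors = expected_errors(accuracy, ASSUMED_GENOME_BP)
    print(f"{label}: Q{q:.1f}, about {errors:,.1f} expected errors in 5 Mb")
```

Under those assumptions the consensus figure works out to roughly Q55 and the early raw figure to roughly Q8, which is the gap between a handful of errors and hundreds of thousands of errors across a genome of that size.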
Another good read... If you invest in PACB, I have to assume that you're not afraid of a little reading!
A computational biologist's personal views on new technologies & publications on genomics & proteomics and their impact on drug discovery
Tuesday, February 26, 2013
Post AGBT: A Longish Item on Long Sequencing
As others have noted, a significant theme at AGBT this year was sequencing at length. While this year lacked true bombshells, PacBio impressed many with their making single-contig bacterial genome assemblies look easy. Moleculo had been the object of much pre-meeting excitement, and while very few additional details emerged about their process, several talks showed what could be done. As I have discussed previously, Nabsys demonstrated their “positional sequencing” system to select invitees in a hotel suite. Optical mapping from OpGen and BioNano Genomics featured in a few posters, but did not attract much attention. Oxford Nanopore had no physical presence, beyond a somewhat secretive suite, but several ONT staffers were happy to reiterate their confidence that they will launch their system – when it is good and ready.
In the end, three things will drive adoption of these technologies and the extent to which each one succeeds, which I will explore in detail below. First, there are the applications; there are different strengths and weaknesses to each, and some systems will be ill-suited for some (or completely unusable, with a lack of commercial availability being the ultimate in unusability). Second, there are cost considerations, though none of the presenters seemed to even touch on this, leaving pundits such as myself to do back-of-envelope estimates (some of which threaten to pop eyeballs). Finally, there are just preferences. For example, as noted elsewhere, Moleculo will be attractive to shops already heavily invested in Illumina, particularly if they are averse to shipping some work elsewhere.
For application space, most examples given at AGBT were genome assembly (including gap filling and other improvements), structural variant discovery or haplotyping. One poster showed the use of PacBio for cDNA sequencing, which will certainly be a boon to cataloging splice variants. Metagenomics applications came up in Q&A, but I don’t believe any talks or posters actually showed this.
As far as matching technologies, it’s useful first to explore who is just plain absent and who is a pretender to the throne. Ion Torrent simply has ignored this area, and their AGBT presentation was no different. Rothberg’s plenary talk, which apparently was a near copy of the one he gave at the earlier Ion sequencing symposium at an adjacent hotel, was big on “Moore’s Law” and enjoying the spotlight (and also inducing many eye rolls), with lots of projections of the capacity improvements coming on Proton (PGM users were only referenced in terms of number of runs; nothing was promised here) and discussion of amplicon, capture and RNA-Seq applications, but no mention of long range information. The pretender to the throne is clearly Roche/454. They presented one nice talk in the bioinformatics session describing valuable work closing gaps in a human cell line sequence, but cost was completely ignored. No surprise: with 454 running north of $10K per gigabase, a 10X genome would be upwards of a quarter million dollars. PacBio’s per gigabase cost is at worst half that, so their 10X human genome was only about $100K (apologies again for posting a much higher number on Twitter previously). The contrast is that 454 seems stuck with incremental improvements in modal length and no significant changes in density, whereas a doubling of PacBio throughput should be rolled out this spring and perhaps another 2X squeezed out of the RS platform over the rest of the year.
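The back-of-envelope numbers in the paragraph above can be reproduced with a few lines. This is only a sketch: the $10K per gigabase lower bound for 454, the "at worst half that" for PacBio and the roughly $100K total are taken from the text, while the 3 Gb human genome size and the function names are assumptions made here for illustration.

```python
# Assumption: a haploid human genome of roughly 3 gigabases.
ASSUMED_HUMAN_GENOME_GB = 3.0


def genome_cost(cost_per_gb: float, coverage: float,
                genome_gb: float = ASSUMED_HUMAN_GENOME_GB) -> float:
    """Total sequencing cost = price per gigabase x fold coverage x genome size."""
    return cost_per_gb * coverage * genome_gb


def implied_cost_per_gb(total_cost: float, coverage: float,
                        genome_gb: float = ASSUMED_HUMAN_GENOME_GB) -> float:
    """Per-gigabase price implied by a quoted total project cost."""
    return total_cost / (coverage * genome_gb)


# Figures quoted in the post.
cost_454 = genome_cost(10_000, coverage=10)          # 454 at $10K per Gb (lower bound)
cost_pacbio_worst = genome_cost(5_000, coverage=10)  # PacBio at "at worst half that"
pacbio_rate = implied_cost_per_gb(100_000, coverage=10)  # from the ~$100K total

print(f"454, 10X human genome:              ${cost_454:,.0f}")
print(f"PacBio worst case, 10X human:       ${cost_pacbio_worst:,.0f}")
print(f"Rate implied by a $100K 10X genome: ${pacbio_rate:,.0f} per Gb")
```

With a 3 Gb genome the 454 figure lands at $300K, consistent with "upwards of a quarter million dollars". The worst-case PacBio rate gives $150K, so the quoted total of about $100K would correspond to a per-gigabase cost closer to a third of the 454 figure than to half of it.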
Also noticeably absent was any serious mention of BGI, Complete Genomics or Complete Genomics’ Long Fragment Read technology (LFR), covered previously. In their Nature paper, Complete and collaborators demonstrated much longer haplotyping than is possible with Moleculo, though the underlying approaches are similar. If BGI wants to be part of this new push for long range information, as they seem to have suggested, they need to get the merger distraction out of the way and start making it clear whether they will roll out LFR as a service (likely) or as a kit. Nor were the cool "library-on-an-Illumina flowcell" approaches to long range information in evidence at AGBT, but that remains an interesting approach as well.
Moving to applications, for de novo assembly, my bias would be towards PacBio. Because Moleculo performs de novo assembly, albeit on individual fragments, it can run into problems with long direct repeats and also with any extreme base bias regions which the underlying Illumina technology chokes on. PacBio had a poster demonstrating reading through a very long VNTR in a mucin gene. PacBio might have problems getting the exact number of bases correct on a simple repeat array, but should be able to give relatively tight bounds. In contrast, if Moleculo must deal with a repeat array longer than the fragment size, only some guesstimation based on read depth is going to yield the number of repeats.
In their AGBT presentation, PacBio made snapping bacterial genomes to a single contig, well, a snap. I’m in the process of testing that for myself, but if this is the case PacBio is likely to become the standard approach to high quality bacterial genomes. Illumina will still be valuable for surveying large numbers of genomes at much lower cost, but for high resolution PacBio could rule. However, Illumina makes some strong claims around their new Nextera mate pair kits in this space, and so there may be three grades of genomes: highly fragmented Illumina paired end, good but not single contig Nextera mate pair versions of those and finally PacBio. If there is much cost differential, then some investigators will settle for that middle ground, which may be useful for most studies.
For other classes of ugly sequence, the two technologies are probably so close that only a very carefully designed head-to-head would flag a clear winner. For example, nasty regions such as the mammalian MHC, which are characterized by lots of repeats but not necessarily long simple repeat arrays, showed up in talks.
On the other hand, for haplotyping I suspect Moleculo will be more popular than PacBio. First, if Illumina is to be believed there will be a sizable cost difference, with Moleculo on a human genome perhaps adding around $10K per genome to a project. Illumina stated in their presentation that a substantial amount of haplotype information could be obtained using low coverage Moleculo, so that may be popular in studies with lots of samples. As noted above, Moleculo may also simply be popular for those heavily invested in Illumina.
For large genomes, it appears that there will still be challenges. That will remain the area of opportunity for mapping companies such as OpGen, BioNano Genomics and soon Nabsys. But as the long read sequencing approaches improve, they will be continually chewing upwards into the mapping companies' space. It's a long way until all the dust settles, which means it will be an interesting space to watch for quite a while into the future.
Posted by Keith Robison at 11:27 PM