There were a couple of requests for me to elaborate on PACB's very high error rate. I've already referenced in a previous post the Genomeweb article that reaches a similar conclusion: PACB's per-read accuracy is only about 80% (compared to >99% for competitors).
To solidify the argument, I thought that I would walk everyone through some key points from their Science paper (volume 323: pp 133-138). Anyone can download this paper free of charge from Science's website (you do have to register though), so I encourage everyone to check what I'm saying if there is skepticism.
The most important thing for everyone to understand is the difference between per-read accuracy and consensus accuracy. Per-read accuracy is how accurate the sequencer is from just a single read (in PACB's case, a single strand). By averaging many reads together you can dramatically decrease the error rate; this is so-called consensus accuracy. As a baseline, ILMN's per-read accuracy is >99%, and LIFE's SOLiD is even higher. You can average as many reads as you want together to drive up accuracy, but of course at a heavy cost in money and throughput.
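To make the per-read vs. consensus distinction concrete, here is a toy model of my own (not anything from the paper): treat each base call as an independent coin flip and take a simple majority vote across N reads. Real consensus calling has to handle insertions and deletions, which is much messier, so this only illustrates the principle; the function name and numbers are my choices.

```python
from math import comb

def consensus_accuracy(per_read_acc: float, n_reads: int) -> float:
    """Probability that a majority vote over n_reads calls a base correctly,
    assuming independent errors and substitutions only (a simplification)."""
    need = n_reads // 2 + 1  # votes required for a correct majority
    return sum(
        comb(n_reads, k) * per_read_acc**k * (1 - per_read_acc)**(n_reads - k)
        for k in range(need, n_reads + 1)
    )

print(consensus_accuracy(0.85, 1))   # -> 0.85: one PACB-style read
print(consensus_accuracy(0.85, 15))  # ~0.999: 15-fold consensus
print(consensus_accuracy(0.99, 1))   # -> 0.99: one ILMN-style read
```

Note how 15 reads at ~85% per-read accuracy land in the same ballpark as the 99.3% consensus figure the paper quotes, while ILMN-class chemistry is already there on a single read.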
The key pages for understanding PACB are pp. 136-137. Let me paste some quotes from their paper:
p. 136 "Of the 158 total bases in the alignment, 131 were correctly identified by the automated base caller. The 27 errors consisted of 12 deletions, eight insertions, and seven mismatches." -> This implies a per-read accuracy of only 83% (131/158), i.e. a 17% error rate. Note that they have a hard time because there are so many deletions and insertions. The polymerase moves fast and in a stochastic fashion; this is hard to catch!
p. 136 "From these data, the deletion rate is estimated to be 7.8%" -> Deletions alone are a huge problem! No other technology has this problem.
Please look carefully at Figure 4a on p. 137. This gives an excellent representation of how difficult this problem is to crack. Imagine trying to deconvolute that tracing consistently into solid base calls. This is very difficult! Moreover, they are almost certainly showing a good tracing for publication purposes, not an average or hard one.
Both in the abstract and on p. 138, they note that by reading the same DNA 15 times, they can get to 99.3% accuracy. This is what fools people who don't know this field: that is consensus accuracy, not per-read. ILMN and SOLiD are already there with a single read.
Single molecule physics is hard, just ask people from Helicos. PACB's technology, while glitzy, lacks what scientists really want: throughput and accuracy.
Caveat emptor and best wishes to all.
Well, I need to admit when I am wrong and someone else is right. Their per-read accuracy is indeed around 85%; see the message by proxyprofile (assuming, of course, that this is not the same person). However, there must be something about the longer read lengths that allows them to offset the higher error rate. I don't think they have meaningfully higher parallelism, so they can't reach the same consensus accuracy without a throughput penalty. However, their read lengths are without a doubt multiples of ILMN's. This is not terribly surprising, since they do not have a dephasing problem. Perhaps their application is more of a niche: slower de novo reads that then get built upon with ILMN's supposedly higher throughput.
Their consensus read error rate is what is important. Individual read errors do affect throughput because of the need to increase coverage, but at that point it speaks to cost and time, not capability.
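To put a rough number on that coverage penalty, here is a back-of-the-envelope sketch (my own, using an independent-error, substitution-only majority-vote model, so the exact figures are illustrative): how many reads of a base do you need before the consensus hits a given accuracy target?

```python
from math import comb

def majority_acc(p: float, n: int) -> float:
    """Chance a simple majority vote over n reads calls a base correctly."""
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(need, n + 1))

def reads_needed(p: float, target: float, max_n: int = 101) -> int:
    """Smallest odd read count whose majority vote reaches the target."""
    for n in range(1, max_n + 1, 2):  # odd n avoids tied votes
        if majority_acc(p, n) >= target:
            return n
    raise ValueError("target not reached within max_n reads")

print(reads_needed(0.99, 0.999))  # -> 3: ILMN-like per-read accuracy
print(reads_needed(0.85, 0.999))  # -> 15: PACB-like per-read accuracy
```

So in this toy model the per-read accuracy gap translates into roughly a 5x coverage (i.e. cost and time) multiplier to reach the same consensus quality; the 15 happens to land near the 15-fold figure in the Science paper.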
I think the original postings critical of the read error rate were posed as company-killing issues, which they apparently are not. The original authors seemed to be trying to say the company was not scientifically viable. However, a higher per-read error rate is a general characteristic of post-2nd-generation techniques, whether hybrid 2nd/3rd generation such as GNOM's or pretty much pure 3rd generation such as PACB's.
Let's be realistic here. If the gold standard is 99%+ accuracy and they were stuck at 80%, they would not be able to place any machines in the field, even "for free". A "free" machine is not free to the recipient: there are installation, personnel, and opportunity costs to deal with, and if you end up not liking the product, there are de-install and personnel-redeployment costs on top. Leading genome centers would not deal with the diversion if there was no hope.
I have to chuckle when some knock PACB for peddling unproven technology and then turn around and speculate about Ion Torrent. Not saying whether PACB is good or bad, just that the discussion here seems to be based on at least two-year-old data (publication in early 2009 means the data is at best of mid-2008 vintage).
You're mostly right here. There is a huge cost to trying out these machines. I remember HLCS in 2008 touting their use at places like the Broad, but their box just sat there and nobody did anything serious with it. I'd check your assumption that major genome centers are doing work on PACB. You can go visit a couple and see for yourself.
I'd be careful if I were you about assuming everybody has stale information. I've posted on this board publications from this summer from the company in which they reaffirm their high error rate and low throughput. I'd be even more careful assuming that one can redesign boxes and reagents in under 1-2 years. These are very intense undertakings that require incredible resources. If you compare the Science data to what the company has been publishing and saying this year, you'll find almost perfect agreement.
Stocks don't normally fall as hard as PACB has without people getting current info!
As I told proxy, you can either be a data-driven investor or a "buy and hope" investor. I choose the former.
Here is my take. I don't think the choice is between PACB and Pepsi. PACB is ultra high risk, Pepsi is "safe" -- but there is a lot in between! ILMN, GPRO, LIFE, CALP, TMO are but a few of many names that have much more moderate risk that investors can scrutinize.
Your argument seems to be along the lines of "we don't know if the company has beaten their published specs." True, absolutely. But we don't know that they haven't. Keep digging in the institutional docs I mentioned and you'll find some more datapoints of interest. So my point is that this is a very speculative name, predicated not on anything published but more on the thought of "I sure hope that management figures out how to quickly improve their box."
To remind everyone, all the current data (including the company's own very recent publication) point to fairly lousy current specs: low throughput and a high error rate. Competing with that hand against much more mature technologies from larger, well-capitalized companies is not a recipe that I personally like.
My own investing philosophy is less "buy and hope" and more scrutinizing the currently available hard facts and making a call. Going by current data, no one with expertise in the field is impressed. It's also hard to escape the conclusion that PACB is being priced today for perfection.
I see lots and lots of downside here if one or more of the following happen:
1. They deliver only slightly above their currently published specs
2. Any further delays hit their RS system (already looking to be 3-6 months late)
3. The incumbents continue to innovate
4. Some other new entrant launches a differentiated technology (such as Oxford Nanopore)
5. Ion Torrent captures the mass market.
6. Analysts start covering this and apply customary industry metrics and issue low price targets.
7. The shorts see PACB falling and they pile on after smelling blood.
8. Secondary offerings seem inevitable given PACB's burn.
9. The Street does its normal thing of "hating uncertainty" and shares fall.
A very recent paper from PacBio published in Nucleic Acids Research describes a new method they call the SMRTbell technology, which allows reading of a single DNA molecule multiple times to derive a consensus sequence. I think this will greatly mitigate your concern over the intrinsic error rate issue.
Hopefully you realized this on reading it, but this paper confirms what I've been saying about the error rate.
The PACB error rate is so lousy that they are trying to read the same strand again and again by circularizing their template and sending the polymerase in a repetitive loop.
This SMRTbell strategy has the effect of:
1) Decreasing throughput
2) Increasing sample prep burden
3) Shortening the read length (in the paper, they are looking at a ~300 bp insert, but mention that a stretch goal is 1 kb)
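The throughput hit in point 1 is just arithmetic: every lap around the circular template re-reads the same insert, so raw polymerase output gets divided across passes. A rough sketch below; the raw-base budget and adapter length are my assumptions for illustration, not PACB specs (only the ~300 bp insert comes from the paper).

```python
# Illustrative numbers only: raw output and adapter overhead are assumed.
raw_bases_per_read = 3000   # hypothetical raw polymerase output per ZMW
insert_len = 300            # insert size used in the SMRTbell paper
adapter_len = 50            # hypothetical hairpin-adapter overhead per pass

bases_per_pass = insert_len + adapter_len      # one strand traversal + adapter
passes = raw_bases_per_read // bases_per_pass  # -> 8 full passes
unique_bases = insert_len                      # the same 300 bp, read repeatedly

print(f"{passes} passes over the same {unique_bases} bp insert")
print(f"unique yield: {unique_bases / raw_bases_per_read:.0%} of raw bases")
```

Under these toy numbers, only ~10% of the raw bases are unique sequence; the rest is spent buying down the error rate.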
As I've been saying from the beginning, this high error rate is a *major* handicap to PACB that investors aren't paying attention to. ILMN, Life, and Roche will likely continue to dominate, leaving PACB with some scraps.
Thanks for the researched post ... let's hope the discussion remains a high-caliber one for this forum.
Their error rate could be wildly different from Jan 2008. Almost 3 years is a geological era in this space. Does anyone know of recently published figures ?
Even without accounting for improvement on this front, the effect of an increased per-base error rate is diminished as read length increases. Lots of variables are at work; concentrating on one alone does not tell us whether the technology will work, c.f. http://genomebiology.com/2009/10/3/R32.
Having a long contiguous read relative to the other players does give PACB an advantage in the computational portion of sequencing, and PACB is ahead on this score. The compute effort to process reads for alignment and assembly is, I imagine, probably a lot less than for the other players.
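One rough way to see why long reads remain computationally useful despite high per-base error: seed-and-extend aligners only need some error-free k-mers to anchor a read, and a long read contains many candidate seeds. The sketch below assumes independent errors; the seed length and read lengths are my own illustrative choices.

```python
per_base_acc = 0.85
k = 13                              # seed length an aligner might use
p_clean_seed = per_base_acc ** k    # chance a given k-mer has zero errors

for read_len in (100, 1000):
    n_seeds = read_len - k + 1      # overlapping k-mers in the read
    expected_clean = n_seeds * p_clean_seed
    print(f"{read_len} bp read: ~{expected_clean:.0f} error-free {k}-mers")
```

Even though only ~12% of individual 13-mers survive at 85% per-base accuracy, a 1,000 bp read still carries on the order of a hundred clean seeds, which is why long noisy reads can still be placed and assembled.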
At the moment these companies are scoring themselves on the materials and bio side of their assays, but as they move from technologies to commercial companies they are going to have to account for the true cost of sequencing and analysis, the whole enchilada.
PACB and its rivals, new and old, are wildly different businesses; comparing them on science alone obscures the core issue, which is whether they can make money with their chosen business models. The science just moves the cost needle, that's all.
Good hunting to all ...
The paper is from Jan 2009, not 2008, so not so long ago. Also, they presented the box at AGBT this year, so factoring in some design and manufacturing time implies that the specs in the paper are quite close to what they actually built.
Regarding even more current stats, those who were at ASHG last week can speak to this. PACB is saying there are 9 instruments in the field, with current customers reporting read lengths of about 300 bp and very high error rates. It appears they've struggled to get the longer read lengths they'd hoped for. Error rate still dogs them. They appear to be struggling with enzyme photobleaching, so there is still some work to be done.
In my view, this read length (less than Roche 454) with very high level of error puts them in no man's land. I struggle to see who would buy this and for what app. If one just wants coarse genomic information, why not do a regular DNA microarray using AFFX's Gene Titan for less cost? If one wants quality data and throughput, ILMN and LIFE are outstanding choices for much less capex. And given the market size, it's already been penetrated by larger, better capitalized companies.
Best wishes to all.
I have also read this paper, but it is important to note that this type of inaccuracy becomes a problem mainly if you are using the machine to analyze short read lengths, as the cost would add up eventually. However, for eventually reading entire genomes quickly, which is where this technology is heading, it is less of a problem. So if scientists are interested in reading small sections of DNA, then it would seem that ILMN and LIFE have better tech, but realistically PACB is moving forward with a tech that seems better suited for whole-genome analysis. Just my opinion, of course.