Data and Software Release: Target Capture and SMRT Sequencing for Alzheimer’s Disease Genes Thursday, September 21, 2017
We’re pleased to release a new data set along with an allele phasing GitHub software workflow for those interested in exploring SMRT Sequencing data from an Alzheimer’s disease candidate gene study. Our team collaborated with Integrated DNA Technologies (IDT) to design a 35-gene panel targeting candidate Alzheimer’s disease genes identified as potential genetic risk loci across many GWAS and linkage studies. Long-read PacBio sequencing was applied to brain and skeletal tissue from two individuals diagnosed with Alzheimer’s disease, and a wide range of variants were detected, from SNPs and indels to larger structural variations up to several kilobases in size. Additionally, alleles were successfully phased, which provides a more comprehensive understanding of the biological significance of the variants present in the samples. Here’s an example screenshot of a BIN1 gene phased into two phase blocks across a 62,641 bp region:
The samples were sequenced using the Sequel System (Sequel Chemistry 1.2) and analyzed with our newly updated Phasing Consensus Analysis for Targeted Sequencing Data GitHub repository. Data sets and related files are available on our PacBio DevNet. Captures of 7 kb genomic fragments for brain and skeletal muscle tissues were each sequenced on a single SMRT Cell, yielding roughly 8 GB of data mappable to the human reference genome.
For more about this data collection, don’t miss the upcoming webinar, “Characterizing Alzheimer’s disease candidate genes and transcripts with targeted, long-read, single-molecule sequencing,” hosted by IDT on Wednesday, September 27th. We will take a deep dive into the project and illustrate how coupling genomic and transcriptomic captures with xGen® Lockdown® probes enables informative results and insights beyond SNPs.
Register now to attend at 7:00 am PDT/10:00 am EDT or at 11:00 am PDT/2:00 pm EDT.
SAN FRANCISCO (GenomeWeb) – Building off previous work, researchers from Pacific Biosciences and Northeastern University have demonstrated that combining nanopores with PacBio's sequencing system can help preferentially load long DNA molecules and enable lower amounts of input DNA.
The study, published today in Nature Nanotechnology, is one result from research supported by a three-year $825,000 grant from the National Human Genome Research Institute. In 2014, the group published the first results of that work, demonstrating that the hybrid nanopore/zero-mode waveguide (ZMW) configuration could be built and that DNA could be loaded onto the sequencing system through that configuration.
Now, the team has demonstrated that the devices are active, that DNA can be sequenced and read out, and importantly, that less input DNA is needed than typically required for PacBio sequencing.
"It's still a proof of concept," said Meni Wanunu, an associate professor of physics at Northeastern and senior author of the study. The next step is to demonstrate scale — constructing wafers that include a pore for every ZMW, as opposed to making each pore separately, he said. In addition, he said that his team plans to focus on developing direct RNA sequencing applications.
Jonas Korlach, PacBio's chief scientific officer, declined to discuss whether the firm plans to incorporate nanopore ZMWs (NZMWs) into its commercial platform. He said that the Nature Nanotechnology study "does not in any way reflect any projections about commercial implementations or features of the current or future SMRT sequencing platforms."
Wanunu said that the goal of the collaboration with PacBio was to figure out a way to decrease the amount of input DNA needed for the system and to preferentially load longer molecules of DNA.
One challenge with SMRT sequencing, he said, is loading longer DNA molecules into the ZMW, a 100-nanometer sized well. Currently, DNA templates must bind to the ZMW through biotin-streptavidin chemistry, and the molecules make their way to the ZMW through diffusion, a process that favors shorter fragments over longer ones. PacBio has improved the process over the years, but input DNA requirements are still above nanogram levels.
Wanunu's team has come up with an approach that uses NZMWs — essentially a nanopore at the bottom of every ZMW — and applies a voltage to drive DNA through the pore. Such a conformation enables picogram amounts of input DNA and also preferentially loads longer DNA fragments, Wanunu said.
As previously described, the researchers built waveguides on a silicon oxide membrane, with nanopores 3 to 5 nanometers in diameter fabricated at the base of the waveguides using a transmission electron microscope.
The team then used an array of six NZMWs and studied how DNA fragments were captured into the NZMWs for fragment sizes ranging from 1 kilobase to 48.5 kilobases.
The team demonstrated that a 10-picogram DNA sample could be loaded in less than one minute. By comparison, the authors wrote, conventional magnetic bead loading of 10-kilobase SMRTbell samples requires 1.5 nanograms of input DNA and takes one hour.
Next, the team demonstrated that they could capture DNA bound to a polymerase to enable sequencing. In PacBio sequencing, the DNA template is bound to a DNA polymerase-streptavidin fusion protein, which then binds to the biotin groups on the waveguides. The team demonstrated that they could create a similar DNA polymerase-streptavidin complex at the base of the NZMW to capture DNA.
They then created a two-by-two NZMW array on a membrane that had around 100 ZMWs to compare sequencing of the NZMW with the standard ZMW. They first tested a 72-base circular DNA and DNA polymerase template. Template DNA and fluorescently labeled nucleotides were added.
The researchers then applied three 1-second-long voltage pulses, which captured and immobilized the DNA on the NZMWs. The polymerase was then activated, and sequencing began, as seen from the fluorescent bursts of the nucleotides being incorporated. The fluorescent bursts were only seen in the NZMWs and not the ZMWs, since no template DNA was loaded into the standard ZMWs in the short 3-second loading time. But all four NZMWs captured DNA during that time.
Finally, the researchers demonstrated sequencing of a known 20-kilobase SMRTbell sequencing construct. They used less than one nanogram of input DNA and applied a two-second voltage pulse to load the DNA onto the NZMW.
Because the researchers were not using PacBio's actual instrument, Wanunu said the team had to design its own bioinformatics to do base calling. The algorithm resulted in a 67 percent single-read accuracy and read lengths of around 1.6 kilobases.
Wanunu anticipated that performance would be better if the actual commercial instrument were used, as opposed to the Northeastern team's experimental design. "The system we're using is one we've replicated based on the design principles of PacBio," he said. "Our substrate has some inherent noise that's higher than PacBio's."
He said that the next step would be to continue to optimize the design — including scaling up the NZMW design so that each NZMW does not have to be fabricated individually. The team is currently working on a method that involves using a "porous substrate," he said, "as opposed to drilling pores into a substrate."
In addition, Wanunu's team is also collaborating with PacBio to develop techniques for direct RNA sequencing. Last year, the NHGRI awarded his lab $1.7 million to develop a version of the NZMW for direct RNA and DNA sequencing at picogram input levels.
Wanunu said that he planned to focus on direct RNA sequencing using his lab-developed NZMW set-up. "Once we get to the point where we can demonstrate high-quality sequencing of DNA and RNA, then we can think about integration" with PacBio's instrument. "We still have a ways to go," he said.
New Insights in Soil Ecology: SMRT Sequencing Powers Genome Assembly of Parthenogenetic Arthropod Wednesday, September 13, 2017
Photo by Andy Murray
In a recent BMC Genomics paper, scientists in the Netherlands report a high-quality genome assembly for Folsomia candida, a soil-dwelling arthropod. The organism, which is known for reproducing parthenogenetically (and only when infected with Wolbachia), is frequently used in the lab for toxicity testing.
Lead author Anna Faddeeva-Vakhrusheva, senior author Dick Roelofs, and collaborators at Vrije Universiteit Amsterdam and other institutions describe their findings in “Coping with living in the soil: the genome of the parthenogenetic springtail Folsomia candida.” The team chose SMRT Sequencing to characterize the genome so they could learn more about the organism’s reproductive process and stress response.
F. candida has a diploid genome with seven pairs of chromosomes. The scientists generated a 221.7 Mb assembly with a contig N50 of 6.5 Mb. It is remarkably complete, with just 0.1% of all bases marked by gaps. Analysis revealed that repeat segments comprise more than 23% of the genome, and GC content was more than 37%. The team performed a number of quality-control and validation steps, concluding that assembly quality was excellent. The assembly also included the complete 15 kb F. candida mitochondrial genome.
The team was particularly interested in genome content acquired through horizontal gene transfer. A systematic analysis of all genes predicted by the assembly identified more than 800 acquired genes, most of which came from bacteria, fungi, and protists. The complement of horizontally transferred genes was impressive: “This number is among the highest found in metazoan genomes, being only exceeded in rotifers and some nematode species,” the scientists report.
Another highlight of the study came from a focus on F. candida’s endosymbiont Wolbachia. “Parthenogenesis is most likely imposed by Wolbachia,” the team writes. “The presence of Wolbachia is essential for reproduction: animals cured of Wolbachia by antibiotic treatment lay eggs that fail to hatch and develop.” The arthropod sequencing effort also yielded a complete assembly of the endosymbiont, which with its 1.8 Mb genome is the largest strain of Wolbachia ever discovered. Forty-eight genes were found to harbor ankyrin repeats, which are known for “mediating protein-protein and protein-DNA interactions with the host cells,” the scientists note.
Intriguingly, the team identified a functional antibiotic biosynthesis cluster, “suggesting the production of yet undiscovered antimicrobial compounds in an animal genome,” they conclude. “This high quality genome will be instrumental for evolutionary biologists investigating deep phylogenetic lineages among arthropods and will provide the basis for a more mechanistic understanding in soil ecology and ecotoxicology.”
Don't know if you guys check Pac's twitter but man they have been busy! Mostly others posting about their Sequel/RS results.. Almost hard to keep up with all the tweets. Don't remember it being that active in past years so much, can only be good!
In an attempt to yet again slam PACB, the clowns at fool.com inadvertently extended the highest of compliments. For the first time, PACB is not referred to as a niche market, or finishing method for genomic analysis. Now we are "competing directly with the mighty ILMN". I will drink to this compliment! Do you hear the footsteps ILMN? We coming!
CRISPR/Cas9 and SMRT Sequencing Yield New Phenotype Association for SCA10 Repeat Expansion Disorder Wednesday, September 6, 2017
A compelling new paper from scientists at the Parkinson’s Institute and Clinical Center, Houston Methodist Research Institute, and several other organizations demonstrates the importance of fully sequencing repeat expansion regions for a clearer understanding of the underlying biology of the diseases they cause. This publication also offers a look at how CRISPR/Cas9 capture can be used in combination with SMRT Sequencing to access the expanded repetitive region at a base level resolution without any PCR bias.
“Parkinson’s disease associated with pure ATXN10 repeat expansion” comes from lead authors Birgitt Schüle and Karen McFarland, senior author Tetsuo Ashizawa, and collaborators. The study involved a Mexican family with one individual previously diagnosed with Parkinson’s disease and several members with spinocerebellar ataxia.
Clinical genetic testing had found an ataxia-associated pentanucleotide repeat expansion in the patient with Parkinson’s, and this team hoped to learn more. “To further genetically characterize the ATXN10 repeat expansion and to better understand the phenotypic differences of progressive cerebellar ataxia with seizures and parkinsonism,” they write, “we employed several advanced and novel molecular genetic techniques to dissect the genetic structure of the repeat expansion in this family.”
Among those techniques was a new method that combined the sequence-specific endonuclease activity of the CRISPR/Cas9 system with long-read SMRT Sequencing. The team reports that they were able to use this method to snip out genomic ATXN10 repeat expansion regions, some spanning up to 7 kb in length, and sequence them “as one continuous fragment without prior amplification of the genomic DNA.” This was done for six family members, with results indicating that most affected family members had a string of 480 ATTCT repeats followed by about 920 ATTCC repeat interruptions. Strikingly, the family member with ataxia and parkinsonism had a different expansion: more than 1,300 ATTCT repeats but no ATTCC repeats. “We propose that the absence of repeat interruptions play a role in the underlying disease process acting as a genetic modifier and leading to the clinical presentation of L-Dopa responsive parkinsonism,” the scientists write, adding that the repeat interruptions may contribute to the development of epilepsy.
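The repeat structure described above can be illustrated with a small Python sketch. This is not the authors' analysis pipeline: the function name `repeat_runs` and the synthetic sequences are hypothetical, built only from the repeat counts quoted in the post, and real single-molecule reads would contain sequencing errors that this exact-match approach does not handle.

```python
def repeat_runs(seq, motifs=("ATTCT", "ATTCC")):
    """Return (motif, copies) for each run of consecutive exact motif copies in seq.

    Hypothetical helper for illustration; it assumes error-free sequence.
    """
    runs = []
    i = 0
    while i < len(seq):
        for motif in motifs:
            if seq.startswith(motif, i):
                copies = 0
                # Extend the run while the same motif keeps repeating.
                while seq.startswith(motif, i):
                    copies += 1
                    i += len(motif)
                runs.append((motif, copies))
                break
        else:
            i += 1  # base is not the start of a known motif; move on
    return runs


# Synthetic alleles built from the counts quoted in the post (assumption:
# exact copies, no flanking sequence, 1,300 used for "more than 1,300").
interrupted = "ATTCT" * 480 + "ATTCC" * 920
pure = "ATTCT" * 1300

print(len(interrupted))          # 7000 bases, in line with the ~7 kb regions
print(repeat_runs(interrupted))  # [('ATTCT', 480), ('ATTCC', 920)]
print(repeat_runs(pure))         # [('ATTCT', 1300)]
```

The point of the sketch is that an expansion of this size can only be decomposed into its ATTCT and ATTCC runs when a single read spans the whole region, which is why the continuous, unamplified fragment matters.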
“Single molecule sequencing paired with SMRT/Cas9 capture approach allowed us to characterize the genetic composition of the complete repeat expansion which revealed a novel phenotype-genotype correlation for Parkinson’s disease and ATXN10,” the team adds, highlighting the importance of adding to existing knowledge of repeat expansion types and possible phenotypes. “We conclude that the underlying genetic architecture of ATXN10 repeat expansions is critical for presentation of clinical phenotypes and presumably also the underlying pathology.”
Photons (PacBio) vs. Electrons (Oxford Nanopore): Just putting this here for someone to expand on.
Quantitative analysis of single-molecule measurements based on photons, force, and electrons. Photonic measurements (left) are usually based on one or more fluorescent labels (either small molecules or fluorescent proteins) or a larger nanoparticle label. ...
Electrons have a number of features that make them useful probes at the very short length scales relevant for single-molecule measurements. Their small mass means they exhibit substantial quantum mechanical tunneling behavior, which allows measurement of distances up to a few nanometers with sub-Ångström sensitivity. Coupling tunneling with inelastic processes enables measurement of vibrational and other energy levels.(38,39) The sub-Ångström wavelength of high-energy electrons means they are able to resolve atomic-scale features in diffraction experiments.(40) Since electrons are responsible for molecular bonding, measuring electronic conductance through a molecule can in some cases also probe single-molecule conformations.(41)
The excellent spatial resolution achieved in electron-based single-molecule measurements comes at a cost. Observed areas are typically quite small (often much less than 1 μm²). Thus, while measuring the behavior of a single molecule relative to its immediate environment is straightforward, relating it to micro- to macroscopic features can be more difficult. Electron-based measurements also place fairly stringent requirements on sample preparation: scanning tunneling microscopy generally requires samples no more than a few nanometers thick on conductive substrates, and transmission electron microscopy requires samples to be electron-transparent (usually less than 100 nm thick and composed of low-atomic-number materials).
Photons in the visible and near-visible ranges have much longer wavelengths and, consequently, are used in different ways to quantify single-molecule behavior. The longer length scales typically probed under photonic illumination make such measurements especially useful in quantifying the relationships between single molecules and micro- to macroscopic features in their environment, such as in biological samples. Measuring the behavior of a single molecule requires that the molecule display a unique optical signature to distinguish it from up to trillions of background molecules; almost universally this is achieved by covalently binding a fluorescent emitter or other optical tag to the molecule of interest.(42) Fluorophores can be chosen to be sensitive to pH, electric fields, ionic strength, and other factors, providing a probe of the target molecule’s immediate environment.(43) Fluorescence polarization measurements can be used to determine fluorophore orientation, which correlates with target molecule orientation.(44)
The diffraction limit would appear to restrict photonic measurements to features of hundreds of nanometers and larger. However, subdiffraction optical methods, such as stimulated emission depletion and selective photoactivation, are beginning to allow single fluorophores to be localized down to tens of nanometers, usually at a cost to measurement time and thus the ability to probe dynamics. Measurements of energy transfer efficiency between two fluorescent dyes or plasmonic probes can be used to measure the distance between the probes enabling measurements of the dynamics of protein conformation changes on millisecond time scales.
In addition to diffraction-based limitations on spatial resolution, fluorescence measurements are constrained by the need to add a label to the molecule of interest and by the fact that fluorescent dyes eventually bleach, losing their ability to fluoresce after 10⁴ to 10⁶ excitations. Nanoparticle probes are less sensitive to bleaching, but are often larger than the molecule being measured.
Medical-Grade Psychotropic May Be Feasible Thanks to New Genomics Study Tuesday, September 5, 2017
Image from Alan Rockefeller
A recent paper in the journal Angewandte Chemie describes using SMRT Sequencing to characterize biosynthesis of a psychotropic product in Psilocybe carpophores, better known as magic mushrooms. Scientists from the Hans Knöll Institute in Germany report that the work could pave the way to synthetic production for pharmaceutical use.
“Enzymatic Synthesis of Psilocybin” comes from Janis Fricke, Felix Blei, and Dirk Hoffmeister. The team aimed to uncover the enzymatic mechanisms of biosynthesis for psilocybin, culminating in the characterization of four related enzymes: PsiD, PsiK, PsiM, and PsiH. “In a combined PsiD/PsiK/PsiM reaction, psilocybin was synthesized enzymatically in a step-economic route from 4-hydroxy-l-tryptophan,” the authors write.
Scientists used PacBio sequencing to analyze Psilocybe cyanescens, resulting in a 61.3 Mb assembly with just 217 contigs (meanwhile, a short-read assembly of a closely related mushroom for the same project required more than 2,900 contigs to represent just 41.3 Mb). After identifying the genes involved in producing psilocybin, the team validated the work by splicing them into E. coli and confirming the biosynthesis event.
Since its structure was first characterized in 1959, scientists have been seeking ways to synthesize psilocybin — but without success. As the study authors note, their new results finally “may lay the foundation for its biotechnological production.”
In an article from Chemical & Engineering News, the University of Minnesota’s Courtney Aldrich said the discovery will be important “for developing a fermentation process for production of this powerful psychedelic fungal drug.”
This is the quote that Mike Schatz used to summarize his most recent research, which revealed the blind spots that are unseen by the ILMN sequencer. This finding should send a lot of previous work back to the drawing board and open the door wider for PacBio in the clinical market. "Using @PacBio we find thousands of SVs that have never been seen before in cancer and use them to trace back how HER2 oncogene is amplified."
Also... keeping it real, ONT will need to compete with the projected 32x PACB technology. Not just where the Sequel is today.
Sequel System Data Release: Iso-Seq Results for Hummingbird and Zebra Finch Brain Tissue Thursday, August 31, 2017
Anna’s hummingbird photo by Pat Durkin
If you’re interested in avian vocal learning or want to explore a PacBio Iso-Seq data set generated with the Sequel System, we have good news. We’ve just released data from Iso-Seq interrogations of brain tissue from two avian models of vocal learning, Anna’s hummingbird (Calypte anna) and zebra finch (Taeniopygia guttata), sequenced in collaboration with the Erich Jarvis and Olivier Fedrigo labs at the Rockefeller University.
If you’re not familiar with the Iso-Seq method, it’s the long-read sequencing answer to short-read RNA-seq studies. By using SMRT Sequencing for a transcriptome project, scientists can generate full-length isoform data, clearly capturing alternative splicing events to see the real diversity of transcripts. Unlike RNA-seq approaches, the Iso-Seq method takes advantage of long-read data to fully span transcript isoforms from the 5’ end to their poly-A tails, eliminating the need for error-prone transcript reconstruction and inference processes. With the Sequel System, Iso-Seq projects are low cost and time efficient. Currently we recommend only 1-2 SMRT Cells per tissue type for genome annotation.
For this data set, we used the Iso-Seq method to characterize the transcriptomes of two birds, with brain total RNA. The two species’ brain samples were barcoded, pooled, and sequenced using 4 SMRT Cells on the Sequel System. An average of ~460,000 reads was generated per SMRT Cell; total sequencing data yields ranged from 6.1 to 7.7 Gb per SMRT Cell. More than 15,000 isoforms were identified in each species, including thousands that had not been previously annotated in each bird and 400 to 500 new genes.
The data set contains both the raw pooled sequences and the processed, demultiplexed sequence files, separated by species and excluding any raw sequences not containing barcodes. Our initial analysis of these data is presented in this poster (Vierra et al.), which is being presented this week at the Genome 10K and Genome Science Conference at the Earlham Institute. It demonstrates how improved loading on the Sequel System simplifies library prep and how both command-line and new SMRT Link tools can be used for analysis. It also illustrates how full-length transcript data can help identify additional exons and UTRs.
Enjoy the data!
If you use the data and our analyses in your publication before we complete our study, please cite:
Michelle N. Vierra, Sarah B. Kingan, Elizabeth Tseng, Ting Hon, William J. Rowell, Jacquelyn Mountcastle, Olivier Fedrigo, Erich D. Jarvis, Jonas Korlach. From RNA to Full-Length Transcripts: The PacBio Iso-Seq Method for Transcriptome Analysis and Genome Annotation. Genome10K and Genome Science Conference Abstracts 2017.
We are up 65% since last earning report.
Way.. Undervalued, period!
I think the answer is pretty simple. Up until the last CC, shorts were able to play on the reasonable fear that the company might go bankrupt. Now it is reasonably certain that they have a 2 year plan to achieve profitability and the money to make it through that time period. The possibility of a big Chinese presence doesn't hurt either. As a result, the unreasonable downward pressure should abate... barring unforeseen bad news, e.g. a big breakthrough by Oxford Nanopore.
F.D.A. Approves First Gene-Altering Leukemia Treatment, Costing $475,000
The Food and Drug Administration approved the first treatment that genetically alters a patient’s own cells to fight cancer. It will cost $475,000.
MinIon Shows Promise for De Novo Assembly by Including Ultra-Long Reads Apr 28, 2017 | Monica Heger
SAN FRANCISCO (GenomeWeb) – Two research groups have demonstrated the potential of Oxford Nanopore Technologies' MinIon device for whole-genome sequencing and de novo assembly of large genomes, using the human and the tomato genome as examples.
The two groups — a consortium of laboratories from the UK, the US and Canada as well as a group led by researchers at RWTH Aachen University in Germany — reported their work in publications on the BioRxiv server this month.
The consortium that published the de novo assembly of the human genome previously released a human genome dataset at an Oxford Nanopore-sponsored meeting last year. Since then, though, the group has adopted the newer R9.4 chemistry.
The team used 39 flow cells to generate 91.2 gigabases of sequence data, or around 30x coverage. In addition, Josh Quick, a doctoral researcher in Nick Loman's lab at the University of Birmingham, developed an ultra-long read protocol for the MinIon. Using that method, the group generated an additional 5x coverage of the genome, obtaining an N50 read length of 99.7 kilobases.
The researchers sequenced the genome of GM12878, a well-studied human cell line. Five labs participated in the sequencing, using the latest R9.4 chemistry and the 1D sequencing protocol. The initial sequencing generated more than 14 million reads with a read N50 of 10.6 kilobases. On average, they obtained 2.3 gigabases of sequence per flow cell.
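The yield figures quoted above can be cross-checked with a few lines of Python. This is a back-of-the-envelope sketch, not part of the study; the 3.1-gigabase human genome size is an assumption introduced here, and the function name `fold_coverage` is hypothetical.

```python
def fold_coverage(total_bases, genome_size):
    """Average depth of coverage: sequenced bases divided by genome size."""
    return total_bases / genome_size


HUMAN_GENOME_BP = 3.1e9  # assumption: approximate haploid human genome size

total_bp = 91.2e9  # 91.2 gigabases, as reported in the article
flow_cells = 39    # as reported in the article

coverage = fold_coverage(total_bp, HUMAN_GENOME_BP)
per_cell_gb = total_bp / flow_cells / 1e9

print(round(coverage, 1))     # 29.4, i.e. "around 30x"
print(round(per_cell_gb, 2))  # 2.34, matching the ~2.3 gigabases per flow cell
```

Under that genome-size assumption, both reported numbers are consistent with each other.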
Basecalling was performed by Metrichor and assembly using Canu. The initial assembly was composed of 2,886 contigs with an N50 contig size of 3 megabases. When the researchers aligned the contigs to the GRCh38 reference genome, they showed it had a consensus accuracy of 95.2 percent. Polishing with Illumina sequencing improved consensus accuracy to 99.88 percent.
In addition, they reported that the genome was comparable to previous GM12878 assemblies, identifying similar numbers of structural variations. However, the nanopore assembly did have higher numbers of deletions, due primarily to accuracy issues in homopolymer regions.
Next, the researchers added the ultra-long reads to the assembly. To generate those longer reads, they used Oxford Nanopore's Rapid Run kit, but saturated it with high molecular weight DNA. The ultra-long reads had a read N50 of 99.7 kilobases, with the longest read reaching 882 kilobases. Adding these reads increased the contig N50 to 6.4 megabases. In addition, it enabled the MHC region to be captured in a single contig.
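For readers unfamiliar with the N50 statistic used throughout this article for both reads and contigs, here is a minimal Python sketch of the usual definition: sort the lengths from longest to shortest and report the length at which the running total first reaches half of all bases. The function name and the toy lengths are illustrative only and are not taken from either paper.

```python
def n50(lengths):
    """Length L such that sequences of length >= L contain at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy contig lengths (arbitrary units), chosen only to show the calculation.
contigs = [80, 70, 50, 40, 30, 20]
print(n50(contigs))  # 70: the two longest contigs (150) cover half of the 290 total
```

Because the statistic is weighted by bases rather than by sequence count, adding a modest number of very long reads or contigs can raise N50 sharply, which is the effect the ultra-long reads had on the assembly.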
Adam Phillippy, head of genome informatics at the National Human Genome Research Institute and an author of the study, said that it showed the potential of nanopore sequencing, in particular the long read protocol. The protocol yields molecules that are of "similar length as optical mapping [results]," he said, "but also gives base information, not just tag locations, that can let you get very continuous assemblies." Phillippy has worked with numerous sequencing technologies and recently collaborated with the NHGRI and the US Department of Agriculture team that assembled the goat genome de novo, using a combination of Pacific Biosciences technology, Hi-C sequencing, and Bionano Genomics' optical mapping technology.
The MinIon is "still limited by throughput," Phillippy said, but if it were possible to generate 30x coverage of the genome using the ultra-long read protocol, "you'd be able to have these really contiguous assemblies." In addition, he noted that the accuracy still lags behind that of PacBio. Corrected PacBio reads typically have greater than 99 percent accuracy, versus 92 percent for the nanopore reads in the BioRxiv paper, the authors wrote. Assembly with PacBio also still has a larger contig N50. A group that assembled a de novo Korean reference genome achieved a contig N50 of 17.9 megabases using only PacBio sequence data, for instance.
"PacBio currently gives longer contigs and has higher accuracy, but there's a lot of headroom for growth on the nanopore side," Phillippy said. In particular, the "prospect of continuity is great with the ultra-long reads" on the MinIon, he added.
Phillippy noted that one particular challenge for the MinIon is homopolymers. As demonstrated in the BioRxiv study, the basecallers struggle calling homopolymers longer than five bases in length. The researchers tested three different basecallers — Metrichor, Nanonet, and Scrappie — on a subset of reads that mapped to chromosome 20. Scrappie, a newer basecaller, does much better at calling homopolymers, Phillippy noted, which was an encouraging sign.
In a second BioRxiv paper, researchers from RWTH Aachen University demonstrated they could use the MinIon to assemble the tomato genome de novo. The genome is smaller than the human genome, at just over 1 gigabase. Similar to the group generating the human genome assembly, the researchers found that their assembly was "structurally highly similar to that of the reference" but that it had a "high error rate caused mostly by deletions in homopolymers." After polishing with Illumina data, they reduced that error rate and had a gene completeness of 96.53 percent, which "slightly surpassed" that of the reference genome.
The researchers sequenced the genome using 31 MinIon flow cells, generating around 111 gigabases of data that passed filter, representing around 100x coverage of the genome. The group had a wide range of output per flow cell, varying between 1.1 gigabases and 7.3 gigabases. Average read length also varied significantly, between 6.4 kilobases and 14.9 kilobases.
The team tested three assemblers: Canu, Miniasm, and SMARTdenovo. Miniasm generated the longest N50 and required the least compute time; however, it resulted in the highest error rate. In addition, the researchers tested the functional completeness of the assemblies using a tool called BUSCO, which looks for conserved genes. BUSCO estimated a gene completeness score of 0.21 percent, 26.46 percent, and 26.74 percent for Miniasm, Canu, and SMARTdenovo, respectively. Thus, moving forward, the researchers used the Canu assembler to pre-correct the original reads and then assembled the resulting data using SMARTdenovo. That generated an assembly consisting of 899 contigs with a contig N50 of 2.45 megabases. They then used Illumina sequencing for polishing and ultimately boosted the BUSCO gene completeness score to 96.53 percent.
Phillippy noted that one interesting aspect of basecalling on the MinIon is that the base accuracy of the final assembly seems to vary depending on the type of organism being sequenced. For instance, he said, the final accuracy for the human assembly was slightly lower than what he's achieved for microbial genomes. He said this could be due to the types of data that the basecallers are trained on. For instance, he believes that Oxford Nanopore primarily trained their basecallers on Escherichia coli data. Since basecallers use machine learning, they tend to perform a bit better when analyzing data types that are similar to the training sets. For instance, he said, different types of organisms tend to have different epigenetic profiles, so if a basecaller is used to mostly seeing unmethylated DNA, and then a highly methylated genome is sequenced, it could result in errors at those regions.
In general, though, he said the MinIon performs equally well on microbial, plant, and human genomes, and the basecalling differences seem to be small. Other variables can also impact basecalling accuracy, including DNA extraction, sample prep, and the overall quality of the DNA itself. However, he said, it would be interesting to look further into the idea of training basecallers on DNA from a diverse range of organisms.