MiSeq 2x250 – Does Length Really Matter?

Posted by Justin Johnson on Nov 16, 2012

At EdgeBio, we are always looking for ways to improve efficiency, reduce costs, and provide higher-quality service to our clients. That requires a substantial amount of R&D and New Technology Development (NTD). A good example is the series of five technical white papers we recently released on targeted resequencing applications, including exomes. You can find them here. Today I will share some of our experience with longer reads (2 x 250 bp) on the Illumina MiSeq. In theory, longer reads give you more data from a single prep with only a modest increase in run time, but at some added cost. So, is it worth it?

I think the answer is "it depends." For de novo assembly, longer reads provide slight improvements, but only after trimming a substantial number of bases. For mapping-based applications, longer reads do not appear to provide significant improvements. We ran one 2x150 and one 2x250 MiSeq run of E. coli DH10B and analyzed the data through several mapping and de novo pipelines. Below is a summary of the results, with minimal interpretation.

QC of Reads

Per-base quality plots for the 2x150 and 2x250 runs (figures not reproduced here).

There is a marked drop in quality after base positions 175-200 in the 250bp run. The 150bp run looks similar to many other HiSeq/MiSeq runs. The average quality across all bases is slightly higher in the 150bp run (Q36) than in the 250bp run (Q34), but overall quality is good in both runs.
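To put Q36 vs. Q34 in perspective, the Phred scale defines Q = -10 log10(p), where p is the probability that a base call is wrong, so those averages correspond to roughly 2.5 vs. 4.0 expected errors per 10,000 bases. The sketch below shows that conversion, plus a deliberately naive 3' trim that drops trailing bases below a quality threshold. The threshold of 20 and the example read are assumptions for illustration; this is not the trimming procedure used for the results in this post, whose parameters are not stated.

```python
from typing import Sequence


def phred_to_error(q: float) -> float:
    """Probability that a base call is wrong, from Q = -10 * log10(p)."""
    return 10 ** (-q / 10)


def trim_3prime(quals: Sequence[int], min_q: int = 20) -> int:
    """Return how many bases to keep after removing trailing bases below min_q.

    Assumes qualities are already decoded to integers (Phred+33 offset removed).
    min_q=20 is an arbitrary illustrative default, not a recommended setting.
    """
    end = len(quals)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return end


if __name__ == "__main__":
    for q in (36, 34):
        print(f"Q{q}: error probability ~ {phred_to_error(q):.2e}")
    # Hypothetical 250-cycle read whose quality collapses after base 180.
    read = [35] * 180 + [15] * 70
    print("keep", trim_3prime(read), "of", len(read), "bases")
```

Real trimmers usually use a sliding window or running-sum approach so that a single low-quality base in the middle of a read does not stop or trigger trimming; the loop above only inspects the 3' tail.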

Mapping/Alignment Results

Sample	Reads (M)	Reads aligned (M)	% Reads aligned	Pairs (M)	Pairs aligned (M)	% Pairs aligned
Ecoli-150-BaseSpace	26.020	24.848	95.50%	13.010	12.690	97.54%
Ecoli-150-EdgeBio	26.020	25.727	98.87%	13.010	12.726	97.82%
Ecoli-250-BaseSpace	18.928	18.147	95.88%	9.464	8.593	90.80%
Ecoli-250-EdgeBio	18.928	18.507	97.78%	9.464	8.768	92.65%

The 250bp run produced fewer reads, but because each read is longer, its total base yield is about the same or a bit higher. The number of Q30 bases is about the same. The difference in read count could be normal run-to-run variance, or it could mean that we have not yet nailed the optimal cluster density for 250bp runs. We continue to optimize the 250bp workflow as we begin to offer 2x250bp runs to our customers. Single-read mapping rates are similar between the two read lengths, but the proportion of aligned pairs is roughly 5 to 7 percentage points lower with the longer reads.
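The yield and pair-mapping statements can be checked directly against the mapping table. The sketch below multiplies read count by read length and subtracts the pair-alignment percentages; it assumes every read is full length (150 or 250 bases), so the yields are upper bounds on raw output rather than post-trimming figures.

```python
# Values copied from the mapping table: read length, total reads (millions),
# and % pairs aligned for (BaseSpace, EdgeBio) pipelines.
RUNS = {
    "2x150": (150, 26.020, (97.54, 97.82)),
    "2x250": (250, 18.928, (90.80, 92.65)),
}


def yield_gb(read_len: int, reads_m: float) -> float:
    """Total bases in gigabases: reads (millions) x read length.

    Assumes all reads are untrimmed and full length.
    """
    return reads_m * 1e6 * read_len / 1e9


if __name__ == "__main__":
    for name, (length, reads, _) in RUNS.items():
        print(f"{name}: {yield_gb(length, reads):.2f} Gb")
    # Percentage-point drop in aligned pairs, 2x150 minus 2x250, per pipeline.
    drops = [a - b for a, b in zip(RUNS["2x150"][2], RUNS["2x250"][2])]
    print("pair-alignment drop (points):", [round(d, 2) for d in drops])
```

This gives about 3.90 Gb for the 2x150 run and 4.73 Gb for the 2x250 run, and a pair-alignment drop of about 6.7 points (BaseSpace) and 5.2 points (EdgeBio).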

If you are sequencing a targeted panel, such as an exome, I don't expect longer reads to provide much additional insight. For RNA-Seq, gene fusion detection, and other applications that need to span repeats (such as de novo assembly), longer reads could be helpful.

De novo Assembly Results (CLC Bio)

Sample	N50 (Kb)	Max (Kb)	Mean (Kb)	Contig count
Ecoli-150-CLC (UT)	95.3	326.3	35.7	126
Ecoli-150-CLC (T)	107.6	326.3	40.5	111
Ecoli-150-CLC (UT-SS)	95.3	326.3	35.4	127
Ecoli-250-CLC (UT)	96.9	326.3	7.9	585
Ecoli-250-CLC (T)	107.6	326.3	34.9	129
Ecoli-250-CLC (UT-SS)	97.1	326.3	41.6	108

UT = Untrimmed

T = Trimmed

SS = Sub Sampled to 30X
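The post does not say which tool was used to subsample the reads to 30X. One straightforward way is to compute the raw depth (total bases divided by genome size), derive the fraction of read pairs needed for the target depth, and keep each pair at random with that probability. In the sketch below the genome length constant is an assumption (the published E. coli DH10B genome is about 4.69 Mb), and the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# Assumption: E. coli DH10B reference length (~4.69 Mb). Substitute the exact
# size of the reference you actually use.
GENOME_SIZE_BP = 4_686_137


def coverage(reads: int, read_len: int, genome_bp: int = GENOME_SIZE_BP) -> float:
    """Expected mean depth = total sequenced bases / genome size."""
    return reads * read_len / genome_bp


def subsample_fraction(reads: int, read_len: int, target_x: float = 30.0,
                       genome_bp: int = GENOME_SIZE_BP) -> float:
    """Fraction of reads (or pairs) to keep so expected depth is ~target_x."""
    return min(1.0, target_x / coverage(reads, read_len, genome_bp))


def subsample_pairs(pairs: Sequence[Tuple[str, str]], fraction: float,
                    seed: int = 42) -> List[Tuple[str, str]]:
    """Keep each read pair independently with probability `fraction`.

    Sampling whole pairs (not single reads) keeps mates together; the
    fixed seed makes the subsample reproducible.
    """
    rng = random.Random(seed)
    return [pair for pair in pairs if rng.random() < fraction]


if __name__ == "__main__":
    # Total read counts (both mates) from the mapping table.
    for name, reads, length in (("2x150", 26_020_000, 150),
                                ("2x250", 18_928_000, 250)):
        depth = coverage(reads, length)
        frac = subsample_fraction(reads, length)
        print(f"{name}: ~{depth:.0f}X raw, keep ~{frac:.3f} of pairs for 30X")
```

Under these assumptions both runs have several hundred to roughly a thousand X of raw coverage, so only about 3 to 4 percent of the pairs are needed for 30X. Because each pair is kept independently, the final depth is close to, not exactly, the target.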

In my opinion, the interpretation of these results is rather straightforward. Trimming aggressively for de novo applications yields a small gain in N50 (about 2 kb) and a slight reduction in contig number. Each lab will therefore have to decide for itself whether that bump in statistics justifies the extra cost in run time and reagents.
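Since N50 carries most of the weight in this comparison, here is a minimal sketch of the standard definition: sort contigs from longest to shortest and report the length at which the running total first reaches half of the assembly. It also computes the other columns of the CLC table (max, mean, count). This follows the common textbook definition; it is not a reproduction of CLC's own code, and the example contig lengths are hypothetical.

```python
from typing import Dict, Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Largest length L such that contigs of length >= L hold >= 50% of all bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("n50() needs at least one contig")


def assembly_stats(contig_lengths: Sequence[int]) -> Dict[str, float]:
    """N50, max, mean and count, matching the columns of the CLC table."""
    if not contig_lengths:
        raise ValueError("assembly_stats() needs at least one contig")
    return {
        "n50": n50(contig_lengths),
        "max": max(contig_lengths),
        "mean": sum(contig_lengths) / len(contig_lengths),
        "count": len(contig_lengths),
    }


if __name__ == "__main__":
    # Hypothetical contig lengths in kb, for illustration only.
    print(assembly_stats([100, 80, 50, 30, 20, 10]))
```

The example illustrates why N50 and mean can move independently: many short contigs (as in the untrimmed 250bp assembly with 585 contigs) drag the mean down sharply while barely affecting N50, which is dominated by the longest contigs.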

Summary

Overall, given that we use the MiSeq predominantly to support clients doing targeted human resequencing, we will continue to recommend the 150bp kits. The MiSeq and the Ion Torrent are both advancing in exciting ways, but one must remain grounded in reality: some, but not all, of the hype surrounding the scalability of these machines comes from extending read lengths. Stay tuned next week as we review the MiSeq cancer panel and compare it with our previous analysis of the Ion AmpliSeq Cancer panels.

- Justin

The data for this experiment can be downloaded from GenomeSpace:

https://dm.genomespace.org/datamanager/v1.0/file/Home/Public/EdgeBio/MiSeq
