MiSeq 2x250 – Does Length Really Matter?
Posted by Justin Johnson on Nov 16, 2012
At EdgeBio, we are always looking to improve efficiency, reduce costs, and provide a higher quality service to our clients. That takes an immense amount of R&D and New Technology Development (NTD). A great example is the series of five technical white papers we recently released on targeted resequencing applications, including exomes. You can find them here. Today I will share some of our experience with longer reads (2 x 250 bp) on the Illumina MiSeq instrument. In theory, you can get more data from a single prep with only a small increase in run time, but at some added cost. So, is it worth it?
I think the answer is “it depends.” For de novo assembly, longer reads provide slight improvements, but only after trimming a substantial number of bases. For mapping-based applications, longer reads do not appear to provide significant improvements. We sequenced E. coli DH10B on the MiSeq at both 2x150 and 2x250 and analyzed the data through several mapping and de novo pipelines. Below is a summary of the results, with minimal interpretation.
QC of Reads
[Figures: read quality plots for the 2x150 and 2x250 runs]
One can see a marked drop in quality after bases 175-200 in the 250bp run. The 150bp run looks similar to many other HiSeq/MiSeq runs. The average base quality is a bit higher in the 150bp run (36) than in the 250bp run (34), but overall quality is good in both runs.
Mapping/Alignment Results
Sample | Reads (M) | Reads Aligned (M) | % Reads Aligned | Pairs (M) | Pairs Aligned (M) | % Pairs Aligned |
---|---|---|---|---|---|---|
Ecoli-150-BaseSpace | 26.020 | 24.848 | 95.50% | 13.010 | 12.690 | 97.54% |
Ecoli-150-EdgeBio | 26.020 | 25.727 | 98.87% | 13.010 | 12.726 | 97.82% |
Ecoli-250-BaseSpace | 18.928 | 18.147 | 95.88% | 9.464 | 8.593 | 90.80% |
Ecoli-250-EdgeBio | 18.928 | 18.507 | 97.78% | 9.464 | 8.768 | 92.65% |
The 250bp run produced fewer reads, but because the reads are longer, the total base yield is about the same or a bit higher. The number of Q30 bases is about the same. The difference in read throughput could reflect normal run-to-run variance, or it could mean that we haven’t yet nailed the magic number for cluster density on 250bp runs. We continue to optimize the 250bp run workflow as we begin to offer 2x250bp runs to our customers. Mapping of individual reads is similar between the two read lengths, but the pair mapping rate is roughly 5 to 7 percentage points lower for the longer reads.
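As a rough check on the yield comparison, the numbers in the mapping table can be turned into a nominal base yield and the alignment percentages recomputed. This is a minimal sketch, not the pipeline used for the post: it assumes the "Reads (M)" column counts individual reads and that every read is full length, so the yield is an upper bound before any trimming or quality filtering. The dictionary layout and function names are illustrative only.

```python
# Counts (in millions) copied from the EdgeBio rows of the mapping table.
runs = {
    "Ecoli-150-EdgeBio": {"read_len": 150, "reads_m": 26.020, "aligned_m": 25.727,
                          "pairs_m": 13.010, "pairs_aligned_m": 12.726},
    "Ecoli-250-EdgeBio": {"read_len": 250, "reads_m": 18.928, "aligned_m": 18.507,
                          "pairs_m": 9.464, "pairs_aligned_m": 8.768},
}


def nominal_yield_gb(reads_m, read_len):
    """Upper-bound yield in Gb, assuming every read is full length (an assumption)."""
    return reads_m * 1e6 * read_len / 1e9


def percent(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


summary = {}
for name, r in runs.items():
    summary[name] = {
        "yield_gb": nominal_yield_gb(r["reads_m"], r["read_len"]),
        "pct_reads_aligned": percent(r["aligned_m"], r["reads_m"]),
        "pct_pairs_aligned": percent(r["pairs_aligned_m"], r["pairs_m"]),
    }

for name, s in summary.items():
    print(f"{name}: {s['yield_gb']:.2f} Gb nominal, "
          f"{s['pct_reads_aligned']:.2f}% reads aligned, "
          f"{s['pct_pairs_aligned']:.2f}% pairs aligned")
```

Under those assumptions the 250bp run comes out at roughly 4.7 Gb nominal against roughly 3.9 Gb for the 150bp run. The usable difference is likely smaller once the low-quality tail of the 250bp reads is taken into account.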
If you are sequencing a targeted panel, such as an exome, I don’t imagine longer reads provide much additional insight. For RNA-Seq, gene fusion detection, and other applications that need to span repeats (such as de novo assembly), longer reads could be helpful.
De novo Assembly Results (CLC Bio)
Sample | N50 (Kb) | Max (Kb) | Mean (Kb) | Count |
---|---|---|---|---|
Ecoli-150-CLC (UT) | 95.3 | 326.3 | 35.7 | 126 |
Ecoli-150-CLC (T) | 107.6 | 326.3 | 40.5 | 111 |
Ecoli-150-CLC (UT-SS) | 95.3 | 326.3 | 35.4 | 127 |
Ecoli-250-CLC (UT) | 96.9 | 326.3 | 7.9 | 585 |
Ecoli-250-CLC (T) | 107.6 | 326.3 | 34.9 | 129 |
Ecoli-250-CLC (UT-SS) | 97.1 | 326.3 | 41.6 | 108 |
UT = Untrimmed
T = Trimmed
SS = Sub Sampled to 30X
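For readers who want to reproduce the sub-sampled (SS) condition, the number of reads needed for a target depth follows from coverage = reads x read length / genome size. The sketch below is only an illustration of that arithmetic: the genome size is an assumed approximate value for E. coli DH10B (about 4.69 Mb), the function names are hypothetical, and the post does not say how the sub-sampling was actually done.

```python
# Assumed approximate genome size for E. coli DH10B; replace with your reference length.
GENOME_SIZE_BP = 4_690_000


def coverage(n_reads, read_len, genome_size=GENOME_SIZE_BP):
    """Nominal fold coverage, assuming every read is full length."""
    return n_reads * read_len / genome_size


def reads_for_coverage(read_len, target_cov=30, genome_size=GENOME_SIZE_BP):
    """Smallest number of reads giving at least target_cov (integer ceiling division)."""
    return (target_cov * genome_size + read_len - 1) // read_len


def subsample_fraction(n_reads, read_len, target_cov=30, genome_size=GENOME_SIZE_BP):
    """Fraction of reads to keep to reach target_cov, capped at 1.0."""
    return min(1.0, target_cov / coverage(n_reads, read_len, genome_size))


for read_len, n_reads in ((150, 26_020_000), (250, 18_928_000)):
    print(read_len,
          f"{coverage(n_reads, read_len):.0f}x nominal,",
          reads_for_coverage(read_len), "reads for 30x,",
          f"keep fraction {subsample_fraction(n_reads, read_len):.4f}")
```

When sub-sampling paired-end data, both mates of a pair should be kept or dropped together, so in practice the fraction is applied to pairs rather than to individual reads.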
In my opinion, the interpretation of these results is rather straightforward. By trimming aggressively for de novo applications, there is a small gain in N50 (2 Kb) and a slight reduction in contig number. One therefore has to decide whether the bump in statistics is worth the extra cost in run time and reagents.
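Since the comparison leans on N50, here is a minimal sketch of how that statistic is commonly computed: sort contigs from longest to shortest and report the length at which the running total first reaches half of the assembly size. The contig lengths below are toy values, not the contigs from these assemblies, and CLC Bio's own reporting may differ in details such as minimum contig length.

```python
def n50(lengths):
    """Length of the contig at which the cumulative sum, taken from the
    longest contig down, first reaches half of the total assembly size."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one contig length")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


def mean_length(lengths):
    """Arithmetic mean contig length."""
    return sum(lengths) / len(lengths)


# Toy data: four large contigs, then the same set plus 50 tiny contigs.
large = [100_000, 90_000, 60_000, 40_000]
with_small = large + [300] * 50

print(n50(large), round(mean_length(large)))
print(n50(with_small), round(mean_length(with_small)))
```

In the toy data, adding many tiny contigs leaves N50 unchanged while the mean collapses. That is one plausible reading of the untrimmed 250bp row, where N50 holds steady but the contig count rises and the mean drops, though the post itself does not confirm the cause.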
Summary
Overall, given that we use the MiSeq predominantly to support clients doing targeted human resequencing, we will continue to recommend the 150bp kits. The trajectories of both the MiSeq and the Ion Torrent are exciting advancements in the field, but one must remain grounded in reality. Some, but not all, of the hype surrounding the scalability of these machines comes from extending read lengths. Stay tuned next week as we review the MiSeq cancer panel and compare it with our previous analysis of the Ion AmpliSeq Cancer panels.
- Justin
The data for this experiment can be downloaded from GenomeSpace:
https://dm.genomespace.org/datamanager/v1.0/file/Home/Public/EdgeBio/MiSeq