Please review our new paper: Sequencing mRNA from cryo-sliced Drosophila embryos to determine genome-wide spatial patterns of gene expression
It’s no secret to people who read this blog that I hate the way scientific publishing works today. Most of my efforts in this domain have focused on removing barriers to the access and reuse of published papers. But there are other things that are broken with the way scientists communicate with each other, and chief amongst them is pre-publication peer review. I’ve written about this before, and won’t rehash the arguments here, save to say that I think we should publish first, and then review. But one could argue that I haven’t really practiced what I preach, as all of my lab’s papers have gone through peer review before they were published.
No more. From now on we are going to post all of our papers online when we feel they’re ready to share – before they go to a journal. We’ll then solicit comments from our colleagues and use them to improve the work prior to formal publication. Physicists and mathematicians have been doing this for decades, as have an increasing number of biologists. It’s time for this to become standard practice.
Some ground rules. I will not filter comments except to remove obvious spam. You are welcome to post comments under your name or under a pseudonym – I will not reveal anyone’s identity – but I urge you to use your real name as I think we should have fully open peer review in science.
OK. Now for the paper, which is posted on arXiv, where it can be linked to and cited. We also have a copy here, in case you’re having trouble with figures on arXiv.
Peter A. Combs and Michael B. Eisen (2013). Sequencing mRNA from cryo-sliced Drosophila embryos to determine genome-wide spatial patterns of gene expression.
Several years ago a postdoc in my lab, Susan Lott (now at UC Davis), developed methods to sequence the RNAs from single Drosophila embryos. She was interested in looking at expression differences between males and females in early embryogenesis, and published a beautiful paper on that topic.
Although we were initially worried that we wouldn’t be able to get enough RNA from single embryos to get reliable sequencing results, it turns out we got more than enough. Each embryo yielded around 100 ng of total RNA, and we would end up loading only ~10% of the sample onto the sequencer. So it occurred to us that maybe we could work with material from pieces of individual embryos and thereby get spatial expression information on a genomic scale in a single quick experiment – an alternative to highly informative, but slow imaging-based methods.
I recruited a new biophysics student, Peter Combs, to work on slicing embryos with a microtome along the anterior-posterior axis and sequencing each of the sections to identify genes with patterned expression along the A-P axis. In typical PI fashion, I figured this would take a few weeks, but it ended up taking over a year to get right.
The major challenge was that, while a tenth of an embryo contains more than enough RNA to analyze by mRNA-seq, it turned out to be very difficult to shepherd that RNA successfully from a single cryosection to the sequencer. Peter was routinely failing to recover RNA and make libraries from these samples using methods that worked great for whole embryos. While there are various protocols out there claiming to analyze RNA from single cells, we were reluctant to use these amplification-based strategies.
The typical way people deal with loss of small quantities of nucleic acids during experimental manipulation is to add carrier RNA or DNA – something like tRNA or salmon sperm DNA. We didn’t want to do that, since we would just end up with tons of useless sequencing reads. So we came up with a different strategy – adding embryos from distantly related Drosophila species to each slice at an early stage in the process. This brought the total amount of RNA in each sample well above the threshold where our purification and library preparation worked robustly, and we could easily separate the D. melanogaster RNA we were interested in for this experiment from that of the “carrier” embryo. We could also avoid wasting sequencing reads by turning the carrier RNAs into an experiment of their own – in this case looking at expression variation between species.
With this trick, the method now works great, and the paper is really just a description of the method and a demonstration that accurate expression patterns can be recovered from individual cryosectioned embryos. The resolution here is not that great – we used 6 slices of ~60 µm each per embryo. But we’ve started to make smaller sections, and a back-of-the-envelope calculation suggests we can, with available sample handling and sequencing techniques, make up to 100 slices per embryo. This would be more than enough to see stripes and other subtle patterns missed in the current dataset.
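As a rough sanity check on those numbers (this is not the calculation from the paper), here is a minimal sketch of how much total RNA each slice would hold. It assumes the ~100 ng per embryo quoted above is spread evenly along the A-P axis, which is a simplification. The function name is made up for illustration.

```python
def rna_per_slice_ng(total_rna_ng: float, n_slices: int) -> float:
    """Total RNA per slice, ASSUMING RNA is spread evenly across slices."""
    if n_slices <= 0:
        raise ValueError("n_slices must be positive")
    return total_rna_ng / n_slices


# Approximate per-embryo yield quoted in the post.
TOTAL_RNA_NG = 100.0

# Compare the current 6-slice design with the hoped-for 100 slices.
for n in (6, 100):
    print(f"{n} slices: ~{rna_per_slice_ng(TOTAL_RNA_NG, n):.1f} ng per slice")
```

Under this even-spread assumption, 100 slices would leave only about 1 ng of embryo RNA per slice, so the carrier trick would presumably matter even more at finer resolution.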
Our near-term goals are to do a developmental time course, compare patterns in male and female embryos, look at other species and examine embryos from strains carrying various patterning defects. For those of you going to the fly meeting in DC in April, Peter’s talk will, I hope, have some of this new data.
Anyway, we would love comments on either the method or the manuscript.
13 Comments
Excellent idea, with a convincing proof of principle. Can’t wait to see the follow-up paper looking at divergence in gene expression across these species.
I have a suggestion to improve the stability and reuse of the code related to the paper. The code is currently hosted at Peter’s personal github account, which is an unstable solution for long-term maintenance of published code, as I detail here: caseybergman.wordpress.com/2012/11/08/on-the-preservation-of-published-bioinformatics-code-on-github/ I would suggest creating a github “organization” for the Eisen Lab where you are the admin and you are the only person who has privileges to delete a repository, and host the repository for this paper from that account. This will ensure the code is managed for the long term and cannot be accidentally deleted.
A few minor things:
– you might want to cite the “Sequencing the Connectome” paper (www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001411) but emphasize that this is only a thought experiment.
– the following URL doesn’t resolve: eisenlab.org/sliceseq
– references don’t have volume or page numbers
A few questions:
1. How well do your replicate slices correlate? Is the relatively small number of DE genes because of high dispersions among replicates?
2. Did you try to select embryos that are roughly the same size (350 µm)? Or do they not vary enough to warrant this?
3. How did you translate the imaging data into absolute expression values if you only had 95 genes to work with? How did you determine the percentage of the total embryo library that is represented in these 95 genes?
Great idea and a very interesting paper.
I have a suggestion. Spatial expression patterns have been mostly characterized by in situ hybridization of transcripts. This technique performs poorly at distinguishing between alternative transcripts. Your approach could be very powerful for detecting such differences (as deep sequencing is great at that). Is it possible to detect which isoform is being expressed in your sets?
It may not make much sense right now, but as you get more time points and different species it may be possible to detect differences in mRNA processing.
Is it necessary to proceed to formal publication after this process? You will have accomplished the same goal of review and dissemination in a transparent fashion. I only ask because many institutional libraries see this (publishing on the library’s arXiv-like service followed by open peer-review) as a viable alternative, not simply a better order to publication. Off-hand it seems the main benefit of publishing formally (in this example) is layout and design support, but you could use the $1 to $3k to accomplish that more efficiently I am sure.
I think this is a fantastic idea and I liked the paper. I have a couple of questions about the method for the assignment of reads after pooling RNA. Clearly, pooling RNA from a useful source rather than using tRNA is a great idea because you get more useful data, but I think it would be worthwhile (and probably not too difficult) to test the accuracy of your method.
It would be interesting to first take mRNASeq data from a D. melanogaster embryo and map it to the combined fly reference genomes to measure the species misclassification rate for melanogaster mRNASeq. Then, if you have mRNASeq data from D. persimilis, D. willistoni, or D. mojavensis embryos, you could map these reads to the combined reference genomes to find the misclassification rates for the non-melanogaster species. If there is no available mRNASeq data for embryos from these species, you could use simulated mRNASeq data. Next, to measure false negative rates, you could test how frequently reads derived from one species are called ambiguous by mapping non-pooled D. melanogaster embryo mRNASeq reads to the combined reference genomes. Reads that were mapped when they were aligned to only the melanogaster genome but placed in the ambiguous category when aligned to the combined reference would be the relevant reads in this case. I wonder if some well-conserved genes may have many ambiguous reads.
The most interesting experiment to try would be combining two datasets where you know the source of the mRNASeq data (e.g. non-pooled mRNASeq from D. melanogaster and D. persimilis). Based on the alignments of these reads, you could measure the species misclassification rates as well as false negative rates (due to reads that were initially mapped to one genome being placed in the ambiguous category). Obviously the more interesting questions here relate to the high-resolution spatial dynamics of gene expression in Drosophila embryos, but I still think that more closely examining your method would be valuable.
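The bookkeeping for the test proposed above could look something like the sketch below. It assumes you already have, for a set of reads known to come from one species, the label each read received after alignment to the combined reference (a species name, or "ambiguous"). The function name, the label strings, and the toy counts are all made up for illustration and are not from the paper.

```python
from collections import Counter


def classification_rates(assigned_labels, true_species):
    """Fractions of reads called correctly, called ambiguous, or assigned
    to the wrong species, for reads all known to come from true_species."""
    counts = Counter(assigned_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads supplied")
    correct = counts[true_species]
    ambiguous = counts["ambiguous"]
    wrong = total - correct - ambiguous
    return {
        "correct": correct / total,
        "ambiguous": ambiguous / total,      # the "false negative" rate above
        "misclassified": wrong / total,      # assigned to another species
    }


# Toy labels (hypothetical): 100 reads from a pure D. melanogaster library.
labels = ["mel"] * 95 + ["ambiguous"] * 4 + ["per"] * 1
print(classification_rates(labels, "mel"))
```

Running the same function on a pure library from each carrier species would give the per-species rates for the other direction.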
@Damian:
1. Generally pretty well, but not perfectly. Supplemental Figure 3 has the data for each embryo plotted individually. I’ll have to think about what the best metric is for this replicability, possibly a correlation for each equivalent slice.
2. This strain doesn’t seem to vary enough to warrant explicit selection, but I’ve definitely seen embryos that give more or fewer slices. Going forward, I think finer slices will obviate the need to do really careful selection, even if it does make explicit matching of the slices one-to-one somewhat harder.
3. For the absolute expression values, I used the whole-embryo mRNA-seq data from Lott et al 2011. I assumed that the in situ intensity was proportional to the contribution to total FPKM. For example, if we have a gene with average FPKM of 50, then you would divide up that 50 FPKM according to the total amount of in situ intensity in each of the virtual slices. There is the tacit assumption that the total amount of RNA across all slices is the same, and we’ve been thinking about how best to do a spike-in that would actually test that, given that even an extra nanogram of total RNA in one sample or another could lead to pretty large variations in the amount we have.
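One possible reading of that apportioning step, as a minimal sketch: scale each virtual slice's share of the in situ intensity so that the mean across slices equals the whole-embryo FPKM. The scaling choice, the function name, and the intensity values are assumptions for illustration. The actual procedure in the manuscript may differ.

```python
def apportion_fpkm(mean_fpkm, slice_intensities):
    """Split a whole-embryo FPKM across virtual slices in proportion to
    in situ intensity, ASSUMING equal total RNA in every slice, so that
    the mean over slices equals mean_fpkm."""
    total_intensity = sum(slice_intensities)
    if total_intensity <= 0:
        raise ValueError("total in situ intensity must be positive")
    n_slices = len(slice_intensities)
    return [mean_fpkm * n_slices * x / total_intensity for x in slice_intensities]


# Hypothetical intensities for a gene expressed in a broad central domain.
intensities = [0, 1, 3, 4, 2, 0]
per_slice = apportion_fpkm(50, intensities)
print(per_slice)                        # per-slice FPKM estimates
print(sum(per_slice) / len(per_slice))  # mean across slices
```

With these toy intensities the six slices get 0, 30, 90, 120, 60 and 0 FPKM, which average back to 50.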
@Antonio
We’ve looked at differential isoform usage, and couldn’t find any slam-dunk cases. I tried both Cufflinks and DEXSeq, and while they each flagged some isoforms as differentially used (as I recall, not the same genes), when we actually looked at the read pileups, it’s not clear that they weren’t due to sequencing artifacts. There may be something in there, but I can’t find it.
Very cool paper! The following crazy idea occurred to me as a way to discriminate the other two axes of the embryo. You could lay each slice in a consistent orientation onto a substrate that has had deposited onto it a two-dimensional grid of distinct indexing oligos, and then do an in situ ligation reaction to ligate the indexing oligos to the embryonic mRNA. Once the ligation is done, then you sequence the entire slice’s mRNA together, but each mRNA molecule is associated with its position in the grid by virtue of being ligated to the indexing oligo corresponding to that grid position.
There would be a lot of technicalities to work out: Would you do RNA ligation? Would you fragment the mRNA before ligation, maybe by treating the slice with alkaline solution? How would you keep the mRNA from diffusing around the slice once you thaw it? Maybe reversibly crosslinking the RNA to protein in the slice and then later uncrosslinking it after ligation?
Anyway, very nice paper, and something to think about as a way to access the other two axes of the embryo.
Thanks for the feedback everyone! I’m traveling with my family, so it will take me a few days to post responses.
Congratulations, very nice work and forward-thinking philosophy as well.
I too am interested in your method of assigning a position to the slices by reference to the BDTN database and am curious as to its ability to account for differences in length between embryos. The fixation procedure of course introduces a large variation in this quantity, but it seems like your method has a lot of potential to give an unbiased genome-wide accounting of the number of transcripts whose expression varies according to relative position along the A-P axis as opposed to those which are positioned some absolute distance from either pole. I imagine there would be significant implications for understanding the origin of things like allometric scaling.
From the manuscript, it sounds like you used the FPKM values calculated by Cufflinks in each library or section for differential expression analyses. You can get a different FPKM from altering the expression level of the gene, or the levels of other genes. FPKM also ignores the amount of starting material when calculating differential expression. Did you use any sort of normalization across libraries? Assuming that an equivalent amount of control RNA from the distant species was used per slice, you should be able to use the carrier RNA (or selected genes over a range of expression levels) as a spike-in control.
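To make the FPKM point concrete, here is a small sketch using the standard definition (fragments per kilobase of transcript per million mapped fragments). The gene length and read counts are invented for illustration. The gene's own fragment count is identical in both libraries, and only the rest of the library changes.

```python
def fpkm(gene_fragments, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return gene_fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


# Hypothetical gene: 2 kb long, 1000 fragments in both libraries.
baseline = fpkm(1000, 2000, 1_000_000)

# Same gene, but other genes are more abundant, doubling the library total.
others_up = fpkm(1000, 2000, 2_000_000)

print(baseline, others_up)
```

The gene's FPKM halves (500 to 250) even though its own fragment count did not change, which is why some external reference such as a spike-in is needed to compare absolute levels between slices.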
With a few million reads per library (or ~10M reads when pooled), you can get some splicing and mRNA processing information (at least for higher expressed genes). Did you notice differences in 3’UTR usage or splicing across sections? For example, maybe a gene overall doesn’t display localized expression while certain isoforms of it do?
Michael,
We have created a page on PubPeer.com where you could continue this discussion of your results. PubPeer was specifically designed for the discussion of published data (whether on blogs or in journals). Continuing the discussion there would allow the comments and replies to appear in a more organized and parsable format and it would be alongside discussions of similar articles.
Here is the link to this article:
www.pubpeer.com/publications/AF86609927B0JHK654SGDSDDS8603BD67B2B4CAEBDC
@SF
After some looking into it, the species mis-classification rate is pretty low (less than 2%), but given the relatively small amount of melanogaster RNA compared to the carrier, this could potentially have an effect (I haven’t yet looked into which genes those reads are falling into). Being conservative about the paired ends—requiring both to map unambiguously—can cut this down by an order of magnitude, though. I’ll re-run this in the next few days and see if it makes a difference. Given that there’s already a good match to the in situ data, I’d be surprised if it changes any of our conclusions.
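A sketch of what such a conservative paired-end rule might look like, assuming each mate has already been reduced to the set of species whose genome it aligned to. The function name and species labels are hypothetical. Requiring the two mates to agree on the species is an added assumption, not something stated above.

```python
def assign_pair(mate1_species, mate2_species):
    """Assign a read pair to a species only if BOTH mates hit exactly one
    species and it is the same one; otherwise call the pair ambiguous."""
    if len(mate1_species) == 1 and mate1_species == mate2_species:
        return next(iter(mate1_species))
    return "ambiguous"


print(assign_pair({"mel"}, {"mel"}))          # both mates unique to one species
print(assign_pair({"mel"}, {"mel", "per"}))   # second mate hits two genomes
print(assign_pair({"mel"}, {"per"}))          # mates disagree
```

Only the first pair is kept as melanogaster. The other two are set aside as ambiguous, which trades some reads for a lower misclassification rate.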
@Jeff
That’s certainly interesting, but given the consistency of our embryo sizes, and the thickness of our slices in this data set, it’s not really possible to distinguish relative vs absolute positioning. The finer slices we’re getting now, and looking at different species could both help with that, and we’re hard at work getting a better data set up and running.
@Jason
It’s not clear what the right normalization to do is. Looking at a set of housekeeping genes would be doable, assuming we really trust whatever set that is; I haven’t found one that would make me sleep well at night. We also can’t use the carrier, since we added similar (but not identical even to our measurement accuracy) amounts to each slice, and we also added it too late in the extraction to buffer against subtly different handling affecting the total yield. This is definitely something that we’re following up on, though!
Hi Mike,
I am full of admiration for your move in publishing your manuscript. A slightly different question. Given what you have been saying about the problems of scientific publishing, would you also consider submitting these new manuscripts only to open access journals such as PLOS ONE, PeerJ or similar ones that do not select articles by perceived importance? If an accomplished scientist like you did this, it would help the scientific community wise up and might change the bad habit of using glamorous publications to evaluate research quality.