Lab Notebook

Carl Boettiger

Entries

Notes From The Week

08 Mar 2013

(Working from Galveston this week while at Louise’s conference, looking after the little one. Some progress below.)

Monday

Alan Skype meeting

Comment piece / reply
ews-review
Comments on TE review?
IARPA Conference, funding options?

Tasks

On Marc’s suggestion, writing a comparison of pattern-based and mechanism-based approaches to modeling, particularly in a decision-theory context. Draft in nonparametric-bayes.
Review edits and writing on ews-review

Tuesday

Kathy Skype

Decision theory in early warning signals?
NCEAS / SESYNC working group?
Long-term lake monitoring data, including some sedimentary analysis (see Wang et al. (2012)).
Send Kathy TREE articles: Polasky et al. (2011) and Fischer et al. (2009).

Tasks

Writing a comparison of pattern-based and mechanism-based approaches to modeling, particularly in a decision-theory context
Review edits and writing on ews-review

Wednesday

more of the same.

Friday

Finished and submitted ews-review.
See commit log and issues tracker for details.

Notebook: Semantics

Stemming-based search seems to perform poorly: it cannot find phrases or weight terms in titles, and just picks up the most frequently occurring terms in a post. Trying lunr, which loads the full text into memory; perhaps it should do so only when the search bar is selected? See issue #7.
Modified the google-analytics plugin to make a single call to the API; see issue #66.
Tuning RDFa, primarily on vita (#22, #63). Tuned FOAF referencing to common nodes. Note that rel always applies to links, while property applies to the content of the tag (e.g. anchor text, span content). Use rel in span and div elements, etc., to apply to multiple links.
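A toy sketch of the lazy-loading idea from the search note: build the index only on the first query (for instance, when the search bar is first selected) and weight title matches above body matches. This is not lunr's API; makeLazySearch, TITLE_BOOST and the { id, title, body } post shape are hypothetical, and there is no stemming or phrase matching here.

```javascript
// Toy illustration (not lunr): build a search index lazily on first use and
// boost matches in post titles. TITLE_BOOST and the post shape are assumptions.
const TITLE_BOOST = 10;

// Lowercase and split on anything that is not a letter or digit (no stemming).
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Count how often each term occurs in a list of tokens.
function termCounts(tokens) {
  const counts = new Map();
  for (const t of tokens) counts.set(t, (counts.get(t) || 0) + 1);
  return counts;
}

function buildIndex(posts) {
  return posts.map((post) => ({
    id: post.id,
    title: termCounts(tokenize(post.title)),
    body: termCounts(tokenize(post.body)),
  }));
}

// loadPosts is only called on the first search, so the full text of the
// notebook is not held in memory until someone actually searches.
function makeLazySearch(loadPosts) {
  let index = null;
  return function search(query) {
    if (index === null) index = buildIndex(loadPosts());
    const terms = tokenize(query);
    return index
      .map((entry) => {
        let score = 0;
        for (const t of terms) {
          score += TITLE_BOOST * (entry.title.get(t) || 0) + (entry.body.get(t) || 0);
        }
        return { id: entry.id, score };
      })
      .filter((hit) => hit.score > 0)
      .sort((a, b) => b.score - a.score)
      .map((hit) => hit.id);
  };
}
```

Binding the first call to the search bar's focus event is left out; the point is only that the cost of loading the full text is deferred until it is needed.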

References

Joern Fischer, Garry D. Peterson, Toby A. Gardner, Line J. Gordon, Ioan Fazey, Thomas Elmqvist, Adam Felton, Carl Folke, Stephen Dovers, (2009) Integrating Resilience Thinking And Optimisation For Conservation. Trends in Ecology & Evolution 24 10.1016/j.tree.2009.03.020
Stephen Polasky, Stephen R. Carpenter, Carl Folke, Bonnie Keeler, (2011) Decision-Making Under Great Uncertainty: Environmental Management in an Era of Global Change. Trends in Ecology & Evolution 26 10.1016/j.tree.2011.04.007
Rong Wang, John A. Dearing, Peter G. Langdon, Enlou Zhang, Xiangdong Yang, Vasilis Dakos, Marten Scheffer, (2012) Flickering Gives Early Warning Signals of A Critical Transition to A Eutrophic Lake State. Nature 492 10.1038/nature11655

Delayed Release Archives

02 Mar 2013

Today I dug out a handful (37) of posts from the past two years that were published as #delayed-release content (and have all been tagged as such to indicate they were not public at their publication date). Many of these were connected to the #warning-signals project but involved results that Alan or I weren’t ready to share ahead of publication, as I discussed when I first began marking some posts for delayed release (challenges with collaboration in open science). These posts were created while on the WordPress platform and had been marked private, so that they would be invisible to public browsing. WordPress makes this easy to do, as I discuss in the earlier post. When I switched to a Jekyll platform, these posts all received the YAML header published: false and were also excluded from the GitHub repository until I could revisit them and decide which to release. Managing these posts for release in the Jekyll framework just requires a little command-line fu: grep -l 'published: false' *.md grabs the relevant posts for inspection and release.
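The grep one-liner matches 'published: false' anywhere in a file, including the body of a post that merely mentions the setting. A hedged sketch that restricts the check to the YAML front matter and flips the flag on release; unpublishedPosts, release and the in-memory { filename: contents } input are hypothetical, and reading and writing the files on disk is left out.

```javascript
// Sketch of the release step for Jekyll posts. Unlike a plain grep, this only
// looks at the YAML front matter between the leading '---' lines, so a post
// that merely mentions "published: false" in its text is not picked up.
function frontMatter(text) {
  const m = text.match(/^---\r?\n([\s\S]*?)\r?\n---/);
  return m ? m[1] : "";
}

// Matches a whole front-matter line "published: false" (spaces/tabs allowed).
const UNPUBLISHED = /^published:[ \t]*false[ \t]*$/m;

function isUnpublished(text) {
  return UNPUBLISHED.test(frontMatter(text));
}

// files: { filename: fileContents }; returns the Markdown posts still held back.
function unpublishedPosts(files) {
  return Object.keys(files).filter(
    (name) => name.endsWith(".md") && isUnpublished(files[name])
  );
}

// Flip the flag in the front matter only; the post body is left untouched.
function release(text) {
  const fm = frontMatter(text);
  if (!UNPUBLISHED.test(fm)) return text;
  const released = fm.replace(UNPUBLISHED, "published: true");
  // A replacer function avoids '$' in the post being read as a special pattern.
  return text.replace(fm, () => released);
}
```

Setting the flag to true rather than deleting the line keeps an explicit record in each released post; tagging with #delayed-release would still be a separate step.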

The notebook is currently back to being an ONS All-content, immediate notebook, as I have been able to clear the practice with all current major collaborations. (Sure, there’s always stuff that doesn’t make it into the notebook or is not appropriate for the notebook – the most common case of the latter being peer reviews I write for various journals. Though I make a practice of signing my reviews, I believe they still represent private communication unless the authors, editors and journals were to agree to publish them.) All of these posts would have been released sooner had I found a minute to remember to do so – perhaps a cautionary example for the delayed-release practice.

Looking back on these posts, I probably would not have delayed most of them today, as I have become more comfortable not only with the open notebook, but with discussing the practice with collaborators. My own reluctance to push my adviser in discussions of the notebook was as responsible for these delays as the other factors I discuss in the earlier post. One key thing I wondered about in writing delayed-release posts was whether I would (perhaps subconsciously) write in a different or more candid style than I do for my regular notes. I was glad not to notice any glaring examples (though perhaps a diligent or very bored reader might find some!) of expletives and insulting commentary in the delayed posts. I do notice that the overall tone and volume of the text change slightly. However, that trend is much more pronounced when comparing my very early posts (mostly from the OWW days), which were quite discursive, with the current posts, which tend to be much more terse.

So have a look at the #delayed-release posts and let me know what you think!

Notes

28 Feb 2013

Meeting with Marc

selection of problem: consider simpler models only.

step through complexity hierarchy at the beginning

classical approaches fix the stock-recruitment relationship.
parameter uncertainty
Gaussian processes for model uncertainty

The Discussion section can deal with model uncertainty and model choice approaches.

A method where we don’t have to assume the stock-recruitment relationship
How well we can do with that.
How much data we need.

discussion of mathematical presentation, clarity

Misc

Finished review. I should stop signing reviews; then I could write more careless, quick impressions.
Discussed the potential of a GSoC proposal for an R NeXML parser.
Asked to consider presenting at an ISI Conference session on early warning.

some reading on semantics

David Shotton and Tanya Gray have released a CiTO tool for adding semantic data to many major publication platforms, taking a clever approach to annotation that I’d never considered: crowdsourcing to readers (davidshotton, 2013a). The tool isn’t perfect but is very intuitive. The clever part is that all annotations readers add are collated in a database and can be retrieved through their API. Not sure if many readers will make use of this, but time will tell. Meanwhile, authors or publishers might pay attention.

In that vein, David has also written a call for Ten Next Steps for publishers (davidshotton, 2013b). Some may be redundant, but the list certainly shows how much further publishers could go. If only there were more incentive for publishers to innovate like this instead of crafting clever and questionable strategies to boost impact factors.

Reading

Very interesting paper on implementing/teaching nonlinear model inference for ecologists out today (Bolker et al. 2013), along with the code and data for each of the examples they discuss. Lessons in the manuscript may be familiar to many, but should still be taught more widely in ecology. Should be useful for teaching this material in graduate courses too.

Benjamin M. Bolker, Beth Gardner, Mark Maunder, Casper W. Berg, Mollie Brooks, Liza Comita, Elizabeth Crone, Sarah Cubaynes, Trevor Davies, Perry de Valpine, Jessica Ford, Olivier Gimenez, Marc Kery, Eun Jung Kim, Cleridy Lennert-Cody, Arni Magnusson, Steve Martell, John Nash, Anders Nielsen, Jim Regetz, Hans Skaug, Elise Zipkin, (2013) Strategies For Fitting Nonlinear Ecological Models in R, AD Model Builder, And BUGS. Methods in Ecology And Evolution 10.1111/2041-210X.12044
davidshotton, (2013a) CiTO Reference Annotation Tools. Semantic Publishing semanticpublishing.wordpress.com/2013/02/26/cito-tools/
davidshotton, (2013b) Ten next steps for semantic authors and publishers. Semantic Publishing semanticpublishing.wordpress.com/2013/02/26/ten-next-steps/

Notes

27 Feb 2013

Reviewing

arg, reviewing. so much time.

Seminar

Kate Richerson gave a fantastic presentation to mega group yesterday showing her integration of behavior and dispersal in krill dynamics. She derives the krill’s behavioral movement patterns from a stochastic dynamic programming solution rather than dictating them to the dispersal model directly.

Reading

Provocative stuff in PNAS: Perretti et al. (2013). It would be nice to see the comparison against the generating model. The forecasting errors of fully Bayesian inference using the correct model shouldn’t necessarily be small, but they should be just as big as the Bayesian model expects them to be (e.g. over replicates, the Bayesian forecast error should reflect the widths of the posteriors, modulo some statement about priors).
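The expectation above (errors as big as the model expects, over replicates) can be checked directly. A minimal sketch, assuming each forecast is summarized as a Gaussian with a mean and standard deviation, which simplifies a full posterior predictive distribution; coverage95 and errorRatio are hypothetical helpers, not taken from Perretti et al.

```javascript
// Calibration check sketch: over replicate simulations, a well-calibrated
// forecast's errors should be about as large as its own predictive spread.
// Assumes each forecast is summarized as a Gaussian { mean, sd }.
const Z95 = 1.959964; // two-sided 95% standard normal quantile

// Fraction of replicates whose true value falls inside the nominal 95% interval.
function coverage95(forecasts, truths) {
  let hits = 0;
  forecasts.forEach(({ mean, sd }, i) => {
    if (Math.abs(truths[i] - mean) <= Z95 * sd) hits += 1;
  });
  return hits / forecasts.length;
}

// Observed RMSE divided by the RMSE the forecasts themselves predict.
// About 1: errors as big as expected. Well below 1: errors smaller than the
// model's own uncertainty, the case worth checking for hidden bias.
function errorRatio(forecasts, truths) {
  let squaredError = 0;
  let predictedVariance = 0;
  forecasts.forEach(({ mean, sd }, i) => {
    squaredError += (truths[i] - mean) ** 2;
    predictedVariance += sd ** 2;
  });
  return Math.sqrt(squaredError / predictedVariance);
}
```

Coverage near 0.95 and a ratio near 1 would indicate the correct model is behaving as it should; a competing method beating that benchmark by a wide margin is where the bias worry below applies.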

My worry with any approach doing better than the expected error is that it does so through biased estimation, i.e. something in the forecasting method, rather than in the data, happens to be biased towards the ‘right answer’. In this example I don’t think that is the case.

We already know that, for such a complex model with limited data, the posterior modes from an MCMC don’t recover the parameters of the generating model (though perhaps it is good to remind folks of this!), so it is not surprising that a heuristic approach can outperform it in some cases.

It certainly is good to wrestle with these examples in any case. Steve is one of my post-doc advisors now, so no doubt I’m biased. Noam also pointed out this earlier article in Ecology: Perretti et al. (2012).

In seminar, Marc mentioned the great paper by Wiedenmann et al. (2011), which compares the optimal foraging strategy whales evolved in their evolutionary environment with how that strategy performs today (similar in spirit to the kinds of comparisons I am interested in for the #multiple-uncertainty project).

Reading an old paper by Schaffer (1984).

Nature says the National Academy of Sciences needs an overhaul to stay relevant. Probably true, but they still produce some excellent reports that deserve more eyes, like this one on model fitting (Assessing the Reliability of Complex Models). We could do more VVUQ (verification, validation, and uncertainty quantification) in ecology and evolution, and the value of presenting benchmark examples of this in simple (trivial) cases cannot be overstated, e.g. Fig 5.2.1.
Gaussian process methods in Biometrika: Banerjee et al. (2012)
Luminaries of evolution forecast the future of the field, with an eye towards data and semantics, in Losos et al. (2013)

Misc

Wrote a quick CSL format for knitcitations (it generates inline citations using the copy function in the Mendeley library; a bit silly, since normally that would already generate the entire bibliographic citation).
Should really make a CSL format for Mendeley to paste in an HTML citation (with full metadata in a tooltip). Might then use an existing format to copy-paste bib info at the end.

References

A. Banerjee, D. B. Dunson, S. T. Tokdar, (2012) Efficient Gaussian Process Regression For Large Datasets. Biometrika 100 10.1093/biomet/ass068
Jonathan B. Losos, Stevan J. Arnold, Gill Bejerano, E. D. Brodie, David Hibbett, Hopi E. Hoekstra, David P. Mindell, Antónia Monteiro, Craig Moritz, H. Allen Orr, Dmitri A. Petrov, Susanne S. Renner, Robert E. Ricklefs, Pamela S. Soltis, Thomas L. Turner, (2013) Evolutionary Biology For The 21st Century. Plos Biology 11 10.1371/journal.pbio.1001466
Charles Thomas Perretti, George Sugihara, Stephan B. Munch, (2012) Nonparametric Forecasting Outperforms Parametric Methods For A Simulated Multi-Species System. Ecology 10.1890/12-0904.1
C. T. Perretti, S. B. Munch, G. Sugihara, (2013) Model-Free Forecasting Outperforms The Correct Mechanistic Model For Simulated And Experimental Data. Proceedings of The National Academy of Sciences 10.1073/pnas.1216076110
William M. Schaffer, (1984) Stretching And Folding in Lynx Fur Returns: Evidence For A Strange Attractor in Nature?. The American Naturalist 124 10.1086/284318
John Wiedenmann, Katherine A. Cresswell, Jeremy Goldbogen, Jean Potvin, Marc Mangel, (2011) Exploring The Effects of Reductions in Krill Biomass in The Southern Ocean on Blue Whales Using A State-Dependent Foraging Model. Ecological Modelling 222 10.1016/j.ecolmodel.2011.07.013
Assessing the Reliability of Complex Models: Mathematical and Statistical Foundations of Verification, Validation, and Uncertainty Quantification. www.nap.edu/catalog.php?record_id=13395

Semantic Citations For The Notebook And Knitr

22 Feb 2013

I have on occasion been exploring the use of semantic markup in the notebook. In this post I illustrate how I am handling semantic citations. One of the more intriguing ideas is the ability to add semantic meaning to citations through the CITO ontology of Shotton (2010). Citation counts form a central part of academic discourse, but contain very little information regarding the reason for the citation. Most notably, ‘negative’ citations refuting a claim carry just the same weight as those confirming or relying upon a claim. Given the scale and expansion of academic literature, it is rarely feasible to explore this citation graph manually. CITO provides a vocabulary for attaching the meaning of a citation, such as “discusses”, “refutes”, or “usesMethodIn”, to the citation itself. (For instance, my citation of Shotton above is marked as “usesMethodIn”, as I will explain.)

The main barrier to this approach is a lack of adoption. One of the primary concerns is the burden that adding the extra data places on authors. On the one hand, authors already bother to format and reformat layout, spelling, and reference order to the arcane specifications of different journals, which suggests they can be persuaded to do some pretty tedious tasks if publishers require it. After all, the task of adding citations is already much easier than it was in the days of paper journals. Still, it is much simpler to remove a tedious requirement than to add a new one. My hope is that intelligent tools can simplify this process, as they already have with other aspects of managing citations, and encourage the use of CITO. In this spirit, I have recently started trying to use the CITO ontology consistently in my notebook entries as a test case, using some tools of my own design.

Semantics in knitcitations

Several months ago I created the R package knitcitations to provide a citation platform for knitr dynamic documents, which combine executable code and automatic inclusion of results with plain-text (markdown) descriptions. I write most of my research scripts and many of my notebook entries this way. The package can generate citations by DOI, circumventing the need to maintain a BibTeX or similar database of citation information, using commands such as

citet("10.1186/2041-1480-1-S1-S6")

Extending the package to support CITO was rather straightforward. Using the latest version of knitcitations, one can generate in-line citations with CITO semantics simply by also passing the reason for the citation, such as

citet("10.1186/2041-1480-1-S1-S6", cito="usesMethodIn")

which generates the following HTML:

<a href='http://dx.doi.org/10.1186/2041-1480-1-S1-S6' property='http://purl.org/spar/cito/usesMethodIn' >Shotton (2010)</a>
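For readers who want the same pattern outside R, a hedged JavaScript sketch of the in-text markup: an anchor whose link target is the DOI resolver URL and whose RDFa attribute carries the CiTO term. This is not knitcitations code; citoLink, its attr option and the escaping helpers are hypothetical, and the resolver and CiTO base IRIs are the ones used in the markup here.

```javascript
// Sketch of the in-text citation markup in JavaScript; knitcitations itself is
// R, so this only mirrors the HTML pattern.
const DOI_RESOLVER = "http://dx.doi.org/";
const CITO_BASE = "http://purl.org/spar/cito/";

// Escape text for use inside a single- or double-quoted HTML attribute.
function escapeAttr(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/'/g, "&#39;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Escape text for use as element content.
function escapeText(s) {
  return String(s).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// doi: bare DOI; label: visible text such as "Shotton (2010)";
// cito: optional CiTO term; attr: RDFa attribute that carries the term.
function citoLink(doi, label, cito, attr = "property") {
  const semantics = cito ? ` ${attr}='${escapeAttr(CITO_BASE + cito)}'` : "";
  return `<a href='${escapeAttr(DOI_RESOLVER + doi)}'${semantics}>${escapeText(label)}</a>`;
}
```

The attr option exists because, as the Mar 8 entry notes, rel applies to the link target while property applies to the tag's content, so rel may be the better fit for a citation link; the default mirrors the markup shown here.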

This provides a convenient platform for generating semantic citations in this lab notebook. As before, knitcitations will also generate a complete reference list at the end of the document when the bibliography function is called there.

Semantic overkill?

It is possible to add far more semantic data to the reference list at the end of an article. Invisible semantic markup can identify to a machine which value corresponds to the volume number, issue number, or journal name, e.g. using the BIBO ontology. I have added support for this kind of markup to knitcitations as well, and several of my posts provide examples. The raw markup looks like this:

<div prefix="dc: http://purl.org/dc/terms/
             bibo: http://purl.org/ontology/bibo/
             foaf: http://xmlns.com/foaf/0.1/
             biro: http://purl.org/spar/biro/"
     rel="http://purl.org/spar/biro/ReferenceList"> <ul class='bibliography'>
<li> <span property="dc:title">Fisheries: Does Catch Reflect Abundance?.</span> <span property="dc:creator"> <span property="foaf:givenName">Daniel</span> <span property="foaf:familyName">Pauly</span>, </span><span property="dc:creator"> <span property="foaf:givenName">Ray</span> <span property="foaf:familyName">Hilborn</span>, </span><span property="dc:creator"> <span property="foaf:givenName">Trevor A.</span> <span property="foaf:familyName">Branch</span>, </span> (<span property="dc:date">2013</span>) <span rel="http://purl.org/dc/terms/isPartOf"
      resource="[http://purl.org/dc/terms/journal]">
    <span property="http://purl.org/dc/terms/title" content=" Nature "></span>
    <span property="bibo:shortTitle"> Nature </span>
  </span> <span property="bibo:volume">494</span> <a property="bibo:doi" href="http://dx.doi.org/10.1038/494303a">10.1038/494303a</a> </li>
<li> <span property="dc:title">Net Gains.</span> <span property="dc:creator"> <span property="foaf:givenName">unknown</span> <span property="foaf:familyName">unknown</span>, </span> (<span property="dc:date">2013</span>) <span rel="http://purl.org/dc/terms/isPartOf"
      resource="[http://purl.org/dc/terms/journal]">
    <span property="http://purl.org/dc/terms/title" content=" Nature "></span>
    <span property="bibo:shortTitle"> Nature </span>
  </span> <span property="bibo:volume">494</span> <a property="bibo:doi" href="http://dx.doi.org/10.1038/494282a">10.1038/494282a</a> </li>
</ul>
</div>

However, I have since decided that such markup is largely overkill. The DOI already uniquely identifies the publication, and allows us to programmatically retrieve the rest of the data (title, authors, journal, etc.) from semantically identified XML by querying services such as CrossRef. This is the essential concept of linked data, by which both source and referrer are enriched.
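A hedged sketch of that lookup. It swaps the XML route for DOI content negotiation, asking the DOI resolver for CSL-JSON, which CrossRef-registered DOIs support; metadataRequest, fetchCsl and formatReference are hypothetical helpers, and the output format only imitates this notebook's reference lists.

```javascript
// Sketch: retrieve citation metadata for a DOI via DOI content negotiation,
// asking the registration agency (e.g. CrossRef) for CSL-JSON.
const CSL_JSON = "application/vnd.citationstyles.csl+json";

function metadataRequest(doi) {
  // DOIs containing characters such as '#' or '?' would need escaping first.
  return { url: "https://doi.org/" + doi, headers: { Accept: CSL_JSON } };
}

// Requires a runtime with fetch (modern browsers, Node 18+).
async function fetchCsl(doi) {
  const { url, headers } = metadataRequest(doi);
  const res = await fetch(url, { headers });
  if (!res.ok) throw new Error(`DOI lookup failed: ${res.status}`);
  return res.json();
}

// Some sources give title / container-title as arrays, others as strings.
const first = (x) => (Array.isArray(x) ? x[0] : x) || "";

// Format a CSL-JSON record roughly in the style of this notebook's references.
function formatReference(csl) {
  const authors = (csl.author || [])
    .map((a) => [a.given, a.family].filter(Boolean).join(" "))
    .join(", ");
  const parts = csl.issued && csl.issued["date-parts"];
  const year = parts ? parts[0][0] : "n.d.";
  const volume = csl.volume ? ` ${csl.volume}` : "";
  return `${authors}, (${year}) ${first(csl.title)}. ${first(csl["container-title"])}${volume} ${csl.DOI}`;
}
```

Usage would be formatReference(await fetchCsl(doi)); keeping the request construction separate from the network call makes the formatting testable offline.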

Moreover, DOIs follow a specific construction that lets us reliably identify them in plain text using regular expressions, making any further semantics to declare that we are citing the article mostly irrelevant. This is convenient for identifying all citations appearing in the notebook without any markup. The CITO example above has the added advantage of providing a link and associating the DOI with the reason for the citation, by virtue of both being inside the same HTML anchor element.
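A minimal sketch of that identification step. The pattern is adapted from one CrossRef has published for modern DOIs (a "10." prefix with 4 to 9 digits, a slash, then a restricted character set); it covers the vast majority of DOIs but not every legacy one, and findDois is a hypothetical helper.

```javascript
// Find DOIs in plain text. The pattern is adapted from one CrossRef has
// published for modern DOIs; it covers most DOIs, not all.
const DOI_PATTERN = /\b10\.\d{4,9}\/[-._;()\/:A-Z0-9]+/gi;

function findDois(text) {
  const found = new Set(); // de-duplicate repeated citations, keep first-seen order
  for (const match of text.matchAll(DOI_PATTERN)) {
    // Trailing sentence punctuation is almost never part of the DOI itself.
    found.add(match[0].replace(/[.,;:]+$/, ""));
  }
  return [...found];
}
```

A DOI immediately followed by a closing parenthesis would keep the parenthesis, since parentheses are legal DOI characters; stripping unbalanced ones is a possible refinement.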

Replacing the reference list?

If we are not going to semantically mark up the reference list, we could consider abolishing the reference list altogether. After all, as a tool for the digital reader the concept is rather vestigial – I hate losing my place by scrolling to the end of an article just to see what reference number 7 refers to. With the method shown thus far, the reader can open the link to access this information, but that still interrupts the flow of reading. The digitally native solution is a mouse-over or tooltip effect that displays this information, as many professional publishers already use in their HTML versions.

Once again, this is straightforward to support using the knitcitations package, at least for sites that include the popular Bootstrap JavaScript libraries, such as this notebook. I have added an option to the in-text citation functions that wraps the citation in a span element carrying the full reference as its title, so that the citation above is rendered as:

<span class='showtooltip' title='Shotton D (2010). "Cito, The Citation Typing Ontology." _Journal of
Biomedical Semantics_, *1*. ISSN 2041-1480, <URL:
http://dx.doi.org/10.1186/2041-1480-1-S1-S6>.'><a href='http://dx.doi.org/10.1186/2041-1480-1-S1-S6' property='http://purl.org/spar/cito/usesMethodIn' >Shotton (2010)</a></span>

This behavior can be toggled on by calling

cite_options(tooltip=TRUE)

after loading the knitcitations library. EDIT: Note that this requires the tooltip JavaScript to be triggered on the showtooltip class, which can be done by adding this to your header:

    <script type="text/javascript">
      $(document).ready(function (){
        $(".showtooltip").tooltip();
      });
    </script>
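The tooltip markup above keeps raw double quotes inside a single-quoted title attribute, which works until a reference contains an apostrophe (an author named O'Brien, say) and the attribute ends early. A hedged sketch of the wrapping step with attribute escaping; withTooltip is a hypothetical helper, not the knitcitations option, and the span still relies on the header script to activate the tooltip.

```javascript
// Sketch: wrap an existing citation anchor in a span whose title attribute
// holds the full reference, for a tooltip plugin bound to .showtooltip.
function escapeAttr(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/'/g, "&#39;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// anchorHtml is trusted markup (e.g. the CITO anchor); referenceText is plain text.
function withTooltip(anchorHtml, referenceText) {
  return `<span class='showtooltip' title='${escapeAttr(referenceText)}'>${anchorHtml}</span>`;
}
```

Escaping every quote character means the same helper works whichever quote style the surrounding template uses.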

Citing without DOIs

Not all the literature we may wish to cite has a DOI: arXiv preprints, Wikipedia pages, and other academic blogs, for instance. Even when a DOI exists it is not always trivial to locate. With version 0.4-0, knitcitations can produce citations given any URL using the
