Lab Notebook

(Introduction)

Coding

  • cboettig commented on issue slashdotdash/jekyll-lunr-js-search#1: Thanks! That definitely fixed the ruby end, I can verify that search.json is being created correctly. The site now builds successfully ..., but I d… 12:58 2013/03/09
  • cboettig pushed to site-search at cboettig/labnotebook: search creates search.html but javascript search not working yet #7 updated site-index branch with master (provides google-analytics) config turn off o 12:55 2013/03/09
  • cboettig closed issue cboettig/ews-review#3: Finalize edits before sending for internal review 12:37 2013/03/09
  • cboettig pushed to master at cboettig/ews-review: we're gonna submit the .tex file so I may as well include it 12:36 2013/03/09
  • cboettig closed issue cboettig/ews-review#4: Venn Diagram figure should have numbers corresponding to the references that examine each area 12:28 2013/03/09

Discussing

  • Ooh, @github now builds automatic 'resumes' : e.g. me: t.co/4wEM41nuNg or @ropensci: t.co/ARmDqbiVEt ht / @hmason:

    05:21 2013/03/08
  • .@SpringerEcology your author submission tool has been "building pdf" of my TeX file for the past 15 minutes... #fail

    05:12 2013/03/08
  • Caffeine Boosts Bees' Memories / in @ScienceMagazine t.co/uSCLuBSUmI

    04:42 2013/03/08
  • @adaptive_plant @phylorich Sounds cool, glad it was useful! link to the blog?

    12:14 2013/03/08
  • From @NSF: US-China Collaborative Sustainable Software Research t.co/Qp6ltJ5vga

    12:03 2013/03/08

Reading

  • Status and solutions for the world's unassessed fisheries.: Science (New York, N.Y.) (2012). Volume: 338, Issue: 6106. Pages: 517-20. Christopher Costello, Daniel Ovando, Ray Hilborn, Steven D Gaines, Olivier Deschenes, Sarah E Lester et al. 03:49 2013/02/25
  • Decision Accuracy and the Role of Spatial Interaction in Opinion Dynamics: Journal of Statistical Physics (2013). Colin J. Torney, Simon a. Levin, Iain D. Couzin et al. 03:47 2013/02/25
  • A systematic review of mathematical models of mosquito-borne pathogen transmission: 1970-2010: Journal of The Royal Society Interface (2013). Volume: 10, Issue: 81. Pages: 20120921-20120921. Robert C Reiner, T Alex Perkins, Christopher M Barker, Tianchan Niu, Luis Fernando Chaves, Alicia M Ellis, Dylan B George, A. Le Menach, Juliet R C Pulliam, Donal Bisanzio, C. Buckee, Christinah Chiyaka, Derek A T Cummings, Andres J Garcia, Michelle L Gatton, P. W. Gething, David M Hartley, Geoffrey Johnston, Eili Y Klein, Edwin Michael, Steven W Lindsay, A. L. Lloyd, David M Pigott, William K Reisen, Nick Ruktanonchai, Brajendra K Singh, A. J. Tatem, Uriel Kitron, Simon I Hay, Thomas W Scott, David L Smith et al. 03:18 2013/02/25
  • Process-based models are required to manage ecological systems in a changing world: Ecosphere (2013). Volume: 4, Issue: February. Pages: 1-12. K Cuddington, MJ Fortin, LR Gerber et al. 03:17 2013/02/25

Entries

Notes From The Week

08 Mar 2013

(Working from Galveston this week while at Louise’s conference and looking after the little one. Some progress below.)

Monday

Alan Skype meeting

  • Comment piece / reply
  • ews-review
  • Comments on TE review?
  • IARPA Conference, funding options?

Tasks

  • On Marc’s suggestion, writing a comparison of pattern-based and mechanism-based approaches to modeling, particularly in a decision theory context. Draft in nonparametric-bayes.
  • Review edits and writing on ews-review

Tuesday

Kathy Skype

  • Decision theory in early warning signals?
  • NCEAS / SESYNC working group?
  • Long-term lake monitoring data, including some sedimentary analysis (see Wang et al. (2012)).
  • Send Kathy TREE articles: Polasky et al. (2011) and Fischer et al. (2009).

Tasks

  • Writing a comparison of pattern-based and mechanism-based approaches to modeling, particularly in a decision theory context
  • Review edits and writing on ews-review

Wednesday

  • more of the same.

Friday

  • Finished and submitted ews-review.
  • See commit log and issues tracker for details.

Notebook: Semantics,

  • Stemming-based search seems to perform poorly: it cannot find phrases, weight terms in titles, etc., and just matches the most frequently occurring terms in a post. Trying lunr, which loads the full text into memory, but should it do so only on selecting the search bar? See issue #7.

  • Modified google-analytics plugin to make a single call to the API; see issue #66.

  • Tuning RDFa, primarily on vita. #22 #63. Tuned FOAF referencing to common nodes. Note that rel always applies to links while property applies to the content of the tag (e.g. anchor text, span context). Use rel in span and div elements, etc., to apply to multiple links.

References

  • Joern Fischer, Garry D. Peterson, Toby A. Gardner, Line J. Gordon, Ioan Fazey, Thomas Elmqvist, Adam Felton, Carl Folke, Stephen Dovers, (2009) Integrating Resilience Thinking And Optimisation For Conservation. Trends in Ecology & Evolution 24 10.1016/j.tree.2009.03.020
  • Stephen Polasky, Stephen R. Carpenter, Carl Folke, Bonnie Keeler, (2011) Decision-Making Under Great Uncertainty: Environmental Management in an Era of Global Change. Trends in Ecology & Evolution 26 10.1016/j.tree.2011.04.007
  • Rong Wang, John A. Dearing, Peter G. Langdon, Enlou Zhang, Xiangdong Yang, Vasilis Dakos, Marten Scheffer, (2012) Flickering Gives Early Warning Signals of A Critical Transition to A Eutrophic Lake State. Nature 492 10.1038/nature11655


Delayed Release Archives

02 Mar 2013

pageviews: 5

Today I dug out a handful (37) of posts from the past two years that were published as #delayed-release content (all are now tagged as such to indicate they were not public at their publication date). Many of these were connected to the #warning-signals project but involved results that Alan or I weren’t ready to provide ahead of publication, as I discussed in a post when I first began marking some posts for delayed release, challenges with collaboration in open science.

These posts were created while on the WordPress platform and had been marked private, so that they would be invisible to public browsing. WordPress makes this easy to do, as I discuss in the earlier post. When I switched to a Jekyll platform, these posts all received the YAML header published: false and were also excluded from the GitHub repository until I could revisit and decide which to release. Managing these posts for release in the Jekyll framework just requires a little command-line fu: grep -l 'published: false' *.md grabs the relevant posts for inspection and release.
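As a hedged alternative to the grep one-liner, the same check can be sketched in Python. Unlike grep, this version only looks inside the leading YAML front matter (assumed to be delimited by --- lines, the usual Jekyll convention), so a post that merely mentions the phrase in its body is not flagged. The function names are hypothetical and are not part of Jekyll or of this notebook's tooling.

```python
from pathlib import Path


def front_matter(text: str) -> list[str]:
    """Return the lines of a leading YAML front-matter block, or [] if none."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return lines[1:i]
    return []  # opening marker without a closing one


def is_unpublished(text: str) -> bool:
    """True when the front matter contains the line 'published: false'."""
    return any(line.strip() == "published: false" for line in front_matter(text))


def unpublished_posts(directory: str) -> list[str]:
    """Names of *.md files in `directory` flagged as unpublished, sorted."""
    return sorted(
        path.name
        for path in Path(directory).glob("*.md")
        if is_unpublished(path.read_text(encoding="utf-8"))
    )
```

Calling unpublished_posts("_posts") would list the candidates for inspection; the directory name is an assumption about the site layout.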

The notebook is currently back to ONS All Content, Immediate status, as I have been able to okay the practice with all current major collaborations. (Sure, there’s always stuff that doesn’t make it into the notebook or is not appropriate for the notebook – the most common case of the latter being peer reviews I write for various journals. Though I make a practice of signing my reviews, I believe they still represent private communication unless the authors, editors and journals were to agree to publishing them.) All of these posts would have been released sooner had I found a minute to remember to do so – perhaps a cautionary example for the delayed release practice.

Looking back on these posts, I probably would not have delayed most of them today, as I have gotten more comfortable not only with the open notebook, but with discussing the practice with collaborators. My own reluctance to push my adviser in a discussion of the notebook was as much responsible for these delays as the other factors that I discuss in the earlier post. One key thing I wondered about in writing delayed-release posts was whether I would (perhaps subconsciously) write in a different or more candid style than I do for my regular notes. I was glad not to notice any glaring examples (though perhaps a diligent or very bored reader might find some!) of expletives and insulting commentary in the delayed posts. I do notice that the overall tone and volume of the textual content changes slightly. However, that trend is actually much more dominant in comparing my very early posts (mostly from the OWW days), which were quite discursive, to the current posts, which tend to be much more terse.

So have a look at the #delayed-release posts and let me know what you think!


Notes

28 Feb 2013

pageviews: 14

Meeting with Marc

selection of problem: consider simpler models only.

step through complexity hierarchy at the beginning

  1. classical approaches fix the stock-recruitment relationship.
  2. parameter uncertainty
  3. Gaussian processes for model uncertainty

The Discussion section can deal with model uncertainty and model choice approaches.

  • A method where we don’t have to assume the stock-recruitment relationship
  • How well we can do with that.
  • How much data we need.

discussion of mathematical presentation, clarity

Misc

  • Finished review. I should stop signing reviews, then I could write more careless quick impressions.
  • discussed potential of a GSoC proposal for an R NeXML parser
  • was asked to consider presenting at ISI Conference session on early warning.

some reading on semantics

David Shotton and Tanya Gray have released a CiTO tool for adding semantic data to many major publication platforms, taking a clever approach to annotation that I’ve never considered – crowdsourcing to readers (davidshotton, 2013). The tool isn’t perfect but is very intuitive. The clever part is that all annotations the reader adds are collated in a database and can be retrieved through their API. Not sure if many readers will make use of this, but time will tell. Meanwhile authors or publishers might pay attention.

In that vein, David has also written a call of Ten Next Steps for publishers (davidshotton, 2013b). Some may be redundant, but it certainly shows how publishers could go a lot further. If only there were more incentive for publishers to innovate like this instead of crafting clever and questionable strategies to boost impact factors.

Reading

Very interesting paper on implementing/teaching nonlinear model inference to ecologists out today (Bolker et al. 2013), along with the code and data for each of the examples they discuss. Lessons in the manuscript may be familiar to many, but should still be taught more widely in ecology. Should be useful for teaching this stuff in graduate courses too.

  • Benjamin M. Bolker, Beth Gardner, Mark Maunder, Casper W. Berg, Mollie Brooks, Liza Comita, Elizabeth Crone, Sarah Cubaynes, Trevor Davies, Perry de Valpine, Jessica Ford, Olivier Gimenez, Marc Kery, Eun Jung Kim, Cleridy Lennert-Cody, Arni Magnusson, Steve Martell, John Nash, Anders Nielsen, Jim Regetz, Hans Skaug, Elise Zipkin, (2013) Strategies For Fitting Nonlinear Ecological Models in R, AD Model Builder, And BUGS. Methods in Ecology And Evolution 10.1111/2041-210X.12044
  • davidshotton, (2013) CiTO Reference Annotation Tools. Semantic Publishing semanticpublishing.wordpress.com/2013/02/26/cito-tools/
  • davidshotton, (2013b) Ten next steps for semantic authors and publishers. Semantic Publishing semanticpublishing.wordpress.com/2013/02/26/ten-next-steps/


Notes

27 Feb 2013

pageviews: 13

Reviewing

arg, reviewing. so much time.

Seminar

Kate Richerson gave a fantastic presentation to mega group yesterday showing her integration of behavior and dispersal in krill dynamics. She derives the krill’s behavioral movement patterns through a stochastic dynamic programming solution rather than just dictating them to the dispersal model directly.

Reading

  • Provocative stuff in PNAS: Perretti et al. (2013). It would be nice to see the comparison against the generating model. The forecasting errors of the fully Bayesian inference using the correct model shouldn’t necessarily be small, but they should be just as big as the Bayesian model expects them to be (e.g. over replicates, the Bayesian forecast error reflects the widths of the posteriors (modulo some statement about priors)).

My worry with any approach doing better than the expected error is that it does so by generating biased estimation – e.g. something in the forecasting method, rather than in the data, happens to be biased towards the ‘right answer’. In this example I don’t think that is the case.

We already know that the modes of the posteriors don’t recover the correct model parameters of the generating model from an MCMC of such a complex model with limited data (though perhaps it is good to remind folks of this!), so it is not surprising that a heuristic approach can outperform this in some cases.

It certainly is good to wrestle with these examples in any case. Steve’s one of my post-doc advisors now so no doubt I’m biased. Noam also pointed out this earlier article in Ecology, Perretti et al. (2012).

  • In seminar, Marc mentioned the great paper by Wiedenmann et al. (2011) that compares the optimal foraging strategy evolved by whales in their evolutionary environment to how that strategy performs today (similar in spirit to the kinds of comparisons I am interested in with the #multiple-uncertainty project).

Reading an old Schaffer paper, Schaffer (1984).

  • Nature says the National Academy of Sciences needs an overhaul to stay relevant. Probably true, but they still produce some excellent reports that deserve more eyes, like this one on model fitting, Assessing the Reliability of Complex Models. We could do more VVUQ in ecology and evolution, and the value of presenting benchmark examples of this in simple (trivial) cases cannot be overstated, e.g. Fig 5.2.1.

  • GP methods in Biometrika: Banerjee et al. (2012)

  • Luminaries of evolution forecast the future of the field, with an eye towards data and semantics, in Losos et al. (2013)
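
The calibration point made above about Perretti et al. (2013) can be illustrated with a toy simulation. This is only a sketch under strong assumptions: a conjugate normal model with known noise, chosen for convenience, and not the multi-species model from the paper. The parameter is drawn from the prior, a few observations are simulated, and the next observation is forecast with the posterior predictive distribution. The function name and default settings are hypothetical.

```python
import random
from statistics import NormalDist


def calibration_check(replicates=4000, n=5, prior_sd=1.0, noise_sd=1.0,
                      level=0.90, seed=1):
    """Return (interval coverage, mean squared forecast error, predictive variance)."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)          # two-sided normal quantile
    post_precision = 1 / prior_sd**2 + n / noise_sd**2
    predictive_var = 1 / post_precision + noise_sd**2  # posterior var + noise var
    half_width = z * predictive_var**0.5
    covered = 0
    squared_error = 0.0
    for _ in range(replicates):
        theta = rng.gauss(0.0, prior_sd)               # truth drawn from the prior
        data = [rng.gauss(theta, noise_sd) for _ in range(n)]
        post_mean = (sum(data) / noise_sd**2) / post_precision  # prior mean is 0
        y_new = rng.gauss(theta, noise_sd)             # the value being forecast
        error = y_new - post_mean
        covered += abs(error) <= half_width
        squared_error += error**2
    return covered / replicates, squared_error / replicates, predictive_var
```

With the default settings the coverage should come out close to the nominal 0.90 and the mean squared forecast error close to the predictive variance (1 + 1/6 here): the errors are not small, but they are the size the correct model expects.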

Misc

  • wrote a quick CSL format for knitcitations. (Generate inline citations using the copy function in Mendeley library. A bit silly since normally that would already generate the entire bibliographic citation.)
  • Should really make a CSL format for Mendeley to paste in an HTML citation (with full metadata in tooltip). Might then use an existing format to copy-paste bib info at end.

References

  • A. Banerjee, D. B. Dunson, S. T. Tokdar, (2012) Efficient Gaussian Process Regression For Large Datasets. Biometrika 100 10.1093/biomet/ass068
  • Jonathan B. Losos, Stevan J. Arnold, Gill Bejerano, E. D. Brodie, David Hibbett, Hopi E. Hoekstra, David P. Mindell, Antónia Monteiro, Craig Moritz, H. Allen Orr, Dmitri A. Petrov, Susanne S. Renner, Robert E. Ricklefs, Pamela S. Soltis, Thomas L. Turner, (2013) Evolutionary Biology For The 21st Century. Plos Biology 11 10.1371/journal.pbio.1001466
  • Charles Thomas Perretti, George Sugihara, Stephan B. Munch, (2012) Nonparametric Forecasting Outperforms Parametric Methods For A Simulated Multi-Species System. Ecology 10.1890/12-0904.1
  • C. T. Perretti, S. B. Munch, G. Sugihara, (2013) Model-Free Forecasting Outperforms The Correct Mechanistic Model For Simulated And Experimental Data. Proceedings of The National Academy of Sciences 10.1073/pnas.1216076110
  • William M. Schaffer, (1984) Stretching And Folding in Lynx Fur Returns: Evidence For A Strange Attractor in Nature?. The American Naturalist 124 10.1086/284318
  • John Wiedenmann, Katherine A. Cresswell, Jeremy Goldbogen, Jean Potvin, Marc Mangel, (2011) Exploring The Effects of Reductions in Krill Biomass in The Southern Ocean on Blue Whales Using A State-Dependent Foraging Model. Ecological Modelling 222 10.1016/j.ecolmodel.2011.07.013
  • Assessing the Reliability of Complex Models: Mathematical and Statistical Foundations of Verification, Validation, and Uncertainty Quantification. www.nap.edu/catalog.php?record_id=13395


Semantic Citations For The Notebook And Knitr

22 Feb 2013

pageviews: 117

I have on occasion been exploring the use of semantic markup in the notebook. In this post I illustrate how I am handling semantic citations. One of the more intriguing ideas is the ability to add semantic meaning to citations through the CITO ontology of Shotton (2010). Citation counts form a central part of academic discourse, but contain very little information regarding the reason for the citation. Most notably, ‘negative’ citations refuting a claim carry just the same weight as those confirming or relying upon a claim. Given the scale and expansion of academic literature, it is rarely reasonable to explore this citation graph manually. CITO provides a language for embedding the meaning of the citation, such as “discusses”, “refutes”, or “usesMethodIn”, in the citation itself. (For instance, my earlier citation to Shotton identifies itself as “usesMethodIn”, as I will explain.)

The main barrier to this approach is a lack of adoption. One of the primary concerns is the burden it places on authors of adding the extra data. On one hand, authors already bother formatting and reformatting layout, spelling, and reference order to the arcane specifications of different journals, which suggests authors can be persuaded to do some pretty tedious tasks if the publishers would require it. After all, the task of adding citations is already much easier than it was in the days of paper journals. Still, it is much simpler to remove a tedious requirement than to add a new one. My hope is that intelligent tools can simplify this process, as they already have with other aspects of managing citations, and encourage the use of CITO. In this spirit, I have recently started trying to consistently use the CITO ontology in my notebook entries as a test case, using some tools of my own design.

Semantics in knitcitations

Several months ago I created the R package knitcitations to provide a citation platform for knitr dynamic documents, which provide executable code and automatic inclusion of results inside plain-text (markdown) descriptions. I write most of my research scripts and many of my notebook entries in this manner. The package can generate citations by DOI, circumventing the need to maintain a BibTeX or similar database of citation information, using commands such as

citet("10.1186/2041-1480-1-S1-S6")

Extending the package to support CITO was rather straightforward. Using the latest version of knitcitations, one can generate in-line citations with CITO semantics simply by passing the reason for the citation as well, such as

citet("10.1186/2041-1480-1-S1-S6", cito="usesMethodIn")

which generates the following HTML:

<a class='dx.doi.org/10.1186/2041-1480-1-S1-S6' property='purl.org/spar/cito/usesMethodIn' >Shotton (2010)</a>

This provides a convenient platform to generate semantic citations in this lab notebook. As before, knitcitations will also generate a complete reference list when the bibliography function is called at the end of the document.
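
To make the shape of that markup concrete, here is a minimal Python sketch of a helper that assembles a CITO-annotated anchor from a DOI, a label, and a CITO relation. It is an illustration only, not the knitcitations implementation (which is written in R). The helper name cito_link is hypothetical, and the sketch assumes a conventional href attribute with full http:// URLs, whereas the cached markup above shows the DOI address under a class attribute with the scheme stripped.

```python
from html import escape

CITO_BASE = "http://purl.org/spar/cito/"   # CITO namespace (full URL assumed)
DOI_RESOLVER = "http://dx.doi.org/"        # resolver prefix (full URL assumed)


def cito_link(doi: str, label: str, relation: str) -> str:
    """Build an HTML anchor that links a DOI and states why it is cited."""
    href = escape(DOI_RESOLVER + doi, quote=True)
    prop = escape(CITO_BASE + relation, quote=True)
    return f'<a href="{href}" property="{prop}">{escape(label)}</a>'
```

For example, cito_link("10.1186/2041-1480-1-S1-S6", "Shotton (2010)", "usesMethodIn") yields an anchor equivalent in intent to the one shown above.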

Semantic overkill?

It is possible to add far more semantic data to the reference list at the end of an article. Invisible semantic markup can identify to a machine which value corresponds to the volume number, issue number, or journal name, e.g. using the BIBO ontology. I have added support for this kind of markup to knitcitations as well, and several of my posts provide examples. The raw markup looks like this:

<div prefix="dc: purl.org/dc/terms/,
                      bibo: purl.org/ontology/bibo/,
                      foaf: xmlns.com/foaf/spec/,
                      biro: purl.org/spar/biro/"
        rel="purl.org/spar/biro/ReferenceList"> <ul class='bibliography'> 
<li> <span property="dc:title">Fisheries: Does Catch Reflect Abundance?.</span> <span property="dc:creator"> <span property="foaf:givenName">Daniel</span> <span property="foaf:familyName">Pauly</span>, </span><span property="dc:creator"> <span property="foaf:givenName">Ray</span> <span property="foaf:familyName">Hilborn</span>, </span><span property="dc:creator"> <span property="foaf:givenName">Trevor A.</span> <span property="foaf:familyName">Branch</span>, </span>  (<span property="dc:date">2013</span>)  <span rel="purl.org/dc/terms/isPartOf" 
                            resource="[purl.org/dc/terms/journal]">
                        <span property="purl.org/dc/terms/title"
                                content=" Nature ">
                        </span>
                          <span property="bibo:shortTitle"> Nature </span>
               </span>  <span property="bibo:volume">494</span>    <a property="bibo:doi" class="dx.doi.org/10.1038/494303a">10.1038/494303a</a> </li>
<li> <span property="dc:title">Net Gains.</span> <span property="dc:creator"> <span property="foaf:givenName">unknown</span> <span property="foaf:familyName">unknown</span>, </span>  (<span property="dc:date">2013</span>)  <span rel="purl.org/dc/terms/isPartOf" 
                            resource="[purl.org/dc/terms/journal]">
                        <span property="purl.org/dc/terms/title"
                                content=" Nature ">
                        </span>
                          <span property="bibo:shortTitle"> Nature </span>
               </span>  <span property="bibo:volume">494</span>    <a property="bibo:doi" class="dx.doi.org/10.1038/494282a">10.1038/494282a</a> </li>
 </ul>
</div>

However, I have since decided that such markup is largely overkill. The DOI uniquely identifies the publication already, and allows us to programmatically retrieve the rest of the data (title, authors, journal, etc.) from semantically identified XML by querying against services such as CrossRef. This is the essential concept of linked data, by which both source and referrer are enriched.
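
As a rough illustration of retrieving metadata from a DOI alone, the Python sketch below uses DOI content negotiation: it asks the doi.org resolver for CSL JSON instead of the XML mentioned above, which is a swapped-in technique chosen for brevity. The function names are hypothetical, and fetch_metadata needs network access.

```python
import json
import urllib.request

CSL_JSON = "application/vnd.citationstyles.csl+json"


def metadata_request(doi: str) -> urllib.request.Request:
    """Build a content-negotiation request for a DOI (nothing is sent yet)."""
    return urllib.request.Request(f"https://doi.org/{doi}",
                                  headers={"Accept": CSL_JSON})


def fetch_metadata(doi: str, timeout: float = 10.0) -> dict:
    """Resolve the DOI and return its CSL JSON record (requires network)."""
    with urllib.request.urlopen(metadata_request(doi), timeout=timeout) as response:
        return json.load(response)


def format_reference(record: dict) -> str:
    """Render a CSL JSON record as a one-line reference; absent fields are skipped."""
    authors = ", ".join(
        f"{a.get('given', '')} {a.get('family', '')}".strip()
        for a in record.get("author", [])
    )
    year = record.get("issued", {}).get("date-parts", [[""]])[0][0]
    fields = [
        f"{authors} ({year})",
        record.get("title", ""),
        record.get("container-title", ""),
        str(record.get("volume", "")),
        record.get("DOI", ""),
    ]
    return " ".join(field for field in fields if field)
```

Something like format_reference(fetch_metadata("10.1038/494303a")) would then rebuild a reference line from the DOI alone, without any markup stored in the post.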

Moreover, DOIs follow a specific construction that lets us reliably identify them in plain text using regular expressions, making any further semantics to declare that we are citing the article mostly irrelevant. This is convenient for identifying all citations appearing in the notebook without any markup. The CITO example above has the advantage of providing a link and associating the DOI with the reason for the citation, by virtue of being inside the same HTML anchor element.
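
A minimal Python sketch of this kind of pattern matching follows. The regular expression is a common heuristic (a "10." prefix, a numeric registrant code, a slash, then a suffix running to the next whitespace, quote, or angle bracket) and is an assumption on my part, not an official DOI grammar. Trailing punctuation is trimmed, so a DOI that really ends in such a character would be truncated.

```python
import re

# Heuristic: "10.", 4-9 digit registrant code, "/", then a suffix that stops
# at whitespace, quotes, or angle brackets.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"'<>]+")


def find_dois(text: str) -> list[str]:
    """Return the distinct DOIs found in `text`, in order of first appearance."""
    found = []
    for match in DOI_PATTERN.finditer(text):
        doi = match.group(0).rstrip(".,;:)")  # drop sentence punctuation
        if doi not in found:
            found.append(doi)
    return found
```

Because quotes and angle brackets end a match, the same function works on both plain text and raw HTML, and a DOI that appears in an attribute and again as link text is reported once.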

Replacing the reference list?

If we are not going to semantically mark up the reference list, we could consider abolishing the reference list altogether. After all, as a tool for the digital reader the concept is rather vestigial – I hate losing my place by scrolling to the end of an article just to see to what reference number 7 refers. With the method shown thus far, the reader can open the link to access this information, but that still interrupts the flow of reading. The digitally native solution is a mouse-over or tooltip effect that displays this information, as many professional publishers already use in their HTML versions.

Once again, this is straightforward to support using the knitcitations package, at least for sites that include the popular Bootstrap JavaScript libraries, such as this notebook. I have added an option to the in-text citation functions to provide such tooltips in a span element, so that the citation command generates markup such as

<span class='showtooltip' title='Shotton D (2010). "Cito, The Citation Typing Ontology." _Journal of
Biomedical Semantics_, *1*. ISSN 2041-1480, <URL:
dx.doi.org/10.1186/2041-1480-1-S1-S6>.'><a class='dx.doi.org/10.1186/2041-1480-1-S1-S6' property='purl.org/spar/cito/usesMethodIn' >Shotton (2010)</a></span>

This behavior can be toggled on by calling

cite_options(tooltip=TRUE)

after loading the knitcitations library. EDIT: Note that this requires the javascript trigger on the class showtooltip, which can be done by adding this to your header:

    <script type="text/javascript">
      $(document).ready(function (){
        $(".showtooltip").tooltip();
      });
    </script>

Citing without DOIs

Not all the literature we may wish to cite includes DOIs, such as arXiv preprints, Wikipedia pages, or other academic blogs. Even when a DOI is present it is not always trivial to locate. With version 0.4-0, knitcitations can produce citations given any URL using the
