Accessing ecological and evolutionary datasets in R

Karthik Ram, Carl Boettiger, and Scott Chamberlain

esa2012@ropensci.org




Most science is not reproducible or repeatable, even within the same lab group over time.


Science

Data Life Cycle


source: Michener, 2006 Ecoinformatics.



Open Science


Open data + code


Source: Wolkovich et al. GCB 2012.




Source: PLOS, 2007



R Open Science



Open Science needs open source tools


Source: Revolution Analytics, 2010, Nature editorial, 2012

Why R?

The old way...


Why R?

A better way



glm(y ~ -1 + a + c + z + a:z, data = mydata, maxit = 30)


This is reproducible and repeatable, and it can serve as an analytic workflow.
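As a minimal sketch of what makes the scripted call repeatable, the example below runs the same formula on simulated data. The data frame, the coefficient values used to generate it, and the seed are all made up for illustration; only the `glm()` call itself comes from the slide.

```r
# Simulated data (an assumption for illustration); variable names mirror
# the formula shown on the slide.
set.seed(42)
n <- 200
mydata <- data.frame(
  a = rnorm(n),
  c = rnorm(n),
  z = rnorm(n)
)
mydata$y <- 0.5 * mydata$a - 0.3 * mydata$c + 0.8 * mydata$z +
  0.2 * mydata$a * mydata$z + rnorm(n, sd = 0.1)

# Same call as on the slide: no intercept, main effects plus an a:z interaction.
# maxit is forwarded to glm.control().
fit <- glm(y ~ -1 + a + c + z + a:z, data = mydata, maxit = 30)
coefs <- coef(fit)
print(round(coefs, 2))
```

Because the seed, the data step, and the model call all live in one script, anyone rerunning it should get the same fitted coefficients.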





Wrapping all science APIs




Development team


Carl Boettiger
Karthik Ram
Scott Chamberlain



Advisory team


Duncan Temple Lang
Hadley Wickham
JJ Allaire
Bertram Ludascher
Matt Jones



R and APIs

API keys can be stored in a user's .Rprofile

 
	options(MendeleyKey = "uf5daib7wyil7ag5buc")
	options(MendeleyPrivateKey = "faj2os5dyd7jop2fok6")
	options(PlosApiKey = "ef3vip9yak7od3hud4g")
	options(SpringerMetdataKey = "ri9hi7woc6jax4vaf8w")
	





Note: These keys aren't real.
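Once a key is set with `options()`, it can be read back with base R's `getOption()`. The sketch below assumes the placeholder value from the slide; `get_api_key()` is a hypothetical helper written for this example, not a function from any rOpenSci package.

```r
# Placeholder value copied from the slide; it is not a real key.
options(PlosApiKey = "ef3vip9yak7od3hud4g")

# Hypothetical helper: fetch a key by option name and fail early if it is unset.
get_api_key <- function(name) {
  key <- getOption(name)
  if (is.null(key)) {
    stop("Option '", name, "' is not set; add it to your .Rprofile.")
  }
  key
}

plos_key <- get_api_key("PlosApiKey")
```

Keeping keys in .Rprofile rather than in analysis scripts means the scripts can be shared without exposing credentials.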

Public Library of Science full text - rplos


library(rplos)
plot_throughtime(list("reproducible science"), 500)


Managing bibliography - RMendeley

Manage libraries and measure impact of research

groupDocInfo(mc, 530031, 4344945792)
$abstract
[1] "SUMMARY: Modern biological experiments create vast amounts of data which are geographically distributed. These datasets consist of petabytes of raw data and billions of documents. Yet to the best of our knowledge, a search engine technology that searches and cross-links all different data types in life sciences does not exist.....

$authors
$authors[[1]]
      forename        surname
   "Dominic S" 	"L\xfctjohann" 
# ....
	


Accessing data behind papers - rdryad

# Get the URL for a data file
dryaddat <- download_url("10255/dryad.1759")

# Get a file given the URL
file <- dryad_getfile(dryaddat)


Tracking altmetrics - raltmet

Tracks altmetrics across sources such as GitHub, Total Impact, CitedIn, CiteULike, and Stack Overflow.

GitHub(userorg = "ropensci", repo = "rmendeley")
totimp(id = "10.5061/dryad.8671")
stackexchange(ids = 16632)

Mapping biodiversity data - rgbif

distribution <- occurrencelist(sciname = "Danaus plexippus", coordinatestatus = TRUE, maxresults = 1000, latlongdf = TRUE)
Also see CartoDB's powerful mapping capabilities and R package.


Sharing unpublished data - (rfigshare)

Using Figshare's new API, it is now possible to share figures, data, and any other object generated in R directly to one's figshare account.


> figshare(data)
# code isn't ready yet but once it is, it will return a persistent identifier






A multi-institution consortium to build infrastructure for open science



DataONE

DataONE creates all the necessary components to support persistent and secure access to earth observation data.




DataONE's upcoming R package will allow users to submit and access data to/from member nodes directly from the console.




Git + Science

Does your version control look like this?


Using Git with a GUI


Rapid peer-to-peer sharing of code is great for science



R packages early in development can easily be tested, rapidly deployed from GitHub using devtools, and revised before submission to a persistent repository such as CRAN.


library(devtools)
# Current devtools versions take the repository as "user/repo"
install_github("ropensci/RMendeley")



R + collaborative writing


knitr + Markdown


Xie Y (2012). knitr: A general-purpose package for dynamic report generation in R.

knitr + Markdown + GitHub

GitHub automatically renders Markdown and even provides syntax highlighting



knitr + Markdown + GitHub = executable paper



knitr + Markdown + GitHub = pre-publication review



Incorporate citations with R + Markdown


knitcitations

citet(c(Halpern2006 = "10.1111/j.1461-0248.2005.00827.x"))
# then cite in your markdown file
citet("Halpern2006")

# or read citations from a BibTeX file, which can be generated and
# updated automatically by services like Mendeley
bib <- read.bibtex("example.bib")
# then cite inline
citet(bib[["knitr"]])

- knitcitations on Carl Boettiger's GitHub
- tutorial

  esa.ropensci.org

Please contact us if you have feedback or ideas for collaborations.

All rOpenSci projects are on GitHub.
