Lost Boy | The blog of @ldodds

Apr 14 2015

Open Data, Personal

Open data and diabetes

In December my daughter was diagnosed with Type 1 diabetes. It was a pretty rough time. Symptoms can start and escalate very quickly. Hyperglycaemia and ketoacidosis are no joke.

But luckily we have one of the best health services in the world. We’ve had amazing care, help and support. And, while we’re only 4 months into dealing with a life-long condition, we’re all doing well.

Diabetes sucks though.

I’m writing this post to reflect a little on the journey we’ve been on over the last few months from a professional rather than a personal perspective. Basically, the first weeks of becoming a diabetic or the parent of a diabetic, is a crash course in physiology, nutrition, and medical monitoring. You have to adapt to new routines for blood glucose monitoring, learn to give injections (and teach your child to do them), become good at book-keeping, plan for exercise, and remember to keep needles, lancets, monitors, emergency glucose and insulin with you at all times, whilst ensuring prescriptions are regularly filled.

Oh, and there’s a stupid amount of maths because you’ll need to start calculating how much carbohydrates are in all of your meals and inject accordingly. No meal unless you do your sums.

Good job we had that really great health service to support us (there’s data to prove it). And an amazing daughter who has taken it all in her stride.

Diabetics live a quantified life. Tightly regulating blood glucose levels means knowing exactly what you’re eating, and learning how your body reacts to different foods and levels of exercise. For example we’ve learnt the different ways that a regular school day versus school holidays effects my daughters metabolism. That we need to treat ahead for the hypoglycaemia that follows a few hours after some fun on the trampoline. And that certain foods (cereals, risotto) seem to affect insulin uptake.

So to manage the condition we need to know how many carbohydrates are in:

any pre-packaged food my daughter eats
any ingredients we use when cooking, so we can calculate a total portion size
in any snack or meal that we eat out

Food labeling is pretty good these days so the basic information is generally available. But its not always available on menus or in an easy to use format.

The book and app that diabetic teams recommend is called Carbs and Cals. I was a little horrified by it initially as its just a big picture book of different portion sizes of food. You’re encouraged to judge everything by eye or weight. It seemed imprecise to me but with hindsight its perfectly suited to those early stages of learning to live with diabetes. No hunting over packets to get the data you need: just look at a picture, a useful visualisation. Simple is best when you’re overwhelmed with so many other things.

Having tried calorie counting I wanted to try an app to more easily track foods and calculate recipes. My Fitness Pal, for example, is pretty easy to use and does bar-code scanning of many foods. There are others that are more directly targeted at diabetics.

The problem is that, as I’ve learnt from my calorie counting experiments, the data isn’t always accurate. Many apps fill their databases through crowd-sourcing. But recipes and portion sizes change continually. And people make mistakes when they enter data, or enter just the bits they’re interested in. Look-up any food on My Fitness Pal and you’ll find many duplicate entries. It makes me distrust the data because I’m concerned its not reliable. So for now we’re still reading packets.

Eating out is another adventure. There have been recent legislative changes to require restaurants to make more nutritional information available. If you search you may find information on a company website and can plan ahead. Sometimes its only available if you contact customer support. If you ask in a (chain) restaurant they may have it available in a ring-binder you can consult with the menu. This doesn’t make a great experience for anyone. Recently we’ve been told in a restaurant to just check online for the data (when we know it doesn’t exist), because they didn’t want to risk any liability by providing information directly. On another occasion we found that certain dishes – items from the childrens menu – weren’t included on the nutritional charts.

Basically, the information we want is:

often not available at all
available, but only if you know were to look or who to ask
potentially out of date, as it comes from non-authoritative sources
incomplete or inaccurate, even from the authoritative sources
not regularly updated
not in easy to use formats
available electronically, e.g. in an app, but without any clear provenance

The reality is that this type of nutritional and ingredient data is basically in the same state as government data was 6-7 years ago. It’s something that really needs to change.

Legislation can help encourage supermarkets and restaurants to make data available, but really its time for them to recognize that this is essential information for many people. All supermarkets, manufacturers and major chains will have this data already, there should be little effort required in making it public.

I’ve wondered whether this type of data ought to be considered as part of the UK National Information Infrastructure. It could be collected as part of the remit of the Food Standards Agency. Having a national source would help remove ambiguity around how data has been aggregated.

Whether you’re calorie or carb counting, open data can make an important difference. Its about giving people the information they need to live healthy lives.

Oct 08 2014

4 Comments

Semantic Web

Creating an Application Using the British National Bibliography

This is the fourth and final post in a series (1, 2, 3, 4) of providing background and tutorial material about the British National Bibliography. The tutorials were written as part of some freelance work I did for the British Library at the end of 2012. The material was used as input to creating the new documentation for their Linked Data platform but hasn’t been otherwise published. They are now published here with permission of the BL.

The British National Bibliography (BNB) is a bibliographic database that contains data on a wide range of books and serial publications published in the UK and Ireland since the 1950s. The database is available under a public domain license and can be accessed via an online API which supports the SPARQL query language.

This tutorial provides an example of building a simple web application using the BNB SPARQL endpoint using Ruby and various open source libraries. The tutorial includes:

a description of the application and its intended behaviour
a summary of the various open source components used to build the application
a description of how SPARQL is used to implement the application functionality

The example is written in a mixture of Ruby and Javascript. The code is well documented to support readers more familiar with other languages.

The “Find Me A Book!” Application

The Find Me a Book! demonstration application illustrates how to use the data in the BNB to recommend books to readers. The following design brief describes the intended behaviour.

The application will allow a user to provide an ISBN which is used to query the BNB in order find other books that the user might potentially want to read. The application will also confirm the book title to the user to ensure that it has found the right information.

Book recommendations will be made in two ways:

More By The Author: will provide a list of 10 other books by the same author(s)
More From Reading Lists: will attempt to suggest 10 books based on series or categories in the BNB data

The first use case is quite straight-forward and should generate some “safe” recommendations: it’s likely that the user will like other works by the author.

The second attempts will use the BNB data a little more creatively and so the suggestions are likely to be a little more varied.

Related books will be found by looking to see if the user’s book is in a series. If it is then the application will recommend other books from that series. If the book is not included in any series, then recommendations will be driven off the standard subject classifications. The idea is that series present ready made reading lists that are a good source of suggestions. By falling back to a broader categorisation, the user should always be presented with some recommendations.

To explore the recommended books further, the user will be provided with links to LibraryThing.com.

The Application Code

The full source code of the application is available on github.com. The code has been placed into the Public Domain so can be freely reused or extended.

The application is written in Ruby and should run on Ruby 1.8.7 or higher. Several open source frameworks were used to build the application:

Sinatra — a light-weight Ruby web application framework
SPARQL Client — an client library for accessing SPARQL endpoints from Ruby
The JQuery javascript library for performing AJAX requests and HTML manipulation
The Boostrap CSS framework is used to build the basic page layout

The application code is very straight-forward and can be separated into server-side and client-side components.

Server Side

The server side implementation can be found in app.rb. The Ruby application delivers the application assets (CSS, images, etc) and also exposes several web services that act as proxies for the BNB dataset. These services submit SPARQL queries to the BNB SPARQL endpoint and then process the results to generate a custom JSON output.

The three services, which each accept an isbn parameter are:

/title — find a book title from an ISBN. Example output and implementation
/by-author — uses an ISBN to find more books by an author. Example output and implementation
/related — uses an ISBN to related books based on either series or category relationships. Example output and implementation

Each of the services works in essentially the same way:

The isbn parameter is extracted from the request. If the parameter is not found then an error is returned to the client. The ISBN value is also normalised to remove any spaces or dashes
A SPARQL client object is created to provide a way to interact with the SPARQL endpoint
The ISBN parameter is injected into the SPARQL query that will be run against the BNB, using the add_parameters function
The final query is then submitted to the SPARQL endpoint and the results used to build the JSON response

The /related service may actually makes two calls to the endpoint. If the first query doesn’t return any results then a fallback query is used instead.

Client-Side

The client side Javascript code can all be found in find-me-a-book.js. It uses the JQuery library to trigger custom code to be executed when the user submits the search form with an ISBN.

The findTitle function calls the /title service to attempt to resolve the ISBN into the title of a book. This checks that the ISBN is in the BNB and provides useful feedback for the user.

If this initial call succeeds then the find function is called twice to submit parallel AJAX requests. One to the /by-author service, and one to the /related service. The function accepts two parameters, the first parameter identifies the service to call, the second provides a name that is used to guide the processing of the results.

The HTML markup uses a naming convention to allow the find function to write the results of the request into the correct parts of the page, depending on its second parameter.

The ISBN and title information found in the results from the AJAX requests are used to build links to the LibraryThing website. But these could also be processed in other ways, e.g. to provide multiple links or invoke other APIs.

Installing and Running the Application

A live instance of the application has been deployed to allow the code to be tested without having to install and run it locally. The application can be found at:

findmeabook.herokuapp.com/

For readers interested in customising the application code, this section provides instructions on how to access the source code and run the application.

The instructions have been tested on Ubuntu. Follow the relevant documentation links for help with installation of the various dependencies on other systems.

Source Code

The application source code is available on Github and is organised into several directories:

public — static files including CSS, Javascript and Images. The main client-side Javascript code can be found in find-me-a-book.js
views — the templates used in the application
src — the application source code, which is contained in app.rb

The additional files in the project directory provide support for deploying the application and installing the dependencies.

Running the Application

To run the application locally, ensure that Ruby, RubyGems and
git are installed on the local machine.

To download all of the source code and asserts, clone the git repository:

git clone https://github.com/ldodds/bnb-example-app.git

This will create a bnb-example-app directory. To simplify the installation of further dependencies, the project uses the Bundler dependency management tool. This must be installed first:

sudo gem install bundle

Bundler can then be run to install the additional Ruby Gems required by the project:

cd bnb-example-app
sudo bundle install

Once complete the application can be run as follows:

rackup

The rackup application will then start the application as defined in config.ru. By default the application will launch on port 9292 and should be accessible from:

localhost:9292

Summary

This tutorial has introduced a simple demonstration application that illustrates one way of interacting with the BNB SPARQL endpoint. The application uses SPARQL queries to build a very simple book recommendation tool. The logic used to build the recommendations is deliberately simple to help illustrate the basic principles of working with the dataset and the API.

The source code for the application is available under a public domain license so can be customised or reused as necessary. A live instance provides a way to test the application against the real data.

Tagged bnb

Oct 08 2014

4 Comments

Semantic Web

Accessing the British National Bibliography Using SPARQL

This is the third in a series of posts (1, 2, 3, 4) providing background and tutorial material about the British National Bibliography. The tutorials were written as part of some freelance work I did for the British Library at the end of 2012. The material was used as input to creating the new documentation for their Linked Data platform but hasn’t been otherwise published. They are now published here with permission of the BL.

Note: while I’ve attempted to fix up these instructions to account with changes to the platform on which the data is published, there may still be some errors. If there are then please leave a comment or drop me an email and I’ll endeavour to fix.

The tutorial introduces developers to the BNB API which supports querying of the dataset via the SPARQL query language and protocol. The tutorial provides:

Pointers to relevant background material and tutorials on SPARQL and the SPARQL Protocol
A collection of useful queries and query patterns for working with the BNB dataset

The queries described in this tutorial have been published as a collection of files that can be download from github.

What is SPARQL?

SPARQL is a W3C standard which defines a query language for RDF databases. Roughly speaking SPARQL is the equivalent of SQL for graph databases. SPARQL 1.0 was first published as an official W3C Recommendation in 2008. At the time of writing SPARQL 1.1, which provides a number of new language features, will shortly be published as a final recommendation.

A SPARQL endpoint implements the SPARQL protocol allowing queries to be submitted over the web. Public SPARQL endpoints offer an API that allows application developers to query and extract data from web or mobile applications.

A complete SPARQL tutorial is outside the scope of this document, but there are a number of excellent resources available for developers wishing to learn more about the query language. Some recommended tutorials and reference guides include:

Apache Jena SPARQL tutorial — a short tutorial that introduces the key features of the language
SPARQL by Example — a thorough introduction to all aspects of the SPARQL query language
SPARQL Cheat sheet — a set of summary slides that provides a reference to key language features
SPARQL Query Language Reference — a printable reference sheet that illustrates the important SPARQL syntax

The BNB SPARQL Endpoint

The BNB public SPARQL endpoint is available from:

bnb.data.bl.uk/sparql

No authentication or API keys are required to use this API.

The BNB endpoint supports SPARQL 1.0 only. Queries can be submitted to the endpoint using either GET or POST requests. For POST requests the query is submitted as the body of the request, while for GET requests the query is URL encoded and provided in the query parameter, e.g:

bnb.data.bl.uk/sparql?query=SELECT+%3Fs+%3Fp+%3Fo+WHERE+%7B%3Fs+%3Fp+%3Fo%7D+LIMIT+1

Refer to the SPARQL protocol specification for additional background on submitting queries. Client libraries for interacting with SPARQL endpoints are available in a variety of languages, including python, ruby, nodejs, PHP and Java.

Types of SPARQL Query and Result Formats

There are four different types of SPARQL query. Each of the different types supports a different use case:

ASK: returns a true or false response to test whether data is present in a dataset, e.g. to perform assertions or check for interesting data before submitting queries. Note these no longer seem to be supported by the BL SPARQL endpoint. All ASK queries now return an error.
SELECT: like the SQL SELECT statement this type of query returns a simple tabular result set. Useful for extracting values for processing in non-RDF systems
DESCRIBE: requests that the SPARQL endpoint provides a default description of the queried results in the form of an RDF graph
CONSTRUCT: builds a custom RDF graph based on data in the dataset

Query results can typically be serialized into multiple formats. ASK and SELECT queries have standard XML and JSON result formats. The graphs produced by DESCRIBE and CONSTRUCT queries can be serialized into any RDF format including Turtle and RDF/XML. The BNB endpoint also supports RDF/JSON output from these types of query. Alternate formats can be selected using the output URL parameter, e.g. output=json:

bnb.data.bl.uk/sparql?query=SELECT+%3Fs+%3Fp+%3Fo+WHERE+%7B%3Fs+%3Fp+%3Fo%7D+LIMIT+1&output=json

General Patterns

The following sections provide a number of useful query patterns that illustrate some basic ways to query the BNB.

Discovering URIs

One very common use case when working with a SPARQL endpoint is the need to discover the URI for a resource. For example, the ISBN number for a book or an ISSN number of a serial is likely to be found in a wide variety of databases. It would be useful to be able to use those identifiers to look up the corresponding resource in the BNB.

Here’s a simple SELECT query that looks up a book based on its ISBN-10:


#Declare a prefix for the bibo schema
PREFIX bibo: <purl.org/ontology/bibo/>
SELECT ?uri WHERE {
  #Match any resource that has the specific property and value
  ?uri bibo:isbn10 "0261102214".
}

As can be seen from executing this query there are actually 4 different editions of The
that have been published using this ISBN.

Here is a variation of the same query that identifies the resource with an ISSN of 1356-0069:


PREFIX bibo: <purl.org/ontology/bibo/>
SELECT ?uri WHERE {
  ?uri bibo:issn "1356-0069".
}

The basic query pattern is the same in each case. Resources are matched based on the value of a literal property. To find different resources just substitute in a different value or match on a different property. The results can be used in further queries or used to access the BNB Linked Data by performing a GET request on the URI.

In some cases it may just be useful to know whether there is a resource that has a matching identifier in the dataset. An ASK query supports this use case. The following query should return true as there is a resource in the BNB with the given ISSN:


PREFIX bibo: <purl.org/ontology/bibo/>
ASK WHERE {
  ?uri bibo:issn "1356-0069".
}

Note ASK queries no longer seem to be supported by the BL SPARQL endpoint. All ASK queries now return an error

Extracting Data Using Identifiers

Rather than just request a URI or list of URIs it would be useful to extract some additional attributes of the resources. This is easily done by extending the query pattern to include more properties.

The following example extracts the URI, title and BNB number for all books with a given ISBN:


#Declare some additional prefixes
PREFIX bibo: <purl.org/ontology/bibo/>
PREFIX blterms: <www.bl.uk/schemas/bibliographic/blterms#>
PREFIX dct: <purl.org/dc/terms/>

SELECT ?uri ?bnb ?title WHERE {
  #Match the books by ISBN
  ?uri bibo:isbn10 "0261102214";
       #bind some variables to their other attributes
       blterms:bnb ?bnb;
       dct:title ?title.
}

This patterns extends the previous examples in several ways. Firstly, some additional prefixes are declared because the properties of interest are from several different schemas. Secondly, the query pattern is extended to match the additional attributes of the resources. The values of those attributes are bound to variables. Finally the SELECT clause is extended to list all the variables that should be returned.

If the URI for is already known then this can be used to directly identify the resource of interest. Its properties can then be matched and extracted. The following query returns the ISBN, title and BNB number for a specific book:


PREFIX bibo: <purl.org/ontology/bibo/>
PREFIX blterms: <www.bl.uk/schemas/bibliographic/blterms#>
PREFIX dct: <purl.org/dc/terms/>

SELECT ?isbn ?title ?bnb WHERE {
  <bnb.data.bl.uk/id/resource/009910399> bibo:isbn10 ?isbn;
       blterms:bnb ?bnb;
       dct:title ?title.         
}

Whereas the former query identified resources indirectly, via the value of an attribute, this query directly references a resource using its URI. The query pattern then matches the properties that are of interest. Matching resources by URI is usually much faster than matching based on a literal property.

Itemising all of the properties of a resource can be tiresome. Using SPARQL it is possible to ask the SPARQL endpoint to generate a useful summary of a resource (called a Bounded Description. The endpoint will typically return all attributes and relationships of the resource. This can be achieved using a simple DESCRIBE query:


DESCRIBE <bnb.data.bl.uk/id/resource/009910399>

The query doesn’t need to define any prefixes or match any properties: the endpoint will simply return what it knows about a resource as RDF. If RDF/XML isn’t useful then the same results can be retrieved as JSON.

Reverting back to the previous approach of indirectly identifying resources, its possible to ask the endpoint to generate descriptions of all books with a given ISBN:


PREFIX bibo: <purl.org/ontology/bibo/>
DESCRIBE ?uri WHERE {
  ?uri bibo:isbn10 "0261102214".
}

Matching By Relationship

Resources can also be matched based on their relationships, by traversing across the graph of data. For example it’s possible to lookup the author for a given book:


PREFIX bibo: <purl.org/ontology/bibo/>
PREFIX dct: <purl.org/dc/terms/>

SELECT ?author WHERE {
  #Match the book
  ?uri bibo:isbn10 "0261102214";
       #Match its author
       dct:creator ?author.
}

As there are four books with this ISBN the query results return the URI for Tolkien four times. Adding a DISTINCT will remove any duplicates:


PREFIX bibo: <purl.org/ontology/bibo/>
PREFIX dct: <purl.org/dc/terms/>

SELECT DISTINCT ?author WHERE {
  #Match the book
  ?uri bibo:isbn10 "0261102214";
       #Match its author
       dct:creator ?author.
}

Type Specific Patterns

The following sections provide some additional example queries that illustrate some useful queries for working with some specific types of resource in the BNB dataset. Each query is accompanied by links to the SPARQL endpoint that show the results.

For clarity the PREFIX declarations in each query have been ommited. It should be assumed that each query is preceded with the following prefix declarations:

PREFIX bio: <purl.org/vocab/bio/0.1/> PREFIX bibo: <purl.org/ontology/bibo/> PREFIX blterms: <www.bl.uk/schemas/bibliographic/blterms#> PREFIX dct: <purl.org/dc/terms/> PREFIX event: <purl.org/NET/c4dm/event.owl#> PREFIX foaf: <xmlns.com/foaf/0.1/> PREFIX geo: <www.w3.org/2003/01/geo/wgs84_pos#> PREFIX isbd: <iflastandards.info/ns/isbd/elements/> PREFIX org: <www.w3.org/ns/org#> PREFIX owl: <www.w3.org/2002/07/owl#> PREFIX rda: <RDVocab.info/ElementsGr2/> PREFIX rdf: <www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <www.w3.org/2000/01/rdf-schema#> PREFIX skos: <www.w3.org/2004/02/skos/core#> PREFIX xsd: <www.w3.org/2001/XMLSchema#>

Not all of these are required for all of the queries, but they declare all of the prefixes that are likely to be useful when querying the BNB.

People

There are a number of interesting queries that can be used to interact with author data in the BNB.

List Books By An Author

The following query lists all published books written by C. S. Lewis, with the most recently published books returned first:


SELECT ?book ?isbn ?title ?year WHERE {
  #Match all books with Lewis as an author
  ?book dct:creator <bnb.data.bl.uk/id/person/LewisCS%28CliveStaples%291898-1963>;
        bibo:isbn10 ?isbn;
        dct:title ?title;
        #match the publication event
        blterms:publication ?publication.

  #match the time of the publication event
  ?publication event:time ?time.
  #match the label of the year
  ?time rdfs:label ?year          
}
#order by descending year, after casting year as an integer
ORDER BY DESC( xsd:int(?year) )

Identifying Genre of an Author

Books in the BNB are associated with one or more subject categories. By looking up the list of categories associated with an author’s works it may be possible to get a sense of what type of books they have written. Here is a query that returns