
Bricolage

Archive for the 'Demonstrators' Category

Penguin Archive demonstrator live

Tuesday, November 6th, 2012


The final version of the Penguin Archive demonstrator has been completed and embedded into the Special Collections’ Penguin Archive website.

[Image: Penguin demonstrator embedded in Special Collections]

The demonstrator was documented in detail in an earlier post; in summary, it combines a pre-populated chronology with live data pulled from the Archive’s newly populated Linked Data store.

Posted in Demonstrators, Penguin Archive | No Comments »

Demonstrator previews

Wednesday, August 15th, 2012

Among the deliverables of the Bricolage project are two demonstrators:

  • a browser-based mapping application for exploring the Geology collection via its geography
  • an interactive timeline displaying the chronology of selected resources within the Penguin Archive

The demonstrators were designed to show the potential for building data visualisations from the Linked Data produced by the project. Both therefore take a similar technical approach: a JavaScript application retrieves JSON-formatted data from the Linked Data service via a RESTful web service, then renders it as required in the browser.
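In outline, the shared pattern looks something like this (a minimal sketch using jQuery; the endpoint is taken from the Penguin example below, and the target element ID is hypothetical):

    // Minimal sketch of the shared pattern: retrieve JSON from the
    // Linked Data service via its RESTful interface, then render it.
    // The '#resource-details' element is hypothetical.
    $.getJSON('/elda/api/penguin/id/archivalresource/gb-3-dm2309.json',
      function (data) {
        $('#resource-details').text(
          data.result.dc_title + ' (' + data.result.extent + ')');
      });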

Both demonstrators will be publicly available in September, hosted on the Penguin Archive and Geology collection sites. For now, here are some screenshots and a little more detail…

Penguin Archive Timeline

 

[Screenshot: the Penguin Archive Timeline demonstrator]

The Penguin Archive Timeline uses the freely available TimelineJS to present an interactive chronology of key events in the history of Penguin Books. Its construction involved the following steps:

  • The collection curators created a spreadsheet containing the key events, their dates and (optionally) an illustrative image. The event data also included the unique collection identifiers of any related collections held in the archive; these identifiers would later provide a route to the Linked Data.
  • The spreadsheet was then parsed into a JSON data structure understood by the TimelineJS JavaScript application, e.g.
    {
    "startDate":"1863",
    "endDate":"",
    "headline":"Birth of Samuel Lane",
    "text":"<span class='lod' res='gb-3-dm2244;gb-3-dm1649;gb-3-dm1676'>…</span>",
    ...
    }
    
  • Loading this data (and hosting the linked images) gave us the basic chronology, but without any sign of Linked Data so far! Now to use the supplied collection codes.
  • A small edit was made to TimelineJS to provide a callback upon event data load. This callback gives us the chance to query the Linked Data service. E.g.
    /elda/api/penguin/id/archivalresource/gb-3-dm2309.json
  • The JSON returned from the call (see below) is then parsed and used to populate the timeline (the embedded box in the image above); a sketch of the whole callback flow appears after this list.
    {
    "format" : "linked-data-api",
    "version" : "0.2",
    "result" : {
      "_about" : "http://tc-bricol.ilrt.bris.ac.uk/elda/api/penguin/id/archivalresource/gb-3-dm2309.json",
      "dc_title" : "Pelican Books, Penguin Books, Penguin Handbooks, Penguin Specials, Pan Books, and other materials",
      "extent" : "4 records management boxes (359 books)",
      ...
      }
    }
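As a rough sketch of how the last two steps fit together (the function and DOM details here are illustrative, not the project’s actual code):

    // Sketch of the TimelineJS data-load callback. For each event,
    // read the collection identifiers from the span added in the
    // spreadsheet-derived JSON, query the Linked Data service for
    // each one, and inject the results into the timeline event box.
    function onTimelineDataLoaded() {
      $('span.lod').each(function () {
        var span = $(this);
        // Identifiers are semicolon-separated, e.g. "gb-3-dm2244;gb-3-dm1649"
        $.each(span.attr('res').split(';'), function (i, id) {
          $.getJSON('/elda/api/penguin/id/archivalresource/' + id + '.json',
            function (data) {
              span.append('<div>' + data.result.dc_title +
                          ' (' + data.result.extent + ')</div>');
            });
        });
      });
    }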

So the demonstrator shows the possibility of augmenting a purely browser-based application with rich, structured data.

JavaScript libraries used: jQuery 1.7.2, TimelineJS.

This demonstrator will be made publicly available via the Penguin Archive site in September. All associated code will also shortly be available under an Open Source licence.

A map interface for the Geology Collection

[Screenshot: the Geology collection map demonstrator]

The Geology demonstrator provides a map-based route into the museum’s collection. As with the Penguin demonstrator, it is a JavaScript-based browser app backed by the Linked Data created by the project. The steps involved in creating the demonstrator were as follows:

  • The existing catalogue only had textual place-name information. As part of the project this data was reviewed and partially cleaned. Then, in order to locate the resources on a map, the place names were passed through a geocoding service and the resulting coordinates were stored with the records. This was done as part of the one-time data migration into the new Drupal platform. Code was also put in place to automatically geocode records created or edited as part of the ongoing catalogue work.
  • Drupal has RDF support, and this was configured (to be blogged about elsewhere shortly) to produce RDF versions of resource records like this:
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/terms/"
    xmlns:sioc="http://rdfs.org/sioc/ns#"
    xmlns:ad="http://schemas.talis.com/2005/address/schema#"
    xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">

    <rdf:Description rdf:about="http://geomuse-dev.ilrt.bris.ac.uk/id/47-1">
    <rdf:type rdf:resource="http://schema.org/CreativeWork"/>
    <rdf:type rdf:resource="http://rdfs.org/sioc/ns#Item"/>
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Document"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2012-08-08T11:58:16+01:00</dc:date>
    <dc:created rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2012-08-08T11:58:16+01:00</dc:created>
    <dc:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2012-08-09T15:21:33+01:00</dc:modified>
    <sioc:num_replies rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">0</sioc:num_replies>
    <dc:classification rdf:resource="http://geomuse-dev.ilrt.bris.ac.uk/age/phanerozoic"/>
    <dc:classification rdf:resource="http://geomuse-dev.ilrt.bris.ac.uk/age/mesozoic"/>
    <dc:classification rdf:resource="http://geomuse-dev.ilrt.bris.ac.uk/age/jurassic"/>
    <dc:classification rdf:resource="http://geomuse-dev.ilrt.bris.ac.uk/age/early-jurassic"/>
    <dc:classification rdf:resource="http://geomuse-dev.ilrt.bris.ac.uk/age/pliensbachian"/>
    <ad:regionName rdf:resource="http://geomuse-dev.ilrt.bris.ac.uk/place/radstock"/>
    <dc:creator rdf:resource="http://geomuse-dev.ilrt.bris.ac.uk/person/tutcher-jw-0"/>
    <geo:lat rdf:datatype="http://www.w3.org/2001/XMLSchema#decimal">51.257415</geo:lat>
    <geo:long rdf:datatype="http://www.w3.org/2001/XMLSchema#decimal">-2.504067</geo:long>
    </rdf:Description>
  • The data we are interested in here are the classification and geo properties. Queries for data to populate the map are parameterised with bounding-box coordinates and (optionally) an age classification.
  • Drupal RDF also includes a SPARQL endpoint, and here, for ease of use from the JavaScript browser application, we fronted it with a RESTful Java web application. Thus an Ajax request from the browser might look like:
    host/bricol-geology/rest/specimens/51.73155108088844,-0.5085178417969018/50.85218333554836,-4.166965107421902/bajocian

    and this would be translated into a SPARQL query by the web application:

    SELECT ?id ?lat ?lng ?region
    WHERE {  GRAPH ?g
      { ?id <http://www.w3.org/2003/01/geo/wgs84_pos#lat> ?lat .
        ?id <http://www.w3.org/2003/01/geo/wgs84_pos#long> ?lng .
        ?id <http://purl.org/dc/terms/classification> <http://geomuse-dev.ilrt.bris.ac.uk/age/bajocian> .
        OPTIONAL { ?id <http://schemas.talis.com/2005/address/schema#regionName> ?region }
        FILTER ( ?lat < 51.69240914989516 && ?lat > 50.812284718809906 && ?lng > -4.183444599609402 && ?lng < -0.5249973339844018)
      }
    }
  • The web app parses the SPARQL results and returns JSON along these lines:
    [{"uri":"http://geomuse-dev.ilrt.bris.ac.uk/id/53-1",
      "regionUri":"http://geomuse-dev.ilrt.bris.ac.uk/place/dundry",
      "long":-2.638459,
      "lat":51.39859},
     ...
    ]
  • These points are then displayed on the map tool, using the Google Maps API and MarkerClustererPlus. Roll-over popups provide further information on individual points, as well as routes for launching collection browsing. A sketch of this browser-side flow appears after this list.
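A rough sketch of that browser-side flow (the function name is illustrative; the REST path follows the example above):

    // Sketch: query the REST service for specimens within the current
    // map viewport, optionally filtered by age, and plot the results.
    function loadSpecimens(map, age) {
      var ne = map.getBounds().getNorthEast();
      var sw = map.getBounds().getSouthWest();
      var url = '/bricol-geology/rest/specimens/' +
                ne.lat() + ',' + ne.lng() + '/' +
                sw.lat() + ',' + sw.lng() + '/' + age;
      $.getJSON(url, function (points) {
        var markers = $.map(points, function (p) {
          // 'long' comes straight from the JSON shown above
          return new google.maps.Marker({
            position: new google.maps.LatLng(p.lat, p['long'])
          });
        });
        // Cluster nearby markers so dense regions stay readable
        new MarkerClusterer(map, markers);
      });
    }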

JavaScript libraries used: jQuery 1.7.2, jQWidgets, Google Maps API v3.

This demonstrator is awaiting the migration of the final, full Geology data set and will be made publicly available via the Geology collection site in September. All associated code will also shortly be available under an Open Source licence.

Posted in Demonstrators | No Comments »

Users and use cases: The Geology Museum

Thursday, June 28th, 2012

The Geology Museum (site under development) is based in the University of Bristol’s School of Earth Sciences. It holds historically and scientifically important collections that are unique to the institution: an estimated 100,000 museum specimens, many of them unique and of international importance. Highlights include:

  • an estimated 20,000 invertebrate fossils, including material with important historical associations
  • over 4,500 mineral specimens, including many display-quality items from mines that are now inaccessible
  • over 3,000 vertebrate fossils and casts
  • the Fry collection of over 4,000 invertebrate and plant fossils from the UK

There is also an extensive teaching collection of 16,000 specimens. Over the past 15 years, 41,420 digital records have been produced from historic museum registers, card-index catalogues and specimen labels. The creation of digital metadata has focused on valuable specimens and collections of national or international importance; these records cover about two thirds of the entire collection. Each metadata record contains information in 30 categories, 18 of which will be published by this project.

The School of Earth Sciences is already undertaking work to enhance the online presence of the Geology Museum by improving the museum website and online access to the collections. This work includes migrating the existing collection metadata into a Drupal-backed system, which can be used to publish Linked Data automatically.

Initial work focused on moving data from the existing spreadsheet format into the Drupal database. Issues arose with the formats used, including free-text fields and the need to restrict terminology. There is a huge amount of data, but it is largely unstructured, so it requires manual effort to review and test. Unlike the Penguin Archive use case, the export and publication processes are largely automated by Drupal’s built-in modules for handling RDF, which return it in response to a Linked Data request. The aim is to embed data from the catalogue in the Geology Museum’s new public website as schema.org metadata in the site’s HTML, so that the major search engines can find structured data.
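For illustration, the kind of markup intended is along these lines (a hypothetical fragment; the values echo the Radstock record shown in the demonstrator post above):

    <!-- Hypothetical schema.org microdata for a specimen record -->
    <div itemscope itemtype="http://schema.org/CreativeWork">
      <span itemprop="contentLocation">Radstock</span>
      <span itemprop="creator">J. W. Tutcher</span>
      <meta itemprop="dateModified" content="2012-08-09">
    </div>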

The Collections & Practicals Manager in the School of Earth Sciences has suggested that a map demonstrator would be useful for the Geology Museum Linked Data. She is concerned, however, that much of the geo-location data about the collection is embedded as free text in description fields, which would make it difficult to plot the data on a map consistently, if at all. She has proposed using the geodata for ‘type specimen’ records from the centre of the UK, although this raises questions about the level of resolution at which these data could be plotted: for some, the catalogue may only record the nearest town or village rather than a precise geolocation tied to OS grid references. Given the Museum’s relationships with local schools and geology enthusiast groups, one way of resolving this issue (and assisting the ‘clean-up’ of the data, while giving information on use of the site overall) could be to invite these end users to provide feedback and correct location data via the site. She has arranged a meeting with one such group in July, which could provide a starting point for this. It will need to be made clear to any users beyond the Museum staff, however, that the demonstrators are not at ‘full service grade’.

The Collections & Practicals Manager has engaged fully with the project, participating in Advisory Board meetings and one-to-one meetings with the development team, and piloting and providing feedback on data migration to Drupal. The demonstrator will provide a concrete example of how Linked Data published via Drupal can be used, but evaluation of the value of embedding microdata to aid search engine optimisation is unlikely to extend beyond the lifetime of the project.

Posted in Demonstrators, Geology Museum, Linked Data | No Comments »

Users and use cases: The Penguin Archive

Thursday, June 28th, 2012

The Penguin Archive, housed in the Special Collections of the University of Bristol Library, contains the archives of Penguin Books Limited from its foundation in 1935 through to the 1980s. Its wide variety of materials covers the company’s establishment and business life, social events, legal cases (particularly the Lady Chatterley’s Lover trial of 1960), exhibitions on the company’s history and the private lives of prominent figures in the company’s early history. The archive also includes a large collection of Penguin books from 1935 to date. The collection comprises 2,093 archive boxes of editorial files; 466 archive boxes, 24 records management boxes and 84 box files of other archival material; and approximately 30,000 book titles. The digital catalogue is held in the Special Collections CALM (Computer Aided Library Management) installation. Holdings there comprise 123 collection-level descriptions containing over 4,000 individual metadata records, plus detailed digital guides to areas of the archive.

JISC has already undertaken work on techniques for exporting Linked Data from CALM, and the current Step Change project will ensure that Linked Data support is embedded in a future release of CALM, albeit not within the Bricolage project’s lifetime. We will follow the approach developed by the LOCAH and SALDA projects: data will be exported as EAD/XML, transformed via XSLT (based on the stylesheet developed within LOCAH) into RDF/XML, and made available as Linked Data. A handful of collection-level Penguin Archive records are already lodged with the Archives Hub. Our project will augment these with a Linked Data set containing thousands of resource-level catalogue records, which will be linked to the Archives Hub identifiers as and when they become available.
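In outline this is a single transform step; for example, with an XSLT processor such as Saxon (the file names here are hypothetical):

    # Transform a CALM EAD/XML export into RDF/XML using the
    # LOCAH-derived stylesheet (file names are hypothetical)
    java -jar saxon9he.jar -s:penguin-ead-export.xml \
         -xsl:locah-ead2rdf.xsl -o:penguin.rdf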

Initial work in the project focused on archivists trying to add authority terms to catalogue metadata, but this proved extremely labour-intensive, more so than anticipated. The process revealed that good authority data is needed for good Linked Data, and that this needs to be taken into account when collections are first catalogued (not an option for an existing catalogue like the Penguin Archive). Issues with the CALM export process and the stability of URIs have been reported in other project blog posts.

Early development of tools to automate the metadata review and export workflow as far as possible indicates the need to make it easy to keep the Linked Data up to date after project funding ends. A batch upload process could be used for initial publication. The archivists confirm that the catalogue is “quite fluid” and often updated, so ease of use and maintenance of the Linked Data matter to our users. One option for increasing the automation of the publishing process would be to upload exports to a folder monitored for changes, as sketched below. This may also address a concern users have already expressed: that “any non-trivial publishing process would not be used in practice after the project ends”. The project will aim to make the process as ‘light-touch’ as possible.
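As one hypothetical shape for such a monitored-folder process (a sketch only: the folder and upload path are assumptions, and Node.js is used purely for illustration):

    // Watch an export folder and push new RDF/XML files to the
    // Linked Data store. All names here are illustrative.
    var fs = require('fs'), path = require('path'), http = require('http');
    var dir = '/data/calm-exports';           // hypothetical export folder

    fs.watch(dir, function (event, filename) {
      if (!filename || !/\.rdf$/.test(filename)) return;
      var body = fs.readFileSync(path.join(dir, filename));
      var req = http.request({
        host: 'tc-bricol.ilrt.bris.ac.uk',    // host from the elda example
        path: '/data/penguin',                // hypothetical upload path
        method: 'PUT',
        headers: {
          'Content-Type': 'application/rdf+xml',
          'Content-Length': body.length
        }
      });
      req.end(body);                          // upload the export
    });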

The Archivist in the University’s Special Collections notes that the primary concern of archivists is to publish sufficient metadata to enable those interested in the materials to identify what exists, and to visit the Penguin Archive to use them for research, journalistic or other purposes. The archivists have considered what would make an appropriate demonstrator for the Linked Data published through the project; they would like to focus on the ‘administrative history’ of the Archive, plotting collection-level records against a timeline of, for example, the dates when key staff were appointed. Administrative history is a familiar archival concept, so the demonstrator would be of interest both to other archivists and potentially to end users of the catalogue and Linked Data. A visual representation of the timeline of events would need to be created manually; within the scope and timeframe of the project this will only be possible for one or two decades, with just some key events plotted across the whole timeline.

The Penguin Archive archivists have engaged fully with the project, participating in Advisory Board meetings and one-to-one meetings with the development team, and piloting and providing feedback on workflow processes.

Posted in Demonstrators, Linked Data, Penguin Archive | No Comments »

Bricolage: demonstrators

Friday, May 25th, 2012

At the Advisory Group meeting on 20 April, we discussed the potential scope and focus of the two demonstrators that the project will develop. For the Geology Museum, we may want to focus on a demonstrator linked to promoting its work in schools, which could include a mapping feature. The Penguin Archive may want to consider a timeline demonstrator linked to a specific area of the Archive.

We looked at some examples to help refine our thinking on the demonstrators:

Examples of the use of a timeline:

  • http://www.simile-widgets.org/timeline/examples/compact-painter/compact-painter.html
  • http://timeline.verite.co/

Example of a geographical view:

  • http://opendatacommunities.org/imd_mapper/map.html

The Advisory Group will finalise the demonstrators to be developed at its meeting in May; the key will be demonstrating how the use of Linked Data can enhance the collections, which may in turn encourage the sustainability of the tools and processes used.

Posted in Demonstrators, Geology Museum, Linked Data, Penguin Archive | No Comments »
