At the final Advisory Group meeting towards the end of July 2012, the following points were made in relation to the evaluation of the quality of Linked Data produced and techniques used:
In terms of methodology, the bringing together of different use cases and technical expertise had worked well, despite learning curves on all sides. The project had been beneficial in raising awareness of Linked Data issues in the Special Collections and Geology teams, and of archival and cataloguing practice in the technical team. Geology and Special Collections were also more aware of each other’s collections and of the potential for working together in the future.
Posted in General, Geology Museum, Linked Data, Penguin Archive | No Comments »
The Bricolage Advisory Group reviewed progress against the project workplan at its second meeting on 20 April 2012. We agreed to bring forward by a month each the two remaining AG meetings – to May and July – to ensure continuing review and steer towards the end of the project in August.
The Linked Data: Hosting work package has started with options such as Talis and data.bris being considered. Timing of the data.bris project may preclude its use, although it may not be the best place to host data from Bricolage anyway. It could, however, be used to create URIs and expose data in an external view. Options to be considered more fully at the meeting in May.
The Linked Data: Metadata review/export work package is moving at a different pace with the Penguin Archive and the Geology Museum (see separate blog posts on export work). In Earth Sciences, student effort is being used to move data from spreadsheet format into an online (Drupal) database. Some of the issues arising concern the formats used, including free text, and the need to restrict terminology. There is a huge amount of currently unstructured data. Review and export piloting will carry on for another 2-3 weeks. For the Penguin Archive, work has focused on trying to add authority terms but this has proved extremely labour-intensive, more so than anticipated. Good authority data is needed for good Linked Data, and this needs to be taken into account when initially cataloguing collections. Legacy data without authority data continues to pose problems.
The entity extraction process poses the question: ‘can we identify things in textual descriptions, linking unlinked data?’ Some online services can analyse text, eg DBpedia Spotlight, which offers entity detection and disambiguation services for constructing bespoke Named Entity Recognition solutions. The Women’s Library at London Met deals effectively with disambiguation of names. We are interested in passing text through the process to see how useful and accurate it can be. Entity extraction is at the experimental end of our work, but linking data to an authority source, and the processes around this, are of interest.
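To make the idea concrete, here is a minimal sketch of the simplest form of entity spotting: looking up known labels in a free-text description. It is not DBpedia Spotlight and does no real disambiguation; the `GAZETTEER` contents, the `spot_entities` helper and the sample description are all assumptions made for illustration.

```python
import re

# Hypothetical gazetteer: surface label -> authority URI.
# The entries are illustrative only and would need checking against the authority source.
GAZETTEER = {
    "Allen Lane": "http://dbpedia.org/resource/Allen_Lane",
    "Penguin Books": "http://dbpedia.org/resource/Penguin_Books",
}


def spot_entities(text, gazetteer=GAZETTEER):
    """Return (label, uri, offset) for every gazetteer label found in text."""
    found = []
    for label, uri in gazetteer.items():
        # Whole-word, case-insensitive match on the label.
        pattern = r"\b" + re.escape(label) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            found.append((label, uri, match.start()))
    # Report matches in the order they appear in the text.
    return sorted(found, key=lambda item: item[2])


# Invented catalogue description, used only to exercise the function.
description = "Letter from Allen Lane to the board of Penguin Books."
for label, uri, offset in spot_entities(description):
    print(offset, label, uri)
```

A lookup like this only finds exact labels, which is why services that also handle variant names and ambiguity are of interest.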
Export implementation should be complete by end of May.
Identifiers and Linking work package: Geology have thus far been creating internal links within their data. Work to link this data to other datasets has not yet started. Work on the Penguin Archive to date has highlighted a problem around the stability of URIs; sustainability is an issue for the future. For example, a person identified in the Penguin Archive data could have a unique ID in CALM but that identifier could easily break if CALM’s internal ID scheme is changed by the vendor. Alternative ID schemes that rely on the person’s name or their biographical dates also pose similar problems if, say, the person changes their name or their commonly accepted biographical dates change. We need persistent IDs (eg DOIs) in combination with a resolver service to map from persistent IDs to appropriate internal current IDs (eg CALM IDs).
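The resolver idea can be sketched in a few lines. This is an illustration under stated assumptions, not the project’s implementation: the `IdentifierResolver` class, the `pid:` prefix and the CALM-style IDs are all hypothetical.

```python
class IdentifierResolver:
    """Maps persistent public IDs to whatever internal ID is current."""

    def __init__(self):
        self._current = {}  # persistent ID -> current internal ID

    def register(self, persistent_id, internal_id):
        """Publish a persistent ID for an internal record."""
        self._current[persistent_id] = internal_id

    def remap(self, old_internal_id, new_internal_id):
        """Record an internal ID change without touching the public IDs."""
        for persistent_id, internal_id in list(self._current.items()):
            if internal_id == old_internal_id:
                self._current[persistent_id] = new_internal_id

    def resolve(self, persistent_id):
        """Return the internal ID currently behind a persistent ID."""
        return self._current[persistent_id]


resolver = IdentifierResolver()
resolver.register("pid:person-0001", "calm-A1")  # hypothetical IDs
resolver.remap("calm-A1", "calm-B7")             # e.g. vendor changes its ID scheme
print(resolver.resolve("pid:person-0001"))
```

The published ID stays the same for external users; only the internal mapping changes.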
On microdata, for Geology we’re looking to embed data coming out of the catalogue in the public site so that big search engines can find structured data. We’ll be looking at schema.org metadata as RDFa within the HTML of the public site.
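As a rough sketch of what such embedding might look like, the function below renders one catalogue record as an HTML fragment carrying RDFa Lite attributes (`vocab`, `typeof`, `property`). The record fields and the choice of the generic schema.org `Thing` type are assumptions for illustration; which schema.org type best suits a specimen is left open here.

```python
from html import escape


def specimen_to_rdfa(specimen):
    """Render one catalogue record as an HTML fragment with RDFa Lite attributes."""
    # Escape values so catalogue text cannot break the markup.
    name = escape(specimen["name"])
    description = escape(specimen["description"])
    return (
        '<div vocab="http://schema.org/" typeof="Thing">\n'
        f'  <h2 property="name">{name}</h2>\n'
        f'  <p property="description">{description}</p>\n'
        "</div>"
    )


# Invented record, used only to show the output shape.
example = {"name": "Ammonite & matrix", "description": "Example specimen record."}
print(specimen_to_rdfa(example))
```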
Sustainability of the tools and workflow developed during the project is important. The key is in developing a set of processes and tools that are easy to use in terms of the export process and publication of Linked Data, so that archivists might routinely use them. Questions arise about what is most useful for the long term and what is transferable.
The demonstrator work packages will begin at the end of May; evaluation and dissemination work packages will be discussed at the May Advisory Group meeting.
Posted in General, Geology Museum, Linked Data, Penguin Archive | No Comments »
The Bricolage (University of Bristol Collections as Linked Open Data) project will work with two of its most significant collections to publish catalogue metadata as Linked Open Data.
The project will re-apply the best practice processes and tools produced by relevant preceding projects to create persistent identifiers, identify and create links to authoritative datasets and vocabularies, and work with the two collections’ infrastructure platforms: CALM and Drupal. The Linked Data production workflows will be embedded in the collections’ teams to ensure future sustainability. The project will also produce two simple demonstrators to illustrate the potential of data linking and reuse, and will encode resource microdata into the Geology Museum’s forthcoming online catalogue with the aim of improving collection visibility via the major search engines.
The metadata will also be licensed for ease of reuse according to JISC guidelines.
The main outputs of this project:
One of the main achievements for the project’s host institution will be the sustainable production of public open Linked Data for two of its largest collections. As well as increasing the profile, visibility and potential for reuse of the catalogues in question, the experience gained during the project will provide a solid grounding for the reapplication of the methods to other collections in future.
For the sector, the wider benefits of this work include the following.
Risk | Probability (1-5) | Severity (1-5) | Score (PxS) | Action to Prevent/Manage Risk |
Staffing | 2 | 4 | 8 | The staff named below all have significant experience within their areas of expertise. IT Services and Bristol University in general offer a pool of staff with suitably equivalent skills in the event of any staff departures occurring in the project. |
Organisational | 3 | 1 | 3 | The need to manage a team spanning three departments has been considered when allocating the proportion of project management. |
Technical | 3 | 3 | 9 | The project remit will be highly focused, and is building upon work already done in this area. In addition there is experience of Linked Data within the team gained from previous JISC projects. The project also has two hosting options. |
Legal | 2 | 3 | 6 | Licensing issues that may limit the reuse of the data produced have been considered and are not deemed to be a barrier. Both collections have committed to use permissive licences. Any software produced will be available under an open source licence. |
Stakeholder engagement | 2 | 1 | 2 | Engagement with stakeholders is important to the project and the workplan includes effort to support engagement activities. These will also be evaluated. |
The main issue that would arise for the project if its outputs were to prove popular would be managing any excessive demands on the hosting resources. Simple downloads of the data set would not be problematic in this regard but interfaces that required server-side processing (e.g. SPARQL) could be. These questions will be considered when the project is conducting its review of the data hosting options.
Both collections within the project have committed to release their catalogue metadata as Linked Data for reuse under the ODC-PDDL or CC0 licence, as per the guidance given by the Open Bibliographic Data Guide. This commitment will ensure that the Linked Data produced will be open to reuse, and it also meets the requirement for involvement with the Talis Platform Connected Commons scheme.
Any source code produced will be the copyright of the University of Bristol. It will be made available under an open source licence for free and non-commercial use and will be available to the UK Higher Education and Further Education community in perpetuity.
The team and their roles:
Work package | Activity | M1 | M2 | M3 | M4 | M5 | M6 | M7 |
Governance and Engagement | Establish mailing lists, project blog and project wiki | | | | | | | |
Governance and Engagement | Advisory group establish and meet | | | | | | | |
Governance and Engagement | Detailed work plan (to be evaluated monthly) | | | | | | | |
Linked Data | Hosting review | | | | | | | |
Linked Data | Collection metadata review and preparation | | | | | | | |
Linked Data | Export process development | | | | | | | |
Linked Data | Identifiers and linking | | | | | | | |
Linked Data | Export implementation | | | | | | | |
Linked Data | Documentation for reuse | | | | | | | |
Microdata | Schema review | | | | | | | |
Microdata | Microdata markup creation | | | | | | | |
Microdata | Embedding in Geology online catalogue | | | | | | | |
Sustainability | Embed Linked Data maintenance processes | | | | | | | |
Demonstrators | Produce two demonstrations of reuse | | | | | | | |
Evaluation | Evaluation of the Linked Data produced and the techniques used. The project methodology will also be evaluated. | | | | | | | |
Final Reporting & Dissemination | Lessons learned, findings of value to the JISC community | | | | | | | |
Final Reporting & Dissemination | Final release of Linked Data with documentation | | | | | | | |
A few more details on selected workpackages follow.
The project has the commitment of both the Geology Museum and the Library Special Collections as regards the hosting of the Linked Data produced. In addition the team has experience of hosting Linked Data from previous projects. However, at an early stage we will also assess the suitability of using the Talis Platform Connected Commons scheme to host the project’s Linked Data outputs. This scheme supports the publishing and the reuse of Linked Data by removing, for qualifying data sets, the associated hosting costs.
One of the first tasks of the project will be to review the current collection metadata with particular regard to its structure. While labour-intensive changes are not in scope, the team will seek to make edits that will ensure coherency and aid the subsequent transformation of the data to a format that supports reuse. Examples might include the formats used for dates, place names and personal names.
The project will also assess the scope for the archivists to undertake some limited manual enrichment of the data. An example, related to the Penguin Archive in particular, might be to add event information. So metadata for a set of minutes of a committee meeting would be extended with data describing the meeting as an event associated with a time, place, people etc.
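One way to picture this enrichment is as a catalogue record that carries a list of structured events alongside its existing fields. The sketch below is only an assumption about shape: the `Event` and `CatalogueRecord` classes, their field names and the sample values are invented for illustration and do not reflect the CALM data model.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class Event:
    """An event described by a record, e.g. the meeting that a set of minutes documents."""
    label: str
    date: str                      # ISO 8601 date, e.g. "1960-01-01"
    place: Optional[str] = None
    participants: List[str] = field(default_factory=list)


@dataclass
class CatalogueRecord:
    reference: str
    title: str
    events: List[Event] = field(default_factory=list)


# Invented example: a set of minutes enriched with the meeting it records.
record = CatalogueRecord(reference="EX/1/2", title="Minutes of an example committee meeting")
record.events.append(
    Event(
        label="Example committee meeting",
        date="1960-01-01",
        place="Example place",
        participants=["Person A", "Person B"],
    )
)
print(asdict(record))
```

Holding the event as structured data, and not as free text in the description, is what later allows it to be linked to times, places and people.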
The Penguin Archive is held in the Special Collections’ CALM installation. JISC has already undertaken work looking at techniques for exporting Linked Data from CALM, and this project will reference and build on that work, in particular the SALDA and LOCAH projects. It will also maintain links with the recently funded JISC Step Change project. This latter project will ensure Linked Data support is embedded in a future release of CALM. Although this release will not occur in the lifetime of Bricolage, by keeping up to date with their work and other developments relevant to the Discovery programme we will ensure our outputs are compatible with outputs from current infrastructure projects.
For the catalogue data held in CALM the project will follow the approach developed by LOCAH and SALDA. Data will be exported as EAD/XML and then transformed via XSLT into Linked Data expressed in RDF/XML format. The starting point for the transformation will be the XSLT stylesheet developed within LOCAH and made available by the Archives Hub.
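The shape of that transformation can be sketched without the stylesheet itself. The example below is not the LOCAH XSLT and does not produce RDF/XML; as a stand-in it uses Python’s standard library to read a small EAD-style component and emit N-Triples using Dublin Core terms. The sample record, the `BASE` URI and the choice of properties are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Invented EAD-style component (element names follow EAD 2002; the content is made up).
EAD_SAMPLE = """<c level="item">
  <did>
    <unitid>EX/1/2</unitid>
    <unittitle>Minutes of an example meeting</unittitle>
    <unitdate normal="1960">1960</unitdate>
  </did>
</c>"""

BASE = "http://example.org/id/archive/"   # hypothetical URI base
DCT = "http://purl.org/dc/terms/"


def literal(value):
    """Quote a string as an N-Triples literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def ead_component_to_ntriples(xml_text):
    """Map one EAD component's <did> fields to a list of N-Triples lines."""
    did = ET.fromstring(xml_text).find("did")
    unitid = did.findtext("unitid").strip()
    title = did.findtext("unittitle").strip()
    # Derive a URI path segment from the reference code.
    subject = "<" + BASE + unitid.replace("/", "-").lower() + ">"
    triples = [
        f"{subject} <{DCT}identifier> {literal(unitid)} .",
        f"{subject} <{DCT}title> {literal(title)} .",
    ]
    date = did.find("unitdate")
    if date is not None:
        # Prefer the normalised date attribute when the export provides one.
        value = date.get("normal", date.text)
        triples.append(f"{subject} <{DCT}date> {literal(value)} .")
    return triples


for line in ead_component_to_ntriples(EAD_SAMPLE):
    print(line)
```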
Part of the project is to use the metadata it releases in conjunction with existing open metadata. In pursuit of this goal the subject experts within the team will identify appropriate open datasets and vocabularies and lead the work to inter-link the Bristol datasets with them. Obvious examples include DBpedia and the LCSH (or FAST) and VIAF authority services. The Linked Data version of the British National Bibliography will be of particular interest to the Penguin Archive, as will the CIDOC Conceptual Reference Model (CRM), and perhaps the BBC Wildlife Finder, to Geology. The project will also reuse the RDF vocabulary produced by the LOCAH project. We anticipate that the techniques developed by our subject experts for this process will provide interesting lessons for the community.
As noted in the Discovery programme’s draft high-level technical principles, resource discovery “relies on persistent global identifiers”. The project will follow best practice in this area and use carefully designed URIs, in consultation with other on-going institutional work in this area. These URIs will be created with interoperability and persistence in mind.
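As a hedged illustration of what “carefully designed” might mean in practice, the sketch below mints URIs from a collection name, a resource type and a stable local number, and deliberately avoids putting names or dates in the URI. The `BASE_URI` value, the path pattern and the `mint_uri` helper are assumptions, not the project’s actual scheme.

```python
import re

BASE_URI = "http://example.org/id"  # hypothetical base, not the project's real domain

# Path segments are restricted to a small, URL-safe alphabet.
_SEGMENT = re.compile(r"[a-z][a-z0-9-]*")


def mint_uri(collection, kind, local_number):
    """Build an opaque URI from stable parts only (no names, no dates)."""
    for part in (collection, kind):
        if not _SEGMENT.fullmatch(part):
            raise ValueError(f"unsafe path segment: {part!r}")
    if local_number < 1:
        raise ValueError("local_number must be a positive integer")
    # Zero-padding keeps URIs a uniform length and easy to sort.
    return f"{BASE_URI}/{collection}/{kind}/{local_number:06d}"


print(mint_uri("geology", "specimen", 42))
```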
Within the Geology domain the project envisages linking geographical information about museum specimens with open access geographical databases (e.g. GeoNames) and GIS systems and interfaces. We believe that this will allow users not only to search the collection database but also to visualise geographical distributions of specimens and familiarise themselves with local and regional geology, a useful tool for scientists and schools.
The project will also work with the University’s online enhancement team (co-located with the project team) to embed microdata derived from the Geology metadata into their new museum website. This microdata work will seek to use and extend the schemas found at schema.org, and as a result, will provide structured data recognizable by the major search providers. This strategy aims to improve the discoverability of the museum’s collections, as described in the Discovery programme’s draft technical principles.
For the Geology data the demonstrator will be a browser-based mapping application, allowing a user to navigate the collection via the geographic locations of the resources. This will utilise the links made from the resource metadata to open access geographical databases and will provide an example of a new and versatile way to explore the museum’s collection.
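A browser-based map typically consumes something like GeoJSON, so one plausible step is converting specimen records that already carry coordinates into a GeoJSON FeatureCollection. The sketch below assumes the coordinates have been obtained beforehand (for example through the links to a geographical database); the record fields and sample values are invented.

```python
import json


def specimens_to_geojson(specimens):
    """Convert specimen records with coordinates into a GeoJSON FeatureCollection."""
    features = []
    for specimen in specimens:
        # Records without a resolved location cannot be mapped, so skip them.
        if specimen.get("lat") is None or specimen.get("lon") is None:
            continue
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [specimen["lon"], specimen["lat"]]},
            "properties": {"id": specimen["id"], "name": specimen["name"]},
        })
    return {"type": "FeatureCollection", "features": features}


# Invented records, used only to show the output shape.
sample_specimens = [
    {"id": "spec-1", "name": "Example fossil", "lat": 51.45, "lon": -2.59},
    {"id": "spec-2", "name": "Unlocated sample", "lat": None, "lon": None},
]
print(json.dumps(specimens_to_geojson(sample_specimens), indent=2))
```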
For the Penguin Archive the project will produce an interactive timeline-based interface to aspects of the collection, in particular the resources associated with the Lady Chatterley’s Lover trial. This will provide a chronological view of the data not possible using traditional catalogue data and interfaces.
Total project cost: £81,557.
Of which £43,095 from JISC, £38,462 from University of Bristol.
Posted in General | No Comments »
Bricolage, or to give it its full name, the ‘University of Bristol Collections as Linked Open Data’ project, has recently been funded by JISC as part of the 16/11 grant funding: JISC Digital infrastructure programme.
The project will start 1st Feb 2012 and run through to 31st August 2012.
This project will publish catalogue metadata as Linked Open Data for two of the University’s most significant collections: the Penguin Archive, a comprehensive collection of the publisher’s papers and books; and the Geology Museum, a 100,000-specimen collection housing many unique and irreplaceable resources.
The metadata will be licensed for ease of reuse according to JISC guidelines.
The project will re-apply the best practice processes and tools produced by relevant preceding projects to create persistent identifiers, identify and create links to authoritative datasets and vocabularies, and work with the two collections’ infrastructure platforms: CALM and Drupal. The Linked Data production workflows will be embedded in the collections’ teams to ensure future sustainability.
The project will also produce two simple demonstrators to illustrate the potential of data linking and reuse, and will encode resource microdata into the Geology Museum’s forthcoming online catalogue with the aim of improving collection visibility via the major search engines.
Posted in General | No Comments »