JISC PoWR » Digital preservation

JISC PoWR

Preservation of Web Resources: a JISC-funded project [Archived Blog]

  • Status of this Blog

    This blog was used to support the JISC PoWR project, which ran from April 2008 to November 2010. The project has delivered its outputs and is now complete. The blog has now been frozen and we do not intend to publish any new posts.
  • Partners

    • JISC IIE
    • UKOLN
    • ULCC
  • Partner blogs

    • Opencontentlawyer
    • UK Web Focus
    • ULCC DA Blog
    • JISC Beginner's Guide to Digital Preservation
  • Licence

    Posts on this blog are licenced under a Creative Commons Attribution-Noncommercial-Share Alike 2.0 UK: England & Wales License. Comments posted to this blog will also have the same licence.

Archive for the 'Digital preservation' Category

Blue Ribbon Task Force Publishes Sustainable Economics for a Digital Planet

Posted by Marieke Guy on 22nd April 2010

Universities grappling with complex decisions on which of their burgeoning digital resources they should preserve – and the inherent financial, technical and legal issues that surround such work – may welcome a report that offers a “supply-and-demand” perspective on how individuals and institutions might manage their digital collections.

The Blue Ribbon Task Force on Sustainable Digital Preservation and Access (BRTF-SDPA), a new international initiative funded by JISC and other organisations, has recently released its report entitled Sustainable Economics for a Digital Planet: Ensuring Long-Term Access to Digital Information. Its report examines the complicated and diverse issues from an economic standpoint. It identifies the problems intrinsic to all preserved digital materials, and proposes domain-specific actions that address the challenges to sustainability. The report focuses its inquiry on materials of long-term public interest in content domains with diverse preservation profiles, namely scholarly discourse, commercially owned cultural content and collectively produced Web content.

JISC is organising a free one-day symposium in London on 6 May 2010 where the Blue Ribbon task force will present its final report and invite responses from the BBC, the Natural History Museum, the British Library, European Bioinformatics Institute and the European Commission. Further information is available.

Posted in Digital preservation | No Comments »

Official Launch of the UK Web Archive

Posted by Marieke Guy on 26th February 2010

The British Library has officially launched the UK Web Archive, offering access in perpetuity to thousands of UK websites for generations of researchers.

The site was unveiled earlier this week by the Minister for Culture and Tourism, the Rt Hon Margaret Hodge MBE MP, and the Chief Executive of the British Library, Dame Lynne Brindley. The project demonstrates the importance and value of the nation’s digital memory.

Websites included in the UK Web Archive include:

  • The Credit Crunch – initiated in July 2008, this collection contains records of high-street victims of the recession – including Woolworths and Zavvi.
  • Antony Gormley’s ‘One & Other’ Trafalgar Square Fourth Plinth Project – involving 2,400 participants and streamed live by Sky Arts over the web to an audience of millions, this site will no longer exist online from March 2010.
  • 2010 General Election – work has started to preserve the websites of MPs such as Derek Wyatt, who will be retiring at the next election, creating a permanent record of his time as a Member of Parliament.

This important research resource has been developed in partnership with the National Library of Wales, JISC and the Wellcome Library, as well as technology partners such as IBM.

British Library Chief Executive, Dame Lynne Brindley said:

“Since 2004 the British Library has led the UK Web Archive in its mission to archive a record of the major cultural and social issues being discussed online. Throughout the project the Library has worked directly with copyright holders to capture and preserve over 6,000 carefully selected websites, helping to avoid the creation of a ‘digital black hole’ in the nation’s memory.

“Limited by the existing legal position, at the current rate it will be feasible to collect just 1% of all free UK websites by 2011. We hope the current DCMS consultation will enact the 2003 Legal Deposit Libraries Act and extend the provision of legal deposit through regulation to cover freely available UK websites, providing regular snapshots of the free UK web domain for the benefit of future research.”

Further details are available from the British Library.

Posted in Digital preservation, Preservation, Web 1.0 | 1 Comment »

Bookings open for 5th International Digital Curation Conference

Posted by Marieke Guy on 6th November 2009

5th International Digital Curation Conference

“Moving to Multi-Scale Science: Managing Complexity and Diversity” | 2-4 December 2009

The IDCC is an established annual event reaching out to individuals, organisations and institutions across all disciplines and domains involved in curating data for e-science and e-research.

The DCC will be hosting a workshop programme on 2 December followed by a Pre-Conference Drinks Reception at the Natural History Museum. The main conference will open on 3 December with a keynote speech from Professor Douglas Kell, Chief Executive of the Biotechnology & Biological Sciences Research Council (BBSRC). Other key speakers will include: Professor Ed Seidel, National Science Foundation; Cliff Lynch, Coalition for Networked Information; Timo Hannay, Nature Publishing Group. The first day of the conference will incorporate an interactive afternoon for posters and demos, followed by a Symposium entitled “Citizen Science: Data Challenges” led by Richard Cable, BBC Lab UK.

The second day will be made up of peer-reviewed papers in themed sessions covering Disciplinary and Institutional Challenges, Practitioner Experience, Metadata, Software Preservation & Managing Risk.

Places are limited so please register now.

Registration to close on 20 November 2009

Posted in Digital preservation, Events | No Comments »

“Why you never should leave it to the University”

Posted by Brian Kelly on 19th August 2009

A blog post from Richard Gatarski begins with the blunt announcement:

A year ago my academic web site disappeared. And those who made it go away probably ignored that such a thing could happen.

The article goes on to describe how last year Richard “found out that the School of Business had redesigned their web site. And in the process they just ignored my research. About ten years worth of virtually daily updates were gone. That included most of the manuscripts for my published work. The same thing happened to lecture notes, powerpoint slides, course documentations, useful links, etc. It had all disappeared from the Web!”

Richard did have some good news to report: “Courtesy of the Internet Archive you can still find most of my academic stuff on the Web through their Wayback machine.” He did wonder, though, why he had to rely on the Internet Archive (“a 501(c)(3) non-profit that was founded to build an Internet library”). After all, wouldn’t you expect your institutional library to provide this service?

Richard’s losses of his digital resources have continued – a blog he set up at Stockholm University was deleted after he left the institution – although, again, a copy is archived on the Internet Archive.

Richard’s experiences have left him disillusioned with the attitudes towards the digital preservation of scholarly resources. He concludes by recommending that academics take responsibility themselves for preserving their resources:

Meanwhile, for those of you who publish stuff on the Web while working with an organisation, including universities. Try to put your content where you control it. Most likely you will move between work places, temporary assignments, and soforth. If you want your stuff to be preserved, it is your responsability to make sure it is.

But how easy will this be for the typical academic? Richard doubts whether “the issues I bring forward today are heavily discussed among university chancellors, political leaders, educational policy makers, and scientific philosophers.”  But surely we need to ensure that this debate takes place. And, in today’s economic climate, that debate needs to include discussions of the costs of digital preservation (disk storage may be cheap but management of content is not).

Richard’s tale is based on his experiences as an academic in Sweden. Is the situation different in the UK, I wonder? Judging by Stuart Smith’s lament that “Mummy I lost my MP3!”, which I summarised in a post on “Disappearing Resources On Institutional Web Sites” in December 2008, it would seem that we have similar experiences in the UK higher education sector. Does anyone have any positive experiences to share?

Posted in Digital preservation | 2 Comments »

What’s the average lifespan of a Web page?

Posted by Marieke Guy on 12th August 2009

…or is it easier to ask how long is a piece of string?

The statistic much bandied about (for Web pages, not pieces of string!) is 44 days, believed to originate in an article by Brewster Kahle (of Internet Archive fame) published in 1997 and titled Preserving the Internet. Brewster’s original quote is specifically about URLs: “…estimates put the average lifetime for a URL at 44 days.”

Whether this figure still stands today is a matter currently being discussed on the CURATORS@LIST.NETPRESERVE.ORG list after a query from Abigail Grotke of the Library of Congress.

Abbie offered up the 44-day statistic and pointed out that on the Digital Preservation Web site they have a graphic that discusses Web volatility, stating “44% of the sites available on the internet in 1998 had vanished one year later”.

The other figure often cited is 75 days, from Michael Day’s report Collecting and preserving the world wide web.

The dynamic nature of the Web means that pages and whole sites are continually evolving, meaning that pages are frequently changed or deleted. Alexa Internet once estimated that Web pages disappear after an average time of 75 days. (Lawrence, et al., 2001, p. 30).

Another figure sometimes suggested is 100 days; this seems to come from Rick Weiss’s article for The Washington Post (Washington, DC, 24 November 2003), On the Web, Research Work Proves Ephemeral, which is no longer available.

So what is the average lifespan of a Web page today? Is it getting shorter or longer? The Internet Archive now gives 44 to 75 days as its ballpark figure. I’d hazard a guess that, with the rise in use of Web 2.0 technologies, the Web is actually getting more transient by the day.

Is this OK?

Maybe, if it’s just a tweet you sent your friend. However, if it’s something more substantial that’s disappearing, then it’s a real worry.

Posted in Digital preservation, Web 1.0, Web 2.0 | 6 Comments »

Missing links: the enduring web

Posted by Marieke Guy on 11th June 2009

The JISC PoWR team will be involved in the forthcoming Workshop on missing links: the enduring web.  The workshop is sponsored by the Digital Preservation Coalition (DPC) and the Joint Information Systems Committee (JISC) and organised by the six partners of the UK Web Archiving Consortium (British Library, National Library of Wales, JISC, Wellcome Library, The National Archives and the National Library of Scotland). It will be held on Tuesday 21st July 2009 at the British Library Conference Centre, London.

Richard Davis, ULCC, will be giving a presentation on Diamonds in the Rough: Capturing and Preserving Online Content from Blogs. Other members of the team will be presenting posters on the JISC-PoWR Project and on Preservation Policies and Approaches for Use of Web 2.0 Services.

In the next few posts we’ll describe in more detail what we’ll be covering. Online registration is still open and closes on Friday 10th July 2009. We hope to see you there…

Posted in Digital preservation, Events, missinglinks09, missinglinks09mg | 1 Comment »

Archiving the US Election 2004 Web sites

Posted by Marieke Guy on 30th April 2009

The Library of Congress have recently made their US Election 2004 Web Archive available from the Library of Congress Web Archives site. The Election 2004 Web Archive is a selective collection of approximately 2,000 Web sites associated with the United States Presidential, Congressional, and gubernatorial elections. It is part of a continuing effort by the Library’s Web Archiving Project Minerva to evaluate, select, collect, catalogue, provide access to, and preserve digital materials for future generations of researchers.

The archived material includes blogs (such as blogs for Bush). Currently permission is necessary for offsite access for researchers. All archived Web sites are available to researchers onsite at the Library of Congress.

Metadata

At the Library of Congress they are currently providing metadata for individual Web sites through brief records using the MODS schema. There is a MARC collection level record (for the collection itself) with a link to an entry/overview page for each collection that links to search and browse functions with MODS metadata for each individual Web site that was collected.

An overview of their metadata approach (at the collection and item levels) is available. They are also in the process of developing more formal descriptive metadata profiles for their digital content and have developed one for the Library of Congress Web archives.

For a list of publicly available Library of Congress Web archives and access to each, see the Library of Congress Web Archives site.

More information on activities at the Library of Congress is given in a PowerPoint presentation delivered at the Digital Library Federation 2008 Fall Forum.

Posted in Digital preservation | 1 Comment »

Archiving a wiki

Posted by Ed Pinsent on 25th March 2009

On dablog recently I have put up a post with a few observations about archiving a MediaWiki site. The example is the UKOLN Repositories Research Team wiki DigiRep, selected for the JISC to add to their UKWAC collection (or to put it more accurately, pro-actively offered for archiving by DigiRep’s manager). The post illustrates a few points which we have touched on in the PoWR Handbook, which I’d like to illuminate and amplify here.

Firstly, we don’t want to gather absolutely everything that’s presented as a web page in the wiki, since the wiki contains not only the user-input content but also a large number of automatically generated pages (versioning, indexing, admin and login forms, etc). This stems from the underlying assumption about doing digital preservation, namely that it costs money to capture and store digital content, and it goes on costing money to keep on storing it. (Managing this could be seen as good housekeeping. The British Library Life and Life2 projects have devised ingenious and elaborate formulae for costing digital preservation, taking all the factors into account to enable you to figure out if you can really afford to do it.) In my case, there are two pressing concerns: (a) I don’t want to waste time and resource in the shared gather queue while Web Curator Tool gathers hundreds of pages from DigiRep, and (b) I don’t want to commit the JISC to paying for expensive server space, storing a bloated gather which they don’t really want.

Secondly, the above assumptions have led to me making a form of selection decision, i.e. to exclude from capture those parts of the wiki I don’t want to preserve. The parts I don’t want are the edit history and the discussion pages. The reason I don’t want them is because UKWAC users, the target audience for the archived copy – or the designated user community, as OAIS calls it – probably don’t want to see them either. All they will want is to look at the finished content, the abiding record of what it was that DigiRep actually did.

This selection aspect led to Maureen Pennock’s reply, which is a very valid point – there are some instances where people would want to look at the edit history. Who wrote what, when…and why did it change? If that change-history is retrievable from the wiki, should we not archive it? My thinking is that yes, it is valuable, but only to a certain audience. I would think the change history is massively important to the current owner-operators of DigiRep, and that as its administrators they would certainly want to access that data. But then I put on my Institutional records management hat, and start to ask them how long they really want to have access to that change history, and whether they really need to commit the Institution to its long-term (or even permanent) preservation. Indeed, could their access requirement be satisfied merely by allowing the wiki (presuming it is reasonably secure, backed-up etc.) to go on operating the way it is, as a self-documenting collaborative editing tool?

All of the above raises some interesting questions which you may want to consider if undertaking to archive a wiki in your own Institution. Who needs it, how long for, do we need to keep every bit of it, and if not then which bits can we exclude? Note that they are principally questions of policy and decision-making, and don’t involve a technology-driven solution; the technology comes in later, when you want to implement the decisions.
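
Once the policy questions are settled, the selection decision described above (keep the finished content, leave out edit history, discussion pages and automatically generated forms) could be expressed as a crawl filter along the following lines. This is only a minimal sketch in Python and not Web Curator Tool configuration: the function names, the /wiki/ article path, the example.org URLs and the exact list of excluded actions are all assumptions made for illustration, and a real MediaWiki installation may use different URL patterns.

```python
from urllib.parse import urlparse, parse_qs, unquote

# Assumed, illustrative rules; adjust to the wiki actually being gathered.
ARTICLE_PREFIX = "/wiki/"                      # assumed "pretty URL" path
EXCLUDED_ACTIONS = {"history", "edit", "raw", "delete", "protect"}
EXCLUDED_QUERY_KEYS = {"oldid", "diff"}        # old revisions and diffs


def page_title(url):
    """Return the page title from /wiki/Title or index.php?title=Title."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    if "title" in query:
        return query["title"][0]
    path = unquote(parsed.path)
    if path.startswith(ARTICLE_PREFIX):
        return path[len(ARTICLE_PREFIX):]
    return path.rsplit("/", 1)[-1]


def should_capture(url):
    """Return True if the URL looks like finished content worth gathering."""
    query = parse_qs(urlparse(url).query)
    # Skip edit forms, history listings and similar generated views.
    if query.get("action", ["view"])[0] in EXCLUDED_ACTIONS:
        return False
    # Skip links to individual old revisions and diffs.
    if any(key in query for key in EXCLUDED_QUERY_KEYS):
        return False
    # Skip discussion namespaces (Talk, User_talk, ...) and Special pages.
    title = page_title(url)
    namespace = title.split(":", 1)[0] if ":" in title else ""
    if namespace == "Special" or namespace.lower().endswith("talk"):
        return False
    return True
```

A gather tool would call should_capture on each discovered link before queueing it. If the policy decision went the other way, for example if the edit history were judged worth keeping after all, only the rule sets at the top would need to change.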

Posted in Challenges, Digital preservation, Records management, Selection, Web 2.0 | No Comments »

LIWA – Living Web Archives

Posted by Kevin Ashley on 6th March 2009

The PoWR project identified a number of technical challenges which made certain types of content – particularly that with a Web 2.0 flavour – especially difficult to manage and preserve in an effective way. My attention has recently been drawn to an EU-funded project which hopes to overcome a number of these technical problems, as well as others that are applicable to large-scale archiving such as the problem of spam content.

LIWA – Living Web Archives – began in early 2008, but as with many EU projects, its startup phase involved a lot of internal activity without much of a public face. As a result we didn’t pick up on its work in the JISC-PoWR handbook, but I’m sure we’ll rectify this omission in any future revisions.

To pick one example of LIWA’s areas of interest, it intends to develop tools which make it easier to take a temporal view of web archives and to maintain temporal consistency. Temporal consistency – or rather its absence – will be familiar to anyone who has spent time exploring sites in the Internet Archive, where different pages, or even portions of the same page (such as images) will have been archived on different days. This can lead to occasional surprises when navigating through archived content, with links taking one to pages that don’t have the expected content.
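
To make the idea of temporal consistency concrete, here is a small sketch in Python that reports which parts of an archived page were captured at a noticeably different time from the page itself. It is not LIWA's tooling: the function names, the one-day tolerance and the sample data are assumptions for illustration. The 14-digit YYYYMMDDhhmmss timestamp format is the one used in Wayback Machine URLs.

```python
from datetime import datetime, timedelta


def parse_timestamp(ts):
    """Parse a 14-digit capture timestamp (YYYYMMDDhhmmss)."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


def temporal_drift(page_ts, resource_ts, tolerance=timedelta(days=1)):
    """Return the resources captured more than `tolerance` away from the page.

    page_ts     -- capture timestamp of the HTML page
    resource_ts -- dict mapping each embedded resource to its capture timestamp
    """
    page_time = parse_timestamp(page_ts)
    drift = {}
    for url, ts in resource_ts.items():
        gap = abs(parse_timestamp(ts) - page_time)
        if gap > tolerance:
            drift[url] = gap
    return drift


# Hypothetical example: the stylesheet was archived months before the page.
drift = temporal_drift(
    "20090306120000",
    {"logo.png": "20090306120500", "style.css": "20081101000000"},
)
```

Here only style.css is reported, since it was captured about four months before the page. A large gap of this kind is the sort of inconsistency that produces the surprises described above.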

LIWA’s partners include Hanzo, a UK-based web archive services company that we covered briefly in the handbook; I hope we can explore their potential value to UK HE in the future.

Posted in Digital preservation, Future, Technologies | 1 Comment »

Considerations for the Preservation of Blogs

Posted by Marieke Guy on 23rd February 2009

DigitalPreservationEurope (DPE) fosters collaboration and synergies between many existing national digital preservation initiatives across the European Research Area. As part of their work they publish concise overviews of key digital preservation and curation issues. Earlier this month they published a briefing paper on Considerations for the Preservation of Blogs (PDF).

The preamble sets the context for the paper:

Blogs, it seems, are everywhere these days, but what about the next day (and the next and the next …). Opinions vary on whether or not blogs merit preservation beyond the actions of a blog’s respective authors. This briefing paper does not contribute to that dialogue. Rather, it provides an overview of issues to be considered by organizations planning blog preservation programs. Blogs are the product of a network of players, including blog authors, service providers, and readers. Discussed here are some key attributes of blogs, and the characteristics and behaviors of these players, which may impact preservation activities.

During the JISC PoWR project we recognised that despite blogs initially being commonly characterised as ephemeral (as commented on in the DPE paper) their increasing importance and role in both the research context and in our cultural history is becoming apparent, and like other Web resources their preservation is a matter that needs to be addressed, somehow.

The PoWR blog has a number of interesting posts on the preservation of blogs including:

  • Legal scholarship recognises long-term value of blogs
  • Student Blogs
  • Auricle: The Case Of The Disappearing E-learning Blog

There is also a section on preservation of blogs in the JISC PoWR handbook.

Posted in Digital preservation, Web 2.0 | No Comments »

The Fet
