DHQ: Digital Humanities Quarterly
2012
Volume 6 Number 2

Building A Volunteer Community: Results and Findings from Transcribe Bentham

Tim Causer <t_dot_causer_at_ucl_dot_ac_dot_uk>, Bentham Project, University College London
Valerie Wallace <valwall23_at_gmail_dot_com>, Bentham Project, University College London, and Center for History and Economics, Harvard University

Abstract

This paper contributes to the literature examining the burgeoning field of academic crowdsourcing, by analysing the results of the crowdsourced manuscript transcription project, Transcribe Bentham. First, it describes how the project team sought to recruit volunteer transcribers to take part, and discusses which strategies were successes (and which were not). We then examine Transcribe Bentham's results during its six-month testing period (8 September 2010 to 8 March 2011), which include a detailed quantitative and qualitative analysis of website statistics, work completed by the amateur transcribers, as well as the demographics of the volunteer base and their motivations for taking part. The paper concludes by discussing the success of our community building with reference to this analysis. We find that Transcribe Bentham's volunteer transcribers have produced a remarkable amount of work – and continue to do so, carrying out the equivalent labour of a full-time transcriber – despite the nature and complexity of the task at hand.

Introduction

1
Crowdsourcing is an increasingly popular and attractive option for archivists, librarians, scientists, and scholarly editors working with large collections in need of tagging, annotating, editing, or transcribing. These tasks, it has been argued, can be accomplished more quickly and more cheaply by outsourcing them to enthusiastic members of the public who volunteer their time and effort for free [Holley 2010].[1] Crowdsourcing also benefits the public by making available, and engaging volunteers with, material hitherto accessible only to diligent researchers, or sources previously considered too complex for a non-expert to understand. A project like Galaxy Zoo, for example, has successfully built up a community of more than 200,000 users who have classified over 100 million galaxies, thus supporting a great deal of academic research [Raddick et al 2010]. Crowdsourcing aims to raise the profile of academic research, by allowing volunteers to play a part in its generation and dissemination.
2
The Bentham Project at University College London (UCL) sought to harness the power of crowdsourcing to facilitate the transcription of the manuscript papers of Jeremy Bentham (1748-1832), the great philosopher and reformer. The purpose of the Bentham Project is to produce the new authoritative scholarly edition of The Collected Works of Jeremy Bentham, which is based in large part on transcripts of the vast collection – around 60,000 folios – of Bentham manuscripts held by UCL Special Collections.[2] The Bentham Project was founded in 1958, and since then 20,000 folios have been transcribed and twenty-nine volumes have been published. The Project estimates that the edition will run to around seventy volumes; before the commencement of Transcribe Bentham around 40,000 folios remained untranscribed.
3
This new edition of Bentham’s Collected Works will replace the poorly-edited, inadequate and incomplete eleven-volume edition published between 1838 and 1843 by Bentham’s literary executor, John Bowring [Schofield 2009, 14–15, 20–22]. The Bowring edition omitted a number of works published in Bentham’s lifetime, as well as many substantial works which had not been published but which have survived in manuscript; a forthcoming Collected Works volume entitled Not Paul, but Jesus – only a part of which was previously published by Bentham, and was left out of the Bowring edition altogether – will recover Bentham’s thinking on religion and sexual morality. This material has significant implications for our understanding of utilitarian thought, the history of sexual morality, atheism, and agnosticism. Bentham’s writings on his panopticon prison scheme still require transcription, as do large swathes of important material on civil, penal, and constitutional law, on economics, and on legal and political philosophy. In short, while Bentham’s manuscripts comprise material of potentially great significance for a wide range of disciplines, much of the collection – far from being even adequately studied – is virtually unknown. A great deal of work, both in exploring the manuscripts and producing the Collected Works, clearly remains to be done.
4
The Bentham Papers Transcription Initiative – Transcribe Bentham – was established to quicken the pace of transcription, speed up publication of the Collected Works, create a freely-available and searchable digital Bentham Papers repository, and engage the community with Bentham’s ideas at a time when they are of increasing contemporary relevance.[3] Transcribe Bentham crowdsources manuscript transcription, a task usually performed by skilled researchers, via the web to members of the public who require no specialist training or background knowledge in order to participate. The project team developed the "Transcription Desk", a website, tool and interface to facilitate web-based transcription and encoding of common features of the manuscripts in Text Encoding Initiative-compliant XML. Transcripts submitted by volunteers are subsequently uploaded to UCL’s digital repository, linked to the relevant manuscript image and made searchable, while the transcripts will also eventually form the basis of printed editions of Bentham’s works.[4] The products of this crowdsourcing will thus be utilised for both scholarly and general access purposes. Transcribe Bentham was established and funded under a twelve-month Arts and Humanities Research Council Digital Equipment and Database Enhancement for Impact (DEDEFI) grant. The funding period was divided into six months of development work from April 2010, and the Transcription Desk went live for a six-month testing period in September of that year.[5]
5
Crowdsourcing is becoming more widespread, and it is thus important to understand exactly how, and if, it works. It is a viable and cost-effective strategy only if the task is well facilitated, and the institution or project leaders are able to build up a cohort of willing volunteers. Participant motivation in crowdsourcing projects is, therefore, attracting more focused attention. The teams behind the Zooniverse projects have analysed the motivations and demographic characteristics of their volunteers in an attempt to understand what drives people to participate in online citizen-science projects, while the North American Bird Phenology Programme, established to track climate change by crowdsourcing the transcription of birdwatchers’ cards, also assessed its participants’ opinions. Rose Holley has also offered several invaluable general insights on user motivation, drawing on the experience of crowdsourcing the correction of OCR software-generated text of historic newspapers at the National Library of Australia (NLA), while Peter Organisciak has provided a useful analysis of user motivations in crowdsourcing projects [Raddick et al 2010] [Romeo and Blaser 2011] [Phenology Survey 2010] [Holley 2009] [Holley 2010] [Organisciak 2010].
6
However, Transcribe Bentham differs from previous crowdsourcing and community collection schemes, in that its source material is a huge collection of complex manuscripts. Though several projects have crowdsourced manuscript transcription, the material they have made available is generally formulaic, or at least reasonably straightforward to decipher and understand [Old Weather Project] [Family Search Indexing] [War Department Papers]. Transcribing the difficult handwriting, idiosyncratic style, and dense and challenging ideas of an eighteenth and nineteenth-century philosopher is more complex, esoteric, and of less immediate appeal than contributing to a genealogical or community collection.
7
This paper describes how the Transcribe Bentham team sought to attract volunteer transcribers and build an online community. It outlines which strategies worked and which did not, and, drawing on qualitative and quantitative data, analyses the complexion of our volunteer base, comparing its demographic and other characteristics with those of other crowdsourcing projects. This evidence will shed more light on the nature of user participation and of crowdsourced manuscript transcription, and will provide guidance for future initiatives. Section one will describe our attempts to recruit a crowd and build a community of users; section two will analyse the make-up of this user base, and assess site statistics, user contributions, and motivations; and section three will consider the success of our community building with reference to this analysis.

Crowd or Community?

8
Caroline Haythornthwaite has discerned two overlapping patterns of engagement in online "peer production" initiatives like Transcribe Bentham, distinguishing between a "crowd" and a "community". Contributions made by a crowd, which Haythornthwaite describes as "lightweight peer production", tend to be anonymous, sporadic, and straightforward, whereas the engagement of a community, or "heavyweight peer production", is far more involved. A community of volunteers engaged in the latter requires, Haythornthwaite suggests, qualitative recognition, feedback, and a peer support system. Contributors tend to be smaller in number, to be less anonymous, and to respond to more complex tasks and detailed guidelines. Heavyweight peer production might also involve a multi-tiered progress system to sustain motivation; a crowd, on the other hand, is satisfied with quantitative recognition, perhaps in the form of progress statistics, and a two-tiered hierarchy such as that of contributor and moderator. These two patterns, Haythornthwaite contends, are often discernible within one project [Haythornthwaite 2009].
9
Transcribe Bentham blends both heavyweight and lightweight peer production. We attracted an anonymous crowd of one-time or irregular volunteers, along with a smaller cohort of mutually supportive and loyal transcribers. We aimed to cast our net wide by opening the Transcription Desk to all, by creating as user-friendly an interface as possible, and by simplifying the transcription process as much as we could (Figure 1). But, as transcribing Bentham’s handwriting is a complex and time-consuming task which requires considerable concentration and commitment, we also tried to build a dedicated user community to enable sustained participation by, for example, implementing a qualitative and quantitative feedback and reward system. The following section will describe the strategies we devised first to recruit the crowd, and then to foster the community.
Figure 1. 
The Transcribe Bentham Transcription Desk

Recruiting the Crowd: The Publicity Campaign

10
Our publicity campaign targeted a variety of audiences including the general public, academic community, libraries, archives professionals, and schools. We devised audience-specific tactics as well as more general strategies, taking advantage of services offered by UCL to help us implement our campaign; these included the various Media Relations, Corporate Communications, Outreach, Public Engagement, and Learning and Media Services teams. In devising these strategies we had to consider issues of cost and timing. Transcribe Bentham had a limited budget to spend on publicity – £1,000 – and, as our testing period was six months only, a short time-frame in which to execute the plan. Though we hoped to target the English-speaking world, many of our strategies were, by necessity, confined to the United Kingdom.

The General Public

11
As Transcribe Bentham is a web 2.0 project, it was vital to have a visible and interactive online presence. We created a project blog which was regularly updated with progress reports, details of media coverage, and forthcoming presentations, and which linked directly to the Transcription Desk, becoming the main entry point for users. We also utilised social media by creating a Twitter profile and a Facebook page, which were integrated into our blog and the main Bentham Project website, which also prominently featured Transcribe Bentham. A Google Adwords account was created when the Desk went live in order to generate traffic; owing to budget constraints, it was established on a trial basis. We prepaid £60 on our account, which was exhausted by the end of September.
12
Besides the web we attempted to generate awareness of the project through traditional media. With the help of UCL Media Relations, a press release was drawn up at the launch of the project in September 2010, and distributed to major British newspapers and magazines. UCL Corporate Communications assisted in designing a Transcribe Bentham leaflet for distribution, 2,500 copies of which were printed at a cost of £295 (excl. VAT). We distributed the leaflet at academic conferences and institutions in Britain, Europe, and Australasia, and the leaflet was also dispensed throughout the year at Bentham’s Auto-Icon in UCL’s South Cloisters.[6] Transcribe Bentham was also promoted via a video produced by UCL Media Services, which was embedded into our websites, and hosted on UCL’s YouTube channel.[7]

The Academic and Professional Community

13
At the outset, we believed that the academic and professional community would be the most receptive to our project. We targeted not just existing Bentham scholars, academics, and students with an interest in history and philosophy, but also those interested in digital humanities and crowdsourcing, palaeography, and information studies. We hoped to encourage a range of scholars to embed Transcribe Bentham in teaching and learning, thereby helping to build a dedicated user base and encourage Bentham scholarship.
14
We considered placing advertisements in academic journals and more mainstream subject-focused magazines. However, an advert in a single journal, with a limited print run, would have swallowed up nearly half of the publicity budget, and we felt that free coverage in the national press would achieve greater impact. Therefore, in order to reach a potentially diverse academic audience, the press release was sent to, amongst others, The Guardian, TechCrunch, The Register, Wired, Mashable, Times Higher Education, Times Educational Supplement, The Times, BBC History Magazine, and History Today. In July 2010 two articles mentioning Transcribe Bentham had appeared in the Times Higher Education, and it thus seemed sensible to approach that publication in the hope that it would run a follow-up piece [Mroz 2010] [Cunnane 2010].
15
Notifications were sent to a large number of academic and professional mailing lists, online forums, and the websites of academic societies. Though some bodies failed to respond, most of those contacted circulated an announcement about Transcribe Bentham via their list or featured it on their websites. Besides these initiatives, project staff delivered presentations on Transcribe Bentham at several seminars, conferences, and workshops throughout the year.[8] We also engaged in consultation with representatives from different repositories including the National Library of the Netherlands, The National Archives (UK), the Natural History Museum (UK), and Library and Archives Canada.
16
To promote Transcribe Bentham to palaeography, information studies, and research methods students, we contacted individual academics, libraries, archives, and educational bodies including the Higher Education Academy History Subject Centre, Senate House Library, and The National Archives. This outreach was generally successful and met with enthusiastic responses, though The National Archives responded negatively, stating that only notifications relating to "government departments, archives and organisations directly relevant to the activities of The National Archives" could be posted on their site. On the recommendation of the HEA History Subject Centre, we created pages on the reading of historical manuscripts to demonstrate how Transcribe Bentham could be used as a tool in teaching and learning.[9] The Subject Centre subsequently produced a review of the resource recommending its use for palaeographic and historical training in undergraduate History classes [Beals 2010]. Dr Justin Tonra, then a Research Associate on the project, also contributed a tutorial using Transcribe Bentham to TEI by Example, an online resource run by the Royal Academy of Dutch Language and Literature, King's College London, and UCL.[10]

Schools

17
At the development stage, project members anticipated that school pupils and their teachers, particularly those undertaking A-levels in Religious Studies, Philosophy, History, Law, and Politics, could be another potential audience, especially considering that Bentham specifically features in the curricula for Religious Studies and Philosophy. Once the project got underway, it was tailored in such a way as to attract schools and colleges. We created pages with information explaining how Transcribe Bentham related to relevant A-Levels and Scottish Highers, including reading lists and direct links to groups of manuscripts of relevance to particular areas of study.[11] We aimed, moreover, to target school teachers and pupils through the media and the web. Our press release was sent to educational publications, while notices and invitations to post links to our site were sent to a range of educational websites and bodies.
18
A-level pupils from the Queen’s School in Chester visited the Bentham Project in summer 2010 before the Transcription Desk went live, where they tested the website; their experience was written up on the school’s website and in the local newspaper [Chester Chronicle 2010]. A link to the Transcription Desk was later included on the school’s virtual learning environment. In order to attract more schools to the project we invited school groups to visit the Project to see the Auto-Icon, hear a short lecture, and participate in the transcription exercise. We drew up a letter outlining these details which we sent, along with the Transcribe Bentham leaflet, to c.500 state schools in London, the cost of printing and postage for which was around £360. Raines Foundation School in Bethnal Green, London, responded positively to the outreach letter and arranged a visit in November of A-level Philosophy students who participated in the initiative [Bennett 2010]. The class teacher and one of his pupils were also interviewed about Transcribe Bentham for a broadcast journalism project at City University London.

Success?

19
In terms of raising awareness of the initiative, the publicity campaign has been a success. Despite mainly targeting English-speakers and the UK, particularly with our press release, the project has received media coverage in twelve countries including the United States, Australia, Japan, Germany, Norway, Sweden, Austria, and Poland. We estimate that the project has been mentioned in around seventy blogs, thirteen press articles, and two radio broadcasts. As of 3 August 2012, we have acquired 853 followers on Twitter, and 339 fans on Facebook. Transcribe Bentham has certainly made an impact on the academic community and libraries and archives profession; its progress has been tracked by JISC and the Institute of Historical Research, and it has been reviewed by the Higher Education Development Association, and the Higher Education Academy [Dunning 2011] [Winters 2011] [Elken 2011] [Beals 2010]. Transcribe Bentham is also being used as a model for archives discovery by repositories in Europe and North America, and has been featured in the professional blog of the British Library [Shaw 2010]. The project has been embedded, moreover, into teaching and learning in Queen’s University Belfast, Bloomsburg University, the University of Virginia, and King’s College London.
20
More recently, Transcribe Bentham was honoured with a highly prestigious Award of Distinction in the Digital Communities category of the 2011 Prix Ars Electronica, the world’s foremost digital arts competition, and staff were given the opportunity to speak about the project at that year’s Ars Electronica festival.[12] This is testimony to the project’s international impact, both inside and outside the academy, with the Digital Communities jury commending Transcribe Bentham for its "potential to create the legacy of participatory education and the preservation of heritage or an endangered culture" [Achaleke et al 2011, 206]. Transcribe Bentham was also one of five crowdsourcing projects shortlisted for the 2011 Digital Heritage Award, part of that year’s Digital Strategies for Heritage Conference.[13]
21
We hoped that our considerable efforts in publicising the project, and crowdsourced transcription, would help us to recruit a large crowd of volunteers. We also implemented strategies to retain this crowd and transform it into a loyal community.

Building the Community

The Interface

22
Retaining users was just as integral to the project’s success as recruiting them in the first place. It was therefore important to design a user-friendly interface which facilitated communication in order to keep users coming back to the site, and to develop a sense of community cohesion [Causer, Tonra and Wallace 2012]. The Transcription Desk was developed using MediaWiki, an interface familiar to, and easily navigable by, the millions who have browsed, used and contributed to Wikipedia. It was decided that offering remuneration for contributions would be contrary to the collaborative spirit of the project, and so platforms such as Amazon’s Mechanical Turk were discounted, in favour of open source software. An alternative approach would have necessarily limited participation in Transcribe Bentham, as well as the level of engagement with and access to material of national and international significance.
Figure 2. 
The Transcription Toolbar
Figure 3. 
Transcribing and Encoding a Manuscript
23
The features of MediaWiki were utilised in an attempt to forge a virtual community engaged in heavyweight peer production. We provided detailed, clearly-written guidelines to explain the process of transcription and encoding, along with a "quick-start" guide to summarise the main points. Training videos and downloadable files were embedded in order to provide an audiovisual aspect to the learning experience, and an intuitive toolbar was developed so that volunteers otherwise unfamiliar with text encoding could add the relevant TEI-compliant XML tags at the click of a button (Figures 2 and 3). In order to give regular feedback to users and to provide a platform for shared resources, we included a discussion forum on the Desk’s main page where volunteers could swap ideas, ask questions, or make requests of the project editors. Each registered participant was given a social profile which could be left anonymous or populated with an avatar and personal information, including his/her home town, occupation, birthday, favourite movies, and favourite Bentham quotation (Figure 4). Each volunteer profile also included a personal message board and an "add friends" function; we hoped that registered users would be able to message each other privately or publicly and build up a cohort of transcriber friends.
Figure 4. 
Example of a Transcribe Bentham volunteer profile
Figure 5. 
The Transcribe Bentham Benthamometer progress bar
Figure 6. 
The Transcribe Bentham Leaderboard
24
The project editors used the message function on a daily basis to communicate with and provide feedback to transcribers. The "Benthamometer"[14] tracked the progress of transcription, while the leaderboard recorded and publicly recognised the efforts of the most diligent transcribers (Figures 5 and 6).[15] Volunteers received points for every edit made; as an incentive we devised a multi-tiered ranking system, a progress ladder stretching from "probationer" to "prodigy" for transcribers to climb.[16] We also intended to utilise a gift function which allowed editors to award users with virtual gifts – an image of the Collected Works for example – whenever they reached a milestone. "Team-building" features like these have been found to be useful in stimulating participation by other projects like Solar Stormwatch and Old Weather: we hoped to facilitate interaction between users, to generate healthy competition, and to develop a sense of community. However, some of the social features of the site, including the "add friends" option and the gift-awarding feature, malfunctioned at the development stages. These problems, as will be discussed below, may have been an impediment to social integration.

Community Outreach: Beyond the Virtual

25
Though we aimed to create a cohesive online community, we were also keen to move beyond the virtual and add a personal element to the initiative by organising a series of public outreach events. This programme was arranged in consultation with local amateur historians and aimed to start a dialogue between professionals and amateurs, to engage the public, and to situate Bentham and UCL more firmly within the local community. We wanted to engage the interest of amateur historians in Transcribe Bentham as well as to give our regular transcribers a chance to meet project staff. These events were held in May 2011 and included two information sessions, one held at UCL and one held externally, as well as a guided walk around Bentham’s London.[17] In terms of integrating the Transcribe Bentham community, this strategy, discussed in more detail below, had limited success.
26
The project team devised, therefore, a range of strategies to recruit a crowd and build a cohort of dedicated transcribers; on his blog discussing crowdsourced manuscript transcription, Ben Brumfield commented that Transcribe Bentham "has done more than any other transcription tool to publicize the field" [Brumfield 2011]. As Transcribe Bentham's attempts to crowdsource highly complex manuscripts are novel, the project team was only able to draw on the general experiences of other crowdsourcing projects when making its decisions regarding the recruitment plan. The strategies employed were to a large extent experimental. The following sections of this paper will assess the complexion of our user base and consider how successful these strategies were in forging a Transcribe Bentham community.

The Results

27
Our six-month testing period lasted from 8 September 2010 to 8 March 2011, and during this time 1,207 people registered an account (discounting project staff, and seven blocked spam accounts).[18] Between them these volunteers transcribed 1,009 manuscripts, 569 (56%) of which were deemed to be complete and locked to prevent further editing. Though the fully-supported testing period has ceased, the Transcription Desk will remain available dependent on funding, and Transcribe Bentham has become embedded into the Bentham Project’s activities. As of 3 August 2012, the project now has 1,726 registered users. 4,014 manuscripts have been transcribed, of which 3,728 (94%) are complete and locked to prevent further editing. However, unless otherwise stated, the analysis below pertains to the six-month testing period.[19]
28
In this section, we will assess site statistics, user demographics, behaviour, and motivations. Our findings are derived from quantitative data provided by a Google Analytics account,[20] analysis of statistics collated from the Transcription Desk, qualitative findings from a user survey, and comparisons with other studies of crowdsourcing volunteer behaviour.[21] The survey received 101 responses – about 8% of all registered users – 78 of which were fully completed. While it is, therefore, not necessarily representative of the entire user base, the survey contains a great deal of revealing information about those who did respond.
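The proportions quoted for the testing period can be checked from the reported counts. The following is a minimal Python sketch; the helper `percent` and the variable names are illustrative and are not part of the project's own tooling.

```python
def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100 * part / whole

# Counts as reported in the text for the six-month testing period.
transcribed = 1009        # manuscripts transcribed
completed = 569           # of which complete and locked
registered_users = 1207   # registered accounts
survey_responses = 101    # survey responses received
survey_complete = 78      # of which fully completed

completed_share = percent(completed, transcribed)
survey_response_rate = percent(survey_responses, registered_users)
survey_completion_rate = percent(survey_complete, survey_responses)

print(f"Completed transcripts: {completed_share:.1f}%")
print(f"Survey response rate:  {survey_response_rate:.1f}%")
print(f"Surveys fully completed: {survey_completion_rate:.1f}%")
```

Rounded to whole numbers, the first two values agree with the 56% and "about 8%" given in the text.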
29
Before reviewing the data, it is worth taking note of the following milestones in the project’s life during the testing period:
  • 8 September 2010: official launch of the Transcription Desk and first wave of publicity
  • 27 December 2010: New York Times article featuring Transcribe Bentham published online [Cohen 2010]
  • 28 December 2010: New York Times article published in print
  • 1 February 2011: first broadcast of Deutsche Welle World radio feature[22]
  • 1 and 2 February 2011: each registered user received an invitation to take part in the Transcribe Bentham user survey
  • 8 March 2011: end of testing period
30
As will be seen, the publication of the article in the New York Times (NYT) had a vital and enduring impact upon Transcribe Bentham, and it is thus helpful to consider the testing period as having had two distinct parts: Period One, or the pre-NYT period, covers 8 September to 26 December 2010 (110 days); and Period Two, the post-NYT period, encompasses 27 December 2010 to 8 March 2011 (72 days).
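The lengths of the two periods can be verified with a short date calculation. The following is a minimal Python sketch in which both boundary dates are counted inclusively; the function and variable names are illustrative only.

```python
from datetime import date

def inclusive_days(start: date, end: date) -> int:
    """Number of calendar days from start to end, counting both endpoints."""
    return (end - start).days + 1

# Period boundaries as given in the text.
period_one_days = inclusive_days(date(2010, 9, 8), date(2010, 12, 26))
period_two_days = inclusive_days(date(2010, 12, 27), date(2011, 3, 8))
testing_period_days = period_one_days + period_two_days

print(period_one_days, period_two_days, testing_period_days)
```

Counting inclusively reproduces the 110 and 72 days stated above, for a testing period of 182 days in total.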

Site Visits

31
During the six months as a whole, the Transcription Desk received 15,354 visits from 7,441 unique visitors, or an average of 84 visits per day (see Figure 7).[23] Period One saw 5,199 visits, while in Period Two, there were 10,155. It is clear, then, that traffic to the site during the shorter Period Two was much greater than during the longer Period One, but this is far from the full story.
Figure 7. 
Visits to the Transcription Desk, 8 September 2010 to 8 March 2011
32
Following the publicity surrounding the launch of the Transcription Desk, there were 1,115 visits to the site during the first week. Things settled down soon afterwards, and during the remainder of Period One the site received an average of forty visits per day. Indeed, in November and December the number of daily visits rarely rose above thirty, on occasion reached sixty, and dropped as low as seven during mid-to-late December. Traffic to the Transcription Desk had essentially flatlined, though the volunteers then taking part had transcribed 350 manuscripts by the time Period One ended.
33
Then came the NYT article. From eleven visits on 26 December, traffic rocketed to 1,140 visits on 27 December, with a further 1,411 the following day. Remarkably, thirty per cent of all visits to the Transcription Desk during the testing period came between 27 December 2010 and 4 January 2011. The NYT article also had the effect of increasing the regular level of traffic to the site, to an average of 141 visits per day. The number of visits did not regularly drop below 100 until 19 January, and from then to 8 March the site rarely received fewer than sixty per day. In short, the publicity derived from the NYT article provided a level of traffic, and an audience of potential volunteers, which we would otherwise have struggled to reach.[24]
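The daily averages quoted in this section can be reproduced from the reported visit totals and the period boundaries. The following is a minimal Python sketch; the names are illustrative, and treating the "first week" as exactly seven days is an assumption made here for the calculation.

```python
from datetime import date

def inclusive_days(start: date, end: date) -> int:
    """Number of calendar days from start to end, counting both endpoints."""
    return (end - start).days + 1

# Visit totals and period boundaries as reported in the text.
period_one_visits = 5199
period_two_visits = 10155
first_week_visits = 1115
first_week_days = 7  # assumption: the launch "week" is exactly seven days

period_one_days = inclusive_days(date(2010, 9, 8), date(2010, 12, 26))
period_two_days = inclusive_days(date(2010, 12, 27), date(2011, 3, 8))

overall_avg = (period_one_visits + period_two_visits) / (period_one_days + period_two_days)
period_two_avg = period_two_visits / period_two_days
# Period One average once the launch week is excluded.
remainder_avg = (period_one_visits - first_week_visits) / (period_one_days - first_week_days)

print(f"Overall: {overall_avg:.1f} visits per day")
print(f"Period Two: {period_two_avg:.1f} visits per day")
print(f"Period One after launch week: {remainder_avg:.1f} visits per day")
```

Rounded to whole numbers, these agree with the figures of 84, 141, and forty visits per day given in the text.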
34
The Transcription Desk has been visited by users from ninety-one countries (Figure 8); most visits over the six months were from the United States, with the UK in second place.[25] This again reflects the NYT's impact and the lack of comparable British press coverage during Period Two, as during Period One there were more than twice as many visits from Britain as from the United States (Table 1).
Figure 8.