Participating in the Bazaar: Sharing Code in the Digital Humanities

In The Cathedral and the Bazaar, Eric Raymond argues that open source software development works more like a bazaar than a cathedral, where openness and community are prized over hierarchy and secrecy. I'd like to talk about how developing open source code has made me a better practitioner of digital humanities, and why more digital humanities (DH) scholars and projects should be participating in the open-source bazaar. I would argue that, right now, the digital humanities is getting really good at shopping/browsing at the bazaar, but not actually sharing. We seem to have no problem using open source tools and applications, but very rarely are we actually giving back, or making the development and sharing of open source code a central part of our work.

My target audience for this probably doesn't include programmers; most of the programmers and developers I know and have worked with have already drunk the open source Kool-Aid. Perhaps this has skewed my opinion on the role open-source development should play in DH, but I'm OK with that. Really, my audience for this post is two groups: lone, individual scholars who either never considered sharing their code or who may think the code they're writing isn't worth sharing, and digital humanities centers or groups who are building significant projects but are afraid of, uncertain about, or indifferent to the idea of sharing the code for their projects. I want to address these groups by discussing a bit of my own experience sharing code.

My Experience Sharing Code

When I began working at CHNM in 2003, I barely knew HTML, and honestly didn't know much of anything else. (It may be a miracle I was even selected as a GRA.) The first year I was there, I spent a lot of time just browsing the files on the CHNM server, opening up files I never knew existed on projects, and seeing how the code helped generate the content I saw in the browser. It was eye-opening to say the least, a bit nerve-wracking, but also empowering to see all the code that made these sites go. Of course, what I was looking at wasn't publicly available, but my point here is that just looking at the code, tinkering with it, and copying/pasting it into my own projects was an incredible learning experience. Making code like that more open makes it possible for anyone to learn from it.

I've started sharing more of my code on my GitHub account. Some of my projects include phpZotero (a PHP class for working with Zotero), clioweb-theme (the WordPress theme I use on my personal site), and cw-authorbase (WordPress plugin to let you change the base for author URLs). But I've also developed open-source code elsewhere. ScholarPress Courseware is a plugin to do simple course management on a WordPress blog. I started it with Josh Greenberg initially to scratch an itch I had about how to set up my own course website when I started teaching. We wrote it mainly to satisfy my needs at the time, but I shared it with others, who then suggested features, and found bugs that I (and others!) could fix. Dave Lester added BibTeX import. Zac Gordon updated the admin interface to work with a later version of WordPress. Now, Stas Sushcov is using Courseware as part of a Google Summer of Code project. After looking over phpZotero, Wayne Graham wrote a Ruby wrapper for the Zotero API, and we did a bit of Zotero hacking at THATCamp in May based on that work. If I hadn't developed any of this as open source, and instead had hidden away the code, these kinds of collaborations and branches likely wouldn't have happened. There are plenty of other digital humanities folks on GitHub, all doing more or less the same thing. Following these people, and having them give me comments back on my code, has been so helpful.

At CHNM, we've started to do this with more of our projects. Right now, both Zotero and Omeka are open source. Their code is freely available to download, either via handy zip file or through a Subversion repository. (For example, here is Omeka's SVN repository, and ticketing/bug system.) Anyone can view the source and the tickets, and can request an account to submit tickets or patches if interested. Similarly, Omeka's addons repository uses SVN and has a ticketing system. Anyone who wants to develop an Omeka plugin or theme can either request access to the addons repository, or set up their own.

This week, I started a GitHub account for CHNM, and started sharing some source code I've written for some of our projects, namely the WordPress themes for Hacking the Academy and One Week | One Tool. We're planning to share more code from our individual projects, like any Omeka or WordPress themes and plugins, and just about anything else we develop that we can share. (Expect THATCamp-in-a-Box to be up there in some form sometime this summer!) So if anyone wants to use the Hacking the Academy or One Week themes as the basis for their own themes, they can just check out our code. If they find a bug in there and want to tell us about it, we'd be very grateful! I'm not sure how much we'll put up there that we've already written for past projects. But we want to give code back whenever possible, so anyone can learn from and use our code, and so we might get feedback and improvements from others, too.

The "how" part for sharing code is fairly straightforward, and there are any number of ways to do it: Sign up for GitHub, Google Code, or some other project hosting service, and commit code there. Or install your own Git, Subversion, or other repository system on your own server. You may have to learn some kind of versioning system; there's a great piece by Julie Meloni that provides a gentle introduction to version control. Or you could just make a zip file of your code available on your project site. Add links to these on your project website and on your personal or institution website. Add a colophon or technical explanation of your site. Basically, whether you're a lone digital humanist coding stuff just for you, or a larger center or institution building grant-funded projects, get your code out there so others can see it! As an individual graduate student and digital humanities scholar, and as a project manager at CHNM, I've had nothing but positive experiences with sharing code, and I hope others do, too.
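
For anyone who wants to start with the simplest option mentioned above, the zip file, here is a rough sketch in Python of how that might look. The function name zip_project and the folder names in the usage note are made up for illustration; nothing here is tied to any CHNM project, and it assumes the output folder sits outside the project folder so the archive doesn't try to include itself.

```python
import shutil
from pathlib import Path


def zip_project(project_dir: str, output_dir: str) -> Path:
    """Bundle a project folder into <output_dir>/<folder name>.zip.

    Hypothetical helper for illustration only. Assumes output_dir is
    not inside project_dir.
    """
    project = Path(project_dir).resolve()
    if not project.is_dir():
        raise NotADirectoryError(f"{project} is not a directory")

    out = Path(output_dir).resolve()
    out.mkdir(parents=True, exist_ok=True)

    # root_dir/base_dir keep the top-level folder name inside the archive,
    # so unzipping gives "my-theme/..." rather than a pile of loose files.
    archive = shutil.make_archive(
        base_name=str(out / project.name),
        format="zip",
        root_dir=str(project.parent),
        base_dir=project.name,
    )
    return Path(archive)
```

Calling zip_project("my-theme", "downloads") would, under those assumptions, leave a my-theme.zip in the downloads folder that you could link to from your project site. A proper repository gives you history and collaboration that a zip file can't, but a zip file is still far better than nothing.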

Why Share Code?

Encourage more thorough evaluation, and replication of our work.

Most reviews of digital humanities projects are only of the "surface" of the project, and rarely ever deal with the code underneath. I don't think digital work can be thoroughly and critically evaluated if the evaluators have neither the technical knowledge nor the access to the source code for the digital work. And they both seem to go hand-in-hand. We should start asking projects that wish to be evaluated in any kind of review process to make their code open for the same kind of review. We could go even further and demand that projects include a technical statement or colophon explaining design and development decisions on the project. But baby steps first...

As for replication, think of it this way: We ask for footnotes in articles and monographs to prove we've done the research, and so that the reader can, at any time, replicate that research if so desired. Likewise, our digital humanities projects should be more replicable than they are. It doesn't seem too much of a stretch to be able to build new digital humanities projects based on work already done and available from a previous project. Can you imagine how differently we would build DH projects if we knew the source code would be part of the evaluation, or that it could be possible to replicate some or all of the results of a DH project using the same source code? I know I would write better code!

Increase status and/or credit for code writing in DH.

The potential for more thorough evaluation and replication of work could certainly lead to an improvement in how we allocate reputation and credit for writing code. As it stands now, much of the work that goes into DH projects is hidden. Sharing code in the digital humanities would make the craft of creating that code much more important and prized in DH work, and would go a long way toward making that work count for things like tenure and promotion. But think, too, of all the opportunities for collaboration on projects, and for gaining reputation or credit by contributing bug fixes or features to projects. There is, of course, a whole different conversation to be had about how design and development of DH projects should count toward promotion or play into academic reputation and credit.

Encourage sustainability.

I occasionally hear questions about sustainability with regard to digital humanities projects. Writing code with the intent of sharing it makes knowledge transfer much easier, which helps make sustainability even more possible. Eric Raymond points out that, with open source projects, there is an unwritten rule that if the developer of a project feels unable to continue for whatever reason, she/he should find a suitable replacement to keep the project going, and should relinquish control over the project once someone steps up to take on that role. I'm not sure how many orphaned DH projects are out there, but I know there are a lot. I don't know if there is anyone out there who could take over some of those projects, but I would hope there are a few. Regardless, we'll never know until we begin building projects as open source, share them, and see if there are others able and willing to contribute to their sustainability. This is especially important if, as a recent issue of Digital Humanities Quarterly addressed, work on DH projects is rarely ever done.

Knowledge exchange.

As I already mentioned, I learned more from looking at the source code for CHNM projects than I ever have from reading a programming book. I've learned more from talking to developers directly about their code, about specific projects they're working on, than from working with a book discussing abstract programming terms. Given how many conversations I've seen among folks in DH lamenting the lack of training opportunities or examples for developing DH projects, I'm amazed we don't share more code to help remedy these issues. Having more DH projects with open code can help make others in digital humanities much more aware of, and literate in, using that code. Imagine how much we could learn from having the code for Valley of the Shadow or every project published by the Vectors journal. More to the point, imagine if our projects were written with the idea that we are contributing knowledge through our code, and that others would be looking at the code to gain some insight into our projects. Just like with peer review and evaluation, we might write better code, and think more about how we would explain it to others.

There are, I know, plenty of unanswered questions. What would support models for shared code be like? What license should you release code under? Plenty of others. These are legitimate questions, and should be dealt with on institutional, personal, and/or per-project bases. But we should have serious discussions about how to share code on digital humanities projects, and about how that sharing should come into play in our broader professional work. Talk to your institution or employers about it. But I think we should encourage others to reuse our work and build off of it. I'm concerned about digital humanities work becoming lost and/or irrelevant, especially since we increasingly seem to be living in an era of black boxes. We should fear our projects atrophying because their code is hidden. We should fear being irrelevant in broader discussions regarding technology and standards. We should fear not learning from all the wonderful projects we're all creating, because we're afraid of sharing our code for whatever reason.

I don't think digital humanities can afford to leave the programming and developing to non-humanists. I totally agree with the fine folks at NiCHE that it leaves us at the mercy of the people who do code. We should start actively writing code with the intent to share it, and share as much as possible, so we might become better developers together. We should write code with an audience in mind, like the digital humanities community, who may want to reuse our code. We should share our code so others can learn from us, and so we can learn from others. More than anything, though, we should share code because it's academic work, and I think academic work should be shared openly, critiqued, and improved.

Update, 7:10PM: One thing I totally forgot to mention, and am terribly embarrassed for not doing so, was that phpZotero is based a lot on Jim Safley's work on a Zend client for using the Zotero API, which is used in an upcoming Zotero Import plugin for Omeka.