Learning by Doing: Labs and Pedagogy in the Digital Humanities

The digital humanities adore labs. Labs both symbolize and enable many of the field’s overarching themes: interdisciplinary teamwork, making/building, and the computing process itself. Labs give digital humanists a science-y legitimacy that, whether we admit it or not, we find appealing. Labs aren’t necessary for doing digital humanities research, but in terms of infrastructure, collaboration, and institutional backing they certainly help. Along with “collaboration” and “open” (and possibly “nice”), “lab” is one of the field’s power words. After a period of accelerated growth over the past five years, digital humanities labs and centers worldwide now number in the hundreds. We overwhelmingly focus on labs in this context: labs as physical research spaces. I’d like to move away from this familiar ground to discuss the role of lab assignments within a digital humanities curriculum. Reflecting on my own recent experience of designing and using labs in the classroom, I realized that it speaks to many of the issues currently facing the digital humanities.

Let me start with some background. This past autumn I taught my first college course, “The Digital Historian’s Toolkit: Studying the West in an Age of Big Data.” It was one of the Stanford History Department’s Sources & Methods seminars, classes designed to get history majors working intensively with primary sources. When I was designing my course a year ago, I decided to blend a digital humanities curriculum with more traditional historical pedagogy. Under the broad umbrella of the nineteenth-century American West, I used a specific historical theme each week (mining, communications, tourism, etc.) to tie together both traditional analysis and digital methodology. As part of this, students met in the Center for Spatial and Textual Analysis during five different class periods to complete a weekly lab assignment.

In designing the course, I wrestled with a problem that faces every digital humanist: the balancing of “traditional” (for lack of a better term) and “digital.” How much of my curriculum should follow a seminar model based on reading and discussion? How much should it follow a lab model based on technical tools and techniques? As is often the case, pragmatism partially informed my decision. Because my class was part of a required series of courses offered by the department, I couldn’t simply design a full-blown digital humanities methods course. It had to have a strong historical component in order to get approved. This juggling act is not uncommon for digital humanists. But more philosophically, I believed that digital tools were best learned in the context of historical inquiry. An overarching theme (in my case, the late nineteenth-century West) helped answer the question of why a student was learning a particular piece of software. Without it, digital pedagogy can stray into the bugaboo waved about by skeptics: teaching technology for technology’s sake.

I designed my labs with three goals in mind. First, I wanted my students to come away with at least an introduction to technical skills they wouldn’t otherwise get in a typical history course. Given my background, I focused largely on GIS, textual analysis, and visual design. I didn’t expect my students to become geospatial technicians in ten weeks, but I did want them to try out these kinds of methods and understand how they could be applied to historical problems. This first goal speaks to the alarmist rhetoric of a “crisis in the humanities,” of falling enrollments, shrinking budgets, and growing irrelevance. In this caricature, the digital humanities often get remade as a lifeboat for a sinking ship. This view is obviously overblown. But it is important to remember that the vast majority of our students are not going to end up as professors of history, literature, or philosophy. While there is a strong case to be made for the value of the humanities, I also think we need to do a better job of grafting other kinds of skills onto the field’s reading/writing/thinking foundation.

Second, I wanted students to learn technical skills as part of a larger intellectual framework. I pursued this in part by assigning specific techniques to answer larger questions. For instance, how does Mark Twain’s western novel Roughing It compare to other iconic nineteenth-century works of literature? Instead of assigning thousands of pages of text, I had my students use topic modeling to compare Roughing It to other books such as Uncle Tom’s Cabin and Little Women. But labs were also an effective way to concretize some of the contemporary issues swirling around technology. In one of the labs, students applied different kinds of OCR software to a sampling of pages from an Overland Trail diary they had read earlier in the week. This gave them a chance to peer behind the curtain of large-scale digitization projects. When you experience first-hand just how many words and characters the OCR process can miss, you think more critically about resources like Google Books or LexisNexis. Teaching in the digital humanities should, in part, force students to think critically about the issues surrounding the tools we use: copyright, access, marginalization.
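For readers curious what that kind of comparison looks like in practice, here is a minimal sketch using scikit-learn’s topic-modeling implementation. The post doesn’t specify which tool the labs actually used, and the file names below are placeholders for plain-text copies of the novels:

```python
# A minimal topic-modeling sketch (the labs' actual tool isn't
# specified; the file names are hypothetical placeholders).
from pathlib import Path

import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

paths = ["roughing_it.txt", "uncle_toms_cabin.txt", "little_women.txt"]

# Topic models work better on many short documents, so split each
# novel into ~1,000-word chunks, remembering which book each came from.
chunks, labels = [], []
for p in paths:
    words = Path(p).read_text(encoding="utf-8").split()
    for i in range(0, len(words), 1000):
        chunks.append(" ".join(words[i:i + 1000]))
        labels.append(p)

dtm = CountVectorizer(stop_words="english").fit_transform(chunks)
lda = LatentDirichletAllocation(n_components=15, random_state=0)
doc_topics = lda.fit_transform(dtm)  # one row of topic weights per chunk

# Average the chunk-level topic mixtures by novel to compare the books.
labels = np.array(labels)
for p in paths:
    print(p, doc_topics[labels == p].mean(axis=0).round(3))
```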

Finally, I wanted students to learn by doing. There’s a certain passive mode of learning endemic to so many humanities courses: go to lectures, write a few papers, study for an exam, make comments in discussion. Student passivity can be inherent both to the pedagogical form itself and to how it’s practiced, as anyone who has sat in a lecture hall or watched a student coast through discussion can tell you. Don’t get me wrong: bad labs can be just as passive as lectures. But done right, they emphasize active learning built on immediate feedback. As much as I’ve soured on the term “hacking” and all the privileged baggage it can carry, it is a useful term for the type of learning I want my students to engage in. Try something out. If it doesn’t work, try something else. Under this rubric, mistakes are a necessary part of the process, and the immediacy of the feedback enables exploration, tinkering, tangents, and restarts. It’s a lot harder to do this with traditional assignments; trying out something new in a paper is riskier than trying out something new in a lab.

This last goal proved the hardest to meet, and it constitutes one of the major hurdles facing digital humanities pedagogy. We want to teach digital methods not for their own sake, but to fit them within a broader framework, such as how they help us understand the past. But to get to that point, students need to make a fairly substantial investment of time and energy in learning the basics of a particular tool or technique. I tried to scaffold my lab assignments so that they became less and less prescriptive, and more and more open-ended, with each passing week. The idea was that students needed heavy doses of step-by-step instruction while they were still unfamiliar with the technology. My first lab, for instance, spelled out instructions in excruciating detail. Unfortunately, this led to exactly the kind of passive learning I wanted to avoid. I liken it to a “tutorial glaze”: focusing so much on getting through individual tasks that you lose track of how they all fit together or how you would apply them beyond the dataset at hand. Teaching early-stage technical skills involves a litany of pedagogical challenges that humanities instructors are simply not used to tackling.

By contrast, my final lab gave students a dataset (a map of Denver and enumeration district data from the 1880 census) and asked them to formulate and then answer a historical question through GIS. By nearly any metric – enthusiasm, results, feedback – this proved to be the most effective lab. It forced students to engage in the messy process of digital history: exploring the data enough to formulate a question, returning to the data to answer that question, realizing the data can’t even begin to answer that question, formulating a different question, figuring out how to answer it, and deciding how to visualize an argument. I was even more satisfied with their reflections on the process. Some described the frustrations that came with discovering the limits and gaps of census data. Others remarked on how their own mapmaking decisions, such as changing classification breaks or using different symbology, could completely alter the presentation of their argument. It’s one thing for students to read an essay by J.B. Harley on the subjectivity of maps (which they did). It’s another for students to experience the subjective process of map-making for themselves. Learning by doing: this is what labs are all about.

To help others who want to integrate labs into their own curricula, I’ve made the labs and datasets available to download on the course website. Even as I posted them, though, I was reminded of one last problem facing the digital humanities: ephemerality. I spent hours and hours designing labs that will likely be unusable in a matter of years. Some of them require expensive software licenses; others rely on tools that could fall completely out of development. That’s one of the downsides of labs. Ten years from now, I’ll still be able to re-use my lesson plan for discussing Roughing It. The lab on topic-modeling Twain and other novelists? Doubtful. But ephemerality is one of the necessary costs of teaching digital humanities. Because labs, and the broader pedagogical ethos of the digital humanities they embody, are ultimately worth it.

A Dissertation’s Infancy: The Geography of the Post

A history PhD can be thought of as a collection of overlapping areas: coursework, teaching, qualifying exams, and the dissertation itself. The first three are fairly structured. You have syllabi, reading lists, papers, classes, deadlines. The fourth? Not so much. Once you’ve advanced to candidacy there’s a sense of finally being cut loose. Go forth, conquer the archive, and return triumphantly to pen a groundbreaking dissertation. It’s exhilarating, empowering, and also terrifying as hell. I’ve been swimming through the initial research stage of the dissertation for the past several months and thought it would be a good time to articulate what, exactly, I’m trying to find. Note: if you are less interested in American history and more interested in maps and visualizations, I would skip to the end.

The Elevator Speech

I’m studying communications networks in the late nineteenth-century American West by mapping the geography of the U.S. postal system.*

The Elevator-Stuck-Between-Floors Speech

From the end of the Civil War until the end of the nineteenth century, the U.S. Post steadily expanded into a vast communications network that spanned the continent. By the turn of the century the department was one of the largest organizations in the world. More than 200,000 postmasters, clerks, and carriers were involved in shuttling billions of pounds of material between 75,000 offices at a cost of more than $100 million a year. As a spatial network, the post followed a particular geography. And nowhere was this more apparent than in the West, where the region’s miners, ranchers, settlers, and farmers led their lives on the network’s periphery. My dissertation aims to uncover the geography of the post on its western periphery: where it spread, how it operated, and its role in shaping the space and place of the region.

My project rests on the interplay between center and periphery. The postal network hinged on the relationship between its bureaucratic center in Washington, DC and the thousands of communities that constituted the nodes of that network. In the case of the West, this relationship was a contentious one. Departmental bureaucrats found themselves buffeted by demands to rein in ballooning deficits. Yet they were also required by law to provide service to every corner of the country, no matter how expensive. And few regions were costlier than the West, where a sparsely settled population scattered across a huge area was constantly rearranged by the boom-and-bust cycles of the late nineteenth century. From the top-down perspective of the network’s center, providing service in the West was a major headache. From the bottom-up perspective of westerners, the post was one of the bedrocks of society. For most, it was the only affordable and accessible form of long-distance communication. In a region marked by transience and instability, local post offices were the main conduits for contact with the wider world. Western communities loudly petitioned their Congressmen and the department for more offices, better post roads, and speedier service. In doing so, they redefined the shape and contours of both the network and the wider geography of the region.

The post offers an important entry point into some of the major forces shaping American society in the late nineteenth century. First, it helped define the role of the federal government. On a day-to-day basis, for many Americans the post was the federal government. Articulating the geographic size and scale of the postal system will offer a corrective to persistent caricatures of the nineteenth-century federal government as weak and decentralized. More specifically, a generation of “New Western” historians has demonstrated the omnipresent role of the state in the West. Analyzing the relationship between center and periphery through the post’s geography provides a means of mapping the reach of federal power in the region. With the postal system as a proxy for state presence, I can begin to answer questions such as: where and how quickly did the state penetrate the West? How closely did it follow on the heels of settler migration, railroad development, or mining industries? Finally, the post was deeply enmeshed in a system of political patronage, with postmasterships disbursed as spoils of office. What was the relationship between a communications network and the geography of regional and national politics?

Second, the post rested on an often contentious marriage between the public and private spheres. Western agrarian groups upheld the post as a model public monopoly. Nevertheless, private hands guided the system’s day-to-day operations on its periphery. Payments to mail-carrying railroad companies became the department’s single largest expenditure, and it doled out millions of dollars each year to private contractors to carry the mail in rural areas. This public/private marriage came with costs – in the early 1880s, for instance, the department was rocked by corruption scandals when it emerged that rural private contractors had paid kickbacks to department officials in exchange for lavish carrying contracts. How did this uneasy alliance of public and private alter the geography of the network? And how did the department’s need to extend service in the rural West reframe wider debates over monopoly, competition, and the nation’s political economy?

Getting Off The History Elevator

That’s the idea, at least. Rather than delve into even greater detail on historiography or sources, I’ll skip to a topic probably more relevant for readers who aren’t U.S. historians: methodology. Digital tools will be the primary way in which I explore the themes outlined above. Most obviously, I’m going to map the postal network. This entails creating a spatial database of post offices, routes, and timetables. Unsurprisingly, that process will be incredibly labor-intensive: scanning and georeferencing postal route maps, or transcribing handwritten microfilmed records into a database of thousands of geocoded offices. But once I’ve constructed the database, there are any number of ways to interrogate it.
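To make that concrete, here is an illustrative sketch of what such a database might look like in SQLite. The dissertation’s actual design isn’t described here, so the tables, fields, and sample record below are all assumptions:

```python
# An illustrative schema for a geocoded post-office database; table
# and field names are assumptions, not the dissertation's design.
import sqlite3

conn = sqlite3.connect("postal_geography.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS post_offices (
    id           INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    county       TEXT,
    state        TEXT NOT NULL,
    latitude     REAL,            -- NULL until the office is geocoded
    longitude    REAL,
    established  INTEGER,         -- year opened
    discontinued INTEGER          -- year closed; NULL if still open
);
CREATE TABLE IF NOT EXISTS mail_routes (
    id         INTEGER PRIMARY KEY,
    origin_id  INTEGER REFERENCES post_offices(id),
    dest_id    INTEGER REFERENCES post_offices(id),
    year       INTEGER,
    contractor TEXT               -- private carrier, where recorded
);
""")

# A record transcribed from the microfilmed registers might load like
# this (the office here is hypothetical):
conn.execute(
    "INSERT INTO post_offices (name, county, state, established) "
    "VALUES (?, ?, ?, ?)",
    ("Granite Creek", "Lander", "Nevada", 1868),
)
conn.commit()
```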

To demonstrate, I’ll start with lower-hanging fruit. The Postmaster General issued an annual report providing (among other information) data on how many offices were established and discontinued in each state. These numbers are fairly straightforward to put into a table and throw onto a map. Doing so provides a top-down view of the system from the perspective of a bureaucrat in Washington, D.C. For instance, by looking at the number of post offices discontinued each year, it’s possible to see the wrenching reverberations of the Civil War as the postal system struggled to reintegrate southern states into its network in 1867:

[Map: Post Offices Discontinued by State, 1867. Source: Annual Report of the Postmaster General, 1867]

The West, meanwhile, was arguably the system’s most unstable region. Measured by the percentage of a state’s total offices that were either established or discontinued each year, states such as New Mexico, Colorado, and Montana were continually building and dismantling nodes in the network.

[Map: Post Offices Established or Discontinued as a Percentage of Total Post Offices in State, 1882. Source: Annual Report of the Postmaster General, 1882]
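That churn measure is simple to compute once the annual-report figures are tabulated. A minimal sketch in Python, with hypothetical file and column names:

```python
# Computing office "churn" from a state-level table; the file and
# column names are hypothetical, but the real figures come from the
# Postmaster General's annual reports.
import pandas as pd

df = pd.read_csv("post_offices_by_state_1882.csv")
df["churn_pct"] = (
    (df["established"] + df["discontinued"]) / df["total_offices"] * 100
)
print(df.sort_values("churn_pct", ascending=False)[["state", "churn_pct"]].head(10))
```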

Of course, the broad brush strokes of national, year-by-year data provide only a generalized snapshot of the system. I plan on drilling down to far more detail by charting where and when specific post offices were established and discontinued. This will provide a much more fine-grained (both spatially and temporally) view of how the system evolved. Geographer Derek Watkins has employed exactly this approach:

[Figure: screenshot from Derek Watkins, “Posted: U.S. Growth Visualized Through Post Offices” (25 September 2011)]

Derek’s map demonstrates the power of data visualization: it is compelling, interactive, and conveys an enormous amount of information far more effectively than text alone. Unfortunately, it also relies on an incomplete dataset. Derek scraped the USPS Postmaster Finder, which the USPS built as a tool for genealogists to look up postmaster ancestors. The USPS historian adds to it on an ad-hoc basis depending on specific requests by genealogists. In a conversation with me, she estimated that it encompasses only 10-15% of post offices, and there is no record of what has been completed and what remains to be done. Derek has, however, created a robust data visualization infrastructure. In a wonderful demonstration of generosity, he has sent me the code behind the visualization. Rather than spending hours duplicating Derek’s design work, I’ll be able to plug my own, more complete, post office data into a beautiful existing interface.

Derek’s generosity brings me back to my ongoing personal commitment to scholarly sharing. I plan on making the dissertation process as open as possible from start to finish. Specifically, the data and information I collect have broad potential for applications beyond my own project. As the backbone of the nation’s communications infrastructure, the postal system provides rich geographic context for any number of other historical inquiries. Cameron Ormsby, a researcher in Stanford’s Spatial History Lab, has already used post office data I collected as a proxy for community development in order to analyze the impact of land speculation and railroad construction in Fresno and Tulare counties.

To kick things off, I’ve posted the state-level data I referenced above on my website as a series of CSV files. I also used Tableau Public to generate a quick-and-dirty way for people to interact with and explore the data in map form. This is an initial step in sharing data and I hope to refine the process as I go. Similarly, I plan on occasionally blogging about the project as it develops. Rather than narrowly focusing on the history of the U.S. Post, my goal (at least for now) is to use my topic as a launchpad to write about broader themes: research and writing advice, discussions of digital methodology, or data and visualization releases.

*By far the most common response I’ve received so far: “Like the Pony Express?” Interestingly, the Pony Express was a temporary experiment that only existed for about eighteen months in 1860-1861. In terms of mail carried, cost, and time in existence, it was a tiny blip within the postal department’s operations. Yet it has come to occupy a lofty position in America’s historical memory and encapsulates a remarkable number of the contradictions and mythologies of the West.

Coding a Middle Ground: ImageGrid

Openness is the sacred cow of the digital humanities. Making data publicly available, writing open-source code, and publishing in open-access journals are not just ideals, but often the very glue that binds the field together. It’s one of the aspects of the digital humanities that I find most appealing. Despite this, I have only slowly begun to put the ideal into practice. Earlier this year, for instance, I posted over one hundred book summaries I had compiled while studying for my qualifying exams. Now I’m venturing into the world of open source by releasing a program I used in a recent research project.

The program tries to tackle one of the fundamental problems facing many digital humanists who analyze text: the gap between manual “close reading” and computational “distant reading.” In my case, I was trying to study the geography within a large corpus of nineteenth-century Texas newspapers. First I wrote Python scripts to extract place-names from the papers and calculate their frequencies. Although I had some success with this approach, I still ran into the all-too-familiar limit of historical sources: their messiness. Nineteenth-century newspapers are extremely challenging to translate into machine-readable text. When performing Optical Character Recognition (OCR), the smorgasbord nature of newspapers poses real problems: inconsistent column widths, a potpourri of advertisements, vast disparities in text size and layout, stories running from one page to another – the challenges go on and on. Consequently, extracting the word “Havana” from OCR’d text is not terribly difficult, but writing a program that identifies whether it occurs in a news story versus an advertisement is much harder. Given the quality of the OCR’d text in my particular corpus, deriving this kind of context proved next-to-impossible.
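To give a sense of that first step, here is a minimal sketch of gazetteer-based place-name counting. These are not my original scripts; the tiny gazetteer and directory name below are placeholders:

```python
# A minimal sketch of gazetteer-based place-name counting over OCR'd
# newspaper text; the gazetteer and directory are hypothetical.
import re
from collections import Counter
from pathlib import Path

GAZETTEER = {"Havana", "Chicago", "Galveston", "New Orleans", "Austin"}

# Match longer names first so "New Orleans" isn't swallowed by a
# shorter alternative in the pattern.
pattern = re.compile(
    "|".join(re.escape(p) for p in sorted(GAZETTEER, key=len, reverse=True))
)

counts = Counter()
for page in Path("ocr_texts").glob("*.txt"):  # one file per OCR'd page
    counts.update(pattern.findall(page.read_text(encoding="utf-8")))

for place, n in counts.most_common():
    print(f"{place}\t{n}")
```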

The messy nature of digitized sources illustrates a broader criticism I’ve heard of computational distant reading: that it is too empirical, too precise, and too neat. Messiness, after all, is the coin of the realm in the humanities – we revel in context, subtlety, perspective, and interpretation. Computers are good at generating numbers, but not so good at generating all that other stuff. My program could tell me precisely how many times “Chicago” was printed in every issue of every newspaper in my corpus. What it couldn’t tell me was the context in which the word occurred. Was it more likely to appear in commercial news? Political stories? Classified ads? Although I could read a sample of newspapers and manually track these geographic patterns, even that task proved daunting: the average issue contained close to one thousand place-names and ran to more than 67,000 words (longer than Mrs. Dalloway, Fahrenheit 451, or All Quiet on the Western Front).

I needed a middle ground. I decided to move backwards, from the machine-readable text of the papers to the images of the newspaper pages themselves. What if I could broadly categorize each column of text according both to its geography (local, regional, national, etc.) and its type of content (news, editorial, advertisement, etc.)? I settled on the idea of overlaying a grid onto the page image. A human reader could visually skim across the page and select cells in the grid to block off each chunk of content, whether a news column, a political cartoon, or a classified ad. Once the grid was divided up into blocks, the reader could easily calculate the proportions of each kind of content.

My collaborator, Bridget Baird, used the open-source programming language Processing to develop a visual interface to do just that. We wrote a program called ImageGrid that overlaid a grid onto an image, with each cell in the grid containing attributes. This “middle-reading” approach gave me a new access point into the meaning and context of a paper’s geography without laboriously reading every word of every page. A news story on the debate in Congress over the Spanish-American War could be categorized primarily as “News” and secondarily as both “National” and “International” geography. By repeating this process across a random sample of issues, I began to find spatial patterns.
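ImageGrid itself is written in Processing, but the core idea is easy to sketch: a grid of cells, each carrying a primary content category and a set of secondary geographic tags, with “page space” proportions computed as simple cell counts. The Python sketch below is purely illustrative; grid dimensions, category names, and the example block are all assumptions:

```python
# An illustrative Python sketch of ImageGrid's underlying idea (the
# real program is written in Processing); all values here are made up.
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Cell:
    primary: str | None = None                        # e.g. "News", "Ad"
    secondary: set[str] = field(default_factory=set)  # e.g. {"National"}

ROWS, COLS = 20, 12
grid = [[Cell() for _ in range(COLS)] for _ in range(ROWS)]

# The reader blocks off a chunk of the page image – here, a news story
# tagged with both national and international geography.
for r in range(6):
    for c in range(4):
        grid[r][c] = Cell("News", {"National", "International"})

# "Page space" proportions are then simple cell counts.
tallies = Counter(cell.primary for row in grid for cell in row if cell.primary)
total = sum(tallies.values())
for category, n in tallies.most_common():
    print(f"{category}: {n / total:.0%} of categorized cells")
```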

[Figure: grid with primary categories as colors and secondary categories as letters]

For instance, I discovered that a Texas paper from the 1840s dedicated proportionally more of its advertising “page space” to local geography (such as city grocers, merchants, or tailors) than did a later paper from the 1890s. This confirmed what we might expect: a growing national consumer market by the end of the century gave rise to more and more advertisements originating from outside Texas. More surprising, however, was the pattern of international news. The earlier paper contained three times as much foreign news (relative “page space” categorized as news content and international geography) as the later paper from the 1890s. This was entirely unexpected. The 1840s should have been a period of relative geographic parochialism compared to the ascendant imperialism of the 1890s, which marked the United States’ noisy emergence as a global power. Yet the later paper dedicated proportionally less of its news to the international sphere than the earlier one. This pattern would have remained hidden had I used either a close-reading or a distant-reading approach alone. Instead, a blended “middle-reading” through ImageGrid brought it into view.

We realized that this “middle-reading” approach could be readily adapted not just to my project, but to other kinds of humanities research. A cultural historian studying American consumption might use the program to analyze dozens of mail-order catalogs and quickly categorize the various kinds of goods – housekeeping, farming, entertainment, etc. – marketed by companies such as Sears-Roebuck. A classicist could analyze hundreds of Roman mosaics to quantify the average percentage of each mosaic dedicated to religious or military figures and the different colors used to portray each one.

Inspired by the example set by scholars such as Bethany Nowviskie, Jeremy Boggs, Julie Meloni, Shane Landrum, Tim Sherratt, and many, many others, we released ImageGrid as an open-source program. A more detailed description of the program is on my website, along with a web-based applet that provides an interactive introduction to the ImageGrid interface. The program itself can be downloaded either on my website or on its GitHub repository, where it can be modified, improved, and adapted to other projects.
