
Crowdsourcing Digital Humanities Projects


Crowdsourcing is the act of taking a job traditionally performed by a designated agent (usually an employee) and outsourcing it to an undefined, generally large group of people in the form of an open call.
– Mia Ridge in “Crowdsourcing Our Cultural Heritage: Introduction”


What can you do with crowdsourcing?

I’ll admit that I was a bit skeptical going into this module because I wasn’t entirely sure that I bought into crowdsourcing. How can you maintain quality? Who’s going to participate, and what does that participation look like? How do you track milestones and ensure that you’re creating a comprehensive product? It all seemed a bit Wild Wild West to me, but I stand corrected. In a lot of ways, crowdsourcing is a quintessential component of digital humanities in that it makes research accessible, engaging, and, best of all, interactive for the general public.

In this module, we focused on crowdsourced transcription and data (or text) correction. Technology has helped expedite a lot of the digitization process: documents are scanned or photographed into digital images, and then optical character recognition (OCR) software can often take the digitization a step further by converting the text captured in those images into a machine-readable transcription.
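
To make that OCR step concrete, here is a minimal Python sketch, assuming the Tesseract engine plus the pytesseract and Pillow packages are installed; the filename is a hypothetical scanned page, not from any project discussed here:

    # Minimal OCR sketch: extract text from a scanned page image.
    # Assumes Tesseract, pytesseract, and Pillow are installed;
    # "page_001.png" is a hypothetical scan used for illustration.
    from PIL import Image
    import pytesseract

    scan = Image.open("page_001.png")             # load the digitized image
    raw_text = pytesseract.image_to_string(scan)  # run OCR on the image
    print(raw_text)                               # machine-readable, but imperfect, text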

If a project doesn’t have the resources to transcribe digitized images automatically via software, it can crowdsource the transcription process. One of the more popular platforms for crowdsourced transcription is MediaWiki, which has several perks:

    • Its intuitive design makes it easy for users to engage with the text-editing functionality.
    • Each page has a revision history log that tracks all changes, even from users who haven’t created an account on the site (see the sketch after this list).
    • Users don’t need any technical expertise to update a page: more often than not, the simple text box works much like a basic Word document, so users enter text, apply any formatting (optional), and click Save.
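
Here is a small Python sketch of that revision-history perk, using the standard MediaWiki query API; the wiki URL and page title are hypothetical placeholders:

    # Pull the latest revisions of a transcription page via the MediaWiki API.
    # The wiki URL and page title below are hypothetical.
    import requests

    API = "https://example-transcription-wiki.org/w/api.php"
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": "Letter_1794-03-12",      # hypothetical transcription page
        "rvprop": "user|timestamp|comment",
        "rvlimit": 5,
        "format": "json",
    }

    data = requests.get(API, params=params).json()
    for page in data["query"]["pages"].values():
        for rev in page.get("revisions", []):
            # Anonymous editors show up here by IP address, so changes
            # are tracked even when a user hasn't created an account.
            print(rev["timestamp"], rev["user"], rev.get("comment", ""))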

Even if a project has the funding to support software transcription, there’s still a need for manual quality control. Technology isn’t perfect, and machine transcriptions are seldom fully accurate without human intervention, which is where crowdsourced data correction comes into play. Users can review a repository of digitally captured and transcribed text to vet its accuracy and make updates in real time.
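
As a toy illustration of that quality-control idea, the Python sketch below compares raw OCR output against a volunteer’s correction and flags pages that needed heavy intervention; the strings and the 0.95 threshold are purely illustrative:

    # Compare raw OCR output to a volunteer's correction and measure
    # how much human intervention was needed. Illustrative values only.
    import difflib

    ocr_text = "Tbe quick brovvn fox jumped ovar the lazy dog."   # raw OCR output
    corrected = "The quick brown fox jumped over the lazy dog."   # volunteer's fix

    similarity = difflib.SequenceMatcher(None, ocr_text, corrected).ratio()
    print(f"OCR output matched the corrected text at {similarity:.1%}")

    # A project might queue low-scoring pages for a second review pass
    # before marking them "done"; the threshold here is arbitrary.
    if similarity < 0.95:
        print("Flag this page for another review pass.")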


What can the general public do?

Transcribe text, as seen in The Papers of the War Department and Transcribe Bentham, and correct data, as seen in Trove and Building Inspector.

With both forms of crowdsourcing – transcribing and correcting – another key task the general public can perform is adding metadata. By adding basic metadata like names, places, themes, and so on, contributors make the project more searchable. Combining these layers of metadata with expanded search parameters gives crowdsourced digital humanities projects greater visibility across the web and allows deeper analyses to happen: interdisciplinary analyses, cross-project analyses, and analyses by the general public (never underestimate the general public’s passion and expertise!).
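
To sketch what that volunteer-added metadata might look like in practice, here is a rough Python example; every field name and value is hypothetical, not drawn from any of the projects above:

    # A volunteer-supplied metadata record for one transcribed document.
    # All field names and values here are hypothetical.
    record = {
        "title": "Letter regarding supply requisitions",
        "transcription": "...",          # the crowd-produced text
        "names": ["Henry Knox"],         # people mentioned
        "places": ["Philadelphia"],      # locations mentioned
        "themes": ["military supply"],   # subject tags
    }

    # Even a flat structure like this makes a collection searchable:
    # finding every document that mentions a place is a simple filter.
    def documents_mentioning(records, place):
        return [r for r in records if place in r["places"]]

    print(documents_mentioning([record], "Philadelphia"))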

One final point I’d like to reinforce: crowdsourcing a project comes down to sheer numbers. Even if a project has the funding to hire full-time staff, odds are that funding isn’t going to cover hundreds, let alone thousands, of people. A well-funded project may have 3-6 full-time staff (plus or minus a few folks), which simply can’t compete with the potential volume of volunteers available to transcribe text or correct already captured data. Crowdsourcing lets digital projects maximize their funding by employing maybe 1-2 full-time staff members to oversee operations: perform quality control, establish processes, ensure that everyone’s playing nicely in the sandbox (i.e., maintaining professionalism), and so on. Even though these full-time staffers may spend little time transcribing or correcting text themselves, the volunteers cover vast amounts of territory that a small paid team would likely never be able to cover on its own.


How do you attract contributors/volunteers?

Altruism can be quite a powerful tool. What surprised me the most, although I’m not entirely sure why, is how much users want to participate and volunteer their time to help the greater good by supporting these digital history projects in accomplishing their goals. For example, in Trove, an online repository of all things Australia – books, videos, music, photographs, and so on – users are motivated almost exclusively by their personal interests. In “Singing for their supper: Trove, Australian newspapers, and the crowd,” Marie-Louise Ayres says, “Trove text correctors are primarily motivated […] by the sense of being involved in something bigger than them and of lasting value, and by a very strong sense of giving back or singing for their supper.”

Crowdsourced digital projects also have the basic advantage of a global internet user base, so you’re able to tap into users conducting personal research, like genealogy, as well as users who want to participate in a crowdsourcing project just for the fun of it.

One of the downsides of crowdsourcing that I noticed while actually participating in the projects is that you need some sort of interest in, or connection to, the project. Maybe not always, but the project has to pique your interest at least a little; otherwise, you’re just going through the motions and not really engaging with the content. For example, I could spend days transcribing documents in The Papers of the War Department because I’m fascinated with that historical era, but I also felt a surprising connection to what I was transcribing: I was proud, for lack of a better word, to know that I was participating in a historical archiving process.

On the other hand, while I found Building Inspector to be a well-designed site, I’m not a New Yorker, so I didn’t find it terribly engaging. I can appreciate what the project’s aiming to do, and I think its long-term goals of adding features like exploring some of the city’s more obscure locations and viewing historical documents associated with specific locations will make the buildings truly come to life, even for non-New Yorkers. But in the context of crowdsourcing and recruiting volunteers, I think the connection to the content, or lack thereof, is a limitation worth noting.


What kind of interface/community building do you need?

Dr. Sharon Leon said that crowdsourced digital humanities projects aren’t just if-you-build-it-they-will-come; you have to build and connect with your online community. As I mentioned above, one way to do this is to tap into the general public’s altruism, but you can strengthen this force by providing ways for users to communicate with each other: sharing ideas, providing feedback or peer reviews, and placing value in their opinions and input. While users may flock to a crowdsourced project, they can flock away from it just as quickly and easily if they feel unappreciated or left out of the project’s narrative. By welcoming the general public into academic research and historic preservation projects, you allow everyone to participate in activities that once lived in a silo for academics only; this inclusiveness alone is a pretty powerful means of creating community.
