Analyzing Text with Voyant

October 18, 2019 - By admin

Analyzing text isn’t easy, especially when you’re looking at a long document or a large collection of documents. Trying to identify and track repeated words while thinking about common themes and tracing potential trends is daunting, to say the least. Voyant is a web-based tool that reads these large bodies of text – from a single file (a document) to a group of files (a corpus) – and visualizes the most common words via computer-generated text analyses. Essentially, the computer sifts through data (in this case, text), analyzes it based on frequency, and then converts this analysis into visual components like a graph or a Cirrus word cloud. (I talk more about the Cirrus word cloud below.)
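To make the "analyzes it based on frequency" idea concrete, here is a minimal Python sketch of counting words in a piece of text. It is only an illustration of the general technique, not Voyant's actual code; the function name and the sample sentence are my own assumptions.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return a Counter mapping each lowercased word to its count."""
    # Assumption: a "word" is a run of letters or straight apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Hypothetical sample text, standing in for a real document.
sample = "The cat sat on the mat. The dog sat too."
freqs = word_frequencies(sample)

# The highest counts are the words a word cloud would draw largest.
print(freqs.most_common(2))
```

A visualization like a word cloud is then just a way of drawing these counts, with bigger counts getting bigger type.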

How can it help me in my research?

Oftentimes, visualizations can pack a more powerful intellectual punch than words alone, so digital historians and researchers are using these visualizations to tell a different story, instigate a different conversation, and hopefully yield a different analysis than traditional scholarship.

While visualizations are captivating and fun to play around with, computers aren’t people. Voyant provides an innovative way to analyze text methodically – counting words, comparing them, tracking them, and then ultimately displaying them – but these analyses don’t produce meaning on their own. They do, however, give you the power to produce meaning! By using these computational outputs of word frequency, you’re able to extrapolate data to find potentially new ways of looking at a document or a corpus, perhaps making connections obscured by the sheer volume of words or noticing a theme that piques enough interest for further investigation (whether using Voyant or another digital tool).

At the end of the day, as with any research tool, you should use more than one digital tool as a point of comparison, analysis, and most importantly, questioning – research isn’t research without lots and lots of questions. Voyant helps stimulate questions that ultimately (hopefully!) lead to new insights and even more questions for further research. As Stéfan Sinclair and Geoffrey Rockwell say in A New Companion to Digital Humanities, “We use tools not to get results but to generate questions, so the more things we try, the more questions we’re likely to have.”

Are there any tips to help me get started in Voyant?

On the Voyant landing page, in the Add Texts box, you can either manually enter or paste text, open a prepopulated corpus (Shakespeare’s Plays or Austen’s Novels) via the Open button, or upload one or more files from your workstation via the Upload button.

Once you hit the Reveal button, you’re taken to the actual set of tools that generate visualizations based on your entered, selected, or uploaded corpus. At a very high level, the default configuration includes the following tools, moving from left to right and top to bottom:

- Cirrus, which is a word cloud that displays the most frequently used words in a corpus, with the largest words being the most frequent and the words getting smaller as they become less frequent. It’s colorful, easy to read, intuitive to use, and adaptable to meet your research criteria. Less important words, like articles (a, the, an, etc.), may skew your initial word cloud, but you’re able to define options for this tool by using a stopword list, which is a list of words that you enter manually to keep them from appearing in the final Cirrus word cloud. There’s a lot of functionality baked into Cirrus, but to get started, click one of the words in the cloud. A tooltip displays the total number of times the selected word appears in the corpus, and all of the other tools on the page also update based on the selected word because they interact with and respond to each other, updating the data in real time.
- Reader has two major components: the text reader, which shows the selected word in context of the actual corpus, and the prospect viewer, which graphs the overview of the entire corpus, representing each document in the corpus individually so that you’re able to visualize how frequently the selected word appears in a certain document.
- Trends similarly graphs the frequency of a selected word or group of words in a corpus, broken down by individual documents in the corpus. This tool is customizable as well so that you’re able to define certain parameters and/or select certain terms that you want displayed in the graph. If you double-click a dot on the graph, you’re presented with the option to either display the results of the selected word across the corpus by selecting Terms or in a single document by selecting Document.
- Summary gives an overview of the corpus broken down by document length, vocabulary density, average words per sentence, most frequent words, and distinctive words.
- Contexts takes a word and displays how it fits into the larger corpus. When you select a line item in this section, the Reader section updates automatically to display the word in context in the document that you selected in the line item. The actual components of this section include the document title, contextual words that display to the left of the selected word, the actual word that you’re reviewing/analyzing, and contextual words that display to the right of the selected word.
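If it helps to see what these tools are doing under the hood, the Python sketch below imitates four of them on a toy corpus: a stopword-filtered term count (Cirrus), a per-document relative frequency (Trends), a unique-to-total word ratio (one plausible reading of Summary's vocabulary density), and a keyword-in-context listing (Contexts). Everything here is an assumption for illustration: the function names, the tiny stopword list, and the two sample documents are mine, and Voyant's own calculations may differ.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "the", "and", "of", "to", "in", "was", "on"}


def tokenize(text):
    """Split text into lowercased words (letters and straight apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def top_terms(documents, n=5, stopwords=STOPWORDS):
    """Cirrus-style: most frequent non-stopword terms across the corpus."""
    counts = Counter()
    for text in documents.values():
        counts.update(t for t in tokenize(text) if t not in stopwords)
    return counts.most_common(n)


def trend(documents, term):
    """Trends-style: share of each document's words that match the term."""
    result = {}
    for title, text in documents.items():
        tokens = tokenize(text)
        result[title] = tokens.count(term) / len(tokens) if tokens else 0.0
    return result


def vocabulary_density(text):
    """Summary-style: unique words divided by total words."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def contexts(documents, term, window=3):
    """Contexts-style: (title, left words, term, right words) rows."""
    rows = []
    for title, text in documents.items():
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            if tok == term:
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                rows.append((title, left, tok, right))
    return rows


# Hypothetical two-document corpus.
corpus = {
    "doc1": "The river was wide and the river was cold.",
    "doc2": "We crossed the river in the morning.",
}

print(top_terms(corpus, n=3))
print(trend(corpus, "river"))
for row in contexts(corpus, "river", window=2):
    print(row)
```

Notice that every function starts from the same tokenized words. That shared starting point is a simple way to picture why selecting a word in one Voyant tool can update all of the others.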

A couple of notes to add to the tools detailed above:

- These five tools are just the default ones that Voyant uses, but you can reconfigure this page to include a variety of tools that suit your text analysis needs, from correlations, to scatterplots, to word trees. For more information on the types of tools available in Voyant, check out their List of Tools page, which provides a thorough overview that may help you decide which tool is going to work best for your research.
- All of these available tools are rich with capabilities, far too many to list here, but Voyant has extensive supporting documentation that’s worth exploring before you kick off your analysis.

Case Study: WPA’s “The Other Slave Narratives”

I came into this activity with preconceived notions about what slave narratives might look like in the Voyant tool, using a very modern filter to look at a very historically emotional time for the country. The Works Progress Administration’s (WPA) Slave Narratives were interviews conducted between 1936 and 1938 by primarily white interviewers of former slaves, all of whom were children during their enslavement. Many of these interviews clustered in and around major cities across seventeen states, which left out the rural voice, to name just one of the biases. I’m, honestly, a little embarrassed and disappointed that I would assume these interviews were diverse in nature – i.e., different types of interviewees from different backgrounds living all over the country – and in some ways, these interviews are diverse enough to capture firsthand accounts of slavery in the United States. At the same time, however, these interviews are also (white) interpretations of (former slave) memories. It’s not like these interviewers had cell phones to record these interviews and then transcribe them for digital archiving later. These were handwritten transcripts, loosely connected to the words actually spoken and often heavily edited for consistent language and expected, cliché dialects.

All of this is to say that while the Voyant tool’s representation of these narratives is wildly interesting, it’s also wildly up for interpretation since much of the original transcript likely isn’t the original transcript. As long as you look at the visualization with this understanding – i.e., you’re looking at a partially redacted transcription, to put it mildly – then you should still be able to make connections, dive into analyses, and draw conclusions.

I created the following Cirrus word cloud to represent the corpus of interviews from these slave narratives:

Using the Cirrus word cloud above as a springboard, I started to explore, clicking on both prominent and not-so-prominent words to see in which states they were referenced the most and the least, how time affected frequency (if at all), how words were used in actual documents, and how word use in the corpus compared with word use in an individual document. One of the functions in the Cirrus word cloud is the control of how many terms you want to display, so you can adjust how big or small Cirrus is by sliding the Terms bar. (I spent a good bit of time playing around with this feature because it allowed me to find some more obscure, less frequently mentioned words that were interesting to see in context.)

As I mentioned above, text-reader tools like Voyant are great for unpacking text-heavy documents to find themes and strategize further research, but it’s still important to keep in mind that this tool isn’t comprehensive and should supplement your research, not define it exclusively.
