Text and Digital History

1. Movie NGrams
The first tool in this week’s reading was Bookworm Movies. Within this database, a user can change the corpus of the search and view trends in areas such as film dialogue based on text. In the accompanying blog post on “Sapping Attention,” the author uses Bookworm to view trends in the language and subject matter of films. It is the collections of metadata available in places such as IMDb, the writer contends, that allow users to better view trends. Metadata such as writer, director, and country of origin can give a more complete picture of trends in film.
2. Bookworm
This was actually a really fun page. Through this portal, a user can view trends in a variety of different subjects. Along with the Movie NGrams there are Vogue, baby names, and Rate My Professor, all popular web pages. This particular part of Bookworm is known as Culturomics, and it focuses mainly on content that falls under entertainment or popular culture. There are also other databases, such as OpenLibrary and US Congress, that hold trends in literature and government.

3. Historic Newspaper NGrams
This NGram is in the same vein as the Culturomics databases. It draws on a very large corpus of historic newspapers (7 million texts, 212 billion words). The default search term, “bicycle,” showed a rise in mentions of this form of transportation in the mid-1890s. When you click on a point on the graph, you can view both the frequency within a particular year and the underlying text.
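The underlying idea of an NGram viewer can be sketched in a few lines: count how often a term appears each year, relative to the total number of words published that year. This is not the tool’s actual implementation, just a minimal illustration with made-up data.

```python
from collections import Counter

# Hypothetical mini-corpus: (year, text) pairs standing in for digitized newspaper pages.
corpus = [
    (1890, "the bicycle craze swept the city"),
    (1895, "a bicycle parade and a bicycle race"),
    (1900, "the automobile replaced the bicycle"),
]

def ngram_frequency(corpus, term):
    """Relative frequency of `term` per year: occurrences / total words that year."""
    counts, totals = Counter(), Counter()
    for year, text in corpus:
        words = text.lower().split()
        totals[year] += len(words)
        counts[year] += words.count(term.lower())
    return {year: counts[year] / totals[year] for year in sorted(totals)}

print(ngram_frequency(corpus, "bicycle"))
```

Plotting these per-year ratios as a line is essentially what the viewer’s graph shows; clicking a point then drills down into the texts behind that year’s count.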

4. Mining the Dispatch
This piece focuses on the Richmond newspaper, the Dispatch, and its significance in illuminating the Confederate capital during the tumultuous years of the Civil War. Historian Kenneth Noe contends that although Richmond was the center of much political and social change, much about the city during these years remains relatively unknown. This particular project, “Mining the Dispatch,” aims to open up the conversation about this time and place through text. The time frame used in this database runs from Lincoln’s election in 1860 to the evacuation of the city in 1865, and the collection encompasses 24 million words in 12,000 pieces. The project uses topic modeling, a process that applies statistics to categorize texts and surface patterns in them. Through software called MALLET, the program extracts a specified number of topics from the documents, using algorithms to display patterns. A topic is defined in this piece as “a group of words that are likely to appear together in the document” (“Mining”). The author uses slavery as an example topic and a model basis. Through graphs, two kinds of data can be explored: thematic, shown through graphs of relative space occupied, and generic, shown through graphs that count the number of articles where the “proportion is above the specified level” (“Dispatch”).
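MALLET itself performs full probabilistic inference (LDA), but the core intuition quoted above, that a topic is a group of words likely to appear together, can be illustrated much more simply by counting which word pairs co-occur across documents. The sketch below uses invented snippets loosely inspired by the Dispatch’s fugitive-slave ads and war reporting; it is a toy stand-in, not MALLET’s algorithm.

```python
from collections import Counter
from itertools import combinations

# Made-up document snippets: two themes (runaway-slave ads, war news) mixed together.
docs = [
    "ran away my negro man reward offered",
    "reward for runaway slave ran away",
    "army troops marched toward richmond",
    "troops and army near richmond lines",
]

stopwords = {"my", "for", "and", "near", "toward"}
pair_counts = Counter()
for doc in docs:
    # Count each unordered pair of distinct content words per document.
    words = sorted(set(doc.split()) - stopwords)
    pair_counts.update(combinations(words, 2))

# The most frequent co-occurring pairs cluster into the two underlying themes.
print(pair_counts.most_common(6))
```

Real topic modeling generalizes this idea: instead of raw pair counts, it infers probability distributions over words, so each document can mix several topics in different proportions.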

5. NYT Chronicle
This NGram focuses on the records of the New York Times and uses keyword searches to show trends in the content of its written works from 1860 to the present. After reading the “Mining the Dispatch” piece, I tried the examples of slavery, civil rights, and Jim Crow, and it was interesting to see the trends and how they corresponded to the time periods covered by the data. For example, there was a rise in mentions of civil rights around the 1960s.

6. Voyant
Voyant is a text-analysis tool that allows the user to load a text into the reader, where patterns are generated from the given information. I uploaded a reading from another class and began messing around with the program. The user can click on any word in the document, and it will show the frequency of that word in the piece. The program also gives a brief summary of the corpus, listing the number of unique words and the most frequently used words.
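The corpus summary Voyant displays boils down to simple word counting. The sketch below, with an invented sentence, shows roughly what such a summary contains: total words, unique words, and the top terms. It is an illustration of the idea, not Voyant’s code.

```python
from collections import Counter

# Made-up text standing in for an uploaded document.
text = "the archive holds letters and the letters hold stories the stories matter"

words = text.lower().split()
freq = Counter(words)
summary = {
    "total_words": len(words),
    "unique_words": len(freq),
    "most_frequent": freq.most_common(3),
}
print(summary)
```

Clicking a word in Voyant’s reader is, in effect, a lookup into a table like `freq`, which is why the per-word frequency appears instantly.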

7. Getting Started With Voyant
This page is a user-friendly guide to using Voyant. It shows how to upload not just single pages, but HTML, XML, and PDF content. It then shows the different skins available within the program, including the summary, cirrus (word cloud), and corpus reader. This piece also tells the reader how to bookmark particular corpora and export them to sites such as blogs.

8. Comparing Corpora in Voyant
This particular piece shows how to upload corpora to Voyant in order to compare patterns. It gives a step-by-step guide to comparing multiple corpora: save one corpus, add it to another, and enable the “difference” function. The end result is that one can view a comparison of word frequencies across both corpora.
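The comparison the guide describes comes down to relative word frequencies: how often a word appears in each corpus, scaled by corpus size, and the gap between the two. A minimal sketch with invented corpora (not Voyant’s actual “difference” computation):

```python
from collections import Counter

# Two hypothetical corpora to compare.
corpus_a = "war war army battle peace".split()
corpus_b = "peace treaty peace trade battle".split()

def relative_freq(words):
    """Each word's share of the corpus: count / total words."""
    counts = Counter(words)
    return {w: c / len(words) for w, c in counts.items()}

fa, fb = relative_freq(corpus_a), relative_freq(corpus_b)
# Positive difference: more characteristic of corpus A; negative: of corpus B.
diff = {w: fa.get(w, 0) - fb.get(w, 0) for w in set(fa) | set(fb)}
for word, d in sorted(diff.items(), key=lambda x: -x[1]):
    print(f"{word:>8}: {d:+.2f}")
```

Words near zero (here, “battle”) appear at similar rates in both corpora, while large positive or negative values flag the vocabulary that distinguishes one corpus from the other.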

Questions for Discussion
What are the benefits of using software such as Voyant or Bookworm in research? What are some difficulties?
How could these technologies strengthen the connections between the humanities and other fields?
Are these technologies the new frontier in research? Could they create new fields within historical practice?

8 thoughts on “Text and Digital History”

  • April 4, 2016 at 8:40 pm

    As the “Dispatch” article pointed out, using programs like this can allow us to see patterns that are obvious, but also patterns that surprise us. From some of the patterns he found analyzing this newspaper, the author could make an educated guess based off of information that we do know to understand why these patterns appear the way they do. But that also seems to be the problem with this type of analysis. With such a large amount of information, it would be difficult to go back and verify that your educated guess is true. While I think getting the bigger picture in this way is just as important as getting a micro view of an event, it seems that there are pitfalls that you would not encounter with a smaller data set. The problem of context also seems to be an issue with Ngrams. One can see the shifts in word usage, but without any background information one would have to dig a lot deeper in order to find out why that shift was taking place. The other problem that I think we have talked about before is that these programs are only searching what is digitized, leaving out a large chunk of possibly valuable information.

    • April 5, 2016 at 3:35 am

      Isn’t that the problem with all historical information, though? It’s all meaningless without context–a map, image, quote or letter without the surrounding cultural and social context lacks meaning, which is why we do research on townships and the timelines of land sales. What can we get from the macro picture if we combine it with traditional close reading? It’s not an either/or choice.

    • April 5, 2016 at 7:22 pm

      It seems to me that these programs help us see patterns that are obvious. Their purpose seems less about discovering new information and more about using historic data to reaffirm already known (or suspected) information, and using data visualization to make that information clearer. For example, if a historian were to state that there were a significant number of advertisements seeking runaway slaves during the Civil War era, it is unlikely that historian would be met with surprise, since it is generally accepted knowledge that the institution of slavery was beginning to dissipate at that point in history. However, displaying the number of advertisements seeking runaway slaves, compared to other advertisements of the period, on a pie chart can demonstrate the sheer percentage of slavery-related advertisements in a way that is very easily understood.

  • April 5, 2016 at 3:41 am

    Some other questions your post brings up: what’s the trade off of ease of use vs. rich analysis in the various types of text visualizations? How might something like this be incorporated at a public history site, and what kinds of interpretation would it need to make it readable? What’s the drawback of interactivity in this case, ie, does it result in the viewer only seeing patterns that they already know about?

    • April 5, 2016 at 6:12 pm

      Your question about whether the results found in textual analysis are merely confirming what we already think we know is something that I have been wrestling with a lot in my own work. As the NYT Chronicle shows, it is relatively easy (once you have done all of the setup work, which is rough) to use this type of analysis to show broad changes over time, but you have to be looking for something in the text in order to find it. For example, in the NYT Chronicle I typed in the word ‘antisemitism’ to look at the change over time, and was unsurprised by the results shown. Since I have general knowledge of that history, I knew what the trends would look like. Conversely, if I did not know that historical context, I would not really know what words I would be interested in looking at, nor would I understand the context of them after I did the search. This is why I still struggle with the broad reading of change over time offered by these types of programs versus close reading. I have found my project this semester very useful, and it will be very helpful to my future research, but I do still wonder sometimes if I should be asking different questions. That said, there have been some cases where my findings did not correlate with what I expected to find, which has been a pleasant surprise.

  • April 5, 2016 at 6:02 pm

    Both Voyant and Bookworm seem to be fairly user friendly. Working with Voyant’s software during research, it is simple enough to eliminate words you are not interested in analyzing. The word cloud imagery and simple word-trend graphs are aesthetically pleasing. The difficulty would be in creating meaning from these graphics and understanding the texts or websites you are entering. In seeing the examples created using Bookworm software, I think the visuals are clean, simple, and reader friendly. On the other hand, simplicity may lack detail, and I’m unsure of how user-friendly it is to create these visualizations using Bookworm. Although I see these tools as being useful for research and visualization, I don’t know how ‘cutting edge’ they are; at this point Voyant is already at least four years old. I think this software can help bridge the gap between literary studies, history, and sociology. I don’t think it will create its own field, but it may prove to be a valuable tool for historical practice.

  • April 5, 2016 at 6:21 pm

    I think NGrams could be a useful research tool, but I had trouble figuring out Voyant (damn this technological illiteracy). As for NGrams, being able to pinpoint the years in which a word or phrase was used could be incredibly helpful for a history student, but there are some drawbacks. As Elana states, context and the evolution of language and word usage might cause problems. Adding to this, I think it is necessary to know what you are looking for when using NGrams; otherwise all of this information could be daunting. I’ve also run into some issues where NGrams has trouble recognizing some phrases. I searched for terms like phonograph (invented in the 1870s) and other words that don’t produce any sort of line graph. This could be Safari failing at basic web browsing, of course, so I’ll check that in class. I think there could also be limitations in certain NGrams, like Chronicling America, which is based on historical newspapers. Certain buzzwords will come up, while others that are equally important will receive less attention.

  • April 5, 2016 at 7:49 pm

    I have found Voyant to be an excellent tool, not just for research but for the sake of filing! Working at Schuyler Mansion, we had filing cabinets full of folders. Each folder was for a different historical figure who either lived in, visited, or was associated with Schuyler Mansion. A folder for an individual figure included every letter in the Museum’s collection that was written to, by, or about that figure. Of course, the actual historical letters were preserved in storage, so what we had in the folders were transcribed letters. To sort newly transcribed letters, they could be loaded into Voyant and names could be searched. By typing a figure’s name, it was easy to see every letter in which that name appeared, so Voyant was a helpful tool for organizing information.
