The first tool in this week’s reading was Bookworm Movies. Within this database, a user can change the corpus being searched and view text-based trends in areas such as film dialogue. In the accompanying blog post, “Sapping Attention,” the author uses Bookworm to view trends in the language and subject matter of films. The writer contends that collections of metadata, available in places such as IMDB, are what allow users to view these trends more clearly. Metadata such as writer, director, and country of origin can give a more complete picture of trends in film.
This was actually a really fun page. Through this portal, a user can view trends in a variety of subjects. Along with the Movie NGrams there are Vogue, baby names, and Rate My Professor, all popular web pages. This particular part of Bookworm is known as Culturomics, and it focuses mainly on content that falls under entertainment or popular culture. Other databases, like OpenLibrary and US Congress, hold trends in literature and government.
3. Historic Newspaper NGrams
This NGram is in the same vein as the Culturomics databases. It draws on a very large sample of historic newspapers (7m texts, 212 billion words). The default search term, “bicycle,” shows a rise in mentions of this form of transportation in the mid-1890s. When you click on the line on the graph, you can view both the frequency within a particular year and the underlying text.
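The kind of count behind a graph like this can be sketched in a few lines of Python. This is only a rough illustration, not how the NGram site itself is built: the function name `yearly_frequency` and the toy sentences are made up for the example, and words are split on whitespace with no handling of punctuation.

```python
from collections import defaultdict

def yearly_frequency(docs, term):
    """Return {year: occurrences of `term` per million words}.

    `docs` is a list of (year, text) pairs. Words are split on
    whitespace only, which is a simplification.
    """
    hits = defaultdict(int)    # times the term appears in each year
    totals = defaultdict(int)  # total words seen in each year
    term = term.lower()
    for year, text in docs:
        words = text.lower().split()
        totals[year] += len(words)
        hits[year] += words.count(term)
    return {
        year: 1_000_000 * hits[year] / totals[year]
        for year in sorted(totals)
        if totals[year]
    }

# Hypothetical toy data, not real newspaper text.
docs = [
    (1885, "the horse and carriage passed the mill"),
    (1895, "the bicycle club met and the bicycle race began"),
    (1895, "a new bicycle shop opened"),
]
freq = yearly_frequency(docs, "bicycle")
for year, per_million in freq.items():
    print(year, round(per_million))
```

Dividing by the total words in each year is what makes years with very different amounts of surviving text comparable.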
4. Mining the Dispatch
This piece focuses on the Richmond newspaper, the Dispatch, and its significance in shedding light on the Confederate capital during the tumultuous years of the Civil War. Historian Kenneth Noe contends that although Richmond was the center of much political and social change, much about the city during these years is relatively unknown. This particular text project, “Mining the Dispatch,” aims to open up the conversation about this time and place by using text. The time frame used in this database runs from Lincoln’s election in 1860 to the evacuation of the city in 1865. This collection encompasses 24 million words in 12,000 pieces. The system uses topic modeling, a process that uses statistics to categorize texts and find patterns in them. Through software called MALLET, the program extracts a specified number of topics from the documents, using algorithms to display patterns. A topic is defined in this piece as “a group of words that are likely to appear together in the document” (“Mining”). The author uses slavery as an example topic and as the basis for the model. The graphs reveal two kinds of patterns: thematic, shown through graphs of the relative space a topic occupies, and generic, shown through graphs that count the number of articles where the “proportion is above the specified level” (“Dispatch”).
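To make the difference between the two kinds of graphs concrete, here is a minimal Python sketch. It assumes a topic model has already assigned each article a proportion for each topic. The article records, the topic names, the 0.2 threshold, and the word-weighted definition of “relative space” are all assumptions made for illustration, not the project's actual data or formulas.

```python
from collections import defaultdict

def summarize_topic(articles, topic, threshold=0.2):
    """Return (thematic, generic) summaries of one topic per period.

    thematic: share of all words in the period attributed to the topic
              (an assumed reading of "relative space occupied").
    generic:  number of articles whose proportion of the topic is
              above `threshold`.
    """
    topic_words = defaultdict(float)
    total_words = defaultdict(int)
    above = defaultdict(int)
    for article in articles:
        period = article["period"]
        proportion = article["topics"].get(topic, 0.0)
        topic_words[period] += proportion * article["words"]
        total_words[period] += article["words"]
        if proportion > threshold:
            above[period] += 1
    thematic = {p: topic_words[p] / total_words[p]
                for p in total_words if total_words[p]}
    generic = {p: above[p] for p in total_words}
    return thematic, generic

# Hypothetical articles with made-up topic proportions.
articles = [
    {"period": "1861-01", "words": 200, "topics": {"fugitive_ads": 0.9}},
    {"period": "1861-01", "words": 800,
     "topics": {"fugitive_ads": 0.1, "war_news": 0.9}},
    {"period": "1861-02", "words": 500, "topics": {"war_news": 1.0}},
]
thematic, generic = summarize_topic(articles, "fugitive_ads")
print(thematic)
print(generic)
```

In this toy case one short article is almost entirely about the topic while a long one barely touches it, so the article count and the share of space tell slightly different stories, which is the reason for graphing both.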
This NGram focuses on the records of the New York Times and uses keyword searches to show trends in the content of its articles from 1860 to the present. After reading the “Mining the Dispatch” piece, I searched for slavery, civil rights, and Jim Crow, and it was interesting to see the trends and how they correspond to the time periods covered by the data. For example, there was a rise in the mention of civil rights around the 1960s.
Voyant is a text tool that lets the user load a page into the reader, where patterns are generated from the given text. I uploaded a reading from another class and began experimenting with the program. The user can click on any word in the document, and it will show the frequency of that word in the piece. The program also gives a brief summary of the corpus, giving the number of unique words and the most frequently used words.
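A summary like this comes down to counting words. The Python sketch below is a rough approximation under my own assumptions, not Voyant's actual code: the function name `summarize` and the sample sentence are invented, and unlike a tuned tool it does not filter out common stopwords such as “the.”

```python
import re
from collections import Counter

def summarize(text, top_n=3):
    """Return total words, unique words, and the most frequent words."""
    # Lowercase the text and keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return {
        "total_words": len(words),
        "unique_words": len(counts),
        "most_frequent": counts.most_common(top_n),
    }

# Hypothetical sample text.
sample = "The war came. The city fell, and the war ended."
summary = summarize(sample)
print(summary)
```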
7. Getting Started With Voyant
This page is a user-friendly guide to using Voyant. It shows how to upload not just single pages, but HTML, XML, and PDF content. It then shows the different skins within the program, including the summary, cirrus (word cloud), and corpus reader. This piece also tells readers how they can bookmark particular corpora and export them onto sites such as blogs.
8. Comparing Corpora in Voyant
This particular piece shows how to upload corpora to Voyant in order to compare patterns. It gives a step-by-step guide on how to export multiple corpora by saving one corpus, adding it to another, and enabling the “difference” function. The end result is that one can view a comparison of word frequencies across both corpora.
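The idea of comparing word frequencies across two corpora can be sketched in Python as follows. This is an illustration under my own assumptions and is not a description of how Voyant's “difference” function is calculated: the function names and the two tiny corpora are hypothetical, and the comparison is simply one relative frequency minus the other.

```python
import re
from collections import Counter

def relative_freqs(text):
    """Map each word to its share of all words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if not total:
        return {}
    return {word: count / total for word, count in Counter(words).items()}

def frequency_difference(corpus_a, corpus_b):
    """Relative frequency in corpus_a minus that in corpus_b, per word."""
    freq_a = relative_freqs(corpus_a)
    freq_b = relative_freqs(corpus_b)
    vocabulary = set(freq_a) | set(freq_b)
    return {w: freq_a.get(w, 0.0) - freq_b.get(w, 0.0) for w in vocabulary}

# Hypothetical toy corpora.
corpus_a = "war war peace trade"
corpus_b = "peace peace trade trade"
diff = frequency_difference(corpus_a, corpus_b)

# Positive values are more typical of corpus_a, negative of corpus_b.
for word, delta in sorted(diff.items(), key=lambda item: -item[1]):
    print(word, delta)
```

Using relative rather than raw counts keeps a longer corpus from dominating the comparison simply because it has more words.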
Questions for Discussion
What are the benefits of using software such as Voyant or Bookworm in research? What are some difficulties?
How could these technologies strengthen the connections between the humanities and other fields?
Are these technologies the new frontier in research? Could they create new fields within historical practice?