Monday, April 3, 2017

Data Mining Blog Post

One important aspect of the digital humanities is the use of computer programs to achieve greater understanding of literature. This process is referred to as data-mining, and is used by DHers to critically examine texts, and go beyond the surface level of literature. Digital Humanities scholars use data-mining to make connections that without the aid of computer technology, would be close to impossible. This research could include looking at word frequency throughout a text to determine if the author favored or avoided certain words. A program that is useful in this regard would be Voyant, an online tool that allows the user to examine word, and phrase frequency throughout a piece of literature. The DH scholars could make inferences about the word usage tendencies of certain authors, to gain even more insight into the author's life, writing style,and education. An example of using Voyant to gain insight into a text can be found below.  We have examined our own blog post to search for trends.  While searching for trends in our latest blog post about the predictions of Ray Kurzweil; it seems apparent that the major themes based on word usage were technology, human, and brain. These words accurately exemplify the central theme of the blog post almost instantly.  This just shows that voyant, as a tool for data-mining, can offer great utility to the researcher when trying to find meaningful trends in text. Another useful tool that can be used for data-mining  is Google's Ngram.

Screenshot (1).png


Ngram is a program by Google that searches through a database of books for words or phrases the user enters into the program. Using its extensive database , Ngram displays a graph where it shows the usage of the given word or phrase by year. How is this useful to Dhers ? Well, it can be a good representation of culture and how the meaning of certain words have changed or are possibly no longer used,  for example the word lord was very heavily used in the late 1700s but in the modern world it's usage is practically non existent. With this kind of information one could see the changing trends of society and possible correlations with certain word usage.  Therefore given an adequate amount of information could possibly predict which words will stop being used or become more frequently used just by paying attention to the given trends society is currently behind.


Data mining gives the reader or the researcher a different way to look at literature. The process can allow readers to find deeper meanings, and patterns without technically reading the piece of literature. Connections are made more easily using tools such as Voyant and Ngram because they point out the patterns in the writing for you. Both make it easier for the DH scholar to seek relationships between different words. Thankfully, for data mining we are able to jump deeper into a greater understanding of literature.

No comments:

Post a Comment