Clio 1 Project

For the Clio 1 final project I originally intended to text mine newspaper articles and Supreme Court cases and then showcase those sources in Omeka. Ultimately, I wanted to determine how Supreme Court cases centering on women’s rights and newspaper articles published in reaction to the decisions shaped the definition of gender as it pertained to womanhood in the 20th century.

Since I didn’t come into the program with an MA in history, I first had to determine what sources I wanted to work with and where I could access such sources. I went back to a paper I wrote for an undergraduate independent study on constitutional law and decided to use the decisions of the following nine Supreme Court cases:

  • Muller v. Oregon (1908)
  • Adkins v. Children’s Hospital of District of Columbia (1923)
  • West Coast Hotel Co. v. Parrish (1937)
  • Griswold v. Connecticut (1965)
  • Phillips v. Martin Marietta Corp. (1971)
  • Reed v. Reed (1971)
  • Roe v. Wade (1973)
  • Pittsburgh Press Co. v. Pittsburgh Commission on Human Relations (1973)
  • Planned Parenthood of Southeastern Pennsylvania v. Casey, Governor of Pennsylvania, et al. (1992)

I then went to the ProQuest Historical Newspapers database to find articles. I decided to use four articles per decision in order to ensure that no one case had more data than another. I compiled all of the pertinent bibliographic information into a spreadsheet, and downloaded the articles to my computer in PDF format. This was the point in the project where I realized I would not be able to use Omeka to showcase my sources since ProQuest documents are copyrighted.

After compiling my sources, I had 36 articles to run through Google OCR. Unfortunately, the technology did not recognize the PDFs as containing any text, so all I got back was a copy of each original image with no machine-readable text. I then realized that I would need to transcribe all of my articles in a Word document in order to be able to run them through Voyant.

I tasked myself with transcribing two articles per day in order to finish the process by mid-November. During the transcription process I realized: a) how arduous transcription can be (although I had to deal with 20th-century articles rather than 16th-century handwritten journal entries or letters, so I consider myself lucky), and b) that clean data will yield the best results in Voyant. Since I wasn’t OCRing the sources, I didn’t have to worry about cleaning up messy data, and I think that process would’ve taken me much longer (and given me more headaches) than simply transcribing. Nor did I have to worry about running into any of the problems we encountered when we examined the OCR of the Chronicling America newspapers. In order to ensure that the data remained clean, I left out: hyphens in words that were cut off at the ends of columns; notices such as “article continued on page A16”; the blocks of highlighted text sandwiched in the middle of articles, such as “Justices acted in a way that surprised both sides of the abortion issue”; and captions on pictures. I also decided not to spell out abbreviations, so Sen. remained Sen. rather than becoming Senator. I used Microsoft Word to transcribe, and when I was finished I saved the file as an RTF.
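For anyone applying the same rules to OCR output rather than to a hand transcription, two of the exclusions above (line-end hyphens and continuation notices) could be approximated in a few lines of Python. This is only a sketch and was not part of the project: the function name clean_transcription and the regular expressions are assumptions chosen for illustration.

```python
import re

def clean_transcription(text):
    """Apply two of the cleanup rules to plain text (illustrative only)."""
    # Rejoin words split by a hyphen at a line break: "deci-\nsion" -> "decision".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Drop lines that contain a continuation notice such as
    # "article continued on page A16".
    kept = [
        line for line in text.splitlines()
        if not re.search(r"continued on page\s+\w+", line, re.IGNORECASE)
    ]
    return "\n".join(kept)

sample = "The deci-\nsion stands.\nArticle continued on page A16\nSen. Smith agreed."
print(clean_transcription(sample))
```

One caveat: the hyphen rule also merges genuine compounds that happen to break at the end of a line, so the output would still need a read-through. Pull quotes and picture captions have no reliable textual pattern and would have to be removed by hand.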

I went to Oyez and the Legal Information Institute at Cornell University to find the text of the Supreme Court decisions. I copied and pasted the text of the decisions into Microsoft Word and saved the files as RTFs. This was unquestionably the easiest part of the project.

After transcribing the articles and finding the texts of the decisions, I was able to put all of my files into Voyant. I ended up using Voyant mostly as a comparative tool to analyze word frequencies.

When I first examined the word cloud in Voyant and saw that the word medical appeared 201 times, I wondered what the role of the medical profession has been over the 20th century, particularly since Roe v. Wade, in determining the rights and status of women.

I chose to compare amendment and fourteenth because I wanted to see how often the justices invoked the 14th Amendment in their decisions. It is mentioned in every decision except Phillips v. Martin Marietta Corp.
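The kind of comparison Voyant draws here, how often chosen words appear in each document, can be roughly reproduced outside the tool. The sketch below is an approximation and not Voyant's own method: term_counts is a hypothetical helper, and the two corpus entries are made-up stand-in sentences, not quotations from the decisions.

```python
import re
from collections import Counter

def term_counts(text, terms):
    """Count how often each term occurs in text, ignoring case."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return {term: counts[term] for term in terms}

# Made-up stand-in texts keyed by hypothetical document names.
corpus = {
    "decision_a": "The Fourteenth Amendment protects liberty. The amendment applies.",
    "decision_b": "The statute concerns hiring and employment.",
}

for name, text in corpus.items():
    print(name, term_counts(text, ["amendment", "fourteenth"]))
```

A document in which both counts are zero would correspond to a decision, like Phillips v. Martin Marietta Corp., that never mentions the amendment.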

I then compared health and medical. Health is used most often in the article “State Protection of Women in Trade Called Necessary,” which was published in response to the West Coast Hotel Company v. Parrish decision. Unsurprisingly, medical peaked in the Roe v. Wade decision.

“7-to-2 Ruling Establishes Marriage Privileges” has the most frequent use of the word rights and the word liberty, and was published in response to the Griswold v. Connecticut decision.

The frequencies of abortion and life rise and fall (very) roughly together, but the frequency of pregnancy does not follow the same pattern.

In comparing public, private, and respect, I wanted to determine the extent to which the Supreme Court rulings and newspaper articles discuss the distinction between public and private life and duties, and whether there is respect for those separate physical spaces. Obviously Voyant couldn’t answer that question, but it didn’t come as a surprise that privacy peaked during the time span of Griswold v. Connecticut. The word personal spikes during Griswold as well, and again during Planned Parenthood v. Casey. Personal is used in all of the Supreme Court decisions except for Pittsburgh Press Co. v. Pittsburgh Commission on Human Relations and Phillips v. Martin Marietta Corp. By using the corpus reader, I was able to tell that the word respect, which spikes in the article “Supreme Court Hears Attacks on Salary Law,” was not used there in the sense I was interested in; while it appears in all of the decisions, it is never used to differentiate between public and private.

This is a network analysis of the corpus. This visualization shows the most common words within the dataset to be court, abortion, state, opinion, and law, which were previously identified as the words with the highest frequency in the box showing words in the entire corpus. If you double click on any of the nodes, Voyant will show smaller nodes with words that were mentioned within the larger node. For example, after double clicking abortion, the words issue, obtaining, Roe, says, states, decision, president, and Pennsylvania appear.

This is another network analysis showing the people, organizations, and locations in top frequency links. Top frequency links filters nodes based on participation in high frequency edges. I have found it tricky to determine what exactly this visualization conveys. I understand that supra, New York, US, and Supreme Court connect to Adkins, but I am unsure what the numbers next to the words mean. In order to use this tool effectively to analyze my corpus, I would need a fuller explanation of this visualization.

After running my files through Voyant, I wanted to determine how a similar text analysis tool, Overview, would analyze the same dataset. Many thanks to Jordan for introducing me to this tool! Overview was originally created to help journalists find stories hidden within a large corpus of documents, and is now used for “qualitative research, social media conversation analysis, legal document review, digital humanities, and more” (from their about page). Overview organizes documents based on topics, enables the user to tag documents, and has an advanced search feature. Its algorithms read every word in every file and can compare the words within them. While reading the files, they strip the words of punctuation and capitalization, and disregard all stop words. Overview’s documentation gives more detailed information on the algorithms. Once my files were uploaded, Overview presented a visualization of the corpus.

The most common topics in my dataset were: abortion, sex, wages, Idaho, footnote, minimum wage, employment, and hours. From there, the 45 documents were divided into three folders.

2 - 26 docs
Twenty-six documents discussed wages, minimum, Idaho, wage, hours, Pittsburgh, press, power, ordinance, and Oregon.

3 - 11 docs
Of these 11 documents, most included abortion, abortions, and Roe Wade (Roe_Wade in the image); some also included vote, minimum, pregnancy, and skeel. Skeel refers to E.L. Skeel, who was the attorney for the West Coast Hotel Company in West Coast Hotel Company v. Parrish.

4 - 8 docs
Eight documents discussed job, Phillips, sex plus (sex_plus), young children (young_children), employment, and Reed.
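The word handling attributed to Overview earlier (stripping punctuation and capitalization, then disregarding stop words) can be illustrated with a short Python sketch. This is not Overview's code: the normalize function is hypothetical, and the stop-word list is a tiny made-up sample rather than the list Overview uses.

```python
import string

# A tiny illustrative stop-word list (an assumption, not Overview's list).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "that"}

def normalize(text):
    """Lowercase, strip ASCII punctuation, and drop stop words."""
    table = str.maketrans("", "", string.punctuation)
    words = text.lower().translate(table).split()
    return [word for word in words if word not in STOP_WORDS]

print(normalize("The Court held that an undue burden is unconstitutional."))
```

Because string.punctuation covers only ASCII characters, curly quotation marks copied from Word would survive this step and would need to be handled separately.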

Voyant’s interface is clunky, and there are certain flaws that I discovered while working on this project. When I was importing documents, there was no way for me to see which documents I had imported after the import box filled. I couldn’t scroll up and down in the box, and so the first time I tried to enter my data I ended up importing one Supreme Court case twice, which skewed the results. In addition, I was unable to compare words that were on different pages of the box “words in the entire corpus.” For example, had I wanted to compare the frequency of Mrs and Mr I wouldn’t have been able to since they were not on the same page. Voyant, however, is embeddable, unlike Overview. Overview’s interface is less cluttered than Voyant’s, but I had difficulty at first in determining what exactly Overview was telling me about the documents. Overview is better at mining documents for topics or strings of words, such as undue burden, than Voyant. Both tools made me think about the dataset differently than I had previously.

Were I to do a similar project, there are at least two things I would do differently. First, I would spell out the abbreviations I found in the newspapers (so Sen. would become Senator) in order to have a more accurate reading of the corpus. Second, I would name my documents differently and upload them to Voyant in a different order. For this project I gave each RTF a name identical to its article title. Next time, I will include the date and, if I am still analyzing Supreme Court decisions, the corresponding decision. So, instead of uploading “Supreme Court Voids Birth Control Ban” I will upload an RTF with the following title: “19650608 Griswold v CT Supreme Court Voids Birth Control Ban.” The numbers represent the year, month, and day the article was originally published, and the article was written in reaction to the Griswold v. Connecticut decision. I would upload all of my files into Voyant in chronological order so that I could track changes over time in the word trend visualization.
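The naming scheme described above has a useful side effect: because the date comes first in year-month-day form, an ordinary alphabetical sort of the file names is also a chronological sort. A minimal sketch follows, in which article_filename is a hypothetical helper and the second entry (its date and headline) is invented purely to show the ordering.

```python
from datetime import date

def article_filename(published, case_short, title):
    """Build a name of the form 'YYYYMMDD <case> <article title>'."""
    return f"{published:%Y%m%d} {case_short} {title}"

names = [
    # Hypothetical entry, invented for illustration only.
    article_filename(date(1973, 1, 23), "Roe v Wade", "Example Headline"),
    # The example given in the post.
    article_filename(date(1965, 6, 8), "Griswold v CT",
                     "Supreme Court Voids Birth Control Ban"),
]

# Sorting the names as plain strings puts them in publication order.
for name in sorted(names):
    print(name)
```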

When I presented my project to the Clio class, Eric (check out his fantastic Clio project Wearing Gay History) suggested that this process could be replicated to study the impact of language among second-wave feminists. This was an excellent suggestion and I am sure I’ll be spending more time with Voyant.

Ultimately, I did not end up with any sort of analysis. After using Voyant I cannot make any assertion about how 20th-century Supreme Court decisions impacted the definition of gender as it pertained to womanhood. I started this project with a rather lofty idea of what I might end up with, and even though my results are descriptive rather than analytical, I learned a great deal about digital humanities projects and I have a newfound respect for Voyant and the various patterns word frequencies can unearth.