Clio 1, Introduction to History and New Media, was taught by Dr. Stephen Robertson in the fall of 2014, and is a required course for history PhD students at Mason. This course provided an introduction to digital history and its various methodologies, including text mining, mapping, public history, and networks. Digitization, databases, crowdsourcing, games as history, copyright, and access were also discussed.
For the final project, I used Voyant to text mine United States Supreme Court cases and corresponding newspaper articles to ascertain public opinion on women’s rights throughout the twentieth century.
Below are the blog posts I wrote for this course. My practicum blog posts, which appear first, detail my initial forays into using and understanding these methodologies, and my readings blog posts are my reflections on each week’s assigned articles, blog posts, etc.
To access my Omeka site, click here.
For this week's practicum on public history we were tasked with creating a collection of objects on Omeka. I had been looking forward to learning about and using Omeka, and luckily for the first-year fellows, our practicum coincided with the first week of our rotation in Public Projects. In the past three days we have gotten a handle on the front end of Omeka by looking at the showcased sites; played in the back end by installing Omeka on our dev sites through the shell rather than FTP; installed themes and more on our dev sites using git commands; and user-tested themes and the poster plugin.
Step One: Gathering Primary Sources
In order to begin the practicum, I had to find primary sources. Since I didn't come into the PhD program with a master's in history, I don't have a lot of data on hand to play with. I had been toying with the idea of women's history as legal history for the final Clio project, in which I either text mine or map Supreme Court cases and newspaper articles. I quickly found the full text of the final decision on Muller v. Oregon from the Legal Information Institute at Cornell University. I edited the text, removing any hyperlinks or additional data, in order to not violate copyright. I then went to Chronicling America and found three newspaper articles, and from the Library of Congress site I found a cartoon. I also went on Flickr Commons and found a photograph of the nine justices sitting on the Court when the Muller decision was made.
Step Two: Uploading Items & Dublin Core
After collecting my primary sources I uploaded them onto Omeka. Omeka uses Dublin Core metadata, and inputting data into the fields was a time-consuming process since I wanted to ensure that I used a controlled vocabulary. In library school I did not ever get to use Dublin Core - this cataloging code is mostly used in archives and museums, and since I didn't specialize in archives I wasn't too familiar with Dublin Core. I found this page to be particularly helpful in figuring out what information to put into which fields (whether or not I did it correctly is yet to be determined). Often, the fields would overlap. For example, I used the same information in the type and identifier fields, and usually the creator and publisher would be the same.
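As a rough sketch of the field-by-field checking described above, one item's metadata can be treated as a mapping from the fifteen Dublin Core elements to values. This is Python for illustration only and is not how Omeka stores anything; the helper names `unknown_elements` and `empty_elements` are hypothetical, and the sample values are placeholders rather than the metadata actually entered on the site.

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

def unknown_elements(record):
    """Return, sorted, any keys in record that are not Dublin Core elements."""
    return sorted(set(record) - DUBLIN_CORE_ELEMENTS)

def empty_elements(record):
    """Return, sorted, the Dublin Core elements left blank or missing in record."""
    return sorted(e for e in DUBLIN_CORE_ELEMENTS if not record.get(e, "").strip())

# Hypothetical record; the values are placeholders for illustration.
item = {
    "title": "Muller v. Oregon",
    "type": "Text",
    "source": "Legal Information Institute",
    "author": "placeholder",  # not a Dublin Core element; "creator" is
}

print(unknown_elements(item))
print(empty_elements(item))
```

A check like this only confirms that the field names are valid and shows which ones are still blank; it cannot say whether the right information went into each field.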
Step Three: Creating a Collection
Once the documents had been put onto the site, I created a collection. For the collection I omitted certain metadata fields. I did not think it appropriate to use creator, publisher, contributor, or relation. Obviously if I use this collection as part of the final project, then I will need to include more information in the description field, as well as an argument. I was able to add all items into the collection in one click.
Step Four: Finishing Touches
I then moved on to editing the site in general. I played around with the themes, and chose seasons because I liked having the primary source at the top of the page with the metadata following. I named the site and filled out the site description field in settings. I had thought this would populate the about tab, but it didn't. After looking around a bit I figured out that I needed to use the simple pages plugin to edit that tab. I removed the exhibit tab at the top of the page and configured the appearance so the featured exhibit box would not appear on the homepage. I edited the wording of the browse collections and browse items tabs.
I was incredibly pleased with Omeka. The interface is easy to use and is aesthetically pleasing. The codex is incredibly detailed, and I found myself referencing it a few times and always finding what I needed. I'm glad I was finally able to use Dublin Core, even though I'm not entirely sure I populated the fields with the correct information. I can certainly understand why so many institutions use Omeka for their online exhibits, and I'm looking forward to using Omeka more in the future.
Step One: Excel
The first thing I did when examining the data for the First Regiment of the Michigan Cavalry was to open Excel and begin putting the data into a spreadsheet. It was difficult for me to look at such a large block of text and attempt to see patterns, and I wanted to be able to examine the data in an organized fashion. I created separate sheets for each year (1864, 1865, and 1866), and within the sheets I had the following categories: action, place, to, begin, and end. This way I could see that the regiment had a campaign from the Rapidan to the James River from the third of May 1864 to the 24th of June in the same year. Since this was a campaign that lasted over a month, I bolded the line in the sheet. All of the unbolded rows underneath were places the regiment visited as a part of the campaign. I also made sure not to add any text to the spreadsheet that wasn't in the original data. I put the information into Excel in the exact way it was written on the National Park Service website. When the regiment was at two places in one day (or perhaps the regiment had been split up?) I put those locations into two separate boxes, such as Furnaces and Broad Rock. I then added columns for latitude and longitude so I could input the information into Google Maps Engine as a CSV.
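The spreadsheet layout described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the column names match the categories listed; the function name `rows_to_csv` and the ISO-style dates are my own choices for the example, and coordinates are deliberately left blank rather than guessed.

```python
import csv
import io

# Column names follow the spreadsheet categories described above,
# plus the two coordinate columns added for the map import.
FIELDS = ["action", "place", "to", "begin", "end", "latitude", "longitude"]

def rows_to_csv(rows):
    """Serialize a list of dicts to CSV text, leaving absent values blank."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Illustrative rows only; the date format is an assumption.
rows = [
    {"action": "Campaign", "place": "Rapidan", "to": "James River",
     "begin": "1864-05-03", "end": "1864-06-24"},
    {"place": "Furnaces"},
    {"place": "Broad Rock"},
]

print(rows_to_csv(rows))
```

Keeping every row to the same set of columns, even when most cells are empty, is what lets a mapping tool read the file as a table.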
Step Two: 1864
I decided to layer the map by year, so I had one layer each for the years 1864, 1865, and 1866. After I put the CSV file for 1864 into Google Maps Engine, I found that there were 24 errors in the file, meaning that 24 of my latitude and longitude fields were empty. Some were obviously going to be empty, such as the campaign from the Rapidan to the James River. Others I had not been able to find by doing a Google search, like Aenon Church and Locke's Ford. I then searched in Google Maps Engine for these places, and found a few, which were then added to my data table (the ones I found in Google Maps are distinguishable by a yellow pin point in the data sheet on the map). I did run into several problems: the pin for the James River is at the source of the river, which I doubt is where the regiment actually was; it is impossible to see the sites that have multiple pins unless you go into the data sheet; and there was a great amount of uncertainty in the entire process. For example, the original data said the regiment was in Loudoun County, but there is no further information to clarify exactly where in Loudoun County they were. I had the same problem with the "Toll Gate near White Post." I couldn't put in the latitude and longitude coordinates for White Post because they weren't actually in White Post, they were only near White Post. And which toll gate were they near? I had thought of drawing a line from the Rapidan to the James River to show the regiment's campaign, but I realized that they didn't follow a straight line from point to point; they fought at several locations in between. I played around with the labels a bit, and originally wanted to keep the place name with each pin, but I thought it looked too cluttered. I kept the pins a red color because I thought they would stand out well against any base map I chose, and 1864 would have more pins than any other year.
I also decided to use the traditional pin rather than any of the variations because I think they're more precise than the others. One of the best things about putting the data into a spreadsheet first was that all the additional information I had, such as the action and beginning and end dates, was included in the text in the pins. So, if you find Kilpatrick's Raid in Richmond, you get the following metadata:
Step Three: 1865 & 1866
I had the same problems with the data for the 1865 spreadsheet as I did for the previous year. There were 10 errors in this file, and so I went back through and searched for the places on Google. I was able to find five locations and add them to the map. I also added a line from Edenburg (spelled Edinburg in Google) to Little Fort Valley to show the regiment's expedition. The 1866 CSV only had two rows of information, and since neither of the locations had latitude and longitude coordinates associated with it, I was unable to import data for that year. I put down a pin for Utah, since they were there for the beginning of the year before being "mustered out." For the 1865 pins I used blue diamonds, and for 1866 I used a green square. I ended up choosing the light landmass base map. The other maps were either too dark (satellite, dark landmass) or showed highways and interstates (light political, simple atlas).
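The import errors described in these steps (rows whose latitude and longitude fields were empty) could be listed before uploading a file. The sketch below is one way to do that in Python; it assumes the CSV has columns named place, latitude, and longitude, the function name is hypothetical, and the coordinate values in the sample are made-up placeholders.

```python
import csv
import io

def rows_missing_coordinates(csv_text):
    """Return the place names of rows whose latitude or longitude is blank."""
    missing = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lat = (row.get("latitude") or "").strip()
        lon = (row.get("longitude") or "").strip()
        if not lat or not lon:
            missing.append(row.get("place", ""))
    return missing

# Hypothetical sample; "Place A" and its coordinates are placeholders.
sample = """place,latitude,longitude
Place A,38.0,-78.0
Aenon Church,,
Locke's Ford,,
"""

print(rows_missing_coordinates(sample))
```

A list like this gives the set of places that still need to be located by hand, which is where most of the uncertainty in the map comes from.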
This practicum was more labor-intensive than the others, mostly because I had to start with raw data that had not already been organized. I had no idea how imprecise mapping can be, and I am wondering how my map will compare to those of my colleagues. I have no idea how many mistakes I might have made inputting the latitude and longitude coordinates, which parts of Loudoun or Fauquier County the regiment was in, where Ground Squirrel Church is, and much, much more. This is in part due to the raw data I began with, which was often fuzzy and not explicit. The map obviously does not convey any context to the viewer, so maps are best used in conjunction with a narrative.
For this week's practicum, I am using Palladio and RAW to show how networks are constructed, what they reveal, and how they can be useful for historians.
I began in Palladio by copying and pasting the battle and unit spreadsheet information into it, and then added in the CSV that contained the battles and coordinates. I first looked at the map view. The map view is interesting since you can see where the battles took place in relation to one another, and you can choose if you want to size the points, which makes the battle sites bigger based on the frequency of their usage in the dataset. The graph view shows the relationships between the units and the battles. If you zoom in closer, you can see that the 44th New York Infantry fought at the Battle of Petersburg and fought at the Battle of Gettysburg, along with the 29th New York Infantry, 1st Michigan Cavalry, and the 136th New York Infantry.
RAW constructs networks slightly differently than Palladio. I began the same way, by copying and pasting the spreadsheet of battle and unit into the interface. I was then asked to choose one chart out of 16 options. Obviously not all charts will work with the data I input. I first tried the Alluvial Diagram.
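The idea behind the graph view, units linked to one another through the battles they share, can be sketched in Python. This is not how Palladio works internally; it is a minimal illustration that assumes the spreadsheet is a list of (unit, battle) rows, and the unit and battle names below are hypothetical stand-ins for the class dataset.

```python
from collections import defaultdict
from itertools import combinations

def shared_battles(pairs):
    """Map each pair of units to the set of battles both appear in."""
    units_by_battle = defaultdict(set)
    for unit, battle in pairs:
        units_by_battle[battle].add(unit)

    shared = defaultdict(set)
    for battle, units in units_by_battle.items():
        # Every two units present at the same battle get a link.
        for a, b in combinations(sorted(units), 2):
            shared[(a, b)].add(battle)
    return dict(shared)

# Hypothetical unit/battle rows standing in for the class spreadsheet.
pairs = [
    ("Unit A", "Battle X"),
    ("Unit A", "Battle Y"),
    ("Unit B", "Battle Y"),
    ("Unit C", "Battle Y"),
]

for (a, b), battles in sorted(shared_battles(pairs).items()):
    print(a, "-", b, ":", sorted(battles))
```

A battle with only one unit in the dataset produces no links, which is why some nodes in a network view sit off on their own.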
Circle Packing was another network that worked with the dataset, but it did not present the information in a fashion that I found aesthetically pleasing, nor was it any more or less helpful than the Alluvial Diagram. The Circular Dendrogram presented the information in a way that was easy to look at and can be assessed quickly. The information is not as spread out as it is in the Alluvial Diagram, which I thought made it much easier to read.
I attempted to use the Cluster Dendrogram, but it presented the information in such a way as to render it useless for this particular dataset. Clustered Force Layout was another diagram that was not helpful. Convex Hull, Delaunay Triangulation, Hexagonal Binning, and Parallel Coordinates all required only numbers or dates, so I could not use those. Reingold-Tilford Tree presented the dataset in a way similar to Alluvial Diagram, but it was more organized and easier to read quickly.
My attempts with Scatter Plot, Small Multiples, Streamgraph, and Voronoi Tessellation were unsuccessful, but I did have fun playing with Treemap. I was able to get two different results by manipulating the hierarchy and color fields. The first embedded network is with unit in the hierarchy field and battle in the color field, and the second is the reverse.
After installing Gephi about five times, I was not ever successful in getting it to work. While it would install correctly, it would not let me get to the data laboratory.
How are text mining and networks similar or different? With text mining, you are looking at word frequency. This can be used to identify which sources mention the word as well as how often it is used within a corpus. With networks you can examine the relationship between sources. When using Palladio and RAW, I was able to determine the relationship between units and battles, when they overlapped, and which unit fought the most battles. Both of these analytic methods would be considered investigative tools, not communicative tools, since neither of these analytic methods provides any context. Each tool is unique: while we can text mine the content we cannot text mine the networks, and while we can network the relationships we cannot network the content. They can be used in conjunction with one another since they are complementary.
Networks and text mining have taught me how useful digital tools can be in interpreting historical data. They can reveal trends and patterns that otherwise would be buried within the information and they examine the data in ways that humans would never be able to. It is astonishing how we can use technology to further scholarship. I am learning to look at how we can use these tools to analyze sources differently and more efficiently, and how they can be used to further the historical narrative.
For this week's practicum, we are using various ngram viewers and comparing them, and then using Voyant. For the first half of this exercise I chose keywords that are of interest to me in my research. I started out the practicum hoping to get a better grasp on what text mining is and how it works, since the topic seemed abstract to me even after the readings and class discussion.
The Google ngram viewer scans the corpus collected in the Google Books project to display a graph showing the usage of a particular word or words. I first ran the following set of words through the viewer: domestic_NOUN, maid, housemaid, maidservant. Neither domestic nor maidservant had been used frequently, and housemaid saw a small percentage of usage from 1860 to the 1940s. Maid had the highest frequency of matches within Google Books, and usage peaked in 1900 and has slowly been declining since. I also did a search using maid and housewife. Maid was much more common, and the use of housewife increased a small amount in the 1920s until its peak in the 1980s, and then declined. Google then displayed links in chronological order, such as 1800-1843, 1843-1900, and so on, that, when followed, brought me to a list of books within the Google Books project that had my word or words in them. This is not usually helpful for me as a 20th century historian since copyright restrictions do not allow the user to read the entire book online. In addition to not being able to follow through on finding the actual words within the books, there is not enough transparency. How was this ngram viewer put together? What algorithm(s) are used? While the interface is clean and straightforward, it does not allow for user interaction.
Bookworm's Chronicling America ngram viewer scans the newspapers included within the Chronicling America database. I ran the words housewife, domestic, and maid in the viewer. The results I received were different from Google's. Domestic had a far greater number of hits, but I had to remind myself that in the Google ngram viewer I was able to state that I only wanted the noun form of domestic. Housewife was hardly used, and the frequency of maid was only a bit higher than housewife. One of the greatest advantages of using Bookworm is that it will link you back to the original paper the word was found in since there are no copyright restrictions. Bookworm's ngram viewer is more user friendly and the interface is much more interactive than Google's. You can change the publishing time to be either year, month, day, day and year, month and year, or week and year. The user can also determine the quantity based on the percentage of words, percentage of texts, word count, or text count. The user can decide whether they want the search to be case sensitive or insensitive.
I also used Bookworm's Congress.gov ngram viewer to search the percentage of words in Democrat-sponsored and Republican-sponsored bills about abortion. The results showed that Republicans sponsored more bills with the word abortion in them. But I cannot say that Republicans have sponsored more bills about abortion, since the visual analysis shows the percentage of words in the bills themselves. Thus this is not a straightforward analysis. I'm sure that other scholars who have worked with quantitative methodologies previously will not make this mistake. It is important to make sure to explicitly state what the graph is actually showing, versus what your initial reaction might be. The graph will link the user to a site that gives the full text of the bill, as well as pertinent information about the sponsors, the bill's progress or lack thereof through committees, etc. This viewer would be particularly beneficial to those studying legal or political history.
The New York Times Chronicle is an ngram viewer that analyzes every issue of the newspaper. I attempted to search housewi* to see the results for both housewife and housewives in one graph, but it didn't work. I searched for both housewife and housewives separately. The only great disparity between the results for both is in the early to late 1940s, when there is a much greater use of housewives than housewife. This viewer links back to the articles in which the searched word was found, but this can get complicated. If neither you nor your institution has a New York Times subscription, then there are certain articles that you will not be able to read. The Chronicle has a somewhat friendly user interface. It's much better than Google but not as interactive as Bookworm. You can choose whether you want your results to be displayed based on the percentage of total articles or the number of articles.
I received different results in each ngram viewer because each of the viewers scans a different corpus of information. Google Books will have the word housewife appear in various books over time but those results may or may not correlate with what has been in the Chronicling America papers or the New York Times. Also Google allows users to specify the part of speech of the word, which was incredibly helpful. In general I found these viewers to be fun and neat to play with, and they did help me get a better feel for what text mining is. I'm still not entirely sure to what extent text mining will be useful to me in my research. Also, these viewers should be used as tools, not as methodologies, and I will need to be careful should I choose to use them in my research. They can be utilized in conjunction with scholarly research but should not be the sole method for proving one's argument. The analysis and interpretation of the graphs can differ, and it is important to ensure that, as a scholar and researcher, you are reading the graph the way it was meant to be read. I also think that all of the ngram viewers should have the capability of search truncation. There is not enough transparency with these viewers, either. The problems I had with Google were never fully resolved with the exception of interface interactivity.
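Since none of the viewers explains how its graphs are computed, here is a rough Python sketch of the general idea: count how often a term appears each year and divide by the total number of words for that year. It also shows what search truncation (housewi*) could look like. This is an assumption about how such a tool might work, not a description of any of the viewers above; the function name and the two-document corpus are made up for illustration.

```python
import re
from collections import defaultdict

def yearly_share(documents, pattern):
    """Return {year: matches / total words} for a term, allowing a trailing *."""
    # Turn a trailing * into "any further letters"; escape everything else.
    stem = re.escape(pattern.rstrip("*"))
    suffix = r"[\w']*" if pattern.endswith("*") else ""
    term = re.compile(stem + suffix, re.IGNORECASE)

    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, text in documents:
        words = re.findall(r"[A-Za-z']+", text)
        totals[year] += len(words)
        hits[year] += sum(1 for w in words if term.fullmatch(w))
    return {year: hits[year] / totals[year] for year in totals if totals[year]}

# Made-up two-document corpus, purely for illustration.
documents = [
    (1940, "The housewife and the maid"),
    (1945, "Housewives wrote to the paper"),
]

print(yearly_share(documents, "housewi*"))
```

Because the result is a share of each year's words, the same term can look very different from one corpus to the next, which is consistent with the differing results described above.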
I ran the magazine version and novel of The Picture of Dorian Gray through Voyant, which is a web-based tool that analyzes digital texts. Voyant is much more than an ngram viewer and is by far a better tool because there is more information readily available for the user, and the interface is customizable. The summary details the number of words in the documents; vocabulary density; most frequent words in the corpus; words that had notable peaks in their frequency; and distinctive words. There is also a list of words in the entire corpus, and the user can decide whether or not he/she wants to include stop words. Within this window the trend is shown by a small icon. For The Picture of Dorian Gray the five most frequently used words, in descending order, are: Dorian, said, lord, life, and Henry. Voyant can also trace the word trends. There are two other categories: keywords in context and words in documents. The latter displays even more information, showing the raw count and the relative count, and the trend graph shows the values of the mean relative counts across the corpus. I really enjoyed playing around in Voyant, but I do have one complaint. While I appreciate the interface being so interactive, it seems a bit dated and very crowded. The experience would be greatly enhanced if the user could choose which of the many boxes they want to see on the screen at one time. I know the boxes can be minimized, but that doesn't solve the problem. For example, I can minimize the corpus reader box, but that leaves a gaping hole in the middle of the site where other graphs or information could go should I want them there. Ultimately, I think Voyant will be incredibly useful to come back to when I'm conducting my own research, especially since I have control over the corpus. I do have a much better sense of what text mining is and how it works now that I've played around with the ngram viewers and Voyant.
I am certain I'll use Voyant in the future but am still undecided about the ngram viewers.
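The kind of word list Voyant produces, with stop words optionally removed and both a raw and a relative count, can be approximated in a few lines of Python. This is a minimal sketch, not Voyant's method: the stop-word list is a tiny illustrative one, the relative count here is simply the raw count divided by the number of words kept (which may not match Voyant's definition), and the sample sentence is made up rather than quoted from the novel.

```python
import re
from collections import Counter

# A tiny stop-word list for illustration; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "he", "was"}

def word_frequencies(text, remove_stop_words=True):
    """Return (word, raw count, relative count) tuples, most frequent first."""
    words = re.findall(r"[a-z']+", text.lower())
    if remove_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    total = len(words)
    counts = Counter(words)
    return [(w, n, n / total) for w, n in counts.most_common()]

# Made-up sentence, not a quotation from the novel.
sample = "Dorian looked at the picture and the picture looked at Dorian"

for word, raw, relative in word_frequencies(sample)[:3]:
    print(word, raw, round(relative, 3))
```

Running it with `remove_stop_words=False` shows how quickly words like "the" crowd the top of the list, which is why the stop-word toggle matters.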
I did not have much luck in finding use of databases in the most recent articles in The Journal of Southern History. While searching through footnotes, the most highly utilized sources were books, followed by scholarly articles. I am interested to know if the authors found those scholarly articles via databases, because if they did, then the majority certainly did not cite them as such. The following are examples of where I found authors citing electronic databases or sources within their footnotes.
In one article titled "West Virginia Mountaineers and Kentucky Frontiersmen: Race, Manliness, and the Rhetoric of Liberalism in the Early 1960s," found in the August 2014 issue of the JSH, the author used material from the JFK Library Digital Collection, and provided links to the pertinent information in multiple footnotes. In another two footnotes there were references to Gallup polls that had been taken in the 1960s with links to the website where they could be found.
In the May 2014 issue of the Journal, the author of "Centers of Creation: John Perkins Barratt's Biogeographical Theory of Racial Origins" provided a doi to a journal article.
In "The Electric Home and Farm Authority, 'Model T Appliances,' and the Modernization of the Home Kitchen in the South," the author used information from the US Census Bureau and provided links to the site. Additionally, she also linked back to The American Presidency Project. This article can be found in the February 2014 issue of JSH.
In the February 2012 issue of the Journal, the author of "The Legacy of Indian Removal" used the HeritageQuest Online subscription database to find census information. A link to an article found in the online database for the Neshoba Democrat was found in the footnotes, as well as multiple links to the Samuel Proctor Oral History Program.
The following is the best example of citing databases and online sources that I found when going through the Journals. The author of "Border Men: Truman, Eisenhower, Johnson, and Civil Rights," found in the February 2014 edition, is very transparent in citing his sources and is not hesitant to admit when he utilized databases or other online resources. He used the Papers of Dwight David Eisenhower, accessed through the subscription-based electronic edition by Johns Hopkins University Press. The online resource The American Presidency Project was cited in the footnotes, with a link to the site. The author used information from speeches given by Truman and Johnson, which were found online at the Presidential Speech Archive. In addition, he cited the web version of a Washington Post news article, and utilized the Eisenhower online archives. One of Johnson's speeches was posted by the LBJ Library on YouTube, and the author provided a link to the video in his footnotes.
This is only a sampling of the information (or lack thereof) that I found within the last few years in The Journal of Southern History. I was incredibly surprised that there were not more databases cited within the articles, and it makes me question the transparency and legitimacy of these authors' research. If a scholar uses an electronic version of an article, then he/she needs to provide the database name at the very least within their bibliography or notes. Scholars who use digital sources of any kind need to engage in a conversation to ensure that they adhere to proper standards of citation.
Google OCR – page three of Pinkerton files
At first I left the image horizontal and attempted to run it through Google OCR. It did not recognize any of the characters and thus I didn’t have any text to examine. I manipulated the image so it was vertical, and Google OCR did recognize the characters and words, but with a high degree of inaccuracy. Out of 23 lines of OCRed text, there was only one line of text that correctly displayed the words shown in the image. The OCR completely cut off the last few lines of text as well as a few others throughout the document, so the text was incomplete. I then cropped the image and ran it through again, and this was a bit more successful in that it recognized a few more words than the un-cropped version had. The software was able to OCR all of the lines of text this time, but the OCR was still incredibly poor. I then tried to make the contrast darker, which did help the software to recognize a few more words, but it was not vastly improved. April and I compared our results and we had similar problems, but April had less success with the cropped version of the image than I did. I was shocked that the OCR did not read our files in the exact same way. We both had poor success with Google OCR, but I would've assumed our results would have had more similarities in the OCRed text. In general the software did not recognize the Qs and As in the document, and simply skipped many words. The OCR was so inaccurate because the original image was not of high quality, the size of the text was very small, the typewritten words make it difficult to distinguish certain characters and many of them resembled each other, and there was bleed-through from the other side of the page.
Chronicling America – Daily Capital Journal, September 15, 1914, Image 1
The first thing that I noticed with the OCRed text is that it had a hard time recognizing the words that are close to the fold seen on the left side of the page. This made the first few words of each line in the first column unrecognizable to the OCR software. Whoever scanned the microfilm of the image did not ensure that the page was completely flat. I was disconcerted by how inaccurate the text was. The software didn't even read the masthead of the paper. I noticed that the OCR had a difficult time reading subheadings. "Say Austrians Must Quit" was OCRed as "i 8ajr Austrian Mim Quit." The software was also unable to read the bottom portion of the page. Most of the "Uncle Sam Protests" column came out as gibberish, and then several of the last lines were not even a part of the OCRed text.
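To put a rough number on how far an OCRed line is from the original, the two strings can be compared character by character. The Python sketch below uses the standard library's `difflib.SequenceMatcher`, whose ratio is only a general similarity score and not a formal OCR accuracy measure; the function name `similarity` is my own, and the two strings are the subheading and its OCRed version quoted above.

```python
from difflib import SequenceMatcher

def similarity(expected, ocr_output):
    """Return a 0-to-1 similarity ratio between two strings, ignoring case."""
    return SequenceMatcher(None, expected.lower(), ocr_output.lower()).ratio()

# The subheading and its OCRed version, as quoted above.
expected = "Say Austrians Must Quit"
ocr_output = "i 8ajr Austrian Mim Quit."

print(round(similarity(expected, ocr_output), 2))
```

A score of 1.0 would mean the OCRed text matched exactly; anything lower gives a way to compare pages against each other, provided someone has typed out the correct text first.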
Chronicling America - The Hawaiian Gazette, September 18, 1914, Image 1
I chose this particular paper because the image is so dark, and I wanted to see how that would affect the OCR. I was surprised to find that it did not seem to impact the accuracy of the OCR, which was actually better in the darker places of the paper. For example, in the far right column, under the subheading "Blunder Not Known Till Damage is Done," the text is almost perfectly accurate. There were a few folds in the image, but they didn't affect the OCR as badly as I had expected. The OCR was able to read the bottom portions of the paper, unlike that of the first image I studied, and picked up the "Teuton Cruiser Sinks British Merchantmen" very well. Once again the OCR did not read the masthead.
Chronicling America - The Day Book, September 18, 1914, Image 1
It was not surprising to find that this image produced almost completely accurate OCR text. The only issue the software had was with the image underneath the title of the paper. The near perfect OCR can be attributed to a few factors: the easy readability of the font, the lack of images, the lack of columns, the greyscale, and the lack of folds in the paper. Chronicling America and Google OCR seemed to have the same sorts of problems, but I found the latter to be much worse than the former, simply because the quality of the original Pinkerton file was so poor. While the Chronicling America papers did not produce high quality OCRed text by any stretch, it was still better than Google OCR because the microfilm produced much more readable and searchable text.
Digital resource in my field - The Abbeville Press and Banner, February 8, 1918, Image 6
The first problem I had with this task was finding a source that allowed users to see the text output. I checked sources in the African American Newspapers (1827-1998) database, the ProQuest Historical Newspapers database, the electronic holdings at the Library of Virginia, the Virginia Historical Society, and elsewhere. I attempted to run a few newspaper sources I had through Google OCR and Google did not want to read the characters or words, so that didn't work. After doing a search on Chronicling America I was able to find a newspaper article that will be highly useful for my research further down the road. The text did not OCR as well as I had hoped. The image is very clear and easy to read, but nevertheless the text is not accurate. There are symbols, such as asterisks, placed in the middle of words, and the software did not know what to do with the empty spaces after one paragraph ends and another begins.
After doing this exercise, it is very clear to me how inaccurate OCR can be. This can be extremely detrimental to historians, because if they do keyword searches in databases and the text is OCRed incorrectly, then they might miss out on an important source. Despite the appalling problems I had with OCR technology, I still think the positive impacts it has on the research process outweigh the negatives.
Our practicum for this week is to assess the digital history in our field of study. To begin with, I typed "20th century Southern women" into Google. I received many results, including faculty pages at universities, books for sale on Amazon, and essays, both scholarly and non-scholarly in nature. I found a section of The Gilder Lehrman Institute of American History devoted to women's history which provided essays, primary sources, multimedia, and teaching resources. There is a page of the History Channel devoted to the 19th Amendment which included some videos as well as a brief history of the women's suffrage movement. There is a pathfinder for women's history available from the National Archives website, which is fairly detailed and supplies the bibliographic information for many archival and print-based resources. However, none of these examples are really what I had in mind: the History Channel site is not scholarly, nor does it provide much useful information; The Gilder Lehrman Institute site, while providing primary sources and some multimedia, is geared towards school teachers; and the National Archives pathfinder simply points one in the direction of print-based sources.
There were two sites I came across that were more along the lines of scholarly digital history: Oral Histories of the American South and Southern Women Trailblazers. Both are products of the University of North Carolina at Chapel Hill, so it can be assumed that they are authoritative and accurate. Oral Histories of the American South lists 148 interviews with Southern women, which cover such themes as politics, economics, labor, religion, race relations, and more. Each interview includes helpful metadata, transcripts, and the option to download the interview onto a computer. This site will be helpful to me in my research. Southern Women Trailblazers includes analyses of suffrage, education, the workforce, the Civil Rights Movement, and the results of women's work to achieve equality. There are also images and audio excerpts, but the excerpts are taken from the Oral Histories of the American South website. Southern Women Trailblazers would be a good resource for lower-level undergraduate classes, but I doubt I'll be returning to it when I conduct research.
I had found the prior sources by googling "20th century Southern women." I then tried manipulating keywords a bit, and input "20th century" "South*" "women." The first two resources listed are products of that search. I found a blog, run by the Digital Collections Department of the University of South Carolina Libraries, for the South Carolina Digital Newspaper Program. One of their posts, "From Socialization to Social Change," details women's clubs in South Carolina. While primarily discussing the 1800s, there is a bit of information about the early 20th century. There is another site, Digital History, that is supported by the College of Education at the University of Houston. It serves as a teaching resource for school teachers and provides many primary sources, multimedia, interviews, and background information. The National Women's History Museum provides a plethora of online exhibits, on topics such as women in sports, women in the Progressive Era, and pioneer female state legislators. I found this resource by simply going to the Museum's homepage. It did not appear as a result of either of my Google searches.
While searching the web for digital history relating to 20th century Southern women, most of the sites I found were affiliated with an institution, museum, archive, or university. I was unable to find any that were products of organizations, like local history clubs, or any that were crowdsourced. A simple Google search did not yield the results I had been expecting: amateur historians writing blog posts or local history group websites with digitized records. Clearly more scholarly digital history sites need to be established to further the study of 20th century Southern women. There is a dearth of online sources for scholars in this field of study, and this is a need that should be addressed.
The day is finally here. I have held out for as long as I could, but knew that I would have to give in eventually. Not only have I created my own blog but I also have a Twitter account. Previously I had a Facebook page and tried my hand at starting a blog and a tumblr, both of which failed when I realized how boring my life is. I deleted my Facebook when it was revealed that Facebook was not being careful with their users' data and personal information. So how did I end up with my own domain name and a Twitter?
It is vital for anyone in the field of digital history, be it a new grad student like myself or an established historian, to have an active online presence. It is how digital historians communicate with each other and stay up-to-date with the goings-on in the field. The readings for this week strongly emphasized how important it is to have a place for yourself on the web. In class, I brought up the point that none of the readings addressed privacy concerns and how the various sites collect user information. Dr. Robertson responded that there is a difference between visibility and privacy. For me, the important distinction will be how to be visible without being too personal. I strongly value my privacy and don't want everyone with an internet connection to know all of the details about my personal life. But I can - and must - divulge information about my professional life.
When I Googled "Alyssa Toby Fahringer" prior to starting this blog, the only correct hit I could find was a link to the Fall 2013 Southern Association for Women Historians' newsletter, in which I am named as a new member. After searching the first five pages of links and the image results, I could not find other accurate information about myself. Now, of course, when I Google my name there are links to this blog and my about page. When I Googled "Alyssa Fahringer" I found links to my LinkedIn profile and my small section of the American Library Association's Connect site. Further down on the first page of results, there is a link to a LibGuide I made in library school. Apparently there is an Alyssa Fahringer living in Pennsylvania who is an accomplished athlete. There were links to her information in the White Pages and her profile on Shippensburg University Athletics. Within the first five pages, the following is the only correct information I found about myself:
- Graduate in the University of Pittsburgh's iSchool August 2013 recognition ceremony
- My assignment to a group project for my Terrorism class at VCU
- A brief blurb in The Rotunda, Longwood University's newspaper
- On the roster for the Model United Nations club at VCU
- On the Honor Roll in high school
- Results of a 10K I ran almost 10 years ago
Luckily I was not in any of the image results. A week or so ago, I would have been proud of my relative lack of presence on the internet, but now I see how important it is for me to have professional visibility online.
Apart from Google search results, I have a LinkedIn profile that badly needs updating. I also edited it so I am now listed as Alyssa Toby Fahringer rather than Alyssa Fahringer. I had not heard of Academia.edu prior to doing the readings for this week, so I will have to create a profile on their site. Coincidentally, one of the digital historians I follow on Twitter tweeted this morning about creating her Academia profile.
And speaking of Twitter, I found several institutions, publications, professional organizations, and historians to follow. While I do not think it necessary to track all movements one might make during the day (I do not care when you are brushing your teeth or grocery shopping), I think it will be a useful medium to keep me informed. For all of my griping, I have a feeling I will like Twitter more than I will admit.
So this is my attempt to start having a professional online presence. Welcome to my little corner of the internet.
Our final week of readings focused on teaching digital methods to students of all ages. Arguably, two of this week's readings could have been assigned in previous weeks. The article on video games could be used in the week on video games and history, and Dan Cohen's piece on the structure of digital history education at Mason could be one of the pieces read in week one. Honestly I would have preferred to read the latter piece at the beginning of the course, since, as a historian, I like to understand the history behind things and the reasons for studying a certain topic.
The most entertaining article this week was Mills Kelly's chapter "Making DIY History?" While reading the article I couldn't help but wish that the course was still offered at Mason, although I understand that offering that class semester after semester would ultimately be the equivalent of flogging a dead horse. The students learned valuable lessons about the methodology behind digital historical production, including how to create academically based sources on YouTube and blog platforms. The grey area of ethics also interests me. To what extent was this endeavor ethical? Is the creation of fictional historical narratives produced with the intent of comprehending the process of utilizing digital methods in history unethical? While I want to say that it isn't unethical, I also can't fully say that it is ethical, and herein lies the conundrum.
Wikipedia should be embraced as a resource by K-12 teachers, especially since Wikipedia is now as accurate as authoritative encyclopedias, such as Encarta. Rather than telling their students that Wikipedia should never be used, teachers should explain to their students the faults of Wikipedia and mention crowdsourcing in as elementary a fashion as possible. Students should be taught how to differentiate between what makes a source reliable and what doesn't, rather than simply knowing that certain sources are not "accredited," as one boy Boyd interviewed said. While it can be used in classrooms as a resource, I am not yet ready to assert that Wikipedia should be used as a source for research papers and projects.
The article on using Omeka in the classroom made me think about what other technology CHNM created that can be used in the same setting. PressForward immediately popped into my mind, and I think it could be used in a Clio 1 class or an undergraduate class on introduction to digital methods. In Clio 1, or in the undergrad class if it utilizes blogging, PressForward could be used to curate, collect, and preserve the best blog posts each week into a larger course blog. Students would have to read each other's blogs each week, perhaps comment on one or two, and then go through and nominate a few that they think best analyze or critique the topic of discussion. At the end of the semester the course blog would be a compilation of the most informed posts from the students. During the process, students would learn about digital scholarship, including blogging as a form of academic communication, and gain a grasp of how the plugin can be used in a larger context.
In library school one of my two favorite classes was Legal Issues in Information Handling: Copyright and Fair Use in the Digital Age, taught by the excellent Dr. Kip Currier. We explored all manner of intellectual property, including patents, trademarks, trade secrets, and copyright. We read Kembrew McLeod’s Freedom of Expression, James Boyle’s The Public Domain, William Patry’s Moral Panics and the Copyright Wars, among others. I found my opinions aligned most closely with those espoused by McLeod in Freedom of Expression.
I believe that current American copyright laws are far from what the Founding Fathers had in mind when writing the Constitution. The chapter in Cohen and Rosenzweig’s Digital History shows how far copyright has developed since the 18th century. This development has not been beneficial for creators and scholars, but has generated significant revenue for commercial enterprises. The Digital Millennium Copyright Act and the Sonny Bono Copyright Term Extension Act reveal how extensive copyright law has become. With the DMCA, corporations are now able to control how we use a particular product even after it’s purchased. This level of control over consumer use of DVDs, ebooks, etc. is unacceptable. As Cohen and Rosenzweig note, copyright law is meant to create a balance between copyright holders and users, but the current laws unfairly tip the scale in favor of rights holders, especially corporations. Intellectual property is meant to provide incentives to create original work, encourage competition, and encourage public access. The laws in place do not fulfill any of these policies.
In order to remedy these problems, I propose that copyright laws should be abolished. Rather, we should fully embrace open access policies and Creative Commons licenses. Especially in the world of digital humanities, everything should be available free of charge in order to properly foster public access and the democratization of knowledge. Those who want to profit from their work (and who can blame them?) should implement a system like that of Gregg Gillis, also known as Girl Talk. Girl Talk is a musician who creates mashups of popular songs without the consent of any copyright holder. Luckily for his listeners, he has not had to deal with any lawsuits. All of his albums can be downloaded for free from his website with a “pay as you can” system. I realize this is a hopelessly idealistic model full of flaws, but perhaps it can serve as a jumping off point for a conversation on intellectual property law.
I believe that PressForward is the future of digital publishing. Such publications as Digital Humanities Now and Journal of Digital Humanities are both available online at no charge. JDH is a well-respected journal among digital humanists, promotes open access, and disseminates scholarly work to a large audience. PressForward, like all of CHNM’s software, can be freely downloaded from the web. Other academic journals should utilize PressForward and make their content freely available online.
I view the AHA statement on embargoing dissertations as a reactionary response to the threat that open access dissertations pose. One issue is that scholarly publishing houses are concerned that dissertations published online prior to being published in monograph form won't make as much money. Once again, everything comes down to money. The threat of not making as much, or enough, money, is veiled behind other issues, like the ideas of recent PhD graduates being "stolen," running the risk of not getting tenure if PhDs do not have a scholarly monograph published, and the AHA's strict adherence to maintaining a book-based discipline. Rather than money, shouldn't the issue be the dissemination of ideas? As Trevor Owens writes in his blog, the AHA should be more concerned with that rather than the medium through which those ideas are disseminated, whether it be online or in a monograph. Another topic that should be discussed is tenure. Committees need to recognize that scholarship does not have to reside solely in monographs.
The Google Books case was still undecided when I was in library school and my peers, my professors, and I were eagerly and anxiously awaiting the decision. I was immensely pleased with Judge Chin's ruling that Google Books constitutes fair use. While many found his interpretation of fair use to be incredibly broad, the decision was a win for those concerned with access. Judge Chin notes the many benefits of Google Books in his decision: it's a valuable finding tool for researchers and librarians, promotes text mining, increases access to books, gives books new life in a digital form, and benefits authors and publishers. Any tool that increases access to books and promotes reading is inherently good.
Copyright and intellectual property laws are meant to encourage creativity and the accessibility of information, but in reality they are hindering both. The Copyright, Permissions and Fair Use Among Visual Artists and the Academic and Visual Art Communities report explicitly states that one third of visual artists have abandoned their work due to copyright concerns. This is not what the original copyright laws were designed to do. Copyright holders are frightened that their labors won't result in monetary profit, or, as in the case of the visual artists, that they will unknowingly break a provision of copyright law. The creation and promulgation of ideas should not be about financial reward, but rather should promote learning and scholarship. The current copyright system is broken, and steps need to be taken immediately to fix it. Scholars and non-scholars alike should embrace open access policies and Creative Commons licenses.
Digital scholarship is defined in the readings as scholarship that is created using digital tools and is presented in a digital format. In order to be effective, digital scholarship needs to be interactive; use several methods of communication, such as videos and still images, that are integrated with the text; and provide access to primary sources either directly from the site or through an externally accessed repository. Digital scholarship will continue to have the same purpose as traditional academic writing: providing a compelling narrative and argument backed up with primary source evidence.
With digital scholarship, Tim Hitchcock asserts that students, scholars, and professors are moving away from the book and towards online texts. The benefits of moving away from analog sources like books and towards digital sources are many. On the most basic level, online sources are much quicker to use and can provide the reader with a hyperlink to primary sources or other articles on the same topic. Access and audience are other key benefits of moving away from the book. With some sources available online for free, knowledge is easily disseminated and can engage a wide variety of diverse audiences, many of which were unreachable prior to the digital revolution. Digital scholarship is also cost-effective, and can aid in creating an online presence for universities and institutions. Academic journals and presses can transport, and some already have transported, the traditional methods of reviewing, editing, and publishing to the digital realm. While some might find this to be a negative aspect of digital scholarship, I think it's important to retain these traditional methods in order to ensure the accuracy, reliability, and authoritativeness of academic work.
Blogging is one area of digital scholarship in which editorial supervision is not necessary. Blogs are by nature informal, and several authors in this week's readings have stressed the importance of presenting ideas and research in unregulated spaces such as blogs. They can be used to interact with various audiences, including academics and others who are simply interested in the topic at hand. They can also serve as writing groups, where authors can share their work and receive feedback. The blogosphere is one area where people are encouraged to write about their research in a less constrained way. As Melissa Terras demonstrated in her article, blogs and Twitter are platforms on which scholars can reach wider audiences and promote their scholarship.
There are some negative aspects of digital scholarship. While I think the informality of blogging is valuable, others find the lack of editorial process and traditional peer review problematic. There is also the issue of permanence, which applies to digital scholarship as a whole. With technology in a constant state of change, will digital scholarship be accessible and usable ten years from now? How will we address this problem in order to preserve our born digital work?
In order for digital scholarship to be effective, there are several issues that must first be addressed. Ed Ayers claims that digital scholarship needs a greater focus and purpose, and more of a sense of a collective identity. In addition, digital scholarship should be seen as a movement across all disciplines of higher education. There are not enough scholars willing to use digital scholarship. People are still too hesitant to trust digital scholarship as a true and serious form of academic history.
As a librarian, I must strongly object to Tim Hitchcock's statement that the book is dead. I worked as a public librarian, and can easily refute Hitchcock's claim that the book is dead, at least for the wider public. As a student, I still largely rely on print books and print-based sources for conducting research. I am curious as to what my other peers in Clio think about Hitchcock's radical assertion. Do they primarily use digital sources now? While I understand there is a growing trend in the academic community towards the digital, is the book really dead?
I am not an avid user of games, and the last time I remember playing them was in elementary school, when I used Oregon Trail and Number Munchers. I had never considered games to be a form of serious history prior to doing this week's readings, and learned quite a lot about how they can be used to engage people in the discipline.
In order to be effective, games need to be an immersive, authentic experience. They have to be visually interesting to engage the user, and games should provide users with a feeling that they are actually taking part in history. There should also be interactions with various characters within the game in order to provide a sense of exploration. Games should be open-ended, allowing users to choose among a variety of options without having a specified outcome. This was one of the failures of the Lost Museum. The creators made a limited set of choices available to the users, which did not allow them to make any historical inferences or to come to their own conclusions. The Lost Museum allowed the choices of the designers to have more weight than those of the users, which is not how serious historical games should operate. If users have more control than the creators, then they are able to interpret the primary sources and reach their own conclusions, which is the point of historical inquiry.
It is also important that not all historical games are created to interpret political or military history. Social and cultural history are equally important and can benefit from using games to engage students of history. The game Pox and the City is one example of this. Dealing with medical history, the creators also wanted to show social practices and customs and the differences between social classes. They were able to illustrate those differences through the clothes, furnishings of the rooms, foods offered on the dinner menus, other guests, and the topics of conversation in households of various social standing. Not only is this an ingenious way to highlight the differences between classes, but it also takes an incredible amount of time and energy to create these meticulous yet crucial details.
Games can be and are used as a pedagogical tool. Undergraduates or high schoolers can use games to learn how to do research, and graduate students and researchers can use games to help them determine what to look for when going to the archives. At the most basic level, historical games require the users to read and interpret primary sources, which is a useful and necessary skill for any student regardless of discipline.
I am too much of a traditional academic to be fully persuaded by Trevor Owens's article "Games as Historical Scholarship." His points are valid, and indeed it does seem like games could potentially be a sound form of historical scholarship. I think that games could be used as a tool to supplement a monograph or journal article rather than as a form of scholarship in and of themselves. If historians are having a problem embracing the tools and topics we've covered already in this class, such as text mining, networking, and mapping, then I think that they would be even more hesitant to embrace games. Despite this week's readings, I have a feeling that academia would not be fully supportive of using games as another form of serious scholarship.
These are the questions Ron and I came up with on the topic of crowdsourcing:
1. What is crowdsourcing, and how do you, as a historian, feel about crowd-sourced history?
2. In Leslie Madsen-Brooks’ blog, she talks about Consensus vs. Expertise in Wikipedia creating a “collision of cultures.” What does she mean? Do you think Wikipedia’s editorial policies are too democratic? Why, in the author’s view, are professional historians sometimes reluctant to contribute? What happened to Timothy Messer-Kruse and the Haymarket Trial?
3. Do you agree with Dan Cohen’s quote in the Rosenzweig article in which he states: “[sites like Wikipedia] that are free to use in any way, even if they are imperfect, are more valuable than those that are gated or use-restricted, even if those resources are qualitatively better”? What does this mean for the future of serious history on the web?
4. How does audience play into the creation, collection of materials, editing process and overall usage of sites like Wikipedia, Ancestry.com, the 911 Digital Archive, and the Hurricane Digital Memory Bank? [Ancestry.com is not a crowdsourced site.]
5. What are the negative implications of crowdsourced history? What are some ways that the process of crowdsourcing can be improved?
6. Are there certain times or projects when crowdsourcing is appropriate or inappropriate? Why?
I came to class last night expecting crowdsourcing to be a topic of hot debate, and I turned out to be right. I know that crowdsourcing is something most academics and especially historians have very strong feelings about, and I was pleased to hear a broad range of opinions from my colleagues last night. Between my co-leader, Ron, and myself, we only asked three of our six questions, although Jordan did try to bring the conversation back on track when he referenced one of our other questions. I wouldn't attempt to claim that it was the questions I posed that engaged the class; rather, it was the topic that was engaging. I think that crowdsourcing is one of those topics where we could have simply opened up the floor for discussion without any questions and someone would've jump-started the conversation for the night. Ron's question regarding the consensus of expertise was incredibly insightful and brought up a valid point that got to the root of crowdsourcing and academics' problems with it.
The lively discussion focused on many different aspects of crowdsourcing. We began by defining crowdsourcing, then considered how crowdsourcing is not necessarily 'crowd'sourcing since the projects that utilize it attract a small cohort of volunteers, and when and in what projects or situations crowdsourcing can be helpful. Marion brought up the topic of gender inequality, which is something I found particularly troubling. I would have liked to have further discussion of how we can attempt to diversify Wikipedia and make it more attractive to potential female members, but the conversation quickly shifted to the broader implications of Wikipedia. Not surprisingly, most of my peers had negative feelings about Wikipedia. We discussed why academics are hesitant to spend time contributing to Wikipedia, even though that might make the site more credible and reliable. We also considered how the audiences for projects like the 911 Digital Archive and Transcribe Bentham are different, as well as what those projects are actually doing and how they're using crowdsourcing.
The discussion made me think about crowdsourcing from the point of view of a historian, as opposed to a librarian. While I still consider crowdsourcing and Wikipedia to be invaluable assets to the academy and the general public, respectively, I was intrigued by my colleagues' opinions on the topic, and by the ways in which they have engaged in crowdsourcing or would find it useful in their research. What most sparked my thinking was one of the last comments of the night. Dr. Robertson stated that crowdsourcing turns our notion of audience on its head. Rather than writing a monograph towards a particular audience, we can now use the audience to do work, whether that is through having them transcribe documents or collect their photographs from a momentous occasion. We can use the crowd to further our own purposes; we need to appreciate our various audiences, and we need to understand how to connect with them.
In library school we had many discussions about crowdsourcing and the perils of Wikipedia. One of my professors thought Wikipedia was an undeniable evil, creating and promoting inaccurate information for everyone to see and access. Another professor thought Wikipedia was fantastic since it involved people in the production and promulgation of knowledge. I tend to agree with the latter much more than the former. The internet exists: why not use that power to build a crowdsourced encyclopedia? Obviously Wikipedia cannot be used as a source for any academic work since it is not a reliable, authoritative, or accurate site. Despite these shortcomings, Wikipedia should still be used for non-academic purposes. The authors of this week's readings commenting on Wikipedia never seem to take into account Wikipedia's audience. Wikipedia was designed to be used by everyone and anyone, but the people who read Wikipedia are not generally academics, nor are they reading for purely academic purposes. People do not use Wikipedia and expect to read a fully academic account of a historic event or person. There is nothing wrong with Wikipedia not accepting primary source research in their articles, since encyclopedias are not the place to present such research.
Crowdsourcing is not an apt term, as Owens points out. Madsen-Brooks states that 65% of Wikipedia editors are men, and Causer et al. detail how Transcribe Bentham relied on seven dedicated volunteers who produced more than 70% of the transcripts. Crowdsourcing usually relies on a small cohort of interested volunteers, which makes "crowd" inaccurate. While Owens objects to using both crowd and sourcing, I think the implications of the former term are more important. The historical sites on Wikipedia are generally being written by middle-aged men in white-collar professions. We can never seem to correct the trope that history is told by and reflects the experiences of fairly well-off white men, and excludes the experiences of marginalized people. Wikipedia needs to take pains to correct this error, and needs to make its site a more welcoming place for women, both as contributors and as a place for women's history.
There are other negative aspects of crowdsourcing. As the efforts of Transcribe Bentham showed, crowdsourcing can be very expensive and time-consuming. Sometimes crowdsourcing projects have to develop their own transcription tool, which takes time and money in the forms of programming, infrastructure, and digitization. Once the transcription process is over, workers have to correct errors, which can be a lengthy process. A project needs to publicize its efforts or it will not garner as much attention as wanted or needed to sustain it. Transcribe Bentham only got off the ground after the New York Times published an article describing the project. Workers on the Hurricane Digital Memory Bank passed out swag during Mardi Gras to raise awareness of the site. Money is always an issue, and without a continuous source of funding projects can burn out.
The positive outcomes of crowdsourcing heavily outweigh the negatives. At the heart of crowdsourcing is the democratization of knowledge. More people are accessing more and more information thanks to crowdsourced projects like Wikipedia. The converse is also true: more people are contributing more and more information to crowdsourced projects. Efforts like the 911 Digital Archive seek to collect the histories of people's experiences that would otherwise be lost or forgotten. Crowdsourcing can, and should, reach diverse audiences, and garner the stories of marginalized peoples.
As historians, we should accept crowdsourcing, contribute to projects when we can, and treat the expansion of knowledge and access to that knowledge as one of the greatest outcomes of the digital age. The information online will not always be accurate and history will often be decontextualized, as seen by the @HistoryinPics Twitter account and others like it. In spite of this, we need to understand that though crowdsourcing is a flawed method, it is often used as a last resort. When computers, an individual, or a group cannot complete the task, then we turn to crowdsourcing. It is a way to engage with a wider audience, and anything that promotes the appreciation of history can only have positive effects.
I was excited to do the readings this week since they are discussing public history. Since I am hoping to go into public history I was curious to see what this week’s articles would say about how public history can be applied to the web, the challenges that entails, and how putting history online is inherent to the democratization of history.
Smith’s article does not ever discuss audience. While The Great Chicago Fire is designed for use by all age groups and educational levels, not all online public history needs to attempt to reach such a wide audience. Does all public history need to be serious scholarly work? What if you are only attempting to describe the Civil Rights Movement to elementary school children? I would think that would be inherently descriptive in nature, and wouldn’t fall under the category of “serious history.” I do agree with Smith’s definition of serious history – history that is original work based on primary source evidence, is aware of other research, and makes multiple points about the subject.
The two websites of The Great Chicago Fire are noticeably different at first glance. The first, from 1996, is obviously dated and it is difficult to maneuver through the site. The second, from 2011, is more aesthetically pleasing and is organized in a much more sensible fashion. I particularly liked the use of mapping in the Touring the Fire section.
Tebeau presents a great argument on the democratization of history in his article about the Cleveland Historical Project. Tebeau argues that oral history is particularly suited for digital public history since sound has a certain capacity to evoke a sense of place. Cleveland Historical is both a website and an app, and by listening to an oral history on a mobile device users are transported to the time and place in which that person lived. In creating the site, Tebeau utilized community crowdsourcing, in which community members were trained in documentary techniques, rather than the more orthodox approach to crowdsourcing. This training can ensure that the information produced and put online is both accurate and of high quality. Cleveland Historical is not only aimed at the entirety of the community but actively engages the community in producing and creating the work found on the site and app. This is democratized history – one that is created by the people who also serve as the audience. I would have liked to hear more about the technical side of creating this website and app. Apps do not have a long life span and need to be updated constantly. How did Tebeau and his team address this problem?
Wyman et al. share some best practices for how the museum field can keep up with changing technology, and how that technology can be adapted and used in museum settings, whether that is the traditional setting or through an online presence. Museums have largely stepped up to the plate concerning their use and presence on social media. The National Archives, the Smithsonian, and the National Museum of American History all have some sort of social media presence and use that to engage with a wider audience. While the suggestions were helpful and interesting to think about, I started to consider how smaller museums and institutions would be able to implement them. Such places generally do not have large budgets. How are they able to cope with this move toward the digital? What challenges have they faced? What is their online presence like?
In Lindsay’s article, she discusses how museums, heritage institutions, and the like can effectively engage with virtual tourists. What struck me the most about this article was the importance of a unifying narrative. According to Lindsay, these places must develop an overarching narrative that shows the everyday life of the period and includes multiple voices of participants. This reminded me of when I went to the International Spy Museum. When first admitted, I was able to choose which spy I wanted to be. Should I be a 45-year-old banker from Madrid, traveling to London for “family reasons”? Such methods allow visitors to engage with the material on a more personal level, in addition to making them think about the information in a different way. The narrative is unified in terms of visitor experience – they identify with said narrative, experience that narrative visually and tactilely, and think about the information from that person’s perspective. I believe the Holocaust Museum employs a tactic similar to that of the International Spy Museum.
Both Terras’ blog post and Sherratt’s article show how democratizing history, digitizing content, and placing information on the web can be used in ways contrary to the original intent. On her blog, Terras uses digitization’s “most wanted” to frame a discussion about shallow versus deep engagement with digital collections and how social media impacts digitized collections. Sherratt shows how visitors to Trove often use the content for social media purposes. This is another consequence of democratizing history. People will use the content for non-scholarly purposes (such as posting a funny picture of a dog with a pipe on Facebook), and will engage with the material in a different way than was intended. I do not think this is always a bad thing. When someone pulls a picture from a cultural heritage site solely to put it on a social media platform, this action increases the number of visitors to the cultural heritage site, publicizes that content, and increases awareness of that institution. So while the intent is not in any way scholarly, it lets the content from that institution reach out and speak to a much wider audience. There are downsides to democratizing history, but being snobbish about how sources are used only adds to them.
I must admit that I have been finding the readings more and more difficult to understand as the weeks have gone on. Since we’ve started reading about and discussing the actual methods of digital history, such as text mining and networks, it’s been hard for me to fully grasp the readings since I’ve never worked with these tools before and the subject matter always seems more abstract until I can work with the tools myself. The setup of the readings this week (and the past week with networks) was a few theoretical articles followed by articles detailing uses of GIS and mapping in scholarship, along with the accompanying websites. This has helped me in understanding the application of this certain historical tool.
Hitchcock brings up the point that historians and geographers should be in constant dialogue. This is a valid point that has been brought up in readings from past weeks. All humanities scholars should be working collaboratively to further scholarship in the disciplines that fall under the humanities umbrella. Hitchcock’s notion of the infinite archive is an interesting one. He argues that the infinite archive is the driving force behind a change that is bringing the disciplines of history and geography into a more direct relationship. The infinite archive has turned text into data, which can then be used to create geographical representations. I also agree with his point that using maps to aid in interpreting history will draw in a wider audience.
Harris, Corrigan, and Bodenhamer state that humanistic mapping runs counter to traditional GIS mapping. They mention deep maps, which differ from GIS mapping in that the user is no longer just an observer but can experience the map through a sense of actually being in the world. I didn’t fully comprehend their explanation of a deep map and hope that we will mention this in discussion. Are Visualizing Emancipation and ORBIS examples of a deep map? Their definition of GIS for humanities seems accurate and accounts for the changing nature of historical data.
In mapping it appears that scale has two different meanings: it can define the spatial and temporal reach of specific practices, and it is also how observers frame social activity; it is both practiced and perceived. Visualizing Emancipation uses both meanings of scale to map emancipation. The Ayers and Nesbit article really made me think about how I think about historical information. They were able to create this fantastic mapping tool to show the process of emancipation because they were able to change the way that they thought about their sources. The concept of deep contingency is new to me, and I liked seeing how they used this concept to create their tool. Their final argument, that seeing the patterns of emancipation can help scholars understand the profound social changes it created in American history, is more believable after visiting the Visualizing Emancipation site and playing around with it.
With Visualizing Emancipation, you can choose to view the information as either a map or a list. You can select emancipation event types, such as abuse of African Americans, capture of African Americans by Union troops, conscription and recruitment by the Union, conscription by the Confederacy, fugitive slaves, and more. The sources are from books, newspapers, official records, and personal papers. It is a great way to visualize the data. What surprised me the most was the number of sources they used to compose the map. I also liked that each point on the map gave the bibliographic information of the source, since this promotes transparency.
ORBIS is another mapping tool that shows the transport of goods and the movement of people in the Roman Empire. Meeks and Grossner argue that interactive scholarly works (ISW), of which ORBIS is one, blur the line between academic and popular scholarship. They state that to treat ISWs as simply tools undercuts the possibilities that ISWs and other similar models bring to the humanities. While I do not agree that ISWs should be methodologies, I do think they are useful tools and any ISW that widens the historical audience can only be beneficial in the long run. I thought that Dunn made a good point when he stated that ORBIS does not try to simulate agency. The user must interact with ORBIS in the context of their own analysis or interpretation, which makes ORBIS a tool since it cannot independently create research conclusions. ORBIS is so popular that the creators have decided to update the current version.
What struck me the most when reading Dr. Robertson’s article on Digital Harlem was how much information is left out of historiography and how much was gained by inputting the sources into the mapping databases. As noted by the author, location is not a precise science for most historians, but when studied closely, location can show wider patterns and trends among the population, and can show how that population actually lived. Historians examining location have to use sources that are not always valued by other scholars. Digital Harlem revealed new information on nightlife, traffic accidents, and how blacks moved through the city. The articles on Digital Harlem showed how digital tools can be used to fill gaps in the historiography. While I had previously understood that text mining and networks could show patterns and trends, I never fully grasped that they could reveal the holes in existing scholarship.
Week 7 readings
Visualizing historical data in the form of networks comes with many problems and benefits. The definition of visualization, as found in Theibault's article, is "the organization of meaningful information into two or three dimensional spatial form intended to further a systematic inquiry." Networks are made up of nodes, which are the connection points, and edges, which are the lines between those points. Visualization can be used to identify patterns in data sets and can enhance the presentation of arguments. In this way networks are similar to text mining in that both can be used to identify patterns from a large corpus of information and should be used only as a tool for analyzing sources, not as a methodology in and of itself. In order to be an effective research tool, they need to be information rich, transparent, and accurate. They can be used to show how people move, how information spreads, or where goods travel. Visualizations can serve as either a narrative or an analysis, but the authors of this week's readings tend to caution against using visualizations as purely narrative structures.
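The nodes-and-edges vocabulary can be made concrete with a few lines of Python. This is only a minimal sketch: the correspondents and letters are invented placeholders, not data from any of the readings, and `build_network` is a hypothetical helper rather than part of Gephi or any other tool.

```python
from collections import defaultdict

# Hypothetical letters as (sender, recipient) pairs. Each person is a node;
# each pair is an edge connecting two nodes.
letters = [
    ("Writer A", "Writer B"),
    ("Writer A", "Writer C"),
    ("Writer B", "Writer C"),
    ("Writer C", "Writer D"),
]


def build_network(edges):
    """Map each node to the set of nodes it shares an edge with."""
    neighbors = defaultdict(set)
    for source, target in edges:
        # Treat the edge as undirected: record the link in both directions.
        neighbors[source].add(target)
        neighbors[target].add(source)
    return dict(neighbors)


network = build_network(letters)

# A node's degree is the number of edges attached to it, a simple way
# to see who is most connected in the network.
degree = {node: len(linked) for node, linked in network.items()}
```

Even this tiny structure shows the limitation the readings warn about: the network records that two people are connected, but nothing about what the letters said or why they were written.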
Visualizing networks has many pitfalls, which Theibault, Drucker, and Weingart spend large portions of their articles and blog posts detailing. The network theory of graphs uses mathematical principles that don't lend themselves to interpreting the experiences of human beings. Drucker maintains that putting statistical information into graphs gives it a semblance of simplicity and legibility, but there is no understanding of or information about the original interpretive framework on which the data of the graph was constructed. Visualizations can be ambiguous and uncertain. Weingart mentions that networks have no memory since the nodes and edges do not contain any information as to how the networks are traversed. Any scholarly work based on quantitative methods needs to make clear its research process as well as the process that went into creating the visualization. Since networks are created by algorithms it is important to understand how the algorithms work and to make sure that the mechanics of the visualization are available for anyone to see. By applying your research to a network you are fitting it into certain standards. How do those standards and the graph affect the integrity of your research? Visualizations have the same problem as all other methods of digital humanities: transparency.
Drucker argues that if the scholarly community wants to use visualizations as a part of humanities scholarship, then all data needs to be reconceived as capta. Data is information that is given, whereas capta is information that is taken actively. According to Drucker, capta is much more representative of how humanities scholars produce knowledge. I would be interested to apply this theory of capta and data to other methodologies utilized in the digital humanities. At first her article was difficult to understand and I had to go back and re-read her main arguments before fully grasping this idea. I had never thought of information input and output in those specific terms, but they do seem to make sense. With the rise of digital humanities, it is important that we learn to differentiate between information that is taken and information that is given.
Friot's blog post will be helpful for the practicum this week because it was a step-by-step account of instructions on how to use Gephi, a visualization platform. It was also helpful for me to see how I could go about using networks in my own research, starting from the very beginning. Unlike the ngram viewers we used last week, when networking with Gephi you can input your own sources into the program, so I think I will have an easier time understanding how I can use it in my research.
My first thought on examining Mapping the Republic of Letters was that the website should be better organized. For example, when I clicked on Algarotti, I had to look down at the related information to find the visualizations of his travels and network. Shouldn't that information have been in the body of the site, with the rest of the information pertaining to him? Regardless of this minor quibble, this is an incredibly fascinating site. I particularly liked how the Grand Tour site gave a lot of information as to how they came up with each visualization and the thought process behind them. The page for D'Alembert has the same type of information, and I think any sort of digital humanities should have something in this same vein that explains how the researcher(s) arrived at the visualizations. While they do not explain how the visualizations work, I think it is equally important to detail the beginning stages of such work. Some pages, like Locke's, haven't yet been populated with any visualizations, but there is still information as to what the visualizations will be detailing. Mapping the Republic of Letters really helped me to understand how visualizations can be used as a tool to aid historical scholarship. I mean, look at how neat Voltaire's page is! I especially liked the visualization at the end that networked the correspondents Voltaire and Benjamin Franklin had in common. Mapping the Republic of Letters is a great resource, and provided me with a solid introduction to using visualizations for digital history. I almost think that visualizations are better than text mining, but it does depend on the anticipated end result and what sort of research is being performed. It seems that visualizations provide more information to the viewer, but this could just be a novice assumption.
We seem to have moved on from the question of 'what is digital history' to how to perform and use digital history. The topics for this week's readings are text mining and topic modeling. I was eager to read the articles this week because text mining and topic modeling are two phrases that I've heard used but have had no idea what they actually mean. Close reading, distant reading, and ngrams (as well as ngram viewers) are other words/phrases used this week that I need to take some time to examine more closely. I wasn't sure what they were or how they are used in the field of digital humanities, so this week's blog post will focus on these new terms and how the articles defined and discussed them. Some of these concepts continue to remain a bit abstract to me, despite articles like Blevins' and Kaufman's that demonstrate their application, so these definitions might not be fully accurate.
Text mining, also referred to as data mining, is a quantitative method that analyzes words from a large corpus of text. It is a tool that can be used to understand the history of culture and can complement the ways we already organize historical information. Text mining can be used as an exploratory technique to determine what areas of a research topic need to be fleshed out and written up using more traditional methods. Searching is a form of text mining, but it is not a pure form in that it only shows the user what he/she is already expecting. Scholars have questioned the validity of this method, since meaning can be found in a much wider variety of cultural objects and not simply text. Words can be ambiguous and are dependent entirely upon context, which a computer cannot understand. The underlying language of text mining is Bayesian statistics. Underwood argues that text mining can be used to help scholars think more deeply about existing practices of algorithmic research.
Topic modeling is a computational method that shows patterns in text. It uses an algorithm to organize the language of a collection of text into clusters of terms that tend to occur in the same contexts. The clusters of terms are "topics." The modeling program then creates a visual map to show which groups of words appear most often in a group of documents. This method can show patterns throughout the collection of text that the reader might not be able to see. Rather than grouping documents that have the same words in them, topic modeling groups the words themselves that appear in the same documents. This is a popular method since the maps created are easily interpreted. In addition, topic modeling allows the researcher to examine the larger trends across the corpus rather than analyze individual documents. Topic modeling is incredibly useful in identifying patterns that scholars cannot explain, which then prompts more research questions, according to Nelson.
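The intuition that topics are built from words that "tend to occur in the same contexts" can be illustrated with a much simpler stand-in. The sketch below is not a topic model (real topic modeling programs use statistical algorithms far more involved than this); it only counts how often pairs of words share a document. The three sample documents and the `cooccurrence` helper are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Three invented "documents", each reduced to a handful of words.
documents = [
    "court ruling labor women hours",
    "women labor hours factory",
    "court ruling justices opinion",
]


def cooccurrence(docs):
    """Count, for every pair of words, how many documents contain both."""
    pairs = Counter()
    for doc in docs:
        # Use each distinct word once per document, in alphabetical order,
        # so that ("labor", "women") and ("women", "labor") are one pair.
        words = sorted(set(doc.split()))
        for first, second in combinations(words, 2):
            pairs[(first, second)] += 1
    return pairs


pairs = cooccurrence(documents)
```

Pairs such as ("labor", "women") and ("court", "ruling") each appear in two documents, while ("court", "factory") never appears together. A topic model generalizes this kind of signal into clusters of terms across a whole corpus.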
Close reading is more traditional than distant reading and involves the close examination of a small body of text. Distant reading utilizes a large collection of text and is computer-enhanced. The latter focuses on the ways in which content and meaning emerge across a large scale. Topic modeling is one way of practicing distant reading. Some of the authors mentioned a dichotomy between close and distant reading, but all authors mentioned that the two should be used in conjunction in order to provide more in-depth, complex research findings. Digital methodologies must allow scholars to move easily between close and distant reading.
An ngram is a sequence of n consecutive words (a single word is a 1-gram, a two-word phrase a 2-gram, and so on), and an ngram viewer is an application of text mining that charts how often such sequences appear in a corpus. The authors of the articles seem to have an overwhelmingly negative view of Google's ngram viewer. The viewer ignores the meaning of words when it searches, thus the words are taken out of context, which is not helpful when conducting research. In addition to the critique regarding context, Gibbs and Cohen argue that the viewer does not show transparency, nor does it allow users to interact with the interface.
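A short sketch of what extracting ngrams from a text looks like in Python. The `ngrams` function and the sample phrase are illustrative assumptions, not how Google's viewer is actually implemented.

```python
def ngrams(text, n):
    """Return every run of n consecutive words in the text, in order."""
    words = text.lower().split()
    # Slide a window of width n across the list of words.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


unigrams = ngrams("The right to vote", 1)
bigrams = ngrams("The right to vote", 2)
```

Here `bigrams` holds the three two-word sequences in the phrase. Note that the function only sees strings of characters: it has no idea what "right" means in this sentence, which is exactly the loss of context the authors criticize.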
As a librarian and a historian, I find databases to be incredibly helpful. What is a database? At the most basic level, it is a structure of collected data. As Lev Manovich points out in "Database as a Genre of New Media," there are different kinds of databases: hierarchical, network, relational, and object-oriented. Databases have completely revolutionized how scholars search for information. I noticed that the readings were ordered in the same sort of pattern as the weeks before: the first few articulated the topic at hand, then discussed the negative aspects before concluding with the positive aspects.
What are some of the problems or obstacles associated with databases? Scholars no longer have to go to archives to find their sources. Some will never physically experience an archive, hold fragile primary source material, or wear the white gloves. Tim Hitchcock, in "Digital Searching and the Re-formulation of Historical Knowledge," argues that digitization changes the nature of the archive in two different ways: how the archive is used and how historians experience it. Historians can now experience archives while sitting in their pajamas on their living room couch. The archive has transitioned from being an immobile, physical location to being accessible whenever one has an internet connection. Manovich sees this increasing distance between scholars and the physical archive as a problem. As Patrick Spedding discusses in "The New Machine: Uncovering the Limits of ECCO," not every single piece of information is accessible via databases. There is a selection process that happens when deciding whether or not to digitize certain materials, and the largest factor in that decision oftentimes is whether or not the commercial vendor will make any money from digitizing that source. This is definitely a serious issue that must be dealt with as soon as possible. I believe that all material within the public domain should be available to the public at no charge. Vendors such as ProQuest charge institutions incredibly high prices for access to such materials, when those materials should be freely accessible in the first place. Yet another problem discussed in this week's readings is that there are no standards in place for how historians should cite digital sources, articulate their search processes, and describe their results. Historians must do their due diligence in researching how databases gather and display their information. They should determine whether the text being searched has been run through OCR software or transcribed by hand, how often the database's body of material is updated, and whether fuzzy searching is employed when displaying search results.
Databases have many advantages. The two foremost benefits of databases are efficiency and access. Scholars can undertake significant research in their field without having to travel great distances. Lara Putnam calls this phenomenon "geographic unanchoring" in her article "The Transnational and Text-Searchable: Digitized Sources and the Shadows They Cast." Millions of sources have been digitized and have furthered historical study in numerous ways. Caleb McDaniel articulates how beneficial databases and online sources have been to early American historians in "The Digital Early Republic." Without digitization and databases, much of their source material would still be in boxes and would have remained untouched until a lone researcher stumbled upon it. James Mussell, in "Doing and Making: History as Digital Practice," argues that digital history does not take the historian away from primary sources but provides the historian with a new context to encounter those sources.
While many people rightly have concerns about databases, including how they collect, curate, store, and display information, the greater accessibility to materials is so disproportionately advantageous that those concerns seem almost trivial by comparison. That is not to say those issues shouldn't be addressed - they should be, and sooner rather than later. We need to create standards for describing our digital research processes, and we need to figure out how to push controlling commercial interests out of the scholarly field. Despite these obstacles, historians and scholars should embrace the changes brought about by the digital revolution.
When I was working on my Master's degree, I had the opportunity to work on a fairly large-scale digitization project. The Archive of European Integration is digitizing and putting online official European Union documents. As a part of the project, and together with other Master's students, I disbound documents; scanned them using a scanner with an automatic sheet feeder; and then page-checked and bookmarked the PDF versions. There were quite a few steps that the documents then had to go through before being put online for others to access, including having the scanned versions run through OCR software. Despite being highly repetitive, this job gave me invaluable experience with digitization and made me much more appreciative of the amount of work that goes into such efforts.
There are both positive and negative aspects of digitization. One of the disadvantages is that OCR technology is not fully accurate. In "Deciding Whether Optical Character Recognition is Feasible," author Simon Tanner gives an example of how flawed OCR can be. Hypothetically, if there were a page of 500 words with 2,500 characters, and the OCR was 98% accurate, then 50 of those characters would be incorrect. This can create a host of problems for researchers. Ian Milligan, in "Illusionary Order: Online Databases, Optical Character Recognition, and Canadian History, 1997-2010," states that OCR can alter research, lead to missed hits while performing searches, and lay a flawed foundation beneath historical research. Bob Nicholson provides one solution to fixing OCR problems in "The Digital Turn." The British Newspaper Archive enables users to manually correct OCR errors, as well as tag articles with their own keywords. Crowdsourcing might be one way to deal with this problem. Tanner mentions that OCR is only one way to deliver text to a user. Why do we rely so heavily on OCR rather than other methods? Is it the most accurate (despite its inaccuracies)? Regardless of whether it is or is not, what are the alternatives to OCR and what are the drawbacks to using them?
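The arithmetic behind Tanner's hypothetical page is easy to check. The sketch below reproduces the figure of 50 incorrect characters and then adds one assumption that is not stated above: a worst case in which every incorrect character lands in a different word. Both function names are made up for this illustration.

```python
def expected_ocr_errors(characters, accuracy):
    """Number of characters expected to be wrong at a given accuracy rate."""
    return round(characters * (1 - accuracy))


def worst_case_word_accuracy(words, character_errors):
    """Share of words still correct if each error spoils a different word.

    This is an assumed worst case, not a measured result.
    """
    damaged_words = min(words, character_errors)
    return (words - damaged_words) / words


errors = expected_ocr_errors(2500, 0.98)
word_accuracy = worst_case_word_accuracy(500, errors)
```

Under that worst-case assumption, 50 bad characters could damage 50 of the 500 words, so an OCR process that is 98% accurate by character could leave only 90% of the words searchable. This is why a seemingly small error rate can lead to the missed hits Milligan describes.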
Yet another problem with digitization is the loss of integrity that occurs when a source is digitized. When any analog source becomes digital, there is a loss of data. How are we able to retain the original integrity of the piece that became digitized? Newspapers provide a great example of this loss of integrity. Prior to becoming digitized, newspapers were originally in a single issue format. They were then placed into bound volumes, and then were microfilmed. Each of those remediations places the user one step further away from the original text. We need to be especially careful to collect as much information as we can prior to digitizing in order to try and remedy the loss of data. This is one reason why metadata is so vitally important to researchers: the more information you have about a source, the better informed you are as a researcher, and your research will be of a higher quality because of it.
Another issue brought up in the articles this week is that of transparency in the research process. This is something that I, as both a librarian and a historian, am passionate about. While this is not a common practice in the historical discipline, I think it is highly important for researchers in any field to discuss search strategies and results; keywords used; what was successful and what was not. All scholars should be able to describe in explicit detail how they arrived at a specific source so colleagues can replicate that search process. This would allow historians to collectively discuss any problems they might have, and then they could work together to find solutions to those problems. Transparency in research is incredibly necessary, even more so now that we are using digital sources. When we are creating bibliographies, why do we not cite digital sources or at least detail our search process that helped us reach those sources? This should become a "best practice" for historians in the 21st century.
There are many ways that digitization has made the research process exponentially easier. Both Nicholson and Sarah Werner, in "When Material Book Culture Meets Digital Humanities," highlight the positive aspects of digitization and OCR technology, and the ways in which they benefit material culture. Werner's article, which is a highly engaging read, shows how microfilm does not always capture what it should, since it can only print in black and white. She also examines the ways that digital tools can be used to study physical characteristics of text, which is just one use of digitization that would not have occurred to me but is fascinating to read about. Nicholson argues that we are on the cusp of a "digital turn" in scholarship driven by these new technologies and new possibilities in research. Instead of focusing on the negative aspects of digitization, he believes it is far more beneficial to focus on the various opportunities that have been created.
Overall, digitization and OCR technologies have made research much easier, although inaccuracies and loss of integrity are serious obstacles that should be studied and remedied. In addition, transparency in research should be promoted and encouraged until it becomes standard practice across all disciplines.
In response to Manoff's article, I do believe that libraries have adequately met the challenges posed by digital technologies and continue to work to keep themselves abreast of technological changes and advances. Since the article is dated, I feel that the issues raised have largely been dealt with, thus making my comments superfluous. What I would really like to see is a follow-up to the original article: would Manoff agree or disagree with me?
The topic for this week's class is "What is Digital History?," which is the very same question I get asked when I tell people about the Digital History Fellowship. I was unsure of how to answer that question, and despite this week's readings am not yet confident enough to espouse my own definition. My goal is to have a definitive answer by the end of this semester.
One of the articles, "Interchange: The Promise of Digital History," asks several different scholars how they would define digital history, and William Thomas, William Turkel, Dan Cohen, and Steven Mintz all have slightly different explanations. From the articles I can say that digital history involves the following:
- Utilizing new technologies to spread and democratize history
- Pursuing history as a scholarly endeavor as well as one open to non-academics
- Adopting a new methodology that allows the public to interact with history
- Doing much more than just digitization
Digital history also has a different goal than "analog" history, by which I mean the history pursued purely by academics in a non-digital form. Sure, all historians want the general public to engage with history on some level, but in the analog world they are the best at spreading that knowledge among other members of the academy through peer review, conferences, and more. Digital history allows the public to see, explore, and interpret history in a more tactile, innovative, and creative way. Does this mean that digital history falls under the umbrella of public history? What if digital history is pursued by academics with PhDs - would it then be considered academic? I realize that there is a spectrum, but I have yet to pinpoint where digital and public history fall on said spectrum. The definitions of digital history and public history remain ambiguous to me and I look forward to teasing out the distinctions and similarities throughout the semester.
What really struck me about these articles is the many challenges that digital historians face. In his article "Scarcity or Abundance: Preserving the Past in a Digital Era," Roy Rosenzweig argues that digital history will have a future of either scarcity or abundance: either we will be unable to preserve digital history, or we will be inundated with information due to storage capacity (also, I now understand where the titles for O'Malley's and Takats' blog posts come from). In my opinion we are in an age of abundance, but preservation remains an issue for digital historians. What tools are in place to ensure that our digital history will be accessible and usable years from now? Preservation is also discussed in Rosenzweig and Cohen's "Promises and Perils of Digital History," and Seefeldt and Thomas identify preservation as one of the three main challenges of digital history.
How is digital history assessed? Ayers contends that digital history and new media must have standards in order to not dilute the authority of the historical discipline. This is especially important since anyone can use the internet, establish an online identity, and then create digital history. The information on the internet is not always reliable or factual. A popular Twitter account, @HistoryinPics, tweets images that are not authentic (Sarah Werner wrote about this on her blog in this post), but many people assume they are. There needs to be some set of encompassing criteria that can be used by anyone in the field to determine the authoritativeness, accuracy, and quality of work published as digital history.
Another significant obstacle is ensuring that all digital history is open access. I think that the model the Center for History and New Media employs is the best route possible for enabling the largest dissemination of digital history and digital humanities: using Creative Commons Licenses. PressForward, Digital Humanities Now, and Histories of the National Mall are all licensed under Creative Commons. In addition, Omeka, a web-publishing platform, and Zotero, a citation tool, are both open source. It is important to not restrict access to any of the digital humanities; not only should they not be prey to commercial interests, but the information should also be freely accessible.
But what about the digital divide? In "Promises and Perils of Digital History," Rosenzweig and Cohen briefly mention that there must be a way to reach those who have no internet connection. The National Broadband Map (which is a bit outdated - the last update was in December of 2013) displays broadband availability in the United States. There are still many regions in the US where such technology is not offered. Another issue with the digital divide revolves around those who use mobile phones rather than broadband, which was discussed in one of last week's readings, "40 Maps that Explain the Internet." Egypt is one example: 2.7% of Egyptians have internet access at home, but 10 times that number have internet access on their phones ("40 Maps that Explain the Internet," Vox, accessed September 4, 2014, http://www.vox.com/a/internet-maps#list-2). Is digital history accessible and easily utilized on mobile phones? If we want to ensure everyone is able to obtain the information, then the different ways people go online must be considered.
So while I am still hesitant to define explicitly what digital history is, I have identified four of the challenges facing digital history today: preservation, assessment, open access, and the digital divide. As Scheinfeldt urges in his blog post, it is of vital importance for everyone in the digital humanities to work together as colleagues and find solutions. We are lucky enough to live in a world of abundance: let us take care of and treasure that gift by solving these problems.