Civil War Navies Bookworm

If you read my last post, you know that this semester I built a Bookworm using a government document collection. My professor challenged me to try my system for parsing the documents on a different, larger collection of government documents. The collection I chose to work with is the Official Records of the Union and Confederate Navies. My Barbary Bookworm took me all semester to build; this Civil War navies Bookworm took me less than a day. I learned things from making the first one!

This collection is significantly larger than the Barbary Wars collection: 26 volumes, as opposed to 6. It covers roughly the same time span but contains 13 times as many words. Though it is still technically feasible to read through all 26 volumes, this collection is perhaps a better candidate for distant reading than my first corpus.

The document collection is broken into geographical sections: the Atlantic Squadron, the West Gulf Blockading Squadron, and so on. The Bookworm lets us look at the words in these documents sequentially by date, instead of going back and forth between volumes to get a sense of what was going on in the whole navy at any given time.

Looking at ship types over the course of the war, across all geographies.
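To make the by-date view concrete, here is a minimal sketch of the underlying idea, not Bookworm’s actual implementation: pool the documents from every volume, group them by year, and count how often a term appears per 1,000 words. The function name `term_frequency_by_year` and the `(year, text)` input format are hypothetical, and the sketch assumes each document has already been assigned a year and that the search term is a single word.

```python
import re
from collections import Counter


def term_frequency_by_year(documents, term):
    """Return {year: occurrences of term per 1,000 words}, ordered by year.

    Hypothetical helper. documents is an iterable of (year, text) pairs pooled
    from every volume, so the volume or squadron a document came from no
    longer matters. term is assumed to be a single alphabetic word.
    """
    target = term.lower()
    total_words = Counter()
    term_hits = Counter()
    for year, text in documents:
        words = re.findall(r"[a-z]+", text.lower())
        total_words[year] += len(words)
        term_hits[year] += words.count(target)
    # Normalize by each year's word count so busier years don't dominate.
    return {
        year: 1000 * term_hits[year] / total_words[year]
        for year in sorted(total_words)
        if total_words[year] > 0
    }
```

Normalizing by total words matters because the amount of surviving correspondence varies a great deal from year to year.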

Process and Format

The format of this collection is mostly the same as that of the Barbary Wars collection. Each document starts with an explanatory header (“Letter to the secretary of the navy,” “Extract from a journal,” etc.). Unlike the Barbary Wars documents, however, these documents have no citations at the end. So instead of using the closing citations as document breakers, I used the headers. Though there are many different kinds of documents, the headers are very formulaic, so the regular expressions to find them were not particularly difficult to write.[ref]Ben had suggested that I do the even larger Civil War Armies document collection; however, that collection does not even have headers for the documents, much less citations, so the document-breaking process would be exponentially more difficult. It’s not impossible, but I may have to rework my system, and I don’t care about the Civil War that much. 🙂 However, other document collections, such as the U.S. Congressional Serial Set, have exactly the same format, so it may be worth figuring out.[/ref]

Further easing the pain of breaking the documents is the quality of the OCR. Where I fought the OCR every step of the way for Barbary Bookworm, the OCR for this collection is really quite good (a mercy, since spot-checking 26 volumes is no trivial task). Thus, I didn’t have to write multiple regular expressions to find each header; a few small variants were sufficient.
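As a rough illustration of using headers as document breakers, here is a minimal Python sketch. The header patterns are hypothetical stand-ins modeled on the examples above, not the expressions used to build this Bookworm; a real list would need the collection’s actual header variants.

```python
import re

# Hypothetical header patterns modeled on the formulaic headers described above
# ("Letter to the secretary of the navy," "Extract from a journal," and so on).
# Each header type may need a few small variants to absorb OCR noise.
HEADER_PATTERNS = [
    r"Letter (?:from [^\n]+ )?to [^\n]+\.",
    r"Extract from (?:the |a )?journal[^\n]*\.",
    r"Report of [^\n]+\.",
    r"Order of [^\n]+\.",
]

# Combine the variants into one pattern that must match an entire line.
HEADER_RE = re.compile(
    r"^(?:" + "|".join(HEADER_PATTERNS) + r")[ \t]*$",
    re.MULTILINE | re.IGNORECASE,
)


def split_documents(volume_text):
    """Split one volume's OCR text into (header, body) pairs.

    Each header starts a new document; its body runs until the next header.
    Any text before the first header (front matter, for example) is dropped.
    """
    matches = list(HEADER_RE.finditer(volume_text))
    documents = []
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(volume_text)
        body = volume_text[match.end():end].strip()
        documents.append((match.group(0).strip(), body))
    return documents
```

Because each pattern must match a whole line, a body line that happens to look like a header would still cause a false break, so some spot-checking remains necessary.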

New Features

The high-quality OCR enabled me to write a date parser, something I couldn’t make work in my Barbary Bookworm. The dates follow a more consistent pattern, and there is little garbage in or around them, so it was easy enough to write a little function to pull out each part. To cope with dates whose parts were illegible or missing, I had the function find each part of the date in turn and then compile the parts into one field, rather than trying to extract each date wholesale. That way, if all I could extract was the year, the function would still return at least a partial date.
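A minimal sketch of that part-by-part approach might look like the following. The regular expressions, the assumed 1850 to 1869 year range, and the `parse_date` name are illustrative assumptions; the sketch does not handle abbreviated months such as “Feb.,” and a stray “may” or “march” in a header could be mistaken for a month.

```python
import re

MONTH_NAMES = ["january", "february", "march", "april", "may", "june", "july",
               "august", "september", "october", "november", "december"]
MONTHS = {name: number for number, name in enumerate(MONTH_NAMES, start=1)}

# Assumption: document dates fall between 1850 and 1869.
YEAR_RE = re.compile(r"\b(18[56]\d)\b")
MONTH_RE = re.compile(r"\b(" + "|".join(MONTH_NAMES) + r")\b", re.IGNORECASE)
# A one- or two-digit day immediately after the month name.
DAY_AFTER_MONTH_RE = re.compile(r"\s+(\d{1,2})\b")


def parse_date(header):
    """Hypothetical parser: find year, month, and day separately, then combine.

    Returns "YYYY-MM-DD", "YYYY-MM", or "YYYY" depending on which parts were
    found, or None if no year could be found at all.
    """
    year_match = YEAR_RE.search(header)
    if year_match is None:
        return None
    date = year_match.group(1)

    month_match = MONTH_RE.search(header)
    if month_match is None:
        return date  # only the year was legible
    date += "-{:02d}".format(MONTHS[month_match.group(1).lower()])

    day_match = DAY_AFTER_MONTH_RE.match(header, month_match.end())
    if day_match and 1 <= int(day_match.group(1)) <= 31:
        date += "-{:02d}".format(int(day_match.group(1)))
    return date
```

Building the field up piece by piece is what lets a header with only a legible year still yield a usable, if coarse, date.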

Another new feature of this Bookworm is that when you click on the line at a particular date, the full text of the documents for that search term appears. This function is slow, so if the interface seems to freeze or you don’t seem to be getting any results, give it a few minutes. It will come up. Most of the documents are short enough that it’s easy to scroll through them.

Testing the Bookworm

Some of the same reservations I detailed in my last post about Barbary Bookworm apply to this Bookworm as well; they really apply to all text-analysis tools. Disambiguation of ship names and places continues to be a problem. But this Bookworm solves many of the other problems I had with Barbary Bookworm.

The next step I need to work on is separating the Confederate navy’s documents from the Union navy’s. Right now, you can get a sense of what was important to both navies together, but it is harder to see what was important to just one side or the other.

To be honest, I don’t know enough about the navies of the Civil War to make any significant arguments based on my scrounging around with this tool. There is some very low-hanging fruit, of course.

Unsurprisingly, the terms “monitor” and “ironclad” become more prominent throughout the war.

The Bookworm is hosted online by Ben Schmidt (thanks, Ben!). The code for creating the files is up on GitHub. Please go play around with it!

Feedback

Particularly since I don’t do Civil War history, I’d welcome feedback on both the interface and the content here. What worked? What didn’t? What else would you like to see?

Feel free to send me questions/observations/interesting finds/results by commenting on this post (since there’s not a comment function on the Bookworm itself), by emailing me, or for small stuff, pinging me on Twitter (@abbymullen). I really am very interested in everyone’s feedback, so please scrub around and try to break it. I already know of a few things that are not quite working right, but I’m interested to see what you all come up with.

Editor Vignette: Edward E. Cross

In my work on Viral Texts, I run across a host of interesting people, including editors whose lives are just as interesting as the stories they publish. To highlight some of these editors, I’m writing short posts about them as I research their papers. This first vignette is about the first editor of the first newspaper published in Arizona, before Arizona was even a state. I write about him today on the 150th anniversary of his death.

Edward Ephraim Cross (1832-1863)

Edward Cross began his newspaper career at the age of 15, at the Coos Democrat, a paper in his native Lancaster, New Hampshire. He moved to Cincinnati in 1850, where he continued to work as a printer, now at the Cincinnati Times. 

Soon, Cross became a reporter for the Times, even serving briefly as its Washington correspondent. But he invested in some mining operations in Arizona, and in 1859 he moved out to Tubac, Arizona. In Tubac, under the auspices of the Santa Rita Silver Mining Company, he started the first newspaper in Arizona, the Weekly Arizonian. Cross had strong political opinions, and those opinions often found their way into his newspaper. He was especially concerned that Arizona have its own government, separate from New Mexico, since he felt that the two regions had needs different enough to warrant separate representation. Cross was primarily concerned with Arizona politics; in general, the newspaper seems to have been somewhat ambivalent about national politics.

Another of Cross’s goals as a newspaperman was to paint a picture of Arizona as it really was. Robert Grandchamp, a biographer of Cross, claims that many of Cross’s editorials were not meant for Arizonians but for people back East who read the Weekly Arizonian.[1. Grandchamp 59.] (If that’s true, it suggests that editors themselves wrote with readers far beyond their local subscribers in mind, which says something about how they viewed reprint culture in the USA.) As with every territorial expansion, writers often embellished the benefits of territorial life and downplayed its dangers. Cross disliked such idyllic portraits of Arizona, so his editorials emphasized the rough and difficult life of Arizonians.

This desire to portray the hard life in the territory brought Cross into conflict with one Sylvester Mowry, a wealthy mine owner who also happened to represent the territory in Congress. Mowry had written reports on the state of Arizona that Cross felt were too rosy, describing the land as highly fertile and the native Indians as of minimal concern. Cross decided to take on Mowry in the press. He did not publish his editorial in the Weekly Arizonian (possibly because he wanted better nationwide exposure than he thought he’d get from the Arizonian) but in an Eastern newspaper, the States. A complicated dance of letters and replies ensued (Mowry was in Washington and Cross in Arizona, so travel time was definitely an issue).

Mowry realized that the only way to deal with Cross was direct confrontation in Arizona. Upon his return to the territory, Mowry issued a challenge. Cross accepted, and the duel was on.

Cross decided to make the duel interesting by choosing Burnside carbines as the weapons instead of standard dueling pistols. Though both men were purportedly good shots,[2. Grandchamp states that each man practiced the previous day; Cross shot up a cactus and Mowry a cottonwood tree.] after four rounds in which neither man hit the other, Mowry declared himself satisfied.

The issue might have remained contentious, despite published apologies from both parties, except that a week after the duel, Mowry bought the Weekly Arizonian from the Santa Rita Silver Mining Company. Obviously, Cross would not remain the editor. The paper moved to Tucson and took on stronger Democratic leanings.

Though Cross moved back to New Hampshire after losing the Weekly Arizonian, he remained concerned about Arizona politics and military affairs. He wrote repeatedly to the secretary of war about the situation in Arizona. The attachment Cross felt to Arizona is somewhat remarkable, considering that he lived in the territory for less than a year (he took on Mowry after only one month of residence!).

Later in 1860, Cross invested once again in a silver mine in Arizona, volunteering to travel to the mine as a scout. Though he supported Stephen Douglas for president, his political concerns remained primarily local. When the Army left Arizona to deal with the seceding Southern states, Cross’s mining investment was sacked by Indians. After that loss, he left Arizona to serve briefly with the Mexican army of General Juarez.

When the Civil War broke out, Cross headed back to New Hampshire to command the Fifth New Hampshire Regiment of Volunteers. He served with distinction in many famous battles, including Antietam, Fredericksburg, and Chancellorsville, and he became known for his toughness on the battlefield.

In July 1863, the Fifth New Hampshire was among the regiments that fought at Gettysburg. Cross, by then commanding a brigade, led it into the fighting at the Wheatfield, where he was mortally wounded. He died of his wounds on July 3, 1863.

Monument to the 5th New Hampshire at Gettysburg National Battlefield Park
CreativeCommons licensed photo by Flickr user BattlefieldPortraits.com

You can read more about Edward Cross here:
Grandchamp, Robert. Colonel Edward E. Cross, New Hampshire Fighting Fifth: A Civil War Biography. Jefferson, NC: McFarland, 2012.
Cross, Edward Ephraim. Stand Firm and Fire Low: The Civil War Writings of Colonel Edward E. Cross. Boston: University Press of New England, 2003.