Civil War Navies Bookworm

If you read my last post, you know that this semester I engaged in building a Bookworm using a government document collection. My professor challenged me to try my system for parsing the documents on a different, larger collection of government documents. The collection I chose to work with is the Official Records of the Union and Confederate Navies. My Barbary Bookworm took me all semester to build; this Civil War navies Bookworm took me less than a day. I learned things from making the first one!

This collection is significantly larger than the Barbary Wars collection—26 volumes, as opposed to 6. It encompasses roughly the same time span, but 13 times as many words. Though it is still technically feasible to read through all 26 volumes, this collection is perhaps a better candidate for distant reading than my first corpus.

The document collection is broken into geographical sections: the Atlantic Squadron, the West Gulf Blockading Squadron, and so on. Using the Bookworm allows us to look at the words in these documents sequentially by date instead of having to go back and forth between different volumes to get a sense of what was going on in the whole navy at any given time.

Looking at ship types over the course of the war, across all geographies.

Process and Format

The format of this collection is mostly the same as the Barbary Wars collection. Each document starts with an explanatory header (“Letter to the secretary of the navy,” “Extract from a journal,” etc.). Unlike BW, there are no citations at the end of each document. So instead of using the closing citations as document breakers, I used the headers. Though there are many different kinds of documents, the headers are very formulaic, so the regular expressions to find them were not particularly difficult to write.[ref]Ben had suggested that I do the even larger Civil War Armies document collection; however, that collection does not even have headers for the documents, much less citations, so the document breaking process would be exponentially more difficult. It’s not impossible, but I may have to rework my system, and I don’t care about the Civil War that much. 🙂 However, other document collections, such as the U.S. Congressional Serial Set, have exactly the same format, so it may be worth figuring out.[/ref]

Further easing the pain of breaking the documents is the quality of the OCR. Where I fought the OCR every step of the way for Barbary Bookworm, the OCR is really quite good for this collection (a mercy, since spot-checking 26 volumes is no trivial task). Thus, I didn’t have to write multiple regular expressions to find each header; only a few small variants seemed to be sufficient.
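To make the header-based document breaking concrete, here is a minimal sketch in Python. It is not the code from the GitHub repository: the header words in `HEADER_RE` and the function name `split_documents` are assumptions made up for illustration, and the real header variants would have to be read off the volumes themselves.

```python
import re

# Hypothetical header openers (an assumption, not the collection's real list).
# A header is taken to be any line that begins with one of these words.
HEADER_RE = re.compile(
    r"^(?:Report|Letter|Order|Extract|Telegram)\b.*$",
    re.MULTILINE,
)


def split_documents(text):
    """Split OCR text into chunks, each beginning at a header line."""
    starts = [m.start() for m in HEADER_RE.finditer(text)]
    if not starts:
        # No header found: return the whole text as one chunk (if non-empty).
        return [text.strip()] if text.strip() else []

    docs = []
    # Anything before the first header (front matter) becomes its own chunk.
    front_matter = text[:starts[0]].strip()
    if front_matter:
        docs.append(front_matter)

    # Each document runs from its header to the start of the next header.
    for begin, end in zip(starts, starts[1:] + [len(text)]):
        docs.append(text[begin:end].strip())
    return docs


sample = (
    "Letter to the Secretary of the Navy\n"
    "Sir: I have arrived on station.\n"
    "Extract from a journal\n"
    "Calm seas today.\n"
)
for doc in split_documents(sample):
    print(repr(doc))
```

A pattern this loose will also split on any body line that happens to begin with “Report” or “Letter,” so in practice the expression would need to be tightened against the actual header formulas and spot-checked.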

New Features

The high quality OCR enabled me to write a date parser that I couldn’t make work in my Barbary Bookworm. The dates are written in a more consistent pattern, and the garbage around and in them is minimal, so it was easy enough to write a little function to pull out all parts. In the event that certain parts of the dates were illegible, or non-existent, I did make the function find each part of the date in turn and then compile them into one field, rather than trying to extract the dates wholesale. That way, if all I could extract was the year, the function would still return at least a partial date.
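The part-by-part approach can be sketched roughly as follows. This is an illustration under stated assumptions, not the parser from the repository: the function name `parse_date`, the restriction to the years 1861 to 1865, the “Month day, year” ordering, and the ISO-style output field are all choices made for the example.

```python
import re

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]
# Capitalized month names only, so ordinary words like "may" are not matched.
MONTH_RE = re.compile(r"\b(" + "|".join(MONTHS) + r")\b")
# Assumption: every document dates from the war years, 1861 to 1865.
YEAR_RE = re.compile(r"\b(186[1-5])\b")
# Assumption: the day, when present, directly follows the month name.
DAY_AFTER_MONTH_RE = re.compile(r"\s+(\d{1,2})\b")


def parse_date(text):
    """Find year, month, and day separately; return the most complete date.

    Returns 'YYYY-MM-DD', 'YYYY-MM', 'YYYY', or None if no year is found.
    """
    year_match = YEAR_RE.search(text)
    if year_match is None:
        return None
    parts = [year_match.group(1)]

    month_match = MONTH_RE.search(text)
    if month_match:
        month = MONTHS.index(month_match.group(1)) + 1
        parts.append(f"{month:02d}")

        # Only look for a day when a month was found to attach it to.
        day_match = DAY_AFTER_MONTH_RE.match(text[month_match.end():])
        if day_match and 1 <= int(day_match.group(1)) <= 31:
            parts.append(f"{int(day_match.group(1)):02d}")

    return "-".join(parts)


print(parse_date("Off Hampton Roads, March 9, 1862."))
print(parse_date("Hampton Roads, M~rch 1862"))
```

Because each part is searched for on its own, a garbled month name costs only the month and day; the year still comes back.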

Another new feature of this Bookworm is that the full text of the document appears for each search term when you click on the line at a particular date. This function is slow, so if the interface seems to freeze or you don’t seem to be getting any results, give it a few minutes. It will come up. Most of the documents are short enough that it’s easy to scroll through them.

Testing the Bookworm

Some of the same reservations apply to this Bookworm as I detailed in my last post about Barbary Bookworm—they really apply to all text-analysis tools. Disambiguation of ship names and places continues to be a problem. But many of the other problems with Barbary Bookworm are solved with this Bookworm.

The next step that I need to work on is sectioning out the Confederate navy’s documents from the Union navy’s. Right now, you can get a sense of what was important to both navies, but not so easily get a sense of what was important to just one side or the other.

To be honest, I don’t really know enough about the navies of the Civil War to make any significant arguments based on my scrounging around with this tool. There is some very low-hanging fruit, of course.

Unsurprisingly, the terms "monitor" and "ironclad" become more prominent throughout the war.

The Bookworm is hosted online by Ben Schmidt (thanks, Ben!). The code for creating the files is up on GitHub. Please go play around with it!

Feedback

Particularly since I don’t do Civil War history, I’d welcome feedback on both the interface and the content here. What worked? What didn’t? What else would you like to see?

Feel free to send me questions/observations/interesting finds/results by commenting on this post (since there’s not a comment function on the Bookworm itself), by emailing me, or for small stuff, pinging me on Twitter (@abbymullen). I really am very interested in everyone’s feedback, so please scrub around and try to break it. I already know of a few things that are not quite working right, but I’m interested to see what you all come up with.

Database of Officers of the Line

Becoming an officer of the line in the navy is a bit like getting on the tenure track in academia. Not all officers are created equal–officers such as pursers, sailing masters, and chaplains were classified as officers and received the preferential treatment given to officers. But they could never be captains–they were not in line for those sorts of promotions.

Data

The Naval Historical Center has made lists available of the officers of the navy and Marine Corps from 1775 to 1900. This list is very useful, but it’s not in a format that makes it easy to see the data in the aggregate. It includes both warrant officers (non-tenure-track) and line officers (tenure-track).

I wanted to look at the promotion trends of line officers from the early republic. There was no way to isolate those records in the form the NHC provides. So I built a Google spreadsheet that tracks each line officer’s initial date of entry and his subsequent promotions.

Following my desire to track how social connections changed as the navy developed, I’ve divided the officers into 4 groups, or generations. I had initially planned to do 3 generations, but after doing all the data input, I realized that 4 was a more logical divide.

First generation officers entered the service before 1801, at a rank higher than midshipman.

Second generation officers entered the service before the Peace Establishment Act (or by the end of 1801), but as midshipmen. Thus, they essentially became adults in the service, and they learned their craft from the first generation.

Third generation officers entered the service as midshipmen after the Peace Establishment Act but before the end of the War of 1812. Those officers in this generation who became captain rose to that rank in the 1830s and ’40s.

Fourth generation officers entered the service after the War of 1812 had ended. These officers saw almost no wartime service, and many of the ones who achieved captain found themselves having to decide whether to serve in the Union or the Confederacy during the Civil War.
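The four definitions above can be written down as a small rule. This is only a sketch: it works at year granularity, it assumes 1815 (when the Treaty of Ghent was ratified) as the last year of the War of 1812, and the function name `generation` is invented for the example. Cases the definitions do not cover, such as an officer entering above midshipman in 1805, come back as `None` rather than being guessed.

```python
PEA_CUTOFF_YEAR = 1801  # "by the end of 1801", per the definitions above
WAR_END_YEAR = 1815     # assumption: year-level stand-in for the war's end


def generation(entry_year, entry_rank):
    """Return 1-4 for an officer's generation, or None if no rule applies."""
    is_midshipman = entry_rank.strip().lower() == "midshipman"

    if entry_year < PEA_CUTOFF_YEAR and not is_midshipman:
        return 1  # entered before 1801 at a rank above midshipman
    if entry_year <= PEA_CUTOFF_YEAR and is_midshipman:
        return 2  # entered as a midshipman by the end of 1801
    if PEA_CUTOFF_YEAR < entry_year <= WAR_END_YEAR and is_midshipman:
        return 3  # entered as a midshipman before the war ended
    if entry_year > WAR_END_YEAR:
        return 4  # entered after the War of 1812
    return None   # not covered by the four definitions


print(generation(1798, "Lieutenant"))
print(generation(1809, "Midshipman"))
```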

I also marked a few interesting things that weren’t specifically promotion-related. Though I didn’t record dates of exit from the service, if the officer was discharged under the Peace Establishment Act, I marked it in column G as “PEA.” I also marked records where the official record indicates that the officer was killed in a duel (an idle curiosity about whether duels were really as prevalent as most historians have claimed).

Limitations

Promotions in the navy are a bit tricky because the system of ranks changed considerably from 1798 to 1849 (the end point I selected for my data). But there were four standard ranks that prevailed throughout that time period, so for consistency, I tracked only those four ranks: midshipman, lieutenant, master commandant (then commander, an equivalent rank), and captain. It took until the Civil War for ranks above captain (such as commodore and admiral) to be created, so I didn’t record those.

All told, there are 3441 line officers in the NHC database. I’m not interested in all 3441 of them, most of whom never made it past midshipman. Since my project involves social networks of influence, I’m mostly interested in those officers who stayed around long enough to have influence, generally those who made it at least to lieutenant. However, I put all the line officers into my spreadsheet in case someone else wants the data.

There are several specific limitations on my spreadsheet that anyone who wants to use it (all 2 of you in the world) should be aware of.

  1. There are a few rare instances in which an officer entered the service, resigned, and then re-entered later at the same rank or lower. In those instances, I did not mark the second entrance, but rather treated the officer as if he had never left the service.
  2. There are even rarer instances in which, during the late 1790s, officers were given the commission of captain in order to command galleys, but they were never subsequently given other commands. So I left them out of the record entirely.
  3. I noticed a few discrepancies in dates (promotion to lieutenant dated before promotion to midshipman, for instance). Where possible, I merely corrected the obvious typos. Otherwise, I highlighted the cell of the disputed date.
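The kind of date discrepancy described in the third item could be checked for automatically. The sketch below assumes each officer’s record has been loaded as a dictionary from rank name to date; that layout, the function name `out_of_order_ranks`, and the sample dates are all hypothetical and do not describe the actual spreadsheet.

```python
from datetime import date

# The four ranks tracked, lowest to highest.
RANK_ORDER = ["midshipman", "lieutenant", "master commandant", "captain"]


def out_of_order_ranks(promotions):
    """Return (lower_rank, higher_rank) pairs whose dates are reversed.

    `promotions` maps a rank name to the date it was attained; ranks the
    officer never held are simply absent (or mapped to None).
    """
    held = [
        (rank, promotions[rank])
        for rank in RANK_ORDER
        if promotions.get(rank) is not None
    ]
    problems = []
    # Compare each held rank with the next higher rank the officer held.
    for (low_rank, low_date), (high_rank, high_date) in zip(held, held[1:]):
        if high_date < low_date:
            problems.append((low_rank, high_rank))
    return problems


# Made-up record: lieutenant dated before midshipman.
suspect = {"midshipman": date(1800, 5, 1), "lieutenant": date(1799, 3, 1)}
print(out_of_order_ranks(suspect))
```

A check like this only finds candidates; deciding whether a flagged date is an obvious typo or a genuine dispute is still a judgment call.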

Uses

Merely recording all this data has given me a better understanding of how the promotion system worked in the early navy. But I’d like to do some visualizations showing the relative speed of promotion, how batch promotions work, and a few smaller things. So far I haven’t found a visualization program that will do it. (Suggestions are welcome!)

I’m sure there are plenty of other uses for this data, as well. For myself, it will help me to see where promotions don’t follow the general pattern–these aberrant promotions may very well be indicative of an intervention by a social connection. But I hope other people will be able to use it as well.
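One rough way to surface promotions that don’t follow the general pattern is to compare each officer’s wait for lieutenant against the median wait. The sketch below is just that, a sketch: the tuple layout, the function name `flag_fast_promotions`, the 0.5 threshold, and the sample officers are all invented for illustration, and a real analysis would probably compare within a generation rather than across the whole list.

```python
from statistics import median


def flag_fast_promotions(officers, fraction=0.5):
    """Return names of officers promoted much faster than is typical.

    `officers` is a list of (name, midshipman_year, lieutenant_year) tuples.
    An officer is flagged when his wait is below `fraction` times the median.
    """
    waits = [
        (name, lieutenant_year - midshipman_year)
        for name, midshipman_year, lieutenant_year in officers
        if midshipman_year is not None and lieutenant_year is not None
    ]
    if not waits:
        return []
    typical = median(wait for _, wait in waits)
    return [name for name, wait in waits if wait < fraction * typical]


# Made-up officers, not real records.
sample_officers = [
    ("Officer A", 1800, 1807),
    ("Officer B", 1800, 1808),
    ("Officer C", 1801, 1803),
    ("Officer D", 1802, 1809),
    ("Officer E", 1803, None),  # never made lieutenant; ignored
]
print(flag_fast_promotions(sample_officers))
```

A flagged name is only a prompt to go look at the record; a fast promotion could reflect wartime need as easily as a well-placed patron.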

 

Poetry and War: Constitution v. Guerriere

USS Constitution v. HMS Guerriere. Public domain image from Naval Historical Center.

In commemoration of the 200th anniversary of the War of 1812, here’s an excerpt from Columbia’s Naval Victories, a poem about the naval victories of the Americans, written in 1813 by Benjamin Allen.

In war a lion, though a lamb in peace,

Hull [1. A brief biography of Isaac Hull can be found here.] bears the flag of freedom o’er the seas;[2. A motto often flown on ships’ flags was “Free Trade and Sailors’ Rights.”]

Ready to vindicate his country’s fame,

And add new honours to her injur’d name.

Soon Albion’s banner rises on his view [3. A few weeks previous to this battle, the Constitution had successfully evaded the Guerriere, which had then been sailing in a squadron with several other warships. This time, on August 19, 1812, Hull decided to take the chance that the Guerriere was alone.]—

His dauntless soul impels him to pursue.

Of equal force, the ready foemen meet,

And with the cheer of gladness loudly greet.

Here England’s Dacres,[4. Captain James Richard Dacres, captain of the Guerriere] with a gallant band–

There the firm sons of blest Columbia’s strand.

Now roaring rolls the deathful cannon’s sound,

A novel thunder frights the floods around:

The pious soul attendant angels guard,

Or wait to waft him to his last reward.

Short is the contest, carnage soon is o’er,

For Albion’s banner falls, to rise no more.

Low in the briny deep the Guerriere lies ;

The finny tribes of ocean o’er her rise :

Like some forgotten wave she sinks to rest,[5. The author is taking some poetic license here: Though the Guerriere was indeed too badly damaged to be claimed as a prize, Hull ordered it burned, which did of course result in its sinking.]

In all her futile, fleeting, boastings drest.

Modest, but firm, the victor Hull is seen,

With sympathising kindness in his mien,

Aiding the vanquished: he receives them well,

And bide them with himself, like brethren, dwell.