Tag Archives: digital humanities

Civil War Navies Bookworm

If you read my last post, you know that this semester I built a Bookworm using a government document collection. My professor challenged me to try my system for parsing the documents on a different, larger collection of government documents. The collection I chose is the Official Records of the Union and Confederate Navies. My Barbary Bookworm took me all semester to build; this Civil War navies Bookworm took me less than a day. I learned things from making the first one!

This collection is significantly larger than the Barbary Wars collection—26 volumes, as opposed to 6. It encompasses roughly the same time span, but 13 times as many words. Though it is still technically feasible to read through all 26 volumes, this collection is perhaps a better candidate for distant reading than my first corpus.

The document collection is broken into geographical sections (the Atlantic Squadron, the West Gulf Blockading Squadron, and so on). The Bookworm lets us look at the words in these documents sequentially by date, instead of going back and forth between volumes to get a sense of what was going on in the whole navy at any given time.

Looking at ship types over the course of the war, across all geographies.

Process and Format

The format of this collection is mostly the same as the Barbary Wars collection. Each document starts with an explanatory header (“Letter to the secretary of the navy,” “Extract from a journal,” etc.). Unlike the Barbary Wars collection, this one has no citations at the end of each document, so instead of using the closing citations as document breakers, I used the headers. Though there are many different kinds of documents, the headers are very formulaic, so the regular expressions to find them were not particularly difficult to write. 1
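For anyone curious what header-based breaking looks like, here is a minimal sketch in Python. The header pattern is a hypothetical stand-in (the actual regular expressions are in the GitHub code, not reproduced here); it assumes each document begins on its own line with a formulaic phrase such as “Letter …” or “Extract …” and that the header line ends with a period.

```python
import re

# Hypothetical header pattern: a line that opens with a formulaic document
# type and ends with a period. The real expressions may differ.
HEADER_RE = re.compile(
    r"^(?:Letter|Report|Order|Extract|Abstract)\b[^\n]*\.$",
    re.MULTILINE,
)


def split_documents(volume_text):
    """Break one volume's text into documents, each starting at a header.

    Any text before the first header (front matter, say) is discarded.
    """
    starts = [m.start() for m in HEADER_RE.finditer(volume_text)]
    documents = []
    for i, start in enumerate(starts):
        # Each document runs until the next header, or to the end of the volume.
        end = starts[i + 1] if i + 1 < len(starts) else len(volume_text)
        documents.append(volume_text[start:end].strip())
    return documents


sample = (
    "Letter from the Secretary of the Navy to the commanding officer.\n"
    "Sir: You will proceed at once.\n"
    "Extract from the journal of a blockading vessel.\n"
    "Wind fresh from the northward.\n"
)
docs = split_documents(sample)
```

Splitting at the start of each header, rather than at a closing citation, means every document carries its own header line, which is handy later for pulling out the date.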

Further easing the pain of breaking the documents is the quality of the OCR. Where I fought the OCR every step of the way for Barbary Bookworm, the OCR is really quite good for this collection (a mercy, since spot-checking 26 volumes is no trivial task). Thus, I didn’t have to write multiple regular expressions to find each header; only a few small variants seemed to be sufficient.

New Features

The high-quality OCR enabled me to write a date parser that I couldn’t make work in my Barbary Bookworm. The dates are written in a more consistent pattern, and the garbage around and in them is minimal, so it was easy enough to write a little function to pull out all the parts. Because some parts of a date might be illegible or missing altogether, I had the function find each part of the date in turn and then compile them into one field, rather than trying to extract the dates wholesale. That way, if all I could extract was the year, the function would still return at least a partial date.
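A minimal sketch of that part-by-part approach, in Python. The header format it expects (“Place, Month day, year.”) and the function name `parse_date` are assumptions for illustration, not the actual code: the year, month, and day each get their own search, and whatever is found is joined into one partial or full date.

```python
import re

MONTH_NAMES = [
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
]
MONTHS = {name: number for number, name in enumerate(MONTH_NAMES, start=1)}

# Each part of the date gets its own pattern, so one illegible part
# does not prevent the others from being found. Month names must be
# capitalized, which avoids matching "may" or "march" in running text.
YEAR_RE = re.compile(r"\b(18\d\d)\b")
MONTH_RE = re.compile(r"\b(" + "|".join(MONTH_NAMES) + r")\b")
# A day number directly after the month, allowing forms like "9th" or "22d".
DAY_AFTER_MONTH_RE = re.compile(r"\s+(\d{1,2})(?:st|nd|rd|th|d)?\b")


def parse_date(header):
    """Return 'YYYY-MM-DD', 'YYYY-MM', 'YYYY', or None for a header line."""
    year = YEAR_RE.search(header)
    if not year:
        return None  # without a year the document cannot be placed in sequence
    parts = [year.group(1)]
    month = MONTH_RE.search(header)
    if month:
        parts.append(f"{MONTHS[month.group(1)]:02d}")
        day = DAY_AFTER_MONTH_RE.match(header, month.end())
        if day:
            parts.append(f"{int(day.group(1)):02d}")
    return "-".join(parts)
```

For example, `parse_date("Off Charleston, June, 1863.")` still yields `"1863-06"` even though no day is present, which is the graceful degradation described above.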

Another new feature of this Bookworm is that the full text of the document appears for each search term when you click on the line at a particular date. This function is slow, so if the interface seems to freeze or you don’t seem to be getting any results, give it a few minutes. It will come up. Most of the documents are short enough that it’s easy to scroll through them.

Testing the Bookworm

Some of the same reservations apply to this Bookworm as I detailed in my last post about Barbary Bookworm—they really apply to all text-analysis tools. Disambiguation of ship names and places continues to be a problem. But many of the other problems with Barbary Bookworm are solved with this Bookworm.

The next step that I need to work on is sectioning out the Confederate navy’s documents from the Union navy’s. Right now, you can get a sense of what was important to both navies, but not so easily get a sense of what was important to just one side or the other.

To be honest, I don’t really know enough about the navies of the Civil War to make any significant arguments based on my scrounging around with this tool. There are some very low-hanging fruit, of course.

Unsurprisingly, the terms “monitor” and “ironclad” become more prominent throughout the war.

The Bookworm is hosted online by Ben Schmidt (thanks, Ben!). The code for creating the files is up on GitHub. Please go play around with it!


Particularly since I don’t do Civil War history, I’d welcome feedback on both the interface and the content here. What worked? What didn’t? What else would you like to see?

Feel free to send me questions/observations/interesting finds/results by commenting on this post (since there’s not a comment function on the Bookworm itself), by emailing me, or for small stuff, pinging me on Twitter (@abbymullen). I really am very interested in everyone’s feedback, so please scrub around and try to break it. I already know of a few things that are not quite working right, but I’m interested to see what you all come up with.


  1. Ben had suggested that I do the even larger Civil War Armies document collection; however, that collection does not even have headers for the documents, much less citations, so the document breaking process would be exponentially more difficult. It’s not impossible, but I may have to rework my system—and I don’t care about the Civil War that much. :) However, other document collections, such as the U.S. Congressional Serial Set, have exactly the same format, so it may be worth figuring out.

Text Analysis on the Documents of the Barbary Wars

This past semester, I took a graduate seminar in Humanities Data Analysis, taught by Professor Ben Schmidt. This post describes my final project. Stay tuned for more fun Bookworm stuff in the next few days (part 2 on Civil War Navies Bookworm is here).


In the 1920s, the United States government decided to create document collections for several of its early naval wars: the Quasi-War with France, the Barbary Wars, and the Civil War (the War of 1812 did not come until much later, for some reason). These document collections, particularly for the Quasi-War and the Barbary Wars, have become the standard resource for any scholar doing work on these wars. My work on the Barbary Wars relies heavily on this document collection. The Barbary Wars collection includes correspondence, journals, official documents such as treaties, crew manifests, other miscellaneous documents, and a few summary documents put together in the 1820s. 1

It’s quite easy to get bogged down in the multiplicity of mundaneness in these documents—every single day’s record of where a ship is and what the weather is like, for instance. It’s also easy to lose sight of the true trajectory of the conflict in the midst of all this seeming banality. Because the documents in the collection are from many authors in conversation with each other, we can sometimes follow the path of these conversations. But there are many concurrent conversations, and often we do not have the full correspondence. How can we make sense of this jumble?



  1. U.S. Office of Naval Records and Library, Naval Documents Related to the United States Wars with the Barbary Powers (Washington: U.S. Govt. Print. Off., 1939); digitized at http://www.ibiblio.org/anrs/barbary.html.

Digital History and Naval History: Ships in the Night


Ships that pass in the night and speak each other in passing;
Only a signal shown and a distant voice in the darkness;
So on the ocean of life we pass and speak one another,
Only a look and a voice; then darkness again and a silence.
—Henry Wadsworth Longfellow

This month I attended two very different professional conferences. The first, THATCamp CHNM (aka THATCamp Prime), is so unlike normal conferences that it’s billed as an “unconference.” (If you want to know exactly what an unconference is, read the THATCamp About page.) It brings together people from a wide swath of academic disciplines to talk about digital humanities. Sessions ranged from talking about programming languages to teaching digital history to talking about size and scale in academic research. Many of the people in attendance were relatively young; many hold “alt-ac” jobs.

The other conference could not have been more different. Even its title, “From Enemies to Allies: An International Conference on the War of 1812 and its Aftermath,” fits it into a tight disciplinary mold. Though it drew scholars from the United States, the UK, and Canada, all the scholars were primarily historians of the 19th century, and a large proportion were military historians. My fellow panelists and I were among the youngest there by a fair margin; very few of the attendees were graduate students or young scholars. A surprising number of panelists were independent scholars. It was very much a traditional conference, with concurrent panels and two (great) keynote addresses.

I’ll write more about each conference later. For now, I want to talk about where I hope the fields of digital history and naval history may go, based on these two conferences. It has long been my impression that digital humanities and naval history (and military history more generally) are a bit like ships passing in the night. Every once in a while, they graze each other, but they quickly separate again and carry on without much change to either field. Conversations with people at both conferences confirmed this suspicion. When I asked some people at the War of 1812 conference if they’d ever thought about using digital mapping tools or creating online exhibits, the response was generally “I don’t really do computers.” But they were drawing digital maps—in PowerPoint. Similarly, I don’t know anyone who self-identifies as a DHer whose primary academic discipline is military history—at least no one I met at THATCamp CHNM. (Big huge disclaimer here: obviously, I don’t know all the DHers in the world. If you work on military history and do DH, we need to talk. Please email me.) But military history comes up—witness one of the models for Omeka’s Neatline exhibits: the battle of Chancellorsville.

So I found it somewhat amusing that at both conferences, the most interesting outcome for me was related to the other discipline. At THATCamp, I won third place in the Maker Challenge (along with my partner in crime Lincoln Mullen) for creating an argument about promotions of naval officers from 1798 to 1849, which actually came in handy while I was talking to scholars at From Enemies to Allies (FETA). And at FETA, the best contact I made was with a scholar who wants me to help him build a database about engagements during the War of 1812, not unlike the Early American Foreign Service Database. He’s one of those who “doesn’t do computers,” but he understands the values of accessibility and openness that THATCampers hold dear.

Going to the two conferences almost back-to-back highlighted for me how much each field might enrich the other. These connections give me hope that someday soon, digital historians can “speak” naval historians with greater success. And then, not all will be darkness and silence between the two.

Who’s with me? 

A Graph of Diplomatic Wrangling in Algiers

When the United States became independent after the American Revolution, it had to struggle to protect its seaborne commerce in the Atlantic and Mediterranean. Americans had to rely on the goodwill of France, Portugal, and other European powers because the United States lacked the naval power necessary to protect its own shipping.

Historical Background 

Americans had to negotiate with the Barbary states to secure the release of hostages taken by Barbary corsairs, and to decide how much tribute would guarantee the safety of American shipping. The United States quickly felt the bite of diplomatic and military impotence. American diplomats, who had little power of their own, had to rely on the good graces of many others with better connections to the Algerine court. Sometimes, those others helped the American cause; at other times, they weren’t all that helpful; and on a few occasions, they purposely derailed American negotiations.

Richard B. Parker writes about the United States’ relationship with Algiers in Uncle Sam in Barbary: A Diplomatic History, which details the complicated and sometimes absurd relationships of American diplomats, European diplomats and dignitaries, and the court of the Algerine dey. The story is quite complex, which makes it difficult to follow as a narrative, and Parker’s organization doesn’t help matters. (A quick shout-out to Jean Bauer, whose Early American Foreign Service Database was extremely helpful in elucidating the roles of some diplomats whom Parker does not adequately identify.)

The story of American-Barbary diplomacy is all about relationships. Naturally, a story about relationships suggests a network graph as a way to make the situation more intelligible.

Parameters and Characteristics of the Graph

To represent the American-Barbary diplomatic network, I created the graph in Gephi. I hate Gephi. I like Gephi. (You know what I mean.) This graph represents interactions from approximately 1785 to 1800. The last interaction I recorded was between the dey of Algiers and William Bainbridge in September 1800; this interaction was the first one in which the navy was directly involved (though it was a diplomatic interaction, not a military one). I decided to end my graph there because I’m most interested in how the navy changed things for U.S. relations with the Barbary states and with the European nations who had hitherto helped those relations.

The nodes are people who had a connection to Barbary diplomacy. The edges are letters and meetings that Parker writes about. I checked as many as I could against American State Papers, and I will continue to document the interactions more explicitly than Parker does in his bibliography (where he records only the collection his source comes from, not the exact document).

Each node is color-coded by nationality; a next step is to record where these people were actually living while they were engaged in Barbary negotiations.

Green: Algiers
Red: United States
Purple: England
Light blue: Tripoli
Darker blue: France
Light purple: Spain
Yellow: Portugal
Orange: Sweden
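For anyone who wants to build a similar graph, here is a minimal Python sketch of how nodes and edges like these could be laid out for Gephi’s spreadsheet (CSV) import, which recognizes `Id` and `Label` columns for nodes and `Source`, `Target`, and `Type` columns for edges; an extra column such as `Nationality` comes in as an attribute that nodes can then be colored by. The function name and the `Kind` column are hypothetical, and the two people and single interaction are placeholders based on the Bainbridge interaction mentioned above, not an export of the actual graph.

```python
import csv
import io


def to_gephi_csv(people, interactions):
    """Return (nodes_csv, edges_csv) as strings for Gephi's spreadsheet import.

    people: {node_id: (label, nationality)}
    interactions: [(source_id, target_id, kind)], kind being "letter" or "meeting"
    """
    nodes = io.StringIO()
    writer = csv.writer(nodes, lineterminator="\n")
    writer.writerow(["Id", "Label", "Nationality"])
    for node_id, (label, nationality) in people.items():
        writer.writerow([node_id, label, nationality])

    edges = io.StringIO()
    writer = csv.writer(edges, lineterminator="\n")
    writer.writerow(["Source", "Target", "Type", "Kind"])
    for source, target, kind in interactions:
        # Assumption: a letter has a sender and a recipient; a meeting does not.
        direction = "Directed" if kind == "letter" else "Undirected"
        writer.writerow([source, target, direction, kind])

    return nodes.getvalue(), edges.getvalue()


# Placeholder data for illustration only.
people = {
    "dey": ("Dey of Algiers", "Algiers"),
    "bainbridge": ("William Bainbridge", "United States"),
}
nodes_csv, edges_csv = to_gephi_csv(people, [("bainbridge", "dey", "meeting")])
```

Keeping the nationality as a node attribute, rather than baking colors into the file, means the color scheme can be changed inside Gephi without re-exporting the data.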

The graph isn’t perfect (obviously). There’s a lot more to be done here. This graph is based solely on Parker’s book, which I’m not wholly convinced is accurate. In addition, Parker addresses only diplomatic relations with Algiers, not the other three Barbary states (Tripoli, Tunis, and Morocco). Furthermore, I haven’t attached dates to each edge, simply because Parker doesn’t provide dates for all of the interactions. A more dynamic timeline of the network changes would be most instructive. So there’s a lot more data that needs to be added to this graph. But I think it’s a good start toward understanding the global nature of American relations with the Barbary states, which culminated in the Barbary Wars of 1801-1805 and 1815.



Boston-Area Days of DH Wrap-up

[cross-posted to HASTAC.org]

Now that it’s been almost a month since the Boston-Area Days of DH, I figured I’d better write a wrap-up of the conference. It was my very great pleasure to help Prof. Ryan Cordell organize the conference, and along the way I learned a lot about DH and about scholarly work in general (and about scheduling and organization and making sure the coffee gets to the right place…).

The Boston-Area Days of DH conference was sponsored by Northeastern University’s NULab for Texts, Maps, and Networks. Originally, it was designed to coincide with the worldwide Day of DH, sponsored by CenterNet. It would do in a conference what Day of DH does online: highlight the work that Boston-area digital humanists are doing and start conversations based on that work. In addition, we tried to include sessions to help digital humanists do their work better.

Day 1 Breakdown

Our first session, the lightning talks, was designed to highlight as many projects as possible in a short amount of time. All the presentations were interesting, but I’d like to especially mention a couple. First, the Lexomics group from Wheaton College presented on their text analysis work on Old English texts. This group was unusual both for the work they did and also for their place in the field: all the presenters were undergraduates at Wheaton. I found it very heartening to see undergraduates doing serious scholarly work using digital humanities. Second, Siobhan Senier’s work on Native American literature was especially inspiring. I love how she is using digital tools to help expose and analyze literature of New England Native Americans. She’s using Omeka as a digital repository for Native American literature, much of which is not literature in words, but rather in art or handicraft (such as baskets). I think this is a perfect use for the Omeka platform.

After the lightning talks, we were able to run a set of workshops twice during the first day of the conference. The topics ranged from network analysis (taught by Jean Bauer), to text analysis (taught by David Smith), to historical GIS (taught by Ryan Cordell). I heard lots of good feedback about how helpful these workshops were, though I wasn’t able to attend any myself.

The keynote address has to rate as one of the most entertainingly educational talks I’ve ever heard. Matt Jockers, from the University of Nebraska, Lincoln, sparred with Julia Flanders from Brown University in a mock debate over the relative merits of big data and small data. They’ve posted their whole talk, along with some post-talk comments, on their respective blogs (Matt’s and Julia’s). The talk is well worth the read, so rather than summarizing it here, I’ll just entreat you to go to the source itself.

Day 2 Breakdown

On Day 2, we suffered an environmental crisis: a sudden snowstorm on Monday night made travel a much greater hassle than it already is in Boston. As a result, our numbers were greatly reduced, but we soldiered on, sans coffee and muffins.

Our first session was a series of featured talks about specific projects. Topics ranged from gaming, to GIS, to pedagogy, to large-scale text analysis. Augusta Rohrbach discussed how a game she’s working on, Trifles, brings history and literature into a game environment to teach students about both, while also engaging questions of gender and social issues. Michael Hanrahan talked about how GIS can reframe questions about rebellions in England in 1381 and, on a wider scale, questions of information dissemination. Shane Landrum talked about how he uses digital technology to teach at a large, public, urban university, and the challenges of doing DH in a place where computer access and time to “screw around” are real problems. And Ben Schmidt talked about doing textual analysis on large corpora using Bookworm, a tool created at the Harvard Cultural Observatory.

The final session of the conference was a grants workshop with Brett Bobley, director of the NEH’s Office of Digital Humanities. By staging a mock panel discussion such as might occur in a real review of grant proposals, Brett was able to instruct us about what the NEH-ODH is looking for in grant proposals, and how the grant-awarding process works. I found the issues that Brett raised about grant proposals to be helpful in thinking through all of my work: am I being specific about my objectives? about who this will reach? about how exactly it’s all going to get done? These questions ought to inform our practice not just for grants, but for all the work we do.


All in all, despite some environmental setbacks, I think the conference was a great success. A friend, upon seeing the program, remarked to me, “Wow, a digital humanities conference that’s not a THATCamp!” I’m all for THATCamps, but I do think that pairing this sort of conference with the THATCamp model allows us to talk about our work in different ways, all of which are valuable. So, with some trepidation, I will join those who have already called for this conference to become an annual event. (After all, with a year of experience under our belt, what could go wrong?)

Database of Officers of the Line

Becoming an officer of the line in the navy is a bit like getting on the tenure track in academia. Not all officers were created equal: pursers, sailing masters, and chaplains were classified as officers and received the preferential treatment given to officers, but they could never be captains. They were not in line for those sorts of promotions.


The Naval Historical Center has made available lists of the officers of the navy and Marine Corps from 1775 to 1900. These lists are very useful, but they’re not in a format that makes it easy to see the data in the aggregate. They include both warrant officers (non-tenure-track) and line officers (tenure-track).

I wanted to look at the promotion trends of line officers from the early republic. There was no way to isolate those records in the form the NHC provides. So I built a Google spreadsheet that tracks each line officer’s initial date of entry and his subsequent promotions.

Following my desire to track how social connections changed as the navy developed, I’ve divided the officers into 4 groups, or generations. I had initially planned to do 3 generations, but after doing all the data input, I realized that 4 was a more logical divide.

First generation officers entered the service before 1801, at a rank higher than midshipman.

Second generation officers entered the service before the Peace Establishment Act (or by the end of 1801), but as midshipmen. Thus, they essentially became adults in the service, and they learned their craft from the first generation.

Third generation officers entered the service as midshipmen after the Peace Establishment Act but before the end of the War of 1812. Those officers in this generation who became captain rose to that rank in the 1830s and ’40s.

Fourth generation officers entered the service after the War of 1812 had ended. These officers saw almost no wartime service, and many of the ones who achieved captain found themselves having to decide whether to serve in the Union or the Confederacy during the Civil War.
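The four definitions above translate fairly directly into a classification rule. A minimal sketch in Python, assuming each officer’s entry date and entry rank are available from the spreadsheet; the function name is hypothetical, and the end date of the War of 1812 is an assumption (17 February 1815, when ratifications of the Treaty of Ghent were exchanged), since the definitions only say “the end of the War of 1812.”

```python
from datetime import date

# Cutoffs from the generation definitions. The War of 1812 end date is an
# assumption; the definitions do not pin it to a specific day.
FIRST_GEN_CUTOFF = date(1801, 1, 1)     # "entered the service before 1801"
END_OF_1801 = date(1801, 12, 31)        # "or by the end of 1801"
END_OF_WAR_OF_1812 = date(1815, 2, 17)


def generation(entry_date, entry_rank):
    """Assign an officer to generation 1-4 from his entry date and rank."""
    if entry_date < FIRST_GEN_CUTOFF and entry_rank != "midshipman":
        return 1
    if entry_date <= END_OF_1801:
        # Meant for midshipmen; a non-midshipman entering during 1801 also
        # lands here, a case the definitions leave open.
        return 2
    if entry_date <= END_OF_WAR_OF_1812:
        return 3
    return 4
```

Writing the rules out this way makes the borderline cases visible (for instance, a lieutenant who entered in mid-1801), which is exactly where a by-hand division is most likely to be inconsistent.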

I also marked a few interesting things that weren’t specifically promotion-related. Though I didn’t record dates of exit from the service, if the officer was discharged under the Peace Establishment Act, I marked it in column G as “PEA.” I also marked records where the official record indicates that the officer was killed in a duel (out of idle curiosity about whether duels were really as prevalent as most historians have claimed).


Promotions in the navy are a bit tricky because the system of ranks changed considerably from 1798 to 1849 (the end point I selected for my data). But four standard ranks prevailed throughout that time period, so for consistency, I tracked only those four: midshipman, lieutenant, master commandant (later renamed commander, an equivalent rank), and captain. Ranks above captain (such as commodore and admiral) were not created until the Civil War, so I didn’t record those.

All told, there are 3441 line officers in the NHC database. I’m not interested in all 3441 of them, most of whom never made it past midshipman. Since my project involves social networks of influence, I’m mostly interested in those officers who stayed around long enough to have influence, generally those who made it at least to lieutenant. However, I put all the line officers into my spreadsheet in case someone else wants the data.

There are several specific limitations on my spreadsheet that anyone who wants to use it (all 2 of you in the world) should be aware of.

  1. There are a few rare instances in which an officer entered the service, resigned, and then re-entered later at the same rank or lower. In those instances, I did not mark the second entrance, but rather treated the officer as if he had never left the service.
  2. There are even rarer instances in which, during the late 1790s, officers were given the commission of captain in order to command galleys, but they were never subsequently given other commands. So I left them out of the record entirely.
  3. I noticed a few discrepancies in dates (promotion to lieutenant dated before promotion to midshipman, for instance). Where possible, I merely corrected the obvious typos. Otherwise, I highlighted the cell of the disputed date.
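The third kind of discrepancy can be caught mechanically. A minimal Python sketch, assuming each spreadsheet row has been read into a dictionary keyed by rank name (these keys are hypothetical, not the spreadsheet’s actual column headers): it compares each recorded promotion with the next rank up and reports any pair in which the higher rank carries the earlier date, i.e. the cells that need a typo fix or a highlight.

```python
from datetime import date

# The four ranks tracked in the spreadsheet, in order of seniority.
RANKS = ["midshipman", "lieutenant", "master commandant", "captain"]


def out_of_order(record):
    """Return (lower_rank, higher_rank) pairs whose dates run backwards.

    record maps rank names to promotion dates; ranks never reached are
    absent or None and are skipped.
    """
    dated = [(rank, record[rank]) for rank in RANKS if record.get(rank)]
    problems = []
    for (rank1, date1), (rank2, date2) in zip(dated, dated[1:]):
        if date2 < date1:
            problems.append((rank1, rank2))
    return problems
```

For example, a record with a lieutenant’s commission dated a year before the midshipman’s warrant returns `[("midshipman", "lieutenant")]`.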


Merely recording all this data has given me a better understanding of how the promotion system worked in the early navy. But I’d like to do some visualizations showing the relative speed of promotion, how batch promotions worked, and a few smaller things. So far I haven’t found a visualization program that will do it. (Suggestions are welcome!)
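Even before settling on a visualization program, the relative speed of promotion can be computed straight from the spreadsheet. A minimal Python sketch, assuming the same hypothetical per-officer dictionaries plus an `entry` date: it measures years from entry to a given rank and takes the median over the officers who reached that rank, a number that could then be compared across the four generations. The officers below are made up for illustration.

```python
from datetime import date
from statistics import median


def years_between(start, end):
    """Elapsed time in years, using an average year of 365.25 days."""
    return (end - start).days / 365.25


def median_years_to(officers, rank):
    """Median years from entry to `rank`, over officers who reached it."""
    spans = [years_between(o["entry"], o[rank]) for o in officers if o.get(rank)]
    return median(spans) if spans else None


# Made-up officers, for illustration only.
officers = [
    {"entry": date(1800, 1, 1), "lieutenant": date(1807, 1, 1)},
    {"entry": date(1810, 1, 1), "lieutenant": date(1812, 1, 1)},
    {"entry": date(1812, 1, 1)},  # never promoted: excluded from the median
]
```

The median is used rather than the mean so that a handful of unusually fast or slow promotions does not dominate the comparison; those outliers are exactly the aberrant cases worth examining separately.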

I’m sure there are plenty of other uses for this data, as well. For myself, it will help me see where promotions don’t follow the general pattern; these aberrant promotions may very well indicate an intervention by a social connection. But I hope other people will be able to use it as well.


DigiWriMo Wrap-Up

Today’s the last day of November. Advent starts in two days; classes end in four days (for me, anyhow); and today DigiWriMo is ending.

So, what was DigiWriMo like for me?

Maybe I should start with how I did on my goals.

Goal #1: Completed. All officer bios on Preble’s Boys are completed.

Goal #2: Mostly completed. In the last few days, I did slack off. Bad Abby. But I got a fair number up.

Goal #3: Technically completed. I didn’t do the intensive reflection I was intending. But that’s ok. I wrote some other pretty good blog posts.


I purposely didn’t tax myself all that much for this DigiWriMo. This is my first semester in grad school in a long time, and I wasn’t sure how much time my projects would take down the stretch. In retrospect, that was probably a wise decision.

Nevertheless, I did reap some rewards from doing DigiWriMo.

Because of the rapidity with which I wrote the officer bios, I saw some connections and similarities that I feel certain I wouldn’t have seen otherwise. In addition, the group has started to really cohere as a unit for me; before this month I viewed them as individuals who were just arbitrarily grouped together (although I do still think their grouping is somewhat arbitrary).

I also gained an appreciation for other people’s amazing work that they wrote about for DigiWriMo. I feel somewhat more connected to the DH community than I did before DigiWriMo, even if it’s a somewhat one-way connection for now.

Because I spent so much time in the blogosphere and on my Omeka site this month, when a speaker at Northeastern mentioned the possibility of doing a department-based group blog, I jumped at the idea. Thanks in part to DigiWriMo, we successfully launched Global History in the Digital Age in the middle of the month. This blog’s authors are members of Northeastern University’s history department, and we’re writing about things that we’re doing in history right now, digital or not. (Please read and follow!)

Global History in the Digital Age is giving some members of my cohort the chance to get their ideas out on the Web without having to deal with the admin stuff of running a blog. So DigiWriMo did something not just for me but also for at least four people in my department (so far). (Since I started the blog, I think I should get DigiWriMo credit for all of the posts on it right now. Just kidding. :) )


I’m sure the makers of DigiWriMo will be making structural changes to the event for next year (though mad props to them for a really fantastic month of writing!). For myself, there are a few things I’d like to do or see done for next year.

1. I’d like to see more historians participating. Perhaps I just wasn’t looking in the right place, but I seemed a bit like a horse among a herd of zebras.

I’ll probably be proselytizing DigiWriMo next year several months in advance of November. I’d love to see some of my cohort at Northeastern participate. Now that we have our blog up and running, they have the platform to do the writing.

2. I’d like to do more substantive collaboration with other historians. This follows from No. 1, of course. I enjoyed participating in the opening-day collaborative poem and the collaborative novel, but I’m not much of a creative writer. (I still need pen and paper to do most of my creative writing.) But I would love to see some serious historical research collaborations.

More historians would also be nice for the Twitter discussions, since the types of digital writing that might be discussed would be different from the literature-types’ writing.

3. Next year, I’m doing 50,000. You can hold me to that.


Total word count for the month (not counting FB and Twitter words): 13,800 words

Lessons from a Google Fusion Table Graph

Armed with new and improved service record data, last night I set out to create a new network graph in Gephi, to see whether the new data alone would help mitigate some of last time’s problems.

To be frank, Gephi beat me. My graph is so small, and my screen is so small, and the zoom function in the graph window is so bad (at least, I couldn’t figure it out) that I couldn’t really see my graph in order to draw any conclusions. All my data imported correctly, though, so I knew there was hope.

I turned instead to Google Fusion Tables, an experimental data visualization app from Google. Unfortunately, it appears that the data tables work completely differently from Gephi’s, so I did have to do some reformatting. (This isn’t a huge problem for a graph with only 34 edges, but it would be a pain for something bigger.)

Google Fusion Tables

For this small network graph, Google Fusion Tables worked out very well. The graph itself is clear and easily readable, and it’s relatively simple to remove nodes and see what happens. Fewer options for manipulating the data mean that the graph renders quickly. It’s also nice to be able to hover over a node and see the attached edges highlighted.

Google Fusion Tables does do a few annoying things, some of which may be possible to disable. It would be nice if nodes could be pinned to a specific location for ease of reading (as is, you can pull a node to a general area, but it won’t stay exactly where you put it). Also, it would be nice if the graph would hold still! Some element of the graph always seems to be moving.

More options for edges would be helpful too. I’d like to be able to see the edge labels that are in my chart, and I’d also like to be able to click on an edge for more data, just like you can see the highlighted edges when you hover over a node.

Observations about the Graph

I’ve been doing some reading up about social network graphs in this book. But this network I’ve created is not actually a social network: there are not any connections that can accurately be predicted by network theory. Why? Because this set of data is about concurrent service, not something that the men themselves control (for the most part).

So how then is this helpful?

First, even though new connections cannot be posited, the graph does show the high degree of connectedness between some of the officers. Thomas Macdonough, for instance, has multiple connections to many people. The same is true for David Porter. Macdonough and Porter have some shared connections, but they also have some unique connections. Hopefully, seeing the connected officers through the eyes of Porter and Macdonough may yield information about them.
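Degree and shared-versus-unique connections are simple to compute from the edge list itself. A minimal Python sketch, assuming the concurrent-service data is a list of (officer, officer) pairs; the letters stand in for officers and are placeholders, not the actual Preble’s Boys data. An officer’s degree is the size of his neighbor set, and comparing two neighbor sets gives the shared and unique connections discussed above.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Map each officer to the set of officers he served alongside."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


def compare_connections(adjacency, x, y):
    """Split two officers' connections into shared and unique sets."""
    conn_x = adjacency.get(x, set())
    conn_y = adjacency.get(y, set())
    shared = (conn_x & conn_y) - {x, y}
    only_x = conn_x - conn_y - {y}
    only_y = conn_y - conn_x - {x}
    return shared, only_x, only_y


# Placeholder edges for illustration only.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("B", "E")]
adjacency = build_adjacency(edges)
degree_of_a = len(adjacency["A"])  # number of distinct officers A served with
```

Using sets means a pair of officers who served together on several ships still counts as a single connection; counting repeated service would call for a weighted edge instead.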

Second, once I do get the social networks mapped out, it will be interesting to compare the two graphs, seeing whether the connections established shipboard continue into correspondence. It seems unlikely that officers would write directly to each other without previous personal connections, and concurrent service seems the most likely place for that connection to have occurred.

The comparison will be a little tricky, because it will involve networks that evolve over time. In order to create these networks, I’ll probably have to move back over to Gephi. Maybe by then I’ll have figured it out a little more.

Mapping only the Preble's Boys' connections to each other gives an incomplete picture of their service records. So the next step will be to add connections to other officers, especially prominent ones such as Thomas Truxtun and John Rodgers. After that, I'll map squadron service to establish more of these formal connections.


The question I’ve been asked before about this sort of network is: how is this more helpful than a spreadsheet? The value here is how easy it is to remove nodes and see the resulting changes in the network. You can’t really do that with a spreadsheet. As the networks get more complex, and they have to change over time, visualization is going to be much more helpful than a spreadsheet.
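Here is a small illustration of "remove a node and see what changes," as a Python sketch rather than anything Fusion Tables or Gephi does internally. The network is hypothetical (placeholder officers, one of them a well-connected hub), and `components` is my own illustrative helper: it finds the connected groups of officers with a breadth-first search, optionally after dropping some nodes.

```python
from collections import deque

# Hypothetical network: one well-connected hub officer and four others.
NODES = ["Officer H", "Officer A", "Officer B", "Officer C", "Officer D"]
EDGES = [
    ("Officer H", "Officer A"),
    ("Officer H", "Officer B"),
    ("Officer H", "Officer C"),
    ("Officer C", "Officer D"),
]

def components(nodes, edges, removed=()):
    """Connected components of the undirected graph after dropping `removed` nodes."""
    removed = set(removed)
    adj = {n: set() for n in nodes if n not in removed}
    for a, b in edges:
        if a in adj and b in adj:  # skip edges touching a removed node
            adj[a].add(b)
            adj[b].add(a)
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        group, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            group.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        groups.append(group)
    return groups

print(len(components(NODES, EDGES)))                         # prints 1: whole network
print(len(components(NODES, EDGES, removed=["Officer H"])))  # prints 3: hub removed
```

Removing the hub splits one network into three pieces, which is exactly the kind of change that's hard to see by scanning rows in a spreadsheet.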


As always, I’d welcome any insights on my network thoughts!


DigiWriMo Halftime Report

Today is November 15, the halfway mark in DigiWriMo/AcWriMo. It’s time to check in and see how my DigiWriMo goals are progressing.

Here are the goals:

1. Write one officer bio every day for the first 17 days, taking off Sundays.

I'm happy to report that I'm right on target. Today, I completed the last of my officer bios: Stephen Decatur.

2. Write one or two ship bios for the remaining days. (Take Thanksgiving Day and Sundays off.)

Since I just completed the officer bios, tomorrow begins the ship bios. These are going to take more work because for most of them, there is no one authoritative source to consult. 

3. Blog about the progress and challenges of the site at least twice during the month.

Well, here’s blog post number one about the progress. Coming soon (in the next day or two): blog post number one about the challenges and benefits of DigiWriMo.

How's the word count? Well, I never planned to make it to 50,000, which is good, since I'm at 7,109. I do hope to reach around 25,000, but to me, getting the content up is more important than getting a certain number of words out. (Also, I'm not counting tweets. Maybe I should.)

The Lessons of a Bad Network Graph

Spurred by our DH reading group at Northeastern, as well as my general tendency to jump into things before really knowing what I’m doing, I decided a few weeks ago to download Gephi and see what sort of rudimentary networks I could create.

I’d been cataloging the service record of each of my Preble’s Boys officers, setting up the chart so that I could see concurrent service. I started out just looking to see whether any of the Boys had actually served on the same ship as Edward Preble, but as I created the chart (the link here is to a more fleshed-out chart with more comprehensive data), some other patterns began to emerge.

So I thought, let’s plug this into Gephi and see what happens! I set up my network, fumbling through the Gephi readme to set up a very basic network in which the nodes were the officers and the ships were the edges.
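For anyone wondering how a service chart turns into a network, here is a minimal Python sketch of the idea, under assumptions I'm supplying: each record is (officer, ship, first year, last year), and two officers get an edge, labeled with the ship, when their years aboard the same ship overlap. The records are hypothetical placeholders, not my actual chart, and `concurrent_service_edges` and `degree` are illustrative helpers, not anything Gephi provides. The degree count at the end is the "number of links" a graph like mine displays.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical service records: (officer, ship, first_year, last_year).
# Placeholder names and dates, not real service data.
RECORDS = [
    ("Officer A", "Ship X", 1801, 1803),
    ("Officer B", "Ship X", 1802, 1804),
    ("Officer C", "Ship X", 1805, 1806),
    ("Officer A", "Ship Y", 1804, 1805),
    ("Officer C", "Ship Y", 1805, 1807),
]

def concurrent_service_edges(records):
    """Map each pair of officers to the set of ships they served on together."""
    by_ship = defaultdict(list)
    for officer, ship, start, end in records:
        by_ship[ship].append((officer, start, end))
    edges = defaultdict(set)
    for ship, crew in by_ship.items():
        for (o1, s1, e1), (o2, s2, e2) in combinations(crew, 2):
            # Year ranges overlap when each begins no later than the other ends.
            if o1 != o2 and s1 <= e2 and s2 <= e1:
                edges[tuple(sorted((o1, o2)))].add(ship)
    return dict(edges)

def degree(edges):
    """Number of distinct officers each officer is linked to."""
    counts = defaultdict(int)
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return dict(counts)

EDGES = concurrent_service_edges(RECORDS)
for (a, b), ships in sorted(EDGES.items()):
    print(a, "--", b, "via", ", ".join(sorted(ships)))
print(degree(EDGES))
```

Note that Officer B and Officer C both served on Ship X but get no edge, because their years don't overlap. The flip side is the lesson below: any gap in the service records silently becomes a missing edge.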

I knew what was coming before I rendered the graph as a network visualization, but I was still a little surprised when I saw it. What I saw was a network that I knew from all my research heretofore to be completely false.


(I apologize for the crazy way the graph sort of goes off the page. I tried every setting I could find to get it not to do that. Some mysteries of Gephi remain hidden to me.)

My initial reaction was to scrap the whole thing and start my thinking about networks all over. But on further examination, I realized that this graph still had something to teach me.

First, I learned the importance of good data. This graph shows Stephen Decatur with only two links to anyone, which is false. It also makes Edward Preble look like an almost tangential figure, which is equally false. The person with the most links is David Porter, who is an important figure but not that important. So why does the graph look like this?

Simply put, this is a bad data set. It gets at my question (how do these people link together?) through only a very small subset of their interactions with each other. I don't even have complete service records for some of these men, so there may be connections missing from my chart. In addition, these men interacted on several levels beyond concurrent service on a ship (concurrent squadron service, shoreside interaction, correspondence, indirect influences…the list goes on). So the data set is quite incomplete.

What this bad data set teaches me is that the meaningful network of these men is going to be quite complex. It will likely need to be organized on several different levels of interaction, over time, and perhaps even spatially (do men feel others' influence more when they're at sea than when they're landbound? I don't know).

Second, I saw new connections, forged through unintended groupings. Since this is a bad graph, it's tempting to say that all the links it made between people are bogus. However, I realized that there is at least one interesting phenomenon going on that I hadn't thought of before, and that may be borne out by the documentary evidence.

This phenomenon, which may actually be a real breakthrough in my analysis, is the appearance of two groups. If you mentally draw a connection between Stephen Decatur and Edward Preble, a loose group forms around them. The graph already shows a clique: the group with David Porter and William Bainbridge. What's the connection between these two groups?

The two groups roughly correspond to (1) those who were aboard the USS Philadelphia when it grounded in Tripoli Harbor, and (2) those who volunteered for the mission led by Stephen Decatur to destroy the Philadelphia. There are some outliers, officers who were not involved in that series of events in any way (Lewis Warrington, for instance), and one curious anomaly: Charles Stewart, who was not aboard the Philadelphia but is well ensconced in that group of officers. I'm eager to see what happens to those men once there's more data.

Without having done any other research yet into this grouping, I have an inkling that this way of looking at Preble's Boys may reveal more about their careers after 1803 than their link to Edward Preble does.


So what's the major lesson for me? When I next take on Gephi, I'll be armed with a lot more data, and even if the results are surprising, I'll keep my eyes open for possibilities I didn't see coming down the pike.

I’d welcome any other insights on my first foray into network analysis.