The Lessons of a Bad Network Graph

Spurred by our DH reading group at Northeastern, as well as my general tendency to jump into things before really knowing what I’m doing, I decided a few weeks ago to download Gephi and see what sort of rudimentary networks I could create.

I’d been cataloging the service record of each of my Preble’s Boys officers, setting up the chart so that I could see concurrent service. I started out just looking to see whether any of the Boys had actually served on the same ship as Edward Preble, but as I created the chart (the link here is to a more fleshed-out chart with more comprehensive data), some other patterns began to emerge.

So I thought, let’s plug this into Gephi and see what happens! Fumbling through the Gephi readme, I set up a very basic network in which the officers were the nodes and the ships were the edges.
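For anyone who wants to try the same setup, here is a minimal sketch of how a service chart could be turned into an edge list, with officers as nodes and a shared ship as the edge. Everything in it is an assumption for illustration: the records are hypothetical placeholders ("Officer A", "Ship X"), not real service data, the year-overlap rule is just one way to define concurrent service, and the `Source`/`Target`/`Type`/`Label` column names are my understanding of what Gephi's spreadsheet import expects.

```python
import csv
import io
from itertools import combinations

# Hypothetical placeholder records: (officer, ship, first_year, last_year).
# These are NOT real service data; swap in rows from the actual chart.
SERVICE = [
    ("Officer A", "Ship X", 1800, 1802),
    ("Officer B", "Ship X", 1801, 1803),
    ("Officer C", "Ship X", 1803, 1804),
    ("Officer C", "Ship Y", 1805, 1806),
    ("Officer A", "Ship Y", 1805, 1805),
]


def concurrent_edges(records):
    """Return one (officer, officer, ship) edge per pair with overlapping service."""
    edges = []
    for (o1, s1, a1, b1), (o2, s2, a2, b2) in combinations(records, 2):
        same_ship = s1 == s2
        # Two year ranges overlap when each one starts before the other ends.
        overlap = a1 <= b2 and a2 <= b1
        if same_ship and overlap and o1 != o2:
            edges.append((o1, o2, s1))
    return edges


def to_gephi_csv(edges):
    """Serialize edges as CSV text with an edge-table style header (assumed)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Type", "Label"])
    for source, target, ship in edges:
        writer.writerow([source, target, "Undirected", ship])
    return buf.getvalue()


EDGES = concurrent_edges(SERVICE)
CSV_TEXT = to_gephi_csv(EDGES)
```

Note that Officer A and Officer C both served on Ship X in this made-up data, but not at the same time, so no edge is drawn between them for that ship. That is exactly the kind of rule that decides what the final graph looks like.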

I knew what was coming before I rendered the graph as a network visualization, but I was still a little surprised when I saw it. What I saw was a network that I knew from all my research heretofore to be completely false.

[Embedded PDF of the network graph: https://abbymullen.org/smallnetwork.pdf]

(I apologize for the crazy way the graph sort of goes off the page. I tried every setting I could find to get it not to do that. Some mysteries of Gephi remain hidden to me.)

My initial reaction was to scrap the whole thing and start my thinking about networks all over. But on further examination, I realized that this graph still had something to teach me.

First, I learned the importance of good data. This graph shows Stephen Decatur as having only two links to anyone, which is false. It also makes Edward Preble look like an almost tangential figure, which is equally false. The person with the most links is David Porter, who is an important figure but not that important. So why does the graph look like this?

Simply put, this is a bad data set. It starts to get at my question (How do these people link together?), but only through a very small subset of their interactions with each other. I don’t even have complete service records for some of these men, so it’s possible that there are connections missing from my chart. In addition, these men had several levels of interaction beyond just concurrent service on the same ship (concurrent service in the same squadron, shoreside interaction, correspondence, indirect influences…the list goes on). So the data set is quite incomplete.

What this bad data set teaches me is that the meaningful network of these men is going to be quite complex. It’s likely to need to be organized on several different interaction levels, as well as interactions over time and even perhaps spatially (do men feel others’ influence more when they’re at sea than when they are landbound? I don’t know).
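One way to picture what "several different interaction levels" and "interactions over time" might mean as data is to give every link a kind and a date range, then count links one layer or one year at a time. This is only a sketch under my own assumptions: the `Interaction` record, the `link_counts` helper, and the kind names are all hypothetical, and the rows are placeholders rather than real evidence.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """One link between two officers (hypothetical structure)."""
    a: str
    b: str
    kind: str   # e.g. "ship", "squadron", "correspondence"
    start: int  # first year of the interaction
    end: int    # last year of the interaction


# Hypothetical placeholder rows, NOT real historical data.
INTERACTIONS = [
    Interaction("Officer A", "Officer B", "ship", 1801, 1802),
    Interaction("Officer A", "Officer C", "squadron", 1803, 1804),
    Interaction("Officer B", "Officer C", "correspondence", 1806, 1810),
    Interaction("Officer A", "Officer C", "correspondence", 1807, 1807),
]


def link_counts(interactions, kinds=None, year=None):
    """Count distinct partners per officer, optionally for some kinds or one year."""
    partners = defaultdict(set)
    for it in interactions:
        if kinds is not None and it.kind not in kinds:
            continue
        if year is not None and not (it.start <= year <= it.end):
            continue
        partners[it.a].add(it.b)
        partners[it.b].add(it.a)
    return {name: len(others) for name, others in partners.items()}


SHIP_ONLY = link_counts(INTERACTIONS, kinds={"ship"})
ALL_LAYERS = link_counts(INTERACTIONS)
IN_1807 = link_counts(INTERACTIONS, year=1807)
```

In this made-up example, Officer C does not appear at all in the ship-only counts but has two links once every layer is included, which is the same kind of distortion my graph shows for Decatur and Preble.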

Second, I saw new connections, forged through unintended groupings. Since this is a bad graph, it’s tempting to say that all the links it made between people are bogus. However, I realized that there is at least one interesting phenomenon going on that I hadn’t thought of before, but that perhaps is borne out by the documentary evidence.

This phenomenon, which may actually be a real breakthrough in my analysis, is the appearance of two groups. If you draw a connection between Stephen Decatur and Edward Preble (in your mind), then you see the loose formation of a group around them. The graph already shows a clique: the group with David Porter and William Bainbridge. What’s the connection between these two groups?

Interestingly, the two groups roughly fall into (1) those who were aboard the USS Philadelphia when it grounded in Tripoli Harbor, and (2) those who volunteered for the mission led by Stephen Decatur to destroy the Philadelphia. There are some outliers, officers who were not involved in that series of events in any way (Lewis Warrington, for instance), and one interesting anomaly: Charles Stewart, who was not aboard the Philadelphia but is well ensconced in that group of officers. It will be interesting to see what happens to those men once there’s more data.

Without having done any other research yet into this grouping, I have an inkling that this way of looking at Preble’s Boys may show more about their careers after 1803 than about their link to Edward Preble.

So what’s the major lesson for me? When I next take on Gephi, I’ll be armed with a lot more data, but even if the results are surprising, I’ll be keeping my eyes open for possibilities that I didn’t see coming down the pike.

I’d welcome any other insights on my first foray into network analysis.

WriMos

I remember the first time I heard the word(?) NaNoWriMo. First I thought: What in the world does that word(?) mean? It sounds a bit like an alien planet. Once I found out what it was, I thought: You people are insane. Write a novel in a month? That’s crazy.

I still think NaNoWriMo is crazy. But it has spurred several other WriMos that seem a little more useful to my current life: DigiWriMo and AcWriMo. Both of these challenges begin in about a week on November 1. And I’m going to try to do them both. I feel pretty certain that I won’t make it to 50,000 words, but you never know.

The cool thing about AcWriMo and DigiWriMo is that they work in tandem. I intend to do a large portion of my academic writing for the month online, thereby fulfilling the requirements for both challenges. (Is that cheating? If it is, oh well. I’m doing it anyway.)

Both WriMos have challenged participants to set outlandish goals and make them public. So here’s my plan for the month.

The overall goal: Populate Preble’s Boys with bios of each officer and ship.

The specifics:

1. Write one officer bio every day for the first 17 days, taking off Sundays.

2. Write one or two ship bios a day for the remaining days. (Take Thanksgiving Day and Sundays off.)

3. Blog about the progress and challenges of the site at least twice during the month.

The challenges:

1. Language exam, Nov. 16.

2. A live-in toddler.

3. Need for more research. (My intention is to write the bios using secondary sources for now and then, when I have the chance to travel to archives, flesh them out with primary sources if needed.)

4. Thanksgiving!

The preparation:

1. Research: I need to build up my Zotero library about each of these officers so that I don’t have to do a lot of reading when it’s writing time.

2. Organization: I need to set up a good system for keeping myself organized. I’ve been working in a sort of piecemeal fashion up to this point. I need to get it together.

And that’s the plan. We shall see whether my site is text-heavy by the end of the month!

I’m looking forward to seeing what everyone else is doing for DigiWriMo and AcWriMo!

THATCamp New England Roundup

On Saturday, I went to THATCamp (The Humanities and Technology Camp) New England at Brown University in Providence, Rhode Island. I’ve known of THATCamps for several years, but this was my first chance to actually attend one. I went to four sessions: Libraries, Archives, and Museums; Customizing Omeka; Doing Digital History with Non-Digital Sources (link to notes); and Network Analysis.

This post isn’t a comprehensive record of everything that went on, but rather just a few things that I found interesting or valuable about the experience.

1. The value of collaboration. In at least two of the sessions I went to, collaboration was explicitly discussed: between colleagues in the same discipline, colleagues in similar disciplines, colleagues in totally different disciplines (historians and computer scientists!), and even professors and grad students.

The bottom line: the best DH work is collaborative.

The challenge: Collaborating is risky. Working with people who know nothing about your subject matter can make communication difficult (but remember that your collaborator has equal difficulty communicating with you).

Best practices: Communicate, communicate, communicate! And in the final outcome, be sure to give credit where credit is due. The Fair Cite initiative can help humanists give correct and fair credit to everyone involved in a project, academic or alt-ac.

2. New tools (for me) of digital humanities: I was introduced to several tools and resources that I never knew existed and that I can’t wait to explore further. The two big ones are these:

Quantum GIS: This open-source mapping software may be the answer to my mega-problems with Neatline. Trying to use the institutional copy of ArcMap through the remote desktop was a complete disaster, and I was surprised that anything from the CHNM/Scholars’ Lab types would require proprietary software. Turns out it doesn’t! My life is revolutionized!

SNAC (Social Networks and Archival Context): This is a site where Linked Open Data is used to provide access to an aggregate of archives. To be truthful, even after half a session’s worth of discussion about LOD, I still don’t really understand what it is. But the value of an aggregate of archives, including rudimentary network graphs based on the metadata in the archival records, is only going to increase as more archives get linked to this database.

What were the major takeaways from the conference for me?

1. I need to go to more THATCamps now that I’ve got a little more lingo in my vocabulary.

2. I personally have opportunities for collaboration. The sessions weren’t the only places I learned about collaboration: interaction with the other campers opened up a staggering array of potential opportunities for me. It was remarkable how many people there were doing something related to naval history or the early republic. And many of them are working on things that I can either help with or get help from. I’m excited about the new contacts I’ve made. In fact, this week a new friend and I are going to be customizing our Omeka sites based on what we learned at THATCamp. And now I’m thinking I would like a collaborator to help me make some maps for my site as well. (Digital cartographers, I’d like to chat if you’re into drawing oceans and battle diagrams.)

If you’re interested in digital humanities, I’d recommend you try to find a THATCamp in your area to attend. Since THATCamps have proliferated like rabbits over the past few years, you should be able to find one (for instance, another THATCamp, Hybrid Pedagogy, occurred simultaneously with THATCamp New England, and before the end of 2012, there are nine more THATCamps across the globe).