Opening Day, no matter what the weather, is a signal that winter is truly over, and the joyous days of summer are on their way. For me, baseball is a sport meant to be imbibed in one particular way: radio. Don’t get me wrong—going to a ball game in person is a great experience. Everyone should go see a real MLB game in person at least once in their life. But listening to the Atlanta Braves on the radio is the pinnacle of sports.
I love the Braves because of my mom. I don’t know how my mom became a Braves fan. But from the time I was pretty little, she tuned in to our local (Greenville, SC) station on the Braves radio network for almost every game in the season. When I was very young, I just tuned out the noise. But as I got older, I began to listen to and follow the game. By the time I was a teenager, I was a diehard Braves fan. Yes, yes, I joined the club in the good years—Maddux, Glavine, Smoltz, Chipper Jones—but I wasn’t a fair-weather fan. (Witness: I still love my Braves, even though the glory days have been a bit elusive for the past decade.)
There’s nothing quite like straining to keep your pinky on the stereo while stretching as tall as you can, making a human antenna to get the broadcast on a bad radio day. Or trying desperately to find another station to listen to the game when the normal station inexplicably broadcasts the Clemson game instead. As I began to love the Braves more and more, these sorts of frustrations became part of the joy of being a part of the Braves radio network rather than a deterrent.
When I was in junior high, I used to go with my mom to church choir practice, which was at 4:15 on Sundays. Most days I’d just sit in the sanctuary while they rehearsed. But some Sundays, the Braves game would be in the later innings when we arrived at church (if it had started at 1:05 or so). On those days, I’d sit in the sweltering car and listen to the game, hoping it would wrap up before the heat in the car became too much to bear. On very rare occasions, even Mom would stay in the car to listen, though under almost any other circumstance it would be totally unacceptable to be late to choir practice. Enduring the heat made me feel like a true fan.
In my childhood, it was a rare treat to watch the Braves on TV when they were on the FOX Saturday game of the week. But I was usually glad to return to the radio broadcast for the next game: the FOX commentators just didn’t stack up to Joe, Don, Pete, and Skip. My experience with the players was always mediated through the Braves radio commentators—the players themselves were virtual strangers. I barely even recognized their faces or their swings. But I loved them no less for their distance.
Radio commentators are a little like historians. They introduce their listeners to people they don’t know (and will never meet) by giving rich descriptions of the events, context for the players’ actions and attitudes, and background information about every aspect of their professional lives, plus a frequent dose of humor. Commentators have unique voices that become familiar and beloved, but the best ones focus their broadcast on the players and the sport, not themselves. So here’s to you, Skip, Don, Pete, Joe, and now Jim (and others), and go Braves!
I don’t remember ever helping my mom make cheeseburger pie. They say that letting kids help in the kitchen encourages those kids to eat the food they helped to make, but I’m pretty sure my opinion of cheeseburger pie wouldn’t have changed whether I had helped or not. Cheeseburger pie was simply awful.
I recently found the cheeseburger pie recipe while I was going through a family recipe book I had received as a wedding present. Seeing it written out brought back a cavalcade of memories. I could just taste the slightly gritty ground beef, the backbone of the dish. The beef bathed in a slime of ketchup (half a cup!) and evaporated milk. The recipe called for half a cup of diced onions, but we never—I mean not once that I remember—had fresh onions in the house. Instead, Mom used minced dried onions, which were never quite soft enough to just disappear, nor quite crunchy enough to add interesting texture. This combination of meat, ketchup, milk, and onions always proved too much for the store-bought frozen pie crust, pasty and sodden even after baking. Topped with shredded cheese mixed with just the wrong amount of Worcestershire sauce, cheeseburger pie was the food of my nightmares.
Our dining room wasn’t really big enough to fit our table and, by extension, the four of us kids and my parents. As a result, many nights my parents ate dinner in the living room and watched the news, while we kids were left to our own devices in the dining room. But cheeseburger pie was a dining room meal. We had to pull the table out from the wall, leaving little to no room to get into the kitchen or living room. Moreover, in a household where paper plates were all but ubiquitous, for cheeseburger pie we got out the Pfaltzgraff.
My eight-year-old self, picking at the edge of the crust (the only part I deemed edible), didn’t appreciate this opportunity for gathering. Not everyone in my family felt the same way—in fact, cheeseburger pie was a beloved family treat. Whereas I stewed in a puddle of aggrieved tears in the corner of the dining room every time it was served, at least one of my brothers once requested cheeseburger pie for his birthday dinner. (When I asked my older brother just a few weeks ago for his opinion of the dish, he replied, “The nectar of the gods.”) I was the only dissenting voice.
In a weird way, cheeseburger pie means a lot to me. Now, I appreciate the lesson that even in a tightly knit family, tastes differ, and you don’t always get your way. It’s good to try new things that you might not like. I try to teach my kids these lessons every day. But I’ll never feed them cheeseburger pie to illustrate the point.
When Thomas Jefferson sent a small naval squadron to the Mediterranean in 1801, he intended to intimidate the Barbary regencies into backing down from their claims of tribute in exchange for commercial freedom in the Mediterranean. Negotiations with the Barbary states hadn’t worked over the previous 15 years of American attempts, and the newly built navy was meant to show the world that America would take its place in the world economy by force.
Algiers was responsible for the capture of American ships that had stultified American commerce in the Mediterranean, and its fleet of corsairs was seen as the biggest threat. The Americans had negotiated many times with the dey, but he often changed the terms of the negotiations on a whim. In 1789, Richard O’Brien, then a captive in Algiers who would become the American consul-general there, did not think it was worth trying to make a formal treaty with Algiers because the dey took so many liberties with the treaties he already had (though the United States did end up making a treaty).
The squadron Jefferson sent to the Mediterranean in 1801 served two purposes besides defeating the Barbary states. First, it represented the American desire to one-up their European counterparts, who paid tribute to the Barbary rulers in exchange for free passage through the Mediterranean. But second, and more important, the Americans wanted desperately to be received as full members of the European community. The British and French were far worse for American trade than the Barbary states. But the Barbary rulers were viewed as bloodthirsty, capricious, and “not within the pale of civilization.” Fighting against them seemed like a literal and symbolic war that the Americans could win in order to show the Europeans that they could compete on a world scale.
The war didn’t go well. When the American navy arrived in 1801, they learned that Tripoli was the regency that had declared war, while Algiers, Tunis, and Morocco remained threats. Fighting against Tripoli turned out to be a difficult task. The American squadron, supposed to be projecting strength, barely spent any time blockading Tripoli, and Tripolitan corsairs easily evaded the ships. Even when successive squadrons brought bigger forces, the Americans were in unfamiliar territory, and it showed. Intelligence from the State Department was slow to make the rounds, and circumstances changed so quickly that the Secretary of State was essentially excluded; instead, the few consuls in the Mediterranean were left to keep the peace on their own. Keeping consuls in neighboring Tunis and Algiers proved difficult as well, so the Americans had to rely on uncertain relationships with their European counterparts in order to gauge the mood in the Barbary courts.
By 1805, apart from a few spectacular events (some favoring the United States, some not), virtually nothing had been accomplished. When Tobias Lear finally negotiated a treaty with Tripoli, the high-flown rhetoric of victory without tribute early in the war gave way to the reality—the United States would pay Tripoli for the release of prisoners and peace between the two nations. It wasn’t until 1815, when circumstances were radically different in both the United States and the Mediterranean, that peace could be made without payment.
Obviously, 21st-century America is not 19th-century America. But there are some parallels between our current president’s approach to foreign policy with North Korea and the way the United States approached the Barbary states. Perhaps there are lessons here, perhaps not.
President Trump’s disdain for North Korean leader Kim Jong Un is obvious. His epithet “Little Rocket Man” strikes a similar chord as consul William Eaton’s dismissal of the bashaw as “the madman of Tripoli” (Eaton to Richard O’Brien, 15 September 1801. James L. Cathcart Papers, Library of Congress). And Kim does share the bloodthirstiness and capriciousness of the Barbary rulers. The Americans’ prejudices against North Africa caused them to discount the Barbary rulers’ capacity. Hopefully our current president doesn’t follow their example.
The process of peace and disarmament with North Korea has been a hard problem for American presidents to untangle. Trump’s interest in North Korea seems to center on saving face for the United States (or, let’s face it, for himself). His behavior toward our allies in the world has caused significant damage to diplomatic relations elsewhere. In these negotiations, Trump can try to one-up the other powerful countries, which have not succeeded in convincing North Korea to disarm. So going after a common enemy might help to distract from diplomatic strife elsewhere. But just as American naval officers discovered, after constantly provoking and deriding the European connections that made it possible for the Americans to stay in the Mediterranean, Trump may find that antagonizing his allies doesn’t bring the success he hopes for.
Trump’s strategy of direct confrontation is a bit like sending a squadron into a territory where the intel is thin and the friends are few and far between. American strategy in the Mediterranean suffered because their diplomatic standing in the countries surrounding Tripoli was tenuous or non-existent. Likewise, not filling key State Department posts (such as the ambassador to South Korea) may place Trump in a situation where his intelligence on North Korea would be poor even if he wants to hear it. Keeping the peace with neighboring countries who are not friends while courting war with North Korea is an equally sticky task. Hopefully Trump has put more thought into relations with China, for example, than just tweeting.
Nevertheless, the meeting with Kim Jong Un could be a positive development. Trump does have the element of surprise on his side (but his tendency to go off-script could go just as badly for the United States as for anyone else). However, Trump could learn caution from the Americans’ diplomatic meetings with the Barbary states, in which the Americans always went in assuming the rulers meant to meet in good faith, and on multiple occasions narrowly escaped with their lives and freedom.
There are a few key differences between 1801 and 2018. The First Barbary War was fought with short-range weapons and a small fighting force on both sides. The stakes are slightly higher when the potential weapons are nuclear warheads. In 1801, the United States really was a minor power. Now, collateral damage from a conflict between the United States and North Korea could span the globe. Free trade and nuclear disarmament are both laudable goals—even necessary ones. I hope that President Trump goes into this meeting with North Korea, if it happens, circumspectly and with as much historical and contemporary intelligence as he can muster. He’s going to need it.
Faithful readers of this blog (all one of you) will notice that I haven’t posted in almost a year. It’s not that I’ve had nothing interesting to say, but rather that I’ve been too busy with those interesting things to write about them for the blog. Here’s a brief rundown.
In the summer of 2014, my family moved to Fairfax, VA, when my husband was hired by George Mason University. For the 2014-2015 school year, I commuted to Boston from Virginia almost every week so I could finish my coursework at Northeastern University. In August 2015, I passed my comprehensive exams and defended my dissertation proposal, officially becoming a PhD candidate. For the past year, I’ve been researching and writing my dissertation, as well as continuing to work on the Viral Texts project.
The Viral Texts project has been part of my graduate-school experience almost since the beginning. I joined the project as part of the inaugural group of NULab fellows in the spring of 2013. I remember sitting around a table with the other fellows, hearing about all the different projects we might be assigned to, and thinking, “I really hope the spots for that newspaper project don’t fill up before I get to choose.” Thankfully, they didn’t. The NULab fellows’ role has changed since then, but I’ve always been able to stay attached to the project, and I’m so grateful.
Viral Texts is one of the defining pieces of my graduate school experience. It shaped my understanding of digital humanities, and it stretched me to work in multiple disciplines. It taught me how to work with a team while keeping my individuality. And I learned an awful lot about how nineteenth-century newspapers work.
And now, in true Viral Texts fashion, it’s time for me to pass on the scissors and the quill. Starting in May, I’ll be joining the research division at the Roy Rosenzweig Center for History and New Media at George Mason University. I’ll be working with PressForward, Zotero, and mostly Tropy, CHNM’s new Mellon-funded project for archiving and organizing photos. I’m particularly excited about working with Tropy, though I’m a little bummed that my dissertation will (I hope) be close to complete before Tropy is ready for the big time. 🙂
The projects and tools at CHNM were my first encounter with digital humanities, even before I wanted to embrace the digital in my own work. Throughout my graduate career, I’ve benefited greatly from Zotero and Omeka and other amazing work at the center, and I’m looking forward to helping develop other great tools for myself and others to use.
In joining CHNM and departing Viral Texts, I take these words from the valedictory editorial of Thomas Ritchie, editor of the Richmond Enquirer: “I cannot close this hasty valedictory, without again expressing the sentiments of gratitude and affection with which I am so profoundly penetrated.” So to everyone on the team—Ryan, David, and Fitz in particular—thanks. It’s been great.
If you read my last post, you know that this semester I engaged in building a Bookworm using a government document collection. My professor challenged me to try my system for parsing the documents on a different, larger collection of government documents. The collection I chose to work with is the Official Records of the Union and Confederate Navies. My Barbary Bookworm took me all semester to build; this Civil War navies Bookworm took me less than a day. I learned things from making the first one!
This collection is significantly larger than the Barbary Wars collection—26 volumes, as opposed to 6. It encompasses roughly the same time span, but 13 times as many words. Though it is still technically feasible to read through all 26 volumes, this collection is perhaps a better candidate for distant reading than my first corpus.
The document collection is broken into geographical sections: the Atlantic Squadron, the West Gulf Blockading Squadron, and so on. Using the Bookworm allows us to look at the words in these documents sequentially by date instead of having to go back and forth between different volumes to get a sense of what was going on in the whole navy at any given time.
Process and Format
The format of this collection is mostly the same as the Barbary Wars collection. Each document starts with an explanatory header (“Letter to the secretary of the navy,” “Extract from a journal,” etc.). Unlike the Barbary Wars collection, however, this one has no citations at the end of each document. So instead of using the closing citations as document breakers, I used the headers. Though there are many different kinds of documents, the headers are very formulaic, so the regular expressions to find them were not particularly difficult to write. 1
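For readers curious what header-based document breaking looks like, here is a minimal Python sketch. It is not the code in the GitHub repository; the header pattern is a hypothetical stand-in modeled on the examples above (headers beginning with “Letter to” or “Extract from”), and the real collection would need its own list of variants. The idea is simply that every header marks the start of a new document, and each document runs until the next header.

```python
import re

# Hypothetical header pattern: a capitalized document-type word at the
# start of a line, followed by "to", "from", or "of". The real collection
# would need its own list of header variants.
HEADER_RE = re.compile(
    r"^(?:Letter|Report|Order|Extract|Abstract) (?:to|from|of)\b.*$",
    re.MULTILINE,
)


def split_on_headers(text):
    """Break a volume's text into documents, each beginning at a header line.

    Any text before the first header (front matter, for instance) is dropped.
    """
    starts = [m.start() for m in HEADER_RE.finditer(text)]
    documents = []
    for i, start in enumerate(starts):
        # Each document runs until the next header, or to the end of the text.
        end = starts[i + 1] if i + 1 < len(starts) else len(text)
        documents.append(text[start:end].strip())
    return documents


sample = (
    "Letter to the Secretary of the Navy.\n"
    "Sir: The squadron sailed this morning.\n"
    "Extract from a journal.\n"
    "Fine weather; wind from the westward.\n"
)
for doc in split_on_headers(sample):
    print(doc.splitlines()[0])  # prints each document's header line
```

Because the pattern is anchored to the start of a line and requires a capitalized keyword, ordinary sentences in the body of a letter are unlikely to be mistaken for headers, though a real run would still need spot-checking.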
Further easing the pain of breaking the documents is the quality of the OCR. Where I fought the OCR every step of the way for Barbary Bookworm, the OCR is really quite good for this collection (a mercy, since spot-checking 26 volumes is no trivial task). Thus, I didn’t have to write multiple regular expressions to find each header; only a few small variants seemed to be sufficient.
The high-quality OCR also enabled me to write a date parser, something I couldn’t make work in my Barbary Bookworm. The dates follow a more consistent pattern, and there is minimal garbage around and in them, so it was easy enough to write a little function to pull out each part. Because certain parts of a date might be illegible or missing, I made the function find each part of the date in turn and then compile the parts into one field, rather than trying to extract the dates wholesale. That way, if all I could extract was the year, the function would still return at least a partial date.
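The part-by-part approach can be sketched in Python like this. It is an illustration under assumptions, not the function behind the Bookworm: it assumes dates appear in a form like “April 24, 1862,” with the month spelled out, and the name parse_date is hypothetical. Each part is searched for separately, so a header with only a legible year still yields a partial date.

```python
import re

MONTH_NAMES = ["January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December"]
MONTHS = {name: number for number, name in enumerate(MONTH_NAMES, start=1)}

# Assumed formats: a four-digit year in the 1700s or 1800s, and a spelled-out
# month optionally followed by a day number ("April 24, 1862").
YEAR_RE = re.compile(r"\b(1[78]\d{2})\b")
MONTH_RE = re.compile(r"\b(" + "|".join(MONTH_NAMES) + r")\b")


def parse_date(header):
    """Find the year, month, and day separately and return what was found.

    Returns "YYYY-MM-DD", "YYYY-MM", or "YYYY", or None if no year is legible.
    Note: "May" and "March" are also ordinary words, which this sketch ignores.
    """
    year = YEAR_RE.search(header)
    if year is None:
        return None
    parts = [year.group(1)]
    month = MONTH_RE.search(header)
    if month is not None:
        parts.append(f"{MONTHS[month.group(1)]:02d}")
        # Look for the day only immediately after the month name.
        day = re.search(month.group(1) + r"\s+(\d{1,2})\b", header)
        if day is not None:
            parts.append(f"{int(day.group(1)):02d}")
    return "-".join(parts)


print(parse_date("Off Mobile, April 24, 1862."))  # 1862-04-24
print(parse_date("Washington, May 1863."))        # 1863-05
```

Returning partial dates as strings like 1863-05 keeps them sortable alongside full dates, which matters when documents from different volumes are lined up by date.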
Another new feature of this Bookworm is that the full text of the document appears for each search term when you click on the line at a particular date. This function is slow, so if the interface seems to freeze or you don’t seem to be getting any results, give it a few minutes. It will come up. Most of the documents are short enough that it’s easy to scroll through them.
Testing the Bookworm
Some of the same reservations apply to this Bookworm as I detailed in my last post about Barbary Bookworm—they really apply to all text-analysis tools. Disambiguation of ship names and places continues to be a problem. But many of the other problems with Barbary Bookworm are solved with this Bookworm.
The next step that I need to work on is sectioning out the Confederate navy’s documents from the Union navy’s. Right now, you can get a sense of what was important to both navies, but not so easily get a sense of what was important to just one side or the other.
To be honest, I don’t really know enough about the navies of the Civil War to make any significant arguments based on my scrounging around with this tool. There is some very low-hanging fruit, of course.
The Bookworm is hosted online by Ben Schmidt (thanks, Ben!). The code for creating the files is up on GitHub. Please go play around with it!
Particularly since I don’t do Civil War history, I’d welcome feedback on both the interface and the content here. What worked? What didn’t? What else would you like to see?
Feel free to send me questions/observations/interesting finds/results by commenting on this post (since there’s not a comment function on the Bookworm itself), by emailing me, or for small stuff, pinging me on Twitter (@abbymullen). I really am very interested in everyone’s feedback, so please scrub around and try to break it. I already know of a few things that are not quite working right, but I’m interested to see what you all come up with.
Ben had suggested that I do the even larger Civil War Armies document collection; however, that collection does not even have headers for the documents, much less citations, so the document breaking process would be exponentially more difficult. It’s not impossible, but I may have to rework my system—and I don’t care about the Civil War that much. 🙂 However, other document collections, such as the U.S. Congressional Serial Set, have exactly the same format, so it may be worth figuring out. ↩
This past semester, I took a graduate seminar in Humanities Data Analysis, taught by Professor Ben Schmidt. This post describes my final project. Stay tuned for more fun Bookworm stuff in the next few days (part 2 on Civil War Navies Bookworm is here).
In the 1920s, the United States government decided to create document collections for several of its early naval wars: the Quasi-War with France, the Barbary Wars, and the Civil War (the War of 1812 did not come until much later, for some reason). These document collections, particularly for the Quasi-War and the Barbary Wars, have become the standard resource for any scholar doing work on these wars. My work on the Barbary Wars relies heavily on this document collection. The Barbary Wars collection includes correspondence, journals, official documents such as treaties, crew manifests, other miscellaneous documents, and a few summary documents put together in the 1820s. 1
It’s quite easy to get bogged down in the multiplicity of mundaneness in these documents—every single day’s record of where a ship is and what the weather is like, for instance. It’s also easy to lose sight of the true trajectory of the conflict in the midst of all this seeming banality. Because the documents in the collection are from many authors in conversation with each other, we can sometimes follow the path of these conversations. But there are many concurrent conversations, and often we do not have the full correspondence. How can we make sense of this jumble?
U.S. Office of Naval Records and Library, Naval Documents Related to the United States Wars with the Barbary Powers (Washington: U.S. Govt. Print. Off., 1939); digitized at http://www.ibiblio.org/anrs/barbary.html. ↩
This past week in my Humanities Data Analysis class, we looked at mapping as data. We explored ggplot2’s map functions and did some work with ggmap’s geocoding and other things. One thing that we just barely explored was automatically extracting place names through named entity recognition. It is possible to do named entity recognition in R, though people say it’s probably not the best way. But in order to stay in R, I used a handy tutorial by the esteemed Lincoln Mullen.
I was interested in extracting place names from the data I’ve been cleaning up for use in a Bookworm, the text of the 6-volume document collection, Naval Documents Related to the United States Wars with the Barbary Powers, published in the 1920s by the U.S. government. It’s a great primary source collection, and a good jumping-off point for any research into the Barbary Wars. The entire collection has been digitized by the American Naval Records Society, with OCR, but the OCRed text is not clean. The poor quality of the OCR has been problematic for almost all data analysis, and this extraction was no exception.
The tutorial on NER is quite easy to follow, so that wasn’t a problem at all. The problem I ran into very quickly was the memory limits on my machine: this process takes a TON of memory, apparently. I originally tried to use my semi-cleaned-up file that contained the text of all 6 volumes, but that was way too big. Even one volume proved much too big. I decided to break up the text into years, instead of just chunking the volumes by size, in order to facilitate a more useful comparison set. For the first 15 years (1785-1800), the files were small enough, and I even combined the earlier years into one file. But starting in 1802, the file was still too large even with only one year. So I chunked each year into 500kb files and then ran the program on those multiple files exactly the way the tutorial suggested. I then just pushed the results of each chunk back into one results file per year.
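The chunking step could be done in many ways. The sketch below, in Python rather than the R used in the tutorial, shows one, assuming each year’s text is already in a single string. The 500,000-byte limit mirrors the 500kb figure above; the helper names and the output file naming scheme are hypothetical. It breaks only at line ends so that no word is split across two files.

```python
from pathlib import Path


def chunk_text(text, max_bytes=500_000):
    """Split text into chunks of at most max_bytes (UTF-8), breaking at line ends.

    A single line longer than max_bytes ends up as its own oversized chunk.
    """
    chunks, current, size = [], [], 0
    for line in text.splitlines(keepends=True):
        n = len(line.encode("utf-8"))
        # Start a new chunk if adding this line would exceed the limit.
        if current and size + n > max_bytes:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += n
    if current:
        chunks.append("".join(current))
    return chunks


def write_chunks(text, year, out_dir, max_bytes=500_000):
    """Write one year's chunks to files named like 1802_001.txt (hypothetical scheme)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, chunk in enumerate(chunk_text(text, max_bytes), start=1):
        path = out / f"{year}_{i:03d}.txt"
        path.write_text(chunk, encoding="utf-8")
        paths.append(path)
    return paths
```

Splitting by year first and by size second keeps each chunk’s results easy to pool back into a single per-year results file afterward.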
Once I got my results, I had to clean them up. I haven’t tested NER on any other type of document, but based on my results, I suspect that the particular genre of texts I am working with causes NER some significant problems. I started by just doing a bit of work with the list in OpenRefine in order to standardize the terrible spelling of 19th-century naval captains, plus OCR problems. That done, I took a hard look at what exactly was in my list.
Here’s what I found:
1. The navy didn’t do NER any favors by naming many of their ships after American places. It’s almost certain that Essex and Chesapeake, for instance, refer to the USS Essex and USS Chesapeake. Less certain are places like Philadelphia, Boston, United States, and even Tripoli, which are all places that definitely appear in the text, but are also ship names. There’s absolutely no way to disambiguate these terms.
2. The term “Cape” proved particularly problematic. The difficulty here is that the abbreviation for “Captain” is often “Cap” or “Capt,” and often the OCR renders it “Cape” or “Ca.” Thus, people like Capt. Daniel McNeill turn up in a place-name list. Naval terms like “Anchorage” also cause some problems. I guarantee: Alaska does not enter the story at all.
3. The format of many of these documents is “To” someone “from” someone. I can’t be certain, but it seems like the NER process sometimes (though not always) saw those to and from statements as being locational, instead of relational. I also think that journal or logbook entries, with their formulaic descriptions of weather and location, sometimes get the NER process confused about which is the weather and which is the location.
4. To be honest, there are a large number of false hits that I really can’t explain. Lists seem particularly prone to producing false hits, so I get one member of a crew list, or words like “salt beef,” “cheese,” or “coffee,” from provision lists. But there are other results as well, and I just can’t make out why they were selected as locations.
Because of all these foibles, each list requires hand-curation to throw out the false hits. Once I did that, I ran it through R again to geocode the locations using ggmap. Here we also had some problems (which I admittedly should have anticipated based on previous work doing geolocation of these texts). Of course, many of the places had to be thrown out because they were just too vague to be of any use: “harbor,” “island,” and other such terms didn’t make the cut.
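Some of that triage can be partly automated before the hand-curation pass. This is a hypothetical Python sketch, not part of the tutorial or of my actual workflow: the stop-list holds the vague terms mentioned above (“harbor,” “island,” plus “anchorage” from the naval-term problem), and the ship-name list holds the names from item 1 that double as places. Ship-name collisions are set aside for review rather than dropped, since there is no automatic way to tell the USS Philadelphia from Philadelphia.

```python
# Hypothetical stop-lists drawn from the problems described above;
# a real list would grow during hand-curation.
VAGUE_TERMS = {"harbor", "island", "anchorage"}
SHIP_NAMES = {"Essex", "Chesapeake", "Philadelphia", "Boston",
              "United States", "Tripoli"}


def triage_places(names):
    """Sort NER place hits into keep, drop (too vague), and review (also a ship)."""
    keep, drop, review = [], [], []
    for name in names:
        cleaned = name.strip()
        if cleaned.lower() in VAGUE_TERMS:
            drop.append(cleaned)
        elif cleaned in SHIP_NAMES:
            review.append(cleaned)
        else:
            keep.append(cleaned)
    return keep, drop, review
```

Only the “keep” list would go straight to the geocoder; the “review” list still needs a human reading of the surrounding text.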
When I ran the geocoder for the first time, it threw a bunch of errors because of unrecognizable place names. Then I remembered: this is why I’ve used historical maps of the area in the past, to try to track down these place names that are not used today. Examples include “Cape Spartel,” “Cape DeGatt,” and “Cape Ferina.” (I’m not sure why they were all capes.) I discovered that if you run the “more” option on the geocode, the warnings don’t result in a failed geocode. Plus, all the extra information is useful for getting a better sense of the granularity of each geocode and of what exact identifier the geocoder used to determine each location.
This extra information proved helpful when the geocoded map revealed oddities such as the Mediterranean Sea showing up in the Philippines, or Tunis Bay showing up in Canada. Turns out, the geocoder doesn’t necessarily pick the most logical choice for ambiguous terms: there is, in fact, an Australasian sea sometimes known as the Mediterranean Sea. These seemingly arbitrary choices by the geocoder mean that the map looks more than a little strange.
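One way to catch oddities like these, sketched here in Python as an assumption rather than anything ggmap provides, is to check each geocoded point against a rough bounding box for the region the documents mostly cover. The box coordinates below are illustrative guesses, and the sketch assumes the names and coordinates have already been exported from the geocoder as (name, latitude, longitude) tuples. Points outside the box aren’t necessarily wrong (American ports legitimately appear in these texts); they are just the first ones worth checking by hand.

```python
# Illustrative, hand-picked box around the Mediterranean and its Atlantic
# approaches; the bounds are guesses, not taken from any gazetteer.
MED_BOX = {"lat_min": 25.0, "lat_max": 50.0, "lon_min": -20.0, "lon_max": 40.0}


def flag_outliers(geocoded, box=MED_BOX):
    """Return names whose coordinates fall outside the box, for hand review."""
    flagged = []
    for name, lat, lon in geocoded:
        inside = (box["lat_min"] <= lat <= box["lat_max"]
                  and box["lon_min"] <= lon <= box["lon_max"])
        if not inside:
            flagged.append(name)
    return flagged
```

A check like this would not fix an ambiguous name, but it would surface a “Mediterranean Sea” in the Pacific before it ends up on the map.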
So what’s the result here? I can see the potential for named-entity extraction, but for my particular project, it just doesn’t seem logical or useful. There’s not really anything more I can do with this data, except try to clean up my original documents even more. But even so, it was a useful exercise, and it was good practice in working with maps and data in R.
Last week, an opinion piece appeared in the New York Times, arguing that the advent of algorithmically derived human-readable content may be destroying our humanity, as the lines between technology and humanity blur. A particular target in this article is the advent of “robo-journalism,” or the use of algorithms to write copy for the news. 1 The author cites a study that alleges that “90 percent of news could be algorithmically generated by the mid-2020s, much of it without human intervention.” The obvious rebuttal to this statement is that algorithms are written by real human beings, which means that there are human interventions in every piece of algorithmically derived text. But statements like these also imply an individualism that simply does not match the historical tradition of how newspapers are created. 2
In the nineteenth century, algorithms didn’t write texts, but neither did each newspaper’s staff write its own copy with personal attention to each article. Instead, newspapers borrowed texts from each other—no one would ever have expected individualized copy for news stories. 3 Newspapers were amalgams of texts from a variety of sources, cobbled together by editors who did more with scissors than with a pen (and they often described themselves this way).
The article also decries other types of algorithmically derived texts, but the case for computer-generated creative fiction or poetry is fairly well argued by people such as Mark Sample, and is not an argument that I have anything new to add to. ↩
This post is based on my research for the Viral Texts project at Northeastern University. ↩
In 1844, the New York Daily Tribune published a humorous story illustrating exactly the opposite, in fact—some readers preferred a less human touch. ↩