Farewell to Consolation Prize

This afternoon, we held a party at RRCHNM to celebrate the past of Consolation Prize and the future of R2 Studios. I wrote out a little speech which sums up a lot of how I feel about the show and what it has meant, so I thought I’d post it here.


Thank you all for coming to our R2 Studios open house and celebration of our OG show, Consolation Prize. As many of you know, Consolation Prize was launched in September 2020, during the first full semester of the pandemic. The Center had made a podcast many years ago called Digital Campus, doing podcasting before podcasting was cool. I remember listening to Digital Campus on a few road trips. But it had been a while since the center had been in the business of podcasting. An external partner reached out about the possibility of doing something with us, and we had the idea to make a podcast. That project eventually didn’t pan out because we were concerned that the Center’s lack of experience with podcasting would make our grant application non-competitive. How do you get the podcasting experience necessary to be competitive? Well, you make a podcast! So Mills very graciously gave me the green light—and the time of a number of people at the Center—to pilot a brand-new podcast, different from anything we’d done before. That podcast was Consolation Prize.

Consolation Prize had an ambitious goal—to be a highly produced, deeply researched narrative podcast…a thing that no one at the center had the least bit of experience with. It also had a very niche premise: that consuls, low-level diplomatic officials, were important enough to the history of the United States that they were worth making a whole show about.

Even though I read everything I could get my hands on about making a podcast, we still went into this with a lot of room to learn. Very, very few other shows exist that do what we wanted to do, and sound like we wanted to sound. So we were kind of flying blind on some things. We got some really helpful advice from a consultant after the first few episodes, which led to our now-maybe-familiar description: “Consolation Prize is a podcast about the United States in the world through the eyes of its consuls.”

Over the past two seasons, a lot of things have changed and improved. For one thing, we got a real studio space, which I’d invite you to check out after the speechifying. We also got a lot more comfortable with sound and story. We learned how to find interesting stories, how to ask better questions, how to articulate our ideas in a more focused way. Over two seasons, Consolation Prize went to six continents and three centuries. We did legal history, religious history, art history, economic history, military history, gender history, and of course, diplomatic history. I also counted up the number of different voices you hear over our 33 episodes, and it’s over 100, between me, our team, our expert guests, and our voice actors. Getting to talk to so many amazing historians was definitely a highlight for me. And yeah, I definitely called in a LOT of favors to get voice actors on a shoestring budget. So shout-out to many of you in this room whose voices have been heard on Consolation Prize, whether you agreed to do it eagerly or with a significant amount of arm twisting.

I’m really proud of our core team, made up of me, Deepthi Murali, a postdoc, Megan Brett, a graduate student turned graduated student, Kris Stinson, a PhD student, Jeanette Patrick, a classified staff member, Andrew Cote, a former adjunct, and our two great interns, Brenna Reilley, an undergraduate intern on Season 1, and Frankie Bjork, a graduate intern on Season 2. Consolation Prize has truly been a team effort from Day 1, and I’m so proud of that.

But all good things must come to an end. This is my last month at GMU, so continuing the show would be complicated. But I had already decided that the show needed to come to an end. It’s time for the studio and the Center to move on to bigger and better projects, which will hopefully build on the lessons we’ve learned from Consolation Prize. It has truly been my privilege to be the showrunner for Consolation Prize, and the head of studio for R2 Studios, and I’m delighted to be leaving the studio in such capable hands.

It’s Going OK

Over the past few days, I've been wrestling with a bizarre type of guilt: guilt that things are working. Like everyone, I moved my classes at George Mason University online after spring break. I teach two classes in the history department: a 46-person class called "The Digital Past," and a 13-person topics in digital history course, where we're learning about podcasting for history. I'm going to focus on the good in this post: "The Digital Past," otherwise known as HIST390. I may write about the extreme challenges of the other class at a later time—it hasn't ALL been working.

[I’m deliberately leaving out all discussions of how the rest of my life has changed because of this, though perhaps that’s not fair. It’s not like I can compartmentalize the different parts; they all bleed together. And I haven’t been dealing with any family members being sick or being homeless, or any other points of anxiety that have been overwhelming many of my students. I am immensely privileged in a lot of ways, and I acknowledge that.]

I don’t want to minimize the amount of disruption this move has caused. In the first frantic days, I couldn’t sleep at night for worrying about how I was going to make it all work. How was I going to teach a class about technology when my students had inconsistent access to technology? How was I going to do workshops about writing and editing with my podcasters when I couldn’t see them? How was I going to do activities with primary sources (particularly in a class where most students have little experience with reading historical documents and need a lot of prompting)?

After those first few days, things have settled into something of a rhythm. I wouldn’t say the transition has been “easy,” exactly (or inexactly). But there are parts of the new system that I’ve really liked. I want to highlight those.

Return to the essentials

The first time I taught HIST390, I tried to do WAY TOO MUCH. The second time I taught HIST390, I tried to do Way Too Much. This time, I tried to do way too much. Moving online has prompted me to reorient my entire course with an emphasis on the essentials. I've posted before about learning outcomes and how they guide my course preparation. However, that guidance doesn't preclude my trying to do too much, as my course evaluations from my first go at a graduate class last semester made painfully clear. The kind of stripping down I had to do to make HIST390 work feels kind of like that classic writing exercise: write what you want to say in a page. Then write it in a paragraph. Then write it in a sentence. Then write it in a phrase. I started out three semesters ago teaching this class in a page. This semester, I started out with a sentence and I'm ending with a phrase.

In my reframing of the course, I have had to sharpen my focus on what I think really matters. What can I adjust or remove and still meet the learning goals? Much of my “assessment” work has gotten the axe. No more reading blog posts; actually, no more readings. Asking students to demonstrate minimal understanding of a digital tool instead of using its more advanced features. Changing two weeks on databases into one week on audio.

Instead of "assessment," I'm focusing on understanding, giving them more time to work on projects with more support. I've tried to reframe the projects not as checking up on skills, but as an opportunity for creativity and showing off knowledge. I gave them more options for which projects to complete, since they don't have easy access to face-to-face help from me if they struggle. And it's going fine. Students are still learning. In fact, the projects are going better than in any other semester because I've relaxed my time expectations.

To be honest, I don’t miss most of the things I’ve removed. Sure, there are a few things that I’m sad we can’t do, and when I next teach this class face-to-face, I’ll add them back in. But I’ve realized that I’ve always been too aggressive in my course plan. So when we go back to face-to-face, this focus on the essentials is going to make a difference in how I teach this class.

Increased class participation

This one was a huge surprise to me. One of the things I read about asynchronous teaching (which is how I’m doing it) is that it exposes the learning process in a way that synchronous teaching doesn’t. That has certainly held true for me. At the beginning of the semester, I had set up a Slack group for my class. I do require them to join the group, but in the Before Time it mostly got used for tech support. In the After Time, I have used it for class discussion. And it has been GREAT.

In the Before Time, I often broke my students into groups for group discussion. But really hearing the discussions of 46 students, or giving each student a chance to contribute in those discussions, felt pretty impossible. The setup of our classroom was not well-suited to discussion, and it always felt hackneyed.

In the After Time, I broke up my class into groups of 3 or 4. I gave them each their own Slack channel. Now for every class period, they listen to me talk on a podcast episode, or listen to or watch something else. Then they answer discussion questions or do other activities within their Slack group channel. Sometimes I link to a primary source and have them discuss it. Sometimes I have them reflect on how the course materials fit into their lives. I’ve had them make things and photograph them and post to their channel.

I like this way so much better than the in-person discussions. I can “hear” all the discussions; the groups are small enough and asynchronous enough that everyone can have their say. I get to see them thinking through some of the questions in a way that I’d never get in a face-to-face discussion. I always despaired of good discussions in my classroom; I feel like this is finally fulfilling my hopes.

Not every person participates, of course. It would be irrational to expect that. But I’d say I know more about more students’ individual circumstances and personalities now than I did before the break. They’re not particularly shy about their lives, and I really like getting to see the course material come alive as they make connections to themselves. I’m already thinking about how I can use these tools to facilitate asynchronous discussion even when we go back to face-to-face.

Experimentation with form

One of the unexpected benefits of this move to online has been the chance to experiment with new and different ways of presenting materials. I’ve always been a huge proponent of using a wide variety of methods and techniques to communicate history–from one perspective, that’s the whole point of this class. But I’ve rarely had the opportunity to practice what I preach and innovate in my delivery of material.

But now innovation is upon us, whether we wish it or not. I knew from the beginning that I didn’t want to do synchronous class meetings; they make me extremely anxious, and I don’t find them a good teaching or learning environment (no shade on those who are doing synchronous; it’s just not for me). So that meant I got to get creative with content delivery. I’ve chosen to work mostly in audio—no surprise to those of you who know me; I’ve recently become rather enamored of audio.

I started out with just voice recordings. Over time I’ve developed my “lectures” into real podcast-sounding deliveries, with theme music, an intro, a conclusion, and other things that help good audio stand out. I’m not arguing that I’m making something good, necessarily, only that I’m concertedly trying to do so.

It’s hard to communicate with just audio when you’re used to being able to use gestures, facial expressions, and other visual cues to get your point across. I do miss being able to scribble on a white board. But I’ve found the challenge of experimenting with audio immensely rewarding. I’ve taught myself a lot about the form and the mechanics, and my students seem to appreciate the work I put in.

It is a LOT of work. I spend several hours on each 20-minute episode, editing it, selecting and placing the music, re-recording when needed. But there are a lot of moments when I find the work therapeutic: it feels good to make something every week that I’m proud of. My students have also found it therapeutic to make things, it seems. Every time I’ve asked them to do something hands-on, like draw a map of their house, they’ve thrown themselves into it with a right good will.

There’s a pretty decent chance I’ll never use these podcast episodes again. But having to write a script, edit it for clarity, and then listen to myself talk it out has been valuable to me—once again, I’m back to the essentials. What do I need to include in order to get across the point I really care about? Condensing the speaking part of my class from 50-60 minutes into 20 has made me really consider what I care about.

Am I a Jerk for Liking My Class Right Now?

Sometimes I feel like a jerk for liking my class right now. I’ve heard from many colleagues about how they’re struggling to adjust to this new reality. The rest of the semester is just about survival for them, and they believe that the students are getting an inferior product now. And maybe that’s true. And I’m sympathetic—in my other class, I feel like we’re hanging on by a thread and a giant ogre is standing over us with a huge pair of scissors.

I am likewise fully aware that my students are not feeling good about their lives right now. Many of my students are in extremely difficult situations, where classwork is the least of their concerns (rightly). All of them are living in the perpetual fog of covid-19, and I’m there too. Life is not comfortable for any of them, and it’s downright bad for some.

I also know that if we stay online for the fall, much of the work I’ve done for this online class will need to be re-done; it’s very specific to this semester’s students and work. (Plus there’s the whole first half of the class which hasn’t been online-ized.) I’m not saying that this class is better online, either, or that the university should dispense with face-to-face classes forever. Teaching online from the outset, with students whom I don’t know and can’t tailor instruction to out of the gate, is a whole different beast from this switch midstream. To be honest, that kind of online teaching scares the snot out of me. This ain’t that.

So when I say that this class is working for me, it kind of feels like I’m betraying my colleagues and my students who are just barely making it.

And yet it wouldn't be fair to say that these last 6 weeks have been universally horrible. My students are responding really well to these new ways of teaching and learning. They've told me (and I think they're being honest) that they're enjoying the new forms, and they're finding our Slack discussions useful. Not everything has worked, but a lot of stuff has worked. Students are making historical connections to their own lives. They're learning in real time how to understand the digital environment they live in and are now even more immersed in.

I’ve learned a lot. I’ve learned new skills, new ways of showing empathy, new ways of communicating, new ways of managing my own and my students’ expectations. And most important, what I’ve learned this semester is going to make future semesters better, online or face-to-face. I think it’s a mistake to miss the good in the midst of the bad. We all need a few successes to hang our hats on right now. This class, for the moment, is where I’m going to hang my hat.

Digital Methods for Military History: An Institute for Advanced Topics in the Digital Humanities

In October 2014, I ran a workshop at Northeastern University called “Digital Methods for Military History,” designed to (you guessed it) introduce digital history methods to military historians. It was a two-day event that covered a lot of ground, and many participants suggested that they’d like a longer period of instruction or a follow-up event.

A lot has changed since 2014. I was a graduate student then, not even advanced to candidacy. I was a fellow at the NULab for Texts, Maps, and Networks, feeling my way through the wilds of digital history, mostly under the auspices of the Viral Texts project. In 2013, I attended my first THATCamp Prime, where I met Brett Bobley, the director of the NEH’s Office of Digital Humanities, and he and I talked about how military historians could be brought into the digital humanities fold. From that conversation, the project was born. Looking back on those conversations today, I continue to be humbled by the confidence that Brett, the NEH, and the NULab and College of Social Sciences and Humanities placed in me, a very young graduate student, to pull off the workshop.

In 2016, while still working on my dissertation at Northeastern, I started a job at the Roy Rosenzweig Center for History and New Media as a part-time wage employee on the Tropy project. I defended my dissertation in April 2017, and since then I've transitioned from wage employee to research faculty, and now this fall to instructional faculty at George Mason University. I've worked on Tropy that whole time and continued my own research on the First Barbary War while turning the dissertation into a book (as one does), as well as being involved in several other grant projects.

This grant, an Institute for Advanced Topics in the Digital Humanities to fund a new 2-week institute on Digital Methods for Military History, feels special, though. It’s fitting that a project that was conceived during my first visit to RRCHNM should find its way back to the Center, where so many great institutes have occurred in years past. It’s a privilege to follow in their footsteps in teaching about digital history. I’m honored that the NEH again found the instruction of military historians a worthwhile endeavor and gave me a chance to assemble a great team to do that instruction.

This institute is two weeks instead of two days, giving us a lot more time to delve deeply into the topics that military historians already find interesting. We'll be spending our time investigating data creation and cleaning, visualizations, and mapping. We chose those topics because many military historians are familiar with these kinds of outputs but don't know how to create them on their own. We'll also be thinking about how to see a DH project through from beginning to end. Our instructors are top-notch practitioners in these areas: Jason Heppler, Jean Bauer, and Christopher Hamner (and me).

The planning has only just begun, of course, but the tentative dates are July 20-31, 2020. Stay tuned for more information and a call for participants. This time, we'll also be able to cover participants' costs, which will hopefully make it possible for historians to attend who couldn't afford to pay their own way to the 2014 workshop.

I’m so grateful to have this opportunity to introduce military historians to tools for the digital age, and I’m humbled that the NEH has funded this institute. I’m looking forward to working with a great group of military historians in summer 2020!

Passing on the Scissors and the Quill

Faithful readers of this blog (all one of you) will notice that I haven’t posted in almost a year. It’s not that I’ve had nothing interesting to say, but rather that I’ve been too busy with those interesting things to write about them for the blog. Here’s a brief rundown.

In the summer of 2014, my family moved to Fairfax, VA, when my husband was hired by George Mason University. For the 2014-2015 school year, I commuted to Boston from Virginia almost every week so I could finish my coursework at Northeastern University. In August 2015, I passed my comprehensive exams and defended my dissertation proposal, officially becoming a PhD candidate. For the past year, I’ve been researching and writing my dissertation, as well as continuing to work on the Viral Texts project.

The Viral Texts project has been part of my graduate-school experience almost since the beginning. I joined the project as part of the inaugural group of NULab fellows in the spring of 2013. I remember sitting around a table with the other fellows, hearing about all the different projects we might be assigned to, and thinking, “I really hope the spots for that newspaper project don’t fill up before I get to choose.” Thankfully, they didn’t. The NULab fellows’ role has changed since then, but I’ve always been able to stay attached to the project, and I’m so grateful.

Over the past three years, I’ve done a lot of crazy stuff with Viral Texts. I’ve taught myself R, Python, a little Ruby on Rails, and a little JavaScript. I’ve read enough 19th-century newspapers that some of their editors feel a little like friends. I’ve made maps, graphs, networks, and a host of other things. I’ve seen our data grow from a few hundred newspapers in a handful of American states, to periodicals on three continents and in multiple languages. I’ve written an article on fugitive texts with Ryan Cordell (forthcoming in American Periodicals). I’ve been a jack-of-all-trades, though perhaps a master of none.

Viral Texts is one of the defining pieces of my graduate school experience. It shaped my understanding of digital humanities, and it stretched me to work in multiple disciplines. It taught me how to work with a team while keeping my individuality. And I learned an awful lot about how nineteenth-century newspapers work.

And now, in true Viral Texts fashion, it’s time for me to pass on the scissors and the quill. Starting in May, I’ll be joining the research division at the Roy Rosenzweig Center for History and New Media at George Mason University. I’ll be working with PressForward, Zotero, and mostly Tropy, CHNM’s new Mellon-funded project for archiving and organizing photos. I’m particularly excited about working with Tropy, though I’m a little bummed that my dissertation will (I hope) be close to complete before Tropy is ready for the big time. 🙂

The projects and tools at CHNM were my first encounter with digital humanities, even before I wanted to embrace the digital in my own work. Throughout my graduate career, I’ve benefited greatly from Zotero and Omeka and other amazing work at the center, and I’m looking forward to helping develop other great tools for myself and others to use.

In joining CHNM and departing Viral Texts, I take these words from the valedictory editorial of Thomas Ritchie, editor of the Richmond Enquirer: “I cannot close this hasty valedictory, without again expressing the sentiments of gratitude and affection with which I am so profoundly penetrated.” So to everyone on the team—Ryan, David, and Fitz in particular—thanks. It’s been great.

Civil War Navies Bookworm

If you read my last post, you know that this semester I engaged in building a Bookworm using a government document collection. My professor challenged me to try my system for parsing the documents on a different, larger collection of government documents. The collection I chose to work with is the Official Records of the Union and Confederate Navies. My Barbary Bookworm took me all semester to build; this Civil War navies Bookworm took me less than a day. I learned things from making the first one!

This collection is significantly larger than the Barbary Wars collection—26 volumes, as opposed to 6. It encompasses roughly the same time span, but 13 times as many words. Though it is still technically feasible to read through all 26 volumes, this collection is perhaps a better candidate for distant reading than my first corpus.

The document collection is broken into geographical sections: the Atlantic Squadron, the West Gulf Blockading Squadron, and so on. Using the Bookworm allows us to look at the words in these documents sequentially by date instead of having to go back and forth between different volumes to get a sense of what was going on in the whole navy at any given time.

Looking at ship types over the course of the war, across all geographies.

Process and Format

The format of this collection is mostly the same as the Barbary Wars collection. Each document starts with an explanatory header ("Letter to the secretary of the navy," "Extract from a journal," etc.). Unlike that collection, there are no citations at the end of each document. So instead of using the closing citations as document breakers, I used the headers. Though there are many different kinds of documents, the headers are very formulaic, so the regular expressions to find them were not particularly difficult to write.[ref]Ben had suggested that I do the even larger Civil War Armies document collection; however, that collection does not even have headers for the documents, much less citations, so the document-breaking process would be exponentially more difficult. It's not impossible, but I may have to rework my system—and I don't care about the Civil War that much. 🙂 However, other document collections, such as the U.S. Congressional Serial Set, have exactly the same format, so it may be worth figuring out.[/ref]
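To make the document-breaking step concrete, here is a rough sketch of what splitting a volume at those formulaic headers can look like in R. The header pattern and file name are illustrative assumptions on my part, not the project's actual code (which is in the GitHub repository mentioned below).

```r
# Sketch only: split one volume's OCR text into documents at the formulaic
# headers described above. The pattern is an illustrative guess, not the
# full set of header forms in the collection.
library(stringr)

raw <- readr::read_file("orn_volume_01.txt")  # hypothetical file name

header_pattern <- regex(
  "^(Letter|Report|Order|Extract) (to|from|of) .{0,100}$",
  multiline = TRUE, ignore_case = TRUE
)

starts <- str_locate_all(raw, header_pattern)[[1]][, "start"]
ends   <- c(starts[-1] - 1, nchar(raw))
documents <- str_sub(raw, starts, ends)  # one element per document, header included
```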

Further easing the pain of breaking the documents is the quality of the OCR. Where I fought the OCR every step of the way for the Barbary Bookworm, it is really quite good for this collection (a mercy, since spot-checking 26 volumes is no trivial task). Thus, I didn't have to write multiple regular expressions to find each header; a few small variants seemed to suffice.

New Features

The high-quality OCR enabled me to write a date parser that I couldn't make work in my Barbary Bookworm. The dates are written in a more consistent pattern, and the garbage around and in them is minimal, so it was easy enough to write a little function to pull out all the parts. Because parts of a date might be illegible or missing, I made the function find each part of the date in turn and then compile them into one field, rather than trying to extract the dates wholesale. That way, if all I could extract was the year, the function would still return at least a partial date.
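Here is a minimal sketch of that find-each-part-in-turn approach, assuming stringr and Civil War-era years; it is not the actual parser, which lives in the GitHub repository mentioned below.

```r
# Sketch only: extract year, month, and day separately, so a legible year
# still yields a partial date even when the month or day is garbled.
library(stringr)

extract_date <- function(header) {
  year  <- str_extract(header, "186[0-5]")  # assumes Civil War-era years
  month <- str_extract(
    header,
    regex("January|February|March|April|May|June|July|August|September|October|November|December",
          ignore_case = TRUE)
  )
  day   <- str_extract(header, "\\b([12][0-9]|3[01]|[1-9])\\b")
  parts <- c(year, month, day)
  paste(parts[!is.na(parts)], collapse = " ")  # compile whatever was found
}

extract_date("U. S. S. Hartford, off New Orleans, April 25, 1862.")
#> "1862 April 25"
```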

Another new feature of this Bookworm is that the full text of the document appears for each search term when you click on the line at a particular date. This function is slow, so if the interface seems to freeze or you don’t seem to be getting any results, give it a few minutes. It will come up. Most of the documents are short enough that it’s easy to scroll through them.

Testing the Bookworm

Some of the same reservations apply to this Bookworm as I detailed in my last post about Barbary Bookworm—they really apply to all text-analysis tools. Disambiguation of ship names and places continues to be a problem. But many of the other problems with Barbary Bookworm are solved with this Bookworm.

The next step that I need to work on is sectioning out the Confederate navy’s documents from the Union navy’s. Right now, you can get a sense of what was important to both navies, but not so easily get a sense of what was important to just one side or the other.

To be honest, I don't really know enough about the navies of the Civil War to make any significant arguments based on my scrounging around with this tool. There is some very low-hanging fruit, of course.

Unsurprisingly, the terms “monitor” and “ironclad” become more prominent throughout the war.

The Bookworm is hosted online by Ben Schmidt (thanks, Ben!). The code for creating the files is up on GitHub. Please go play around with it!

Feedback

Particularly since I don’t do Civil War history, I’d welcome feedback on both the interface and the content here. What worked? What didn’t? What else would you like to see?

Feel free to send me questions/observations/interesting finds/results by commenting on this post (since there’s not a comment function on the Bookworm itself), by emailing me, or for small stuff, pinging me on Twitter (@abbymullen). I really am very interested in everyone’s feedback, so please scrub around and try to break it. I already know of a few things that are not quite working right, but I’m interested to see what you all come up with.

Text Analysis on the Documents of the Barbary Wars

This past semester, I took a graduate seminar in Humanities Data Analysis, taught by Professor Ben Schmidt. This post describes my final project. Stay tuned for more fun Bookworm stuff in the next few days (part 2 on Civil War Navies Bookworm is here).

In the 1920s, the United States government decided to create document collections for several of its early naval wars: the Quasi-War with France, the Barbary Wars, and the Civil War (the War of 1812 did not come until much later, for some reason). These document collections, particularly for the Quasi-War and the Barbary Wars, have become the standard resource for any scholar doing work on these wars. My work on the Barbary Wars relies heavily on this document collection. The Barbary Wars collection includes correspondence, journals, official documents such as treaties, crew manifests, other miscellaneous documents, and a few summary documents put together in the 1820s.[ref]U.S. Office of Naval Records and Library, Naval Documents Related to the United States Wars with the Barbary Powers (Washington: U.S. Govt. Print. Off., 1939); digitized at http://www.ibiblio.org/anrs/barbary.html.[/ref]

It’s quite easy to get bogged down in the multiplicity of mundaneness in these documents—every single day’s record of where a ship is and what the weather is like, for instance. It’s also easy to lose sight of the true trajectory of the conflict in the midst of all this seeming banality. Because the documents in the collection are from many authors in conversation with each other, we can sometimes follow the path of these conversations. But there are many concurrent conversations, and often we do not have the full correspondence. How can we make sense of this jumble?


Named Entity Extraction: Productive Failure?

This past week in my Humanities Data Analysis class, we looked at mapping as data. We explored ggplot2’s map functions, as well as doing some work with ggmap’s geocoding and other things. One thing that we just barely explored was automatically extracting place names through named entity recognition. It is possible to do named entity recognition in R, though people say it’s probably not the best way. But in order to stay in R, I used a handy tutorial by the esteemed Lincoln Mullen, found here.
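Here, roughly, is what that extraction step looks like. This is a minimal sketch in the spirit of Lincoln's tutorial, assuming the NLP, openNLP, and openNLPmodels.en packages are installed; it is not my exact code.

```r
# Sketch only: pull location entities out of a chunk of text with openNLP.
library(NLP)
library(openNLP)  # the entity model also requires the openNLPmodels.en package

extract_places <- function(text) {
  text <- as.String(text)
  # NLP::annotate is called explicitly to avoid the clash with ggplot2::annotate.
  anns <- NLP::annotate(text, list(
    Maxent_Sent_Token_Annotator(),
    Maxent_Word_Token_Annotator(),
    Maxent_Entity_Annotator(kind = "location")
  ))
  entities <- anns[anns$type == "entity"]
  kinds <- sapply(entities$features, `[[`, "kind")
  as.character(text[entities[kinds == "location"]])
}
```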

I was interested in extracting place names from the data I’ve been cleaning up for use in a Bookworm, the text of the 6-volume document collection, Naval Documents Related to the United States Wars with the Barbary Powers, published in the 1920s by the U.S. government. It’s a great primary source collection, and a good jumping-off point for any research into the Barbary Wars. The entire collection has been digitized by the American Naval Records Society, with OCR, but the OCRed text is not clean. The poor quality of the OCR has been problematic for almost all data analysis, and this extraction was no exception.

The tutorial on NER is quite easy to follow, so that wasn’t a problem at all. The problem I ran into very quickly was the memory limits on my machine–this process takes a TON of memory, apparently. I originally tried to use my semi-cleaned-up file that contained the text of all 6 volumes, but that was way too big. Even one volume proved much too big. I decided to break up the text into years, instead of just chunking the volumes by size, in order to facilitate a more useful comparison set. For the first 15 years (1785-1800), the file was small enough, and I even combined the earlier years into one file. But starting in 1802, the file was still too large even with only one year. So I chunked each year into 500kb files, and then ran the program exactly the way the tutorial suggested with multiple files. I then just pushed the results of each chunk back into one results file per year.
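Here is a sketch of that chunking workaround, reusing the extract_places() helper sketched above; the file names and the exact chunk size are illustrative.

```r
# Sketch only: split one year's text into ~500 KB pieces, run the entity
# extractor on each piece, and pool the results into one file per year.
chunk_text <- function(text, chunk_size = 500000) {
  starts <- seq(1, nchar(text), by = chunk_size)
  substring(text, starts, pmin(starts + chunk_size - 1, nchar(text)))
}

year_text <- readr::read_file("barbary_1802.txt")  # hypothetical file name
places <- unlist(lapply(chunk_text(year_text), extract_places))
writeLines(places, "places_1802.txt")
```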

Once I got my results, I had to clean them up. I haven't tested NER on any other type of document, but based on these results, I suspect that the particular genre of texts I am working with causes NER some significant problems. I started by doing a bit of work with the list in OpenRefine to standardize the terrible spelling of 19th-century naval captains and to clean up OCR errors. That done, I took a hard look at what exactly was in my list.

An excerpt from the results (before passing through OpenRefine) that demonstrates some of the problems discussed here.

Here’s what I found:
1. The navy didn't do NER any favors by naming many of its ships after American places. It's almost certain that Essex and Chesapeake, for instance, refer to the USS Essex and USS Chesapeake. Less certain are places like Philadelphia, Boston, United States, and even Tripoli, which all definitely appear in the text as places but are also ship names. There's absolutely no way to disambiguate these terms.
2. The term "Cape" proved to be a particular problem. The difficulty here is that the abbreviation for "Captain" is often "Cap" or "Capt," and often the OCR renders it "Cape" or "Ca." Thus, people like Capt. Daniel McNeill turn up in a place-name list. Naval terms like "Anchorage" also cause some problems. I guarantee: Alaska does not enter the story at all.
3. The format of many of these documents is “To” someone “from” someone. I can’t be certain, but it seems like the NER process sometimes (though not always) saw those to and from statements as being locational, instead of relational. I also think that journal or logbook entries, with their formulaic descriptions of weather and location, sometimes get the NER process confused about which is the weather and which is the location.
4. To be honest, there are a large number of false hits that I really can't explain. It seems like lists are particularly prone to being selected from, so I get one member of a crew list, or words like "salt beef," "cheese," or "coffee," from provision lists. But there are other results where I just can't make out why they were selected as locations.

Because of all these foibles, each list required hand-curation to throw out the false hits. Once I did that, I ran it through R again to geocode the locations using ggmap. Here, too, I ran into some problems (which I admittedly should have anticipated based on previous work doing geolocation of these texts). Of course, many of the places had to be thrown out because they were just too vague to be of any use: "harbor," "island," and other such terms didn't make the cut.

When I ran the geocoder for the first time, it threw a bunch of errors because of unrecognizable place names. Then I remembered: this is why I've used historical maps of the area in the past–to try to track down these place names that are not used today. Examples include "Cape Spartel," "Cape DeGatt," and "Cape Ferina." (I'm not sure why they were all capes.) I discovered that if you run geocode with the "more" option, the warnings don't result in a failed geocode, and the extra information gives a better sense of the granularity of each match and of exactly which identifier the geocoder used to determine the location.
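For reference, the call looks something like the sketch below; the place-name file is hypothetical, and depending on the ggmap version the geocoder may require an API key.

```r
# Sketch only: geocode the hand-cleaned place list, asking for the "more"
# output so each match comes back with extra context for sanity-checking.
library(ggmap)

places  <- readLines("barbary_places_cleaned.txt")  # hypothetical file name
located <- geocode(places, output = "more")
# The extra columns (such as the matched address and location type) make it
# easier to spot oddities like "Mediterranean Sea" landing in the Philippines.
head(located)
```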

This extra information proved helpful when the geocoded map revealed oddities such as the Mediterranean Sea showing up in the Philippines, or Tunis Bay showing up in Canada. Turns out, the geocoder doesn’t necessarily pick the most logical choice for ambiguous terms: there is, in fact, an Australasian sea sometimes known as the Mediterranean Sea. These seemingly arbitrary choices by the geocoder mean that the map looks more than a little strange.

Just to see what would happen, I ran the geocoder on the raw results (no cleaning done). It turned out entertaining, at least.

A slightly more sensible map: This is one created with the clean data. You can see from the outliers, though, that some of these locations are not correct. Given how far off some important terms are (like “Mediterranean Sea”), the text plotting made more sense for understanding than simply plotting symbols. Text plotting does obscure the places that are close to other places, leaving the outliers as the easily visible points. Those points seem the most likely to be incorrect.

So what’s the result here? I can see the potential for named-entity extraction, but for my particular project, it just doesn’t seem logical or useful. There’s not really anything more I can do with this data, except try to clean up my original documents even more. But even so, it was a useful exercise, and it was good practice in working with maps and data in R.

On Newspapers and Being Human

Last week, an opinion piece appeared in the New York Times, arguing that the advent of algorithmically derived human-readable content may be destroying our humanity, as the lines between technology and humanity blur. A particular target in this article is the advent of “robo-journalism,” or the use of algorithms to write copy for the news.[ref]The article also decries other types of algorithmically derived texts, but the case for computer-generated creative fiction or poetry is fairly well argued by people such as Mark Sample, and is not an argument that I have anything new to add to.[/ref] The author cites a study that alleges that “90 percent of news could be algorithmically generated by the mid-2020s, much of it without human intervention.” The obvious rebuttal to this statement is that algorithms are written by real human beings, which means that there are human interventions in every piece of algorithmically derived text. But statements like these also imply an individualism that simply does not match the historical tradition of how newspapers are created.[ref]This post is based on my research for the Viral Texts project at Northeastern University.[/ref]

In the nineteenth century, algorithms didn't write texts, but neither did each newspaper's staff write its own copy with personal attention to each article. Instead, newspapers borrowed texts from each other—no one would ever have expected individualized copy for news stories.[ref]In 1844, the New York Daily Tribune published a humorous story illustrating exactly the opposite, in fact—some readers preferred a less human touch.[/ref] Newspapers were amalgams of texts from a variety of sources, cobbled together by editors who did more with scissors than with a pen (and they often described themselves this way).

Introducing the Boston Maps Project

This semester, Northeastern University’s history department is branching out into new territory: we’re beginning a large-scale digital project that is being implemented across several classes in the department. The goal of the project is to investigate urban and social change in the city of Boston using historical maps. We’re very excited to be partnering with the Leventhal Map Center at the Boston Public Library for this project.

This project was originally conceived as an offshoot of a group project from Prof. William Fowler’s America and the Sea course last spring. The original plan was just to think about how the waterfront changed, but it has expanded significantly in response to feedback from faculty in the department. Our focus has become both the topography and the culture of Boston, and how those two intertwine.

Our final product will be an interactive, layered series of historical maps with annotations that help to explore urban and social change across 250 years of Boston’s history. We’ll be building our map series in Leaflet, which we think is a beautiful and flexible medium for such a task.

Why maps?

We made the decision to use historical maps for several reasons. Getting at the topographical changes in the city calls for map comparison. Boston’s topography has changed so substantially in its history that a 1630 map is essentially unrecognizable as the same city. In many senses, modern Boston isn’t even the same land as 1630 Boston. Because the actual land forms have changed so much, it’s impossible to tell the story of Boston without investigating its maps.

Space is an important part of the story of Boston. As the function and prospects of the city change, so does its landform. But Bostonians have never been content to merely take land from the west, as so many other coastal cities have done. Instead, they literally make land in the sea. Over the course of almost four hundred years, Boston has made so much land that its 1630 footprint is essentially unrecognizable in its 2014 footprint.

These drastic topographical changes are inextricably linked to the life of the city. Many of the changes connect explicitly to commercial concerns–the building of new wharves, for instance. So one major goal of the Boston Maps Project is to make obvious these connections between the city’s life and its land.

We’re fortunate to have such a great collection of maps at our disposal. For this semester, we’re going to be using approximately 25 maps, spanning from 1723 to 1899. In the future, we’d like to expand further toward the present, but the Leventhal maps don’t extend far into the 20th century.

Beginning the process

The first step in our process is to get the maps georectified and then annotated. Aligning these historical maps with each other is critical for tracking how the city changes. The work of georectification and annotation is being done this semester by undergraduate and graduate students in seven classes, ranging in subject from public history to Colonial and Revolutionary America. They’re using QGIS to georectify the maps, and then using Omeka as a repository for their annotations.

The georectification process helps the students compare maps and think about how things have developed over time. These georectified maps are the backbone of the project, as they provide the structure for the story of change. Eventually, they’ll provide both the conceptual and the physical structure of the project as well.

But merely georectifying the maps doesn’t really tell us that much about the changes that are going on within the city. To get at those changes, students are identifying features on the maps and writing paragraph-length descriptions of them that describe their purpose and evolution. We hope these annotations will provide context that enriches our understanding of topographical and social change in the city.

Features such as the ones in the black polygons are ones that I've encouraged the students to annotate. What is that black box? How has Beacon Hill's function changed? What in the world is Mount Whoredom? These are all questions that we hope to answer. (Zoom of Richard Williams, "A plan of Boston and its environs," 1775. From the Leventhal Map Center, BPL.)

Thus far, the rollout has been mostly successful. We’ve had a few technical blips along the way (word to the wise Mac user: download all those extra packages before installing QGIS!), but in general the students are excited about beginning the work on this project. I’ve lectured in several of the classes already about the idea of the project and the technical aspects of it, and the students are all beginning to work on their individual pieces.

Thanks

This project would never have gone forward without encouragement and advice from several people.

Chief encourager and motivator has been Professor Bill Fowler, who has always believed that a large-scale digital project is not only possible, but profitable to implement in undergrad courses. He is learning right along with the students about the tools and technologies that we're using, and he is our biggest advocate with the BPL and other organizations.

Chief technical adviser, without whom the project would have already completely imploded, is Ben Schmidt. He has written scripts, hashed out schemas, wrangled servers, and done many other tasks that I don’t yet have the technical competency to deal with. In addition, he has provided invaluable advice about best practices for digital projects and the direction the project should go.

All of the staff at the Leventhal Map Center have jumped on board this project with enthusiasm. They’ve met with us, advised us on the best maps to use, and helped us think through how the project can best benefit both NEU and the BPL.

All the faculty who have agreed to implement this project in their courses deserve special thanks as well. The project takes away class time from lectures on their own subject matter, and it certainly adds an element of uncertainty to the course structure. I appreciate their willingness to go out on a limb to make this project happen.

I’m very grateful to all these people—and plenty of others—who have already helped to make the Boston Maps Project a success.

—-

We're very excited to begin this new project. I hope to write occasional reports on our progress, and hopefully our final product will be beautiful and useful to scholars, visitors, and residents of the city of Boston.

McMullen Naval History Symposium Recap

This weekend, I had the privilege of presenting a paper at the McMullen Naval History Symposium. It was my second time at the U.S. Naval Academy, and I had a great time.

Our Panel

I organized a panel titled “Politics of the Sea in the Early Republic,” in which the panelists looked at how the navy and maritime concerns influenced political discourse (and vice versa). Bill Leeman argued that Thomas Jefferson’s approach to the navy in the Barbary Wars was more pragmatic than idealistic. The question of who could declare war–was it the president or the Congress?–was a live one in the early republic. What were the president’s powers when a foreign country declared war first? These are the questions that Jefferson had to grapple with as he sent the navy to deal with the threat of the Barbary States.

My paper picked up the political question in the War of 1812. Titled “Naval Honor and Partisan Politics: The Naval War of 1812 in the Public Sphere,” the paper investigated how partisan newspapers approached the naval war, using exactly the same events to make exactly opposite political points. Interestingly, both political parties also used the same imagery and rhetoric. They both used the concept of honor in order to castigate the other party. I’ll be posting an edited version of the paper on the blog soon, so you’ll just have to wait to read the exciting conclusion.

Steve Park examined how the Hartford Convention, held at the end of the War of 1812, addressed–or rather, didn't address–the concerns of Federalists. Since the Federalists had traditionally been strongly in favor of naval buildup and the end of impressment, it was highly surprising that the delegates did not really mention these concerns at all in their convention resolutions. Nevertheless, they were not secessionist, but instead sought a constitutional solution to their perceived grievances.

We were very fortunate to have a premier naval historian, Craig Symonds, as our chair, and an excellent younger scholar, David Head, as our commentator. The audience was involved in the themes of our panel, and they asked great questions and pushed each of our ideas in fruitful directions. Even after the session was over, we continued to field questions informally, and I had some profitable conversations about the paper even afterwards during the reception.

New Connections

The historians that attend the naval history symposium are members of the community I want to be a part of. Senior scholars in the field of naval history attend every year, including many historians whose work has been integral to my research. This year, I met several of those historians. Two were particularly special, as they are essentially responsible for my desire to do naval history. Frederick Leiner, who is a historian of the early American navy only as a side interest, wrote Millions for Defense: The Subscription Warships of 1798 and The End of Barbary Terror: America’s 1815 War Against the Pirates of North Africa. Millions for Defense was the book that set me on the path to studying the Barbary Wars. And Christopher McKee wrote the seminal work on the naval officer corps of the early republic, A Gentlemanly and Honorable Profession, which has shown me the breadth and depth of the stories in the naval officer corps. These stories will undoubtedly keep me busy for a lifetime. (I also would love to make that book into a digital project, but that’s a task for another time.)

Almost as exciting as meeting a few of my history heroes was meeting some young scholars, working on their PhDs or just finished with their degrees. Several of them were women, also doing naval history. These meetings gave me so much hope for the future–for my own career and for the field at large. I can't wait to keep up with these scholars, and perhaps even forge some meaningful relationships and collaborations with them. I also met some young scholars who are doing digital history. In light of my previous blog post about the intersection of DH and MH, I'm very excited to learn that the field is not quite as barren as it seems. Again, I hope to establish some meaningful connections and build up a community of digital naval historians.

The symposium left me with lots of new ideas, new avenues of exploration, and new professional connections. So now I’m looking forward to jumping back into my work!