Faithful readers of this blog (all one of you) will notice that I haven’t posted in almost a year. It’s not that I’ve had nothing interesting to say, but rather that I’ve been too busy with those interesting things to write about them for the blog. Here’s a brief rundown.
In the summer of 2014, my family moved to Fairfax, VA, when my husband was hired by George Mason University. For the 2014-2015 school year, I commuted to Boston from Virginia almost every week so I could finish my coursework at Northeastern University. In August 2015, I passed my comprehensive exams and defended my dissertation proposal, officially becoming a PhD candidate. For the past year, I’ve been researching and writing my dissertation, as well as continuing to work on the Viral Texts project.
The Viral Texts project has been part of my graduate-school experience almost since the beginning. I joined the project as part of the inaugural group of NULab fellows in the spring of 2013. I remember sitting around a table with the other fellows, hearing about all the different projects we might be assigned to, and thinking, “I really hope the spots for that newspaper project don’t fill up before I get to choose.” Thankfully, they didn’t. The NULab fellows’ role has changed since then, but I’ve always been able to stay attached to the project, and I’m so grateful.
Viral Texts is one of the defining pieces of my graduate school experience. It shaped my understanding of digital humanities, and it stretched me to work in multiple disciplines. It taught me how to work with a team while keeping my individuality. And I learned an awful lot about how nineteenth-century newspapers work.
And now, in true Viral Texts fashion, it’s time for me to pass on the scissors and the quill. Starting in May, I’ll be joining the research division at the Roy Rosenzweig Center for History and New Media at George Mason University. I’ll be working with PressForward, Zotero, and mostly Tropy, CHNM’s new Mellon-funded project for archiving and organizing photos. I’m particularly excited about working with Tropy, though I’m a little bummed that my dissertation will (I hope) be close to complete before Tropy is ready for the big time.
The projects and tools at CHNM were my first encounter with digital humanities, even before I wanted to embrace the digital in my own work. Throughout my graduate career, I’ve benefited greatly from Zotero and Omeka and other amazing work at the center, and I’m looking forward to helping develop other great tools for myself and others to use.
In joining CHNM and departing Viral Texts, I take these words from the valedictory editorial of Thomas Ritchie, editor of the Richmond Enquirer: “I cannot close this hasty valedictory, without again expressing the sentiments of gratitude and affection with which I am so profoundly penetrated.” So to everyone on the team—Ryan, David, and Fitz in particular—thanks. It’s been great.
If you read my last post, you know that this semester I built a Bookworm from a government document collection. My professor challenged me to test my system for parsing the documents on a different, larger collection of government documents. The collection I chose is the Official Records of the Union and Confederate Navies. My Barbary Bookworm took me all semester to build; this Civil War Navies Bookworm took me less than a day. I learned things from making the first one!
This collection is significantly larger than the Barbary Wars collection—26 volumes, as opposed to 6. It encompasses roughly the same time span, but 13 times as many words. Though it is still technically feasible to read through all 26 volumes, this collection is perhaps a better candidate for distant reading than my first corpus.
The document collection is broken into geographical sections: the Atlantic Squadron, the West Gulf Blockading Squadron, and so on. The Bookworm lets us look at the words in these documents sequentially by date, instead of going back and forth between volumes to get a sense of what was happening in the whole navy at any given time.
Process and Format
The format of this collection is mostly the same as the Barbary Wars collection. Each document starts with an explanatory header (“Letter to the secretary of the navy,” “Extract from a journal,” etc.). Unlike the Barbary Wars collection, however, there are no citations at the end of each document. So instead of using the closing citations as document breakers, I used the headers. Though there are many different kinds of documents, the headers are very formulaic, so the regular expressions to find them were not particularly difficult to write. 1
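For readers curious what header-based document breaking looks like, here is a minimal Python sketch (not the code behind this Bookworm). It assumes each header sits on its own line and begins with one of a few formulaic words; the word list in `HEADER_RE` and the name `split_documents` are hypothetical, and the real headers need more variants.

```python
import re

# Hypothetical header openings; the real collection needs a few more variants.
HEADER_RE = re.compile(
    r"^(?:Letter|Extract|Report|Order|Instructions)\b[^\n]*$",
    re.MULTILINE,
)

def split_documents(text):
    """Split a volume into documents, each beginning at a header line.

    Any text before the first header (front matter, etc.) is dropped.
    """
    starts = [m.start() for m in HEADER_RE.finditer(text)]
    docs = []
    for i, start in enumerate(starts):
        # Each document runs until the next header, or to the end of the volume.
        end = starts[i + 1] if i + 1 < len(starts) else len(text)
        docs.append(text[start:end].strip())
    return docs
```

Anchoring the pattern to the start of a line (`re.MULTILINE` with `^`) keeps it from firing on "letter" in the middle of a sentence, though a body line that happens to begin with "Letter" would still cause a false break, which is why spot-checking matters.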
Further easing the pain of breaking the documents is the quality of the OCR. Where I fought the OCR every step of the way for Barbary Bookworm, the OCR is really quite good for this collection (a mercy, since spot-checking 26 volumes is no trivial task). Thus, I didn’t have to write multiple regular expressions to find each header; only a few small variants seemed to be sufficient.
The high-quality OCR enabled me to write a date parser that I couldn’t make work in my Barbary Bookworm. The dates are written in a more consistent pattern, and the garbage around and in them is minimal, so it was easy enough to write a little function to pull out all the parts. Because certain parts of the dates might be illegible or missing, I made the function find each part of the date in turn and then compile them into one field, rather than trying to extract the dates wholesale. That way, if all I could extract was the year, the function would still return at least a partial date.
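The part-by-part approach can be sketched in Python as follows. This is an illustration, not the Bookworm's actual parser: it assumes dates look like "April 25, 1862" or "25th April 1862" with capitalized month names, and the names `parse_date`, `YEAR_RE`, and friends are hypothetical.

```python
import re

# Capitalized month names, as they would appear in printed headers (assumption).
MONTH_NAMES = ["January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December"]
MONTH_NUM = {name: i for i, name in enumerate(MONTH_NAMES, start=1)}

YEAR_RE = re.compile(r"\b(1[78]\d{2})\b")                        # 1700-1899
MONTH_RE = re.compile(r"\b(" + "|".join(MONTH_NAMES) + r")\b")
DAY_AFTER_RE = re.compile(r"^\s*(\d{1,2})(?:st|nd|rd|th)?\b")    # "April 25"
DAY_BEFORE_RE = re.compile(r"\b(\d{1,2})(?:st|nd|rd|th)?\s*$")   # "25th April"

def parse_date(text):
    """Extract year, month, and day separately; return the most complete date found.

    Returns 'YYYY-MM-DD', 'YYYY-MM', 'YYYY', or None if no year is legible.
    """
    year = YEAR_RE.search(text)
    if not year:
        return None
    result = year.group(1)

    month = MONTH_RE.search(text)
    if not month:
        return result  # year only: still a usable partial date
    result += f"-{MONTH_NUM[month.group(1)]:02d}"

    # Look for a day number immediately after, then immediately before, the month.
    day = (DAY_AFTER_RE.search(text[month.end():])
           or DAY_BEFORE_RE.search(text[:month.start()]))
    if day and 1 <= int(day.group(1)) <= 31:
        result += f"-{int(day.group(1)):02d}"
    return result
```

Searching for the day only next to the month avoids picking up stray numbers elsewhere in a header, such as document or volume numbers.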
Another new feature of this Bookworm is that the full text of the document appears for each search term when you click on the line at a particular date. This function is slow, so if the interface seems to freeze or you don’t seem to be getting any results, give it a few minutes. It will come up. Most of the documents are short enough that it’s easy to scroll through them.
Testing the Bookworm
Some of the same reservations apply to this Bookworm as I detailed in my last post about Barbary Bookworm—they really apply to all text-analysis tools. Disambiguation of ship names and places continues to be a problem. But many of the other problems with Barbary Bookworm are solved with this Bookworm.
The next step that I need to work on is sectioning out the Confederate navy’s documents from the Union navy’s. Right now, you can get a sense of what was important to both navies, but not so easily get a sense of what was important to just one side or the other.
To be honest, I don’t really know enough about the navies of the Civil War to make any significant arguments based on my scrounging around with this tool. There is some very low-hanging fruit, of course.
The Bookworm is hosted online by Ben Schmidt (thanks, Ben!). The code for creating the files is up on GitHub. Please go play around with it!
Particularly since I don’t do Civil War history, I’d welcome feedback on both the interface and the content here. What worked? What didn’t? What else would you like to see?
Feel free to send me questions/observations/interesting finds/results by commenting on this post (since there’s not a comment function on the Bookworm itself), by emailing me, or for small stuff, pinging me on Twitter (@abbymullen). I really am very interested in everyone’s feedback, so please scrub around and try to break it. I already know of a few things that are not quite working right, but I’m interested to see what you all come up with.
Ben had suggested that I do the even larger Civil War Armies document collection; however, that collection does not even have headers for the documents, much less citations, so the document breaking process would be exponentially more difficult. It’s not impossible, but I may have to rework my system—and I don’t care about the Civil War that much. However, other document collections, such as the U.S. Congressional Serial Set, have exactly the same format, so it may be worth figuring out. ↩
This past semester, I took a graduate seminar in Humanities Data Analysis, taught by Professor Ben Schmidt. This post describes my final project. Stay tuned for more fun Bookworm stuff in the next few days (part 2 on Civil War Navies Bookworm is here).
In the 1920s, the United States government decided to create document collections for several of its early naval wars: the Quasi-War with France, the Barbary Wars, and the Civil War (the War of 1812 did not come until much later, for some reason). These document collections, particularly for the Quasi-War and the Barbary Wars, have become the standard resource for any scholar doing work on these wars. My work on the Barbary Wars relies heavily on this document collection. The Barbary Wars collection includes correspondence, journals, official documents such as treaties, crew manifests, other miscellaneous documents, and a few summary documents put together in the 1820s. 1
It’s quite easy to get bogged down in the multiplicity of mundaneness in these documents—every single day’s record of where a ship is and what the weather is like, for instance. It’s also easy to lose sight of the true trajectory of the conflict in the midst of all this seeming banality. Because the documents in the collection are from many authors in conversation with each other, we can sometimes follow the path of these conversations. But there are many concurrent conversations, and often we do not have the full correspondence. How can we make sense of this jumble?
U.S. Office of Naval Records and Library, Naval Documents Related to the United States Wars with the Barbary Powers (Washington: U.S. Govt. Print. Off., 1939); digitized at http://www.ibiblio.org/anrs/barbary.html. ↩
This past week in my Humanities Data Analysis class, we looked at mapping as data. We explored ggplot2’s map functions and did some work with ggmap’s geocoding functions. One thing that we just barely explored was automatically extracting place names through named entity recognition. It is possible to do named entity recognition in R, though many people say R is probably not the best tool for it. But in order to stay in R, I used a handy tutorial by the esteemed Lincoln Mullen, found here.
I was interested in extracting place names from the data I’ve been cleaning up for use in a Bookworm: the text of the 6-volume document collection Naval Documents Related to the United States Wars with the Barbary Powers, published by the U.S. government beginning in 1939. It’s a great primary source collection, and a good jumping-off point for any research into the Barbary Wars. The entire collection has been digitized, with OCR, by the American Naval Records Society, but the OCR text is not clean. The poor quality of the OCR has been problematic for almost all data analysis, and this extraction was no exception.
The tutorial on NER is quite easy to follow, so that wasn’t a problem at all. The problem I ran into very quickly was the memory limit on my machine: this process apparently takes a TON of memory. I originally tried to use my semi-cleaned-up file that contained the text of all 6 volumes, but that was way too big. Even one volume proved much too big. I decided to break up the text by year, instead of just chunking the volumes by size, in order to get a more useful set for comparison. For the first 15 years (1785-1800), the file was small enough, and I even combined the earlier years into one file. But starting in 1802, the file was still too large even with only one year. So I split each year into 500 KB files and then ran the program exactly as the tutorial suggested for multiple files. I then pushed the results of each chunk back into one results file per year.
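The size-based chunking step is language-agnostic, so here is a small Python sketch of it (the original work was done in R). It assumes "500 KB" means 500,000 bytes of UTF-8 text and breaks only between lines so that no word or name is cut in half; `chunk_text`, `write_year_chunks`, and the file-naming scheme are hypothetical.

```python
from pathlib import Path

def chunk_text(text, max_bytes=500_000):
    """Split text into pieces of at most max_bytes (UTF-8), breaking only between lines.

    A single line longer than max_bytes still becomes its own (oversized) chunk.
    """
    chunks, current, size = [], [], 0
    for line in text.splitlines(keepends=True):
        n = len(line.encode("utf-8"))
        # Close the current chunk if adding this line would push it over the limit.
        if current and size + n > max_bytes:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += n
    if current:
        chunks.append("".join(current))
    return chunks

def write_year_chunks(text, year, out_dir, max_bytes=500_000):
    """Write one year's text as numbered chunk files (hypothetical naming scheme)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, chunk in enumerate(chunk_text(text, max_bytes), start=1):
        path = out / f"{year}_part{i:02d}.txt"
        path.write_text(chunk, encoding="utf-8")
        paths.append(path)
    return paths
```

Because the chunks concatenate back to the original text exactly, the per-chunk NER results can simply be appended to a single results file per year afterward.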
Once I got my results, I had to clean them up. I haven’t tested NER on any other type of document, but based on my results, I suspect that the particular genre of texts I am working with causes NER some significant problems. I started by doing a bit of work with the list in OpenRefine to standardize the terrible spelling of 19th-century naval captains, as well as OCR errors. That done, I took a hard look at what exactly was in my list.
Here’s what I found:
1. The navy didn’t do NER any favors by naming many of their ships after American places. It’s almost certain that Essex and Chesapeake, for instance, refer to the USS Essex and USS Chesapeake. Less certain are places like Philadelphia, Boston, United States, and even Tripoli, which are all places that definitely appear in the text, but are also ship names. There’s absolutely no way to disambiguate these terms.
2. The term “Cape” proved particularly problematic. The difficulty here is that the abbreviation for “Captain” is often “Cap” or “Capt,” and the OCR often renders it “Cape” or “Ca.” Thus, people like Capt. Daniel McNeill turn up in a place-name list. Naval terms like “Anchorage” also cause some problems. I guarantee: Alaska does not enter the story at all.
3. The format of many of these documents is “To” someone “from” someone. I can’t be certain, but it seems like the NER process sometimes (though not always) saw those to and from statements as being locational, instead of relational. I also think that journal or logbook entries, with their formulaic descriptions of weather and location, sometimes get the NER process confused about which is the weather and which is the location.
4. To be honest, there is a large number of false hits that I really can’t explain. Lists seem particularly prone to being mined, so I get one member of a crew list, or words like “salt beef,” “cheese,” or “coffee” from provision lists. But there are other results, too, for which I can’t work out why they were selected as locations.
Because of all these foibles, each list requires hand-curation to throw out the false hits. Once I did that, I ran it through R again to geocode the locations using ggmap. Here we also had some problems (which I admittedly should have anticipated based on previous work doing geolocation of these texts). Of course, many of the places had to be thrown out because they were just too vague to be of any use: “harbor,” “island,” and other such terms didn’t make the cut.
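Hand curation can't be fully automated, but a first pass with stoplists can remove the most obvious false hits before a human looks at the rest. The Python sketch below is a hypothetical illustration, not the process described above (which used OpenRefine and manual review); the stoplist contents are drawn from the examples in this post, and `curate` is an invented name.

```python
from collections import Counter

# Hypothetical stoplists, seeded from the false hits discussed above.
TOO_VAGUE = {"harbor", "island"}                        # too vague to geocode
NOT_PLACES = {"salt beef", "cheese", "coffee", "anchorage"}
SHIP_NAMES = {"essex", "chesapeake"}                    # almost certainly ships

def curate(candidates):
    """Normalize NER place candidates and drop obvious false hits, keeping counts."""
    drop = TOO_VAGUE | NOT_PLACES | SHIP_NAMES
    counts = Counter()
    for raw in candidates:
        # Collapse internal whitespace, trim punctuation, and lowercase.
        name = " ".join(raw.split()).strip(".,;:").lower()
        if not name or name in drop:
            continue
        counts[name] += 1
    return counts
```

Ambiguous names such as Philadelphia, Boston, or Tripoli deliberately pass through this filter: whether a given hit is a port or a ship still needs a human decision.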
When I ran the geocoder for the first time, it threw a bunch of errors because of unrecognizable place names. Then I remembered: this is why I’ve used historical maps of the area in the past, to track down place names that are no longer in use. Examples include “Cape Spartel,” “Cape DeGatt,” and “Cape Ferina.” (I’m not sure why they were all capes.) I discovered that if you set the geocode function’s output option to “more,” the warnings don’t result in a failed geocode. Plus, the extra information is useful for gauging the granularity of each geocode and for seeing exactly which identifier the geocoder used to determine each location.
This extra information proved helpful when the geocoded map revealed oddities such as the Mediterranean Sea showing up in the Philippines, or Tunis Bay showing up in Canada. Turns out, the geocoder doesn’t necessarily pick the most logical choice for ambiguous terms: there is, in fact, an Australasian sea sometimes known as the Mediterranean Sea. These seemingly arbitrary choices by the geocoder mean that the map looks more than a little strange.
So what’s the result here? I can see the potential for named-entity extraction, but for my particular project, it just doesn’t seem logical or useful. There’s not really anything more I can do with this data, except try to clean up my original documents even more. But even so, it was a useful exercise, and it was good practice in working with maps and data in R.
Last week, an opinion piece appeared in the New York Times, arguing that the advent of algorithmically derived human-readable content may be destroying our humanity, as the lines between technology and humanity blur. A particular target in this article is the advent of “robo-journalism,” or the use of algorithms to write copy for the news. 1 The author cites a study that alleges that “90 percent of news could be algorithmically generated by the mid-2020s, much of it without human intervention.” The obvious rebuttal to this statement is that algorithms are written by real human beings, which means that there are human interventions in every piece of algorithmically derived text. But statements like these also imply an individualism that simply does not match the historical tradition of how newspapers are created. 2
In the nineteenth century, algorithms didn’t write texts, but neither did each newspaper’s staff write its own copy with personal attention to each article. Instead, newspapers borrowed texts from each other—no one would ever have expected individualized copy for news stories. 3 Newspapers were amalgams of texts from a variety of sources, cobbled together by editors who did more with scissors than with a pen (and they often described themselves this way).
The article also decries other types of algorithmically derived texts, but the case for computer-generated creative fiction and poetry has been argued well by people such as Mark Sample, and I have nothing new to add to that argument. ↩
This post is based on my research for the Viral Texts project at Northeastern University. ↩
In 1844, the New York Daily Tribune published a humorous story illustrating exactly the opposite, in fact—some readers preferred a less human touch. ↩
[This semester I’m taking Humanities Data Analysis with Professor Ben Schmidt. One of our tasks for this week was to build a random-walk generator using 3-grams. Here’s my quick writeup of my generator cross-posted from our course blog.]
We’ve been reading a lot of fairy tales around my house recently, so I wanted to see how well-spun a tale I could create by walking randomly through a collection of fairy tales. I selected four fairy-tale collections from Project Gutenberg to test this idea on. Code is on GitHub.
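For anyone who hasn't built one, here is a minimal Python sketch of a 3-gram random walk. It is not the course code on GitHub: it assumes the corpus is plain text, lowercases it, and keeps only letters and apostrophes, and the names `build_trigram_model` and `random_walk` are hypothetical.

```python
import random
import re
from collections import defaultdict

def build_trigram_model(text):
    """Map each pair of consecutive words to the list of words that follow it."""
    words = re.findall(r"[a-z']+", text.lower())
    model = defaultdict(list)
    for a, b, c in zip(words, words[1:], words[2:]):
        model[(a, b)].append(c)  # duplicates kept, so common followers are likelier
    return model

def random_walk(model, start=("once", "upon"), length=50, seed=None):
    """Generate text by repeatedly choosing a random successor of the last two words."""
    rng = random.Random(seed)
    w1, w2 = start
    out = [w1, w2]
    for _ in range(length - 2):
        followers = model.get((w1, w2))
        if not followers:
            break  # dead end: this word pair never continued in the corpus
        w1, w2 = w2, rng.choice(followers)
        out.append(w2)
    return " ".join(out)
```

Because each step looks only at the previous two words, a walk can slide from one collection into another whenever two tales share a word pair, which is how the Arabian Nights vocabulary leaks into European fairy tales.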
The addition of the Arabian Nights stories to Western European fairy tales makes the random generator more interesting, sometimes throwing the geographical sense of the tale into a different place and a different world.
This version generated my favorite beginning: “once upon a time a man by the river yes he was looking straight into the deep waters skeletons of walruses.”
But other versions of the generator took an even darker turn. Here’s the raw text:
“once upon a great procession which was conscious of pain And sore regret of which she said nothing but torment and affliction that He sniffed about to give the ants were always running to and when he approached her they did not really birds but she bore thee Thou hast nothing to me Only tell me something Why this is what you say What is the news O my sister relate to me Art thou she whom he found it impossible to think of The old rough doll You are learned and wise men assembled together in his age and to nail up my mind every earthly care and sorrow with soft turf From the narrow walks and the Wezeer the father of Is both of you should care so much that renders men sinful and impure He fully realized the true the speaker s hand saying to each other till the morning following I have with me from first to last and then burst and fell fast asleep”
And here’s the story, with some punctuation that I added for “clarity”:
Once upon a great procession–which was conscious of pain and sore regret, of which she said nothing but torment and affliction that He sniffed about to give. The ants were always running to, and when he approached her, they did not really birds but she bore thee: “Thou hast nothing to me. Only tell me something: Why this is what you say? What is the news? O my sister relate to me! Art thou she whom he found it impossible to think of? The old rough doll? You are learned, and wise men assembled together in his age and to nail up my mind every earthly care and sorrow with.” Soft turf from the narrow walks and the Wezeer the father of Is, both of you should care so much! That renders men sinful and impure. He fully realized the true the speaker’s hand, saying to each other till the morning following, “I have with me from first to last,” and then burst and fell fast asleep.
And sometimes it’s important to be reminded of where your texts come from. I didn’t remove any text at all from the Project Gutenberg texts, which means that the copyright and distribution information could appear in our stories too. For example:
“The two grand annual festivals are observed with public domain eBooks Redistribution is subject to particular laws or rules with respect to our beetle to himself but the observance of this Wezeer So the porter approached the Distracted Slave of Love when his boat or playing in the lap of prosperity and the fear of him said the Fire drum Peter has gone away I ll do something in me.”
I might publish a longer generated story sometime soon, but all this generator proves is that tales can be wiggly indeed.
After an AHA in which I heard a lot about how digital history needs to be about results as well as methodology, I decided to write up a post about the results I gained from mapping the Quasi-War. Special h/t to Cameron Blevins and Yoni Appelbaum for inspiring me to write about my research. I’m also using Yoni’s hyperlink-style citations.
For my seminar in Empires and Colonialism this past semester, I wrote about the United States’ Quasi-War with France. The paper argues that the Quasi-War was one of the United States’ first chances to engage with international law on a broad scale, and that the conflicting legal realities of an undeclared war helped to destabilize the French empire in the Caribbean to the breaking point. As part of that seminar paper, I mapped encounters between the French and the Americans (with a few British encounters) from 1797 to 1800. This map proved to be more illuminating than I expected, and it became an integral part of my argument about the primacy of prize courts in the Quasi-War. The map has clickable points where encounters occurred, as well as a fuller explanation of the judgments I made in creating it. You can see the map here. What follows is my explanation of what the map does.
From 1798 to 1800, the United States waged an undeclared maritime war with France. Though this conflict is often described as a naval war, it was not a traditional one. Almost no French naval vessels entered the Caribbean, and the hostile encounters between the French and the Americans were almost all battles between privateers and merchant vessels.
Why were there so many privateers in the Caribbean? Geographically, the islands had always been prime areas for piratical types—lots of inlets and tiny islands for staging. In addition, the privateers served an important role in providing for the colonies. The dominance of the sugar industry had restricted the colonies’ ability to provide basic foodstuffs for their people, both white and black. Previous to this conflict, the United States had provided a large portion of the colonies’ food for the sugar workers—one scholar states that St. Domingue relied on the commerce of at least 600 American ships for basic supplies during 1796 alone. But as a result of the non-intercourse act, the supply had dried up. Consul Turell Tufts wrote in despair to President Adams about the port of Cayenne: “Every exertion is making there in Privateering, as they consider it the very harvest of Plunder; and besides, they have no other means of procuring Supplies.”
Constant war with Britain meant that supplies from elsewhere in the empire were difficult to come by, so taking supplies that were already present made perfect sense. When the privateers captured merchant vessels in the Caribbean, they were able to bring in both the money from the sale of the vessel and cargo, and also parts of the cargo itself. The colonial governments had a vested interest in the actions of the privateers as well—not just because of the food itself, but also because of the “consequent discontent” if food was not available. In an already volatile political environment, maintaining order sometimes meant encouraging the privateers.
It’s not surprising that the United States government decided the French were enough of a threat to warrant building a navy. Compared to the number of captures by Barbary corsairs, the French threat was immense and widespread. There’s no way to know with any certainty how many captures actually occurred, but given the number of captures that we do have information about (more than 250 captures with enough spatial information to be plotted on a map, and hundreds more with no spatial data), the total number of captures could easily have exceeded a thousand. When the navy did finally make it to the Caribbean, its commanders adopted strategies that helped them to deal with the huge numbers of privateers in the area. Recognizing how privateers operated, the commanders planned their locations and logistics accordingly.
At the heart of both the privateers’ and the navy’s strategy was the prize court. International maritime law had established the prize court as the appropriate way to adjudicate the legal claims of captor and captured alike. 1 Privateers, by and large, adhered to the prize-court structure; at least, the claims of piratical behavior were much less frequent than accounts of lawful prize-taking. This is not to say that every prize brought into a prize court was fairly and impartially adjudicated: privateers could count on certain ports as friendly to their causes, where the commissioners would declare captures lawful prize on the slightest provocation.
Though French privateers made captures all across the world, they found the greatest success in the Caribbean. Privateers could use some of the same tactics as the famed pirates of the Caribbean, using sheltered harbors and small islands as protection and cover. But privateers differed from pirates in that privateers needed to stay close to the ports where they could send in prizes, whereas pirates tended to plunder their captures. The abundance of colonial governments in the Caribbean meant an abundance of prize courts.
Privateers’ vessels weren’t large enough to sustain long periods at sea, and captures only reduced the time they could spend at sea. Privateers elongated their time at sea by placing prize crews on board captures and sending the prizes unaccompanied into port. These prizes were less likely to actually bring the captors their prize money, since the chances of the prize making it unscathed into port decreased when the privateer did not escort the prize back. In addition, the prize crew was taken from the crew of the privateer, which meant that even this solution would eventually leave the privateer with too few men to maneuver effectively.
The majority of captures were within a few days’ sail of a prize court. For French privateers, French ports were the ideal, but other neutral ports (such as Curacao) would do in a pinch. British ports were, of course, out of the question, as the British were at war with the French. At the beginning of the war, ports in Guadeloupe (particularly Basseterre) and Saint-Domingue were most likely to condemn American prizes. As the war progressed, and the Americans negotiated trade agreements with Toussaint separate from the French government, Guadeloupe became the primary port where American prizes would likely be condemned.
Prize courts—or rather, accessibility to prize courts—also dictated American strategy against the privateers. For a navy being literally built ship by ship, one-on-one pitched battles against the privateers could never be a feasible strategy. Instead, the naval commanders focused their attention on the prize court ports: places they could be sure to encounter privateers, and even more frequently, their prizes. This strategy had two strengths: first, it gave the navy a better chance of actually capturing privateers, and thus removing their threat. But second, it also made privateering less profitable even for the privateers who eluded capture. Prizes were relatively easy to capture, since they had a skeleton crew of belligerents along with the original crew, who were all too willing to rise up against the prize crew. And if those prizes never made it into port, all the cost in munitions, time, and crew members that the privateer had expended was meaningless. No prize court, no matter how lenient, would condemn a vessel whose papers never made it to port. Though these reasons were never spelled out in so many words, they must have occurred to at least some of those men who handed down orders.
The Americans adopted a strategy, then, that kept them very close to enemy ports. They targeted Guadeloupe specifically—a whole squadron was ordered to stay “in the neighborhood of Guadeloupe,” as the secretary of the navy had put it. They were then able to capture ships in open waters as they came in and out of those ports. On occasion, American naval vessels came very close to violating neutral waters: international law declared that water within a cannon-shot of land was the territory of the nation that held the land. But no one ever objected to their captures on those grounds. The naval vessels maintained an even smaller range than the privateers. Privateers usually made their captures within two or three days’ sail of a prize court; the navy maintained a distance of one day or less.
The number of naval vessels on the Guadeloupe station at any one time vacillated wildly. The secretary of the navy attempted to keep at least half a dozen ships there, but maintenance needs, expiration of terms of enlistment, sickness, or any number of other factors could pull ships off their patrolling grounds. Once Toussaint began to request the use of American naval vessels to help his cause in Saint-Domingue, the number of ships at Guadeloupe was even more unpredictable. And of course, individual captains sometimes took their ships off to places outside the strategic area for convoy duty or by sheer incompetence.
American naval vessels could not maintain perpetual patrols off Guadeloupe, no matter how ideal the circumstances. Just like the privateers, they needed a safe place to go for supplies, maintenance, and prize adjudication (they too operated on the prize system). They primarily used St. Kitts, to the north of Guadeloupe, as a base for resupply. However, prices in the islands were exorbitant, so the secretary of the navy sent supply vessels from the United States as well.
These geographical constraints did not preclude the navy’s sailing elsewhere—far from it. But the number of captures very near to enemy ports indicates that the navy’s strategy was effective. By the end of the year 1800, Thomas Truxtun, who was cruising off Guadeloupe, wrote to Thomas Tingey, “With all this cruising my success has been very limited indeed, for the french have become scarce, so much so, that what I formerly found (chasing) an amusement, and pastime, is now insiped, Urksome & tiresome.”
In fact, by this time the treaty had already been signed to reestablish commercial relations between the United States and France, though it would be another several months before the terms were ratified by all parties. Michael Palmer estimates that U.S. naval forces, averaging 16 ships at any given time between 1798 and 1800, captured 86 privateers over the course of the war. 2 This number is impressive for such a small force, but it still doesn’t come even close to an annihilation of the privateer forces. Many factors contributed to the eventual decline of French privateering, but it does seem that targeting the prize courts was one of those factors. The American naval strategy had succeeded.
For more about prize law and its relationship to empire, see chapter 3 of Lauren A. Benton, A Search for Sovereignty: Law and Geography in European Empires, 1400–1900 (Cambridge; New York: Cambridge University Press, 2010). ↩
Just as we don’t have enough spatial data to map all of the French victories over American shipping, we also don’t have enough spatial data to map these American victories completely—again, the map shows about ¼ of these victories. ↩
Today is Ada Lovelace Day, honoring a woman who is often credited with being the first computer programmer because of her work programming for Charles Babbage’s Analytical Engine in the 1840s. The day honors Ada and all women who are involved in science, technology, engineering, and mathematics.
I am not a woman in a STEM field, not really. But I am celebrating Ada Lovelace Day today because I am the humanities scholar I am through the influence of a woman who did work in STEM—my mom. So I’d like to celebrate Ada Lovelace Day 2014 by honoring my mom.
My mom was an elementary school teacher for the first part of her adult life. Once she had kids, she transitioned to writing elementary-school textbooks for a small press, a role she maintained for the rest of her life. Though she worked on a variety of projects, her favorite and longest-running was the elementary science curriculum. Writing these textbooks gave her the chance to incorporate into a curriculum the experiments and explorations she had always done with us kids at home. We got to look at eclipses through little holes in paper, and collect animal tracks using plaster. We were always being subjected to discussions about how best to demonstrate viscosity, or the most interesting way to talk about the distance between planets, or the kid-friendliest way to learn about civil engineering. And all of these household discussions worked their way into her textbooks.
I didn’t always appreciate my mom’s emphasis on science and mathematics. I used to cringe when she’d give me two similar items in the grocery store and ask me to figure out which one was the better deal, based on their price and weight (this was before the stores so helpfully printed the “unit price” on the price tag). Or she would have me estimate our total grocery bill by keeping a running tally of all the items’ prices in my head.
When I was a teenager, I worked for the same press as my mom, and though I worked in a different department, I sometimes got outsourced to her as a researcher and writer. She gave me a wide range of assignments, such as writing about autonomous underwater vehicles or atmospheric optics for a call-out page in a 5th-grade science textbook. Initially I wasn’t that excited about some of the topics, but I ended up catching her enthusiasm and digging in.
In a few weeks, it will be the fourth anniversary of my mom’s death. She was able to finish the entire elementary school science curriculum before becoming too sick to work. That’s one scientific legacy. But the legacy is more personal, too. I still find myself wishing I could call her when I see things like halos around the sun, or an oddly colored insect, because I know that she, more than anyone, would appreciate the beauty of a random scientific phenomenon.
My whole family has been inspired by my mom’s legacy. In fact, of four kids, I’m the only one who doesn’t have some sort of higher education in a STEM field. Three of the four of us are working on PhDs (and the fourth is still in college; the bar’s pretty high, Auria…). My latent mathematician has been coming out recently as I get into digital humanities, but the very way I think about knowledge and research, even as a historian, comes from my mom. Both of my parents have always encouraged us to educate ourselves, formally and informally. My mom and my dad have always pushed us to excel as far as we can, while supporting us along the way. But today, Ada Lovelace Day, I want to honor the one woman in a STEM field who has meant the most to me and has shaped my life more than anyone else. I love you, Joyce Garland. You’re the best role model I could ever have.
The major work on the Boston Maps Project for the semester is wrapping up this week. This semester, we ended up with 108 users (about 100 students) who contributed to 19 maps and over 400 annotations on our Omeka site.
Review: The Process
Throughout the semester, I attended an average of three full class periods for each of the five classes that participated heavily in the project. Some of these meetings were scheduled in advance; others were scheduled when I noticed a particular problem across a large number of students in the class.
The initial instruction took two forms. In two classes, I explained the instructions about georectification in a separate class period from my instruction about annotations. In the others, I did all the instruction about both topics in one class period. In general, I noticed that the classes that received instruction on two different occasions struggled less with the technical aspects of both georectification and annotation.
I visited each class at least one more time to provide further clarification. All the classes needed additional help with writing annotations. In each class, the students received a handout explaining how they should approach their descriptions and which sources they should consult. It turned out that the handout was not sufficient for students to understand what was expected of them regarding either the research for or the writing of the annotations. (In retrospect, I should have anticipated this.)
Over the course of the semester, I probably received an average of one email per day about the project, with up to 10 emails per day nearer the students’ deadlines. I also met with at least one student a week in my office to work through their struggles with research, writing, or technical issues. It was actually gratifying to see the number of students who wanted my input and assistance—ironically, in this semester, when I was rarely in the classroom, I had more person-to-person interaction with students than I’ve ever had before.
Most of the students did a great job with the georectification, with very little additional instruction from me. Though I met with 6 or 7 of the students responsible for their group’s georectification, only a few of them required any real assistance; most met with me merely to reassure themselves that they were doing a good job. (They were.) When I asked two classes about their biggest challenge with the project, each group’s georectifier named that task. However, several of them also mentioned that the georectification was one of the more rewarding aspects of the project.
One group did their georectification collectively: they projected QGIS onto a large screen and then all of them suggested points to use for the georeferencing. Their map was one of the more unusually aligned maps to begin with, but they did an extremely good job. The students of that group also told me that once they got the hang of finding points of commonality, the georectification became quite fun. In future georectification assignments, I may suggest this way of doing the work, since it seems to have been highly successful.
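The “points of commonality” this group hunted for are what QGIS’s Georeferencer calls ground control points: pairs of locations matched between the scanned historical map and a reference map. As a rough illustration of what happens to those points afterwards, here is a minimal Python sketch that fits a first-order (affine) transformation to them by least squares and reports how far each point misses its target. It is not QGIS’s own code; the function names and sample coordinates are hypothetical, and the Georeferencer offers other transformation types besides this one.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Params = Tuple[List[float], List[float]]


def _solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    scale = max(abs(v) for row in a for v in row) or 1.0
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        # A (near-)zero pivot means the control points are collinear or degenerate.
        if abs(m[pivot][col]) < 1e-12 * scale:
            raise ValueError("control points are collinear or otherwise degenerate")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):
        total = m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = total / m[r][r]
    return x


def fit_affine(src: Sequence[Point], dst: Sequence[Point]) -> Params:
    """Least-squares fit of X = a*x + b*y + c and Y = d*x + e*y + f.

    src: positions picked on the scanned historical map (e.g. pixels).
    dst: matching positions on the reference map.
    Needs at least 3 non-collinear pairs; extra pairs average out placement error.
    """
    if len(src) != len(dst) or len(src) < 3:
        raise ValueError("need at least 3 matched control points")
    # Accumulate the normal equations (A^T A) p = A^T t, one row [x, y, 1] per point.
    ata = [[0.0] * 3 for _ in range(3)]
    at_x = [0.0] * 3
    at_y = [0.0] * 3
    for (x, y), (tx, ty) in zip(src, dst):
        row = (x, y, 1.0)
        for i in range(3):
            at_x[i] += row[i] * tx
            at_y[i] += row[i] * ty
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    return _solve3(ata, at_x), _solve3(ata, at_y)


def apply_affine(params: Params, p: Point) -> Point:
    """Move one point from historical-map coordinates to reference coordinates."""
    (a, b, c), (d, e, f) = params
    x, y = p
    return (a * x + b * y + c, d * x + e * y + f)


def residuals(params: Params, src: Sequence[Point], dst: Sequence[Point]) -> List[float]:
    """Distance between where each control point lands and where it should land."""
    out = []
    for s, (tx, ty) in zip(src, dst):
        px, py = apply_affine(params, s)
        out.append(((px - tx) ** 2 + (py - ty) ** 2) ** 0.5)
    return out


# Hypothetical control points: the reference map is at 2x the scan's scale, shifted by (10, 20).
historical = [(0, 0), (100, 0), (0, 100), (100, 100)]
reference = [(10, 20), (210, 20), (10, 220), (210, 220)]
params = fit_affine(historical, reference)
print(apply_affine(params, (50, 50)))  # should land near (110, 120)
print(residuals(params, historical, reference))
```

With more than three points, the fit averages out small placement errors, and a control point with an unusually large residual is a good candidate for re-checking.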
The annotations proved somewhat more problematic for many students (though the problems were still relatively minor). The students experienced the joys and frustrations of both freedom and constraint. Each group got to pick the features they wanted to annotate from the maps I gave them. At first, I was concerned that letting students choose might leave the annotations unevenly distributed, or leave important features unannotated. In general, though, the annotations seem to be very evenly distributed, and of course each map has more features that could be annotated than any one group could do in a semester. So each of these maps could easily be assigned to another class in a subsequent semester and still have plenty of features left to annotate.
Some features are still common Boston landmarks, such as the Boston Common, and were easily identified (and were annotated by almost every group). However, some features on the maps were more difficult to pin down: for instance, any one of the dozens of wharves present in the 1860s, or the temporary encampments built by the British during the occupation of Boston in the 1770s. Additionally, some of the features have histories that are easy to trace in the 20th century but much more difficult to trace into the 19th or 18th century. Several students picked features on their map that they had to abandon because they couldn’t find any research about them. Doing all that searching only to have to abandon the quest was intensely frustrating for some (but it is something every historian deals with at some point, I think).
Students told me that research was a big challenge for them. They were required to cite a secondary source and a primary source that they consulted to write their descriptions. Almost all of the students mentioned that finding primary sources was a struggle. Though a few students struggled to the point of ineffectiveness, most rose to the challenge and found really great information about parts of Boston that I didn’t even notice on their maps, much less know anything about.
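The citation requirement can be expressed as a simple completeness check. The sketch below is hypothetical: the class and field names are illustrative and do not reflect the project’s actual Omeka metadata, but it captures the rule that every annotation needs a description plus at least one primary and one secondary source.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Source:
    citation: str
    kind: str  # expected to be "primary" or "secondary"


@dataclass
class Annotation:
    feature: str      # the map feature being described
    map_year: int     # year of the map the annotation belongs to
    description: str  # the paragraph-length description
    sources: List[Source] = field(default_factory=list)

    def problems(self) -> List[str]:
        """Return the unmet requirements; an empty list means the annotation is complete."""
        issues = []
        if not self.description.strip():
            issues.append("missing description")
        kinds = {s.kind for s in self.sources}
        if "primary" not in kinds:
            issues.append("no primary source cited")
        if "secondary" not in kinds:
            issues.append("no secondary source cited")
        return issues


# Hypothetical draft that cites only a secondary source.
draft = Annotation(
    feature="Custom House",
    map_year=1814,
    description="Placeholder description text.",
    sources=[Source("placeholder secondary citation", "secondary")],
)
print(draft.problems())  # ['no primary source cited']
```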
In particular, students came to the realization that doing research about a feature meant more than merely finding a source that acknowledged its existence. Their research had to dig much deeper, to find out about the feature’s function not just at some point in its history, but at a particular point, namely the period around the date of their map. In addition, some of the students mentioned that they found sources with conflicting information about the feature and had to decide which sources were right. They also had to recognize that things move: many Boston landmarks have not always been where they are today. Churches came up quite frequently as features that appear in a different place on the historical maps than their present location.
Review: The Product
For one semester’s worth of work, I am very proud of what the students accomplished. We made a very good start! Though the end product, I’m certain, is going to be very interesting and informative, the product of the semester, or rather the goal, was something less tangible. Students have provided feedback to me through private emails, as well as through class presentations of their group’s work, and several themes have arisen out of the comments I’ve received.
First, students have been (mostly unwittingly) learning the craft of a historian. Learning to dig deeper to find the piece of information you know is out there; investigating primary sources from municipal and state records to newspapers to personal diaries of Bostonians; reading maps; and sifting through evidence to decide which is the right information: All of these skills are part of the historian’s craft. Writing annotations about several different parts of Boston forced the students to practice all of these skills differently from how one might research for a larger paper about any of these topics: in some ways an easier task, but in many ways a harder one. A research tool that was fruitful for Trinity Church might turn up nothing for the Columbian Museum; primary sources about Faneuil Hall are so numerous that sifting through them is a chore, but finding primary sources about S.G. Bowdlear and Co. Flour required some persistence (including investigating some documents from the BPL’s rare books collection).
Second, students learned the rudiments of spatial thinking. Many students told me that they had never really thought about maps of Boston as being sources of historical information or analysis. Georectification forced them to realize that maps are not authoritative (just looking at how mapmakers aligned their maps caused some students to question how maps were created). But doing annotations about different places on their maps also made some students aware of the relative proximity (or distance) of various connected places within the city, from simple realizations such as why the Custom House had to be close to the wharves, to how the introduction of railroads to the city changed the way the mental health asylum functioned (and eventually forced the asylum out to the suburbs to get away from the ruckus caused by the trains).
There’s so much more to do on the project. We need to start building out the application to effectively use all these great maps and annotations. Some of the annotations need to be cleaned up a little, and decisions need to be made about how to deal with many annotations of the same feature that say essentially the same thing. The next major interpretive step is to research and write about the maps themselves: Who were the mapmakers? What was each map’s purpose? How did the maps betray their own time?
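One possible first pass at the duplicate problem is to group annotations by a normalized feature name and then flag pairs whose descriptions are textually very similar, leaving the final decision to a human editor. This is a minimal sketch under assumptions: annotations are plain (feature, map year, description) tuples, the normalization rules and the 0.8 similarity threshold are arbitrary choices, and `difflib`’s ratio measures character overlap rather than meaning.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

Annotation = Tuple[str, int, str]  # (feature name, map year, description)


def normalize_feature(name: str) -> str:
    """Crude grouping key: lowercase, punctuation to spaces, drop a leading 'the'."""
    key = re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()
    return re.sub(r"^the ", "", key)


def group_by_feature(annotations: Iterable[Annotation]) -> Dict[str, List[Annotation]]:
    """Group annotations by normalized feature name, keeping only features annotated more than once."""
    groups: Dict[str, List[Annotation]] = defaultdict(list)
    for ann in annotations:
        groups[normalize_feature(ann[0])].append(ann)
    return {key: anns for key, anns in groups.items() if len(anns) > 1}


def near_duplicates(
    group: List[Annotation], threshold: float = 0.8
) -> List[Tuple[Annotation, Annotation, float]]:
    """Flag pairs of descriptions whose character-level similarity meets the threshold."""
    flagged = []
    for a, b in combinations(group, 2):
        ratio = SequenceMatcher(None, a[2].lower(), b[2].lower()).ratio()
        if ratio >= threshold:
            flagged.append((a, b, ratio))
    return flagged


# Hypothetical annotations, not drawn from the project's data.
notes = [
    ("Boston Common", 1775, "A public pasture where residents grazed cattle."),
    ("the Boston Common", 1814, "A public pasture where residents grazed cattle."),
    ("Faneuil Hall", 1814, "A market house and meeting hall."),
]
for key, group in group_by_feature(notes).items():
    for a, b, score in near_duplicates(group):
        print(f"{key}: {a[1]} vs {b[1]}, similarity {score:.2f}")
```

Grouping first keeps the pairwise comparison small, since only annotations of the same feature are ever compared.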
If we implement this same procedure again in undergraduate classes in order to get more maps into our series, I’ve learned a few things about what needs to change.
1. I’ll insist on having at least two full class periods for the introduction of the project. This will allow me to address some of the problems we experienced this semester up front rather than trying to put out fires later.
2. I’ll ask professors not to set the due date on the last day or in the last week of the semester, so students have a chance to revise their work if necessary.
3. I’ll focus more on teaching students how to do primary-source research, including demonstrating online archives and how to use them (rather than just describing them), and suggesting that they also go the extra mile to actually visit some archives in the area. I’ll also try to enlist the help of some local archivists to make the process of primary-source research less opaque.
Overall, I’m very pleased by how the project went this semester, and I’m looking forward to continuing the work. Right now it seems that the more work we’ve done, the more work remains. Thankfully, it looks like circumstances have aligned so that I’ll be able to continue putting significant time and energy into the project next semester and in spring 2015, as the project becomes an official NULab project. Hopefully this change means more people across more disciplines will get the chance to work on the project.
This semester, Northeastern University’s history department is branching out into new territory: we’re beginning a large-scale digital project that is being implemented across several classes in the department. The goal of the project is to investigate urban and social change in the city of Boston using historical maps. We’re very excited to be partnering with the Leventhal Map Center at the Boston Public Library for this project.
This project was originally conceived as an offshoot of a group project from Prof. William Fowler’s America and the Sea course last spring. The original plan was just to think about how the waterfront changed, but it has expanded significantly in response to feedback from faculty in the department. Our focus has become both the topography and the culture of Boston, and how those two intertwine.
Our final product will be an interactive, layered series of historical maps with annotations that help to explore urban and social change across 250 years of Boston’s history. We’ll be building our map series in Leaflet, which we think is a beautiful and flexible medium for such a task.
We made the decision to use historical maps for several reasons. Getting at the topographical changes in the city calls for map comparison. Boston’s topography has changed so substantially in its history that a 1630 map is essentially unrecognizable as the same city. In many senses, modern Boston isn’t even the same land as 1630 Boston. Because the actual land forms have changed so much, it’s impossible to tell the story of Boston without investigating its maps.
Space is an important part of the story of Boston. As the function and prospects of the city change, so does its landform. But Bostonians have never been content to merely take land from the west, as so many other coastal cities have done. Instead, they have literally made land out of the sea. Over the course of almost four hundred years, Boston has made so much land that its 1630 footprint is barely recognizable within its 2014 footprint.
These drastic topographical changes are inextricably linked to the life of the city. Many of the changes connect explicitly to commercial concerns, such as the building of new wharves. So one major goal of the Boston Maps Project is to make these connections between the city’s life and its land visible.
We’re fortunate to have such a great collection of maps at our disposal. For this semester, we’re going to be using approximately 25 maps, spanning from 1723 to 1899. In the future, we’d like to expand further toward the present, but the Leventhal maps don’t extend far into the 20th century.
Beginning the process
The first step in our process is to get the maps georectified and then annotated. Aligning these historical maps with each other is critical for tracking how the city changes. The work of georectification and annotation is being done this semester by undergraduate and graduate students in seven classes, ranging in subject from public history to Colonial and Revolutionary America. They’re using QGIS to georectify the maps, and then using Omeka as a repository for their annotations.
The georectification process helps the students compare maps and think about how things have developed over time. These georectified maps are the backbone of the project, as they provide the structure for the story of change. Eventually, they’ll provide both the conceptual and the physical structure of the project as well.
But merely georectifying the maps doesn’t really tell us that much about the changes that are going on within the city. To get at those changes, students are identifying features on the maps and writing paragraph-length descriptions of each feature’s purpose and evolution. We hope these annotations will provide context that enriches our understanding of topographical and social change in the city.
Thus far, the rollout has been mostly successful. We’ve had a few technical blips along the way (word to the wise Mac user: download all those extra packages before installing QGIS!), but in general the students are excited about beginning the work on this project. I’ve lectured in several of the classes already about the idea of the project and the technical aspects of it, and the students are all beginning to work on their individual pieces.
This project would never have gone forward without encouragement and advice from several people.
Chief encourager and motivator has been Professor Bill Fowler, who has always believed that a large-scale digital project is not only possible but worthwhile to implement in undergrad courses. He is learning right along with the students about the tools and technologies that we’re using, and he is our biggest advocate with the BPL and other organizations.
Chief technical adviser, without whom the project would have already completely imploded, is Ben Schmidt. He has written scripts, hashed out schemas, wrangled servers, and done many other tasks that I don’t yet have the technical competency to deal with. In addition, he has provided invaluable advice about best practices for digital projects and the direction the project should go.
All of the staff at the Leventhal Map Center have jumped on board this project with enthusiasm. They’ve met with us, advised us on the best maps to use, and helped us think through how the project can best benefit both NEU and the BPL.
All the faculty who have agreed to implement this project in their courses deserve special thanks as well. The project takes away class time from lectures on their own subject matter, and it certainly adds an element of uncertainty to the course structure. I appreciate their willingness to go out on a limb to make this project happen.
I’m very grateful to all these people—and plenty of others—who have already helped to make the Boston Maps Project a success.
We’re very excited to begin this new project. I hope to write occasional reports on our progress, and hopefully our final product will be beautiful and useful to scholars, visitors, and residents of the city of Boston.