
Passing on the Scissors and the Quill

Faithful readers of this blog (all one of you) will notice that I haven’t posted in almost a year. It’s not that I’ve had nothing interesting to say, but rather that I’ve been too busy with those interesting things to write about them for the blog. Here’s a brief rundown.

In the summer of 2014, my family moved to Fairfax, VA, when my husband was hired by George Mason University. For the 2014-2015 school year, I commuted to Boston from Virginia almost every week so I could finish my coursework at Northeastern University. In August 2015, I passed my comprehensive exams and defended my dissertation proposal, officially becoming a PhD candidate. For the past year, I’ve been researching and writing my dissertation, as well as continuing to work on the Viral Texts project.

The Viral Texts project has been part of my graduate-school experience almost since the beginning. I joined the project as part of the inaugural group of NULab fellows in the spring of 2013. I remember sitting around a table with the other fellows, hearing about all the different projects we might be assigned to, and thinking, “I really hope the spots for that newspaper project don’t fill up before I get to choose.” Thankfully, they didn’t. The NULab fellows’ role has changed since then, but I’ve always been able to stay attached to the project, and I’m so grateful.

Over the past three years, I’ve done a lot of crazy stuff with Viral Texts. I’ve taught myself R, Python, a little Ruby on Rails, and a little JavaScript. I’ve read enough 19th-century newspapers that some of their editors feel a little like friends. I’ve made maps, graphs, networks, and a host of other things. I’ve seen our data grow from a few hundred newspapers in a handful of American states, to periodicals on three continents and in multiple languages. I’ve written an article on fugitive texts with Ryan Cordell (forthcoming in American Periodicals). I’ve been a jack-of-all-trades, though perhaps a master of none.

Viral Texts is one of the defining pieces of my graduate school experience. It shaped my understanding of digital humanities, and it stretched me to work in multiple disciplines. It taught me how to work with a team while keeping my individuality. And I learned an awful lot about how nineteenth-century newspapers work.

And now, in true Viral Texts fashion, it’s time for me to pass on the scissors and the quill. Starting in May, I’ll be joining the research division at the Roy Rosenzweig Center for History and New Media at George Mason University. I’ll be working with PressForward and Zotero, but mostly with Tropy, CHNM’s new Mellon-funded project for archiving and organizing photos. I’m particularly excited about working with Tropy, though I’m a little bummed that my dissertation will (I hope) be close to complete before Tropy is ready for the big time. :)

The projects and tools at CHNM were my first encounter with digital humanities, even before I wanted to embrace the digital in my own work. Throughout my graduate career, I’ve benefited greatly from Zotero and Omeka and other amazing work at the center, and I’m looking forward to helping develop other great tools for myself and others to use.

In joining CHNM and departing Viral Texts, I take these words from the valedictory editorial of Thomas Ritchie, editor of the Richmond Enquirer: “I cannot close this hasty valedictory, without again expressing the sentiments of gratitude and affection with which I am so profoundly penetrated.” So to everyone on the team—Ryan, David, and Fitz in particular—thanks. It’s been great.

On Newspapers and Being Human

Last week, an opinion piece appeared in the New York Times, arguing that the advent of algorithmically derived human-readable content may be destroying our humanity, as the lines between technology and humanity blur. A particular target in this article is the advent of “robo-journalism,” or the use of algorithms to write copy for the news.[1] The author cites a study that alleges that “90 percent of news could be algorithmically generated by the mid-2020s, much of it without human intervention.” The obvious rebuttal to this statement is that algorithms are written by real human beings, which means that there are human interventions in every piece of algorithmically derived text. But statements like these also imply an individualism that simply does not match the historical tradition of how newspapers are created.[2]

In the nineteenth century, algorithms didn’t write texts, but neither did each newspaper’s staff write its own copy with personal attention to each article. Instead, newspapers borrowed texts from each other—no one would ever have expected individualized copy for news stories.[3] Newspapers were amalgams of texts from a variety of sources, cobbled together by editors who did more with scissors than with a pen (and they often described themselves this way).

Notes:

  1. The article also decries other types of algorithmically derived texts, but the case for computer-generated creative fiction or poetry has been well argued by people such as Mark Sample, and I have nothing new to add to that argument.
  2. This post is based on my research for the Viral Texts project at Northeastern University.
  3. In 1844, the New York Daily Tribune published a humorous story illustrating exactly the opposite, in fact—some readers preferred a less human touch.

Passing on the Scissors and the Quill: Editorial Tenure in Viral Texts

The newspaper business was highly variable in the nineteenth century (in different ways than it is in the twenty-first). Changes in editorship, political affiliation, and even location were frequent. Editorial changes were particularly significant, since very few editors maintained exactly the same newspaper that they inherited from a predecessor. Editors came and went quite often, passing on the “scissors and the quill,” in the words of the outgoing editor of the Polynesian, Edwin O. Hall.

A Hoe press, of the type made famous by John McClanahan, editor of the Memphis Daily Appeal (Creative Commons licensed image from flickr user jwyg)


Frontier Editor: Orion Clemens (1825-1897)

Though he’s often overshadowed by his younger brother Samuel, Orion Clemens had a colorful and varied career that included agriculture, journalism, and politics on the frontier of the United States. He was the eldest of seven children, though only he, Samuel, and their sister Pamela survived to adulthood. The Clemens family moved from Tennessee to Hannibal, Missouri, in 1839, where Orion worked in the general store. As a young man, he moved to St. Louis and began to study law.[1. Apparently the law education didn’t “take”; Samuel Clemens wrote to his mother and sister in 1875, “If he were packed and crammed full of law, it would be worthless lumber to him, for his is such a capricious and ill-regulated mind that he would apply the principles of law with no more judgment than a child of ten years.” (The Complete Letters of Mark Twain, Sunday 1875)]

Clemens never established himself in one location or profession for long. In 1850, he moved back to Hannibal and bought the Hannibal Journal, whose name he changed to the Western Union and then back to the Hannibal Journal in 1852. This paper would print the first published work of the author “Mark Twain,” who was, of course, Orion’s younger brother Samuel. But the paper could not sustain itself; it folded in 1853, and both Samuel and Orion Clemens went looking for new careers.

Clemens moved to Iowa, first Muscatine and then Keokuk, where he took up work as a printer in 1854 (though not as a newspaperman). In 1861, he was appointed Secretary of the Territory of Nevada, and once again he brought his younger brother along for the journey. (This journey, loosely construed, would appear as Roughing It.) Orion was a popular political figure in Nevada Territory, especially after averting a border dispute during his tenure as interim governor.

In 1864, Clemens’s young daughter Jennie died. Later that year, Clemens ran unsuccessfully for assemblyman in the Nevada state legislature. By 1866, the Clemens family had left Nevada for the East Coast. They moved around quite a bit, and Clemens tried out a number of occupations; by 1875, he was attempting to be a chicken farmer, though with little success. He continually returned to law and journalism, but he was never able to make either profitable.

In 1880, Orion Clemens wrote an autobiography, but the manuscript was somehow lost after he wrote it, so we don’t know nearly as much about his life as we might. Much of what we do know of him comes from Samuel Clemens’s writings. Orion’s relationship with his brother was a fraught one. Samuel found him flighty, unsettled, and incapable of real thought, though Orion’s successful tenure as secretary (and interim governor) of Nevada seems to belie those evaluations. It does seem likely that Orion’s continual failures, and perhaps the loss of his daughter, made him less stable than Samuel apparently wanted. Nevertheless, Samuel described Orion as having a generous spirit, an apt judgment, since for the first half of his life Orion supported Samuel’s own literary ventures. [2. Ibid.]

Editor Vignette: Edward E. Cross

In my work on Viral Texts, I run across a host of fascinating people, including editors whose lives are just as interesting as the stories they publish. To highlight some of these people, I’m writing short posts about them as I research their papers. This first vignette is about the first editor of the first newspaper published in Arizona, before Arizona was even a territory. I write about him today on the 150th anniversary of his death.

Edward Ephraim Cross (1832-1863)

Edward Cross began his newspaper career at the age of 15, at the Coos Democrat, a paper in his native Lancaster, New Hampshire. He moved to Cincinnati in 1850, where he continued to work as a printer, now at the Cincinnati Times. 

Soon, Cross became a reporter for the Times, even serving as its Washington correspondent for a short time. But he invested in some mining operations in Arizona, and he moved out to Tubac, Arizona, in 1859. In Tubac, under the auspices of the Santa Rita Silver Mining Company, he began the first newspaper in Arizona, the Weekly Arizonian. Cross had strong political opinions, and those opinions often found their way into his newspaper. He was especially concerned that Arizona have its own government (separate from New Mexico), since he felt the two regions had sufficiently different needs to warrant separate representation. Cross focused primarily on Arizona politics; about national politics, the newspaper seems to have been somewhat ambivalent.

Another of Cross’s goals as a newspaperman was to paint a picture of Arizona as it really was. Robert Grandchamp, a biographer of Cross, claimed that many of Cross’s editorials were meant not for Arizonians but for people back East reading the Weekly Arizonian.[1. Grandchamp 59.] (If that’s true, it shows something about how editors themselves viewed reprint culture in the USA.) As with every territorial expansion, writers often embellished the benefits of territorial life and downplayed its dangers. Cross disliked such idyllic portraits of Arizona, so his editorials featured the rough and difficult life of Arizonians.

This desire to portray the hard life of the territory brought Cross into contention with one Sylvester Mowry, a wealthy mine owner who also happened to represent the territory in Congress. Mowry had written some reports about the status of Arizona that Cross felt were too rosy, describing the land as highly fertile and the native Indians as of minimal concern. Cross decided to take on Mowry in the press. He didn’t publish his editorial in the Weekly Arizonian (possibly he wanted a wider national audience than he thought he’d get from the Arizonian), but rather in an Eastern newspaper, the States. A complicated dance of letters and replies ensued (Mowry was in Washington and Cross in Arizona; travel time was definitely an issue).

Mowry realized that the only way to deal with Cross was direct confrontation, in Arizona. Upon his return to the territory, Mowry issued a challenge. Cross accepted the challenge and the duel was on.

Cross decided to make the duel interesting by choosing Burnside carbines as the weapons instead of standard dueling pistols. Though both men were purportedly good shots,[2. Grandchamp states that each man practiced the previous day; Cross shot up a cactus and Mowry a cottonwood tree.] after four rounds in which neither man hit the other, Mowry declared himself satisfied.

The issue might have remained contentious, despite published apologies from both parties, except that a week after the duel, Mowry bought the Weekly Arizonian from the Santa Rita Mining Company. Obviously, Cross would not remain the editor. The paper moved to Tucson and took on stronger Democratic leanings.

Though Cross moved back to New Hampshire after losing the Weekly Arizonian, he remained concerned about Arizona politics and military affairs. He wrote repeatedly to the secretary of war about the situation in Arizona. The attachment Cross felt to Arizona is somewhat remarkable, considering that he lived in the territory for less than a year (he took on Mowry after only one month of residence!).

Later in 1860, Cross invested once again in a silver mine in Arizona, volunteering to travel to the mine as a scout. Though he supported Stephen Douglas for president, his political concerns were primarily local. When the Army left Arizona to deal with the fractious Southern states, Cross’s mining investment was sacked by Indians; after that loss, he left Arizona to serve briefly with the Mexican army of General Juarez.

When war broke out in America, Cross headed back to New Hampshire to command the Fifth New Hampshire Regiment of Volunteers. He served with distinction at many famous battles, including Fredericksburg, Antietam, and Chancellorsville, and he became known for his toughness on the battlefield.

In July 1863, the 5th New Hampshire was among the regiments that fought at Gettysburg. Cross’s brigade fought at the Wheatfield, where he was mortally wounded. He died of his wounds on July 3, 1863.

Monument to the 5th New Hampshire at Gettysburg National Battlefield Park (Creative Commons licensed photo by Flickr user BattlefieldPortraits.com)

You can read more about Edward Cross here:
Grandchamp, Robert. Colonel Edward E. Cross, New Hampshire Fighting Fifth: A Civil War Biography. Jefferson, NC: McFarland, 2012.
Cross, Edward Ephraim. Stand Firm and Fire Low: The Civil War Writings of Colonel Edward E. Cross. Boston: University Press of New England, 2003.

Boston-Area Days of DH Wrap-up

[cross-posted to HASTAC.org]

Now that it’s been almost a month since the Boston-Area Days of DH, I figured I’d better write a wrap-up of the conference. It was my very great pleasure to help Prof. Ryan Cordell organize the conference, and along the way I learned a lot about DH and about scholarly work in general (and about scheduling and organization and making sure the coffee gets to the right place…).

The Boston-Area Days of DH conference was sponsored by Northeastern University’s NULab for Texts, Maps, and Networks. Originally, it was designed to coincide with the worldwide Day of DH, sponsored by CenterNet. It would do in a conference what Day of DH does online: highlight the work that Boston-area digital humanists are doing and start conversations based on that work. In addition, we tried to include sessions to help digital humanists do their work better.

Day 1 Breakdown

Our first session, the lightning talks, was designed to highlight as many projects as possible in a short amount of time. All the presentations were interesting, but I’d like to mention two in particular. First, the Lexomics group from Wheaton College presented their text-analysis work on Old English texts. This group was unusual both for the work they did and for their place in the field: all the presenters were undergraduates at Wheaton. I found it very heartening to see undergraduates doing serious scholarly work using digital humanities. Second, Siobhan Senier’s work on Native American literature was especially inspiring. I love how she is using digital tools to help expose and analyze the literature of New England Native Americans: she’s using Omeka as a digital repository for Native American literature, much of which is not literature in words, but rather in art or handicraft (such as baskets). I think this is a perfect use for the Omeka platform.

After the lightning talks, we were able to run a set of workshops twice during the first day of the conference. The topics ranged from network analysis (taught by Jean Bauer), to text analysis (taught by David Smith), to historical GIS (taught by Ryan Cordell). I heard lots of good feedback about how helpful these workshops were, though I wasn’t able to attend any myself.

The keynote address has to rate as one of the most entertainingly educational talks I’ve ever heard. Matt Jockers, from the University of Nebraska-Lincoln, sparred with Julia Flanders of Brown University in a mock debate over the relative merits of big data and small data. They’ve posted the whole talk, along with some post-talk comments, on their respective blogs (Matt’s and Julia’s). The talk is certainly well worth the read, so rather than summarizing it here, I’ll just entreat you to go to the source itself.

Day 2 Breakdown

On Day 2, we suffered an environmental crisis: a sudden snowstorm on Monday night made travel an even greater hassle than it already is in Boston. As a result, our numbers were greatly reduced, but we soldiered on, sans coffee and muffins.

Our first session was a series of featured talks about specific projects. Topics ranged from gaming, to GIS, to pedagogy, to large-scale text analysis. Augusta Rohrbach discussed how a game she’s working on, Trifles, incorporates elements of history and literature into a game environment to teach students about both, while also engaging questions of gender and social issues. Michael Hanrahan talked about how GIS can reframe questions about rebellions in England in 1381, and on a wider scale, how GIS can reframe questions of information dissemination. Shane Landrum talked about how he uses digital technology to teach at a large, public, urban university, and the challenges of doing DH in a place where computer access and time to “screw around” are real problems. And Ben Schmidt talked about doing textual analysis on large corpora using Bookworm, a tool created at the Harvard Cultural Observatory.

The final session of the conference was a grants workshop with Brett Bobley, director of the NEH’s Office of Digital Humanities. By staging a mock panel discussion such as might occur in a real review of grant proposals, Brett was able to instruct us about what the NEH-ODH is looking for in grant proposals, and how the grant-awarding process works. I found the issues that Brett raised about grant proposals to be helpful in thinking through all of my work: am I being specific about my objectives? about who this will reach? about how exactly it’s all going to get done? These questions ought to inform our practice not just for grants, but for all the work we do.

 

All in all, despite some environmental setbacks, I think the conference was a great success. A friend, upon seeing the program, remarked to me, “Wow, a digital humanities conference that’s not a THATCamp!” I’m all for THATCamps, but I do think that pairing this sort of conference with the THATCamp model allows us to talk about our work in different ways, all of which are valuable. So, with some trepidation, I will join those who have already called for this conference to become an annual event. (After all, with a year of experience under our belt, what could go wrong?)

Developing High- and Low-Tech Digital Competencies

Last week, Ben Schmidt gave a talk at Northeastern, part of which was about developing technical competency in digital methods. This semester, I’ve had the chance to develop my technical competency in working with data, mostly by jumping in with both feet and flailing around in all directions.

The task I was given in the NULab has allowed me to play with several different digital methods. The base project was this: turn strings such as these

10138 sn86071378/1854-12-14/ed-1 sn85038518/1854-12-07/ed-1
8744 sn83030213/1842-12-08/ed-1 sn86053954/1842-12-14/ed-1
8099 sn84028820/1860-01-05/ed-1 sn88061076/1859-12-23/ed-2
7819 sn85026050/1860-12-06/ed-1 sn83035143/1860-12-06/ed-1
7792 sn86063325/1850-01-03/ed-1 sn89066057/1849-12-31/ed-1

into a usable representation of pairs of newspapers that share a printed text. This snippet is 5 lines of a document of over 2 million lines, so obviously doing the substitutions by hand was not really an option.

David Smith, the computer science professor who wrote the algorithm that generated these pairs, suggested a Python program, using the dictionary data structure, to create the usable list. That dictionary would draw its keys from the text file provided by the Library of Congress for the Chronicling America newspapers. That was all fine, except that I had never even seen a Python script before.

I started very basic: The Programming Historian! Though it was very helpful for learning the syntax and vocabulary, its brief discussion of dictionaries wasn’t sufficient for what I needed. So I turned to other sources of information: the Python documentation (not that helpful) and my husband Lincoln (very helpful).

Through a lot of frustration, bother, and translating Ruby scripts into Python, Lincoln and I (95% Lincoln) were able to come up with a working program that generated a .csv file with lines of text that looked like this:

Democrat and sentinel. (Ebensburg, Pa.) 1853-1866 Nashville union and American. (Nashville, Tenn.) 1853-1862
New-York daily tribune. (New-York [N.Y.]) 1842-1866 Jeffersonian Republican. (Stroudsburg, Pa.) 1840-1853
Holmes County Republican. (Millersburg, Holmes County, Ohio) 1856-1865 Clarksville chronicle. (Clarksville, Tenn.) 1857-1865
Fremont journal. (Fremont, Sandusky County, [Ohio]) 1853-1866 Cleveland morning leader. (Cleveland [Ohio]) 1854-1865
Glasgow weekly times. (Glasgow, Mo.) 1848-1861 Democratic banner. (Bowling Green, Pike County, Mo.) 1845-1852
Belmont chronicle. (St. Clairsville, Ohio) 1855-1973 Clarksville chronicle. (Clarksville, Tenn.) 1857-1865
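I won’t reproduce our script exactly, but here is a minimal sketch of the dictionary approach. It assumes a tab-delimited master file with the title in the first field and the LCCN in the last (the LCCN really was the last field, as I note below), and the pairs and output file names are placeholders:

import csv

# Build an LCCN -> title dictionary from the Library of Congress
# newspapers file (assumed layout: title first, LCCN in the last field).
titles = {}
with open("newspapers-edit.txt", encoding="utf-8") as f:
    for line in f:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        titles[fields[-1].strip()] = fields[0].strip()

# Each pairs line looks like:
#   10138 sn86071378/1854-12-14/ed-1 sn85038518/1854-12-07/ed-1
# i.e. a cluster id followed by two LCCN/date/edition strings.
with open("pairs.txt", encoding="utf-8") as infile, \
        open("pairs-titles.csv", "w", newline="", encoding="utf-8") as outfile:
    writer = csv.writer(outfile)
    for line in infile:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip aberrant lines
        lccn_a = parts[1].split("/")[0]
        lccn_b = parts[2].split("/")[0]
        writer.writerow([titles.get(lccn_a, lccn_a),
                         titles.get(lccn_b, lccn_b)])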

The next step was pulling out the dates of publication (for the shared texts) and adding them to the .csv file. To do so, I had to update my Python program. I wrote a regular expression that detected the dates by searching for fields that looked like /####-##-##/. To accommodate the Atlantic Monthly, which didn’t format its dates the same way, I added a variation that found the string beginning with 18 and recorded the 18 plus the next 6 digits. (At some point, I’ll write a separate thing that will add in the hyphens, perhaps?)
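Roughly, the two patterns look like this (my reconstruction, with the hyphen fix mentioned above folded into the fallback):

import re

# Strict pattern: a slash-delimited YYYY-MM-DD field, as in
# "sn86071378/1854-12-14/ed-1".
DATE = re.compile(r"/(\d{4}-\d{2}-\d{2})/")
# Loose fallback for sources like the Atlantic Monthly: "18" plus the
# next six digits. It could false-match digits inside an identifier,
# which is one reason to try the strict pattern first.
LOOSE = re.compile(r"18\d{6}")

def extract_date(field):
    """Pull the publication date out of one LCCN/date/edition string."""
    m = DATE.search(field)
    if m:
        return m.group(1)
    m = LOOSE.search(field)
    if m:
        d = m.group(0)
        return f"{d[:4]}-{d[4:6]}-{d[6:]}"  # 18571201 -> 1857-12-01
    return None

print(extract_date("sn86071378/1854-12-14/ed-1"))  # -> 1854-12-14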

Third, I used the command line to remove the parentheses and brackets in the master newspapers file and to tab-delimit the fields so that the location was its own column. The command looks like this:

tr '()' '\t' < newspapers-edit.txt | tr ',' '\t' | tr '[]' '\t' > newspapers-edit-expanded.txt

However, I realized when I ran this command that it messes up my newspaper dictionary (from step 1), because the LCCN, which was the last field, now lands in a non-fixed location depending on how many fields were created by moving the comma-separated information into new tab-separated fields. So I did the highest-tech thing I know: I opened the .txt file in LibreOffice Calc (the poor man’s MS Excel) and simply moved the LCCN column in the original newspapers-edit.txt file so that it wouldn’t be affected when I ran the tab-separating command. Then I ran the command again.

The data set now looks like this:
Democrat and sentinel. (Ebensburg, Pa.) 1853-1866 1854-12-14 Nashville union and American. (Nashville, Tenn.) 1853-1862 1854-12-07
New-York daily tribune. (New-York [N.Y.]) 1842-1866 1842-12-08 Jeffersonian Republican. (Stroudsburg, Pa.) 1840-1853 1842-12-14
Holmes County Republican. (Millersburg, Holmes County, Ohio) 1856-1865 1860-01-05 Clarksville chronicle. (Clarksville, Tenn.) 1857-1865 1859-12-23
Fremont journal. (Fremont, Sandusky County, [Ohio]) 1853-1866 1860-12-06 Cleveland morning leader. (Cleveland [Ohio]) 1854-1865 1860-12-06
Glasgow weekly times. (Glasgow, Mo.) 1848-1861 1850-01-03 Democratic banner. (Bowling Green, Pike County, Mo.) 1845-1852 1849-12-31
Belmont chronicle. (St. Clairsville, Ohio) 1855-1973 1857-12-10 Clarksville chronicle. (Clarksville, Tenn.) 1857-1865 1857-12-14

My next task was figuring out how to write the dictionary so that it drew out the city and state as their own separate fields, which could then be geocoded in ArcGIS. I wrote the dictionary in a sort of stack: the LCCN calls the title; the title calls the city; the city calls the state. When I figured out how to set this up, I felt (for the first time) a major advancement in my understanding of Python syntax.
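In sketch form, the stack looks something like this. I’m assuming here that the LCCN ended up as the first column after the tab-separating command, with title, city, and state following (and expanding “Pa.” to “Pennsylvania” would be one more small lookup table):

# The stack of dictionaries: LCCN -> title, title -> city, city -> state.
title_of, city_of, state_of = {}, {}, {}

with open("newspapers-edit-expanded.txt", encoding="utf-8") as f:
    for line in f:
        fields = [x.strip() for x in line.split("\t") if x.strip()]
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        lccn, title, city, state = fields[0], fields[1], fields[2], fields[3]
        title_of[lccn] = title
        city_of[title] = city
        state_of[city] = state  # note: shared city names can collide

def lookup(lccn):
    """Walk the stack: LCCN -> title -> city -> state."""
    title = title_of[lccn]
    city = city_of[title]
    return title, city, state_of[city]

Chaining on titles and cities works only as long as those values are unique; keying everything directly on the LCCN would be sturdier if, say, two Clarksvilles turned up in different states.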

And this is how the data set has finally ended up looking:

Democrat and sentinel. Ebensburg Pennsylvania 1854-12-14 Nashville union and American. Nashville Tennessee 1854-12-07
New-York daily tribune. New-York New York 1842-12-08 Jeffersonian Republican. Stroudsburg Pennsylvania 1842-12-14
Holmes County Republican. Millersburg Ohio 1860-01-05 Clarksville chronicle. Clarksville Tennessee 1859-12-23
Fremont journal. Fremont Ohio 1860-12-06 Cleveland morning leader. Cleveland Ohio 1860-12-06
Glasgow weekly times. Glasgow Missouri 1850-01-03 Democratic banner. Bowling Green Missouri 1849-12-31
Belmont chronicle. St. Clairsville Ohio 1857-12-10 Clarksville chronicle. Clarksville Tennessee 1857-12-14

At the very beginning, I set up a shortened set (10 lines) of pairwise data to run my tests on, so I wouldn’t super-mess any of the big data up (or wait a really long time to discover that I’d done something wrong and the output wasn’t what I intended). This was a really helpful way to test my program without major consequences.

Each time it was time to replace the test file with the real one, I got all knock-kneed, fearful that something would go terribly awry. With the first program, something did go awry: we discovered that the test file worked but the big one didn’t, because of mysterious empty lines in the big one. We solved that problem by (1) finding the blank lines and removing them (I don’t quite know how, to be honest) and (2) writing an exception that skipped over aberrant lines. Since that time, I’ve fixed the aberrant-line problem by adding the problem publication (the Atlantic Monthly) to the newspapers master list I pull my dictionary keys from. So in the second iteration of the program, not only were there dates, but all the lines in the file were actually being identified. Troubleshooting these problems was quite beneficial in helping me learn exactly how Python works.
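For the record, the skip-the-bad-lines exception in (2) looked roughly like this; the stand-in dictionary and file name here are mine:

titles = {"sn86071378": "Memphis daily appeal."}  # stand-in dictionary

kept, skipped = [], 0
with open("pairs.txt", encoding="utf-8") as f:  # placeholder file name
    for line in f:
        try:
            cluster, a, b = line.split()  # blank lines raise ValueError
            kept.append((titles[a.split("/")[0]],   # unknown LCCNs
                         titles[b.split("/")[0]]))  # raise KeyError
        except (ValueError, KeyError):
            skipped += 1  # just skip the aberrant line
print(f"kept {len(kept)}, skipped {skipped} aberrant lines")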

My first experiences with programming, though a very great frustration to me at times, have stretched me a lot in thinking about how data can be manipulated, and the best ways to get the job done. I look forward to continuing to flail around in all directions, both on this project and hopefully on some of my own.