Computational Literary Studies: Participant Forum Responses


Ted Underwood

In the humanities, as elsewhere, researchers who work with numbers often reproduce and test each other’s claims.1 Nan Z. Da’s contribution to this growing genre differs from previous examples mainly in moving more rapidly. For instance, my coauthors and I spent 5,800 words describing, reproducing, and partially criticizing one article about popular music.2 By contrast, Da dismisses fourteen publications that use different methods in thirty-eight pages. The article’s energy is impressive, and its long-term effects will be positive.

But this pace has a cost. Da’s argument may be dizzying if readers don’t already know the works summarized, as she rushes through explanation to get to condemnation. Readers who know these works will recognize that Da’s summaries are riddled with material omissions and errors. The time is ripe for a theoretical debate about computing in literary studies. But this article is unfortunately too misleading—even at the level of paraphrase—to provide a starting point for the debate.

For instance, Da suggests that my article “The Life Cycles of Genres” makes genres look stable only because it forgets to compare apples to apples: “Underwood should train his model on pre-1941 detective fiction (A) as compared to pre-1941 random stew and post-1941 detective fiction (B) as compared to post-1941 random stew, instead of one random stew for both” (p. 608).3

This perplexing critique tells me to do exactly what my article (and public code) make clear that I did: compare groups of works matched by publication date.4 There is also no “random stew” in the article. Da’s odd phrase conflates a random contrast set with a ghastly “genre stew” that plays a different role in the argument.
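
To make the idea of a date-matched comparison concrete, here is a minimal sketch in Python. It is not the article's public code: the function name, the choice of decade-sized bins, and the sample titles and years are all hypothetical, chosen only to illustrate how a random contrast set can be drawn so that it mirrors the publication dates of the genre sample.

```python
import random
from collections import defaultdict


def date_matched_contrast(genre_works, other_works, seed=0):
    """For each genre work, draw one non-genre work from the same decade.

    Each work is a (title, year) tuple. Decade bins are an assumption made
    for this illustration, not a claim about any published method.
    """
    rng = random.Random(seed)

    # Pool the candidate contrast works by decade and shuffle each pool.
    by_decade = defaultdict(list)
    for title, year in other_works:
        by_decade[year // 10 * 10].append((title, year))
    for pool in by_decade.values():
        rng.shuffle(pool)

    contrast = []
    for _title, year in genre_works:
        pool = by_decade[year // 10 * 10]
        if pool:  # skip decades with no remaining candidates
            contrast.append(pool.pop())
    return contrast


# Hypothetical data: four "detective" works and nine other works.
detective = [("D1", 1925), ("D2", 1938), ("D3", 1955), ("D4", 1972)]
other_years = [1921, 1929, 1933, 1936, 1951, 1958, 1970, 1979, 1985]
others = [("O%d" % i, y) for i, y in enumerate(other_years)]

contrast = date_matched_contrast(detective, others)
```

Under these assumptions, a model trained to separate `detective` from `contrast` cannot succeed merely by detecting the period in which a work was written, because both groups are distributed across the same decades.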

More importantly, Da’s critique suppresses the article’s comparative thesis—which identifies detective fiction as more stable than several other genres—in order to create a straw man who argues that all genres “have in fact been more or less consistent from the 1820s to the present” (p. 609). Lacking any comparative yardstick to measure consistency, this straw thesis becomes unprovable. In other cases Da has ignored the significant results of an article, in order to pour scorn on a result the authors acknowledge as having limited significance—without ever mentioning that the authors acknowledge the limitation. This is how she proceeds with Jockers and Kirilloff (p. 610).

In short, this is not an article that works hard at holistic critique. Instead of describing the goals that organize a publication, Da often assumes that researchers were trying (and failing) to do something she believes they should have done. Topic modeling, for instance, identifies patterns in a corpus without pretending to find a uniquely correct description. Humanists use the method mostly for exploratory analysis. But Da begins from the assumption that topic modeling must be a confused attempt to prove hypotheses of some kind. So, she is shocked to discover (and spends a page proving) that different topics can emerge when the method is run multiple times. This is true. It is also a basic premise of the method, acknowledged by all the authors Da cites—who between them spend several pages discussing how results that vary can nevertheless be used for interpretive exploration. Da doesn’t acknowledge the discussion.

Finally, “The Computational Case” performs some crucial misdirection at the outset by implying that cultural analytics is based purely on linguistic evidence and mainly diction. It is true that diction can reveal a great deal, but this is a misleading account of contemporary trends. Quantitative approaches are making waves partly because researchers have learned to extract social relations from literature and partly because they pair language with external social testimony, for instance the judgments of reviewers.5 Some articles, like my own on narrative pace, use numbers entirely to describe the interpretations of human readers.6 Once again, Da’s polemical strategy is to isolate one strand in a braid, and critique it as if it were the whole.

A more inquisitive approach to cultural analytics might have revealed that it is not a monolith but an unfolding debate between several projects that frequently criticize each other. Katherine Bode, for instance, has critiqued other researchers’ data (including mine), in an exemplary argument that starts by precisely describing different approaches to historical representation.7 Da could have made a similarly productive intervention by explaining, for instance, how researchers should report uncertainty in exploratory analysis. Her essay falls short of that achievement because a rush to condemn as many examples as possible has prevented it from taking time to describe and genuinely understand its objects of critique.

TED UNDERWOOD is professor of information sciences and English at the University of Illinois, Urbana-Champaign. He has published in venues ranging from PMLA to the IEEE International Conference on Big Data and is the author most recently of Distant Horizons: Digital Evidence and Literary Change (2019).

1. Andrew Goldstone, “Of Literary Standards and Logistic Regression: A Reproduction,” January 4, 2016; Jonathan Goodwin, “Darko Suvin’s Genres of Victorian SF Revisited,” October 17, 2016.

2. Ted Underwood, “Can We Date Revolutions in the History of Literature and Music?” The Stone and the Shell, October 3, 2015; Ted Underwood, Hoyt Long, Richard Jean So, and Yuancheng Zhu, “You Say You Found a Revolution,” The Stone and the Shell, February 7, 2016.

3. Nan Z. Da, “The Computational Case against Computational Literary Studies,” Critical Inquiry 45 (Spring 2019): 601-39.

4. Ted Underwood, “The Life Cycles of Genres,” Journal of Cultural Analytics, May 23, 2016.

5. Eve Kraicer and Andrew Piper, “Social Characters: The Hierarchy of Gender in Contemporary English-Language Fiction,” Journal of Cultural Analytics, January 30, 2019.

6. Ted Underwood, “Why Literary Time is Measured in Minutes,” ELH 85.2 (2018): 341-65.

7. Katherine Bode, “The Equivalence of ‘Close’ and ‘Distant’ Reading; or, Toward a New Object for Data-Rich Literary History,” MLQ 78.1 (2017): 77-106.

