Computational Literary Studies: A Critical Inquiry Online Forum

Beginning on 1 April, this Critical Inquiry online forum will feature responses to and discussion about Nan Z. Da’s “The Computational Case against Computational Literary Studies.”  This essay raises a number of challenges for the field of computational literary studies. As Da observes in the first sentence of this piece:

This essay works at the empirical level to isolate a series of technical problems, logical fallacies, and conceptual flaws in an increasingly popular subfield in literary studies variously known as cultural analytics, literary data mining, quantitative formalism, literary text mining, computational textual analysis, computational criticism, algorithmic literary studies, social computing for literary studies, and computational literary studies (the phrase I use here).

Since its publication in Critical Inquiry on 14 March 2019, this essay has already prompted numerous responses online. For instance, in The Chronicle of Higher Education on 27 March 2019, Da published a companion piece to the Critical Inquiry essay, “The Digital Humanities Debacle,” and Ted Underwood published a defense of the digital humanities and cultural analytics, “Dear Humanists: Fear Not the Digital Revolution.” Other responses have emerged across social media.

In order to continue this conversation in a shared space, Critical Inquiry has invited several practitioners and critics in the digital humanities and computational literary studies to respond. This group of participants includes several of the scholars discussed in Da’s essay, as well as a few additional contributors to and critics of the field.

CONTRIBUTORS

RESPONDENT

AFTERWORD

  • Stanley Fish (Yeshiva University). Response

This forum begins with a series of short responses from participants. It then continues for several days with an open-ended discussion. We invite you to follow along and return to this page, as the blog will be updated several times a day to incorporate new posts. In order to enable mutual responses, we have limited the number of primary contributors. However, the comments will be open for responses from others outside this group, so readers should feel free to contribute their own thoughts. We look forward to a generative discussion.

Patrick Jagoda (Executive Editor, Critical Inquiry, University of Chicago)

 


DAY 1 RESPONSES 


Criticism, Augmented

Mark Algee-Hewitt

A series of binaries permeates Nan Z. Da’s article “The Computational Case against Computational Literary Studies”: computation OR reading; numbers OR words; statistics OR critical thinking. Working from these false oppositions, the article conjures a conflict between computation and criticism. The field of cultural analytics, however, rests on the discovery of compatibilities between these binaries: the ability of computation to work hand in hand with literary criticism, and the use of critical interpretation by its practitioners to make sense of their statistics.

The oppositions she posits lead Da to focus exclusively on the null hypothesis testing of confirmatory data analysis (CDA): graphs are selected, hypotheses are proposed, and errors in significance are sought.[1]

But for mathematician John Tukey, the founder of exploratory data analysis (EDA), allowing the data to speak for itself, visualized without an underlying hypothesis, helps researchers avoid the pitfalls of confirmation bias.[2] This is what psychologist William McGuire (1989) calls “the hypothesis testing myth”: if a researcher begins by believing a hypothesis (for example, that literature is too complex for computational analysis), then, with a simple manipulation of statistics, she can prove herself correct (by cherry-picking examples that support her argument).[3] Practitioners bound by the orthodoxy of their fields often miss the new patterns revealed when statistics are integrated into new areas of research.

In literary studies, the visualizations produced by EDA do not replace the act of reading but instead redirect it to new ends.[4] Each site of statistical significance reveals a new locus of reading: the act of quantification is no more a reduction than any interpretation.[5] Statistical rigor remains crucial, but equally as essential are the ways in which these data objects are embedded within a theoretical apparatus that draws on literary interpretation.[6] And yet, in her article, Da plucks single statistics from thirteen articles with an average length of about 10,250 words each.[7] It is only by ignoring these 10,000 words, by refusing to read the context of the graph, the arguments, justifications, and dissensions, that she can marshal her arguments.

In Da’s adherence to CDA, her critiques require a hypothesis: when one does not exist outside of the absent context, she is forced to invent one. Even a cursory reading of “The Werther Topologies” reveals that we are not interested in questions of the “influence of Werther on other texts”; rather, we are interested in exploring the effect on the corpus when it is reorganized around the language of Werther.[8] The topology creates new adjacencies, prompting new readings: it does not prove or disprove, and it is not right or wrong; to suggest otherwise is to make a category error.

Cultural analytics is not a virtual humanities that replaces the interpretive skills developed by scholars over centuries with mathematical rigor. It is an augmented humanities that, at its best, presents new kinds of evidence, often invisible to even the closest reader, alongside carefully considered theoretical arguments, both working in tandem to produce new critical work.

 

MARK ALGEE-HEWITT is an assistant professor of English and Digital Humanities at Stanford University where he directs the Stanford Literary Lab. His current work combines computational methods with literary criticism to explore large scale changes in aesthetic concepts during the eighteenth and nineteenth centuries. The projects that he leads at the Literary Lab include a study of racialized language in nineteenth-century American literature and a computational analysis of differences in disciplinary style. Mark’s work has appeared in New Literary History, Digital Scholarship in the Humanities, as well as in edited volumes on the Enlightenment and the Digital Humanities.

[1] Many of the articles cited by Da combine both CDA and EDA, a movement in the field noted by Ted Underwood in Distant Horizons (p. xii).

[2] Tukey, John. Exploratory Data Analysis. New York: Pearson, 1977.

[3] McGuire, William J. “A perspectivist approach to the strategic planning of programmatic scientific research.” In Psychology of Science: Contributions to Metascience, ed. B. Gholson et al. Cambridge: Cambridge UP, 1989. 214-245. See also Frederick Hartwig and Brian Dearing on the need not to rely exclusively on CDA (Exploratory Data Analysis. Newbury Park: Sage Publications, 1979) and John Behrens on the “hypothesis testing myth” (“Principles and Procedures of Exploratory Data Analysis.” Psychological Methods 2(2): 1997, 131-160).

[4] Da, Nan Z. “The Computational Case against Computational Literary Studies.” Critical Inquiry 45 (3): 2019. 601-639.

[5] See, for example, Gemma, Marissa, et al. “Operationalizing the Colloquial Style: Repetition in 19th-Century American Fiction.” Digital Scholarship in the Humanities 32(2): 2017. 312-335; or Laura B. McGrath et al. “Measuring Modernist Novelty.” The Journal of Cultural Analytics (2018).

[6] See, for example, our argument about the “modularity of criticism” in Algee-Hewitt, Mark, Fredner, Erik, and Walser, Hannah. “The Novel As Data.” The Cambridge Companion to the Novel, ed. Eric Bulson. Cambridge: Cambridge UP, 2018. 189-215.

[7] Absent the two books, which have a different relationship to length, Da extracts visualizations or numbers from 13 articles totaling 133,685 words (including notes and captions).

[8] Da (2019), 634; Piper and Algee-Hewitt, “The Werther Effect I,” in Distant Readings: Topologies of German Culture in the Long Nineteenth Century, ed. Matt Erlin and Lynn Tatlock (Rochester: Camden House, 2014), 156-157.


Katherine Bode

Nan Z. Da’s statistical review of computational literary studies (CLS) takes issue with an approach I also have concerns about, but it is misconceived in its framing of the field and of statistical inquiry. Her definition of CLS—using statistics, predominantly machine learning, to investigate word patterns—excludes most of what I would categorize as computational literary studies, including research that: employs data construction and curation as forms of critical analysis; analyzes bibliographical and other metadata to explore literary trends; deploys machine-learning methods to identify literary phenomena for noncomputational interpretation; or theorizes the implications of methods such as data visualization and machine learning for literary studies. (Interested readers will find diverse forms of CLS in the work of Ryan Cordell, Anne DeWitt, Johanna Drucker, Lauren Klein, Matthew Kirschenbaum, Anouk Lang, Laura B. McGrath, Stephen Ramsay, and Glenn Roe, among others.)

Beyond its idiosyncratic and restrictive definition of CLS, what strikes me most about Da’s essay is its constrained and contradictory framing of statistical inquiry. For most of the researchers Da cites, the pivot to machine learning is explicitly conceived as rejecting a positivist view of literary data and computation in favor of modelling as a subjective practice. Da appears to argue, first, that this pivot has not occurred enough (CLS takes a mechanistic approach to literary interpretation) and, second, that it has gone too far (CLS takes too many liberties with statistical inference, such as “metaphor[izing] … coding and statistics” [p. 606 n. 9]). On the one hand, then, Da repeatedly implies that, if CLS took a slightly different path—that is, trained with more appropriate samples, demonstrated greater rigor in preparing textual data, avoided nonreproducible methods like topic modelling, used Natural Language Processing with the sophistication of corpus linguists—it could reach a tipping point at which the data used, methods employed, and questions asked became appropriate to statistical analysis. On the other, she precludes this possibility in identifying “reading literature well” as the “cut-off point” at which computational textual analysis ceases to have “utility” (p. 639). This limited conception of statistical inquiry also emerges in Da’s two claims about statistical tools for text mining: they are “ethically neutral”; and they must be used “in accordance with their true function” (p. 620), which Da defines as reducing information to enable quick decision making. Yet as with any intellectual inquiry, surely any measurements—let alone measurements with this particular aim—are interactions with the world that have ethical dimensions.

Statistical tests of statistical arguments are vital. And I agree with Da’s contention that applications of machine learning to identify word patterns in literature often simplify complex historical and critical issues. As Da argues, these simplifications include conceiving of models as “intentional interpretations” (p. 621) and of word patterns as signifying literary causation and influence. But there’s a large gap between identifying these problems and insisting that statistical tools have a “true function” that is inimical to literary studies. Our discipline has always drawn methods from other fields (history, philosophy, psychology, sociology, and others). Perhaps it’s literary studies’ supposed lack of functional utility (something Da claims to defend) that has enabled these adaptations to be so productive; perhaps such adaptations have been productive because the meaning of literature is not singular but forged constitutively with a society where the prominence of particular paradigms (historical, philosophical, psychological, sociological, now statistical) at particular moments shapes what and how we know. In any case, disciplinary purity is no protection against poor methodology, and cross-disciplinarity can increase methodological awareness.

Da’s rigid notion of a “true function” for statistics prevents her asking more “argumentatively meaningful” (p. 639) questions about possible encounters between literary studies and statistical methods. These might include: If not intentional or interpretive, what is the epistemological—and ontological and ethical—status of patterns discerned by machine learning? Are there ways of connecting word counts with other, literary and nonliterary, elements that might enhance the “explanatory power” (p. 604) and/or critical potential of such models and, if not, why not? As is occurring in fields such as philosophy, sociology, and science and technology studies, can literary studies apply theoretical perspectives (such as feminist empiricism or new materialism) to reimagine literary data and statistical inquiry? Without such methodological and epistemological reflection, Da’s statistical debunking of statistical models falls into the same trap she ascribes to those arguments: of confusing “what happens mechanistically with insight” (p. 639). We very much need critiques of mechanistic—positivist, reductive, and ahistorical—approaches to literary data, statistics, and machine learning. Unfortunately, Da’s critique demonstrates the problems it decries.

 

KATHERINE BODE is associate professor of literary and textual studies at the Australian National University. Her latest book, A World of Fiction: Digital Collections and the Future of Literary History (2018), offers a new approach to literary research with mass-digitized collections, based on the theory and technology of the scholarly edition. Applying this model, Bode investigates a transnational collection of around 10,000 novels and novellas, discovered in digitized nineteenth-century Australian newspapers, to offer new insights into phenomena ranging from literary anonymity and fiction syndication to the emergence and intersections of national literary traditions.

 


 

Sarah Brouillette

DH is here to stay, including in the CLS variant whose errors Nan Da studies. This variant is especially prevalent in English programs, and it will continue to gain force there. Even when those departments have closed or merged with other units, people with CLS capacities will continue to find positions—though likely contractual ones—when others no longer can. This is not to say that DH is somehow itself the demise of the English department. The case rather is that both the relative health of DH and the general decline in literary studies—measured via enrollments, number of tenured faculty, and university heads’ dispositions toward English—arise from the same underlying factors. The pressures that English departments face are grounded in the long economic downturn and rising government deficits, deep cuts to funding for higher education, rising tuition, and a turn by university administrators toward boosting business and STEM programs. We know this. There has been a foreclosure of futurity for students who are facing graduation with significant debt burdens and who doubt that they will find stable work paying a good wage. Who can afford the luxury of closely reading five hundred pages of dense prose? Harried, anxious people accustomed to working across many screens, many open tabs, with constant pings from social media, often struggle with sustained reading. Myself included. DH is a way of doing literary studies without having to engage in long periods of sustained reading, while acquiring what might feel like job skills. It doesn’t really matter how meaningful CLS labs’ findings are. As Da points out, practitioners themselves often emphasize how tentative their findings are or stress flaws in the results or the method that become the occasion for future investment and development. That is the point: investment and development.
The key to DH’s relative health is that it supports certain kinds of student training and the development of technologically enhanced learning environments. One of the only ways to get large sums of grant money from the Social Sciences and Humanities Research Council of Canada (SSHRC) is to budget for equipment and for student training. Computer training is relatively easy to describe in a budget justification. Universities for their part often like DH labs because they attract these outside funders, and because grants don’t last forever, a campus doesn’t have to promise anything beyond short-term training and employment. As for the students: to be clear, those with DH skills don’t necessarily walk more easily into jobs than those without them. But DH labs, which at least in Canada need to be able to list training as a priority, offer an experience of education that has an affective appeal for many students—an appeal that universities work hard to cultivate and reinforce. This cultivation is there in the constant contrasts made between old-fashioned and immersive learning, between traditional and project-based classrooms, between the dull droning lecture and the experiential . . . well, experience. (The government of Ontario has recently mandated that every student have an opportunity to experience “work-integrated learning” before graduation.) It is there also in the push to make these immersive experiences online ones, mediated by learning management systems such as Brightspace or Canvas, which store data via Amazon Web Services. Learning in universities increasingly occurs in data-capturable forms. The experience of education, from level of participation to test performance, is cultivated, monitored, and tracked digitally. Students who have facility with digital technologies are, needless to say, at an advantage in this environment.
Meanwhile, the temptation to think that courses that include substantial digital components are more practical and professional, less merely academic, is pretty understandable, as universities are so busily cultivating and managing engagement in a context in which disengagement otherwise makes total sense. DH is simply far more compatible with all of these observable trends than many other styles of literary inquiry.

SARAH BROUILLETTE is a professor in the Department of English at Carleton University in Ottawa, Canada.


What Is Literary Studies?

Ed Finn

This is the question that underpins Da’s takedown of what she calls computational literary studies (CLS). The animus with which she pursues this essay is like a searchlight that creates a shadow behind it. “The discipline is about reducing reductionism,” she writes (p. 638), which is a questionable assertion about a field that encompasses many kinds of reduction and contradictory epistemic positions, from thing theory to animal studies. Da offers no evidence or authority to back up her contention that CLS fails to validate its claims. Being charitable, what Da means, I think, is that literary scholars should always attend to context, to the particulars of the works they engage.

Da’s essay assails what she terms the false rigor of CLS: the obsession with reductive analyses of large datasets, the misapplied statistical methods, the failure to disentangle artifacts of measurement from significant results. And there may be validity to these claims: some researchers use black box tools they don’t understand, not just in the digital humanities but in fields from political science to medicine. The most helpful contribution of Da’s article is tucked away in the online appendix, where she suggests a very good set of peer review and publication guidelines for DH work. I can imagine a version of this essay that culminated with those guidelines rather than the suggestion that “reading literature well” is a bridge too far for computational approaches.

The problem with the spotlight Da shines on the rigor of CLS is that shadow looming behind it. What does rigor look like in “the discipline” of literary studies, which is defined so antagonistically to CLS here? What are the standards of peer review that ensure literary scholarship validates its methods, particularly when it draws those methods from other disciplines? Nobody is calling in economists to assess the validity of Marxist literary analysis, or cognitive psychologists to check applications of affect theory, and it’s hard to imagine that scholars would accept the disciplinary authority of those critics. I am willing to bet Critical Inquiry’s peer review process for Da’s article did not include federal grants program officers, university administrators, or scholars of public policy being asked to assess Da’s rhetorical—but central—question “of why we need ‘labs’ or the exorbitant funding that CLS has garnered” (p. 603).

I contend this is actually a good idea: literary studies can benefit from true dialog and collaboration with fields across the entire academy. Da clearly feels that this is justified in the case of CLS, where she calls for more statistical expertise (and brings in a statistician to guide her analysis in this paper). But why should CLS be singled out for this kind of treatment?

Either one accepts that rigor sometimes demands literary studies should embrace expertise from other fields—like Da bringing in a statistician to validate her findings for this paper—or one accepts that literary studies is made up of many contradictory methods and that “the discipline” is founded on borrowing methods from other fields without any obligation to validate findings by the standards of those other fields. What would it look like to generalize Da’s proposals for peer review to other areas of literary studies? The contemporary research I find most compelling makes this more generous move: bringing scholars in the humanities together with researchers in the social sciences, the arts, medicine, and other arenas where people can actually learn from one another and do new kinds of work.

To me, literary studies is the practice of reading and writing in order to better understand the human condition. And the condition is changing. Most of what we read now comes to us on screens that are watching us as we watch them. Many of the things we think about have been curated and lobbed into our consciousness by algorithmic feeds and filters. I studied Amazon recommendation networks because they play an important role in contemporary American literary reception and the lived experience of fiction for millions of readers—at least circa 2010, when I wrote the article. My approach in that work hewed to math that I understand and a scale of information that I call small data because it approximates the headspace of actual readers thinking about particular books. Small data always leads back to the qualitative and to the particular, and it is a minor example of the contributions humanists can make beyond the boundaries of “the discipline.”

We desperately need the humanities to survive the next century, when so many of our species’ bad bets are coming home to roost. Text mining is not “ethically neutral,” as Da gobsmackingly argues (p. 620), any more than industrialization was ethically neutral, or the NSA using network analysis to track suspected terrorists (Da’s example of a presumably acceptable “operationalizable end” for social network analysis) (p. 632). The principle of charity would, I hope, preclude Da’s shortsighted framing of what matters in literary studies, and it would open doors to other fields like computer science where many researchers are, either unwittingly or uncaringly, deploying words like human and read and write with the same kind of facile dismissal of methods outside “the discipline” that are on display here. That is the context in which we read and think about literature now, and if we want to “read literature well,” we need to bring the insights of literary study to broader conversations where we participate, share, educate, and learn.

ED FINN is the founding director of the Center for Science and the Imagination at Arizona State University where he is an associate professor in the School of Arts, Media, and Engineering and the Department of English.


What the New Computational Rigor Should Be

Lauren F. Klein

Writing about the difficulties of evaluating digital scholarship in a recent special issue of American Quarterly devoted to DH, Marisa Parham proposes the concept of “The New Rigor” to account for the labor of digital scholarship as well as its seriousness: “It is the difference between what we say we want the world to look like and what we actually carry out in our smallest acts,” she states (p. 683). In “The Computational Case against Computational Literary Studies,” Nan Z. Da also makes the case for a new rigor, although hers is more narrowly scoped. It entails both a careful adherence to the methods of statistical inquiry and a concerted rejection of the application of those methods to domains—namely, literary studies—that fall beyond their purported use.

No one would argue with the former. But it is the latter claim that I will push back against. Several times in her essay, Da makes the case that “statistical tools are designed to do certain things and solve specific problems,” and for that reason, they should not be employed to “capture literature’s complexity” (pp. 619-20, 634). To be sure, there exists a richness of language and an array of ineffable—let alone quantifiable—qualities of literature that cannot be reduced to a single model or diagram. But the complexity of literature exceeds even that capaciousness, as most literary scholars would agree. And for that very reason, we must continue to explore new methods for expanding the significance of our objects of study. As literary scholars, we would almost certainly say that we want to look at—and live in—a world that embraces complexity. Given that vision, the test of rigor then becomes, to return to Parham’s formulation, how we usher that world into existence through each and every one of “our smallest acts” of scholarship, citation, and critique.

In point of fact, many scholars already exhibit this new computational rigor. Consider how Jim Casey, the national codirector of the Colored Conventions Project, is employing social network analysis—including the centrality scores and modularity measures that Da finds lacking in the example she cites—in order to detect changing geographic centers for this important nineteenth-century organizing movement. Or how Lisa Rhody has found an “interpretive space that is as vital as the weaving and unraveling at Penelope’s loom” in a topic model of a corpus of 4,500 poems. This interpretive space is one that Rhody creates in no small part by accounting for the same fluctuations of words in topics—the result of the sampling methods employed in almost all topic model implementations—that Da invokes, instead, in order to dismiss the technique out of hand. Or how Laura Estill, Dominic Klyve, and Kate Bridal have employed statistical analysis, including a discussion of the p-values that Da believes (contra many statisticians) are always required, in order to survey the state of Shakespeare studies as a field.

That these works are authored by scholars in a range of academic roles, including postdoctoral fellows and DH program coordinators as well as tenure-track faculty, and are published in a range of venues, including edited collections and online as well as domain-specific journals, further points to the range of extant work that embraces the complexity of literature in precisely the ways that Da describes. But these works do more: they also embrace the complexity of the statistical methods that they employ. Each of these essays involves a creative repurposing of the methods they borrow from more computational fields, as well as a trenchant self-critique. Casey, for example, questions how applying techniques of social network analysis, which are premised on a conception of sociality as characterized by links between individual “nodes,” can do justice to a movement celebrated for its commitment to collective action. Rhody, for another, considers the limits of the utility of topic modeling, as a tool “designed to be used with texts that employ as little figurative language as possible,” for her research questions about ekphrasis. These essays each represent “small acts” and necessarily so. But taken alongside the many other examples of computational work that are methodologically sound, creatively conceived, and necessarily self-critical, they constitute the core of a field committed to complexity in both the texts they elucidate and the methods they employ.

In her formulation of “The New Rigor,” Parham—herself a literary scholar—places her emphasis on a single word: “Carrying, how we carry ourselves in our relationships and how we carry each other, is the real place of transformation,” she writes. Da, the respondents collected in this forum, and all of us in literary studies—computational and not—might linger on that single word. If our goal remains to celebrate the complexity of literature—precisely because it helps to illuminate the complexity of the world—then we must carry ourselves, and each other, with intellectual generosity and goodwill. We must do so, moreover, with a commitment to honoring the scholarship, and the labor, that has cleared the path up to this point. Only then can we carry forward the field of computational literary studies into the transformative space of future inquiry.

LAUREN F. KLEIN is associate professor at the School of Literature, Media, and Communication, Georgia Institute of Technology.


Trust in Numbers

Hoyt Long and Richard Jean So

 

Nan Da’s “The Computational Case against Computational Literary Studies” stands out from past polemics against computational approaches to literature in that it purports to take computation seriously. It recognizes that a serious engagement with this kind of research means developing literacy in statistical and other concepts. Insofar as her essay promises to move the debate beyond a flat rejection of numbers, and towards something like a conversation about replication, it is a useful step forward.

This, however, is where its utility ends. “Don’t trust the numbers,” Da warns. Or rather, “Don’t trust their numbers, trust mine.” But should you? If you can’t trust their numbers, she implies, the entire case for computational approaches falls apart. Trust her numbers and you’ll see this. But her numbers cannot be trusted. Da’s critique of fourteen articles in the field of cultural analytics is rife with technical and factual errors. This is not merely quibbling over details. The errors reflect a basic lack of understanding of fundamental statistical concepts and are akin to an outsider to literary studies calling George Eliot a “famous male author.” Even more concerning, Da fails to understand statistical method as a contextual, historical, and interpretive project. The essay’s greatest error, to be blunt, is a humanist one.

Here we focus on Da’s errors related to predictive modeling. This is the core method used in the two essays of ours that she critiques. In “Turbulent Flow,” we built a model of stream-of-consciousness (SOC) narrative with thirteen linguistic features and found that ten of them, in combination, reliably distinguished passages that we identified as SOC (as compared with passages taken from a corpus of realist fiction). Type-token ratio (TTR), a measure of lexical diversity, was the most distinguishing of these, though uninformative on its own. The purpose of predictive modeling, as we carefully explain in the essay, is to understand how multiple features work in concert to identify stylistic patterns, not alone. Nothing in Da’s critique suggests she is aware of this fundamental principle.

Indeed, Da interrogates just one feature in our model (TTR) and argues that modifying it invalidates our modeling. Specifically, she tests whether the strong association of TTR with SOC holds after removing words in her “standard stopword list,” instead of in the stopword list we used. She finds it doesn’t. There are two problems with this. First, TTR and “TTR minus stopwords” are two separate features. We actually included both in our model and found the latter to be minimally distinctive. Second, while the intuition to test for feature robustness is appropriate, it is undercut by the assertion that there is a “standard” stopword list that should be universally applied. Ours was specifically curated for use with nineteenth- and early twentieth-century fiction. Even if there were good reason to adopt her “standard” list, one still must rerun the model to test whether the remeasured “TTR minus stopwords” feature changes the overall predictive accuracy. Da doesn’t do this. It’s like fiddling with a single piano key and, without playing another note, declaring the whole instrument to be out of tune.
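The rerun-the-model point can be made concrete. The sketch below uses invented data and a simple nearest-centroid classifier (not the model from our essay): a feature that is uninformative on its own can be dropped or altered without changing overall accuracy, but you only learn that by retraining and rescoring the whole model.

```python
# Toy illustration (invented data, toy classifier): feature 0 is noise;
# features 1 and 2 carry the signal jointly. Judge any feature change by
# rerunning the full model, not by inspecting one feature in isolation.
def nearest_centroid_accuracy(X, y, features):
    def centroid(rows):
        return [sum(r[f] for r in rows) / len(rows) for f in features]
    a = centroid([x for x, label in zip(X, y) if label == 0])
    b = centroid([x for x, label in zip(X, y) if label == 1])

    def predict(x):
        da = sum((x[f] - a[i]) ** 2 for i, f in enumerate(features))
        db = sum((x[f] - b[i]) ** 2 for i, f in enumerate(features))
        return 0 if da <= db else 1

    return sum(predict(x) == label for x, label in zip(X, y)) / len(y)

X = [[0.9, 0.1, 0.2], [0.2, 0.2, 0.1], [0.8, 0.9, 0.8], [0.1, 0.8, 0.9]]
y = [0, 0, 1, 1]
print(nearest_centroid_accuracy(X, y, [0, 1, 2]))  # full model: 1.0
print(nearest_centroid_accuracy(X, y, [1, 2]))     # drop the noisy feature: still 1.0
print(nearest_centroid_accuracy(X, y, [0]))        # noisy feature alone: 0.5 (chance)
```

Here the noisy feature classifies at chance on its own, yet removing it leaves the model’s accuracy untouched; conversely, a change to one feature need not, and here does not, invalidate the model as a whole.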

But the errors run deeper than this. In Da’s critique of “Literary Pattern Recognition,” she tries to invalidate the robustness of our model’s ability to distinguish English-language haiku poems from nonhaiku poems. She does so by creating a new corpus of “English translations of Chinese couplets” and testing our model on this corpus. Why do this? She suggests that it is because they are filled “with similar imagery” to English haiku and are similarly “Asian.” This is a misguided decision that smacks of Orientalism. It completely erases context and history, suggesting an ontological relation where there is none. This is why we spend over twelve pages delineating the English haiku form in both critical and historical terms.

These errors exemplify a consistent refusal to contextualize and historicize one’s interpretative practices (indeed to “read well”), whether statistically or humanistically. We do not believe there exist “objectively” good literary interpretations or that there is one “correct” way to do statistical analysis: Da’s is a position most historians of science, and most statisticians themselves, would reject. Conventions in both literature and science are continuously debated and reinterpreted, not handed down from on high. And like literary studies, statistics is a body of knowledge formed from messy disciplinary histories, as well as diverse communities of practice. Da’s essay insists on a highly dogmatic, “objective,” black-and-white version of knowledge, a disposition totally antithetical to both statistics and literary studies. It is not a version that encourages much trust.

Hoyt Long is associate professor of Japanese literature at the University of Chicago. He publishes widely in the fields of Japanese literary studies, media history, and cultural analytics. His current book project is Figures of Difference: Quantitative Approaches to Modern Japanese Literature.

Richard Jean So is assistant professor of English and cultural analytics at McGill University. He works on computational approaches to literature and culture with a focus on contemporary American writing and race. His current book project is Redlining Culture: A Data History of Race and US Fiction.

 


The Select

Andrew Piper

Nan Z. Da’s study published in Critical Inquiry participates in an emerging trend across a number of disciplines that falls under the heading of “replication.”[1] In this, her work follows major efforts in other fields, such as the Open Science Collaboration’s “reproducibility project,” which sought to replicate past studies in the field of psychology.[2] As the authors of the OSC write, the value of replication, when done well, is that it can “increase certainty when findings are reproduced and promote innovation when they are not.”

And yet despite arriving at sweeping claims about an entire field, Da’s study fails to follow any of the procedures and practices established by projects like the OSC.[3] While invoking the epistemological framework of replication—that is, to prove or disprove the validity of both individual articles as well as an entire field—her practices follow instead the time-honoured traditions of selective reading from the field of literary criticism. Da’s work is ultimately valuable not because of the computational case it makes (that work still remains to be done), but because of the way it foregrounds so many of the problems that accompany traditional literary critical models when used to make large-scale evidentiary claims. The good news is that this article has made the problem of generalization, of how we combat the problem of selective reading, into a central issue facing the field.

Start with the evidence chosen. When undertaking their replication project, the OSC generated a sample of one hundred studies taken from three separate journals within a single year of publication to approximate a reasonable cross-section of the field. Da, on the other hand, chooses “a handful” of articles (fourteen by my count) from different years and different journals with no clear rationale of how these articles are meant to represent an entire field. The point is not the number chosen but that we have no way of knowing why these articles and not others were chosen and thus whether her findings extend to any work beyond her sample. Indeed, the only linkage appears to be that these studies all “fail” by her criteria. Imagine if the OSC had found that 100 percent of articles sampled failed to replicate. Would we find their results credible? Da by contrast is surprisingly only ever right.

Da’s focus within articles exhibits an even stronger degree of nonrepresentativeness. In their replication project, the OSC establishes clearly defined criteria through which a study can be declared not to replicate, while also acknowledging the difficulty of arriving at this conclusion. Da by contrast applies different criteria to every article, making debatable choices, as well as outright errors, that are clearly designed to foreground differences.[4] She misnames authors of articles, mis-cites editions, mis-attributes arguments to the wrong book, and fails at some basic math.[5] And yet each of these assertions always adds up to the same certain conclusion: failed to replicate. In Da’s hands, part is always a perfect representation of whole.

Perhaps the greatest limitation of Da’s piece is her extremely narrow (that is, nonrepresentative) definition of statistical inference and computational modeling. In Da’s view, the only appropriate way to use data is to perform what is known as significance testing, where we use a statistical model to test whether a given hypothesis is “true.”[6] There is no room for exploratory data analysis, for theory building, or predictive modeling in her view of the field.[7] This is particularly ironic given that Da herself performs no such tests. She holds others to standards to which she herself is not accountable. Nor does she cite articles where authors explicitly undertake such tests,[8] or research that calls into question the value of such tests,[9] or research that explores the relationship between word frequency and human judgments that she finds so problematic.[10] The selectivity of Da’s work is deeply out of touch with the larger research landscape.

All of these practices highlight a more general problem that has for too long gone unexamined in the field of literary study. How are we to move reliably from individual observations to general beliefs about things in the world? Da’s article provides a tour de force of the problems of selective reading when it comes to generalizing about individual studies or entire fields. Addressing the problem of responsible and credible generalization will be one of the central challenges facing the field in the years to come. As with all other disciplines across the university, data and computational modeling will have an integral role to play in that process.

ANDREW PIPER is Professor and William Dawson Scholar in the Department of Languages, Literatures, and Cultures at McGill University. He is the author most recently of Enumerations: Data and Literary Study (2018).

[1] Nan Z. Da, “The Computational Case against Computational Literary Studies,” Critical Inquiry 45 (Spring 2019): 601-39. For accessible introductions to what has become known as the replication crisis in the sciences, see Ed Yong, “Psychology’s Replication Crisis Can’t Be Wished Away,” The Atlantic, March 4, 2016.

[2] Open Science Collaboration, “Estimating the Reproducibility of Psychological Science,” Science 349, no. 6251 (28 August 2015): aac4716. DOI: 10.1126/science.aac4716.

[3] Compare Da’s sweeping claims with the more modest ones made by the OSC in Science, even given their considerably larger sample and far more rigorous effort at replication. For a discussion of the practice of replication, see Brian D. Earp and David Trafimow, “Replication, Falsification, and the Crisis of Confidence in Social Psychology,” Frontiers in Psychology, May 19, 2015, doi.org/10.3389/fpsyg.2015.00621.

[4] For a list, see Ben Schmidt, “A computational critique of a computational critique of a computational critique.” I provide more examples in my scholarly response: Andrew Piper, “Do We Know What We Are Doing?” Journal of Cultural Analytics, April 1, 2019.

[5] She cites Mark Algee-Hewitt as Mark Hewitt, cites G. Casella as the author of Introduction to Statistical Learning when it was Gareth James, cites me and Andrew Goldstone as co-authors in the Appendix when we were not, claims that “the most famous example of CLS forensic stylometry” was Hugh Craig and Arthur F. Kinney’s book that advances a theory of Marlowe’s authorship of Shakespeare’s plays, which they do not, and miscalculates the number of people it would take to read fifteen thousand novels in a year. The answer is 1,250, not 1,000 as she asserts. This statistic is also totally meaningless.

[6] Statements like the following also suggest that she is far from a credible guide to even this aspect of statistics: “After all, statistics automatically assumes that 95 percent of the time there is no difference and that only 5 percent of the time there is a difference. That is what it means to look for p-value less than 0.05.” This is not what it means to look for a p-value less than 0.05. A p-value is the estimated probability of getting our observed data assuming our null hypothesis is true. The smaller the p-value, the more unlikely it is to observe what we did assuming the null hypothesis is true. The aforementioned 5% threshold says nothing about how often there will be a “difference” (in other words, how often the null hypothesis is false). Instead, it says: “if there is in fact no difference, a test at this threshold will wrongly report one 5% of the time.” Nor does “statistics” “automatically” assume that .05 is the appropriate cut-off. That depends on the domain, the question, and the aims of modeling. These are gross oversimplifications.
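The distinction drawn in this footnote is easy to verify by simulation. The sketch below is illustrative only (no code or data from the studies under discussion): it flips a fair coin repeatedly, so the null hypothesis of "no difference" is true by construction, and counts how often a test at the 0.05 threshold nevertheless reports a difference.

```python
import math
import random

# When the null hypothesis is true (a fair coin), a test at the 0.05
# threshold wrongly reports a "difference" about 5% of the time. The
# threshold bounds this false-positive rate; it says nothing about how
# often real differences exist.
random.seed(0)

def p_value_fair_coin(flips):
    """Two-sided p-value for 'this coin is fair' (normal approximation)."""
    n = len(flips)
    z = (sum(flips) - n / 2) / math.sqrt(n / 4)
    return math.erfc(abs(z) / math.sqrt(2))

trials, rejections = 2000, 0
for _ in range(trials):
    flips = [random.random() < 0.5 for _ in range(100)]  # null is true
    if p_value_fair_coin(flips) < 0.05:
        rejections += 1

rate = rejections / trials
print(round(rate, 3))  # hovers near 0.05
```

Over many trials the false-positive rate settles near 5 percent regardless of how often real differences exist in the world, which is precisely the point of the correction above.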

[7] For reflections on literary modeling, see Andrew Piper, “Think Small: On Literary Modeling,” PMLA 132.3 (2017): 651-58; Richard Jean So, “All Models Are Wrong,” PMLA 132.3 (2017); Ted Underwood, “Algorithmic Modeling: Or, Modeling Data We Do Not Yet Understand,” The Shape of Data in Digital Humanities: Modeling Texts and Text-based Resources, eds. J. Flanders and F. Jannidis (New York: Routledge, 2018).

[8] See Andrew Piper and Eva Portelance, “How Cultural Capital Works: Prizewinning Novels, Bestsellers, and the Time of Reading,” Post-45 (2016); Eve Kraicer and Andrew Piper, “Social Characters: The Hierarchy of Gender in Contemporary English-Language Fiction,” Journal of Cultural Analytics, January 30, 2019, DOI: 10.31235/osf.io/4kwrg; and Andrew Piper, “Fictionality,” Journal of Cultural Analytics, December 20, 2016, DOI: 10.31235/osf.io/93mdj.

[9] The literature debating the value of significance testing is vast. See Joseph P. Simmons, Leif D. Nelson, and Uri Simonsohn, “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant,” Psychological Science 22.11 (November 2011): 1359-66, doi:10.1177/0956797611417632.

[10] See Rens Bod, Jennifer Hay, and Stefanie Jannedy, Probabilistic Linguistics (Cambridge, MA: MIT Press, 2003); Dan Jurafsky and James Martin, “Vector Semantics,” Speech and Language Processing, 3rd edition (2018): https://web.stanford.edu/~jurafsky/slp3/6.pdf; for the relation of communication to information theory, M. W. Crocker, V. Demberg, and E. Teich, “Information Density and Linguistic Encoding,” Künstliche Intelligenz 30.1 (2016): 77-81, https://doi.org/10.1007/s13218-015-0391-y; and for the relation to language acquisition and learning, L. C. Erickson and E. D. Thiessen, “Statistical Learning of Language: Theory, Validity, and Predictions of a Statistical Learning Account of Language Acquisition,” Developmental Review 37 (2015): 66-108, doi:10.1016/j.dr.2015.05.002.

 


Ted Underwood

In the humanities, as elsewhere, researchers who work with numbers often reproduce and test each other’s claims.1 Nan Z. Da’s contribution to this growing genre differs from previous examples mainly in moving more rapidly. For instance, my coauthors and I spent 5,800 words describing, reproducing, and partially criticizing one article about popular music.2 By contrast, Da dismisses fourteen publications that use different methods in thirty-eight pages. The article’s energy is impressive, and its long-term effects will be positive.

But this pace has a cost. Da’s argument may be dizzying if readers don’t already know the works summarized, as she rushes through explanation to get to condemnation. Readers who know these works will recognize that Da’s summaries are riddled with material omissions and errors. The time is ripe for a theoretical debate about computing in literary studies. But this article is unfortunately too misleading—even at the level of paraphrase—to provide a starting point for the debate.

For instance, Da suggests that my article “The Life Cycles of Genres” makes genres look stable only because it forgets to compare apples to apples: “Underwood should train his model on pre-1941 detective fiction (A) as compared to pre-1941 random stew and post-1941 detective fiction (B) as compared to post-1941 random stew, instead of one random stew for both” (p. 608).3

This perplexing critique tells me to do exactly what my article (and public code) make clear that I did: compare groups of works matched by publication date.4 There is also no “random stew” in the article. Da’s odd phrase conflates a random contrast set with a ghastly “genre stew” that plays a different role in the argument.

More importantly, Da’s critique suppresses the article’s comparative thesis—which identifies detective fiction as more stable than several other genres—in order to create a straw man who argues that all genres “have in fact been more or less consistent from the 1820s to the present” (p. 609). Lacking any comparative yardstick to measure consistency, this straw thesis becomes unprovable. In other cases Da has ignored the significant results of an article, in order to pour scorn on a result the authors acknowledge as having limited significance—without ever mentioning that the authors acknowledge the limitation. This is how she proceeds with Jockers and Kirilloff (p. 610).

In short, this is not an article that works hard at holistic critique. Instead of describing the goals that organize a publication, Da often assumes that researchers were trying (and failing) to do something she believes they should have done. Topic modeling, for instance, identifies patterns in a corpus without pretending to find a uniquely correct description. Humanists use the method mostly for exploratory analysis. But Da begins from the assumption that topic modeling must be a confused attempt to prove hypotheses of some kind. So, she is shocked to discover (and spends a page proving) that different topics can emerge when the method is run multiple times. This is true. It is also a basic premise of the method, acknowledged by all the authors Da cites—who between them spend several pages discussing how results that vary can nevertheless be used for interpretive exploration. Da doesn’t acknowledge the discussion.
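The premise Underwood describes, that random initialization can legitimately produce different results on the same data, is easy to demonstrate. The sketch below substitutes simple two-cluster k-means for LDA topic modeling (an analogy only, on four invented points): different starting centroids converge to different, equally stable groupings of identical data.

```python
from itertools import combinations

# Four points at the corners of a square, k = 2. Depending on where the
# algorithm starts, it settles into different stable clusterings -- the
# same kind of run-to-run variation that topic models exhibit by design.
points = [(0, 0), (0, 10), (10, 0), (10, 10)]

def kmeans_partition(init, iters=10):
    centroids = list(init)
    clusters = [[], []]
    for _ in range(iters):
        clusters = [[], []]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[d.index(min(d))].append(p)
        centroids = [tuple(sum(x) / len(cl) for x in zip(*cl)) for cl in clusters]
    return frozenset(frozenset(cl) for cl in clusters)

# try every possible pair of starting centroids
partitions = {kmeans_partition(init) for init in combinations(points, 2)}
print(len(partitions))  # several distinct stable clusterings of the same data
```

None of these clusterings is the uniquely "correct" one; as with topic models, the variation is a known property of the method, and the interpretive question is what the stable patterns across runs reveal.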

Finally, “The Computational Case” performs some crucial misdirection at the outset by implying that cultural analytics is based purely on linguistic evidence and mainly diction. It is true that diction can reveal a great deal, but this is a misleading account of contemporary trends. Quantitative approaches are making waves partly because researchers have learned to extract social relations from literature and partly because they pair language with external social testimony—for instance the judgments of reviewers.5 Some articles, like my own on narrative pace, use numbers entirely to describe the interpretations of human readers.6 Once again, Da’s polemical strategy is to isolate one strand in a braid, and critique it as if it were the whole.

A more inquisitive approach to cultural analytics might have revealed that it is not a monolith but an unfolding debate between several projects that frequently criticize each other. Katherine Bode, for instance, has critiqued other researchers’ data (including mine), in an exemplary argument that starts by precisely describing different approaches to historical representation.7 Da could have made a similarly productive intervention—explaining, for instance, how researchers should report uncertainty in exploratory analysis. Her essay falls short of that achievement because a rush to condemn as many examples as possible has prevented it from taking time to describe and genuinely understand its objects of critique.

TED UNDERWOOD is professor of information sciences and English at the University of Illinois, Urbana-Champaign. He has published in venues ranging from PMLA to the IEEE International Conference on Big Data and is the author most recently of Distant Horizons: Digital Evidence and Literary Change (2019).

1. Andrew Goldstone, “Of Literary Standards and Logistic Regression: A Reproduction,” January 4, 2016, https://andrewgoldstone.com/blog/2016/01/04/standards/; Jonathan Goodwin, “Darko Suvin’s Genres of Victorian SF Revisited,” October 17, 2016, https://jgoodwin.net/blog/more-suvin/.

2. Ted Underwood, “Can We Date Revolutions in the History of Literature and Music?”, The Stone and the Shell, October 3, 2015, https://tedunderwood.com/2015/10/03/can-we-date-revolutions-in-the-history-of-literature-and-music/; Ted Underwood, Hoyt Long, Richard Jean So, and Yuancheng Zhu, “You Say You Found a Revolution,” The Stone and the Shell, February 7, 2016, https://tedunderwood.com/2016/02/07/you-say-you-found-a-revolution/.

3. Nan Z. Da, “The Computational Case against Computational Literary Studies,” Critical Inquiry 45 (Spring 2019): 601-39.

4. Ted Underwood, “The Life Cycles of Genres,” Journal of Cultural Analytics, May 23, 2016, http://culturalanalytics.org/2016/05/the-life-cycles-of-genres/.

5. Eve Kraicer and Andrew Piper, “Social Characters: The Hierarchy of Gender in Contemporary English-Language Fiction,” Journal of Cultural Analytics, January 30, 2019, http://culturalanalytics.org/2019/01/social-characters-the-hierarchy-of-gender-in-contemporary-english-language-fiction/.

6. Ted Underwood, “Why Literary Time Is Measured in Minutes,” ELH 85.2 (2018): 341-65.

7. Katherine Bode, “The Equivalence of ‘Close’ and ‘Distant’ Reading; or, Toward a New Object for Data-Rich Literary History,” MLQ 78.1 (2017): 77-106.


DAY 2 RESPONSES 


Argument

Nan Z Da

First, a qualification. Due to the time constraints of this forum, I can only address a portion of the issues raised by the forum participants and in ways still imprecise. I do plan to issue an additional response that addresses the more fine-grained technical issues.

“The Computational Case against Computational Literary Studies” was not written for the purposes of refining CLS. The paper does not simply call for “more rigor” or for replicability across the board. It is not about figuring out which statistical mode of inquiry best suits computational literary analysis. It is not a method paper; as some of my respondents point out, those are widely available.

The article was written to empower literary scholars and editors to ask logical questions about computational and quantitative literary criticism should they suspect a conceptual mismatch between the result and the argument or perceive the literary-critical payoff to be extraordinarily low.

The paper, I hope, teaches us to recognize two types of CLS work. First, there is statistically rigorous work that cannot actually answer the question it sets out to answer or doesn’t ask an interesting question at all. Second, there is work that seems to deliver interesting results but is either nonrobust or logically confused. The confusion sometimes issues from something like user error, but it is more often the result of the suboptimal or unnecessary use of statistical and other machine-learning tools. The paper was an attempt to demystify the application of those tools to literary corpora and to explain why technical errors are amplified when your goal is literary interpretation or description.

My article is the culmination of a long investigation into whether computational methods and their modes of quantitative analyses can have purchase in literary studies. My answer is that what drives quantitative results and data patterns often has little to do with the literary critical or literary historical claims being made by scholars who claim to be finding such results and uncovering such patterns—though it sometimes looks like it. If the conclusions we find in CLS corroborate or disprove existing knowledge, this is not a sign that they are correct but that they are tautological at best, merely superficial at worst.

The article is agnostic on what literary criticism ought to be and makes no prescriptions about interpretive habits. The charge that it takes a “purist” position is pure projection. The article aims to describe what scholarship ought not to be. Even the appeal to reading books in the last pages of the article does not presume the inherent meaningfulness of “actually reading” but only serves as a rebuttal to the use of tools that wish to do simple classifications for which human decision would be immeasurably more accurate and much less expensive.

As to the question of Exploratory Data Analysis versus Confirmatory Data Analysis: I don’t prioritize one over the other. If numbers and their interpretation are involved, then statistics has to come into play; I don’t know any way around this. If you wish to simply describe your data, then you have to show something interesting that derives from measurements that are nonreductive. As to the appeal to exploratory tools: if your tool will never be able to explore the problem in question, because it lacks power or is overfitted to its object, your exploratory tool is not needed.

It seems unobjectionable that quantitative methods and nonquantitative methods might work in tandem. My paper is simply saying: that may be true in theory, but it falls short in practice. Andrew Piper points us to the problem of generalization, of how to move from local to global, probative to illustrative. This is precisely the gap my article interrogates, because that’s where the collaborative ideal begins to break down. One may call the forcible closing of that gap any number of things—a new hermeneutics, epistemology, or modality—but in the end, the logic has to be clear.

My critics are right to point out a bind. The bind is theirs, however, not mine. My point is also that, going forward, it is not for me or a very small group of people to decide what the value of this work is, nor how it should be done.

Ed Finn accuses me of subjecting CLS to a double standard: “Nobody is calling in economists to assess the validity of Marxist literary analysis, or cognitive psychologists to check applications of affect theory, and it’s hard to imagine that scholars would accept the disciplinary authority of those critics.”

This is faulty reasoning. For one thing, literary scholars ask for advice and assessment from scholars in other fields all the time. For another, the payoff of the psychoanalytic reading, even as it seeks extraliterary meaning and validity, is not for psychology but for literary-critical meaning, where it succeeds or fails on its own terms. CLS wants to say, “it’s okay that there isn’t much payoff in our work itself as literary criticism, whether at the level of prose or sophistication of insight; the payoff is in the use of these methods, the description of data, the generation of a predictive model, or the ability for someone else in the future to ask (maybe better) questions. The payoff is in the building of labs, the funding of students, the founding of new journals, the cases made for tenure lines and postdoctoral fellowships and staggeringly large grants.” When these are the claims, more than one discipline needs to be called in to evaluate the methods, their applications, and their results. Because printed critique of certain literary scholarship is generally not refuted by pointing to things still in the wings, we are dealing with two different scholarly models. In this situation, then, we should be maximally cross-disciplinary.

NAN Z. DA teaches literature at the University of Notre Dame.


Errors

Nan Z. Da

This first of two responses addresses errors, real and imputed; the second response is the more substantive.

1. There is a significant mistake in footnote 39 (p. 622) of my paper. In it I attribute to Hugh Craig and Arthur F. Kinney the argument that Marlowe wrote parts of some late Shakespeare plays after his (Marlowe’s) death. The attribution is incorrect. What Craig asks in “The Three Parts of Henry VI” (pp. 40-77) is whether Marlowe wrote segments of these plays. I would like to extend my sincere apologies to Craig and to the readers of this essay for the misapprehension that it caused.

2. The statement “After all, statistics automatically assumes” (p. 608) is incorrect. A more correct statement would be: In standard hypothesis testing a 95 percent confidence level means that, when the null is true, you will correctly fail to reject 95 percent of the time.

3. The description of various applications of text-mining/machine-learning (p. 620) as “ethically neutral” is not worded carefully enough. I obviously do not believe that some of these applications, such as tracking terrorists using algorithms, are ethically neutral. I meant that there are myriad applications of these tools: for good, ill, and otherwise. On balance it’s hard to assign an ideological position to them.

4. Ted Underwood is correct that, in my discussion of his article on “The Life Cycles of Genres,” I confused the “ghastly stew” with the randomized control sets used in his predictive modeling. Underwood also does not make the elementary statistical mistake I suggest he has made in my article (“Underwood should train his model on pre-1941” [p. 608]).

As to the charge of misrepresentation: paraphrasing a paper whose “single central thesis … is that the things we call ‘genres’ may be entities of different kinds, with different life cycles and degrees of textual coherence” is difficult. Underwood’s thesis here refers to the relative coherence of detective fiction, gothic, and science fiction over time, with 1930 as the cutoff point.

The other things I say about the paper remain true. The paper cites various literary scholars’ definitions of genre change, but its implicit definition of genre is “consistency over time of 10,000 frequently used terms.” It cannot “reject Franco Moretti’s conjecture that genres have generational cycles” (a conjecture that most would already find too reductive) because it is not using the same testable definition of genre or change.

5. Topic Modeling: my point isn’t that topic models are non-replicable but that, in this particular application, they are non-robust. Among other evidence: if I remove one document out of one hundred, the topics change. That’s a problem.

6. As far as Long and So’s essay “Turbulent Flow” goes, I need a bit more time than this format allows to rerun the alternatives responsibly. So and Long have built a tool in which there are thirteen features for predicting the difference between two genres—Stream of Consciousness and Realism. They say: most of these features are not very predictive alone but together become very predictive, with that power being concentrated in just one feature. I show that that one feature isn’t robust. To revise their puzzling metaphor: it’s as if someone claims that a piano plays beautifully and that most of that sound comes from one key. I play that key; it doesn’t work.

7. So and Long argue that by proving that their classifier misclassifies nonhaikus—not only using English translations of Chinese poetry, as they suggest, but also Japanese poetry that existed long before the haiku—I’ve made a “misguided decision that smacks of Orientalism. . . . It completely erases context and history, suggesting an ontological relation where there is none.” This is worth getting straight. Their classifier lacks power because it can only classify haikus with reference to poems quite different from haikus; to be clear, it will classify equally short texts with overlapping keywords close to haikus as haikus. Overlapping keywords is their predictive feature, not mine. I’m not sure how pointing this out is Orientalist. As for their model, I would if pushed say it is only slightly Orientalist, if not determinatively so.

8. Long and So claim that my “numbers cannot be trusted,” that my “critique . . . is rife with technical and factual errors”; in a similar vein it ends with the assertion that my essay doesn’t “encourag[e] much trust.”  I’ll admit to making some errors in this article, though not in my analyses of Long and So’s papers (the errors mostly occur in section 3). I hope to list all of these errors in the more formal response that appears in print or else in an online appendix. That said, an error is not the same as a specious insinuation that the invalidation of someone’s model indicates Orientalism, pigheadedness, and so on. Nor is an error the same as the claim that “CI asked Da to widen her critique to include female scholars and she declined” recently made by So, which is not an error but a falsehood.

NAN Z. DA teaches literature at the University of Notre Dame.


Katherine Bode

The opening statements were fairly critical of Da’s article, less so of CLS. To balance the scales, I want to suggest that Da’s idiosyncratic definition of CLS is partly a product of problematic divisions within digital literary studies.

Da omits what I’d call digital literary scholarship: philological, curatorial, and media archaeological approaches to digital collections and data. Researchers who pursue these approaches, far from reducing all digit(al)ized literature(s) to word counts, maintain––like Da––that analyses based purely or predominantly on such features tend to produce “conceptual fallacies from a literary, historical, or cultural-critical perspective” (p. 604). Omitting such research is part of the way in which Da operationalizes her critique of CLS: defining the field as research that focuses on word counts, then criticizing the field as limited because focused on word counts.

But Da’s perspective is mirrored by many of the researchers she cites. Ted Underwood, for instance, describes “otiose debates about corpus construction” as “well-intentioned red herrings” that detract attention from the proper focus of digital literary studies on statistical methods and inferences.[1] Da has been criticized for propagating a male-dominated version of CLS. But those who pursue the methods she criticizes are mostly men. By contrast, much digital literary scholarship is conducted by women and/or focused on marginalized literatures, peoples, or cultures. The tendency in CLS to privilege data modeling and analysis––and to minimize or dismiss the work of data construction and curation––is part of the culture that creates the male dominance of that field.

More broadly, both the focus on statistical modelling of word frequencies in found datasets, and the prominence accorded to such research in our discipline, put literary studies out of step with digital research in other humanities fields. In digital history, for instance, researchers collaborate to construct rich datasets––for instance, of court proceedings (as in The Proceedings of the Old Bailey)[2] or social complexity (as reported in a recent Nature article)[3]––that can be used by multiple researchers, including for noncomputational analyses. Where such research is statistical, the methods are often simpler than machine learning models (for instance, trends over time; measures of relationships between select variables) because the questions are explicitly related to scale and the aggregation of well-defined scholarly phenomena, not to epistemologically novel patterns discerned among thousands of variables.

Some things I want to know: Why is literary studies so hung up on (whether in favor of, or opposed to) this individualistic, masculinist mode of statistical criticism? Why is this focus allowed to marginalize earlier, and inhibit the development of new, large-scale, collaborative environments for both computational and noncomputational literary research? Why, in a field that is supposedly so attuned to identity and inequality, do we accept––and foreground––digital research that relies on platforms (Google Books, HathiTrust, EEBO, and others) that privilege dominant literatures and literary cultures? What would it take to bridge the scholarly and critical––the curatorial and statistical––dimensions of (digital) literary studies and what alternative, shared futures for our discipline could result?

KATHERINE BODE is associate professor of literary and textual studies at the Australian National University. Her latest book, A World of Fiction: Digital Collections and the Future of Literary History (2018), offers a new approach to literary research with mass-digitized collections, based on the theory and technology of the scholarly edition. Applying this model, Bode investigates a transnational collection of around 10,000 novels and novellas, discovered in digitized nineteenth-century Australian newspapers, to offer new insights into phenomena ranging from literary anonymity and fiction syndication to the emergence and intersections of national literary traditions.

[1]Ted Underwood, Distant Horizons: Digital Evidence and Literary Change (Chicago: University of Chicago Press, 2019), pp. 176, 180.

[2]Tim Hitchcock, Robert Shoemaker, Clive Emsley, Sharon Howard, and Jamie McLaughlin, et al., The Proceedings of the Old Bailey (http://www.oldbaileyonline.org, version 8.0, March 2018).

[3]Harvey Whitehouse, Pieter François, Patrick E. Savage, Thomas E. Currie, Kevin C. Feeney, Enrico Cioni, Rosalind Purcell, et al., “Complex Societies Precede Moralizing Gods Throughout World History,” Nature 568 (20 March 2019): 226–229.


Ted Underwood

More could be said about specific claims in “The Computational Case.” But frankly, this forum isn’t happening because literary critics were persuaded by (or repelled by) Da’s statistical arguments. The forum was planned before publication because the essay’s general strategy was expected to make waves. Social media fanfare at the roll-out made clear that rumors of a “field-killing” project had been circulating for months among scholars who might not yet have read the text but were already eager to believe that Da had found a way to hoist cultural analytics by its own petard—the irrefutable authority of mathematics.

That excitement is probably something we should be discussing. Da’s essay doesn’t actually reveal much about current trends in cultural analytics. But the excitement preceding its release does reveal what people fear about this field—and perhaps suggest how breaches could be healed.

While it is undeniably interesting to hear that colleagues have been anticipating your demise, I don’t take the rumored plans for field-murder literally. For one thing, there’s no motive: literary scholars have little to gain by eliminating other subfields. Even if quantitative work had cornered a large slice of grant funding in literary studies (which it hasn’t), the total sum of all grants in the discipline is too small to create a consequential zero-sum game.

The real currency of literary studies is not grant funding but attention, so I interpret excitement about “The Computational Case” mostly as a sign that a large group of scholars have felt left out of an important conversation. Da’s essay itself describes this frustration, if read suspiciously (and yes, I still do that). Scholars who tried to critique cultural analytics in a purely external way seem to have felt forced into an unrewarding posture—“after all, who would not want to appear reasonable, forward-looking, open-minded?” (p. 603). What was needed instead was a champion willing to venture into quantitative territory and borrow some of that forward-looking buzz.

Da was courageous enough to try, and I think the effects of her venture are likely to be positive for everyone. Literary scholars will see that engaging quantitative arguments quantitatively isn’t all that hard and does produce buzz. Other scholars will follow Da across the qualitative/quantitative divide, and the illusory sharpness of the field boundary will fade.

Da’s own argument remains limited by its assumption that statistics is an alien world, where humanistic guidelines like “acknowledge context” are replaced by rigid hypothesis-testing protocols. But the colleagues who follow her will recognize, I hope, that statistical reasoning is an extension of ordinary human activities like exploration and debate. Humanistic principles still apply here. Quantitative models can test theories, but they are also guided by theory, and they shouldn’t pretend to answer questions more precisely than our theories can frame them. In short, I am glad Da wrote “The Computational Case” because her argument has ended up demonstrating—as a social gesture—what its text denied: that questions about mathematical modeling are continuous with debates about interpretive theory.

TED UNDERWOOD is professor of information sciences and English at the University of Illinois, Urbana-Champaign. He has published in venues ranging from PMLA to the IEEE International Conference on Big Data and is the author most recently of Distant Horizons: Digital Evidence and Literary Change (2019).


DAY 3 RESPONSES


Lauren F. Klein

The knowledge that there are many important voices not represented in this forum has prompted me to think harder about the context for the lines I quoted at the outset of my previous remarks. Parham’s own model for “The New Rigor” comes from diversity work, and the multiple forms of labor—affective as much as intellectual—that are required of individuals, almost always women and people of color, in order to compensate for the structural deficiencies of the university. I should have provided that context at the outset, both to do justice to Parham’s original formulation, and because the same structural deficiencies are at work in this forum, as they are in the field of DH overall.

In her most recent response, Katherine Bode posed a series of crucial questions about why literary studies remains fixated on the “individualistic, masculinist mode of statistical criticism” that characterizes much of the work that Da takes on in her essay. Bode further asks why the field of literary studies has allowed this focus to overshadow so much of the transformative work that has been pursued alongside—and, at times, in direct support of––this particular form of computational literary studies.

But I think we also know the answers, and they point back to the same structural deficiencies that Parham explores in her essay: a university structure that rewards certain forms of work and devalues others. In a general academic context, we might point to mentorship, advising, and community-building as clear examples of this devalued work. But in the context of the work discussed in this forum, we can align efforts to recover overlooked texts, compile new datasets, and preserve fragile archives, with the undervalued side of this equation as well. It’s not only that these forms of scholarship, like the “service” work described just above, are performed disproportionately by women and people of color. It is also that, because of the ways in which archives and canons are constructed, projects that focus on women and people of color require many more of these generous and generative scholarly acts. Without these acts, and the scholars who perform them, much of the formally-published work on these subjects could not begin to exist.

Consider Kenton Rambsy’s “Black Short Story Dataset,” a dataset creation effort that he undertook because his own research questions about the changing composition of African American fiction anthologies could not be answered by any existing corpus; Margaret Galvan’s project to create an archive of comics in social movements, which she has undertaken in order to support her own computational work as well as her students’ learning; or any number of the projects published with Small Axe Archipelagos, a born-digital journal edited and produced by a team of librarians and faculty that has been intentionally designed to be read by people who live in the Caribbean as well as for scholars who work on that region. These projects each involve sophisticated computational thinking—at the level of resource creation and platform development as well as of analytical method. They respond both to specific research questions and to larger scholarly need. They require work, and they require time.

It’s clear that these projects provide significant value to the field of literary studies, as they do to the digital humanities and to the communities to which their work is addressed. In the end, the absence of the voices of the scholars who lead these projects, both from this forum and from the scholarship it explores, offers the most convincing evidence of what—and who—is valued most by existing university structures; and what work—and what people—should be at the center of conversations to come.

LAUREN F. KLEIN is associate professor at the School of Literature, Media, and Communication, Georgia Institute of Technology.


Katherine Bode

Da’s is the first article (I’m aware of) to offer a statistical rejection of statistical approaches to literature. The exaggerated ideological agenda of earlier criticisms, which described the use of numbers or computers to analyze literature as neoliberal, neoimperialist, neoconservative, and more, made them easy to dismiss. Yet to some extent, this routinized dismissal instituted a binary in CLS, wherein numbers, statistics, and computers became distinct from ideology. If nothing else, this debate will hopefully demonstrate that no arguments––including statistical ones––are ideologically (or ethically) neutral.

But this realization doesn’t get us very far. If all arguments have ideological and ethical dimensions, then making and assessing them requires something more than proving their in/accuracy; more than establishing their reproducibility, replicability, or lack thereof. Da’s “Argument” response seemed to move us toward what is needed in describing the aim of her article as: “to empower literary scholars and editors to ask logical questions about computational and quantitative literary criticism should they suspect a conceptual mismatch between the result and the argument or perceive the literary-critical payoff to be extraordinarily low.” However, she closes that path down by allowing only one possible answer to such questions: “in practice” there can be no “payoff … [in terms of] literary-critical meaning, from these methods”; CLS “conclusions”––whether “corroborat[ing] or disprov[ing] existing knowledge”––are only ever “tautological at best, merely superficial at worse.”

Risking blatant self-promotion, I’d say I’ve often used quantification to show “something interesting that derives from measurements that are nonreductive.” For instance, A World of Fiction challenges the prevailing view that nineteenth-century Australian fiction replicates the legal lie of terra nullius by not representing Aboriginal characters, by establishing the widespread presence of such characters in that fiction; and contrary to the perception of the Australian colonies as separate literary cultures oriented toward their metropolitan centers, it demonstrates the existence of a largely separate, strongly interlinked, provincial literary culture.[1] To give just one other example from many possibilities, Ted Underwood’s “Why Literary Time is Measured in Minutes” uses hand-coded samples from three centuries of literature to indicate an acceleration in the pace of fiction.[2] Running the gamut from counting to predictive modelling, these arguments are all statistical, according to Da’s definition: “if numbers and their interpretation are involved, then statistics has come into play.” And as in this definition, they don’t stop with numerical results, but explore their literary critical and historical implications.

If what happens prior to arriving at a statistical finding cannot be justified, the argument is worthless; the same is true if what happens after that point is of no literary-critical interest. Ethical considerations are essential in justifying what is studied, why, and how. This is not––and should not be––a low bar. I’d hoped this forum would help build connections between literary and statistical ways of knowing. The idea that quantification and computation can only yield superficial or tautological literary arguments shows that we’re just replaying the same old arguments, even if both sides are now making them in statistical terms.

KATHERINE BODE is associate professor of literary and textual studies at the Australian National University. Her latest book, A World of Fiction: Digital Collections and the Future of Literary History (2018), offers a new approach to literary research with mass-digitized collections, based on the theory and technology of the scholarly edition. Applying this model, Bode investigates a transnational collection of around 10,000 novels and novellas, discovered in digitized nineteenth-century Australian newspapers, to offer new insights into phenomena ranging from literary anonymity and fiction syndication to the emergence and intersections of national literary traditions.

[1]Katherine Bode, A World of Fiction: Digital Collections and the Future of Literary History (Ann Arbor: University of Michigan Press, 2018).

[2]Ted Underwood, “Why Literary Time is Measured in Minutes,” ELH 85.2 (2018): 341–365.


Mark Algee-Hewitt

In 2010, as a new postdoctoral fellow, I presented a paper on James Thomson’s 1730 poem The Seasons to a group of senior scholars. The argument was modest: I used close readings to suggest that in each section of the poem Thomson simulated an aesthetic experience for his readers before teaching them how to interpret it. The response was mild and mostly positive. Six months later, having gained slightly more confidence, I presented the same project with a twist: I included a graph that revealed my readings to be based on a pattern of repeated discourse throughout the poem. The response was swift and polarizing: while some in the room thought that the quantitative methods deepened the argument, others argued strongly that I was undermining the whole field. For me, the experience was formative: the simple presence of numbers was enough to enrage scholars many years my senior, long before Digital Humanities gained any prestige, funding, or institutional support.

My experience suggests that this project passed what Da calls the “smell test”: the critical results remained valid, even without the supporting apparatus of the quantitative analysis. And while Da might argue that this proves that the quantitative aspect of the project was unnecessary in the first place, I would respectfully disagree. The pattern I found was the basis for my reading, and to present it as if I had discovered it through reading alone would have been, at best, disingenuous. The quantitative aspect of my argument also allowed me to connect the poem to a larger pattern of poetics throughout the eighteenth century. And I would go further to contend that just as the introduction of quantification into a field changes the field, so too does the field change the method to suit its own ends; and that confirming a statistical result through its agreement with conclusions derived from literary historical methods is just as powerful as a null hypothesis test. In other words, Da’s “smell test” suggests a potential way forward in synthesizing these methods.

But the lesson I learned remains as powerful as ever: regardless of how they are embedded in research, regardless of who uses them, computational methods provoke an immediate, often negative, response in many humanities scholars. And it is worth asking why. Just as it is always worth reexamining the institutional, political, and gendered history of methods such as new historicism, formalism, and even close reading, so too is it important, as Katherine Bode suggests, to think through these same issues in Digital Humanities as a whole. And it is crucial that we do so without erasing the work of the new, emerging, and often structurally vulnerable members of the field that Lauren Klein highlights. These methods have a powerful appeal among emerging groups of students and young scholars. And to seek to shut down scholarship by asserting a blanket incompatibility between method and object is to do a disservice to the fascinating work of emerging scholars that is reshaping our critical practices and our understanding of literature.

MARK ALGEE-HEWITT is an assistant professor of English and Digital Humanities at Stanford University where he directs the Stanford Literary Lab. His current work combines computational methods with literary criticism to explore large scale changes in aesthetic concepts during the eighteenth and nineteenth centuries. The projects that he leads at the Literary Lab include a study of racialized language in nineteenth-century American literature and a computational analysis of differences in disciplinary style. Mark’s work has appeared in New Literary History, Digital Scholarship in the Humanities, as well as in edited volumes on the Enlightenment and the Digital Humanities.


COMMENTS

  1. #academictwitter is very interested in the thought process of inviting Stanley Fish to respond. (He’s only tangentially related to this field, and his screeds against an umbrella concept of Digital Humanities are very well known.) The only reason many of us can think of is clickbait. If there’s a valid one, we’d like to know.

    • Would you tell #academictwitter that Stanley Fish is the author of the following two essays? They document an earlier debate that is directly (not tangentially) related to the present debate. Presumably that has something to do with the mysterious invitation.

      Fish, Stanley E. “What Is Stylistics and Why Are They Saying Such Terrible Things about It? — Part I.” Approaches to Poetics: Selected Papers from the English Institute, edited by Seymour Chatman, Columbia University Press, 1973, pp. 109–52.

      —. “What Is Stylistics and Why Are They Saying Such Terrible Things about It? — Part II.” Boundary 2, vol. 8, no. 1, 1979, pp. 129–46.

      Digital humanities enthusiasts never seem to object to the clickbait that benefits them!

    • Critical Inquiry asked Stanley Fish to participate in this forum for several reasons. First, he has written public commentaries about the digital humanities, including widely read pieces for the New York Times and the Chronicle of Higher Education. In our experience, he is also effective at synthesizing extensive discussions such as this one (certainly, he will not offer the only synthesis but it may be one that is valuable to a number of readers). Additionally, since nearly all of the ten other participants whom we invited are core digital humanities practitioners and supporters of the field, this was an attempt at maintaining a modicum of balance and plurality of perspectives. That is, we sought to include others, besides Nan Da, who have offered public criticism of the digital humanities in the past. Finally, Stanley Fish is a member of the extended Critical Inquiry editorial board. As such, we believe he is a scholar who can offer a bridge between the journal’s broader readership (which includes many scholars who are interested in timely developments and debates across humanistic subfields other than their own) and the more focused discussion taking place among specialists who work within computational literary studies.

      We do not see any of the respondents who are participating in this forum, whether they are specialists or generalists, as “clickbait.” We think that each one of these contributors is thoughtful and accomplished enough in their scholarship that they might attract readers to engage with this timely debate and respond with their own comments.

  2. maurizio lana

    if it is true that (quoting A. Piper, http://culturalanalytics.org/2019/04/do-we-know-what-we-are-doing/)
    “… despite Da’s aims of testing the reproducibility of computational literary research, her work fails to follow any of the procedures and practices established by replication projects like the OSC. Indeed, her article cites no relevant literature on the issue, suggesting little familiarity with the topic. [the list of invalid methods continues]”,
    why are so many people devoting so much time and effort to countering Da’s *invalid* arguments?

    • Because her text appeared in a prestigious journal, and its mix of detailed analysis of specific papers and general arguments could look convincing to an outsider – and the whole discussion is about the reputation of these people and their methods in the humanities in general.

  3. David Thieme

    Nan Z. Da succeeds here in her aim “to empower literary scholars and editors to ask logical questions about computational and quantitative literary criticism.” Da rightly interrogates the process by which statistical inference moves to literary model. Bracketing off the accuracy of Da’s specific critiques––of which I share some (but not all) of the reservations of the respondents––Da accurately points out how much statistical “wobble” can arise both in the choice of dataset and in the algorithmic tools by which that dataset is evaluated. Other professions that work close to the divide between human and machine (e.g., economics, medicine) have, to a certain degree, standardized this methodological move between statistics and inference. These fields, through long years of struggles and mistakes, have come to some degree of intuition as to when one can accept statistical results prima facie and when one needs to view results with some degree of skepticism. Literary DH has not.

    What I do wish was emphasized more in Da’s essay is the fact that it is entirely feasible (and perhaps desirable) to dissociate CLS from “literary DH” as a whole. CLS is preoccupied with statistics. In this humanist’s opinion, the interesting problems in literary DH are orthogonal to statistics. Alexa and Siri and countless other algorithms “read” us constantly, and they do so via processes that are not reducible to statistics. It probably does not come as a shock that these algorithms will continue to evolve without regard to this productive debate. But regardless of (and perhaps precisely because of) this indifference within the technical community, literary DH needs to come to terms with what “reading” means to these algorithms. Framing a DH argument about genre theory in a statistically significant manner is interesting. But when we are being interpreted by machines, it seems like people with expertise in interpretation and computation should be focused elsewhere.

    • bbenzon

      “Alexa and Siri and countless other algorithms “read” us constantly, and they do so via processes that are not reducible to statistics.”

      Entirely reducible to statistics? No. But both Alexa and Siri are pervaded by a variety of statistical processes, certainly in speech recognition, but in other components as well.

