Computational Literary Studies: Participant Forum Responses, Day 3

 

Lauren F. Klein

The knowledge that there are many important voices not represented in this forum has prompted me to think harder about the context for the lines I quoted at the outset of my previous remarks. Parham’s own model for “The New Rigor” comes from diversity work, and the multiple forms of labor—affective as much as intellectual—that are required of individuals, almost always women and people of color, in order to compensate for the structural deficiencies of the university. I should have provided that context at the outset, both to do justice to Parham’s original formulation, and because the same structural deficiencies are at work in this forum, as they are in the field of DH overall.

In her most recent response, Katherine Bode posed a series of crucial questions about why literary studies remains fixated on the “individualistic, masculinist mode of statistical criticism” that characterizes much of the work that Da takes on in her essay. Bode further asks why the field of literary studies has allowed this focus to overshadow so much of the transformative work that has been pursued alongside—and, at times, in direct support of—this particular form of computational literary studies.

But I think we also know the answers, and they point back to the same structural deficiencies that Parham explores in her essay: a university structure that rewards certain forms of work and devalues others. In a general academic context, we might point to mentorship, advising, and community-building as clear examples of this devalued work. But in the context of the work discussed in this forum, we can align efforts to recover overlooked texts, compile new datasets, and preserve fragile archives with the undervalued side of this equation as well. It’s not only that these forms of scholarship, like the “service” work described just above, are performed disproportionately by women and people of color. It is also that, because of the ways in which archives and canons are constructed, projects that focus on women and people of color require many more of these generous and generative scholarly acts. Without these acts, and the scholars who perform them, much of the formally published work on these subjects could not begin to exist.

Consider Kenton Rambsy’s “Black Short Story Dataset,” a dataset creation effort that he undertook because his own research questions about the changing composition of African American fiction anthologies could not be answered by any existing corpus; Margaret Galvan’s project to create an archive of comics in social movements, which she has undertaken in order to support her own computational work as well as her students’ learning; or any number of the projects published with Small Axe Archipelagos, a born-digital journal edited and produced by a team of librarians and faculty that has been intentionally designed to be read by people who live in the Caribbean as well as by scholars who work on that region. These projects each involve sophisticated computational thinking—at the level of resource creation and platform development as well as of analytical method. They respond both to specific research questions and to larger scholarly needs. They require work, and they require time.

It’s clear that these projects provide significant value to the field of literary studies, as they do to the digital humanities and to the communities to which their work is addressed. In the end, the absence of the voices of the scholars who lead these projects, both from this forum and from the scholarship it explores, offers the most convincing evidence of what—and who—is valued most by existing university structures; and what work—and what people—should be at the center of conversations to come.

LAUREN F. KLEIN is associate professor at the School of Literature, Media, and Communication, Georgia Institute of Technology.


Computational Literary Studies: Participant Forum Responses, Day 2

 

Ted Underwood

More could be said about specific claims in “The Computational Case.” But frankly, this forum isn’t happening because literary critics were persuaded by (or repelled by) Da’s statistical arguments. The forum was planned before publication because the essay’s general strategy was expected to make waves. Social media fanfare at the roll-out made clear that rumors of a “field-killing” project had been circulating for months among scholars who might not yet have read the text but were already eager to believe that Da had found a way to hoist cultural analytics by its own petard—the irrefutable authority of mathematics.

That excitement is probably something we should be discussing. Da’s essay doesn’t actually reveal much about current trends in cultural analytics. But the excitement preceding its release does reveal what people fear about this field—and perhaps suggest how breaches could be healed.

While it is undeniably interesting to hear that colleagues have been anticipating your demise, I don’t take the rumored plans for field-murder literally. For one thing, there’s no motive: literary scholars have little to gain by eliminating other subfields. Even if quantitative work had cornered a large slice of grant funding in literary studies (which it hasn’t), the total sum of all grants in the discipline is too small to create a consequential zero-sum game.

The real currency of literary studies is not grant funding but attention, so I interpret excitement about “The Computational Case” mostly as a sign that a large group of scholars have felt left out of an important conversation. Da’s essay itself describes this frustration, if read suspiciously (and yes, I still do that). Scholars who tried to critique cultural analytics in a purely external way seem to have felt forced into an unrewarding posture—“after all, who would not want to appear reasonable, forward-looking, open-minded?” (p. 603). What was needed instead was a champion willing to venture into quantitative territory and borrow some of that forward-looking buzz.

Da was courageous enough to try, and I think the effects of her venture are likely to be positive for everyone. Literary scholars will see that engaging quantitative arguments quantitatively isn’t all that hard and does produce buzz. Other scholars will follow Da across the qualitative/quantitative divide, and the illusory sharpness of the field boundary will fade.

Da’s own argument remains limited by its assumption that statistics is an alien world, where humanistic guidelines like “acknowledge context” are replaced by rigid hypothesis-testing protocols. But the colleagues who follow her will recognize, I hope, that statistical reasoning is an extension of ordinary human activities like exploration and debate. Humanistic principles still apply here. Quantitative models can test theories, but they are also guided by theory, and they shouldn’t pretend to answer questions more precisely than our theories can frame them. In short, I am glad Da wrote “The Computational Case” because her argument has ended up demonstrating—as a social gesture—what its text denied: that questions about mathematical modeling are continuous with debates about interpretive theory.

TED UNDERWOOD is professor of information sciences and English at the University of Illinois, Urbana-Champaign. He has published in venues ranging from PMLA to the IEEE International Conference on Big Data and is the author most recently of Distant Horizons: Digital Evidence and Literary Change (2019).


Computational Literary Studies: Participant Forum Responses, Day 2

 

Katherine Bode

The opening statements were fairly critical of Da’s article, less so of CLS. To balance the scales, I want to suggest that Da’s idiosyncratic definition of CLS is partly a product of problematic divisions within digital literary studies.

Da omits what I’d call digital literary scholarship: philological, curatorial, and media archaeological approaches to digital collections and data. Researchers who pursue these approaches, far from reducing all digit(al)ized literature(s) to word counts, maintain—like Da—that analyses based purely or predominantly on such features tend to produce “conceptual fallacies from a literary, historical, or cultural-critical perspective” (p. 604). Omitting such research is part of the way in which Da operationalizes her critique of CLS: defining the field as research that focuses on word counts, then criticizing the field as limited because it focuses on word counts.

But Da’s perspective is mirrored by many of the researchers she cites. Ted Underwood, for instance, describes “otiose debates about corpus construction” as “well-intentioned red herrings” that detract attention from the proper focus of digital literary studies on statistical methods and inferences.[1] Da has been criticized for propagating a male-dominated version of CLS. But those who pursue the methods she criticizes are mostly men. By contrast, much digital literary scholarship is conducted by women and/or focused on marginalized literatures, peoples, or cultures. The tendency in CLS to privilege data modeling and analysis––and to minimize or dismiss the work of data construction and curation––is part of the culture that creates the male dominance of that field.

More broadly, both the focus on statistical modelling of word frequencies in found datasets and the prominence accorded to such research in our discipline put literary studies out of step with digital research in other humanities fields. In digital history, for instance, researchers collaborate to construct rich datasets—of court proceedings (as in The Proceedings of the Old Bailey)[2] or of social complexity (as reported in a recent Nature article)[3]—that can be used by multiple researchers, including for noncomputational analyses. Where such research is statistical, the methods are often simpler than machine learning models (for instance, trends over time or measures of relationships between select variables) because the questions are explicitly related to scale and the aggregation of well-defined scholarly phenomena, not to epistemologically novel patterns discerned among thousands of variables.

Some things I want to know: Why is literary studies so hung up on (whether in favor of, or opposed to) this individualistic, masculinist mode of statistical criticism? Why is this focus allowed to marginalize earlier, and inhibit the development of new, large-scale, collaborative environments for both computational and noncomputational literary research? Why, in a field that is supposedly so attuned to identity and inequality, do we accept—and foreground—digital research that relies on platforms (Google Books, HathiTrust, EEBO, and others) that privilege dominant literatures and literary cultures? What would it take to bridge the scholarly and critical—the curatorial and statistical—dimensions of (digital) literary studies, and what alternative, shared futures for our discipline could result?

KATHERINE BODE is associate professor of literary and textual studies at the Australian National University. Her latest book, A World of Fiction: Digital Collections and the Future of Literary History (2018), offers a new approach to literary research with mass-digitized collections, based on the theory and technology of the scholarly edition. Applying this model, Bode investigates a transnational collection of around 10,000 novels and novellas, discovered in digitized nineteenth-century Australian newspapers, to offer new insights into phenomena ranging from literary anonymity and fiction syndication to the emergence and intersections of national literary traditions.

[1] Ted Underwood, Distant Horizons: Digital Evidence and Literary Change (Chicago: University of Chicago Press, 2019), pp. 176, 180.

[2] Tim Hitchcock, Robert Shoemaker, Clive Emsley, Sharon Howard, and Jamie McLaughlin, et al., The Proceedings of the Old Bailey, version 8.0, March 2018, http://www.oldbaileyonline.org.

[3] Harvey Whitehouse, Pieter François, Patrick E. Savage, Thomas E. Currie, Kevin C. Feeney, Enrico Cioni, Rosalind Purcell, et al., “Complex Societies Precede Moralizing Gods Throughout World History,” Nature (March 20, 2019): 1.


Computational Literary Studies: Participant Forum Responses, Day 2

 

Argument

(This response follows Nan Da’s previous “Errors” response)

Nan Z Da

First, a qualification. Given the time constraints of this forum, I can address only a portion of the issues raised by the forum participants, and in ways that are still imprecise. I do plan to issue an additional response that addresses the more fine-grained technical issues.

“The Computational Case against Computational Literary Studies” was not written for the purposes of refining CLS. The paper does not simply call for “more rigor” or for replicability across the board. It is not about figuring out which statistical mode of inquiry best suits computational literary analysis. It is not a method paper; as some of my respondents point out, those are widely available.

The article was written to empower literary scholars and editors to ask logical questions about computational and quantitative literary criticism should they suspect a conceptual mismatch between the result and the argument or perceive the literary-critical payoff to be extraordinarily low.

The paper, I hope, teaches us to recognize two types of CLS work. First, there is statistically rigorous work that cannot actually answer the question it sets out to answer or doesn’t ask an interesting question at all. Second, there is work that seems to deliver interesting results but is either nonrobust or logically confused. The confusion sometimes issues from something like user error, but it is more often the result of the suboptimal or unnecessary use of statistical and other machine-learning tools. The paper was an attempt to demystify the application of those tools to literary corpora and to explain why technical errors are amplified when your goal is literary interpretation or description.

My article is the culmination of a long investigation into whether computational methods and their modes of quantitative analysis can have purchase in literary studies. My answer is that what drives quantitative results and data patterns often has little to do with the literary-critical or literary-historical claims being made by the scholars who claim to be finding such results and uncovering such patterns—though it sometimes looks like it. If the conclusions we find in CLS corroborate or disprove existing knowledge, this is not a sign that they are correct but that they are tautological at best and merely superficial at worst.

The article is agnostic on what literary criticism ought to be and makes no prescriptions about interpretive habits. The charge that it takes a “purist” position is pure projection. The article aims to describe what scholarship ought not to be. Even the appeal to reading books in the last pages of the article does not presume the inherent meaningfulness of “actually reading” but only serves as a rebuttal to the use of tools that wish to do simple classifications for which human decision would be immeasurably more accurate and much less expensive.

As to the question of Exploratory Data Analysis versus Confirmatory Data Analysis: I don’t prioritize one over the other. If numbers and their interpretation are involved, then statistics has to come into play; I don’t know any way around this. If you wish to simply describe your data, then you have to show something interesting that derives from measurements that are nonreductive. As to the appeal to exploratory tools: if your tool will never be able to explore the problem in question, because it lacks power or is overfitted to its object, your exploratory tool is not needed.

It seems unobjectionable that quantitative methods and nonquantitative methods might work in tandem.  My paper is simply saying: that may be true in theory but it falls short in practice. Andrew Piper points us to the problem of generalization, of how to move from local to global, probative to illustrative. This is precisely the gap my article interrogates because that’s where the collaborative ideal begins to break down. One may call the forcible closing of that gap any number of things—a new hermeneutics, epistemology, or modality—but in the end, the logic has to clear.

My critics are right to point out a bind. The bind is theirs, however, not mine. My point is also that, going forward, it is not for me or a very small group of people to decide what the value of this work is, nor how it should be done.

Ed Finn accuses me of subjecting CLS to a double standard: “Nobody is calling in economists to assess the validity of Marxist literary analysis, or cognitive psychologists to check applications of affect theory, and it’s hard to imagine that scholars would accept the disciplinary authority of those critics.”

This is faulty reasoning. For one thing, literary scholars ask for advice and assessment from scholars in other fields all the time. For another, the payoff of the psychoanalytic reading, even as it seeks extraliterary meaning and validity, is not for psychology but for literary-critical meaning, where it succeeds or fails on its own terms. CLS wants to say, “it’s okay that there isn’t much payoff in our work itself as literary criticism, whether at the level of prose or sophistication of insight; the payoff is in the use of these methods, the description of data, the generation of a predictive model, or the ability for someone else in the future to ask (maybe better) questions. The payoff is in the building of labs, the funding of students, the founding of new journals, the cases made for tenure lines and postdoctoral fellowships and staggeringly large grants.” When these are the claims, more than one discipline needs to be called in to evaluate the methods, their applications, and their results. Because printed critique of certain literary scholarship is generally not refuted by pointing to things still in the wings, we are dealing with two different scholarly models. In this situation, then, we should be maximally cross-disciplinary.

NAN Z. DA teaches literature at the University of Notre Dame.

 


Computational Literary Studies: Participant Forum Responses, Day 2

 

Errors

Nan Z. Da

This first of two responses addresses errors, real and imputed; the second response is the more substantive.

1. There is a significant mistake in footnote 39 (p. 622) of my paper. In it I attribute to Hugh Craig and Arthur F. Kinney the argument that Marlowe wrote parts of some late Shakespeare plays after his (Marlowe’s) death. The attribution is incorrect. What Craig asks in “The Three Parts of Henry VI” (pp. 40-77) is whether Marlowe wrote segments of these plays. I would like to extend my sincere apologies to Craig and to the readers of this essay for the misapprehension that it caused.

2. The statement “After all, statistics automatically assumes” (p. 608) is incorrect. A more correct statement would be: in standard hypothesis testing, a 95 percent confidence level means that, when the null hypothesis is true, the test will correctly fail to reject it 95 percent of the time.

3. The description of various applications of text-mining/machine-learning (p. 620) as “ethically neutral” is not worded carefully enough. I obviously do not believe that some of these applications, such as tracking terrorists using algorithms, are ethically neutral. I meant that there are myriad applications of these tools: for good, ill, and otherwise. On balance it’s hard to assign an ideological position to them.

4. Ted Underwood is correct that, in my discussion of his article on “The Life Cycle of Genres,” I confused the “ghastly stew” with the randomized control sets used in his predictive modeling. Underwood also does not make the elementary statistical mistake I suggest he has made in my article (“Underwood should train his model on pre-1941” [p. 608]).

As to the charge of misrepresentation: paraphrasing a paper whose “single central thesis … is that the things we call ‘genres’ may be entities of different kinds, with different life cycles and degrees of textual coherence” is difficult. Underwood’s thesis here refers to the relative coherence of detective fiction, gothic, and science fiction over time, with 1930 as the cutoff point.

The other things I say about the paper remain true. The paper cites various literary scholars’ definitions of genre change, but its implicit definition of genre is “consistency over time of 10,000 frequently used terms.” It cannot “reject Franco Moretti’s conjecture that genres have generational cycles” (a conjecture that most would already find too reductive) because it is not using the same testable definition of genre or change.

5. Topic Modeling: my point isn’t that topic models are non-replicable but that, in this particular application, they are non-robust. Among other evidence: if I remove one document out of one hundred, the topics change. That’s a problem. (A sketch of this kind of leave-one-out check appears at the end of this response.)

6. As far as Long and So’s essay “Turbulent Flow” goes, I need a bit more time than this format allows to rerun the alternatives responsibly. So and Long have built a tool in which there are thirteen features for predicting the difference between two genres—Stream of Consciousness and Realism. They say: most of these features are not very predictive alone but together become very predictive, with that power being concentrated in just one feature. I show that that one feature isn’t robust. To revise their puzzling metaphor: it’s as if someone claims that a piano plays beautifully and that most of that sound comes from one key. I play that key; it doesn’t work. (A sketch of this test likewise appears at the end of this response.)

7. So and Long argue that by proving that their classifier misclassifies nonhaikus—not only using English translations of Chinese poetry, as they suggest, but also Japanese poetry that existed long before the haiku—I’ve made a “misguided decision that smacks of Orientalism. . . . It completely erases context and history, suggesting an ontological relation where there is none.” This is worth getting straight. Their classifier lacks power because it can only classify haikus with reference to poems quite different from haikus; to be clear, it will classify equally short texts whose keywords overlap with those of haikus as haikus. Overlapping keywords is their predictive feature, not mine. I’m not sure how pointing this out is Orientalist. As for their model, I would, if pushed, say it is only slightly Orientalist, if not determinatively so.

8. Long and So claim that my “numbers cannot be trusted,” that my “critique . . . is rife with technical and factual errors”; in a similar vein, their response ends with the assertion that my essay doesn’t “encourag[e] much trust.” I’ll admit to making some errors in this article, though not in my analyses of Long and So’s papers (the errors mostly occur in section 3). I hope to list all of these errors in the more formal response that appears in print or else in an online appendix. That said, an error is not the same as a specious insinuation that the invalidation of someone’s model indicates Orientalism, pigheadedness, and so on. Nor is an error the same as So’s recent claim that “CI asked Da to widen her critique to include female scholars and she declined,” which is not an error but a falsehood.
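A minimal sketch of the leave-one-out check described in point 5: fit a topic model on a corpus, remove a single document, refit, and compare the topics’ most probable words. The corpus docs, the scikit-learn implementation, and the parameters are illustrative assumptions, not the procedure or code actually used in the article.

```python
# Sketch of a leave-one-document-out robustness check for topic models.
# "docs" is a hypothetical list of document strings; parameters are illustrative.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def top_word_sets(docs, n_topics=10, n_words=10, seed=0):
    """Fit LDA and return each topic as a frozenset of its top words."""
    vectorizer = CountVectorizer(stop_words="english")
    counts = vectorizer.fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    lda.fit(counts)
    vocab = vectorizer.get_feature_names_out()
    return [frozenset(vocab[i] for i in topic.argsort()[-n_words:])
            for topic in lda.components_]

def stability_after_dropping_one(docs):
    """Count how many topics keep the same top words when one document is removed."""
    full_run = top_word_sets(docs)
    dropped_run = top_word_sets(docs[:-1])  # the corpus minus a single document
    return sum(1 for topic in full_run if topic in dropped_run)
```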
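And a minimal sketch of the “one key” test described in point 6, under similarly assumed names (a hypothetical feature matrix X with thirteen columns and genre labels y); again an illustration of the check, not Long and So’s pipeline or the reanalysis itself.

```python
# Sketch of the "play that one key" check: does the single dominant feature
# carry the classifier's accuracy on its own? X is assumed to be a NumPy array
# of shape (n_samples, 13); y holds the genre labels. Names are illustrative.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def play_one_key(X, y, key_feature):
    all_features = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
    one_feature = cross_val_score(
        LogisticRegression(max_iter=1000), X[:, [key_feature]], y, cv=5
    ).mean()
    # If predictive power really concentrates in that one feature, the second
    # score should stay close to the first; a collapse suggests it does not.
    return all_features, one_feature
```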

NAN Z. DA teaches literature at the University of Notre Dame.


Computational Literary Studies: Participant Forum Responses

 

Ted Underwood

In the humanities, as elsewhere, researchers who work with numbers often reproduce and test each other’s claims.[1] Nan Z. Da’s contribution to this growing genre differs from previous examples mainly in moving more rapidly. For instance, my coauthors and I spent 5,800 words describing, reproducing, and partially criticizing one article about popular music.[2] By contrast, Da dismisses fourteen publications, which use a range of different methods, in thirty-eight pages. The article’s energy is impressive, and its long-term effects will be positive.

But this pace has a cost. Da’s argument may be dizzying if readers don’t already know the works summarized, as she rushes through explanation to get to condemnation. Readers who know these works will recognize that Da’s summaries are riddled with material omissions and errors. The time is ripe for a theoretical debate about computing in literary studies. But this article is unfortunately too misleading—even at the level of paraphrase—to provide a starting point for the debate.

For instance, Da suggests that my article “The Life Cycles of Genres” makes genres look stable only because it forgets to compare apples to apples: “Underwood should train his model on pre-1941 detective fiction (A) as compared to pre-1941 random stew and post-1941 detective fiction (B) as compared to post-1941 random stew, instead of one random stew for both” (p. 608).[3]

This perplexing critique tells me to do exactly what my article (and public code) makes clear that I did: compare groups of works matched by publication date.[4] There is also no “random stew” in the article. Da’s odd phrase conflates a random contrast set with a ghastly “genre stew” that plays a different role in the argument.
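To make the procedure concrete, here is a minimal sketch of what matching a contrast set by publication date can look like in practice. It is an illustration only, not the article’s published code; the record structure (dicts with “id” and “year” fields) is assumed for the example.

```python
# Sketch: for each work in the genre sample, draw one random contrast work
# published in the same year. Record fields ("id", "year") are illustrative.
import random

def matched_contrast_set(genre_works, candidate_works, seed=0):
    rng = random.Random(seed)
    by_year = {}
    for work in candidate_works:
        by_year.setdefault(work["year"], []).append(work)
    contrast = []
    for work in genre_works:
        same_year = [c for c in by_year.get(work["year"], []) if c["id"] != work["id"]]
        if same_year:
            contrast.append(rng.choice(same_year))
    return contrast
```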

More importantly, Da’s critique suppresses the article’s comparative thesis—which identifies detective fiction as more stable than several other genres—in order to create a straw man who argues that all genres “have in fact been more or less consistent from the 1820s to the present” (p. 609). Lacking any comparative yardstick to measure consistency, this straw thesis becomes unprovable. In other cases Da has ignored the significant results of an article, in order to pour scorn on a result the authors acknowledge as having limited significance—without ever mentioning that the authors acknowledge the limitation. This is how she proceeds with Jockers and Kirilloff (p. 610).

In short, this is not an article that works hard at holistic critique. Instead of describing the goals that organize a publication, Da often assumes that researchers were trying (and failing) to do something she believes they should have done. Topic modeling, for instance, identifies patterns in a corpus without pretending to find a uniquely correct description. Humanists use the method mostly for exploratory analysis. But Da begins from the assumption that topic modeling must be a confused attempt to prove hypotheses of some kind. So, she is shocked to discover (and spends a page proving) that different topics can emerge when the method is run multiple times. This is true. It is also a basic premise of the method, acknowledged by all the authors Da cites—who between them spend several pages discussing how results that vary can nevertheless be used for interpretive exploration. Da doesn’t acknowledge the discussion.

Finally, “The Computational Case” performs some crucial misdirection at the outset by implying that cultural analytics is based purely on linguistic evidence, and mainly on diction. It is true that diction can reveal a great deal, but this is a misleading account of contemporary trends. Quantitative approaches are making waves partly because researchers have learned to extract social relations from literature and partly because they pair language with external social testimony—for instance, the judgments of reviewers.[5] Some articles, like my own on narrative pace, use numbers entirely to describe the interpretations of human readers.[6] Once again, Da’s polemical strategy is to isolate one strand in a braid and critique it as if it were the whole.

A more inquisitive approach to cultural analytics might have revealed that it is not a monolith but an unfolding debate between several projects that frequently criticize each other. Katherine Bode, for instance, has critiqued other researchers’ data (including mine), in an exemplary argument that starts by precisely describing different approaches to historical representation.[7] Da could have made a similarly productive intervention—explaining, for instance, how researchers should report uncertainty in exploratory analysis. Her essay falls short of that achievement because a rush to condemn as many examples as possible has prevented it from taking time to describe and genuinely understand its objects of critique.

TED UNDERWOOD is professor of information sciences and English at the University of Illinois, Urbana-Champaign. He has published in venues ranging from PMLA to the IEEE International Conference on Big Data and is the author most recently of Distant Horizons: Digital Evidence and Literary Change (2019).

[1] Andrew Goldstone, “Of Literary Standards and Logistic Regression: A Reproduction,” January 4, 2016, https://andrewgoldstone.com/blog/2016/01/04/standards/; Jonathan Goodwin, “Darko Suvin’s Genres of Victorian SF Revisited,” October 17, 2016, https://jgoodwin.net/blog/more-suvin/.

[2] Ted Underwood, “Can We Date Revolutions in the History of Literature and Music?” The Stone and the Shell, October 3, 2015, https://tedunderwood.com/2015/10/03/can-we-date-revolutions-in-the-history-of-literature-and-music/; Ted Underwood, Hoyt Long, Richard Jean So, and Yuancheng Zhu, “You Say You Found a Revolution,” The Stone and the Shell, February 7, 2016, https://tedunderwood.com/2016/02/07/you-say-you-found-a-revolution/.

[3] Nan Z. Da, “The Computational Case against Computational Literary Studies,” Critical Inquiry 45 (Spring 2019): 601-39.

[4] Ted Underwood, “The Life Cycles of Genres,” Journal of Cultural Analytics, May 23, 2016, http://culturalanalytics.org/2016/05/the-life-cycles-of-genres/.

[5] Eve Kraicer and Andrew Piper, “Social Characters: The Hierarchy of Gender in Contemporary English-Language Fiction,” Journal of Cultural Analytics, January 30, 2019, http://culturalanalytics.org/2019/01/social-characters-the-hierarchy-of-gender-in-contemporary-english-language-fiction/.

[6] Ted Underwood, “Why Literary Time is Measured in Minutes,” ELH 85, no. 2 (2018): 341-65.

[7] Katherine Bode, “The Equivalence of ‘Close’ and ‘Distant’ Reading; or, Toward a New Object for Data-Rich Literary History,” MLQ 78, no. 1 (2017): 77-106.

 


Computational Literary Studies: Participant Forum Responses

 

The Select

Andrew Piper

Nan Z. Da’s study published in Critical Inquiry participates in an emerging trend across a number of disciplines that falls under the heading of “replication.”[1] In this, her work follows major efforts in other fields, such as the Open Science Collaboration’s “reproducibility project,” which sought to replicate past studies in the field of psychology.[2] As the authors of the OSC collaboration write, the value of replication, when done well, is that it can “increase certainty when findings are reproduced and promote innovation when they are not.”

And yet despite arriving at sweeping claims about an entire field, Da’s study fails to follow any of the procedures and practices established by projects like the OSC.[3] While invoking the epistemological framework of replication—that is, to prove or disprove the validity both of individual articles and of an entire field—her practices follow instead the time-honoured traditions of selective reading from the field of literary criticism. Da’s work is ultimately valuable not because of the computational case it makes (that work still remains to be done), but because of the way it foregrounds so many of the problems that accompany traditional literary-critical models when they are used to make large-scale evidentiary claims. The good news is that this article has made the problem of generalization—of how we combat the problem of selective reading—into a central issue facing the field.

Start with the evidence chosen. When undertaking their replication project, the OSC generated a sample of one hundred studies taken from three separate journals within a single year of publication to approximate a reasonable cross-section of the field. Da, on the other hand, chooses “a handful” of articles (fourteen by my count) from different years and different journals, with no clear rationale for how these articles are meant to represent an entire field. The point is not the number chosen but that we have no way of knowing why these articles and not others were chosen, and thus whether her findings extend to any work beyond her sample. Indeed, the only linkage appears to be that these studies all “fail” by her criteria. Imagine if the OSC had found that 100 percent of articles sampled failed to replicate. Would we find their results credible? Da, by contrast, is surprisingly only ever right.

Da’s focus within articles exhibits an even stronger degree of nonrepresentativeness. In their replication project, the OSC establishes clearly defined criteria through which a study can be declared not to replicate, while also acknowledging the difficulty of arriving at this conclusion. Da, by contrast, applies different criteria to every article, making debatable choices, as well as outright errors, that are clearly designed to foreground differences.[4] She misnames authors of articles, mis-cites editions, mis-attributes arguments to the wrong book, and fails at some basic math.[5] And yet each of these assertions always adds up to the same certain conclusion: failed to replicate. In Da’s hands, the part is always a perfect representation of the whole.

Perhaps the greatest limitation of Da’s piece is her extremely narrow (that is, nonrepresentative) definition of statistical inference and computational modeling. In Da’s view, the only appropriate way to use data is to perform what is known as significance testing, where we use a statistical model to test whether a given hypothesis is “true.”[6] There is no room for exploratory data analysis, for theory building, or predictive modeling in her view of the field.[7] This is particularly ironic given that Da herself performs no such tests. She holds others to standards to which she herself is not accountable. Nor does she cite articles where authors explicitly undertake such tests[8] or research that calls into question the value of such tests[9] or research that explores the relationship between word frequency and human judgments that she finds so problematic.[10] The selectivity of Da’s work is deeply out of touch with the larger research landscape.
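For readers unfamiliar with the mechanics at issue here, the following is a minimal, hedged illustration of what a significance test at the 0.05 level does: on synthetic data where the null hypothesis is true by construction, it rejects about 5 percent of the time and fails to reject the rest. The NumPy/SciPy setup is assumed for the example and is not drawn from Da’s article or from any of the studies under discussion.

```python
# Illustration only: a t-test at the 0.05 level applied to synthetic data
# where the null hypothesis (no difference between groups) is true by design.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
trials = 10_000
rejections = 0
for _ in range(trials):
    a = rng.normal(loc=0.0, scale=1.0, size=50)  # both samples come from
    b = rng.normal(loc=0.0, scale=1.0, size=50)  # the same distribution
    _, p_value = stats.ttest_ind(a, b)
    if p_value < 0.05:
        rejections += 1

# Roughly 5% of trials reject despite there being no real difference;
# the 0.05 threshold caps the false-positive rate, nothing more.
print(f"false-positive rate: {rejections / trials:.3f}")
```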

All of these practices highlight a more general problem that has for too long gone unexamined in the field of literary study. How are we to move reliably from individual observations to general beliefs about things in the world? Da’s article provides a tour de force of the problems of selective reading when it comes to generalizing about individual studies or entire fields. Addressing the problem of responsible and credible generalization will be one of the central challenges facing the field in the years to come. As with all other disciplines across the university, data and computational modeling will have an integral role to play in that process.

ANDREW PIPER is Professor and William Dawson Scholar in the Department of Languages, Literatures, and Cultures at McGill University. He is the author most recently of Enumerations: Data and Literary Study (2018).

[1] Nan Z. Da, “The Computational Case Against Computational Literary Studies,” Critical Inquiry 45 (Spring 2019): 601-39. For accessible introductions to what has become known as the replication crisis in the sciences, see Ed Yong, “Psychology’s Replication Crisis Can’t Be Wished Away,” The Atlantic, March 4, 2016.

[2] Open Science Collaboration, “Estimating the Reproducibility of Psychological Science,” Science 349, no. 6251 (August 28, 2015): aac4716, DOI: 10.1126/science.aac4716.

[3] Compare Da’s sweeping claims with the more modest ones made by the OSC in Science, even given their considerably larger sample and far more rigorous effort at replication. For a discussion of the practice of replication, see Brian D. Earp and David Trafimow, “Replication, Falsification, and the Crisis of Confidence in Social Psychology,” Frontiers in Psychology, May 19, 2015, doi.org/10.3389/fpsyg.2015.00621.

[4] For a list, see Ben Schmidt, “A computational critique of a computational critique of a computational critique.” I provide more examples in my scholarly response: Andrew Piper, “Do We Know What We Are Doing?” Journal of Cultural Analytics, April 1, 2019.

[5] She cites Mark Algee-Hewitt as Mark Hewitt, cites G. Casella as the author of Introduction to Statistical Learning when it was Gareth James, cites me and Andrew Goldstone as co-authors in the appendix when we were not, claims that “the most famous example of CLS forensic stylometry” was Hugh Craig and Arthur F. Kinney’s book advancing a theory of Marlowe’s authorship of Shakespeare’s plays (a theory they do not advance), and miscalculates the number of people it would take to read fifteen thousand novels in a year. The answer is 1,250, not 1,000 as she asserts. This statistic is also totally meaningless.

[6] Statements like the following also suggest that she is far from a credible guide to even this aspect of statistics: “After all, statistics automatically assumes that 95 percent of the time there is no difference and that only 5 percent of the time there is a difference. That is what it means to look for p-value less than 0.05.” This is not what it means to look for a p-value less than 0.05. A p-value is the probability of observing data at least as extreme as ours, assuming the null hypothesis is true. The smaller the p-value, the more unlikely it is that we would observe what we did if the null hypothesis were true. The aforementioned 5% threshold says nothing about how often there will be a “difference” (in other words, how often the null hypothesis is false). Instead, it says that if there is in fact no difference, data like ours will lead us to conclude (mistakenly) that there is one only 5% of the time. Nor does “statistics” “automatically” assume that .05 is the appropriate cut-off. That depends on the domain, the question, and the aims of modeling. These are gross oversimplifications.

[7] For reflections on literary modeling, see Andrew Piper, “Think Small: On Literary Modeling,” PMLA 132, no. 3 (2017): 651-58; Richard Jean So, “All Models Are Wrong,” PMLA 132, no. 3 (2017); and Ted Underwood, “Algorithmic Modeling: Or, Modeling Data We Do Not Yet Understand,” in The Shape of Data in Digital Humanities: Modeling Texts and Text-based Resources, ed. J. Flanders and F. Jannidis (New York: Routledge, 2018).

[8] See Andrew Piper and Eva Portelance, “How Cultural Capital Works: Prizewinning Novels, Bestsellers, and the Time of Reading,” Post-45 (2016); Eve Kraicer and Andrew Piper, “Social Characters: The Hierarchy of Gender in Contemporary English-Language Fiction,” Journal of Cultural Analytics, January 30, 2019, DOI: 10.31235/osf.io/4kwrg; and Andrew Piper, “Fictionality,” Journal of Cultural Analytics, December 20, 2016, DOI: 10.31235/osf.io/93mdj.

[9] The literature debating the value of significance testing is vast. See Joseph P. Simmons, Leif D. Nelson, and Uri Simonsohn, “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant,” Psychological Science 22, no. 11 (November 2011): 1359-66, doi:10.1177/0956797611417632.

[10] See Rens Bod, Jennifer Hay, and Stefanie Jannedy, Probabilistic Linguistics (Cambridge, MA: MIT Press, 2003); Dan Jurafsky and James Martin, “Vector Semantics,” Speech and Language Processing, 3rd ed. (2018), https://web.stanford.edu/~jurafsky/slp3/6.pdf; for the relation of communication to information theory, M. W. Crocker, V. Demberg, and E. Teich, “Information Density and Linguistic Encoding,” Künstliche Intelligenz 30, no. 1 (2016): 77-81, https://doi.org/10.1007/s13218-015-0391-y; and for the relation to language acquisition and learning, L. C. Erickson and E. D. Thiessen, “Statistical Learning of Language: Theory, Validity, and Predictions of a Statistical Learning Account of Language Acquisition,” Developmental Review 37 (2015): 66-108, doi:10.1016/j.dr.2015.05.002.
