Jordan Brower and Scott Ganz
In “Publication, Power, and Patronage: On Inequality and Academic Publishing,” Chad Wellmon and Andrew Piper motivate their study in a laudable spirit: they seek to expose and root out elitism in the name of a more egalitarian and truly meritocratic academy. That the study at the same time makes a claim for more studies of its kind—“What we need in our view is not less quantification but more” (“P”)—seems justifiable based on the results it found. We find, then, an argument for the continued practice of the digital humanities (DH).
But this study is not DH as we typically understand the term. Wellmon and Piper are not producing new software or a digital archive, or offering an interpretation of a large corpus of books using quantitative methods. Rather, they are humanists making a claim about social organization, where the organization in question is their own field. This is an important distinction to make. Rather than holding their study to a research standard held by other digital humanists, we ought instead to evaluate their work using the rubrics of disciplines that answer similar kinds of questions.
Specifically, Wellmon and Piper assess the heterogeneity of university representation in top humanities journals as an indicator of the extent to which publication practices in the humanities are corrupted by “patterns and practices of patronage and patrimony and the tight circulation of cultural capital” (“P”). Perhaps unknowingly, the authors find themselves a part of a long and contentious literature in the social sciences and natural sciences over the creation and interpretation of metrics for diversity (and its opposite, concentration) that continues through the current decade. The authors put themselves into the shoes of ecologists seeking novel data in unexplored terrain. Traditional bibliometric indicators of status and concentration in the sciences that rely on citation and coauthorship lose traction in the humanities. As such, the authors seek to do what any good ecologist might: they go out into the field and count species.
In their analysis, the field is represented by articles published in four prominent humanities journals, and observations are individual articles. Observations are grouped into species by examining their university affiliation: is that finch Harvard crimson or Yale blue? Then the raw counts are aggregated into summary metrics that try to capture the concept of heterogeneity. The latter half of their paper presents conclusions drawn from their expedition.
The first two parts of this essay examine a pair of questions associated with this effort. First, how closely does Wellmon and Piper’s constructed measure of heterogeneity reflect what is usually meant by heterogeneity? Second, are the data collected representative of the field of the humanities that they seek to analyze? In our final section, we turn to a brief consideration of the broader cultural and political motivations for and implications of this study.
We conclude that the heterogeneity metric is inappropriate. We also worry that the data may not be representative of the field of the humanities due to numerous recording errors and a lack of conceptual clarity about what constitutes a publication. As two pillars of statistical analysis are the representativeness of the sample and the consistency of measure, we believe the study fails to achieve the level of methodological rigor demanded in other fields. There are many aspects of Wellmon and Piper’s study that live up to the highest standards of scientific method. Our criticism would not have been possible had the authors’ data and methods not been transparent or had the authors not willingly engaged in lengthy correspondence. However, the shortcomings of their quantitative analysis corrupt the foundations of their study’s conclusions.
Our essay is also a call for digital humanists to take seriously the multidisciplinary nature of their project. At a time when universities are clamoring to produce DH scholarship, it is imperative that humanities scholars subject that work to the same level of rigorous criticism that they apply to other types of arguments. At the same time, DH scholars must admit that the criticism they seek is different in kind. This is to say that in order to take DH work seriously, scholars must take the methods seriously, which means an investment in learning statistical methods and a push towards coauthorship with others willing to lend their expertise.
The latter half of Wellmon and Piper’s analysis measures the heterogeneity in the data they collect. Their “heterogeneity score,” which is the total number of unique universities divided by the total number of articles, seeks to capture a spectrum from “institutional homogeneity” to “institutional difference” (“P”). They justify their metric through reference to the similar type-token ratio metric of vocabulary richness.
There are two serious problems with Wellmon and Piper’s measure. The first is that heterogeneity is not synonymous with richness. Heterogeneity instead is associated with both richness and evenness. In the present context, richness refers to the number of unique universities represented in each journal. Evenness refers to the extent to which articles are equally distributed among the institutions represented. A good metric for heterogeneity should therefore increase with the number of universities represented and increase with evenness of representation across universities. Wellmon and Piper treat a journal that publishes authors from one university eleven times and authors from nine other universities one time each the same as a journal that publishes authors from ten universities two times each.
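The equivalence in the last sentence can be checked directly. The following is a minimal sketch in Python, assuming the score is computed exactly as described above (unique universities divided by total articles); the university labels and the function name `wp_heterogeneity` are hypothetical and are not taken from Wellmon and Piper's code or data.

```python
from collections import Counter


def wp_heterogeneity(affiliations):
    """Unique universities divided by total articles, as the score is described above."""
    return len(set(affiliations)) / len(affiliations)


# Hypothetical journal A: one university appears eleven times, nine others once each.
journal_a = ["U0"] * 11 + [f"U{i}" for i in range(1, 10)]
# Hypothetical journal B: ten universities appear twice each.
journal_b = [f"U{i}" for i in range(10) for _ in range(2)]

score_a = wp_heterogeneity(journal_a)
score_b = wp_heterogeneity(journal_b)

# Share of articles held by the single most-published university in each journal.
largest_share_a = max(Counter(journal_a).values()) / len(journal_a)
largest_share_b = max(Counter(journal_b).values()) / len(journal_b)

print(score_a, score_b)                  # identical scores
print(largest_share_a, largest_share_b)  # very different concentration
```

Both journals contain twenty articles from ten universities, so the score cannot tell them apart, even though one university supplies more than half of the first journal's articles.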
Another useful property of a heterogeneity metric is that it should not decline as the total number of observations increases. Whether the ecologist spends a day or a month counting species on a tropical island should not affect the assessed level of heterogeneity, on average. (That said, if an ecologist spends one month each on two different islands and records more observations on the first than the second, that might well indicate greater ecological diversity on the first island.) In this respect, the Wellmon-Piper heterogeneity metric also fails, because larger observation counts will mechanically produce lower scores indicating more homogeneity. (As Brian Richards notes, the type-token ratio, too, faces this shortcoming, which is why linguists assign their subjects a fixed number of tokens.) The probability of observing a first-time publisher decreases with each additional article recorded. Journals that publish more articles (such as PMLA) will therefore tend to have lower heterogeneity scores than those that publish fewer articles (like Representations).
The following thought experiment demonstrates this troubling property of Wellmon and Piper’s heterogeneity score. We take one thousand random samples of sixty articles from PMLA and Representations, journals that are indicated to be approximately equal in their heterogeneity score with respect to PhD institution. We then calculate the mean of Wellmon and Piper’s heterogeneity score across the samples as the number of articles grows from ten to sixty. Figure 1 displays the trend of the mean heterogeneity score for PMLA (black solid line) and Representations (red solid line) (fig. 1). For all article counts, PMLA is identified as considerably more heterogeneous than Representations. However, when evaluated at the average number of articles per year (represented for PMLA and Representations by the black and red dotted lines, respectively), Representations receives a higher mark for diversity (indicated by the fact that the dashed red line exceeds the dashed black line). This is the type of perverse outcome a metric for heterogeneity should seek to avoid.
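The mechanical decline can be illustrated with a small resampling sketch of the same kind. It is written in Python and uses a synthetic population of articles, not the PMLA or Representations data; the population shape, the function names, and the seed are all assumptions made for illustration.

```python
import random


def wp_heterogeneity(affiliations):
    """Unique universities divided by total articles."""
    return len(set(affiliations)) / len(affiliations)


def mean_scores_by_size(population, sizes, n_samples=1000, max_size=60, seed=0):
    """Draw n_samples random samples of max_size articles (without replacement)
    and average the score over the first n articles of each sample."""
    rng = random.Random(seed)
    totals = {n: 0.0 for n in sizes}
    for _ in range(n_samples):
        sample = rng.sample(population, max_size)
        for n in sizes:
            totals[n] += wp_heterogeneity(sample[:n])
    return {n: total / n_samples for n, total in totals.items()}


# Synthetic, skewed population (an assumption): 5 universities with 20 articles
# each, 10 with 8 each, and 45 with 2 each, for 270 articles in total.
population = []
for i in range(60):
    count = 20 if i < 5 else 8 if i < 15 else 2
    population += [f"U{i}"] * count

mean_by_size = mean_scores_by_size(population, sizes=(10, 20, 40, 60))
for n, score in mean_by_size.items():
    print(n, round(score, 3))
```

The underlying population never changes in this sketch, yet the mean score falls as more articles are counted, which is the property that penalizes journals publishing more articles per year.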
Standard measures of diversity and concentration avoid these pitfalls. They decompose into a function of the equality of the shares across the groups represented and the total number of groups. They are not mechanically tied to the number of observations. One metric of concentration that has both of these characteristics is the Herfindahl-Hirschman Index (HHI). Use of the HHI and similar indices is widespread. For example, the HHI is used by the US Department of Justice when considering the competitiveness implications of potential mergers. The HHI is one of a class of metrics that are a function of the weighted sum of the shares of overall resources allocated to each group that is observed in a population. The standard HHI equals the sum of the squared market shares of each firm in an industry or, in our setting, the share of the number of articles published in a journal by authors from each university. The range of the HHI thus spans from 1/N to 1, where N is the number of firms in an industry. The inverse of the HHI (in other words, 1/HHI) is a commonly used measure of diversity (in ecology, the inverse of the HHI is referred to as a “Hill number”). This metric corresponds to the number of firms in an industry in which all firms have equal market share with the equivalent HHI as the one under observation. As such, it is often called the “effective number” of firms. Imagine an industry with four firms, one with half of the market share and the others with one sixth each. The HHI equals (1/2)² + 3·(1/6)² = 1/4 + 1/12 = 1/3, which is the same as the HHI in an industry with three firms with equal market share. The effective number of firms is, therefore, three. Figure 2 recreates figure 1 using the effective number of universities metric (fig. 2). PMLA remains considerably more diverse with respect to PhD affiliation than Representations for all sample sizes.
However, now the difference in the number of articles published per year creates a larger divergence between the estimated level of institutional diversity across the two journals.
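For readers who wish to check the arithmetic, the HHI and the effective number can be computed in a few lines. This is a minimal sketch in Python using exact fractions; the function names are hypothetical, and the two journals are the hypothetical cases described earlier (one university with eleven articles and nine with one each, versus ten universities with two each).

```python
from fractions import Fraction


def shares_from_counts(counts):
    """Convert article counts per university into exact shares of the total."""
    total = sum(counts)
    return [Fraction(c, total) for c in counts]


def hhi(shares):
    """Herfindahl-Hirschman Index: the sum of squared shares."""
    return sum(s * s for s in shares)


def effective_number(shares):
    """Inverse of the HHI: the number of equal-sized groups with the same HHI."""
    return 1 / hhi(shares)


# The four-firm example: one firm with half the market, three with one sixth each.
four_firms = [Fraction(1, 2)] + [Fraction(1, 6)] * 3
print(hhi(four_firms), effective_number(four_firms))  # 1/3 and 3

# The two hypothetical journals, each with twenty articles from ten universities.
uneven = shares_from_counts([11] + [1] * 9)
even = shares_from_counts([2] * 10)
print(effective_number(uneven))  # 40/13, roughly 3.08 effective universities
print(effective_number(even))    # 10 effective universities
```

Unlike the ratio of unique universities to articles, the effective number separates the two journals: the evenly distributed journal registers ten effective universities, the concentrated one just over three.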
Heterogeneity Comparisons Over Time and Across Journals
Using the effective number of universities metric changes many of the quantitative conclusions in the study. The trend toward journals publishing more articles over time and the differences in the count of articles published annually across the four journals lead Wellmon and Piper to mistakenly identify more recent and larger journals as less heterogeneous. For example, in figure 3 we reproduce Wellmon and Piper’s figure 4, which examines the trend in heterogeneity across the four journals over time, using the effective number of universities metric (fig. 3). In the figure, the black line indicates the heterogeneity with respect to the authors’ current institution and the red line indicates heterogeneity with respect to the authors’ PhD institution. Wellmon and Piper’s graph indicates a long-term decline in heterogeneity, but little change since 1990. While we also observe little change since 1990, the “effective number of firms” metric indicates a longer-term trend towards greater diversity.
Similarly, we come to different conclusions about the relative level of heterogeneity across the four journals. In table 1, we present the effective number of universities for each journal both in terms of the current and PhD university affiliations of the authors, along with 95 percent confidence intervals, using the methodology in Chao and Jost.
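To convey what an interval around an effective number involves, the sketch below uses a plain percentile bootstrap in Python. This is a deliberately simpler technique than the Chao and Jost estimator used for table 1: it does not correct for under-sampled universities, and the plug-in estimate it resamples is known to be biased in small samples. The journal data, function names, and seed are hypothetical.

```python
import random
from collections import Counter


def effective_number(affiliations):
    """Plug-in inverse HHI computed from a list of university labels."""
    n = len(affiliations)
    hhi = sum((count / n) ** 2 for count in Counter(affiliations).values())
    return 1 / hhi


def bootstrap_interval(affiliations, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval (NOT the Chao-Jost method): resample
    articles with replacement and take quantiles of the recomputed metric."""
    rng = random.Random(seed)
    n = len(affiliations)
    estimates = sorted(
        effective_number(rng.choices(affiliations, k=n)) for _ in range(n_boot)
    )
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Hypothetical journal: 200 articles across 40 universities with skewed counts.
journal = []
for i in range(40):
    count = 30 if i < 2 else 10 if i < 10 else 2
    journal += [f"U{i}"] * count

point = effective_number(journal)
lower, upper = bootstrap_interval(journal)
print(round(point, 2), round(lower, 2), round(upper, 2))
```

Two journals whose intervals overlap substantially under a procedure of this kind should not be ranked with much confidence; the bias-adjusted Chao and Jost intervals serve the same purpose with better small-sample behavior.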
We find that New Literary History (NLH) and PMLA are the most heterogeneous, both in terms of the author’s current and PhD affiliations. Representations is the least heterogeneous. Critical Inquiry (CI) falls in the middle. Unlike Wellmon and Piper, we find this ranking to be consistent across the types of author affiliation. The journals with more diverse institutional representation in terms of current author affiliation are also more diverse in terms of where the author received their PhD. However, we do observe that there is greater disparity across journals when examining the diversity of the authors’ current affiliation than the PhD affiliation.
The foregoing analysis therefore suggests that, far from being merely “different,” as Wellmon and Piper suggest in their thirty-fourth footnote, the HHI and its close cousins better reflect the concepts of concentration and heterogeneity than the index the authors invent. Our analysis using the effective number of firms controverts the most provocative conclusion they draw: that representation of institutions in major humanities publications has become less diverse over time. But the quality of our own data analysis is contingent on the quality of the underlying data. We are concerned that the data, in its current state, is sufficiently error-laden to call into question any claims the authors wish to make.
As an entry point for this criticism, we highlight some egregious errors evident from cursory examination of the data. The article title “Women’s Speech and Silence in Hartmann von Aue’s Erec” is attributed to one hundred PMLA articles ranging from 1970 to 2007. The Erec instance is only the most glaring of such issues. Others include: errors of duplication – or triplication, in the case of Laurent Dubreuil’s “What is Literature’s Now?”; errors of omission such as Leonard B. Meyer’s fifty-five page “Concerning the Sciences, the Arts: And the Humanities” in the inaugural issue of CI; and what can only be called errors of identification, as in the case where Lindsay Waters, the Harvard UP humanities editor and author of “The Crisis in Scholarly Publishing” is identified as female.
This last instance, in addition to being an innocent mistake, points to a more fundamental problem. Wellmon and Piper fail to adequately answer the logically prior question that undergirds their study: what is a publication? In footnote thirty-three, they write, “Only research articles five pages or longer were selected from JSTOR. Our manual additions were intended to include only articles and not book reviews, but these could include shorter pieces such as critical responses. Our aim was to capture as broad a profile as possible of contributors to these journals. We removed editors’ introductions and interviews” (“P”). Despite their desire to capture only “research articles,” many included entries clearly do not qualify as such. “The Crisis in Scholarly Publishing,” for instance, is the name of a subsection of PMLA’s “Forum,” which is a collection of letters to the editor. Waters’s entry is a bit over a page long; how, given that Wellmon and Piper only included articles of five pages or longer, could this be the case? The issue, it seems, is that JSTOR collects all six of the responses under the subheading and considers the composite unit an “article” of six pages. In Wellmon and Piper’s dataset, five authors—David Galef, Richard M. Berrong, Waters, Kirby Olson, and Wendell V. Harris—are given credit for writing an article titled “The Crisis in Scholarly Publishing”; one author in the section—William B. Hunter—is not.
The inconsistency of inclusion is one matter; whether any of these pieces of writing ought to be considered “research articles” is another. Our inclination is to say that they ought not to be, but the burden is on Wellmon and Piper to explain why they are. To take another example: in its 1998 issue, PMLA solicited responses to a “call for comments on the status and reputation of PMLA outside North America.” Twenty-eight scholars wrote in at roughly one page each; twenty-three of those letters are included as individual articles in Wellmon and Piper’s database. A final instance, included for irony’s sake: In 1988, PMLA published a “guest column” by Stanley Fish about peer review called “No Bias, No Merit.” The column was actually written in 1979—when PMLA was switching over to blind peer review—but was published nine years later along with a postscript accounting for changes that occurred during that interval. The Fish article is not included in the database. However, some eight responses are included, some as short as a few sentences, such as Sieglinde Lug’s, which we produce in its entirety here:
To the Editor:
In the framework of the humanities, Fish’s argument is the equivalent of the capitalist’s stance in economics. A major problem with both is the assumption that if only you “labor in the vineyards” you will reap the fruits. Women, minorities, and in general those who do not cultivate the “right” connections know otherwise. A major journal, for example, will not publish their nine-year-old articles with afterwords attached. It will ask for a rewritten version.
A reversal of the anonymous-submission policy would cause a drastic decline in submissions by excellent but unknown writers; in competition with a Fish, the cards are stacked against them, or—as the German phrase goes—“sie können gegen den Fish nicht anstinken.”
The problem of article definition exceeds PMLA. Wellmon and Piper suggest that a difference between blind peer-reviewed (NLH and PMLA) and nonblind peer-reviewed publications (CI and Representations) might account for the differences in institutional diversity between the journals (see “P”). However, this rough distinction does not adequately account for the heterogeneity in editorial practices across the four journals. Mollie Washburne, managing editor of NLH, informed us that the journal compiles “one or two special issues a year; for these, we solicit articles from authors whose scholarship has made substantial contributions to the topic of the special issue.” Washburne goes on to say that all solicited articles are subject to review, that such articles are on occasion rejected if unsuitable, and, further, that the journal also welcomes unsolicited submissions for these special issues. The situation is similar at PMLA, the other notionally blind-review publication; as noted in the front matter of each issue of the journal, “The editor and the Editorial Board periodically invite studies and commentaries by specific authors on topics of wide interest. These contributions appear in the following series: Theories and Methodologies, The Changing Profession, The Book Market, The Journal World, Letters from Librarians, and Correspondents at Large.” According to the PMLA staff, “Theories and Methodologies” articles range from 3,500 to 5,000 words, and those in “The Changing Profession” are typically around 2,000 words. By contrast, articles in PMLA are longer, at a maximum of 9,000 words, and are subject to blind review. Ought these pieces—subject to different standards of assessment and acceptance and procured in different ways—all be considered equivalent “research articles”?
And as for Representations and CI: is it any surprise that the two universities that run these journals—Berkeley and Chicago, respectively—are the two most represented institutions in Wellmon and Piper’s list? Twenty-four of the twenty-eight articles by W. J. T. Mitchell—including the Autumn 2012 piece “Preface to ‘Occupy’: Three Inquiries in Disobedience” (despite the authors’ claim that they removed introductions)—appeared in CI, which he has edited since 1978. For some perspective, those twenty-four articles are about half as many as are attributed to Indiana University, the twentieth most-published university in the “author affiliation” category. Of the 246 Chicago faculty appearances in the journals studied, 196 were in CI (13.5 percent of all articles in CI); of the 329 Berkeley faculty appearances, 173 were in Representations (23.7 percent of all articles in Representations). The fact that these journals give preference to their own faculty seems less a pernicious and opaque form of informal patronage than an overt and transparent case of home field advantage.
So, why these articles? Why these journals? The answer is that Wellmon and Piper believe these journals to be prestigious, and therefore the writings inside them are consecrated as prestigious as well. But what does “prestige” mean to them? Nowhere is this defined, and as a result the authors rely on some notion of general sentiment within the academy. But this just won’t do. Is a “Critical Response”—like this essay—as prestigious, on balance, as the article to which it responds? Is a note in “The Changing Profession” section of PMLA as prestigious as a much longer and blind-reviewed article in that journal? Our suspicion is, probably not. But, absent a way to capture prestige, we’re left wondering what exactly it is that we’re counting.
As a necessarily limited critique, this essay cannot address all the points of contention that Wellmon and Piper’s study raises. We bracket, for instance, their condensed history of academic writing’s relation to prestige, leaving that work to scholars better versed in the development of the European and especially the German university; likewise, we do not present certain ready-to-hand arguments about the novelty of this study’s claims (one can read the entries at the Humanities Journals Wiki for often lively assertions about open and closed shops and other issues). Nor do we consider some commonsense explanations for a disparity in university representation in certain journals: that more prestigious universities can afford to recruit and maintain talent; that prestigious universities require more publication from faculty to climb the tenure ladder; that more prestigious universities tend to be in environments where people prefer to live (and, as places dense in scholars, are themselves more ideal environments for study and writing); and so on.
But we would be remiss if we did not at least acknowledge this study’s interesting position within the consensus cultural politics of the humanities in the United States. Alas, to do so is to court controversy because this study’s key terms are “inequality” and “diversity.”
Eleven years after Walter Benn Michaels wrote “diversity has become virtually a sacred concept in American life today,” Wellmon and Piper invoke the concept to justify their study. Their use of the term, however, is different from what Michaels analyzed a decade ago; where Michaels described the American valorization of ethnic or racial identity at the expense of class difference, Wellmon and Piper take a more economically inflected approach by attempting to expose differences in institutional representation. But the sociocultural and political desideratum of diversity nonetheless informs and in some sense authorizes their study. The intention is good—more voices from less fancy places—but good intentions too often pave undesirable roads.
For there is a dangerous slippage here from diversity of ethnicities or cultures to diversity of ideas within a structure that intends to produce and maintain knowledge. Indeed, there is a pronounced imbalance in the essay between “inequality” and “quality.” Wellmon and Piper only take up the issue of quality in their last few pages through an invocation of Bourdieu and a rhetorical question framed in the negative: “And yet how can we be certain that such imagined epistemic quality is not in some way contaminated by those very networks of influence and patronage that produce it” (“P”)? They go on: “the observed hierarchies are so pronounced that it would be naïve to assume that elite institutions are disproportionally better at filtering knowledge than all other universities” (“P”). We will start by admitting that nothing is certain to us from the last two passages. The authors’ implicit hypothesis, we believe, is that elite networks and patronage have led to a decoupling of objective quality (measured by the capacity to filter knowledge) and perceived (or imagined) quality. But, where is the test? Where is the attempt to actually measure quality and its distribution across institutions so that readers can learn the extent of our collective naïveté? Worse, their heterogeneity metric demands that the numbers look bad; as we show above, positing maximal institutional diversity as an ideal skews the representation of the world we live in and prevents us from finding good solutions to the problems we certainly have.
Organizational sociologists define status (one component of prestige) as arising from “accumulated acts of deference” that have the potential to drive a wedge between perceived quality and objective quality. To the extent that deference to institutional privilege leads observers to substitute an author’s academic affiliation for a critical reading of their work, the field is undoubtedly the worse off. Furthermore, journals should be applauded for taking risks with new authors advocating new ideas. True novelty is unlikely from a pool of homogeneous students trained in the same manner by the same faculty. Yet, false equality is also undesirable. The best way to combat a status ranking that diverges from objective quality is to make it easier for audiences to observe objective quality. If you want an unbiased measure of the quality of wine, for example, use a blind taste test. This, in theory, is a service being performed by the editorial boards of journals and the process of peer review. However, if the criteria for publication in major journals begin to reflect something other than quality—for example, institutional diversity for its own sake—readers seeking quality will look for other signals. We can safely conjecture that institutional affiliation will be high on the list.
To their credit, the authors acknowledge that they don’t yet have all the answers they seek: “What remains unclear is the relationship of this system to the quality and diversity of ideas, indeed to the ways in which the very ideas of quality and diversity might be imagined to intersect” (“P”). However, they stymie their attempt at doing so in their own study by substituting diversity for quality. When they write, comparing their heterogeneity metric to the HHI, “[b]ut the important point here is that no one score accounts for the entirety of the problem. Each score captures a different aspect of the problem while missing others,” they substitute an evident truth—of course no single metric can capture everything about a multifarious issue—for the more arduous work of determining whether one metric is preferable to another (“P”). That is to say, they substitute the logic of difference for the logic of disagreement.
Jordan Brower is a lecturer on history and literature at Harvard University. Scott Ganz is assistant professor at the Georgia Institute of Technology in the School of Public Policy.
NOTE: This essay originally responded to the version of Wellmon and Piper’s essay posted on each of their websites on 18 Jan. 2017 and reported on two days later by Inside Higher Ed. After correspondence with the authors in which we discussed many of the issues raised here, Wellmon and Piper updated their data and essay. We received their updates on 4 April 2017 and adjusted our essay to take into account their changes.
 See Chad Wellmon and Andrew Piper, “Publication, Power, and Patronage: On Inequality and Academic Publishing,” Critical Inquiry, criticalinquiry.uchicago.edu; hereafter abbreviated “P.”
 See Marshall Hall and Nicholas Tideman, “Measures of Concentration,” Journal of the American Statistical Association 62 (Mar. 1967): 162–68; P. E. Hart, “Entropy and Other Measures of Concentration,” Journal of the Royal Statistical Society 134 (1971): 73–85; and Albert O. Hirschman, “The Paternity of an Index,” The American Economic Review 54 (Sept. 1964): 761.
 See M. O. Hill, “Diversity and Evenness: A Unifying Notation and its Consequences,” Ecology 54 (Mar. 1973): 427–32; Stuart H. Hurlbert, “The Nonconcept of Species Diversity: A Critique and Alternative Parameters,” Ecology 52 (Jul. 1971): 577–86; and E. H. Simpson, “Measurement of Diversity,” Nature 163 (April 1949): 688.
 See Anne Chao and Lou Jost, “Estimating Diversity and Entropy Profiles via Discovery Rates of New Species,” Methods in Ecology and Evolution 6 (Aug. 2015): 873–82; Sönke Hoffmann and Andreas Hoffmann, “Is There a ‘True’ Diversity?” Ecological Economics 65 (April 2008): 213–15; and Jost, “Entropy and Diversity,” Oikos 113 (May 2006): 363–75.
 See Vincent Larivière, Yves Gingras, and Éric Archambault, “Canadian Collaboration Networks: A Comparative Analysis of the Natural Sciences, Social Sciences and the Humanities,” Scientometrics 68 (Sept. 2006): 519–33.
 See Hall and Tideman, “Measures of Concentration,” and Hurlbert, “The Nonconcept of Species Diversity.”
 See Brian Richards, “Type/Token Ratios: What Do They Really Tell Us?” Journal of Child Language 14 (June 1987): 201–9.
 Jacob A. Bikker and Katharina Haaf, “Measures of Competition and Concentration in the Banking Industry: A Review of the Literature,” Economic and Financial Modeling 9 (Summer 2002): http://www.dnb.nl/en/binaries/ot027_tcm47-146045.pdf. Other metrics of this type are the Entropy index, Rosenbluth index, and Hill-Tideman index. The Simpson index, commonly used in ecology, is identical to the HHI.
 Chao and Jost derive an estimator of the variance of the estimate of the effective number of species that includes an adjustment for the bias caused by under-sampling of species in a finite sample (see Chao and Jost, “Estimating Diversity and Entropy Profiles via Discovery Rates of New Species”).
 Patrick M. McConeghy, “Women’s Speech and Silence in Hartmann von Aue’s Erec,” PMLA 102 (Oct. 1987): 772–83.
 See Laurent Dubreuil, “What Is Literature’s Now?” New Literary History 38 (Winter 2007): 43–70; Leonard B. Meyer, “Concerning the Sciences, the Arts: And the Humanities,” Critical Inquiry 1 (Sept. 1974): 163–217; and Richard M. Berrong et al., “The Crisis in Scholarly Publishing,” PMLA 118 (Oct. 2003): 1338–43.
 See Stanley Fish, “No Bias, No Merit: The Case against Blind Submission,” PMLA 103 (Oct. 1988): 739–48.
 Sieglinde Lug, “Fish on Blind Submission,” PMLA 104 (Mar. 1989): 217–18.
 Author’s personal email. Mollie Washburne to Jordan Brower, 24 Apr. 2017.
 See W. J. T. Mitchell, “Preface to ‘Occupy: Three Inquiries in Disobedience,’” Critical Inquiry 39 (Autumn 2012): 1–7.
 “Literary Studies Journals,” Humanities Journals Wiki, humanitiesjournals.wikia.com/wiki/Literary_Studies_Journals.
 Walter Benn Michaels, The Trouble with Diversity: How We Learned to Love Identity and Ignore Inequality (New York, 2006), p. 12.
 Michael Sauder, Freda Lynn, and Joel M. Podolny, “Status: Insights from Organizational Sociology,” Annual Review of Sociology 38 (Aug. 2012): 267–83.