Jordan Brower and Scott Ganz
In “Publication, Power, and Patronage: On Inequality and Academic Publishing,” Chad Wellmon and Andrew Piper motivate their study in a laudable spirit: they seek to expose and root out elitism in the name of a more egalitarian and truly meritocratic academy. That the study at the same time calls for more studies of its kind (“What we need in our view is not less quantification but more” [“P”]) seems justifiable given the results it reports. We find, then, an argument for the continued practice of the digital humanities (DH).
But this study is not DH as we typically understand the term. Wellmon and Piper are not producing new software or a digital archive, or offering an interpretation of a large corpus of books using quantitative methods. Rather, they are humanists making a claim about social organization, where the organization in question is their own field. This is an important distinction. Rather than holding their study to the research standards of other digital humanists, we ought to evaluate it using the rubrics of disciplines that answer similar kinds of questions.
Specifically, Wellmon and Piper assess the heterogeneity of university representation in top humanities journals as an indicator of the extent to which publication practices in the humanities are corrupted by “patterns and practices of patronage and patrimony and the tight circulation of cultural capital” (“P”). Perhaps unknowingly, the authors find themselves a part of a long and contentious literature in the social sciences and natural sciences over the creation and interpretation of metrics for diversity (and its opposite, concentration) that continues through the current decade. The authors put themselves into the shoes of ecologists seeking novel data in unexplored terrain. Traditional bibliometric indicators of status and concentration in the sciences that rely on citation and coauthorship lose traction in the humanities. As such, the authors seek to do what any good ecologist might: they go out into the field and count species.
In their analysis, the field is represented by articles published in four prominent humanities journals, and observations are individual articles. Observations are grouped into species by examining their university affiliation: is that finch Harvard crimson or Yale blue? Then the raw counts are aggregated into summary metrics that try to capture the concept of heterogeneity. The latter half of their paper presents conclusions drawn from their expedition.
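The counting step can be sketched in a few lines. This is a minimal illustration, not Wellmon and Piper’s own code: it assumes each article is recorded as a dictionary with hypothetical `journal` and `affiliation` fields, and the sample records are invented placeholders rather than data from the study.

```python
from collections import Counter, defaultdict

def counts_by_journal(articles):
    """Group articles ("observations") into universities ("species") per journal.

    Each article is assumed to be a dict with hypothetical keys
    "journal" and "affiliation"; returns {journal: Counter({university: n})}.
    """
    tallies = defaultdict(Counter)
    for article in articles:
        tallies[article["journal"]][article["affiliation"]] += 1
    return dict(tallies)

# Invented placeholder records, not data from the study.
articles = [
    {"journal": "PMLA", "affiliation": "Harvard"},
    {"journal": "PMLA", "affiliation": "Yale"},
    {"journal": "PMLA", "affiliation": "Harvard"},
    {"journal": "Representations", "affiliation": "Berkeley"},
]
tallies = counts_by_journal(articles)
print(tallies["PMLA"])  # Counter({'Harvard': 2, 'Yale': 1})
```

The per-journal counters are the raw material from which any summary metric of heterogeneity is then computed.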
The first two parts of this essay examine a pair of questions associated with this effort. First, how closely does Wellmon and Piper’s constructed measure of heterogeneity reflect what is usually meant by heterogeneity? Second, are the data collected representative of the field of the humanities that they seek to analyze? In our final section, we turn to a brief consideration of the broader cultural and political motivations for and implications of this study.
We conclude that the heterogeneity metric is inappropriate. We also worry that the data may not be representative of the field of the humanities due to numerous recording errors and a lack of conceptual clarity about what constitutes a publication. As two pillars of statistical analysis are the representativeness of the sample and the consistency of measure, we believe the study fails to achieve the level of methodological rigor demanded in other fields. There are many aspects of Wellmon and Piper’s study that live up to the highest standards of scientific method. Our criticism would not have been possible had the authors’ data and methods not been transparent or had the authors not willingly engaged in lengthy correspondence. However, the shortcomings of their quantitative analysis corrupt the foundations of their study’s conclusions.
Our essay is also a call for digital humanists to take seriously the multidisciplinary nature of their project. At a time when universities are clamoring to produce DH scholarship, it is imperative that humanities scholars subject that work to the same level of rigorous criticism that they apply to other types of arguments. At the same time, DH scholars must admit that the criticism they seek is different in kind. This is to say that in order to take DH work seriously, scholars must take the methods seriously, which means an investment in learning statistical methods and a push towards coauthorship with others willing to lend their expertise.
The latter half of Wellmon and Piper’s analysis measures the heterogeneity in the data they collect. Their “heterogeneity score,” which is the total number of unique universities divided by the total number of articles, seeks to capture a spectrum from “institutional homogeneity” to “institutional difference” (“P”). They justify their metric through reference to the similar type-token ratio metric of vocabulary richness.
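As described, the score reduces to a single ratio. The sketch below assumes the input is a plain list with one university name per article; the function name is our own.

```python
def wp_heterogeneity(affiliations):
    """Wellmon-Piper-style score: unique universities divided by total articles.

    `affiliations` is a list with one university name per article
    (an assumed input format). Returns a value in (0, 1].
    """
    if not affiliations:
        raise ValueError("need at least one article")
    return len(set(affiliations)) / len(affiliations)

print(wp_heterogeneity(["Harvard", "Yale", "Harvard", "Chicago"]))  # 0.75
```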
There are two serious problems with Wellmon and Piper’s measure. The first is that heterogeneity is not synonymous with richness; it reflects both richness and evenness. In the present context, richness refers to the number of unique universities represented in each journal, and evenness refers to the extent to which articles are equally distributed among the institutions represented. A good metric for heterogeneity should therefore increase both with the number of universities represented and with the evenness of representation across them. Wellmon and Piper’s score instead treats a journal that publishes authors from one university eleven times and authors from nine other universities one time each the same as a journal that publishes authors from ten universities two times each: both contain ten universities across twenty articles, for a score of 0.5.
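The two hypothetical journals in this example can be compared directly. The sketch below computes the ratio score alongside the effective number of universities (1/HHI), the alternative introduced later in this essay; the helper names and university labels are our own.

```python
from collections import Counter

def wp_heterogeneity(affiliations):
    # Unique universities divided by total articles (the ratio score).
    return len(set(affiliations)) / len(affiliations)

def effective_number(affiliations):
    # 1 / HHI, where HHI is the sum of squared article shares per university.
    n = len(affiliations)
    hhi = sum((count / n) ** 2 for count in Counter(affiliations).values())
    return 1 / hhi

# Journal A: one university 11 times, nine others once each (20 articles).
journal_a = ["U0"] * 11 + [f"U{i}" for i in range(1, 10)]
# Journal B: ten universities twice each (20 articles).
journal_b = [f"U{i}" for i in range(10) for _ in range(2)]

print(wp_heterogeneity(journal_a), wp_heterogeneity(journal_b))  # 0.5 0.5
print(round(effective_number(journal_a), 2), round(effective_number(journal_b), 2))  # 3.08 10.0
```

The ratio score cannot tell the two journals apart, while the effective number registers that journal A is dominated by a single university.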
Another useful property of a heterogeneity metric is that it should not decline as the total number of observations increases. Whether the ecologist spends a day or a month counting species on a tropical island should not affect the assessed level of heterogeneity, on average. (That said, if an ecologist spends one month each on two different islands and records more observations on the first than the second, that might well indicate greater ecological diversity on the first island.) In this respect, the Wellmon-Piper heterogeneity metric also fails, because larger observation counts will mechanically produce lower scores indicating more homogeneity. (As Brian Richards notes, the type-token ratio, too, faces this shortcoming, which is why linguists assign their subjects a fixed number of tokens.) The probability of observing a first-time publisher decreases with each additional article recorded. Journals that publish more articles (such as PMLA) will therefore tend to have lower heterogeneity scores than those that publish fewer articles (like Representations).
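The mechanical decline can be made precise under a simple sampling model. Assuming articles are drawn independently from a fixed distribution of university shares $p_i$ (an idealization, not a claim about how journals select articles), the expected number of distinct universities among $n$ articles is $\sum_i [1 - (1 - p_i)^n]$, and the expected ratio score is that sum divided by $n$. Each added article is less likely than the last to bring a new university, so the expected score can only fall as $n$ grows. The share distribution below is invented for illustration.

```python
def expected_wp_score(shares, n):
    """Expected unique-universities-per-article ratio for n articles.

    Assumes articles are i.i.d. draws from a fixed distribution `shares`
    (hypothetical university shares summing to 1). By linearity of
    expectation, E[#distinct] = sum_i (1 - (1 - p_i) ** n).
    """
    expected_distinct = sum(1 - (1 - p) ** n for p in shares)
    return expected_distinct / n

# Invented population: 50 universities with unequal shares.
weights = [1 / (rank + 1) for rank in range(50)]
total = sum(weights)
shares = [w / total for w in weights]

# The same population yields a lower expected score at every larger n.
for n in (10, 20, 40, 80):
    print(n, round(expected_wp_score(shares, n), 3))
```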
The following thought experiment demonstrates this troubling property of Wellmon and Piper’s heterogeneity score. We take one thousand random samples of sixty articles each from PMLA and Representations, two journals that Wellmon and Piper’s score rates as approximately equal in heterogeneity with respect to PhD institution. We then calculate the mean of Wellmon and Piper’s heterogeneity score across the samples as the number of articles grows from ten to sixty. Figure 1 displays the trend of the mean heterogeneity score for PMLA (black solid line) and Representations (red solid line) (fig. 1). For all article counts, PMLA is identified as considerably more heterogeneous than Representations. However, when the score is evaluated at each journal’s average number of articles per year (marked for PMLA and Representations by the black and red dotted lines, respectively), Representations receives the higher mark for diversity (the dashed red line exceeds the dashed black line). This is the type of perverse outcome a metric for heterogeneity should seek to avoid.
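The resampling procedure behind figure 1 can be sketched as follows. This is our reconstruction under stated assumptions, not the code that produced the figure. We assume each of the one thousand samples draws sixty articles without replacement from a journal’s list of affiliations, that the score at size k uses the first k articles of that sample, and that `metric` is any function of a list of affiliations. The journal below is a randomly generated placeholder.

```python
import random
from statistics import mean

def mean_score_curve(affiliations, metric, sizes=range(10, 61),
                     n_samples=1000, max_size=60, seed=0):
    """Mean of `metric` over random subsamples of increasing size.

    Assumption: each sample draws `max_size` articles without replacement,
    and the score at size k uses the first k articles of that sample.
    Returns {k: mean score}.
    """
    rng = random.Random(seed)
    scores = {k: [] for k in sizes}
    for _ in range(n_samples):
        sample = rng.sample(affiliations, max_size)
        for k in sizes:
            scores[k].append(metric(sample[:k]))
    return {k: mean(values) for k, values in scores.items()}

def wp_heterogeneity(affiliations):
    # Unique universities divided by total articles.
    return len(set(affiliations)) / len(affiliations)

# Placeholder journal: 200 articles drawn unevenly from 40 universities.
rng = random.Random(1)
universities = [f"U{i}" for i in range(40)]
journal = rng.choices(universities, weights=[1 / (i + 1) for i in range(40)], k=200)

curve = mean_score_curve(journal, wp_heterogeneity)
print(round(curve[10], 3), round(curve[60], 3))
```

Passing a different `metric`, such as an effective-number function, produces the analogous curve for another measure.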
Standard measures of diversity and concentration avoid these pitfalls. They decompose into a function of the equality of the shares across the groups represented and the total number of groups, and they are not mechanically tied to the number of observations. One metric of concentration that has both of these characteristics is the Herfindahl-Hirschman Index (HHI). Use of the HHI and similar indices is widespread; the US Department of Justice, for example, uses the HHI when considering the competitive implications of potential mergers. The HHI is one of a class of metrics that are a function of the weighted sum of the shares of overall resources allocated to each group observed in a population. The standard HHI equals the sum of the squared market shares of each firm in an industry or, in our setting, the sum of the squared shares of a journal’s articles published by authors from each university. The HHI thus ranges from 1/N to 1, where N is the number of firms in an industry.
The inverse of the HHI (1/HHI) is a commonly used measure of diversity; in ecology it is known as the “Hill number” of order two. It equals the number of firms that, each holding an equal market share, would produce the same HHI as the industry under observation, which is why it is often called the “effective number” of firms. Imagine an industry with four firms, one with half of the market and the others with one sixth each. The HHI equals $(1/2)^2 + 3 \cdot (1/6)^2 = 1/4 + 1/12 = 1/3$, the same as the HHI of an industry with three firms of equal market share. The effective number of firms is, therefore, three. Figure 2 recreates figure 1 using the effective number of universities metric (fig. 2). PMLA remains considerably more diverse with respect to PhD affiliation than Representations for all sample sizes. However, the difference in the number of articles each journal publishes per year now widens the gap in estimated institutional diversity between the two journals.
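A short sketch makes the HHI and its inverse concrete. It takes a list of shares (assumed to sum to one) and reproduces the four-firm example above; exact fractions are used so the arithmetic matches the hand calculation. The function names are our own.

```python
from fractions import Fraction

def hhi(shares):
    """Herfindahl-Hirschman Index: the sum of squared shares (assumed to sum to 1)."""
    return sum(s * s for s in shares)

def effective_number(shares):
    """Inverse HHI: the number of equal-sized groups with the same HHI."""
    return 1 / hhi(shares)

# One firm with half the market, three firms with one sixth each.
shares = [Fraction(1, 2)] + [Fraction(1, 6)] * 3
print(hhi(shares), effective_number(shares))  # 1/3 3
```

With N equal shares the HHI is exactly 1/N and the effective number is N, which is the lower bound of the range noted above.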
Heterogeneity Comparisons Over Time and Across Journals
Using the effective number of universities metric changes many of the quantitative conclusions in the study. The growth over time in the number of articles journals publish, together with the differences in annual article counts across the four journals, leads Wellmon and Piper to mistakenly identify more recent and larger journals as less heterogeneous. For example, figure 3 reproduces Wellmon and Piper’s Figure 4, which tracks heterogeneity across the four journals over time, using the effective number of universities metric (fig. 3). In the figure, the black line indicates heterogeneity with respect to the authors’ current institutions and the red line indicates heterogeneity with respect to their PhD institutions. Wellmon and Piper’s graph indicates a long-term decline in heterogeneity but little change since 1990. While we also observe little change since 1990, the effective number of universities indicates a longer-term trend toward greater diversity.
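Comparing diversity over time amounts to computing the effective number within each period. The sketch assumes articles are available as (year, affiliation) pairs and groups them by decade. Both the record format and the decade binning are our assumptions, since the period length behind the figure is not specified here, and the records are placeholders.

```python
from collections import Counter, defaultdict

def effective_number(affiliations):
    # 1 / HHI over the article shares of each university.
    n = len(affiliations)
    return 1 / sum((c / n) ** 2 for c in Counter(affiliations).values())

def effective_number_by_decade(records):
    """records: iterable of (year, affiliation) pairs (assumed format)."""
    by_decade = defaultdict(list)
    for year, affiliation in records:
        by_decade[year // 10 * 10].append(affiliation)
    return {decade: effective_number(affs) for decade, affs in sorted(by_decade.items())}

# Placeholder records, not data from the study.
records = [(1954, "Harvard"), (1957, "Harvard"), (1959, "Yale"),
           (1991, "Harvard"), (1993, "Yale"), (1995, "Chicago"), (1998, "Duke")]
print(effective_number_by_decade(records))
```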
Similarly, we come to different conclusions about the relative level of heterogeneity across the four journals. In table 1, we present the effective number of universities for each journal both in terms of the current and PhD university affiliations of the authors, along with 95 percent confidence intervals, using the methodology in Chao and Jost.
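We do not reproduce Chao and Jost’s procedure here. As a simpler stand-in, the sketch below attaches a percentile bootstrap interval to a bias-adjusted effective number. It replaces the plug-in HHI with $\sum_i n_i(n_i - 1) / [n(n - 1)]$, which is unbiased for the sum of squared shares under multinomial sampling, and it resamples articles with replacement. It illustrates how an interval might be attached to the effective number, not the method behind table 1; the function names, defaults, and data are our own placeholders.

```python
import random
from collections import Counter

def effective_number_unbiased(affiliations):
    """1 / (unbiased estimate of the sum of squared shares).

    Uses sum n_i (n_i - 1) / (n (n - 1)), which is unbiased for the
    HHI under multinomial sampling. Returns inf if no university repeats.
    """
    n = len(affiliations)
    repeats = sum(c * (c - 1) for c in Counter(affiliations).values())
    return float("inf") if repeats == 0 else n * (n - 1) / repeats

def bootstrap_ci(affiliations, stat, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval, resampling articles with replacement."""
    rng = random.Random(seed)
    n = len(affiliations)
    draws = sorted(stat(rng.choices(affiliations, k=n)) for _ in range(n_boot))
    lo = draws[round((1 - level) / 2 * n_boot)]
    hi = draws[round((1 + level) / 2 * n_boot) - 1]
    return lo, hi

journal = ["U0"] * 11 + [f"U{i}" for i in range(1, 10)]  # placeholder data
point = effective_number_unbiased(journal)
print(round(point, 2), bootstrap_ci(journal, effective_number_unbiased))
```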
We find that New Literary History (NLH) and PMLA are the most heterogeneous, in terms of both the authors’ current and PhD affiliations. Representations is the least heterogeneous, and Critical Inquiry (CI) falls in the middle. Unlike Wellmon and Piper, we find this ranking to be consistent across the types of author affiliation: the journals with more diverse institutional representation in terms of current author affiliation are also more diverse in terms of where their authors received their PhDs. However, the journals differ more from one another in the diversity of their authors’ current affiliations than in the diversity of their PhD affiliations.