Executive Summary
This paper addresses the central methodological challenge of neglect studies: how to make rigorous, defensible claims about scholarly attention that is missing rather than present. The challenge is genuine. Claims of neglect are easy to make, and the field’s credibility depends on developing standards that distinguish substantiated claims from impressionistic ones. The paper proposes that neglect studies should be methodologically pluralist by necessity, drawing on bibliometric and scientometric mapping, expert elicitation, historical and archival recovery, comparative analysis, counterfactual reasoning, and qualitative interviewing — but that the pluralism must be disciplined by shared standards rather than left as a license for any method to be acceptable.
The paper develops six methodological domains in turn, identifies the strengths and limitations of each, addresses the problem of triangulation across methods, and proposes a tiered standard of evidence — exploratory, substantiated, and rigorous — that the field can use to characterize the confidence warranted by particular claims. It concludes with a discussion of the risks specific to neglect-studies methodology, including the temptations toward circularity, motivated identification, and the substitution of researcher preferences for the preferences of absent constituencies.
1. Introduction
The previous paper proposed a working definition of scholarly neglect and a taxonomy of seven overlapping categories of neglect. Both the definition and the taxonomy presupposed that neglect can be identified — that it is possible to say, with some confidence, that a particular question or area is receiving substantially less attention than would be warranted, and that the deficit is attributable to mechanisms other than considered judgment. That presupposition is the methodological problem this paper addresses.
The problem has three structural features that shape what follows. First, the object of study is defined negatively, by absence rather than presence. Ordinary empirical methods are designed to study what exists; studying what does not exist requires inversions and indirections that are not standard methodological territory. Second, claims about absence are difficult to falsify in any straightforward way, because the absence of evidence is consistent with both genuine absence and with measurement failure. The field must develop ways to distinguish these. Third, the field’s claims have consequences. To assert that a question is neglected is implicitly to recommend that attention be redirected toward it, and the field’s interventions in research-policy decisions will be only as good as the methods that support them.
The argument of this paper is that no single method is adequate to the task, that the methods that are available have well-understood strengths and limitations that can be combined productively, and that the field’s professional standards should require triangulation across methods rather than reliance on any one of them. The paper does not propose a single canonical methodology; it proposes a methodological framework within which several approaches can be applied and combined, with a tiered standard of evidence that lets readers assess how seriously to take any particular claim.
A note on what this paper is not. It is not a technical handbook. The bibliometric, computational, and qualitative methods discussed below have their own substantial methodological literatures, and the field will need to engage with those literatures in depth. The present paper is a strategic statement of which methods the field needs, how they relate, and what standards should govern their use. The technical work belongs in subsequent methodological papers, in the field’s eventual handbook, and in the training curricula that Paper 6 will address.
2. Bibliometric and Scientometric Mapping
The bibliometric methods developed in library and information science over the past half-century constitute the field’s most mature technical infrastructure. The standard techniques — citation analysis, co-citation analysis, bibliographic coupling, topic modeling, co-occurrence networks, term-frequency analysis — were developed to map the structure of active research literatures, and they can be adapted to map the negative space around those structures.
The adaptation requires two inversions of the standard analytical move. The first is to ask not which topics are most central in a literature but which adjacent topics are conspicuously sparse given the structure of what surrounds them. A topic-modeling analysis of a field will identify clusters of dense activity; a neglect-studies analysis asks what lies between and beyond those clusters and whether the gaps are explicable on intellectual grounds or whether they appear arbitrary. The second inversion is to compare the actual distribution of attention to a counterfactual distribution constructed from external evidence about importance — disease burden, economic significance, expressed public concern, theoretical centrality — and to identify the points of largest divergence. Both inversions are technically feasible with existing tools, though neither has been pursued at scale as a primary research program.
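The second inversion can be made concrete with a small sketch. Assuming per-topic publication counts and an externally derived importance weight for each topic (a burden measure, for instance), the points of largest divergence between the two distributions can be ranked directly. The topic names, counts, and weights below are purely illustrative, not drawn from any real dataset.

```python
# Sketch: rank topics by divergence between observed attention share
# and an externally derived importance share. All inputs illustrative.

def divergence_ranking(pub_counts, importance_weights):
    """Return topics sorted by (importance share - attention share),
    largest positive gap first: the candidates for apparent neglect."""
    total_pubs = sum(pub_counts.values())
    total_weight = sum(importance_weights.values())
    gaps = {}
    for topic in pub_counts:
        attention = pub_counts[topic] / total_pubs
        importance = importance_weights[topic] / total_weight
        gaps[topic] = importance - attention
    return sorted(gaps.items(), key=lambda kv: kv[1], reverse=True)

# Illustrative inputs: publication counts vs. burden-style weights.
pubs = {"topic_a": 900, "topic_b": 80, "topic_c": 20}
weights = {"topic_a": 50, "topic_b": 30, "topic_c": 20}

for topic, gap in divergence_ranking(pubs, weights):
    print(f"{topic}: {gap:+.3f}")
```

A positive gap flags a candidate for further investigation, not a finding of neglect; the coverage biases discussed below apply to both inputs.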
The strengths of bibliometric methods for neglect studies are several. They are quantitative and replicable, which gives them credibility with audiences skeptical of more interpretive approaches. They scale to large literatures, which allows the field to identify patterns that would be invisible at the level of individual case studies. They produce findings that can be visualized and communicated to non-specialist audiences. And they rest on infrastructure that already exists and is maintained by other parties, which lowers the field’s overhead.
The limitations are equally important and must be stated explicitly, because the field’s credibility will suffer if it overclaims for its quantitative tools. The standard bibliometric databases — Web of Science, Scopus, Dimensions, OpenAlex — have well-documented coverage biases by language, region, discipline, and outlet type. Work published outside the dominant English-language journals is systematically underrepresented, which means that any analysis using these databases as a complete map of attention will misidentify peripheral inquiries (in the taxonomy’s sense) as more neglected than they are. The databases also undercount work published in books, in non-indexed journals, in gray literature, and in non-Western languages. The implication is not that bibliometric methods are unusable but that bibliometric findings about apparent neglect always require validation against other sources before being treated as established.
A second limitation is that bibliometric methods measure published output, which is a downstream proxy for attention rather than a direct measure of it. A question may have substantial attention from researchers without producing much published output — because the work is in early stages, because results have been negative, because the publication system has biases against the relevant topic, or because the researchers have chosen to disseminate findings through other channels. Counting publications will misidentify such cases as neglect. The corrective is to combine publication counts with measures of upstream attention — grant funding, conference presentations, doctoral dissertations in progress, working papers — which are partially available through other databases but require integration work that is not yet mature.
A third limitation is that bibliometric methods are good at identifying the presence of clusters but less good at characterizing the questions those clusters are not asking. A topic-modeling analysis can show that a literature consists of certain themes; it cannot directly show which themes are absent without an external standard against which to compare. The construction of that external standard — what should this literature be addressing, given the state of the world it is studying? — is not a bibliometric task. It requires the methods addressed in the next sections.
3. Expert Elicitation
Expert elicitation methods produce structured information about scholarly opinion through procedures designed to reduce the biases inherent in informal consultation. The methods include the Delphi technique, in which a panel of experts responds to iterative rounds of structured questions with feedback between rounds; the nominal group technique, in which experts generate and rank items in a structured face-to-face setting; horizon-scanning exercises, in which experts identify emerging issues that established literatures have not yet incorporated; and the priority-setting partnership methodology developed by the James Lind Alliance and adapted by others.[^1]
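The feedback step of a Delphi-style round can be sketched minimally: compute each item's median rating and interquartile range across panelists, and flag items whose spread exceeds a consensus threshold for re-rating in the next round. The ratings and the threshold below are illustrative assumptions, not part of any standard protocol.

```python
# Sketch of Delphi-style round feedback: per-item median and IQR
# across panelist ratings, flagging low-consensus items for another
# round. Ratings and the IQR threshold are illustrative.
from statistics import median, quantiles

def round_feedback(ratings_by_item, iqr_threshold=2.0):
    """ratings_by_item maps item -> list of panelist ratings.
    Returns item -> (median, iqr, needs_another_round)."""
    feedback = {}
    for item, ratings in ratings_by_item.items():
        q1, _, q3 = quantiles(ratings, n=4)  # quartile cut points
        iqr = q3 - q1
        feedback[item] = (median(ratings), iqr, iqr > iqr_threshold)
    return feedback

# Illustrative 1-9 importance ratings from six panelists.
ratings = {
    "candidate_area_x": [7, 8, 8, 7, 9, 8],   # converging
    "candidate_area_y": [2, 9, 5, 8, 3, 7],   # divergent
}
for item, (med, iqr, reflag) in round_feedback(ratings).items():
    print(item, med, iqr, reflag)
```

The point of the structure is the controlled feedback between rounds; the statistics themselves are deliberately simple.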
For neglect studies, expert elicitation serves three distinct functions. The first is the identification of candidate cases — areas that knowledgeable observers consider neglected, which can then be examined by other methods to determine whether the perception is warranted. The second is the calibration of bibliometric findings — when a bibliometric analysis identifies an apparent gap, expert elicitation can establish whether the gap is real or whether it reflects measurement limitations. The third is the documentation of mechanisms — experts can often articulate why a question is neglected in ways that bibliometric data cannot, because the mechanisms operate through career incentives, funding decisions, and professional norms that experts experience directly.
The strengths of expert elicitation are that it taps tacit knowledge unavailable in published sources, that it can address questions for which quantitative data do not yet exist, and that it produces findings expressed in the working vocabulary of the relevant research community. The methods have been validated extensively in health-research priority-setting and in environmental horizon-scanning, with track records that allow assessment of their predictive value.
The limitations are substantial and the field must take them seriously. Expert panels reflect the composition of the expert community, which is itself shaped by the same attention patterns the field is trying to study. Asking established researchers in a discipline what their discipline neglects is a useful exercise, but it is also a limited one, because the questions most invisible to a discipline are precisely those that its trained members are least likely to identify. The corrective is to recruit panels that include scholars from adjacent disciplines, practitioners who use the field’s outputs, members of affected communities, and what the priority-setting literature calls “outsiders within” — scholars whose institutional position gives them a particular vantage on the field’s blind spots.
A second limitation is that expert elicitation produces information about perceived neglect, which may or may not correspond to actual neglect. Perceptions are shaped by the salience of particular issues at the moment of elicitation, by recent controversies, by ongoing political debates, and by the framing of the elicitation procedure itself. The field must distinguish between expert claims that are well-substantiated by other evidence and expert claims that may reflect transient perceptions. The methodological standard should be that expert elicitation findings are treated as candidate hypotheses rather than as established conclusions, and that the hypotheses are tested against other methods before being incorporated into the field’s claims.
A third limitation is specific to constituency-less questions (in the taxonomy of Paper 1). Expert elicitation can only solicit input from those who are present to provide it, which means that questions whose affected parties are future generations, the cognitively impaired, animals, or others unable to articulate their concerns cannot be reached by standard expert procedures. The field will need to develop methods for representing the interests of absent parties — drawing on analogous problems in policy analysis, ethics, and law — without simply substituting the researcher’s preferences for the constituency’s. This is an open methodological problem that the field will not solve quickly.
4. Historical and Archival Recovery
The third methodological domain is historical recovery: the systematic investigation of research programs, questions, and lines of inquiry that were active at earlier periods and have since been abandoned, suppressed, or forgotten. The methods are those of intellectual history, archival research, and the history of science, applied to the specific question of identifying recoverable knowledge.
The category most directly served by historical methods is abandoned research programs (in the Paper 1 taxonomy). For each such program, the recovery work involves documenting what was known when the program was active, why it was abandoned, what subsequent developments bear on whether the abandonment was warranted, and what the costs and benefits of revival would be. The work draws on published literatures, on archival materials in institutional libraries and personal papers, on oral histories with retired scholars, and on the technical literatures of the eras in question.
The strengths of historical methods are that they produce detailed accounts that allow careful assessment of particular cases, that they can identify recoverable knowledge that would be invisible to other methods, and that they connect the field to the established traditions of the history of science and intellectual history. Several of the field’s most plausible early applied successes will likely come from historical recovery, because the work of documenting a previously active program is more tractable than the work of building a new one and yields clearer recommendations.
The limitations are characteristic of historical methods in general. The work is labor-intensive and does not scale easily, which means that historical recovery can address only a small number of cases at a time and must be selective about which cases to pursue. The work depends on archival access, which is uneven and sometimes politically constrained. The judgment about whether an abandoned program was abandoned legitimately requires substantive expertise in the program’s subject matter, which means that historical recovery in neglect studies will typically require collaboration between historians and domain specialists.
A particular methodological hazard for the field is the temptation toward what might be called recovery enthusiasm — the tendency to find that abandoned programs were abandoned wrongly, because the scholar who has invested in recovering them has incentives to argue for their merit. The corrective is procedural: serious recovery work must include explicit assessment of the case for the abandonment being correct, must consult skeptics of the program’s revival, and must reach conclusions that are defensible against the strongest available criticism rather than against straw-man versions of it. The field should reward recovery studies that conclude against revival as readily as those that conclude for it, because the former are evidence of methodological discipline and the latter are professionally easier to produce.
5. Comparative Analysis
Comparative methods examine patterns of attention across different research systems, disciplines, eras, or institutional contexts, with the aim of identifying which patterns are general features of scholarly inquiry and which are contingent on particular conditions. The relevant comparisons include cross-national (research portfolios in different countries with different funding systems and disciplinary structures), cross-disciplinary (how the same underlying question is treated in different fields), cross-temporal (how attention to a topic has varied over time), and cross-institutional (how funding agencies, journal portfolios, or university programs differ in their coverage).
For neglect studies, comparative analysis serves a specific evidentiary function: it allows the field to distinguish necessary features of scholarly attention from contingent ones. A topic that is neglected in one national research system but actively studied in another is unlikely to be neglected because of any intrinsic feature of the topic; the neglect must be attributable to features of the system, and the comparison provides leverage for identifying those features. A topic that is neglected across all systems is more likely to face structural barriers that are general to the scholarly enterprise. The comparison does not by itself establish that the neglect is warranted or unwarranted, but it constrains the space of possible explanations.
The strengths of comparative analysis are that it disciplines causal claims about the mechanisms of neglect, that it identifies natural experiments in research-system design that the field can learn from, and that it connects neglect studies to the comparative-policy literatures in adjacent fields. The methods are well-developed in political science and policy studies and require only modest adaptation for neglect-studies purposes.
The limitations are familiar to anyone who has worked with comparative data. The categories used to describe research portfolios are not standardized across systems, which means that comparisons require careful translation work. The data are often not strictly comparable in their coverage, definitions, or quality. The mechanisms identified by comparison are often correlational rather than causal, and the field must be cautious about claiming more than the comparison supports.
A specific application of comparative methods that deserves emphasis is the comparison between current research portfolios and historical ones. The history of science provides natural experiments in which different generations of scholars have addressed the same underlying questions with different priorities, methods, and frames. Comparing what was studied a century ago to what is studied now can identify both questions that have been productively retired and questions that have been dropped without adequate replacement. The historical comparison is particularly valuable because it is largely free of contemporary political stakes, which makes it useful for demonstrating the field’s methods in cases where the conclusions are less contested.
6. Counterfactual Reasoning
Counterfactual reasoning addresses the question that the other methods circle around without directly answering: what would the expected value of attention to a neglected question be if attention were redirected to it? The question is unavoidable, because the field’s interventions in research policy implicitly presuppose answers to it. A recommendation that funding be shifted toward a neglected area is a recommendation that the expected value of that shift exceeds the expected value of the alternative uses of the same resources, and the recommendation cannot be made responsibly without some attempt to estimate the magnitudes involved.
The methods available for counterfactual reasoning in neglect studies are imperfect but not negligible. Bayesian decision analysis, as developed in health-technology assessment and in the value-of-information literature, provides a framework for estimating the expected returns to research investment under uncertainty.[^2] Expert elicitation can produce probability distributions over the outcomes of hypothetical research programs, which can then be combined with cost estimates to produce expected-value calculations. Historical analogues — cases of previously neglected areas that received subsequent investment and produced documented returns — can be used to calibrate expectations about the returns to currently neglected areas. The cause-specific work in global health, on disability-adjusted life-years averted per dollar invested, provides a template for the kind of structured estimation that the field can pursue.[^3]
The strengths of counterfactual reasoning are that it forces explicit articulation of the magnitudes that the field’s policy recommendations presuppose and that it produces estimates that can be compared across cases. The estimates are subject to substantial uncertainty, but they are better than implicit assumptions about expected value that go unexamined.
The limitations are also substantial. Counterfactual estimates depend on assumptions that are themselves often poorly grounded, and the estimates can be highly sensitive to choices that are made early in the analysis. The methods work better for cases where outcomes can be measured in standard units — lives, dollars, time — than for cases where the relevant outcomes are intellectual, cultural, or otherwise difficult to quantify. Many of the most important neglected questions in the humanities and in foundational areas of the sciences fall into the latter category, and the field must develop appropriate methods for these cases without overreaching into quantification where quantification is not honest.
The field’s standard for counterfactual reasoning should be that estimates are produced transparently, with explicit statement of the assumptions on which they rest, and that they are accompanied by sensitivity analyses that show how the estimates change under plausible variations in those assumptions. Estimates that are not robust under sensitivity analysis should be reported as such, and the field should resist the temptation to convert highly uncertain estimates into single numbers that imply more confidence than the underlying analysis supports.
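The transparency standard can be illustrated with a toy expected-value calculation: an elicited probability of success, an estimated benefit conditional on success, a program cost, and a sensitivity sweep over the probability so the estimate is reported as a range rather than a single number. Every magnitude below is invented for illustration.

```python
# Toy value-of-information-style sketch: expected net value of funding
# a candidate program, with a sensitivity sweep over the elicited
# probability of success. All magnitudes are illustrative assumptions.

def expected_net_value(p_success, benefit_if_success, cost):
    """Expected benefit minus cost, under a single elicited probability."""
    return p_success * benefit_if_success - cost

def sensitivity_sweep(benefit_if_success, cost, p_values):
    """Expected net value across a range of plausible probabilities,
    so the estimate is shown as a range, not a point."""
    return {p: expected_net_value(p, benefit_if_success, cost)
            for p in p_values}

# Illustrative inputs: $10M program cost, $100M benefit if it succeeds.
sweep = sensitivity_sweep(benefit_if_success=100.0, cost=10.0,
                          p_values=[0.05, 0.10, 0.20, 0.40])
for p, env in sweep.items():
    print(f"p={p:.2f}: expected net value = {env:+.1f} ($M)")
```

Here the sign of the estimate flips within the plausible probability range, which is exactly the kind of non-robustness the standard requires be reported rather than averaged away.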
7. Qualitative and Ethnographic Methods
The methods discussed so far address the patterns and magnitudes of neglect; qualitative and ethnographic methods address its mechanisms and lived experience. Interviews with scholars about questions they have considered and not pursued, ethnographic study of how research priorities are set in funding agencies and laboratories, narrative reconstruction of how research programs end, and discourse analysis of how disciplines describe their own legitimate questions all produce findings that quantitative methods cannot reach.[^4]
For neglect studies, qualitative methods are particularly important for the sensitive-questions category (in Paper 1’s taxonomy). The mechanisms by which scholars decline to pursue politically or commercially sensitive questions operate through professional networks, informal communications, and tacit understandings that do not appear in published sources or in formal funding records. The only way to document these mechanisms with any depth is through careful interviewing with researchers who have either pursued such questions and paid the costs or considered pursuing them and declined.
The strengths of qualitative methods are that they capture mechanisms invisible to other approaches, that they produce findings expressed in the participants’ own categories and vocabulary, and that they generate hypotheses that other methods can then test at larger scale. The interviews-with-scholars genre in particular has been productive in the history-of-science literature and can be adapted to neglect-studies purposes with limited modification.
The limitations include all the familiar concerns about qualitative work: limited generalizability, dependence on the interpretive skills of the researcher, vulnerability to selection effects in who agrees to be interviewed, and difficulty in establishing the field-wide patterns to which individual interview material speaks. The corrective is the same as for the other methods: triangulation. Interview findings should be treated as hypotheses about mechanisms that other methods can examine at larger scale, rather than as conclusions in themselves.
A specific methodological problem deserves explicit treatment. Interviews about why scholars do not pursue particular questions are vulnerable to several kinds of distortion. Scholars may have post-hoc rationalizations for choices that were originally made for less articulable reasons. Scholars may overstate or understate the role of professional costs depending on their current institutional position. Scholars may be reluctant to acknowledge that they have declined to pursue questions for prudential reasons. The methodological standard should require that interview findings be triangulated against documentary evidence — funding records, publication patterns, departmental decisions — wherever possible, and that the limitations of the interview method be acknowledged explicitly in any findings that rest substantially on it.
8. Triangulation and the Tiered Standard of Evidence
The argument of the preceding sections is that no single method is adequate to support rigorous claims about neglect and that the field’s standards should require triangulation across methods. The argument now needs to be made more specific: what does triangulation mean in practice, and what standard of evidence should the field require for claims of different kinds?
The proposal is for a tiered standard with three levels.
The first tier is exploratory. A claim at this level is supported by one method — typically bibliometric mapping or expert elicitation — and is offered as a candidate hypothesis for further investigation. Exploratory claims should be clearly labeled as such, should not form the basis for policy recommendations, and should be understood as identifying cases that warrant additional methodological work rather than as establishing that the cases are genuinely neglected.
The second tier is substantiated. A claim at this level is supported by at least two independent methods that produce consistent findings — for example, a bibliometric analysis showing apparent neglect combined with expert elicitation confirming the perception, or historical recovery documenting an abandoned program combined with substantive assessment that the abandonment was unwarranted. Substantiated claims can support cautious policy recommendations, with explicit acknowledgment of the limitations of the methods used.
The third tier is rigorous. A claim at this level is supported by multiple methods spanning different methodological domains — quantitative mapping, expert elicitation, historical or comparative analysis, and where appropriate counterfactual estimation — that produce findings consistent with one another. Rigorous claims can support substantive policy recommendations and can be cited in subsequent work without further methodological caveats, though the underlying methods and their limitations should remain documented.
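The three tiers can be expressed as a simple decision rule over the set of methodological domains supporting a claim. The domain names and the exact form of the rule here are a hypothetical rendering of the standard for illustration, not a specification of it.

```python
# Sketch of the tiered standard as a decision rule. Domain names and
# thresholds are a hypothetical rendering, not a fixed specification.

QUANTITATIVE = {"bibliometric", "counterfactual"}
INTERPRETIVE = {"expert_elicitation", "historical", "comparative",
                "qualitative"}

def evidence_tier(supporting_domains):
    """Classify a neglect claim by the independent method domains
    that support it with mutually consistent findings."""
    domains = set(supporting_domains)
    spans_both = bool(domains & QUANTITATIVE) and bool(domains & INTERPRETIVE)
    if len(domains) >= 3 and spans_both:
        return "rigorous"
    if len(domains) >= 2:
        return "substantiated"
    if len(domains) == 1:
        return "exploratory"
    return "unsupported"

print(evidence_tier(["bibliometric"]))                        # exploratory
print(evidence_tier(["bibliometric", "expert_elicitation"]))  # substantiated
```

The rule encodes only the counting logic; the harder judgment, whether findings across methods are genuinely consistent and genuinely independent, remains a substantive one.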
The tiered standard is intended to do two things. The first is to give the field a vocabulary for distinguishing among claims of different evidentiary status, which is necessary if the field’s outputs are to be usable by readers who are not themselves methodological specialists. The second is to create professional incentives for methodologically careful work, by giving the field’s most consequential standing to claims that have undergone the most scrutiny. The tiered standard should be implemented in the field’s eventual flagship journal, in its handbook, and in the training curricula that Paper 6 will address.
9. Risks Specific to Neglect-Studies Methodology
Three methodological risks are specific enough to the field that they deserve explicit treatment.
The first is circularity. The field’s claims about which questions are neglected rest in part on assessments of which questions are important, and assessments of importance are themselves shaped by the same attention patterns the field is trying to study. A topic that has been neglected for a generation will not be widely recognized as important, and a methodologically naive analysis may conclude that it is appropriately neglected because no one is arguing that it is important. The corrective is to ground importance judgments in evidence that is partially independent of current scholarly attention — disease burden, expressed public concern, theoretical centrality in cognate fields that have not neglected the question, historical evidence that earlier generations considered the question important. None of these is fully independent of attention patterns, but the combination of several partially independent grounds is more robust than reliance on any one.
The second is motivated identification. Scholars who study neglect have professional incentives to find it, because identifying neglected questions is the field’s central output. The incentive structure creates a risk parallel to the file-drawer problem in experimental research: cases of apparent neglect that turn out on examination to be appropriate deprioritization may be less likely to be reported than cases that confirm the neglect hypothesis. The corrective is procedural: the field’s professional standards should require that scholars report cases they have investigated and concluded not to be neglected, and the field’s flagship outlets should publish such cases on terms comparable to those for positive findings. This is a difficult standard to maintain in practice, because negative findings are inherently less interesting to readers, but the field’s long-term credibility depends on it.
The third risk is specific to constituency-less questions. The field’s interest in such questions requires it to make judgments on behalf of parties who are not present to make judgments for themselves, and the line between representing absent constituencies and substituting the researcher’s preferences for theirs is genuinely difficult to draw. The corrective is partial and procedural: scholars working on constituency-less cases should be explicit about which parties they take themselves to be representing, should articulate the grounds on which they take the representation to be legitimate, should consult adjacent constituencies and proxies wherever possible, and should subject their conclusions to scrutiny from scholars whose underlying commitments differ. The procedure does not eliminate the difficulty, but it makes the difficulty visible, which is a precondition for handling it responsibly.
10. Conclusion
This paper has proposed that neglect studies should be methodologically pluralist by necessity, drawing on bibliometric mapping, expert elicitation, historical recovery, comparative analysis, counterfactual reasoning, and qualitative methods. It has argued that the pluralism must be disciplined by triangulation across methods and by a tiered standard of evidence that distinguishes exploratory, substantiated, and rigorous claims. It has identified three risks specific to the field’s methodology — circularity, motivated identification, and the representation of absent constituencies — and proposed procedural correctives for each.
What remains undone after this paper is the technical work of operationalizing each method in detail, of developing standard procedures for triangulation, and of validating the tiered standard against actual cases. That work belongs in subsequent methodological papers, in the field’s handbook, and in the methodological training that any serious doctoral program in neglect studies would need to provide.
The methodological agenda outlined here is demanding, and the field will face pressure to relax it. The pressure will come from scholars who have invested in particular cases and want to advance them without the methodological scaffolding the standards require, from policy audiences who want clear recommendations on faster timelines than careful triangulation allows, and from the field’s own institutional pressures to produce visible outputs. Resisting that pressure is the central professional discipline that neglect studies will need to maintain. The field’s claim on serious attention rests on its methods being more rigorous than the impressionistic gap-claiming that characterizes much existing literature on research priorities, and the moment that distinction collapses, the field’s reason for existing collapses with it.
Paper 3 takes up the academic infrastructure — programs, journals, conferences, curricula — within which methodologically serious neglect studies could be sustained.
Notes
[^1]: The Delphi method has a substantial methodological literature dating to the original RAND Corporation work in the 1950s and 1960s; Linstone and Turoff (2002) is the standard reference. The James Lind Alliance methodology is documented in Cowan and Oliver (2021). Horizon-scanning methods are reviewed in Sutherland and Woodroof (2009) for ecology and conservation, with parallel applications in other fields.
[^2]: The value-of-information literature in health technology assessment has produced extensive methodology for estimating the expected returns to research investment under uncertainty; see Claxton et al. (2002) for a foundational treatment and Steuten et al. (2013) for a more recent review.
[^3]: The disability-adjusted life-year framework, developed in the Global Burden of Disease project, provides a template for the kind of structured comparison across health conditions that neglect studies will need to develop for its own purposes; see Murray et al. (2012) for the underlying methodology.
[^4]: The ethnographic study of scientific practice has a substantial tradition in science and technology studies; Latour and Woolgar (1986) and Knorr-Cetina (1999) are the foundational monographs, with subsequent work in many directions. The specific application to research-priority-setting in funding agencies is less developed; Lamont (2009) on peer review in the social sciences is among the most relevant adjacent studies.
References
Claxton, K., Sculpher, M., & Drummond, M. (2002). A rational framework for decision making by the National Institute for Clinical Excellence. The Lancet, 360(9334), 711–715.
Cowan, K., & Oliver, S. (2021). The James Lind Alliance guidebook (Version 10). James Lind Alliance.
Knorr-Cetina, K. (1999). Epistemic cultures: How the sciences make knowledge. Harvard University Press.
Lamont, M. (2009). How professors think: Inside the curious world of academic judgment. Harvard University Press.
Latour, B., & Woolgar, S. (1986). Laboratory life: The construction of scientific facts (2nd ed.). Princeton University Press.
Linstone, H. A., & Turoff, M. (Eds.). (2002). The Delphi method: Techniques and applications. Addison-Wesley. (Original work published 1975)
Murray, C. J. L., Ezzati, M., Flaxman, A. D., Lim, S., Lozano, R., Michaud, C., Naghavi, M., Salomon, J. A., Shibuya, K., Vos, T., Wikler, D., & Lopez, A. D. (2012). GBD 2010: Design, definitions, and metrics. The Lancet, 380(9859), 2063–2066.
Steuten, L., van de Wetering, G., Groothuis-Oudshoorn, K., & Retèl, V. (2013). A systematic and critical review of the evolving methods and applications of value of information in academia and practice. PharmacoEconomics, 31(1), 25–48.
Sutherland, W. J., & Woodroof, H. J. (2009). The need for environmental horizon scanning. Trends in Ecology & Evolution, 24(10), 523–527.
