Data Infrastructure: Mapping the Negative Space


Executive Summary

This paper addresses the data infrastructure that neglect studies will require to function as a mature field. The premise is that a discipline whose central business is the systematic identification of underexplored questions cannot proceed on the basis of impressionistic claims alone; it requires standing data resources that allow scholars to map the distribution of attention across the research landscape, to identify candidate cases of neglect with defensible empirical grounding, and to track changes over time. The infrastructure must serve both the field’s internal research agenda and its applied engagement with research-governing institutions, and it must do so under constraints — coverage biases in existing databases, the difficulty of measuring absence rather than presence, the ethical complications of qualitative data — that the field shares with adjacent enterprises but that it must address with its own resources.

The paper develops six proposals. The first is a standing public dashboard of research distribution, drawing on existing bibliometric infrastructure but inverting the standard analytical questions to produce maps of negative space. The second is a structured registry of neglected questions, building on the proposal introduced in Paper 3 but specified here in greater detail as a data resource. The third is an archive of abandoned research programs, preserving documentation that would otherwise be lost as founding scholars retire and institutional records are discarded. The fourth is a set of common data elements — shared definitions, instruments, and codebooks — that allow studies conducted by different scholars in different settings to produce findings that aggregate meaningfully. The fifth is a federated data-access model that respects the privacy, intellectual property, and political constraints under which different data sources operate. The sixth is a qualitative archive of oral histories, interviews, and ethnographic materials that preserve the tacit knowledge of scholars whose careers have included engagement with neglected questions.

The paper concludes with a discussion of governance, sustainability, and the relationship between the field’s data infrastructure and the broader open-science movement on which much of it will depend.


1. Introduction

The methodological argument of Paper 2 was that no single method is adequate to support rigorous claims about neglect and that the field’s standards should require triangulation across methods. The institutional argument of Paper 3 was that the field requires academic infrastructure within which methodologically serious work can be conducted. This paper takes up what stands between them: the data resources on which methodologically serious work depends.

The argument can be stated simply. Bibliometric methods require databases. Expert elicitation requires panels and protocols. Historical recovery requires archives. Comparative analysis requires harmonized data across systems. Counterfactual reasoning requires structured estimates and the documentation of their assumptions. Qualitative methods require interview transcripts, field notes, and the apparatus for preserving them ethically. None of this infrastructure exists in a form designed for neglect-studies purposes, and the field’s capacity to produce credible findings depends on building it.

The argument is complicated by three features of the field’s situation. The first is that the data infrastructure required is partly continuous with existing scholarly resources — the standard bibliometric databases, the institutional archives of universities and funders, the established protocols of qualitative research — and partly distinctive, in that it must support analyses of absence rather than presence. The continuity allows the field to draw on substantial existing investment; the distinctiveness means that the field cannot simply inherit existing infrastructure and must adapt or build what is missing. The second is that data infrastructure is expensive to build and to maintain, and the field’s funding situation (Paper 4) makes choices about which resources to prioritize consequential. The third is that the data the field needs is held in many different places under many different governance arrangements, and the work of integration is at least as substantial as the work of original data collection.

The paper proceeds through six proposals, organized roughly by the maturity of the underlying infrastructure. The dashboard and the registry can be built relatively quickly on existing foundations. The abandoned-programs archive and the qualitative archive require more original work. The common data elements and the federated access model are coordination problems whose solutions depend on the field’s relationships with adjacent enterprises rather than on its own resources alone.

2. A Standing Dashboard of Research Distribution

The first proposal is a publicly accessible dashboard that visualizes the distribution of research attention across topics, populations, regions, methods, and disciplines, with the structure of the dashboard designed to make patterns of apparent neglect visible.

The technical foundation already exists. Open bibliometric databases — OpenAlex is the most ambitious, with several others operating on smaller scales — provide structured data on publications, citations, authors, and institutional affiliations at the scale required.[^1] The major commercial databases (Web of Science, Scopus, Dimensions) provide comparable data with different coverage profiles and licensing conditions. Topic-modeling and clustering tools developed in computational social science allow the data to be organized into meaningful categories.[^2] Visualization platforms developed in the broader open-data movement provide the user-facing layer.

What is missing is the integration of these resources into a single tool designed for neglect-studies questions. The standard bibliometric tools are designed to map what is being studied; the proposed dashboard would invert the standard analytical move to map what is conspicuously absent given the structure of adjacent literatures, the distribution of relevant external indicators, or historical baselines.

The dashboard’s design should include several specific features. It should allow users to specify a topic, discipline, or research area and to receive structured visualizations of how attention to that area has been distributed over time, across regions, across institutional types, and across methodological approaches. It should allow users to compare the distribution of attention to the distribution of external indicators of importance — disease burden for medical topics, environmental exposure for environmental topics, demographic significance for population-related topics — with explicit acknowledgment of the imperfections of the comparison. It should allow users to identify topics within a discipline that have unusual patterns of attention given the structure of the surrounding literature, with the unusualness flagged as a candidate for further investigation rather than as a conclusion. And it should allow users to track changes in attention patterns over time, both to identify areas where attention has been declining and to identify areas where previously neglected questions have begun to receive sustained attention.
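The comparison of attention to external indicators can be illustrated with a minimal sketch; the topic labels, function names, and the 0.5 threshold below are hypothetical illustrations, and the output is a list of candidates for further investigation, not a finding of neglect.

```python
# Minimal sketch: compare the distribution of research attention to an
# external indicator of importance. Topic names and the 0.5 threshold
# are illustrative assumptions, not proposals.

def attention_ratios(pub_counts, indicator):
    """Ratio of each topic's share of publications to its share of an
    external indicator (e.g. disease burden)."""
    total_pubs = sum(pub_counts.values())
    total_ind = sum(indicator.values())
    return {
        topic: (pub_counts[topic] / total_pubs) / (indicator[topic] / total_ind)
        for topic in pub_counts
    }

def flag_candidates(ratios, threshold=0.5):
    """Topics whose attention falls well short of the indicator are
    flagged as *candidates* for investigation, never as conclusions."""
    return sorted(t for t, r in ratios.items() if r < threshold)

# Example: topic "B" receives 10% of publications but accounts for half
# of the indicator, so it is flagged as a candidate.
ratios = attention_ratios({"A": 900, "B": 100}, {"A": 500, "B": 500})
```

Here `flag_candidates(ratios)` returns `["B"]`, and the interface would be expected to display the methodological caveats alongside any such flag.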

The dashboard should not produce conclusive identifications of neglect. The tiered evidence standard of Paper 2 places dashboard findings at the exploratory level, as candidate hypotheses for further investigation rather than as substantiated claims. The dashboard should communicate this status to its users explicitly, with documentation of the limitations of the underlying data, the assumptions built into the visualizations, and the methodological caveats that apply to the patterns it displays. The communication is important: a tool that produces visually compelling representations of apparent neglect will be used by audiences who have not read the methodological caveats, and the dashboard’s credibility depends on its design making the limitations as visible as the findings.

The governance of the dashboard requires deliberate attention. The technical operation can reasonably be hosted by one of the founding centers, but the editorial decisions — which databases to include, how to structure topic classifications, how to handle the coverage biases of the underlying data — affect what users see and should be made through a process that includes diverse perspectives within the field. A standing editorial committee, with rotating membership and explicit responsibility for the dashboard’s accuracy and balance, is the appropriate structure. The committee should publish its methodological decisions, should respond to documented errors in the dashboard’s outputs, and should commission periodic external audits of the dashboard’s design.

The sustainability of the dashboard is the most difficult question. Open data resources of comparable scale have generally been funded through some combination of foundation support, institutional underwriting, and modest revenue from premium access for institutional users. The dashboard’s funding model should be settled at its launch and should include explicit provisions for what happens if any of its funding sources is withdrawn. A dashboard that ceases operation after five years because its initial grant ends would damage the field’s credibility more than no dashboard at all, and the funding planning should accordingly be conservative.

3. A Registry of Neglected Questions

The registry of neglected questions was introduced in Paper 3 as a companion publication outlet to the flagship journal. This section specifies its design as a data resource.

The registry’s central function is to provide a structured, citable, and updatable record of cases in which scholars have identified questions, populations, methods, or areas as neglected, with the documentation appropriate to the tier of evidence on which the identification rests. Each registry entry should include: the question or area identified; the discipline or disciplines to which it belongs; the category of neglect (in the Paper 1 taxonomy) to which the identification primarily applies; the methodological approach used to substantiate the identification; the evidence tier (exploratory, substantiated, rigorous) at which the identification stands; the scholars responsible for the identification; the date of submission and the dates of subsequent updates; pointers to relevant literature; and an open field for commentary, including the documentation of subsequent work that has either substantiated or revised the original identification.
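The entry structure above can be sketched as a structured, updatable record; the field names, types, and method name below are illustrative assumptions rather than a finalized schema.

```python
# Minimal sketch of a registry entry as a structured, updatable record.
# Field names and types are illustrative assumptions, not a schema.

from dataclasses import dataclass, field
from enum import Enum

class EvidenceTier(Enum):
    EXPLORATORY = "exploratory"
    SUBSTANTIATED = "substantiated"
    RIGOROUS = "rigorous"

@dataclass
class RegistryEntry:
    question: str
    disciplines: list
    neglect_category: str       # from the Paper 1 taxonomy
    method: str                 # approach used to substantiate
    tier: EvidenceTier          # displayed prominently to users
    contributors: list
    submitted: str              # ISO date of submission
    updates: list = field(default_factory=list)     # (date, note) pairs
    literature: list = field(default_factory=list)  # pointers to sources
    commentary: str = ""        # open field, including dissents

    def revise(self, date, note, new_tier=None):
        """Entries are updatable: record a revision and, where the new
        evidence warrants it, move the entry to a different tier."""
        self.updates.append((date, note))
        if new_tier is not None:
            self.tier = new_tier
```

The updatable-record design is what distinguishes the registry from a journal article: a `revise` call preserves the entry's history rather than replacing it.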

The registry serves several functions that distinguish it from a journal. It supports cumulative knowledge in cases where individual contributions are too small to support full articles but where the collective contribution is substantial. It allows updating, so that a registry entry can be revised as new evidence emerges or as subsequent work changes the picture. It allows linking, so that an entry can document the relationships among related cases of neglect rather than treating each in isolation. And it produces a public-facing record that funders, learned societies, and policy bodies can consult, in a form that is more accessible to non-specialists than the journal literature alone.

The quality control problem is the same one identified in Paper 3 and bears restating here in the context of the data design. An open submission system will receive contributions that do not meet the field’s standards, and a registry that becomes a repository for unsubstantiated claims will damage the field’s credibility more comprehensively than the dashboard would, because the registry’s structured format gives entries a status that visualizations of bibliometric data do not. The proposed quality control combines three elements. First, submissions must include documentation appropriate to the evidence tier claimed, and the documentation requirements should be specified explicitly in the registry’s submission guidelines. Second, an editorial committee should review submissions before publication, with the review focused on whether the documentation supports the tier claimed rather than on the substantive merit of the identification. Third, the registry interface should display the evidence tier prominently alongside each entry, so that users can calibrate their interpretation accordingly.

A specific design question concerns the registry’s handling of contested cases. Some identifications of neglect will be contested either at the time of submission or subsequently, with different scholars reaching different conclusions about whether a particular case is genuinely neglected or appropriately deprioritized. The registry should accommodate such cases by allowing entries to include linked dissenting opinions, with the dissents documented to the same standards as the original entries. The structure communicates to users that the field’s claims are open to revision and that the registry preserves rather than suppresses methodological disagreement. The structure also creates a venue for the kind of structured argumentative engagement that produces the field’s most rigorous work over time.

The relationship between the registry and the flagship journal requires attention. The two are complementary rather than redundant: the journal publishes the methodological development, the case studies, and the field-wide analyses that contextualize particular registry entries, while the registry preserves the structured record of identifications themselves. A scholar who has identified a neglected question and produced a journal article about it should be expected to deposit a corresponding registry entry, both to make the identification accessible to subsequent scholars and to allow the entry to be updated as the case develops over time.

4. An Archive of Abandoned Research Programs

The third proposal is an archive specifically devoted to abandoned research programs — the category of neglect introduced in Paper 1 and developed methodologically in Paper 2. The case for the archive rests on the observation that the documentation of abandoned programs is unusually vulnerable to loss. When a research program is active, its documentation accumulates in journal literatures, institutional records, and the working files of active researchers. When the program is abandoned, its journal literature becomes increasingly difficult to locate, institutional records are discarded according to standard retention schedules, and the working files of researchers are lost when those researchers retire, die, or move on to other work. The window for recovering the documentation of an abandoned program closes within a generation or two of the abandonment, and once it closes the recovery becomes substantially harder.

The archive should be conceived as a federated rather than a centralized resource. Centralizing the physical collections of multiple abandoned programs at a single institution would be impractical and probably inappropriate, since the materials are often located at the institutions where the work was conducted and where local custodial expertise exists. The federated model treats the archive as a directory and integration layer that points to the holdings of multiple host institutions, that ensures consistent metadata across the holdings, and that supports cross-collection searching and analysis.

The archive’s metadata standard should include: the research program identified; the period of its active operation; the disciplinary location of the work; the reasons for abandonment, as documented in available sources; the institutions and individuals associated with the program; the holdings related to the program (publications, archival materials, working files, datasets, instruments) and the institutions where they are located; the contact information for the custodians of those holdings; and pointers to subsequent work on the program, whether by historians, by neglect-studies scholars, or by researchers attempting revival.
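The metadata standard can be sketched as a required-fields check that a partner institution might run against a submitted record; the field names below are illustrative assumptions.

```python
# Minimal sketch of the archive's metadata standard as a required-fields
# check. The field names are illustrative assumptions.

REQUIRED_FIELDS = {
    "program",               # name of the research program
    "active_period",         # e.g. ("1951", "1974")
    "disciplines",           # disciplinary location of the work
    "abandonment_reasons",   # as documented in available sources
    "people_institutions",   # individuals and institutions involved
    "holdings",              # (holding type, holding institution) pairs
    "custodian_contacts",    # contacts for the holding institutions
    "subsequent_work",       # pointers to later scholarship or revival
}

def missing_fields(record):
    """Required fields absent from a submitted metadata record, so a
    partner institution can see what a submission still needs."""
    return REQUIRED_FIELDS - set(record)
```

A consistent check of this kind is what makes cross-collection searching possible in a federated design, since the directory layer can rely on every participating record carrying the same fields.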

A specific component of the archive should be devoted to the oral histories of scholars whose careers included involvement in abandoned programs. The oral-history component is discussed in greater detail in section 7 below, but the connection to the abandoned-programs archive deserves note here. Many abandoned programs have living participants whose tacit knowledge would be lost without deliberate preservation, and the cost-benefit calculation for oral-history work strongly favors undertaking that work while the participants are still available rather than waiting until the field’s other priorities are settled.


The archive should be developed in stages, beginning with a small number of well-documented cases that the founding centers identify as priorities. The first decade’s work should aim for perhaps twenty to forty programs documented to the standard described above, with the cases selected for their illustrative value, their historical interest, and the availability of participants for oral-history work. The selection process should be transparent, with the criteria published and the choices documented, so that subsequent scholars can both build on the early work and contribute additional cases through whatever scholarly community develops around the archive.

The relationship between the archive and the existing infrastructure for the history of science deserves explicit treatment. Several major repositories — the American Institute of Physics’s Niels Bohr Library, the Wellcome Trust’s archives, the Charles Babbage Institute, and others — already preserve materials related to scientific work, including some materials related to programs that were subsequently abandoned. The proposed archive should not duplicate this work but should integrate with it, providing the metadata layer that allows existing holdings to be discovered and used for neglect-studies purposes. The integration work requires collaboration with the existing repositories and respect for their custodial responsibilities, and the archive’s design should make the collaboration straightforward rather than imposing additional burdens on partners whose own missions are different.

5. Common Data Elements

The fourth proposal addresses a coordination problem rather than a single resource. The methodological pluralism of the field (Paper 2) means that studies using different methods will produce findings whose aggregation requires shared definitions, instruments, and codebooks. Without such shared elements, the field’s outputs will be a collection of one-off studies whose findings cannot be combined to produce field-wide understanding, and the cumulative-knowledge function that distinguishes a mature field from a collection of individual scholars will be defeated.

The common data elements (CDEs) that the field requires fall into several categories.

The first is shared definitions for the categories of neglect introduced in Paper 1. The taxonomy proposed there — orphan topics, abandoned research programs, methodologically inaccessible questions, interstitial questions, peripheral inquiries, sensitive questions, and constituency-less questions — provides a starting vocabulary, but the categories require operationalization before they can be applied consistently across studies. The operationalization work should produce definitions specific enough to allow different scholars to classify cases the same way, while remaining flexible enough to accommodate the variation across cases that the categories are intended to capture. The work should be conducted through the kind of consensus-building process that Paper 1 introduced and that subsequent papers will need to specify further.

The second is shared instruments for the methodological approaches of Paper 2. Expert elicitation in particular benefits substantially from standardized protocols, both because the protocol design affects the findings in well-documented ways and because comparison across studies depends on the protocols being similar enough to allow meaningful comparison. The James Lind Alliance methodology provides a partial model,[^3] but the model is specific to health-research priority-setting and requires adaptation for the broader range of cases that neglect studies must address. Similar standardization work is needed for the documentation requirements of bibliometric mapping, for the protocols of historical recovery, and for the structure of counterfactual estimation.

The third is shared codebooks for the variables that recur across studies. Bibliometric studies routinely use measures of citation, of collaboration, of geographic distribution, and of disciplinary classification, and the comparison across studies depends on the measures being defined consistently. The existing literature on bibliometric methods provides considerable standardization, but neglect-studies applications may require extensions — for example, measures of attention to particular categories of questions, measures of the rate of change in attention over time, measures of the cross-disciplinary scope of attention — that are not yet standardized.
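One such codebook measure, the rate of change in attention over time, can be sketched as follows; the measure shown (mean year-over-year change in annual publication counts) is a deliberately crude illustration of what a shared definition would standardize, not a proposed standard.

```python
# Minimal sketch of one codebook measure: the rate of change in
# attention, taken here as the mean year-over-year change in annual
# publication counts. A crude illustration, not a proposed standard.

def attention_trend(counts_by_year):
    """Mean year-over-year change in publication counts; negative
    values mark declining attention as a candidate for investigation."""
    years = sorted(counts_by_year)
    deltas = [counts_by_year[b] - counts_by_year[a]
              for a, b in zip(years, years[1:])]
    return sum(deltas) / len(deltas)
```

The point of standardizing even a measure this simple is that two studies reporting "declining attention" would otherwise be using incomparable definitions of decline.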

The development of CDEs is itself a substantial scholarly activity that requires its own resources. The work should be coordinated through one of the founding centers, with input from the field’s broader community through the structures (journal, conferences, registry) that Paper 3 established. The CDE work should be conducted transparently, with proposed elements published for community comment before adoption, and the adopted elements should be revised periodically as the field’s experience accumulates.

A specific consideration is the relationship between the field’s CDEs and those developed in adjacent fields. The CDE work in clinical research, in social science research, and in metascience has produced substantial infrastructure that the field can adapt rather than building from scratch.[^4] The adaptations require domain-specific work but the underlying methodological standards translate well, and the field’s CDE development should proceed in conversation with the adjacent efforts rather than in isolation from them.

6. Federated Access and Data Governance

The fifth proposal addresses the data-governance challenges that arise when the field’s research requires access to data held by multiple parties under multiple governance arrangements. The challenges are well documented in adjacent fields — health research, social-science survey data, education research — and the field can draw on established models rather than developing new approaches.[^5]

The relevant data types include: bibliometric data held by commercial and open providers under varying licensing terms; funding data held by agencies under varying disclosure policies; institutional data held by universities under privacy and competitive considerations; qualitative data held by individual scholars and subject to research-ethics review; and personal data — including the identities of scholars who have made career choices around particular questions — subject to the strictest protections.

The federated access model treats the data as remaining under the control of its original custodians, with access provided through standardized interfaces that allow analyses to be conducted without the data being transferred. The model is more complex to implement than centralized data deposits but has corresponding advantages: it preserves the data custodians’ control over their data, it accommodates the varying governance arrangements under which different data are held, and it allows analyses to be conducted on data that could not be released for centralization under any plausible governance arrangement.

The implementation of federated access requires technical infrastructure (standardized query interfaces, audit logs, computational environments that allow analysis without data transfer) and governance infrastructure (data-sharing agreements, access procedures, dispute-resolution mechanisms). Both have been developed in adjacent fields and the field’s implementation should draw on those models. The technical infrastructure can reasonably be hosted by one of the founding centers, with the governance infrastructure developed through agreements with the data custodians whose holdings are most important for the field’s work.
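The pattern can be illustrated with a minimal sketch in which the data never leaves the custodian: callers submit structured queries, receive aggregates only, and every query is audit-logged. All class, method, and query names here are hypothetical illustrations, not a proposed interface.

```python
# Minimal sketch of the federated-access pattern: data remains with its
# custodian; callers submit structured queries, receive aggregates only,
# and every query is audit-logged. All names here are hypothetical.

from abc import ABC, abstractmethod

class Custodian(ABC):
    def __init__(self, name):
        self.name = name
        self.audit_log = []          # (query, requester) pairs

    def run(self, query, requester):
        """Log the query, then answer it locally with an aggregate."""
        self.audit_log.append((query, requester))
        return self._aggregate(query)

    @abstractmethod
    def _aggregate(self, query):
        """Compute an aggregate in place; raw records never leave."""

class FundingAgency(Custodian):
    def __init__(self, name, awards):
        super().__init__(name)
        self._awards = awards        # raw data, held privately

    def _aggregate(self, query):
        kind, year = query
        if kind == "count_awards_by_year":
            return sum(1 for a in self._awards if a["year"] == year)
        raise ValueError(f"unsupported query: {kind}")
```

The aggregation requirements and time delays discussed below would be enforced inside `_aggregate`, which is precisely why keeping that computation on the custodian's side matters.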

A specific governance question concerns the access of the field’s scholars to data that documents funding decisions. The funding agencies whose decision-making the field studies are also potential funders of the field’s work (Paper 4), and the access arrangements must be designed to preserve the field’s analytical independence. The arrangements that have worked in adjacent contexts include time delays between the decisions and the data release (typically five to ten years), aggregation requirements that prevent identification of individual decisions, and the involvement of the funders in the design of the access arrangements without involvement in the specific analyses that the data support. The field’s data-governance work should draw on these precedents and should be explicit about the protections that preserve analytical independence.

A separate governance question concerns the data that documents the experiences of individual scholars — interview transcripts, narrative accounts of career decisions, ethnographic field notes — that bear on the mechanisms of neglect. The data is sensitive both because it identifies individuals and because some of what it documents bears on professional reputations and institutional relationships. The governance arrangements for this category of data should follow the established standards of qualitative research ethics, with informed consent procedures that explicitly address the possibility of identification, with secure storage arrangements, and with disclosure standards that prioritize the protection of participants over the convenience of subsequent scholars. The standards are well established in adjacent fields and require adaptation rather than original development.

7. A Qualitative Archive

The sixth proposal is an archive of qualitative materials — oral histories, interview transcripts, ethnographic field notes — that document the tacit knowledge of scholars whose careers have included engagement with neglected questions. The archive’s case rests on the observation that much of the most important knowledge about how neglect operates is held by individual scholars in forms that are not preserved by ordinary scholarly publication. Scholars who have considered pursuing particular questions and decided not to, scholars who have pursued such questions and paid professional costs, scholars who have participated in the founding or the abandonment of research programs, and scholars who have served as program officers or peer reviewers whose decisions shaped the distribution of attention: all of them hold knowledge that the field’s understanding of its subject depends on, and all of them are subject to the ordinary attrition of careers and lives.

The archive should be developed in stages, with the first stage prioritizing the scholars whose careers are nearing their end and whose loss would be most consequential. The selection process should draw on multiple sources of identification: the field’s professional community, the historians of relevant disciplines, the institutional knowledge of the founding centers, and the suggestions of scholars who are themselves interviewed and who can identify others whose accounts would be valuable.

The methodology of the oral-history work should draw on the established traditions of the history of science and of qualitative research in adjacent fields.[^6] The interviews should be conducted by scholars trained in oral-history methods, should be transcribed and reviewed by the interview subjects before being archived, and should be subject to access conditions that respect the subjects’ interests in how their accounts are used. The transcripts should be deposited in repositories that can preserve them for the long term, with metadata that allows them to be discovered by subsequent scholars.

A specific consideration is the relationship between the qualitative archive and the field’s analytic work. The interview transcripts are primary sources rather than analyses, and their value to the field comes from their availability for analysis by scholars whose questions cannot be specified in advance. The archive’s design should accordingly emphasize preservation and discoverability rather than the particular analyses that motivate the initial collection. The archive should outlast the founding scholars of the field, and the materials it preserves should be available to scholars whose work the founding scholars cannot anticipate.

The qualitative archive’s relationship to research ethics deserves particular attention. The interviews will sometimes document practices and decisions that the subjects’ institutions or professional communities would prefer not to be public. The standards under which the interviews are conducted must include explicit protections for the subjects, including the option to restrict access to particular passages, to redact identifying information about third parties mentioned in the interviews, and to embargo materials for periods that the subjects specify. The standards should be developed in consultation with research-ethics committees and with the established oral-history community, and they should be documented in the archive’s published policies so that subjects can make informed decisions about participation.

8. Governance, Sustainability, and Open Science

The data infrastructure outlined above is substantial, and its governance and sustainability deserve explicit treatment.

The governance principle that runs through the proposals is that the data infrastructure should be governed by structures that include diverse perspectives within the field, that operate transparently, and that are accountable to the field’s professional community rather than to any single institution or funder. The principle requires explicit institutional design: standing committees with rotating membership, published procedures, regular reporting to the field’s community, and external review on a defined schedule. The institutional design should be settled at the founding of each data resource rather than developed in response to subsequent controversies, since governance arrangements established under pressure tend to be less stable than those established at the outset.

The sustainability of the infrastructure is the more difficult problem. Data resources have well-known patterns of decay: initial funding supports launch and early operation, but the long-term maintenance that determines whether the resource remains useful is harder to fund and often falls short. The field’s data infrastructure should be planned with sustainability in mind from the start, with explicit consideration of the operating costs after the launch funding ends, of the institutional commitments required to sustain operation, and of the contingency plans for what happens if particular funding sources are withdrawn. A data resource that becomes inaccessible or outdated after five years is sometimes worse than no resource at all, because users have come to depend on it in ways that the failure disrupts. The sustainability planning should be conservative.

The broader open-science movement provides important context for the field’s data infrastructure work. The movement has produced infrastructure (open repositories, persistent identifiers, standardized metadata, open licenses) on which the field can draw, and the movement’s standards have set the broader expectations for how scholarly data resources should operate.[^7] The field’s data infrastructure should align with open-science standards as a default, with departures from the standards justified explicitly when they are required by considerations specific to the field’s situation. The alignment serves both the field’s intrinsic commitments to the redistribution of attention — which sit poorly with restrictive access to its own outputs — and its practical interests in connecting with the broader research community.

A specific consideration is the field's potential contribution to open-science infrastructure rather than merely its consumption of it. The methodological work on negative-space analysis, the development of common data elements for the field's distinctive analyses, the qualitative archive's preservation methodology: all of these have potential applications beyond the field's own work and should be developed in ways that allow other scholars to use them. Such contributions extend the field's broader influence and provide a non-financial return on the open-science movement's support.

9. Conclusion

This paper has proposed six elements of data infrastructure for neglect studies: a dashboard of research distribution, a registry of neglected questions, an archive of abandoned research programs, common data elements, federated access arrangements, and a qualitative archive. Each rests on existing scholarly resources and requires adaptation or extension for the field’s specific purposes. Each has governance and sustainability requirements that the field’s founding scholars must address explicitly. Each connects the field to adjacent enterprises whose collaboration the field will require.

The data infrastructure cannot be built without funding (Paper 4) and without the institutional homes that Paper 3 specified. It cannot be used effectively without the methodological standards of Paper 2 and the conceptual framework of Paper 1. The interdependencies are why the series has presented the elements in their current order: each paper presupposes the ones before it, and the data infrastructure proposed here is the operational expression of the commitments that the earlier papers articulated.

The papers that follow take up the workforce that will use the infrastructure (Paper 6), the research-governing institutions whose decisions the infrastructure exists to inform (Paper 7), and the field’s identity and self-critique (Paper 8). The data infrastructure paper is in some ways the most concrete of the series, and its proposals are accordingly the most easily evaluated. The field’s eventual success will be measurable in part by whether the resources proposed here exist in working form a decade after the field’s founding, and whether they are used by scholars whose work is improved by them.


Notes

[^1]: OpenAlex, launched in 2022 as a successor to Microsoft Academic Graph, has become the most widely used open bibliometric database. Its coverage, structure, and limitations are documented in Priem, Piwowar, and Orr (2022) and in the database’s own ongoing documentation. The comparison among the major bibliometric databases is reviewed in Visser, van Eck, and Waltman (2021).

[^2]: The topic-modeling literature relevant to bibliometric analysis is reviewed in Boyack and Klavans (2014), with subsequent methodological developments documented in the broader computational social science literature.

[^3]: The James Lind Alliance methodology has been adopted with variations by priority-setting partnerships in many fields, and the variations are themselves documented (Crowe et al., 2015; Cowan & Oliver, 2021). The methodology’s specific provisions for protocol standardization are part of what has allowed comparison across partnerships.

[^4]: The clinical research literature on common data elements is the most developed; the National Institutes of Health maintains the NIH CDE Repository as a coordinating resource. Sheehan et al. (2016) provides a useful overview of the principles. Adjacent work in social science research has been developed through the Data Documentation Initiative and the Inter-university Consortium for Political and Social Research.

[^5]: The federated data-access model has been developed extensively in health research, with the Observational Health Data Sciences and Informatics network providing one well-documented implementation. Voss et al. (2015) provides an overview. Similar approaches have been developed in social science research and in education research, with the European Data Infrastructure for the social sciences offering a comparable model.

[^6]: The oral-history of science tradition is documented through the American Institute of Physics’s program, the Royal Society’s archives, and several other institutional efforts. The methodological standards are reviewed in Doel and Söderqvist (2006).

[^7]: The open-science literature has grown substantially over the past two decades; Vicente-Saez and Martinez-Fuentes (2018) provides a review of the conceptual development, and the FAIR principles articulated in Wilkinson et al. (2016) have become a widely adopted standard for data resources.


References

Boyack, K. W., & Klavans, R. (2014). Creation of a highly detailed, dynamic, global model and map of science. Journal of the Association for Information Science and Technology, 65(4), 670–685.

Cowan, K., & Oliver, S. (2021). The James Lind Alliance guidebook (Version 10). James Lind Alliance.

Crowe, S., Fenton, M., Hall, M., Cowan, K., & Chalmers, I. (2015). Patients’, clinicians’ and the research communities’ priorities for treatment research: There is an important mismatch. Research Involvement and Engagement, 1, 2.

Doel, R. E., & Söderqvist, T. (Eds.). (2006). The historiography of contemporary science, technology, and medicine: Writing recent science. Routledge.

Priem, J., Piwowar, H., & Orr, R. (2022). OpenAlex: A fully-open index of scholarly works, authors, venues, institutions, and concepts. arXiv:2205.01833.

Sheehan, J., Hirschfeld, S., Foster, E., Ghitza, U., Goetz, K., Karpinski, J., Lang, L., Moser, R. P., Odenkirchen, J., Reeves, D., Rubinstein, Y., Werner, E., & Huerta, M. (2016). Improving the value of clinical research through the use of Common Data Elements. Clinical Trials, 13(6), 671–676.

Vicente-Saez, R., & Martinez-Fuentes, C. (2018). Open science now: A systematic literature review for an integrated definition. Journal of Business Research, 88, 428–436.

Visser, M., van Eck, N. J., & Waltman, L. (2021). Large-scale comparison of bibliographic data sources: Scopus, Web of Science, Dimensions, Crossref, and Microsoft Academic. Quantitative Science Studies, 2(1), 20–41.

Voss, E. A., Makadia, R., Matcho, A., Ma, Q., Knoll, C., Schuemie, M., DeFalco, F. J., Londhe, A., Zhu, V., & Ryan, P. B. (2015). Feasibility and utility of applications of the common data model to multiple, disparate observational health databases. Journal of the American Medical Informatics Association, 22(3), 553–564.

Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., … Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018.

