
Citation Recommendations for NIAID Repositories

Overview

Clear citation instructions enable repository users to: (1) develop the practice of citing resources appropriately, (2) have precise information about what is required to provide resource attribution, and (3) understand the context of the cited resource. This page provides high-level guidance on how to promote and enable citations to datasets and data collections. These recommendations build on the guidance provided in the Blueprint for Including Digital Objects in the NIAID Data Ecosystem (Version 2), released by the NIAID Office of Data Science and Emerging Technologies (ODSET) in 2025.

 

There are five broad recommendations:

  • Use persistent identifiers (PIDs) to support persistent and consistent reference to data resources.

  • Provide clear citation instructions to end users for citing individual assets as well as the entire repository (where possible), using standard citation formats.

  • Consider metadata integrations with citation management tools that make it easy for end users to quickly import the citation metadata.

  • Make connections between related PIDs to facilitate resource discovery, transparency, and usability.

  • Measure the usage of PIDs by repository users, to assess impacts of the repository and to assess outcomes of FAIR-focused repository updates.
     

Recommendation #1: Use persistent identifiers (PIDs) to support persistent and consistent reference to data resources.

PIDs enable discrete identification of the data that people used to produce scientific findings, and provide a mechanism to directly link to the location where the data can be found. When creating and using PIDs, a few considerations are as follows:

  • Important characteristics of PIDs:

    • Persistence: does the PID have a secure organizational backing, effective governance, and a sound technical base?

    • Services: does the PID system provide useful services, such as APIs for creating and querying PIDs, and web services that expose PID metadata?

    • Resolving to useful information: does the PID point to a web page that provides useful information about the resource being identified?

  • PIDs should resolve to a landing page that provides access to the resource being cited, as well as additional metadata about the resource.

  • Please review the Blueprint document for guidance on which types of PIDs are recommended for specific types of entities.
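As an illustration of consistent PID handling, the sketch below normalizes the different ways a DOI is commonly written (bare, with a "doi:" prefix, or as a resolver URL) into one canonical form and builds the https://doi.org/ link that should lead to the landing page. The function names and the loose validation pattern are assumptions made for this example; they are not part of the Blueprint or of any PID provider's tooling.

```python
import re
from urllib.parse import quote

# Loose DOI shape (an assumption for this sketch): "10.", a 4-9 digit
# registrant code, "/", then a non-empty suffix without whitespace.
_DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> str:
    """Return the bare DOI in lowercase (DOIs are case-insensitive)."""
    doi = raw.strip()
    lowered = doi.lower()
    for prefix in _PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not _DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {raw!r}")
    return doi.lower()


def doi_url(raw: str) -> str:
    """Build the resolver URL that should point at the landing page."""
    return "https://doi.org/" + quote(normalize_doi(raw), safe="/")


# The ImmPort DOI from Example 1, written two different ways.
print(doi_url("doi:10.21430/M3VF5F8ANE"))
print(doi_url("https://doi.org/10.21430/M3VF5F8ANE"))
```

Storing and comparing DOIs in one normalized form makes it easier to match the same resource across landing pages, metadata records, and citation counts.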

Example 1: ImmPort Repository - https://doi.org/10.21430/M3VF5F8ANE - Each public study in the ImmPort repository is assigned a DOI. The DOI is displayed along with other metadata for the study. DOIs are associated with the study identifier that ImmPort assigns to all of its studies. In this example, the study identifier is: https://www.immport.org/shared/study/SDY1109.

Screenshot 2026-04-16 at 10.46.30 AM.png

Recommendation #2: Display a recommended citation for data users.

Data users may not know how they should be citing your data resource(s). It is important to make it very easy for data users to find and use the recommended citation. Suggestions include:

  • Make it clear how you want people to cite your data. For example, do you want them to cite the data collection as a whole, or just the specific part or set that they used?

  • Make it easy for people to copy and paste the citation, and/or import it into a citation management system. This helps the resource be cited more consistently, and lets people fit data citations into their regular citation management tools and workflows.

  • Include any PIDs in the recommended citations. This will enable citation services to compile citations for your data resources more consistently, as authors and journals may change other details of the citation for a variety of reasons. PIDs provide a consistent mechanism through which information about the use of your resource can be compiled.

The Basics of Data Citation provides guidance on how to structure recommended data citations. Additionally, tools like the DOI Citation Formatter take an identifier (DOI) and generate a formatted citation in any of many hundreds of citation styles.
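The suggestions above can be sketched as a small function that assembles a recommended citation string, always ending with the PID as a resolvable link. The `DatasetRecord` fields, the element order, and the sample values are assumptions for illustration only; a repository should follow whatever citation style it has chosen to recommend.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetRecord:
    """Hypothetical minimal metadata needed to display a citation."""
    creators: List[str]
    year: int
    title: str
    publisher: str
    doi: str  # bare DOI, e.g. "10.1234/example"


def recommended_citation(record: DatasetRecord) -> str:
    """Assemble 'Creators (Year). Title. Publisher. DOI link'."""
    authors = "; ".join(record.creators)
    return (
        f"{authors} ({record.year}). {record.title}. "
        f"{record.publisher}. https://doi.org/{record.doi}"
    )


# Placeholder values, not a real dataset.
example = DatasetRecord(
    creators=["Doe, J.", "Roe, R."],
    year=2024,
    title="Example Dataset",
    publisher="Example Repository",
    doi="10.1234/example",
)
print(recommended_citation(example))
# Doe, J.; Roe, R. (2024). Example Dataset. Example Repository. https://doi.org/10.1234/example
```

Generating the string from stored metadata, rather than typing it by hand for each landing page, keeps the displayed citation in step with the PID metadata.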

Example 2: Dryad Data Repository - https://doi.org/10.5061/dryad.qn8n3 - Each dataset in the Dryad repository shows a recommended citation at the top right of the page, and includes the dataset’s DOI in the recommendation. Users can click on the “copy” button to easily grab the recommended citation text for pasting elsewhere.

Citation Recommendation Ex 2.png

Recommendation #3: Consider metadata integrations with citation management tools that make it easy for end users to quickly import the citation metadata.

Many researchers use citation management tools to collect and store citations, and to generate bibliographies for research publications. Examples of these tools include EndNote, Zotero, and Mendeley. Providing users with the ability to download/import citation metadata into these tools can save them significant time, and can encourage them to store citations to datasets alongside citations to other types of literature.
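One lightweight way to offer such a download is to serialize the citation metadata as an RIS record, a plain-text format that many citation managers can import. The sketch below is a minimal example under stated assumptions: the function name, the choice of fields, and the sample values are hypothetical, and a production export would likely include more tags (such as an access date or version).

```python
from typing import List


def to_ris(creators: List[str], year: int, title: str,
           publisher: str, doi: str) -> str:
    """Serialize dataset metadata as one RIS record.

    Each RIS line is a two-character tag, two spaces, a hyphen, a space,
    and the value. 'TY' opens the record and 'ER' closes it.
    """
    lines = ["TY  - DATA"]                       # reference type: data file
    lines += [f"AU  - {name}" for name in creators]
    lines += [
        f"PY  - {year}",
        f"TI  - {title}",
        f"PB  - {publisher}",
        f"DO  - {doi}",
        f"UR  - https://doi.org/{doi}",
        "ER  - ",
    ]
    return "\n".join(lines) + "\n"


# Placeholder values, not a real dataset.
print(to_ris(["Doe, J.", "Roe, R."], 2024, "Example Dataset",
             "Example Repository", "10.1234/example"))
```

Serving this text as a downloadable ".ris" file next to the recommended citation gives users a one-click import path.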

Example 3.1: Johns Hopkins Research Data Repository - https://doi.org/10.7281/T17P8W98 - Along with a recommended citation, the JHU Data Repository provides links for users to directly download the citation metadata into various citation formats and tools, including EndNote, BibTeX, and RIS (which is used by a variety of citation management tools).

Citation Recommendation Ex 3.png

Example 3.2: National e-Infrastructure for Research Data (NIRD) Research Data Archive, Norway - https://doi.org/10.11582/2020.00060 - The NIRD Archive lists a recommended citation via a “Citation” tab on each dataset landing page. NIRD then similarly provides links for users to directly download the citation metadata into various citation formats and tools, including BibTeX, RIS, and other formats.

Citation Recommendation Ex 3-2.png

Recommendation #4: Consider connections between resources, and make these connections visible via metadata and PIDs.

Where possible, the PID metadata and landing pages for resources should reference other related resources to enable cross linking for data discovery and provenance purposes.

  • On landing pages: List other relevant resources, including their PIDs

  • In PID metadata: Some PIDs provide metadata options in which explicit links between PIDs can be listed. The DataCite DOI metadata schema, for example, has a “RelatedIdentifier” metadata field for this purpose. Similarly, RRID metadata can list other related resources.

  • Entities that can and should be cross-linked include (partial list):

    • Data

    • Software

    • Instrumentation and materials

    • Research resources (e.g. cell lines, transgenic models, plasmids/clones, antibodies, and other reagents)

    • Publications

    • People

    • Organizations

    • Experiments, Protocols, Trials

Example 4.1: University of Michigan Deep Blue Data Repository - https://doi.org/10.7302/7ym7-gp78 - The UM Deep Blue repository shows a recommended citation at the bottom of each dataset page. The repository also shows if there are related papers for a given dataset.

Citation Recommendation 4-1.png

Example 4.2 - Continued from 4.1 -

Citation Recommendation 4-2.png

In the DataCite DOI XML metadata for the dataset DOI shown in Example 4.1, the connection between the dataset and a related paper is expressed through a “RelatedIdentifier” element whose relation type states that the dataset “IsReferencedBy” the paper. Note that the XML also shows additional “RelatedIdentifier” elements, in this case with “HasPart” relationships, to indicate that the dataset has two distinct components. The DataCite DOI metadata schema allows the declaration of many additional relationship types that might be useful/relevant for data users.
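To make the structure concrete, the sketch below parses a small DataCite-style XML fragment and groups the related identifiers by relation type. The fragment is a hand-written illustration with placeholder DOIs, not the actual metadata record for Example 4.1, and the helper name is an assumption for this example.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Namespace used by DataCite kernel-4 XML records.
NS = {"d": "http://datacite.org/schema/kernel-4"}

# Illustrative fragment only: every identifier below is a placeholder.
SAMPLE_XML = """<resource xmlns="http://datacite.org/schema/kernel-4">
  <identifier identifierType="DOI">10.1234/example-dataset</identifier>
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="DOI"
        relationType="IsReferencedBy">10.1234/example-paper</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="DOI"
        relationType="HasPart">10.1234/example-part-1</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="DOI"
        relationType="HasPart">10.1234/example-part-2</relatedIdentifier>
  </relatedIdentifiers>
</resource>
"""


def related_identifiers(xml_text: str) -> Dict[str, List[str]]:
    """Group related identifiers by their relationType attribute."""
    root = ET.fromstring(xml_text)
    grouped: Dict[str, List[str]] = {}
    for el in root.findall("d:relatedIdentifiers/d:relatedIdentifier", NS):
        relation = el.get("relationType")
        grouped.setdefault(relation, []).append((el.text or "").strip())
    return grouped


for relation, targets in related_identifiers(SAMPLE_XML).items():
    print(relation, targets)
```

A landing page could use the same grouping to render "Related papers" and "Components" sections directly from the PID metadata.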

Recommendation #5: Measure citation counts and PID use as a tool for understanding and assessing impact of data repositories, and of FAIR initiatives.

PIDs are intended to provide linkages to original resources as much as they are used as tools for tracing impact. With such linkages, it becomes easier to perform micro impact assessment at the PID level (e.g. how frequently a specific resource was used/cited in important work), as well as macro impact assessment at the repository level (e.g. how many of the PIDs in a repository have been used/cited in important work). These assessments are an important tool for demonstrating value, and they are increasingly automated. As repositories improve their FAIR adherence, they necessarily increase their ability to understand impact.

 

PID-based citation metrics can be used to support a data repository’s operational management decisions, e.g. to indicate which datasets are getting less attention, and thus could be candidates for lower-priced storage and access services. These metrics are complementary to other measures of data use, such as repository page views and download counts. They will likely show the same trends (e.g. data with more downloads will likely get more citations), but differences between access and citation metrics may also indicate 1) usability challenges where people download data but can’t use them successfully, or 2) different audiences where particular datasets are used more for research while others are used more in educational settings (or other non-research focused contexts). PID-based citations can also be more easily added to contributor CVs and other online services such as ORCID profiles.
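The micro and macro views described above, and the comparison between access and citation metrics, can be sketched with a few lines of code. The function names, the download threshold, and all counts below are made up for illustration; real counts would come from the repository's own logs and from citation indices.

```python
from typing import Dict, List


def repository_summary(citations: Dict[str, int]) -> Dict[str, float]:
    """Macro view: how many PIDs in the repository have been cited at all."""
    total = len(citations)
    cited = sum(1 for count in citations.values() if count > 0)
    return {
        "pids": total,
        "cited_pids": cited,
        "cited_fraction": cited / total if total else 0.0,
    }


def downloaded_but_uncited(downloads: Dict[str, int],
                           citations: Dict[str, int],
                           min_downloads: int) -> List[str]:
    """Flag PIDs with many downloads and no citations.

    Such a gap may point to usability problems or to non-research
    audiences, so it is a prompt for follow-up, not a verdict.
    """
    return sorted(
        pid for pid, n in downloads.items()
        if n >= min_downloads and citations.get(pid, 0) == 0
    )


# Hypothetical per-PID counts (micro view), keyed by placeholder DOIs.
citations = {"10.1234/a": 12, "10.1234/b": 0, "10.1234/c": 3, "10.1234/d": 0}
downloads = {"10.1234/a": 900, "10.1234/b": 750, "10.1234/c": 40, "10.1234/d": 5}

print(repository_summary(citations))
print(downloaded_but_uncited(downloads, citations, min_downloads=500))
```

The threshold is a design choice: a repository would tune it to its own traffic, and might compare against median downloads instead of a fixed number.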

Example 5: NSF National Center for Atmospheric Research, Geoscience Data Exchange Repository (GDEX) - https://doi.org/10.5065/BH6N-5N20 - The GDEX repository displays a recommended citation via a “Citation” tab. Users can select a citation style before copying and pasting, or can download the citation metadata in various formats. In addition, GDEX shows the number of citations that a dataset has received, and displays those citations for the user to view. These citations are gathered by automatically searching for the dataset’s DOI using the search Application Programming Interfaces (APIs) of several citation indices, including Crossref (free; https://www.crossref.org/documentation/retrieve-metadata/rest-api/), Scopus by Elsevier (non-free; https://dev.elsevier.com/documentation/ScopusSearchAPI.wadl), and the Web of Science/InCites by Clarivate (non-free; https://developer.clarivate.com/apis/incites).

Citation Recommendation 5-1.png