During the summer of 2017, my first summer as Director of the National Library of Medicine, Joyce Backus—our then-NLM Associate Director for Library Operations (ADLO)—approached me with a wild idea: “How about we engage an architectural firm to guide renovations of our library space?” Joyce was a forward-thinking ADLO and had already done much to spearhead important renovations to protect our collections and make them accessible to the public.
As you may know, NLM has 66 miles of shelving that house our expansive collections: books and journals, audiovisual recordings, unique papers of medical and public health pioneers, and rare and unique manuscripts and volumes spanning ten centuries and originating from nearly every part of the world. From 2014 to 2019, NLM worked with the Wellcome Trust to digitize thousands of complete back issues of historically significant biomedical journals, along with their human- and computer-readable citations, and to make them freely available via PubMed Central (PMC). This joint investment made important biomedical literature available to advance research, education, and learning.
Fast forward to 2022, when we entered the third year of a global pandemic. Libraries around the world served as essential resources in two ways. They provided up-to-the-minute, trustworthy access to COVID-19 information, and they offered innovative, accessible, free spaces to work, study, and gather safely. Many businesses and services had to turn on a dime to protect their assets and deliver their operations remotely. NLM, however, was prepared for the challenge and already familiar with strategies that preserve our past and make our holdings available to people who may never set foot in our building.
Now, I would like to claim clairvoyance as an essential skill of the NLM workforce, but of course that would be foolhardy! No one can see into the future, including NLM staff. Still, almost 200 years of serving the public has engendered in our workforce an ability to serve increasingly diverse stakeholders in the present while keeping an eye on their future needs and anticipating ways to meet them. Libraries are essentially human enterprises, and designing spaces that make the best use of our excellent workforce is critical for our future.
So, it’s not too surprising that as the COVID-19 pandemic overtook the world, NLM in particular and libraries in general rose to the task! NLM took three main steps:
- It expanded its terminologies to include new terms representing the emerging vaccines, therapeutics, and diagnostic tools.
- It expanded the resources in our Network of the National Library of Medicine to support outreach and locally congruent information resources about the pandemic.
- It improved on-demand access to digitized versions of our holdings.

We also planned new workspace arrangements to make the best use of our existing buildings, assessing their suitability for hoteling, hybrid work, and on-site meetings that bring teams together.
I am inspired by how we anticipated a future we never anticipated, and I spent the year reflecting with my leadership team to discern how this success will provide us with the guidance to anticipate the next future we never anticipated. Please join us in this process to make sure that we have the space and access to reliable biomedical information for your needs!
Two years later, one of the RADx programs, RADx Underserved Populations (RADx-UP), reflects on lessons learned that have broken the mold of standard research paradigms for addressing health disparities.
Use of Common Data Elements
RADx-UP has presented unique challenges in data collection, privacy, measurement standardization, and data-sharing principles. It has also offered an opportunity to reexamine community-engaged research. Common Data Elements (CDEs) are standardized, precisely defined questions paired with a set of allowable responses. They are used systematically across different sites, studies, or clinical trials so that the whole is greater than the sum of its parts. However, CDEs are not commonly used in community-engaged research. Using CDEs enables harmonization, aggregation, and analysis of related data across study sites. It also makes it possible to investigate relationships among data in otherwise unrelated data sets. CDEs can also lend statistical power to analyses of data for small subpopulations that are typically underrepresented in research.
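To make the idea concrete, here is a minimal Python sketch of a CDE as a data structure. It has three parts: a fixed question, a set of allowable coded responses, and a check that rejects anything else. With that check in place, responses from different sites can be pooled and counted together. The element name, wording, and response codes below are hypothetical illustrations, not actual RADx-UP CDEs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonDataElement:
    """A standardized question paired with its allowable coded responses."""
    name: str
    question: str
    allowed: dict[str, str]  # response code -> human-readable label

    def validate(self, code: str) -> str:
        # Reject any response that is not in the allowable set.
        if code not in self.allowed:
            raise ValueError(f"{code!r} is not an allowable response for {self.name}")
        return code


# Hypothetical CDE for illustration only; not an actual RADx-UP element.
TESTED_BEFORE = CommonDataElement(
    name="tested_before",
    question="Have you ever been tested for COVID-19?",
    allowed={"1": "Yes", "0": "No", "-99": "Prefer not to answer"},
)


def pool_sites(cde: CommonDataElement, site_records: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Validate each site's responses against one CDE and pool them as (site, code)."""
    pooled = []
    for site, responses in site_records.items():
        for code in responses:
            pooled.append((site, cde.validate(code)))
    return pooled


def tally(cde: CommonDataElement, pooled: list[tuple[str, str]]) -> dict[str, int]:
    """Count pooled responses by label across all sites."""
    counts = {label: 0 for label in cde.allowed.values()}
    for _site, code in pooled:
        counts[cde.allowed[code]] += 1
    return counts
```

For example, pooling `{"site_a": ["1", "0", "1"], "site_b": ["0", "-99"]}` and tallying the result gives two "Yes", two "No", and one "Prefer not to answer". Because out-of-range codes are rejected at entry, the pooled counts mean the same thing at every site.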
RADx-UP is a community-engaged research program that builds on years of partnership building between communities and scientists. RADx-UP has funded 127 research projects, with sites in every state and six U.S. territories. It has also funded a RADx-UP Coordination and Data Collection Center (CDCC). RADx-UP assesses needs and barriers related to COVID-19 testing. It also works to increase access to testing in the underserved and vulnerable populations that experience the greatest disparities in morbidity and mortality.
The COVID-19 pandemic required RADx-UP and its associated CDEs to be established with unprecedented speed. The program relied heavily on data elements derived from those already defined in the NIH-based PhenX Toolkit and the Disaster Research Response (DR2) resources. The short time frame did not allow for the extensive collaboration with, and input from, RADx-UP investigators and community partners that would have been ideal. Additionally, many researchers were not familiar with CDE data collection practices, especially the community partners engaged in RADx-UP projects. As a result, CDE questionnaires had to be modified as studies progressed to better suit the needs of the consortium. Investigators new to CDE collection also had to become familiar with these processes quickly.

Three groups helped research teams with this work:
- NIH program officers
- NIH RADx-UP and CDCC leadership
- Engagement impact teams (EITs), the CDCC-provided staff liaisons who link RADx-UP research teams to testing, data, and community-engagement resources

Together they helped teams implement and adjust CDE collection, ensured alignment across consortium research teams, and assisted with other data-related issues as they arose.
All RADx programs are required to collect a standardized set of CDEs, including sociodemographic, medical history, and health status elements with the intent to provide researchers rapid access to data for secondary research analyses in the RADx Data Hub, the central repository for RADx data. However, implementation of CDEs in the context of underserved communities in the rapidly evolving COVID-19 pandemic presented complex issues for consideration.
These issues included data privacy, the risk of re-identification of underserved and undocumented populations, and the data collection burden on both participants and researchers. The privacy of health data is protected under federal law. The RADx-UP program instituted measures to keep participants’ data protected and de-identified. It used a token-based hashing methodology that allows researchers to share individual-level participant data without exposing personally identifiable information. To address concerns about data collection and respondent burden, projects modified questions to allow some flexibility, expanding response options to be more appropriate for some underserved communities. The CDCC also developed COLECTIV, a digital interface that lets projects enter data directly into the data repository. COLECTIV includes gateway questions to reduce respondent burden.
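The post does not describe the RADx-UP tokenization algorithm itself. The sketch below illustrates the general technique, sometimes called privacy-preserving record linkage. It normalizes a few identifiers and derives a token from them with keyed HMAC-SHA256 from Python’s standard library. The choice of fields, the normalization rules, and the key handling are all assumptions made for illustration. A production system would also need key management, more robust matching fields, and governance controls.

```python
import hashlib
import hmac
import unicodedata


def normalize(value: str) -> str:
    """Lowercase, strip accents, and drop non-alphanumeric characters (an assumed rule)."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.lower() if ch.isalnum())


def make_token(first: str, last: str, dob: str, secret_key: bytes) -> str:
    """Derive a pseudonymous token from identifiers using keyed HMAC-SHA256.

    Illustrative only: the fields (first name, last name, date of birth) are
    hypothetical choices, not the RADx-UP specification.
    """
    # A field separator keeps ("ab", "c") and ("a", "bc") from producing the same input.
    message = "|".join(normalize(v) for v in (first, last, dob)).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()
```

The same person yields the same token at every site that holds the key. Records can therefore be linked by token alone, while names and birth dates never leave the site. Using a secret key instead of a plain hash matters here. Otherwise, someone could reverse an unkeyed hash of a name and birth date by hashing every plausible combination.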
Respect for Tribal Data Sovereignty
RADx-UP leadership and investigators recognized that projects including American Indian and Alaska Native (AI/AN) participants required additional consideration of Tribal sovereignty, practices, and policies. NIH drew on several sources of input:
- consultations with the NIH Tribal Advisory Committee
- consultations with the broader AI/AN community
- meetings with an informal RADx-UP AI/AN project working group established by the CDCC

Through this input, NIH realized that depositing Tribal data into the RADx Data Hub would not meet the cultural, governance, or sovereignty needs of AI/AN RADx research data. In response, NIH hopes to establish a RADx Tribal Data Repository (TDR). The TDR would be responsible for collecting, protecting, and sharing data collected in AI/AN communities in keeping with the practices and policies of Tribal data sovereignty. NIH has solicited applications for the repository and aims to make an award for the TDR sometime in FY23.
Rapid Data Sharing
One of the largest hurdles the RADx-UP program has faced is implementing rapid sharing of research data. The data are needed for secondary analyses and to inform decision-making and public health practice related to the COVID-19 pandemic. RADx-UP research teams are expected to share their data on a timely cadence, before data collection ends. This is far more stringent than the current standard NIH data-sharing policy. That policy requires data to be shared when a publication of the main findings from the final data set is accepted. NIH and CDCC staff have worked with the RADx research community to emphasize the importance of rapid data sharing and to support compliance with it. Within the first six months, 69 Phase 1 projects began transmitting CDE data to the RADx-UP CDCC.

The COVID-19 pandemic posed a tremendous challenge, and NIH responded by collaborating with vulnerable and underserved communities. This collaboration has created an unprecedented opportunity to build on a now-established foundation for future research. That research can address gaps in our understanding of the broader social, cultural, and structural factors that influence disparities in morbidity and mortality from COVID-19 and other diseases. The data collection and sharing efforts of the RADx-UP initiative are a significant contribution. Collaboration among NIH, research investigators, and communities affected by COVID-19 has been the catalyst. To learn more about RADx-UP, please read a recent journal article available in PubMed.
Dr. Hodes has served as NIA director since 1993, overseeing studies of the biological, clinical, behavioral, and social aspects of aging. He has devoted his tenure to the development of a strong, diverse, and balanced research program focused on the genetics and biology of aging, basic and clinical studies aimed at reducing disease and disability, and investigation of the behavioral and social aspects of aging. Ultimately, these efforts have one goal — improving the health and quality of life for older people and their families. As a leading researcher in the field of immunology, Dr. Hodes has published more than 250 peer-reviewed papers.
Dr. Pérez-Stable practiced primary care internal medicine for 37 years at the University of California, San Francisco, before becoming the Director of NIMHD in 2015. His research interests have centered on improving the health of individuals from racial and ethnic minority communities through effective prevention interventions, understanding underlying causes of health disparities, and advancing patient-centered care for underserved populations. Recognized as a leader in Latino health care and disparities research, he spent 32 years leading research on smoking cessation and tobacco control in Latino populations in the United States and Latin America. Dr. Pérez-Stable has published more than 300 peer-reviewed papers.
As we start year three of the COVID-19 pandemic, it’s time for NLM to take stock of the parts of our past that will support the next normal and what we might need to change as we continue to fulfill our mission to acquire, collect, preserve, and disseminate biomedical literature to the world.
Today, I invite you to join me in considering the assumptions and presumptions we made about how scientists, clinicians, librarians, and patients use critical NLM resources. I also invite you to consider how we might need to update those assumptions to meet future needs. I’ll give you a hint: it’s not all bad. In fact, I find it quite exciting!
Let’s highlight some of our assumptions about how people use our services, at least from my perspective. We anticipated the need for access to medical literature across the Network of the National Library of Medicine. In response, we created DOCLINE, an interlibrary loan request routing system that quickly and efficiently links participating libraries’ journal holdings. We also assumed that we were preparing the literature and our genomic databases for humans to read and peruse. Now we’re finding that more than half of all accesses to NLM resources are generated by computers through application programming interfaces (APIs). Even our MedlinePlus resource for patients now uses MedlinePlus Connect to deliver tailored electronic responses to computer-generated queries originating in electronic health records.
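For readers curious about what machine access looks like, here is a minimal Python sketch. It builds a query against NCBI’s E-utilities `esearch` endpoint, one of the public APIs for searching PubMed, and pulls PubMed IDs out of the JSON reply. It uses only the standard library, and the helper names are our own. The network call is wrapped in a function that is not run here. For real use, NCBI’s usage guidelines on request rates and API keys apply.

```python
import json
import urllib.parse
import urllib.request

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(term: str, db: str = "pubmed", retmax: int = 20) -> str:
    """Build an E-utilities ESearch URL that asks for a JSON response."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_BASE + "esearch.fcgi?" + urllib.parse.urlencode(params)


def pmids_from_esearch(payload: dict) -> list[str]:
    """Extract the list of matching IDs from a parsed ESearch JSON reply."""
    return payload["esearchresult"]["idlist"]


def search_pubmed(term: str, retmax: int = 20) -> list[str]:
    """Send the query over the network and return PubMed IDs (requires internet access)."""
    with urllib.request.urlopen(esearch_url(term, retmax=retmax)) as resp:
        return pmids_from_esearch(json.load(resp))
```

For example, `esearch_url("covid-19 vaccine", retmax=5)` produces a URL that any program, from a notebook to an electronic health record integration, can request with no human in the loop.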
Perhaps most importantly, we realize that some of the information we present is read by a living person, but other information is processed by computers! Examples include information about clinical trials (ClinicalTrials.gov) and genotype and phenotype data (dbGaP). Increasingly, we provide direct access to raw, machine-readable versions of our resources so they can be fed into specialized analysis programs. These include natural language processing programs that find studies with similar findings and machine-learning models that determine the similarity between two gene sequences. For example, NLM makes it possible for advocacy groups to download study information from all ClinicalTrials.gov records. They can then use their own programs to identify trials that may interest their constituents or to compare summaries of research results for related studies.
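As a sketch of the kind of program an advocacy group might write, the function below filters a list of already-downloaded ClinicalTrials.gov study records by condition keyword and recruitment status. It assumes the records follow the nested JSON layout of the ClinicalTrials.gov API version 2. The field paths used are `protocolSection.identificationModule.nctId`, `statusModule.overallStatus`, and `conditionsModule.conditions`. These paths are our assumption and should be checked against the current data documentation before use.

```python
from typing import Iterable


def trials_of_interest(
    studies: Iterable[dict],
    keyword: str,
    statuses: tuple[str, ...] = ("RECRUITING",),
) -> list[tuple]:
    """Return (NCT ID, brief title) for studies whose conditions mention keyword.

    Field paths assume the ClinicalTrials.gov API v2 JSON layout; adjust them
    if the downloaded records are structured differently.
    """
    needle = keyword.lower()
    hits = []
    for study in studies:
        protocol = study.get("protocolSection", {})
        ident = protocol.get("identificationModule", {})
        status = protocol.get("statusModule", {}).get("overallStatus")
        conditions = protocol.get("conditionsModule", {}).get("conditions", [])
        # Keep the study if its status is wanted and any condition contains the keyword.
        if status in statuses and any(needle in c.lower() for c in conditions):
            hits.append((ident.get("nctId"), ident.get("briefTitle")))
    return hits
```

Passing a wider `statuses` tuple, for example one that also includes "NOT_YET_RECRUITING", would widen the net. These status strings are likewise assumptions about the v2 value set.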
Machine learning and artificial intelligence have progressed to the point that they perform reasonably well at connecting similar articles. To this end, LitCovid, our open-resource literature hub, has served as an electronic companion to the human curation of coronavirus literature. LitCovid makes curation more efficient. Its sophisticated search function creates more relevant pathways through the literature, so users are more likely to find articles that meet their needs. Most importantly, innovations such as LitCovid help our users manage the vast and ever-growing body of biomedical literature. That literature now numbers more than 34 million citations in NLM’s PubMed, the most heavily used biomedical literature citation database.
Partnerships are a critical asset for bringing biomedical knowledge into the hands (and eyes) of those who need it. Over the last decade, NLM has moved toward a new model for managing citation data in PubMed. We released the PubMed Data Management system, which allows publishers to quickly update or correct nearly all elements of their citations. The system also accelerates the delivery of correct and complete citation data to PubMed users.
As part of the MEDLINE 2022 Initiative, NLM transitioned to automated Medical Subject Headings (MeSH) indexing of MEDLINE citations in PubMed. Automated MeSH indexing significantly decreases the time for indexed citations to appear in PubMed without sacrificing the quality MEDLINE is known to provide. Our human indexers can focus their expertise on curation efforts to validate assigned MeSH terms, thereby continuously improving the automated indexing algorithm and enhancing discoverability of gene and chemical information in the future.
We’re already preparing for the next normal—what do you think it will be like?
I envision making our vast resources increasingly available to those who need them and forging stronger partnerships that improve users’ ability to acquire and understand knowledge. Imagine a service, designed and run by patients, that could pull and synthesize the latest information about a disease or recommendations for managing a clinical issue. Or imagine a service that could help a young investigator pinpoint areas ripe for investigation! The next normal will make the best use of human judgment and creativity by selecting and organizing relevant data to create a story that forms the foundation of new inquiry or the basis of new clinical care. Come along and help us co-create the next normal!
This guest blog post is by Susan Gregurick, PhD, Associate Director for Data Science and Director of the Office of Data Science Strategy, NIH. It was originally posted on June 6 on the NIH Office of Extramural Research Open Mike blog. We encourage you to learn more about the DataWorks! Prize by visiting Challenge.gov. Participants must register by June 28, 2022.
A $500,000 prize purse, rewarding data sharing and reuse in biomedical research, is a new, innovative strategy for supporting the research community. The DataWorks! Prize highlights the role of data sharing and reuse in scientific discovery while recognizing and rewarding researchers who engage in these practices. This prize, which launched on May 11, 2022, is a partnership between the NIH Office of Data Science Strategy and the Federation of American Societies for Experimental Biology (FASEB).
The future of biological and biomedical research hinges on researchers’ ability to share and reuse data. Sharing and reuse had a sizable, catalytic impact on the development of COVID-19 vaccines and treatment protocols. The DataWorks! Prize is an opportunity for the research community to share their stories about the practices, big and small, that lead to scientific discovery.
To participate, research teams share their stories through a simple two-stage application. Through narrative prompts, teams describe the practices they used, the scientific impact of their achievements, and the potential for others to replicate their practices in further research. This year, the DataWorks! Prize purse is up to $500,000 across 12 monetary awards, including two $100,000 grand prize awards.
Beyond monetary awards, the DataWorks! Prize is an opportunity for the research community to learn from peers and apply those lessons to their research practices. The innovative approaches and tools from prize winners will be highlighted at a 2023 symposium and made available to support community learning.
As implementation of the NIH Data Management and Sharing Policy draws near, consider the broader intent of this policy: building a culture of data sharing and reuse in the biomedical research community. Incentives are a major part of culture change, and we are excited to provide a space for the community to share their achievements and learn together. Initiatives like the prize and the new sharing.nih.gov website are new steps to support the future of biological and biomedical research. That future is at the center of NIH’s Data Management and Sharing Policy.
The DataWorks! Prize is currently open for submissions. Participants must register by June 28, 2022. Please visit Challenge.gov for more information and to apply.