Anticipating a Future We Never Anticipated

During the summer of 2017, my first summer as Director of the National Library of Medicine, Joyce Backus—our then-NLM Associate Director for Library Operations (ADLO)—approached me with a wild idea: “How about we engage an architectural firm to guide renovations of our library space?” Joyce was a forward-thinking ADLO and had already done much to spearhead important renovations to protect our collections and make them accessible to the public.

As you may know, NLM has 66 miles of shelving that house our expansive collections: from books to journals, audiovisual recordings to unique papers of medical and public health pioneers, to rare and unique manuscripts and volumes spanning ten centuries and originating from nearly every part of the world. From 2014 to 2019, NLM worked with the Wellcome Trust to digitize and make freely available via PubMed Central, or PMC, thousands of complete back issues of historically significant biomedical journals along with their human- and computer-readable citations; the availability of this important biomedical literature began the joint investment to advance research, education, and learning.

Fast forward to 2022 when we entered the third year of a global pandemic. Libraries around the world served as essential resources not just by providing up-to-the-minute, trustable access to COVID-19 information, but also by providing innovative and accessible free spaces to work, study, and gather safely. Many businesses and services had to turn on a dime to figure out how to protect their assets and deliver their operations remotely, but NLM was prepared for the challenge and already familiar with strategies that preserve our past and make our holdings available to people who may never set foot in our building.

Now—I would like to claim clairvoyance as an essential skill of the NLM workforce, but of course that would be foolhardy! No one can see into the future, including NLM staff. Almost 200 years of serving the public has engendered in our workforce an ability to serve increasingly diverse stakeholders in the present while keeping an eye to their future needs and anticipating ways to meet them. Libraries are essentially a human enterprise and designing spaces to make the best use of our excellent workforce is critical for our future.

So, it’s not too surprising that as the COVID-19 pandemic overtook the world, NLM in particular and libraries in general stepped up to the task! NLM expanded its terminologies to include new ones representing the emerging vaccines, therapeutics, and diagnostic tools; expanded the resources in our Network of the National Library of Medicine to support outreach and locally congruent information resources about the pandemic; and improved access to digitized versions of our holdings in an on-demand fashion. We planned new workspace arrangements to make the best use of our existing buildings to anticipate their suitability for hoteling, hybrid work engagements, and on-site meetings to bring teams together.

I am inspired by how we anticipated a future we never anticipated, and I spent the year reflecting with my leadership team to discern how this success will provide us with the guidance to anticipate the next future we never anticipated. Please join us in this process to make sure that we have the space and access to reliable biomedical information for your needs!

Giving Thanks Where Thanks is Due

One of the great joys of being the Director of the National Library of Medicine is the many opportunities for me to express gratitude. In the past, I have given thanks to NLM staff who are veterans (2021), for progress during my tenure (2020), and to our amazing NLM staff members (2019). This year, I am pausing to give thanks for the outstanding products and services developed and stewarded by our NLM staff, made available every day of the year to anyone with an internet connection—and even to some without!

First, I am thankful for our information collections in their many forms. The NLM Board of Regents oversees our Collection and Preservation Policy, which guides NLM as it meets its mission to acquire, organize, preserve, and disseminate biomedical knowledge from around the world. Our collection spans ten centuries from the 11th to the 21st, and ranges from the third oldest Arabic medical manuscript in existence to the “Rosetta Stone” of modern science, Marshall Nirenberg’s genetic chart, from genomic sequences essential for current and future research to information for mothers taking care of sick children.

Organizing the collections and making them findable and accessible builds on the knowledge of library and information science. This foundational knowledge means we can tag objects—real or virtual—with codes and terms that help with organization and retrieval. It also means we use our knowledge of library and information science to guide efforts to annotate and curate molecular data, literature citations, and images so they are accessible to the public. So I am grateful not only for the 66 miles of shelving that hold our precious objects, books, and journals here in Bethesda, but for the ever-powerful computer clouds that preserve our high-value research databases and 34 million bibliographic citations in PubMed. Libraries do more than house books; they use sophisticated knowledge to organize materials and make them readily available.

I am thankful for the ways that staff at NLM’s National Center for Biotechnology Information (NCBI) manages the submission, curation, and dissemination of our enormous genomic and molecular databases. From ClinVar (our collection of genomic sequences linked to clinical annotation) to the Sequence Read Archive (the world’s largest scientific data repository), our staff makes sure that depositors can effectively deposit data, scientific curators can conduct quality checks, and web and interface designers allow access to the data. A few years ago, the NCBI team led a cloud migration process to make available data from the entire 15-petabyte SRA resource on two commercial cloud providers. This bold step democratized sequence-based scientific inquiry and harnessed the computational power of cloud platforms, which contributed to industrial innovations and shortened the pathway for scientific discovery from days and months to minutes and hours. I am thankful for the role NLM plays in accelerating scientific advances and leveraging research resources for public health benefit.

NLM offers more than 1,000 easy-to-read health topic articles through our online consumer health information resource known as MedlinePlus. MedlinePlus is available in both English and Spanish, thereby assuring information access to speakers of two of the world’s most common languages. Through MedlinePlus Connect, our technical team also provides direct, tailored access to MedlinePlus resources automatically through electronic health records, patient portals, and other health information technology systems to deliver information from MedlinePlus to patients and providers at the point of care. I am thankful for the efforts of the MedlinePlus teams that bring timely and trusted information to the lives of everyone, everywhere.

I hinted earlier that there are two main pathways to access NLM products and services. Electronic access, supporting both human- and machine-readable forms, is by far the most common pathway to NLM. We also support the Network of the National Library of Medicine (NNLM) and its more than 8,000 members around the country in public, hospital, and academic medical center libraries to bring the power of NLM and its resources to the public. I am grateful for everyone who works as part of NNLM for their ability to bring NLM’s products and services to communities everywhere as well as how the needs and practices of those communities bring awareness of NLM.

As you pause this year in thanksgiving for the many public services that support you in everyday life, please remember to give thanks for NLM’s products and services. We think they are world class, and we are grateful for our ability to serve you.

When You Stand on the Shoulders of a Giant, What Do You See?

This blog contains my remarks from the 2022 Lindberg-King Lecture and Scientific Symposium: Science, Society, and the Legacy of Donald A.B. Lindberg, M.D., which took place on September 1, 2022. Watch a recording of the event here.

I had the great fortune of becoming the director of the National Library of Medicine immediately following the 30-plus-year tenure of Donald A.B. Lindberg, MD. I am sure that each of you here today treasures your own recollection of Don, maybe from a conversation or a laugh you may have had with this great leader, teacher, visionary, and colleague (and husband to Mary, father, grandfather, and friend). I am both proud and humbled to stand on the shoulders of this giant as I lead this incredible organization.

I know more viscerally than most about Don’s legacy as NLM director. I sit in the office he occupied, I walk the halls he walked, I work with the people he hired, and I see and experience the fruits of his judgement, investments, and vision.

I now sit where Don once sat, representing NLM at the leadership table of NIH with the other Institute and Center directors. With Don paving the way, I have a platform to extend NLM’s thought leadership and technical knowledge to guide NIH’s continued efforts to advance data-driven discovery. The good will and collaborative spirit engendered by Don across NIH opened doors for me and helped me continue his legacy to deliver on the promise of science accelerated by broad access to literature and data.

Don and I share a deep commitment to ensuring that the public benefits from NLM’s efforts to assemble, organize, preserve, and disseminate biomedical knowledge for society. It was his early vision that made MedlinePlus a trusted resource for consumer health information and ensured that the PubMed citation database and the PubMed Central full-text literature repository were open and accessible to everyone, everywhere, with an Internet connection, at any time and place.  

Don’s commitment to the public was also evident in his efforts to educate the next generation of biomedical informatics scholars. Frankly, I believe that of all of the aspects of his job, engagement with trainees was his favorite!

When you stand on the shoulders of a giant, you have a great advantage. The foundation Don built and the relationships he established provided me, the 4th appointed director of NLM, with a playbook right out of the gate. It is not enough to solely rely on his vision to guide our future as Don also inspired innovation; in one of our last conversations, he said to me, “This is your game—make sure you play it well!” In order to do that, I cannot simply stand on the shoulders of a giant; I must also keep my head up and my eyes forward to the future to envision new pathways and find new opportunities to bring forward the riches of NLM to the future benefit of science and society.

I close by inviting all of you to stand on the shoulders of this giant and meld your sights with his, for it is not by holding tight to what he could see, but by using his vision as a stepping-off point for our own, that we will serve his legacy.

RADx-UP Program Addresses Data Gaps in Underrepresented Communities

Guest post by Richard J. Hodes, MD, Director, National Institute on Aging, and Eliseo Pérez-Stable, MD, Director, National Institute on Minority Health and Health Disparities, NIH.

A few months into the COVID-19 pandemic, we shared how NIH was working to speed innovation in the development, commercialization, and implementation of technologies for COVID-19 through NIH’s Rapid Acceleration of Diagnostics (RADx) initiative.

Two years later, one of the RADx programs—RADx Underserved Populations (RADx-UP)—reflects on lessons learned that have broken the mold of standard research paradigms to address health disparities.

Use of Common Data Elements

RADx-UP has presented unique challenges in terms of data collection, privacy concerns, measurement standardization, and principles of data-sharing, as well as the opportunity to reexamine community-engaged research. Common Data Elements (CDEs) are standardized, precisely defined questions paired with a set of allowable responses, used systematically across different sites, studies, or clinical trials to ensure that the whole is greater than the sum of its parts. They are not commonly used in community-engaged research. Use of CDEs enables data harmonization, aggregation, and analysis of related data across study sites as well as the ability to investigate relationships among data in unrelated data sets. CDEs can also lend statistical power to analyses of data for small subpopulations typically underrepresented in research.
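To make the idea of a CDE concrete, here is a minimal sketch in Python. It assumes a single, hypothetical element (the name `covid_test_ever`, its wording, and its response codes are invented for illustration and are not taken from the RADx-UP CDE set). It shows how a shared set of allowable responses lets answers from different sites be validated and pooled.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonDataElement:
    """A standardized question paired with its set of allowable responses."""
    name: str
    question: str
    allowed: frozenset

    def validate(self, response: str) -> str:
        # Reject anything outside the agreed response set.
        if response not in self.allowed:
            raise ValueError(f"{response!r} is not an allowable response for {self.name}")
        return response


# Hypothetical element; the wording and codes are illustrative only.
TESTED_EVER = CommonDataElement(
    name="covid_test_ever",
    question="Have you ever been tested for COVID-19?",
    allowed=frozenset({"yes", "no", "unknown", "prefer_not_to_answer"}),
)


def pool_responses(cde, sites):
    """Validate every response, then aggregate counts across all sites."""
    counts = Counter()
    for responses in sites.values():
        counts.update(cde.validate(r) for r in responses)
    return counts


# Made-up responses from two hypothetical study sites.
site_data = {
    "site_a": ["yes", "no", "yes"],
    "site_b": ["unknown", "yes"],
}
pooled = pool_responses(TESTED_EVER, site_data)
```

Because both sites record answers against the same definition, the pooled counts can be analyzed as one data set. A free-text answer such as "maybe" would be rejected at validation rather than silently fragmenting the categories.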

RADx-UP is a community-engaged research program that builds on years of developing partnerships between communities and scientists. RADx-UP has funded 127 research projects with sites in every state and six U.S. territories as well as a RADx-UP Coordination and Data Collection Center (CDCC). RADx-UP assesses the needs and barriers related to COVID-19 testing and increases access to COVID-19 testing in underserved and vulnerable populations experiencing the highest rates of disparities in morbidity and mortality.

The COVID-19 pandemic necessitated establishing RADx-UP and its associated CDEs with unprecedented speed, relying heavily on data elements derived from those already defined in the NIH-based PhenX Toolkit and Disaster Research Response (DR2) resources. The short time frame for this process did not allow for collaboration with, and input from, RADx-UP investigators and community partners as extensive as would have been ideal. Additionally, many researchers, especially community partners engaged in RADx-UP projects, were not familiar with CDE data collection practices. As a result, CDE questionnaires had to be modified as studies progressed to better suit the needs of the consortium, and investigators new to CDE collection had to be familiarized with these processes quickly. NIH program officers, NIH RADx-UP and CDCC leadership, and engagement impact teams (EITs; staff liaisons provided by the CDCC that link RADx-UP research teams to testing, data, and community-engagement resources) helped research teams implement and adjust CDE collection, ensured alignment across consortium research teams, and assisted with other data-related issues that arose.

All RADx programs are required to collect a standardized set of CDEs, including sociodemographic, medical history, and health status elements with the intent to provide researchers rapid access to data for secondary research analyses in the RADx Data Hub, the central repository for RADx data. However, implementation of CDEs in the context of underserved communities in the rapidly evolving COVID-19 pandemic presented complex issues for consideration.

Some of these issues included data privacy, the risk of re-identification of underserved and undocumented populations, and data collection burden on participants as well as researchers. The privacy of health data is protected under federal law. The RADx-UP program instituted measures to ensure program participants’ data remain protected and de-identified using a token-based hashing algorithm methodology that allows researchers to share individual-level participant data without exposing personally identifiable information. To address data collection and respondent burden concerns, projects modified questions to allow some flexibility in expanding response options more appropriate to some underserved communities. The CDCC also developed COLECTIV, a digital interface for projects to directly enter data into the data repository and included gateway questions to relieve respondent burden.
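The general idea behind token-based hashing can be sketched as follows. This is an illustration only, not the RADx-UP method. It assumes a keyed hash (HMAC-SHA256) over normalized identifiers, and the field names, key, and helper functions are all hypothetical.

```python
import hashlib
import hmac


def make_token(secret_key: bytes, first: str, last: str, dob: str) -> str:
    """Derive a stable token from normalized identifiers using a keyed hash."""
    normalized = "|".join(part.strip().lower() for part in (first, last, dob))
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(record: dict, secret_key: bytes) -> dict:
    """Replace direct identifiers with a token; keep the research fields."""
    token = make_token(secret_key, record["first"], record["last"], record["dob"])
    kept = {k: v for k, v in record.items() if k not in {"first", "last", "dob"}}
    return {"token": token, **kept}


# Hypothetical key for illustration; a real key would never be hard-coded.
KEY = b"example-key-held-by-a-trusted-party"

# The same (made-up) person recorded with different spacing and capitalization.
a = deidentify({"first": "Ana", "last": "Diaz", "dob": "1980-02-01", "test_result": "negative"}, KEY)
b = deidentify({"first": " ana ", "last": "DIAZ", "dob": "1980-02-01", "vaccinated": "yes"}, KEY)
```

Here `a` and `b` carry the same token, so the two records can be linked without either containing a name or birth date. A token alone does not remove all re-identification risk, which is why the other safeguards described above still matter.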

Respect for Tribal Data Sovereignty

RADx-UP leadership and investigators recognized that additional considerations for tribal sovereignty, practices, and policies needed to be addressed for projects that include American Indian and Alaska Native (AI/AN) participants. Through consultations with the NIH Tribal Advisory Committee and the broader AI/AN community and meetings with an informal RADx-UP AI/AN project working group established by the CDCC, NIH realized that deposition of tribal data into the RADx Data Hub would not meet the cultural, governance, or sovereignty needs of AI/AN RADx research data. In response, NIH hopes to establish a RADx Tribal Data Repository (TDR) responsible for the collection, protection, and sharing of data collected in AI/AN communities with respect for the practices and policies of Tribal data sovereignty. Applications for the repository have been solicited and NIH hopes to make an award for the TDR sometime in FY23.

Rapid Data Sharing

One of the largest hurdles the RADx-UP program has faced is implementing rapid sharing of research data for secondary analyses and to inform decision-making and public health practices related to the COVID-19 pandemic. RADx-UP research teams are expected to share their data on a timely cadence before data collection ends. This is a far more stringent practice than the current standard NIH data-sharing policy, which requires data to be shared at the time of acceptance for publication of the main findings from the final data set. NIH and CDCC staff have worked together with the RADx research community to highlight the importance of rapid data-sharing and to support compliance with it. Within the first six months, a total of 69 Phase 1 projects began transmitting CDE data to the RADx-UP CDCC.

The COVID-19 pandemic posed a tremendous challenge, and NIH responded by collaborating with vulnerable and underserved communities. This collaboration has opened an unprecedented opportunity to build on a now established foundation for future research to address gaps in understanding the broader social, cultural, and structural factors that influence disparities in morbidity and mortality from COVID-19 and other diseases. Data collection and sharing efforts of the RADx-UP initiative comprise a significant contribution. Collaboration among the NIH, research investigators, and communities impacted by COVID-19 has been the catalyst. To learn more about RADx-UP, please see a recent journal article available on PubMed.


Dr. Hodes has served as NIA director since 1993, overseeing studies of the biological, clinical, behavioral, and social aspects of aging. He has devoted his tenure to the development of a strong, diverse, and balanced research program focused on the genetics and biology of aging, basic and clinical studies aimed at reducing disease and disability, and investigation of the behavioral and social aspects of aging. Ultimately, these efforts have one goal — improving the health and quality of life for older people and their families. As a leading researcher in the field of immunology, Dr. Hodes has published more than 250 peer-reviewed papers.

Dr. Pérez-Stable practiced primary care internal medicine for 37 years at the University of California, San Francisco before becoming the Director of NIMHD in 2015. His research interests have centered on improving the health of individuals from racial and ethnic minority communities through effective prevention interventions, understanding underlying causes of health disparities, and advancing patient-centered care for underserved populations. Recognized as a leader in Latino health care and disparities research, he spent 32 years leading research on smoking cessation and tobacco control in Latino populations in the United States and Latin America. Dr. Pérez-Stable has published more than 300 peer-reviewed papers.

Using Large Datasets to Improve Health Outcomes

Guest post by Lyn Hardy, PhD, RN, Program Officer, Division of Extramural Programs, National Library of Medicine, National Institutes of Health.

Before the advent of algorithms to determine the best way to treat and prevent heart disease, a health care provider looking for best practices for their patients may not have had the resources to find that best method. Today, health care decision-making for individuals and their health care providers is made easier by predictive and preventive models, which were developed with the goal of guiding the decision-making process. One example is the Patient Level Prediction of Clinical Outcomes and Cost-Effectiveness project led by Columbia University Health Sciences.

These models are created using computer algorithms (a set of rules for problem-solving) based on data science methods that analyze large amounts of data. While computers can analyze facts within the data, they rely on human programming to define what pieces of data or what data types are important to include in the analysis to create a valid algorithm and model. The results are translated into information that health care providers can use to understand patterns and provide methods for predicting and preventing illness. If a health care provider is looking for ways to prevent heart disease, an accurate model might describe methods—like exercise, diet, and mindfulness practices—that can achieve that goal.

Algorithms and models have benefited the world by using special data science methods and techniques to understand patterns that guide clinical decisions, but practitioners must still be conscious of which data were used in their development and how that choice shapes the results. Research has shown that algorithms and models can be misleading or biased if they do not account for population differences like gender, race, and age. These biases, which are the focus of work on algorithmic fairness, can adversely affect the health of underserved populations by not giving individuals and health care providers information that is specific to them and directly addresses their diversity. An example of potential algorithm bias is creating an algorithm to treat hypertension without including varied treatments for women or considering life-related stress or the environment.
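One simple way to look for this kind of bias is to compare a model's error rates across population groups. The sketch below assumes made-up records and a hypothetical helper, `false_negative_rate_by_group`. It illustrates one possible check among many, not a method used by any particular project.

```python
from collections import defaultdict


def false_negative_rate_by_group(records):
    """Return, per group, the share of true cases the model failed to flag."""
    missed = defaultdict(int)
    positives = defaultdict(int)
    for group, actual, predicted in records:
        if actual == 1:
            positives[group] += 1
            if predicted == 0:
                missed[group] += 1
    return {g: missed[g] / positives[g] for g in positives}


# (group, actual outcome, model prediction); made-up values for illustration only.
records = [
    ("women", 1, 0), ("women", 1, 0), ("women", 1, 1), ("women", 0, 0),
    ("men", 1, 1), ("men", 1, 1), ("men", 1, 1), ("men", 1, 0), ("men", 0, 0),
]
rates = false_negative_rate_by_group(records)

# A large gap suggests the model serves one group worse than another.
gap = max(rates.values()) - min(rates.values())
```

In this toy data the model misses two of three true cases among women but only one of four among men. A gap like that would prompt a closer look at how the training data represented each group.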

Researchers are focusing on methods to create fair and equitable algorithms and models to provide all populations with the best and most appropriate health care decisions. Researchers in our NLM Extramural Programs analyze this data through NLM funding opportunities that foster scientific inquiry so we better understand algorithmic effects on minority and marginalized populations. Some of those funding opportunities include NLM Research Grants in Biomedical Informatics and Data Science (R01 Clinical Trial Optional) and the NIH Research Project Grant (Parent R01 Clinical Trial Not Allowed).

NLM is interested in state-of-the-art methods and approaches to address problems using large health data sets and tools to analyze them. Specific areas of interest include:

  • Developing and testing computational or statistical approaches to apply to large or merged health data sets containing human and non-human data, with a focus on understanding and characterizing the gaps, errors, biases, and other limitations in the data or inferences based on the data.
  • Exploring approaches to correct these biases or compensate for missing data, including introducing debiasing techniques and policies or using synthetic data.
  • Testing new statistical algorithms or other computational approaches to strengthen research designs using specific types of biomedical and social/behavioral data.
  • Generating metadata that adequately characterizes the data, including its provenance, intended use, and processes by which it was collected and verified.
  • Improving approaches for integrating, mining, and analyzing health data in a way that preserves that data’s confidentiality, accuracy, completeness, and overall security.

These funding opportunities encourage inquiry into algorithmic fairness to improve health care for all individuals, especially those who are underserved. By using new research models that account for diverse populations, we will be able to provide data that will support the best treatment outcomes for everyone.

Dr. Hardy’s work and expertise focus on using health informatics to improve public health and health care decision-making. Dr. Hardy has held positions as a researcher and academician and is active in national informatics organizations. She has written and edited books on informatics and health care.
