Pursuing Data-Driven Responses to Public Health Threats

In my 11th grade civics class, I learned how a bill becomes a law, and I’ll bet some of you can even remember the steps. Today, I want to introduce you to another way that the federal government takes action: executive orders. As head of the executive branch, the president can issue an executive order to manage operations of the federal government.

In light of the COVID-19 pandemic, President Biden has issued executive orders to accelerate the country’s ability to respond to public health threats.

This is where I come in. As Director of the National Library of Medicine (NLM) and a member of the leadership team of the National Institutes of Health, I’m part of a group developing the implementation plan for the Executive Order entitled Ensuring a Data-Driven Response to COVID-19 and Future High-Consequence Public Health Threats.

This order directs the heads of all executive departments and agencies to work on COVID-19 and pandemic-related data issues. This includes making data that is relevant to high-consequence public health threats accessible to everyone, reviewing existing public health data systems to issue recommendations for addressing areas for improvement, and reviewing the workforce capacity for advanced information technology and data management. And, like all good government work, a report summarizing findings and providing recommendations will be issued.

Since March 2021, I have been meeting 2 to 3 times a month with public health and health data experts across the U.S. Department of Health & Human Services (HHS). Our committee includes staff from the Office of the National Coordinator for Health Information Technology, Food and Drug Administration, Centers for Disease Control and Prevention, Centers for Medicare & Medicaid Services, and Office of the Assistant Secretary for Planning and Evaluation.

After creating a work plan, our group arranged briefings with many other groups, including public health officials from states and territories, representatives from major health care systems, and the public. We reviewed many initiatives to promote open data, data sharing, and data protection across the government sphere. We learned about the challenges of developing and adopting data standards, and about the ability of different groups to come together to make data more useful in preparing the country to anticipate and respond to high-consequence public health threats. We discussed future strategies for data management and data protection, new analytical models, and workforce development initiatives. Our working group provided a report to the Office of Science and Technology Policy (OSTP), handing the work off to the next team, which will carry it toward completion. In coordination with the National Science and Technology Council, OSTP will develop a plan for advancing innovation in public health data and analytics.

This was a beneficial experience for me, and I certainly learned a great deal. Implementing a public health response system requires engagement with many HHS divisions, each of which brings a unique perspective and experience. I also developed new relationships based on trust and collaboration with these colleagues. At NLM, we have experts in data standards and data collection, and we oversee vast data repositories, so we have substantial domain-specific knowledge to contribute. I drew frequently on the knowledge and expertise of NLM staff to inform the process through analyses of information and the preparation of reports. I am grateful for all who helped and supported me.

I believe our country is prepared to have the data necessary to prevent, detect, and respond to future high-consequence public health threats. This is yet another way that NLM is helping shape data-powered health for the future. What else can we do for you?

What Did You Do with Your Summer Vacation?

Well, if you are spending the summer at the NIH, you’ve likely been engaged in one of our many activities designed to access critical data and advance our understanding of the human experience by linking data sets together. Today, we are inviting you to engage in some additional best practices in accessing controlled data in ways that support science and preserve privacy.

In 2020, the NIH Scientific Data Council charged its Working Group for Streamlining Access to Controlled Data to spend a year engaging in dialogue within the NIH and with our extramural colleagues to better understand the experiences of scientists and the strategies that both facilitate and impede access to data. The group also considered where in the research process NIH should inform, engage, and gain consent of participants sufficiently to support science driven by access to controlled datasets.

NIH stores and facilitates access to many datasets, both open and controlled, with the goal of accelerating new discoveries and thereby maximizing taxpayer return on investment in the collection of these datasets. Data derived from humans that are shared through controlled-access mechanisms reflect NIH’s commitment to protect sensitive data and honor the informed consent provided by research participants in NIH-supported studies.

NIH has supported multiple controlled-access data repositories that uphold appropriate data protections for both human data and other sensitive data, while meeting the needs of various researcher communities. However, as data access requests increase, new repositories are established, and new mechanisms of providing access to data are developed, it is apparent that opportunities remain to improve efficiency and harmonization among repositories. Doing so would make NIH-supported controlled-access data more FAIR (Findable, Accessible, Interoperable, and Reusable) and ensure appropriate oversight when data from different resources are combined. These trends are enabling datasets and datatypes to be combined in new ways that advance the science. However, datasets and datatypes that may or may not be controlled may, when combined, create inadvertent re-identification risks.
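To make the re-identification concern concrete, here is a minimal sketch in Python. It is an illustration only, not an NIH method: the records are invented, and the group-size check is a simple k-anonymity-style measure that we are assuming for the example. Each dataset on its own places every participant in a group of two, but linking the two leaves every participant in a group of one.

```python
from collections import Counter


def smallest_group(records, fields):
    """Size of the smallest group of records that share the same values for `fields`."""
    counts = Counter(tuple(record[f] for f in fields) for record in records)
    return min(counts.values())


# Hypothetical records, invented for illustration only.
dataset_a = [
    {"pid": 1, "age_band": "40-49"},
    {"pid": 2, "age_band": "40-49"},
    {"pid": 3, "age_band": "50-59"},
    {"pid": 4, "age_band": "50-59"},
]
dataset_b = [
    {"pid": 1, "region": "North"},
    {"pid": 2, "region": "South"},
    {"pid": 3, "region": "North"},
    {"pid": 4, "region": "South"},
]

# Link the two datasets on the shared participant identifier.
b_by_pid = {record["pid"]: record for record in dataset_b}
linked = [{**record, **b_by_pid[record["pid"]]} for record in dataset_a]

k_a = smallest_group(dataset_a, ["age_band"])
k_b = smallest_group(dataset_b, ["region"])
k_linked = smallest_group(linked, ["age_band", "region"])
print(k_a, k_b, k_linked)
```

The smaller the smallest group, the easier it may be to single out an individual, which is one reason oversight of linked data matters.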

To help the agency address these issues in a way that is responsive to community needs, we are hosting a series of webinars through the end of July. We call these “breakout sessions” because they follow an outstanding webinar that was presented on July 9 and is available online. Richard Hodes, MD, director of the National Institute on Aging, launched the 3-hour seminar with a talk titled Opportunities for Advancing Research Through Better Access to Controlled Data. Ana Navas-Acien, MD, PhD, brought the perspective of Indigenous communities and communities of people traditionally underrepresented in research, and she emphasized themes of community engagement and broadening the consent framework to consider community-level accountabilities as well as individual assent. Lucila Ohno-Machado, MD, MBA, PhD, addressed privacy-preserving distributed analytics as a strategy to promote science while keeping data private. Hoon Cho, PhD, described privacy-enhancing computational approaches.

You can find the schedule for the breakout sessions below. These sessions are specifically designed to listen to the expectations, hopes, and concerns from researchers and participants. These webinars are free and open to the public; registration is required.

Breakout Session on “Making Controlled-Access Data Readily Findable and Accessible” on July 22 from 3 pm to 5:30 pm ET

Breakout Session on “General Opportunities for Streamlining Access to Controlled Data” on July 26 from 12:30 pm to 2 pm ET

Breakout Session on “Addressing Oversight, Governance, and Privacy Issues in Linking Controlled Access Data from Different Resources” on July 28 from 3 pm to 5:30 pm ET

To generate interest and hear from the broadest possible group of stakeholders, NIH has released a Request for Information on Streamlining Access to Controlled Data from NIH Data Repositories. Please note the closing date is August 9. We look forward to hearing from you! Please visit Streamlining Access to Controlled Data at the NIH for all of the information described in this post.

Finally, we would like to personally thank the many NIH staff members who serve on the working group:

  • Shu Hui Chen
  • Alicia Chou
  • Valentina Di Francesco
  • Greg Farber
  • Jamie Guidry Auvil
  • Nicole Garbarini
  • Lyric Jorgenson
  • Punam Mathur
  • Vivian Ota Wang
  • Jonathan Pollock
  • Rebecca Rodriguez
  • Alex Rosenthal
  • Steve Sherry
  • Julia Slutsman
  • Erin Walker
  • Alison Yao

I hope your summer vacation was as productive as ours!

(left to right)
Patricia Flatley Brennan, RN, PhD, NLM Director
Susan Gregurick, PhD, Associate Director for Data Science at NIH
Hilary S. Leeds, JD, Senior Health Science Policy Analyst for the Office of Science Policy at NIH

A Journey to Spur Innovation and Discovery

Guest post by Valerie Schneider, PhD, staff scientist at the National Library of Medicine’s National Center for Biotechnology Information, National Institutes of Health.

It’s been said that nature is the best teacher. When it comes to understanding human biology and improving health, examples abound of the advances that have been made from the study of a diverse set of non-human organisms. Over the last two centuries, the study of nematode worms has taught us about longevity and mRNAs (the biological molecule that is the basis for several COVID-19 vaccines), common fungi about cell division and cancer, and fruit flies about many things, from the role of chromosomes in heredity to our circadian rhythms. The ability to create targeted alterations in the genomes of model organisms has been transformative for studies to establish the function of specific genes in the etiology of human disease.

The modern era of genomic biology, in which genome sequencing and assembly are accessible to more researchers than ever before, provides data from an even greater range of organisms from which we might learn. Today, we rely not only on primate models, but on a whole host of species: for example, swine to understand organ transplantation, songbirds to understand vocalization and learning, and bats and pangolins to teach us about the evolution of the SARS-CoV-2 virus and how to fight its spread.

These rapidly growing collections of sequence and other data on species across the tree of life offer enormous promise for discoveries that have the potential to improve human health. To better enable such discoveries, with the support of NIH, NLM is planning a major modernization of its resources and their underlying infrastructure.

This modernization will support the needs of users engaged in data search and retrieval, gene annotation, evaluation of sequence quality, and comparative analyses. The new infrastructure, user interfaces, and tools should result in an improved experience for researchers doing a wide range of work, and also facilitate better data submissions.

This revamping aligns with NIH’s Strategic Plan for Data Science, which provides a roadmap for modernizing the NIH-funded biomedical data science ecosystem, as well as NLM’s Strategic Plan, which furthers NLM’s commitment to provide data and information to accelerate biomedical discovery and improve health. NLM and NIH are committed to providing researchers with modern, stable, and cloud-oriented technologies that support research needs.

Over the last few years, NLM has demonstrated this commitment by re-designing several flagship products, including the PubMed database for searching published biomedical literature, the ClinicalTrials.gov database of information on privately and publicly funded clinical trials, and the Basic Local Alignment Search Tool (BLAST) for finding regions of similarity between biological sequences. As part of NIH’s Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) Initiative, NLM also made the data from its massive (36 petabyte) Sequence Read Archive (SRA) available on two commercial cloud platforms, facilitating large-scale computational research that would otherwise be difficult for many researchers. Revamping these resources has positioned them to support both the current and future needs of NLM’s diverse audience of researchers, clinicians, data scientists, educators and others.

Importantly, this current initiative to modernize NLM products, tools, and services, and concurrently develop content, will include extensive engagement with the research community, just as we’ve done with previous re-design efforts. The NLM is committed to offering interfaces accessible to both novices and experts. Additionally, NLM believes a key part of the next generation of its data resources requires an infrastructure that supports an ongoing, dynamic exchange of content, including contributions of metadata and gene functional information from knowledge builders in the community to complement and enhance NIH-provided content.

Community engagement will also ensure that externally sourced content is provided in ways that maintain the high value and trustworthiness of the datasets. Additionally, data connections that make the content of this new resource accessible to external knowledgebases containing other datatypes, such as images, will further promote integrative data analyses that support scientific discovery.

Many opportunities exist to streamline processes, look across resources, and gain insights that will provide new ways of learning. Through NLM’s continued commitment to modernization initiatives, we are ready to again improve the user experience for accessing, analyzing and visualizing sequence data and related information. Nature continues to be our best teacher — and we are now poised to learn from her in an exciting new classroom.

We invite you to come on this journey with us.

Valerie Schneider, PhD, is the deputy director of Sequence Offerings and the head of the Sequence Plus program. In these roles, she coordinates efforts associated with the curation, enhancement, and organization of sequence data, as well as oversees tools and resources that enable the public to access, analyze, and visualize biomedical data. She also manages NCBI’s involvement in the Genome Reference Consortium, the international collaboration tasked with maintaining the value of the human reference genome assembly.

Health Data Standards: A Common Language to Support Research and Health Care

Guest post by Dianne Babski, Associate Director for Library Operations, and Robin Taylor, MLIS, National Library of Medicine

Every day we benefit from data standards, and every day most of us don’t even notice it! Did you wear a seatbelt today? Take a precise dose of medicine? Send an email? Plug a laptop into an outlet? These are examples of activities that are made possible through data standards. At NLM, we think a lot about data standards, particularly health data standards.

NLM partners with organizations such as the Office of the National Coordinator for Health Information Technology (ONC) to promote health data standards for data captured in electronic health records (EHRs), clinical research, and other health information systems. With a focus on how health data are collected, stored, described, and retrieved, health data standards make up the backbone of interoperability. Interoperability lets computerized systems connect and seamlessly share data, and allows information to be exchanged with other applications and databases.

Let’s look at a current example where health data standards, a common data language, have had a real impact. When SARS-CoV-2, and the disease it causes, COVID-19, emerged in late 2019, researchers around the world began planning studies to figure out how to combat this global pandemic. Research questions, such as, “What date did the patient first display COVID-19 symptoms?” arose continuously. It sounds like a simple question, but there are so many ways to ask the question, and even more possible responses. If researchers apply health data standards in their investigations — if they ask questions and collect responses in a standardized way — the data they collect can be combined and compared with data from other COVID-19 studies and EHRs. This enables reuse of data across multiple sources, which increases statistical power and accelerates our understanding of this disease.  

For more than 20 years, NLM has served as the central coordinating body for clinical terminology standards nationally. Our long-standing efforts to establish common health terminology supported the COVID-19 response by allowing access to near-real time clinical information to guide the diagnosis, treatment, and prevention of this disease.

NLM supports multiple vocabulary standards and mappings, like RxNorm, SNOMED CT, and the UMLS, as well as terminology tools like AccessGUDID, DailyMed, MedlinePlus Connect, MetaMap, the Value Set Authority Center (VSAC), and the NIH CDE Repository, a database that provides access to structured human and machine-readable definitions of common data elements, more commonly referred to as CDEs.

CDEs are one type of health data standard that can help researchers normalize data across studies. CDEs are standardized, precisely defined questions that are paired with a set of specific allowable responses, then used systematically across different sites, studies, or clinical trials to ensure consistent data collection.
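As a rough illustration of that definition, the sketch below models a CDE in Python as a question paired with a fixed set of allowable responses, then pools answers from two sites only after checking each one against the same element. The class, the element, and its permissible values are hypothetical and are not taken from the NIH CDE Repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonDataElement:
    """A precisely defined question plus its set of allowable responses."""
    identifier: str
    question: str
    permissible_values: frozenset

    def validate(self, response):
        """Return the response if it is allowed; otherwise raise ValueError."""
        if response not in self.permissible_values:
            raise ValueError(
                f"{response!r} is not an allowable response for {self.identifier}"
            )
        return response


# Hypothetical element, invented for illustration only.
SYMPTOM_STATUS = CommonDataElement(
    identifier="example-symptom-status",
    question="Has the participant displayed COVID-19 symptoms?",
    permissible_values=frozenset({"Yes", "No", "Unknown"}),
)


def pool_responses(cde, *site_responses):
    """Combine responses from several sites, validating each against the same CDE."""
    pooled = []
    for responses in site_responses:
        pooled.extend(cde.validate(response) for response in responses)
    return pooled


site_a = ["Yes", "No", "Yes"]
site_b = ["Unknown", "Yes"]
combined = pool_responses(SYMPTOM_STATUS, site_a, site_b)
print(combined)
```

Because both sites used the same question and the same allowable responses, their answers can be combined directly. A free-text answer such as "maybe" would be rejected rather than silently mixed in.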

CDEs are in use across NIH, to varying degrees. Some NIH Institutes and Centers have had mature CDE programs for years; others are just beginning to develop them. NLM has been involved with CDEs since 2012 and plays a key role in encouraging CDE adoption across NIH by:

  • Hosting the NIH CDE Task Force (CDETF), a trans-NIH community of practice.
  • Forming a CDE Governance Committee that reports to the CDETF. The committee’s primary charge is to decide whether common data elements submitted to it by NIH-recognized bodies (NIH Institutes, offices, etc.) meet criteria that merit their recommendation for use in NIH-funded research.
  • Maintaining the NIH CDE Repository, a central access point to data elements that have been recommended or required by NIH Institutes and Centers for use in research and for other purposes. In 2020, we completed a usability study of the NIH CDE Repository and have been implementing enhancements based on the recommendations.

This year, while continuing to enhance the usability of the NIH CDE Repository, we will also engage with users through a CDE awareness and training campaign.

Ms. Babski is responsible for overall management of one of NLM’s largest divisions with more than 450 staff who provide health information services to a global audience of health care professionals, researchers, administrators, students, historians, patients, and the public.

Robin Taylor, MLIS, joined NLM in 2016. Since 2018, she has been the lead for the NIH Common Data Elements Repository.

DOCLINE: Connecting Medical Libraries for 35 Years

Guest post by Lisa Theisen, Head of NLM’s Collection Access Section, and Elisabeth (Lis) Unger, NLM DOCLINE Team Lead

It’s been 35 years since NLM’s interlibrary loan (ILL) request routing system, DOCLINE®, was launched with a goal of enabling medical libraries to get biomedical literature into the hands of people who need it as efficiently and quickly as possible. Today, DOCLINE continues to be used daily by nearly 2,000 hospital, academic, military, public, and other libraries that place approximately one million requests a year, including requests for newly published research not freely available online.

DOCLINE’s foundation and success stem from NLM’s collaboration with the Regional Medical Libraries of the Network of the National Library of Medicine (NNLM) to support resource sharing among the medical library community. Resource sharing through ILL means that participating libraries don’t have to own as many books and journals or collect as broad a range of topics because they can borrow from each other. Full participation is limited to libraries in the NNLM and Canada, but some international libraries use the system to place requests directly with NLM.

DOCLINE service is fast and use of the system is free. This service allows a wide range of libraries, including hospital libraries (which account for 60% of DOCLINE participants), to obtain articles for their patrons that are not in their own collections.

This is where DOCLINE fills a critical gap by connecting a wide network of librarians who are always ready to help each other out, often without charge. Without DOCLINE, access to literature outside of a library’s collection is severely curtailed.

When DOCLINE first launched on mainframe computers in 1985, finding a ‘copy’ of an article or a library with the right issue of a print journal was not as easy as performing a simple search online. If you had a modem and access to an NLM account, you might check SERHOLD, the NLM database of medical libraries’ serial holdings – or journal titles libraries report subscribing to. Then you could mail, or maybe fax, an ILL form to the library and request that they mail your library a photocopy of the article. 

Over the decades, DOCLINE has evolved in response to technological advancements and user needs, with features and enhancements added to make the system faster and easier to use. DOCLINE has grown to include new ways to send copies of articles, such as emailing PDFs, and has adapted to new ways that publishers offer content, including electronic journals and “epub ahead of print” articles found in NLM’s PubMed biomedical literature citation database. Borrowers now also see alerts to free, full-text articles found in NLM’s PubMed Central (PMC) digital archive.

Around the turn of the century, DOCLINE 1.0 moved to the World Wide Web, at the same time email use was becoming more widespread. In 2003, DOCLINE 2.0 was released with a new user-friendly look and feel; in 2006 it was updated to allow a library to indicate “Urgent Patient Care” to expedite service for use in emergencies in the hospital setting. The latest version, DOCLINE 6.0, debuted in November 2018. The three core system components, 1) the user library records, 2) their collective biomedical journal listings, and 3) ILL requests, would still be familiar to a user of the original system, even though the website looks very different today. DOCLINE also includes indicators for supplementary data sets and journal embargoes, which didn’t exist in its early days.

What made DOCLINE remarkable in 1985, and remains its most intricate, complex feature, is the efficient way in which requests are automatically matched to appropriate lenders based on their reported journal holdings. This keeps DOCLINE’s average time to fill a request short and its percentage of filled requests high compared to other ILL systems, advancing NLM’s mission of enabling biomedical research and supporting health care and public health. It also means that clinicians who rely on medical librarians to obtain the most relevant and latest research articles cited in PubMed, for instance on COVID-19 treatments, can rely on DOCLINE.
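For readers curious about what matching on reported holdings can look like, here is a deliberately simplified Python sketch. It is not DOCLINE’s actual routing logic, which is far more intricate. The library names, journal titles, year ranges, and the find_lenders function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Holding:
    """One journal a library reports holding, with the years it covers."""
    journal: str
    first_year: int
    last_year: int


def find_lenders(holdings_by_library, borrower, journal, year):
    """Return libraries, other than the borrower, whose reported holdings cover the request."""
    lenders = []
    for library, holdings in holdings_by_library.items():
        if library == borrower:
            continue  # a library does not lend to itself
        if any(
            h.journal == journal and h.first_year <= year <= h.last_year
            for h in holdings
        ):
            lenders.append(library)
    return sorted(lenders)


# Hypothetical reported holdings, invented for illustration only.
holdings_by_library = {
    "Library A": [Holding("Journal X", 1990, 2005)],
    "Library B": [Holding("Journal X", 2000, 2020), Holding("Journal Y", 1985, 2020)],
    "Library C": [Holding("Journal Y", 2010, 2020)],
}

lenders = find_lenders(holdings_by_library, "Library C", "Journal X", 2018)
print(lenders)
```

A real routing system would also weigh factors this sketch ignores, such as which lender to try first and what to do when a lender cannot fill the request.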

Continued updates to DOCLINE underscore the commitment to advance NLM’s strategic goals to reach more people in more ways through enhanced dissemination and engagement, and to engage a wide range of audiences to ensure the “right information gets delivered to them at the right time.” For instance, in April of this year, a ‘Print Resources Available’ filter was added to the system to enable user librarians working remotely from home to connect with libraries that still had access to their physical collection.

In its 35-year history, over 65 million ILL requests have been completed by libraries using DOCLINE. NLM is proud to provide the system and values the work of libraries that generously and unflaggingly share with one another, making DOCLINE a system that has been widely embraced by the user community over the years. We are looking forward to what the next 35 years mean for DOCLINE – teleporting articles, anyone?

Are you a part of the DOCLINE community? How has ILL helped you?

Lisa Theisen began serving as Head of the Collection Access Section in the Public Services Division in March 2020. Ms. Theisen has been at NLM for 13 years, supporting DOCLINE and NLM’s Interlibrary Loan (ILL) operation.

Elisabeth Unger, MLIS, joined NLM’s Public Services Division, Collection Access Section, Systems Unit in 2008 to support DOCLINE and NLM ILL after working at the National Agricultural Library. In 2015 she became DOCLINE Team Lead, where she was responsible for the latest redesign and relaunch of the esteemed system.