Gearing Up for 2023 Part II: Implementing the NIH Data Management and Sharing Policy

This blog post is by Lyric Jorgenson, PhD, the Acting Director of the NIH Office of Science Policy. It was originally posted on May 12 on the NIH Office of Science Policy Under the Poliscope blog. We encourage you to read it and submit comments and feedback on the draft supplemental information to the NIH Policy for Data Management and Sharing: Protecting Privacy When Sharing Human Research Participant Data by June 27.

Sequels are all the rage these days. I figure if Marvel can make endless “Avengers” movies, I could start making blog sequels. At the beginning of the year, I wrote Part I of this blog series about how NIH is working to implement the new NIH Data Management and Sharing Policy (DMS Policy). I mentioned at that time that additional resources were forthcoming.

I should note that when we started to receive comments on what was to become the NIH DMS Policy, one thing in particular stood out to us. Many commenters told us it would be helpful to have clear information on how to protect the privacy and respect the autonomy of participants when sharing data. Now, we all know that cliffhangers build anticipation, so without further delay, I want to share with you some of the tools NIH has been working on to answer that call.

First, if you have seen the Avengers movies, you likely will have noticed that they tend to introduce a new villain that the team needs to battle with either new tools (think of OSP with Thor’s Stormbreaker axe) or the help of new superheroes like Captain Marvel. While not exactly a new villain, the lack of consistent consent language to facilitate secondary research with data and biospecimens is certainly a challenge many of our stakeholders have raised and one that we thought we could help address.

NIH has a long history of developing consent language, so our team worked across the agency (and with you!) to develop a new resource that shares best practices for developing informed consents to facilitate data/biospecimen storage and sharing for future use. It also provides modifiable sample language that investigators and IRBs can use to clearly communicate the potential risks and benefits associated with data/biospecimen storage and sharing. In developing this resource, we engaged with key federal partners, as well as scientific societies and associations. Importantly, we also considered the 102 comments from stakeholders in response to an RFI that we issued in 2021.

As for our second resource, we are requesting public comment on protecting the privacy of research participants when data is shared. I want to be upfront and acknowledge that we have issued many of these types of requests over the last several months, and NIH understands the effort folks put into responding thoughtfully. With that said, we think the research community will greatly benefit from this resource, and we want to hear your thoughts on whether it hits the mark or needs adjustment.

When reviewing the document, please bear in mind that the main purpose is to provide researchers with information on:

  • Operational Principles for Protecting Participant Privacy when Sharing Scientific Data
  • Best Practices for Protecting Participant Privacy when Sharing Scientific Data
  • Points to Consider for Designating Scientific Data for Controlled Access

Comments on the draft will be accepted until June 27, 2022; full information, including how to submit a comment, can be found here.

Finally, every sequel needs a twist ending! In November 2021, NIH published a request for comments on the future directions of the NIH Genomic Data Sharing Policy. We are still reviewing the many points and perspectives that were raised, but while we consider next steps, the comments we received are now available on the OSP website. Okay, so maybe that twist wasn’t as big as, say, Darth Vader revealing he is (spoiler alert) Luke’s father in The Empire Strikes Back, but it’s still pretty good for the science policy world.

With a little more than half a year left until the implementation date of the NIH DMS Policy, we will continue to provide updates and resources over the next several months.

Midnight in the Library

Right now, I am reading The Midnight Library by Matt Haig. It’s a fanciful story of a woman in limbo between life and death who finds herself in a magical library where each book represents one of the lives she could have lived had she made even one tiny decision differently. She then finds herself living many of these lives, experiencing what could have been.

This book got me thinking about how NLM helps people experience lives that could be. I see this on two levels:

The first is the scientific pathway: What if . . . ? What if we knew more about the interactions between evolutionary forces and molecular constraints (like the work of Aravind Iyer, PhD), or fully appreciated the potential of proteins for genome engineering (like the discoveries made by Eugene Koonin, PhD), or could envision how and why proteins fold or switch their folds (as explored by Lauren Porter, PhD), or had the power to enable machines to understand human thought (like the research from Dina Demner-Fushman, MD, PhD). In addition to the discoveries by our NLM intramural researchers, our vast literature and data repositories hold answers that could change lives: why some genetic structures lead to human characteristics, or why a certain biochemical compound helps prevent infection. We help scientists discover these pathways and connections by providing them with the tools to uncover what could be.

The second is how NLM helps people see their own “what if” through the amazing richness of the resources we make available in our collections. Our resources (which encompass clinical insights, medical information, care guidelines, and self-management guidance) help clinicians determine how to care for people with complex diseases or diagnose an illness in a timely manner. Our repository of clinical information available through PubMed ensures that those in need can access well-reasoned, recognized guiding principles for their care, and our MedlinePlus web resource provides patients and their families and friends with reliable, up-to-date health information to support and encourage healthy behavioral changes.

As in The Midnight Library, books alone do not inspire discovery, guide clinical care, or inform self-management. In Haig’s novel, a fictional librarian who knows the collection shows the main character how to select books by carefully listening to her goals and needs. It is the main character’s engagement with the books that helps her explore the lives she could have lived. At NLM, we too have librarians (in Bethesda, Maryland, and around the country through the Network of the National Library of Medicine) who organize the library’s collections and guide patrons toward the best choice of resources. Our resources must be findable, accessible, interoperable, reusable, and actionable! And then, the person (scientist, clinician, or patient) must actively engage with the material.

As we approach the future of data-powered health, guided by the NLM Strategic Plan (2017-2027), we will fulfill our mission to collect biomedical literature, organize it, preserve it, and make it accessible to the world. As knowledge of health and biomedicine continues to grow faster than we can process it, we will turn our attention to applying emerging tools, including machine learning and artificial intelligence, to make it easier to find our materials and more efficient to examine them. Through our Extramural Programs, we will continue to stimulate new ways of presenting information to scientists, clinicians, patients, and the public so they can explore possible lives to be lived and test out their promise of better health for society. What lives can we help you explore?

Request for Public Comment: National AI Research and Development Strategic Plan

This blog post by Lynne Parker, Director, National AI Initiative Office, and Rashida Richardson, Senior Policy Advisor for Data and Democracy, was originally posted on the White House OSTP blog.

We encourage you to read it and submit comments on the update to the National Artificial Intelligence Research and Development Strategic Plan by Friday, March 4, 2022.

Artificial Intelligence (AI) is becoming more prevalent in all of our lives. It powers all kinds of tools, from the digital assistants that answer questions on your phone, to breakthroughs in reading X-rays to better spot cancers. The so-called “intelligence” is the result of powerful computers sorting through mountains of data to find patterns, using algorithms designed and optimized by computer scientists.

Like all technology, AI is far from perfect. As we have started using AI for consequential decisions, we have realized that while AI can improve decision making, it too often compounds historical patterns of bias and deepens existing inequality. AI’s reliance on biased data or design processes has led to systems that produce discriminatory, or otherwise harmful, outcomes.

The Office of Science and Technology Policy is engaged in understanding the extraordinary promise of AI as well as its pitfalls. OSTP’s National AI Initiative Office (NAIIO) helps coordinate Federal activities in AI across government. OSTP is co-chairing the National AI Research Resource Task Force to answer Congress’s call to propose a vision for equitably expanding the research community’s access to the computing power, data, and testbed resources necessary to do AI research. OSTP has issued a call for the development of an AI Bill of Rights, and is working closely with both domestic and international partners across bilateral and multilateral venues to advance development, adoption, and oversight of AI in a manner that aligns with our democratic values.

Given the transformative potential of AI, we know it is critical that the American public have a voice in how this technology is used and governed. In late 2021, we initiated a public engagement process that included public listening sessions, a request for information on AI-enabled biometric technology, and stakeholder engagement meetings. Today, our National AI Initiative Office, in coordination with the Networking and Information Technology Research and Development Program of the National Science and Technology Council, is seeking public comments about how we should revise the National Artificial Intelligence Research and Development Strategic Plan. First published in 2016 and updated in 2019, the National AI R&D Strategic Plan identifies scientific and technological needs for AI innovation and investment priorities for federally funded AI research. In preparation for the Congressionally mandated 2022 Strategic Plan, this request for information seeks input on the goals, priorities, and metrics that Federal agencies should use to guide AI research and development.

OSTP’s mission is to “maximize the benefits of science and technology to advance health, prosperity, security, environmental quality, and justice for all Americans.” Our work in AI is intended to maximize its benefits while ensuring that AI-driven systems do not cause harm or impede our pursuit of American ideals.

Through DS-I Africa, NIH is Fostering a New Health Data Science Community

Guest post by Laura K. Povlich, PhD, Program Director at the NIH Fogarty International Center (FIC), and Tiffani B. Lash, PhD, Program Director at the NIH National Institute of Biomedical Imaging and Bioengineering (NIBIB). They co-coordinate the DS-I Africa program with assistance from a trans-NIH Working Group that includes Patricia Brennan, RN, PhD, Director of NLM; Roger Glass, MPH, MD, PhD, Director of FIC; Joshua A. Gordon, MD, PhD, Director of the National Institute of Mental Health; and Bruce Tromberg, PhD, Director of NIBIB.

Advances in data science and data ecosystems that support the mission of the NIH are reshaping biomedical and behavioral research. Enhanced international data ecosystems not only have the potential to support improved healthcare and public health domestically but could also be transformative in low- and middle-income countries. As a step in realizing this potential, the NIH Common Fund established the Harnessing Data Science for Health Discovery and Innovation in Africa (DS-I Africa) program.

The purpose of this program is to leverage data science technologies and prior NIH investments to develop solutions to the continent’s most pressing medical and public health problems through a robust ecosystem of new partners from academic, government, and private sectors.

Building on the success of a virtual symposium in 2020, NIH recently committed over $74 million over five years to support 19 DS-I Africa awards that will conduct research and training activities across the continent. The DS-I Africa Open Data Science Platform and Coordinating Center, led by the University of Cape Town, will catalyze and support the unique continental network of health data scientists, innovators, and researchers who work across the DS-I Africa program. This award will coordinate the 19 awards as a collaborative research consortium that benefits from shared resources and knowledge. Additionally, the Open Data Science Platform will develop into a scalable gateway that aims to lower some of the barriers to collaboration by democratizing access to data and tools.

The DS-I Africa consortium includes African-led multidisciplinary and multisectoral research hubs with projects in several important areas, such as antimicrobial resistance, SARS-CoV-2, climate change, mental health, multi-disease morbidity, and more. Research training programs will build the next generation of African health data scientists and innovators. Lastly, research projects on the ethical, legal, and social implications of health data science from an African perspective are a key component of DS-I Africa and will further the policy discussions of these issues on the continent. The consortium will expand throughout the life of the program with the goal of bringing in new partners through pilot projects and possibly other funding mechanisms.

We are excited to see the DS-I Africa consortium grow and to stay apprised of opportunities to connect with other data science communities around the world. Many funding organizations see the potential for data science to transform medicine and public health in Africa, and we hope additional investments will have a synergistic effect in strengthening the health data science ecosystem in Africa. For more information about the DS-I Africa research studies, visit the Harnessing Data Science for Health Discovery and Innovation in Africa Funded Research page.

In addition to her work at the NIH FIC with DS-I Africa, Dr. Povlich also co-coordinates Human Heredity and Health in Africa (H3Africa), which is another NIH Common Fund program. She earned both a BSE in Materials Science and Engineering and a PhD in Macromolecular Science and Engineering from the University of Michigan. Dr. Povlich was previously a Science & Technology Policy Fellow for the American Association for the Advancement of Science (AAAS).

Dr. Lash is the Program Director for the NIH Rapid Acceleration of Diagnostics Tech and Advanced Technology Platforms initiative, the NIH Technology Accelerator Challenge, and the NIBIB Point of Care Technologies Research Network. Her research portfolio includes Point of Care Technologies and Digital Health, both with the goal of developing biomedical technologies through collaborative efforts that merge scientific and technological capabilities with clinical need. Dr. Lash has been selected as a science policy fellow for both the AAAS and the National Academy of Engineering. Dr. Lash earned her PhD in Physical Chemistry from North Carolina State University.

NLM is a Leader in Using AI to Improve User Experiences

Guest post by Dianne Babski, Associate Director for Library Operations at NLM; Zhiyong Lu, PhD, Senior Investigator for the Computational Biology Branch at NLM’s National Center for Biotechnology Information (NCBI); and Donald Comeau, PhD, Staff Scientist at NCBI.

In January 2021, the Department of Health and Human Services (HHS) released its Artificial Intelligence (AI) Strategy to help agencies make the best use of AI to advance the health and wellbeing of all Americans. NIH has long collaborated on and invested in AI-based projects to discover health solutions across research and medical settings, including the analysis of biomedical imaging to diagnose diseases such as COVID-19.

For many years, NLM has been enthusiastic about the promise and possibilities of AI. You can learn more about NLM’s use of AI through some of our recent Musings posts, including: Artificial Intelligence, Imaging, and the Promising Future of Medicine; How NIH is Using Artificial Intelligence to Improve Operations; and NIH Strategically, and Ethically Building a Bridge to AI.

In support of the HHS AI Strategy, we’d like to share a few examples of how NLM is using AI to revolutionize our products and services to enhance usability and discovery of biomedical information.

Figure 1: Best Match algorithm depiction
Graphic by: Donald Bliss, NLM

Best Match is a relevance search algorithm for NLM’s PubMed, a free search engine for biomedical literature accessed by millions of users around the world every day. This AI technique leverages the intelligence of our users and cutting-edge machine-learning technology as an alternative to the date sort order used by many traditional search engines. Trained on past user searches and dozens of relevance-ranking factors, the Best Match algorithm demonstrates state-of-the-art retrieval performance and an improved user experience. Best Match increases the effectiveness of PubMed searches across the rapidly growing collection of biomedical literature, helping users efficiently find the most relevant, high-quality information they need.
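To make the contrast between a date sort and a relevance sort concrete, here is a toy Python sketch. It assumes each article already carries a few hypothetical relevance signals (how many query terms it matches, whether a term appears in the title, and a citation count) and combines them with hand-set, hypothetical weights. It is not Best Match itself, which is trained on past user searches and uses dozens of ranking factors; it only illustrates how a relevance ranking can put a different article on top than a newest-first list does.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    doc_id: str         # hypothetical identifier, not a real PMID
    title: str
    pub_date: date
    term_matches: int   # how many query terms the record matches
    in_title: bool      # whether any query term appears in the title
    citations: int      # hypothetical popularity signal


# Hand-set, hypothetical weights. A learned ranker such as Best Match derives
# its ranking from past user searches rather than from fixed numbers like these.
WEIGHTS = {"term_matches": 1.0, "in_title": 2.0, "citations": 0.01}


def relevance_score(a: Article) -> float:
    """Combine several relevance signals into a single weighted score."""
    return (
        WEIGHTS["term_matches"] * a.term_matches
        + WEIGHTS["in_title"] * (1.0 if a.in_title else 0.0)
        + WEIGHTS["citations"] * a.citations
    )


def sort_by_date(articles):
    """Traditional ordering: newest first, regardless of relevance."""
    return sorted(articles, key=lambda a: a.pub_date, reverse=True)


def sort_by_relevance(articles):
    """Relevance ordering: highest combined score first."""
    return sorted(articles, key=relevance_score, reverse=True)


docs = [
    Article("a", "Editorial note", date(2022, 5, 1), 1, False, 0),
    Article("b", "Deep learning for COVID-19 chest imaging", date(2021, 3, 2), 4, True, 150),
    Article("c", "Chest CT findings in a case series", date(2021, 12, 9), 2, False, 20),
]
print([a.doc_id for a in sort_by_date(docs)])       # ['a', 'c', 'b']
print([a.doc_id for a in sort_by_relevance(docs)])  # ['b', 'c', 'a']
```

In a learned ranker, the weights (and far richer features) would be fit to user search behavior rather than chosen by hand; the final sorting step stays the same.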

SingleCite is another automated search algorithm, designed to improve single-citation searches in the PubMed database. It predicts the probability that a retrieved document is the target of a query based on a set of predefined variables. This makes a user’s search for a specific document in PubMed more likely to succeed, further increasing the effectiveness of PubMed searches.
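As a rough illustration of scoring candidates by the probability that each is the wanted citation, the Python sketch below assumes a few hypothetical variables (author, journal, and year matches, plus title-word overlap) and converts them into a probability with a logistic function using made-up coefficients. It shows the general idea of returning a single citation only when one candidate is confident enough; it is not SingleCite’s actual model, variables, or weights.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    doc_id: str           # hypothetical identifier
    author_match: bool    # query author name matches the record
    journal_match: bool   # query journal matches the record
    year_match: bool      # query year matches the record
    title_overlap: float  # fraction of query words found in the title (0 to 1)


# Made-up intercept and coefficients, for illustration only.
BIAS = -4.0
COEF = {"author_match": 2.0, "journal_match": 1.5, "year_match": 1.0, "title_overlap": 3.0}


def target_probability(c: Candidate) -> float:
    """Logistic model: estimated probability that `c` is the citation the user wants."""
    z = (
        BIAS
        + COEF["author_match"] * float(c.author_match)
        + COEF["journal_match"] * float(c.journal_match)
        + COEF["year_match"] * float(c.year_match)
        + COEF["title_overlap"] * c.title_overlap
    )
    return 1.0 / (1.0 + math.exp(-z))


def best_single_citation(candidates, threshold=0.5):
    """Return the id of the most probable target, or None if none clears the threshold."""
    if not candidates:
        return None
    best = max(candidates, key=target_probability)
    return best.doc_id if target_probability(best) >= threshold else None


cands = [
    Candidate("x", True, True, True, 0.8),
    Candidate("y", False, True, False, 0.2),
]
print(best_single_citation(cands))      # x
print(best_single_citation(cands[1:]))  # None
```

In this sketch, `None` signals that no single candidate stood out; the threshold trades off confidently naming one document against declining to guess.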

Computed Author is a machine-learning method that addresses irrelevant retrieval results in PubMed caused by author name ambiguity (where different authors share the same name). When users search by author name, Computed Author uses an algorithm to distinguish papers by different authors who share that name, and it clusters at the top of the results the articles likely written by the same author. Again, the result is more effective PubMed searching, supporting NLM’s mission to advance health research and discovery.
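Computed Author itself is a machine-learning method; the Python sketch below swaps in a much simpler, hypothetical heuristic just to show the underlying idea of name disambiguation. It assumes two papers listing the same author name likely belong to the same person when they share at least one other co-author, merges such links transitively with a union-find structure, and returns the largest cluster first. The input format and example names are invented for illustration.

```python
from collections import defaultdict


def cluster_by_coauthors(papers, name):
    """Group papers listing `name` into clusters that likely share one real author.

    papers: dict mapping a (hypothetical) paper id to the set of author names on it.
    Two papers are linked if they share at least one co-author besides `name`;
    links are merged transitively with a union-find structure.
    """
    ids = [pid for pid, authors in papers.items() if name in authors]
    parent = {pid: pid for pid in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Index papers by each co-author so linked papers are found without comparing every pair.
    by_coauthor = defaultdict(list)
    for pid in ids:
        for other in papers[pid] - {name}:
            by_coauthor[other].append(pid)
    for group in by_coauthor.values():
        for pid in group[1:]:
            union(group[0], pid)

    clusters = defaultdict(list)
    for pid in ids:
        clusters[find(pid)].append(pid)
    # Largest cluster first, i.e. the papers most likely by one author go on top.
    return sorted((sorted(c) for c in clusters.values()), key=len, reverse=True)


papers = {
    "p1": {"J Smith", "A Lee", "B Chen"},
    "p2": {"J Smith", "A Lee"},
    "p3": {"J Smith", "B Chen", "C Diaz"},
    "p4": {"J Smith", "Z Patel"},
    "p5": {"K Jones"},
}
print(cluster_by_coauthors(papers, "J Smith"))  # [['p1', 'p2', 'p3'], ['p4']]
```

A production system would likely weigh many more signals, such as affiliations and research topics; co-author overlap alone can both merge different people and split one person’s papers.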

Fully automated Medical Subject Headings (MeSH) indexing in NLM’s flagship bibliographic database, MEDLINE, is one of our most recent AI advancements. Automated indexing has been under development at NLM for many years, and the most significant outcome is the development of the NLM Medical Text Indexer (MTI). The MTI algorithm has been undergoing refinements as we move toward automation, including the incorporation of deep learning approaches to improve the application of MeSH subheadings, the incorporation of rules and triggers for indexing publication types, and the application of an Indexing Method designation. Automated indexing greatly reduces the time needed to make MeSH indexing metadata available and allows NLM to scale MeSH indexing for MEDLINE to the volume of published biomedical literature.

Gene and chemical indexing are also part of automated MEDLINE indexing efforts to improve literature retrieval and information access. Currently, gene and chemical indexing is performed manually by expert indexers. To assist this process, we are using advanced natural language processing and deep learning methods to develop NLM-Gene and NLM-Chem, automatic tools for finding gene and chemical names in the biomedical literature.

We are very excited about our efforts to leverage AI and the advances we have made. Looking forward, we will strive to lead AI innovation in partnership with HHS, NIH, and the broader research community to ensure that we continue to meet our mission to accelerate biomedical discovery and data-powered health.

Do you have ideas for how we can harness AI to improve our projects? What are some of the ways you are using AI to improve products and services?

Dianne Babski is responsible for overall management of one of NLM’s largest divisions, with more than 450 staff who provide health information services to a global audience of health care professionals, researchers, administrators, students, historians, patients, and the public. She oversees budget, facilities, administration, and operations, including a national network of more than 8,000 academic health science libraries, hospital and public libraries, and community organizations working to improve access to health information.

Zhiyong Lu is a senior investigator (tenured) at the NLM Intramural Research Program, and he leads research in biomedical information retrieval, natural language processing, and machine learning. As NCBI’s Deputy Director for Literature Search, Dr. Lu directs the research and development efforts to improve PubMed search and information access. Over the years, Dr. Lu has co-authored around 300 scientific publications and mentored 40 trainees, many of whom have gone on to independent faculty/research positions. Dr. Lu is a fellow of the American College of Medical Informatics.

Donald Comeau is a Staff Scientist working in Dr. Lu’s Text Mining Research group at NLM’s NCBI. His primary responsibilities include identifying key phrases in PMC articles and the NCBI Bookshelf. His research projects focus on applying text mining, machine learning, and natural language processing techniques to improve access to NCBI’s biomedical literature collections. Dr. Comeau earned his PhD in Physical Chemistry at Ohio State University.
