Midnight in the Library

Right now, I am reading The Midnight Library by Matt Haig. It’s a fanciful story of a woman in limbo between life and death who finds herself in a magical library where each book represents one of the lives she could have lived had she made even one tiny decision differently. She then finds herself in many of these lives, experiencing what could have been.

This book got me thinking about how NLM helps people experience lives that could be. I see this on two levels:

The first is the scientific pathway: What if . . . ? What if we knew more about the interactions between evolutionary forces and molecular constraints (like the work of Aravind Iyer, PhD), fully appreciated the potential of proteins for genome engineering (like the discoveries made by Eugene Koonin, PhD), could envision how and why proteins fold or switch their folds (as explored by Lauren Porter, PhD), or had the power to enable machines to understand human thought (like the research from Dina Demner-Fushman, MD, PhD)? Beyond the discoveries of our NLM intramural researchers, our vast literature and data repositories hold answers that could change lives: why some genetic structures lead to particular human characteristics, or why a certain biochemical compound helps prevent infection. We help scientists discover these pathways and connections by giving them the tools to uncover what could be.

The second is how NLM helps people see their own “what if” through the richness of the resources we make available in our collections. Our resources, which encompass clinical insights, medical information, care guidelines, and self-management support, help clinicians determine how to care for people with complex diseases and diagnose illnesses in a timely manner. The clinical literature available through PubMed ensures that those in need can access well-reasoned, recognized guiding principles for their care, and our MedlinePlus web resource provides patients, their families, and their friends with reliable, up-to-date health information to support and encourage healthy behavioral changes.

As in The Midnight Library, books alone do not inspire discovery, guide clinical care, or inform self-management. In Haig’s novel, a librarian who knows the collection listens carefully to the main character’s goals and needs and shows her how to select books. It is the main character’s engagement with the books that lets her explore the lives she could have lived. At NLM, we too have librarians, both in Bethesda, Maryland, and around the country through the Network of the National Library of Medicine, who organize the library’s collections and guide patrons toward the best resources. Our resources must be findable, accessible, interoperable, reusable, and actionable! And then the person, whether scientist, clinician, or patient, must actively engage with the material.

As we approach the future of data-powered health, guided by the NLM Strategic Plan (2017-2027), we will continue to fulfill our mission to collect biomedical literature, organize it, preserve it, and make it accessible to the world. Because knowledge of health and biomedicine is growing faster than we can process it, we will apply emerging tools, including machine learning and artificial intelligence, to make our materials easier to find and more efficient to examine. Through our Extramural Programs, we will continue to stimulate new ways of presenting information to scientists, clinicians, patients, and the public so they can explore possible lives and test their promise of better health for society. What lives can we help you explore?

Request for Public Comment: National AI Research and Development Strategic Plan

This blog post by Lynne Parker, Director, National AI Initiative Office, and Rashida Richardson, Senior Policy Advisor for Data and Democracy, was originally posted on the White House OSTP blog.

We encourage you to read it and submit comments on the update to the National Artificial Intelligence Research and Development Strategic Plan by Friday, March 4, 2022.

Artificial Intelligence (AI) is becoming more prevalent in all of our lives. It powers all kinds of tools, from the digital assistants that answer questions on your phone, to breakthroughs in reading X-rays to better spot cancers. The so-called “intelligence” is the result of powerful computers sorting through mountains of data to find patterns, using algorithms designed and optimized by computer scientists.

Like all technology, AI is far from perfect. As we have started using AI for consequential decisions, we have realized that while AI can improve decision making, it too often compounds historical patterns of bias and deepens existing inequality. AI’s reliance on biased data or design processes has led to systems that produce discriminatory, or otherwise harmful, outcomes.

The Office of Science and Technology Policy is engaged in understanding the extraordinary promise of AI as well as its pitfalls. OSTP’s National AI Initiative Office (NAIIO) helps coordinate Federal activities in AI across government. OSTP is co-chairing the National AI Research Resource Task Force to answer Congress’s call to propose a vision for equitably expanding the research community’s access to the computing power, data, and testbed resources necessary to do AI research. OSTP has issued a call for the development of an AI Bill of Rights, and is working closely with both domestic and international partners across bilateral and multilateral venues to advance development, adoption, and oversight of AI in a manner that aligns with our democratic values.

Given the transformative potential of AI, we know it is critical that the American public have a voice in how this technology is used and governed. In late 2020, we initiated a public engagement process that included public listening sessions, a request for information on AI-enabled biometric technology, and stakeholder engagement meetings. Today, our National AI Initiative Office, in coordination with the Networking and Information Technology Research and Development Program of the National Science and Technology Council, is seeking public comments about how we should revise the National Artificial Intelligence Research and Development Strategic Plan. First published in 2016 and updated in 2019, the National AI R&D Strategic Plan identifies scientific and technological needs for AI innovation and investment priorities for Federally-funded AI research. In preparation for the Congressionally mandated 2022 Strategic Plan, this request for information seeks input on the goals, priorities, and metrics that Federal agencies should use to guide AI research and development. 

OSTP’s mission is to “maximize the benefits of science and technology to advance health, prosperity, security, environmental quality, and justice for all Americans.” Our work in AI is intended to maximize its benefits while ensuring that AI-driven systems do not cause harm or impede our pursuit of American ideals.

Through DS-I Africa, NIH is Fostering a New Health Data Science Community

Guest post by Laura K. Povlich, PhD, Program Director at the NIH Fogarty International Center (FIC), and Tiffani B. Lash, PhD, Program Director at the NIH National Institute of Biomedical Imaging and Bioengineering (NIBIB). They co-coordinate the DS-I Africa program with assistance from a trans-NIH Working Group that includes Patricia Brennan, RN, PhD, Director of NLM; Roger Glass, MPH, MD, PhD, Director of FIC; Joshua A. Gordon, MD, PhD, Director of the National Institute of Mental Health; and Bruce Tromberg, PhD, Director of NIBIB.

Advances in data science and data ecosystems that support the mission of the NIH are reshaping biomedical and behavioral research. Enhanced international data ecosystems not only have the potential to support improved healthcare and public health domestically but could also be transformative in low- and middle-income countries. As a step in realizing this potential, the NIH Common Fund established the Harnessing Data Science for Health Discovery and Innovation in Africa (DS-I Africa) program.

The purpose of this program is to leverage data science technologies and prior NIH investments to develop solutions to the continent’s most pressing medical and public health problems through a robust ecosystem of new partners from academic, government, and private sectors.

Building on the success of a virtual symposium in 2020, the NIH recently invested more than $74 million over five years to support 19 DS-I Africa awards that will conduct research and training activities across the continent. The DS-I Africa Open Data Science Platform and Coordinating Center, led out of the University of Cape Town, will catalyze and support the unique continental network of health data scientists, innovators, and researchers who work across the DS-I Africa program. This award will coordinate the 19 awards as a collaborative research consortium that benefits from shared resources and knowledge. Additionally, the Open Data Science Platform will develop into a scalable gateway that aims to lower some of the barriers to collaboration by democratizing access to data and tools.

The DS-I Africa consortium includes African-led multidisciplinary and multisectoral research hubs with projects in several important areas such as antimicrobial resistance, SARS-CoV-2, climate change, mental health, multi-disease morbidity, and more. Research training programs will build the next generation of African health data scientists and innovators. Lastly, research projects on the ethical, legal, and social implications of health data science from an African perspective are a key component of DS-I Africa and will further policy discussions of these issues on the continent. The consortium will expand throughout the life of the program with the goal of bringing in new partners through pilot projects and possibly other funding mechanisms.

We are excited to see the DS-I Africa consortium grow and to stay apprised of opportunities to connect with other data science communities around the world. Many funding organizations see the potential for data science to transform medicine and public health in Africa, and we hope additional investments will have a synergistic effect in strengthening the health data science ecosystem in Africa. For more information about the DS-I Africa research studies, visit the Harnessing Data Science for Health Discovery and Innovation in Africa Funded Research page.

In addition to her work at the NIH FIC with DS-I Africa, Dr. Povlich also co-coordinates Human Heredity and Health in Africa (H3Africa), which is another NIH Common Fund program. She earned both a BSE in Materials Science and Engineering and a PhD in Macromolecular Science and Engineering from the University of Michigan. Dr. Povlich was previously a Science & Technology Policy Fellow for the American Association for the Advancement of Science (AAAS).

Dr. Lash is the Program Director for the NIH Rapid Acceleration of Diagnostics Tech and Advanced Technology Platforms initiative, the NIH Technology Accelerator Challenge, and the NIBIB Point of Care Technologies Research Network. Her research portfolio includes Point of Care Technologies and Digital Health, both with the goal of developing biomedical technologies through collaborative efforts that merge scientific and technological capabilities with clinical need. Dr. Lash has been selected as a science policy fellow for both the AAAS and the National Academy of Engineering. Dr. Lash earned her PhD in Physical Chemistry from North Carolina State University.

NLM is a Leader in Using AI to Improve User Experiences

Guest post by Dianne Babski, Associate Director for Library Operations at NLM; Zhiyong Lu, PhD, Senior Investigator for the Computational Biology Branch at NLM’s National Center for Biotechnology Information (NCBI); and Donald Comeau, PhD, Staff Scientist at NCBI.

In January 2021, the Department of Health and Human Services (HHS) released its Artificial Intelligence (AI) Strategy to help agencies best use AI to advance the health and well-being of all Americans. NIH has long collaborated on and invested in AI-based projects to discover health solutions across research and medical settings, including the analysis of biomedical imaging to diagnose diseases such as COVID-19.

For many years, NLM has been enthusiastic about the promise and possibilities of AI. You can learn more about how NLM understands and uses AI in some of our recent Musings posts, including: Artificial Intelligence, Imaging, and the Promising Future of Medicine; How NIH is Using Artificial Intelligence to Improve Operations; and NIH Strategically, and Ethically Building a Bridge to AI.

In support of the HHS AI Strategy, we’d like to share a few examples of how NLM is using AI to revolutionize our products and services to enhance usability and discovery of biomedical information.

Figure 1: Best Match algorithm depiction (graphic by Donald Bliss, NLM)

Best Match is a relevance search algorithm for NLM’s PubMed, a free search engine for biomedical literature used by millions of people around the world every day. This AI technique leverages the collective intelligence of our users and cutting-edge machine-learning technology as an alternative to the date sort order used by many traditional search engines. Trained on past user searches and dozens of relevance-ranking factors, the Best Match algorithm demonstrates state-of-the-art retrieval performance and an improved user experience. Best Match increases the effectiveness of PubMed searches across the rapidly growing collection of biomedical literature, helping users efficiently find the most relevant, high-quality information they need.
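To make the contrast with date sorting concrete, here is a toy Python sketch of relevance ranking. It assumes each search result carries a few numeric features and scores results with a fixed weighted sum. The feature names and weights are hypothetical, and this linear scorer is a simplified stand-in for the trained machine-learning model behind Best Match, not a description of it.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    pmid: str
    year: int
    features: dict  # hypothetical relevance-ranking factors, name -> value


# Hypothetical weights; a learned ranker would fit these to past user searches.
WEIGHTS = {"term_match": 2.0, "title_match": 1.5, "recency": 0.5, "citations": 0.3}


def relevance_score(doc: Doc) -> float:
    """Weighted sum of feature values; a missing feature counts as 0."""
    return sum(w * doc.features.get(name, 0.0) for name, w in WEIGHTS.items())


def date_sort(docs):
    """Traditional ordering: newest first, regardless of relevance."""
    return sorted(docs, key=lambda d: d.year, reverse=True)


def relevance_sort(docs):
    """Relevance ordering: highest score first."""
    return sorted(docs, key=relevance_score, reverse=True)


docs = [
    Doc("A", 2023, {"term_match": 0.2, "title_match": 0.0, "recency": 1.0, "citations": 0.0}),
    Doc("B", 2015, {"term_match": 0.9, "title_match": 1.0, "recency": 0.3, "citations": 0.8}),
    Doc("C", 2020, {"term_match": 0.6, "title_match": 0.0, "recency": 0.7, "citations": 0.4}),
]
print([d.pmid for d in date_sort(docs)])       # → ['A', 'C', 'B']
print([d.pmid for d in relevance_sort(docs)])  # → ['B', 'C', 'A']
```

The newest paper tops the date sort even though it barely matches the query; the relevance sort promotes the closely matching older paper instead, which is the behavior the paragraph above describes.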

SingleCite is another automated search algorithm designed to improve single citation searches in the PubMed database. It predicts the probability of a retrieved document being the target of a query based on predefined variables. This helps increase the effectiveness of PubMed searches by making a user’s search for a specific document in PubMed more successful.
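The idea of predicting "the probability of a retrieved document being the target" can be sketched as a logistic model over field-match variables. In the Python sketch below, the variables (title word overlap, author, year, journal), their coefficients, and the 0.5 threshold are all hypothetical; this post does not list SingleCite's actual predefined variables.

```python
import math

# Hypothetical variables and coefficients, for illustration only.
COEFFS = {"bias": -4.0, "title_overlap": 3.0, "author": 1.5, "year": 1.0, "journal": 1.0}


def match_features(query, doc):
    """Compare each field the user supplied with the same field of a candidate."""
    q_words = set(query.get("title", "").lower().split())
    d_words = set(doc.get("title", "").lower().split())
    authors = {a.lower() for a in doc.get("authors", [])}
    return {
        # fraction of the query's title words found in the candidate title
        "title_overlap": len(q_words & d_words) / len(q_words) if q_words else 0.0,
        "author": float("author" in query and query["author"].lower() in authors),
        "year": float("year" in query and query["year"] == doc.get("year")),
        "journal": float("journal" in query
                         and query["journal"].lower() == doc.get("journal", "").lower()),
    }


def target_probability(query, doc):
    """Logistic model: probability that `doc` is the single citation sought."""
    feats = match_features(query, doc)
    z = COEFFS["bias"] + sum(COEFFS[name] * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


def best_single_citation(query, docs, threshold=0.5):
    """Return the most probable target, or None if no candidate is confident enough."""
    if not docs:
        return None
    prob, doc = max(((target_probability(query, d), d) for d in docs), key=lambda t: t[0])
    return doc if prob >= threshold else None


docs = [
    {"title": "A gene expression atlas of the mouse brain",
     "authors": ["Smith J", "Lee K"], "year": 2019, "journal": "Nature"},
    {"title": "Protein folding dynamics", "authors": ["Smith J"], "year": 2010},
]
query = {"title": "gene expression atlas", "author": "Smith J", "year": 2019}
print(best_single_citation(query, docs)["title"])  # → A gene expression atlas of the mouse brain
print(best_single_citation({"title": "unrelated words"}, docs))  # → None
```

Returning None when no candidate clears the threshold reflects the goal of a single-citation search: show the one intended paper when the model is confident, rather than always promoting the top hit.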

Computed Author is a machine-learning method that addresses irrelevant retrieval results in PubMed caused by author name ambiguity (different authors sharing the same name). When users search by author name, Computed Author uses an algorithm to separate papers by different authors who share that name and to cluster at the top of the results the articles likely written by the same author. Again, the result is more effective PubMed searches, supporting NLM’s mission to advance health research and discovery.
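Separating same-name authors can be illustrated with a simple rule-based stand-in. The sketch below assumes, hypothetically, that two papers listing the same author name belong to the same person if they share a coauthor or an affiliation, and it merges such papers with a union-find structure. Computed Author itself is described as a machine-learning method, so treat these linking rules as illustrative assumptions only.

```python
from collections import defaultdict


def cluster_same_name(papers, name):
    """Group papers listing `name` into clusters likely to be one person.

    Hypothetical rule: two papers are linked if they share a coauthor
    (other than `name`) or an affiliation. Links are merged transitively.
    """
    candidates = [p for p in papers if name in p["authors"]]
    parent = list(range(len(candidates)))

    def find(i):
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    # Map each piece of evidence to the first paper seen with it.
    first_seen = {}
    for idx, paper in enumerate(candidates):
        evidence = {("coauthor", a) for a in paper["authors"] if a != name}
        if paper.get("affiliation"):
            evidence.add(("affiliation", paper["affiliation"]))
        for token in evidence:
            if token in first_seen:
                union(idx, first_seen[token])
            else:
                first_seen[token] = idx

    clusters = defaultdict(list)
    for idx, paper in enumerate(candidates):
        clusters[find(idx)].append(paper["pmid"])
    # Largest cluster first, so the most likely single author surfaces on top.
    return sorted(clusters.values(), key=len, reverse=True)


papers = [
    {"pmid": "1", "authors": ["Wang L", "Chen Y"], "affiliation": "Univ A"},
    {"pmid": "2", "authors": ["Wang L", "Chen Y", "Park S"], "affiliation": "Univ B"},
    {"pmid": "3", "authors": ["Wang L", "Park S"], "affiliation": "Univ B"},
    {"pmid": "4", "authors": ["Wang L", "Garcia M"], "affiliation": "Inst C"},
]
print(cluster_same_name(papers, "Wang L"))  # → [['1', '2', '3'], ['4']]
```

Papers 1 and 3 share no direct evidence, but both link to paper 2, so union-find places all three in one cluster; paper 4 stays separate as a possibly different "Wang L".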

Fully automated Medical Subject Headings (MeSH) indexing in MEDLINE, NLM’s flagship bibliographic database, is one of our most recent AI advancements. Automated indexing has been under development at NLM for many years, and its most significant outcome is the NLM Medical Text Indexer (MTI). The MTI algorithm has been refined as we move toward full automation, including the incorporation of deep learning approaches to improve the application of MeSH subheadings, the addition of rules and triggers for indexing publication types, and the application of an Indexing Method designation. Automated indexing greatly reduces the time needed to make MeSH indexing metadata available and allows NLM to scale MeSH indexing for MEDLINE to the volume of published biomedical literature.

Gene and chemical indexing are also part of automated MEDLINE indexing efforts to improve literature retrieval and information access. Currently, gene and chemical indexing is performed manually by expert indexers. To assist this process, we are using advanced natural language processing and deep learning methods to develop NLM-Gene and NLM-Chem, automatic tools for finding gene and chemical names in the biomedical literature.
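To show what "finding gene and chemical names" produces, here is a deliberately simple dictionary-lookup tagger in Python. It is not the deep-learning approach behind NLM-Gene and NLM-Chem. The lexicon is a hypothetical handful of names, and a fixed list like this misses new names, spelling variants, and ambiguous abbreviations, which is why learned models are used instead.

```python
import re

# Hypothetical mini-lexicon mapping a canonical name to its entity type.
LEXICON = {
    "BRCA1": "gene", "TP53": "gene", "EGFR": "gene",
    "cisplatin": "chemical", "aspirin": "chemical", "glucose": "chemical",
}

# Try longer names first; word boundaries prevent matches inside other words.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(n) for n in sorted(LEXICON, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)

# Case-insensitive lookup back to the canonical spelling and type.
_LOOKUP = {name.lower(): (name, kind) for name, kind in LEXICON.items()}


def tag_entities(text):
    """Return (start, end, surface_text, canonical_name, type) for each hit."""
    hits = []
    for m in _PATTERN.finditer(text):
        name, kind = _LOOKUP[m.group(0).lower()]
        hits.append((m.start(), m.end(), m.group(0), name, kind))
    return hits


text = "Loss of TP53 increased resistance to Cisplatin in cells lacking BRCA1."
for hit in tag_entities(text):
    print(hit[2], "->", hit[3], hit[4])
```

Each hit keeps its character offsets and a normalized name, the kind of output an indexer could review before assigning gene or chemical index terms.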

We are very excited about our efforts to leverage AI and the advances we have made. Looking forward, we will strive to lead AI innovation in partnership with HHS, NIH, and the broader research community to ensure that we continue to meet our mission to accelerate biomedical discovery and data-powered health.

Do you have ideas for how we can harness AI to improve our projects? What are some of the ways you are using AI to improve products and services?

Dianne Babski is responsible for overall management of one of NLM’s largest divisions, with more than 450 staff who provide health information services to a global audience of health care professionals, researchers, administrators, students, historians, patients, and the public. She oversees budget, facilities, administration, and operations, including those of a national network of more than 8,000 academic health science libraries, hospital and public libraries, and community organizations that work to improve access to health information.

Zhiyong Lu is a senior investigator (tenured) at the NLM Intramural Research Program, and he leads research in biomedical information retrieval, natural language processing, and machine learning. As NCBI’s Deputy Director for Literature Search, Dr. Lu directs the research and development efforts to improve PubMed search and information access. Over the years, Dr. Lu has co-authored around 300 scientific publications and mentored 40 trainees, many of whom have gone on to independent faculty/research positions. Dr. Lu is a fellow of the American College of Medical Informatics.

Donald Comeau is a Staff Scientist working in Dr. Lu’s Text Mining Research group at NLM’s NCBI. His primary responsibilities include identifying key phrases in PMC articles and NCBI Bookshelf. His research projects focus on applying text mining, machine learning, and natural language processing techniques to improve access to NCBI’s biomedical literature collections. Dr. Comeau earned his PhD in Physical Chemistry at Ohio State University.

Help Us Modernize NIH’s Genomic Data Sharing Policy

Guest post by Taunton Paine, MA, Director of the Division of Scientific Data Sharing Policy, NIH Office of Science Policy

Behind the NIH Genomic Data Sharing Policy

In November 2021, NIH published a request for information seeking public input on the future of the NIH Genomic Data Sharing (GDS) Policy. Published in 2014, the NIH GDS Policy expanded and refined an existing framework, first created for genome-wide association studies, for the broad and responsible sharing of genomic research data. Since this policy framework was first implemented, NIH has accepted data from more than 1,200 studies in the NIH database of Genotypes and Phenotypes (dbGaP) hosted by NLM and facilitated more than 64,000 additional research uses. Many more studies involving non-human data and human data with study participant consent for full public access have been shared as a result of the GDS Policy through a variety of additional NIH repositories, such as GenBank and the Sequence Read Archive, which are also hosted by NLM.

The GDS Policy has been remarkably successful at spurring the timely, productive, and secure sharing of genomic data, and NIH has devoted substantial effort to keeping this framework relevant by issuing updates and guidance as needed to account for trends in science, technology, and society. For example, the policy and related guidance evolved to accommodate a growing shift toward cloud computing in genomic research.

Evolving Priorities: Help Us Shape the Future of Genomic Data Sharing

In October 2020, NIH issued the Final NIH Policy for Data Management and Sharing. The final policy will be effective on January 25, 2023. To better align the GDS Policy with the NIH Policy for Data Management and Sharing, NIH is soliciting input on proposed changes to the GDS Policy. Described below are some of the key issues on which NIH is seeking comment in the request for information.

The use of genomic data in research continues to evolve. Specifically, there is growing interest in the use of human data elements that might be considered identifiable, which cannot currently be submitted to NIH genomic data repositories, and in the ability to match participants’ data across repositories or with data from other sources. The request for information seeks comment on whether NIH should permit these activities, and if so, what additional protections may be necessary.

To reduce the technical burden of analyzing genomic data, NIH has developed additional resources for storing, sharing and analyzing human genomic data in addition to dbGaP, resulting in an increasingly federated landscape of platforms and repositories hosted by NIH and awardee institutions. To ensure consistency of operations and protections, NIH is proposing core principles for NIH-supported genomic data repositories and platforms.

NIH frequently receives questions about other types of high-dimensional “omics” data, such as microbiomic or proteomic data, which are generated by new and comprehensive approaches for analyzing the molecular profiles of humans and other organisms. In some cases, non-genomic data types may pose re-identification risks similar to those of large-scale genomic data but may not be subject to the GDS Policy in all scenarios. Furthermore, the GDS Policy may not apply even to genomic data generated in some scenarios, such as very small studies. As a longer-term consideration, NIH is soliciting views on whether the more specific sharing expectations of the GDS Policy or the protective framework it offers should be adjusted to account for these other data types or scenarios.

We are Listening!

We are working to ensure that the framework established by the GDS Policy keeps pace with the needs of the research enterprise, research participants, and the patients it is ultimately intended to benefit. This RFI may result in updates to the GDS Policy, related guidance, or implementation. That’s why we’re asking you, the community, for your input. Please visit the request for information page today; comments are due by February 28. We look forward to hearing your input and appreciate your efforts!

Taunton Paine, MA, is the Director of the Scientific Data Sharing Policy Division in the Office of Science Policy at the NIH. Taunton has been with the Office of Science Policy since 2011. His division is responsible for issues relating to data sharing policy, including issuance of the recent NIH Data Management and Sharing Policy, oversight of the NIH Genomic Data Sharing Policy, and management of the Data Science Policy Council. He holds a dual master’s degree from Columbia University and the London School of Economics and Political Science, where he studied the history of international relations.
