Midnight in the Library

Right now, I am reading The Midnight Library by Matt Haig. It’s a fanciful story of a woman in limbo between life and death who finds herself in a magical library, and each book represents one of the lives she could have lived had she made even one tiny different decision. She then finds herself in many of these lives, experiencing what could have been.

This book got me thinking about how NLM helps people experience lives that could be. I see this on two levels:

The first is the scientific pathway: What if . . . ? What if we knew more about the interactions between evolutionary forces and molecular constraints (like the work of Aravind Iyer, PhD), or fully appreciated the potential of proteins for genome engineering (like the discoveries made by Eugene Koonin, PhD), or could envision how and why proteins fold or switch their folds (as explored by Lauren Porter, PhD), or had the power to enable machines to understand human thought (like the research from Dina Demner-Fushman, MD, PhD)? In addition to the discoveries by our NLM intramural researchers, our vast literature and data repositories hold answers that could change lives: why some genetic structures lead to human characteristics, or why a certain biochemical compound helps prevent infection. We help scientists discover these pathways and connections by providing them with the tools to uncover what could be.

The second is how NLM helps people see their what if using the amazing richness of the resources that we make available through our collections. Our resources—which encompass clinical insights, medical information, care guidelines, and self-management—help clinicians determine how to care for people with complex diseases or diagnose an illness in a timely manner. Our repository of clinical information available through PubMed ensures that those in need can access well-reasoned, recognized guiding principles for their care, and our MedlinePlus web resource provides patients and their families and friends with reliable, up-to-date health information to support and encourage healthy behavioral changes.

As in The Midnight Library, books alone do not inspire discovery, guide clinical care, or inform self-management. In Haig’s novel, a fictional librarian who knows the collection shows the main character how to select books by carefully listening to her goals and needs. It is the main character’s engagement with the books that helps her explore the lives she could have lived. At NLM, we too have librarians—located in Bethesda, Maryland, and around the country through NLM’s Network of the National Library of Medicine—who organize the library’s collections and guide patrons toward the best choice of resources. Our resources must be findable, accessible, interoperable, reusable, and actionable! And then, the person—scientist, clinician, patient—must actively engage with the material.

As we approach the future of data-powered health, guided by the NLM Strategic Plan (2017-2027), we will fulfill our mission to collect biomedical literature, organize it, preserve it, and make it accessible to the world. As the knowledge of health and biomedicine continues to grow faster than we can process, we will turn our attention to applying emerging tools, including machine learning and artificial intelligence, to make it easier to find our materials and more efficient to examine them. Through our Extramural Programs, we will continue to stimulate new ways of presenting information to scientists, clinicians, patients, and the public so they can explore possible lives to be lived and test out their promise of better health for society. What lives can we help you explore?

Request for Public Comment: National AI Research and Development Strategic Plan

This blog post by Lynne Parker, Director, National AI Initiative Office, and Rashida Richardson, Senior Policy Advisor for Data and Democracy, was originally posted on the White House OSTP blog.

We encourage you to read it and submit comments on the update to the National Artificial Intelligence Research and Development Strategic Plan by Friday, March 4, 2022.

Artificial Intelligence (AI) is becoming more prevalent in all of our lives. It powers all kinds of tools, from the digital assistants that answer questions on your phone, to breakthroughs in reading X-rays to better spot cancers. The so-called “intelligence” is the result of powerful computers sorting through mountains of data to find patterns, using algorithms designed and optimized by computer scientists.

Like all technology, AI is far from perfect. As we have started using AI for consequential decisions, we have realized that while AI can improve decision making, it too often compounds historical patterns of bias and deepens existing inequality. AI’s reliance on biased data or design processes has led to systems that produce discriminatory, or otherwise harmful, outcomes.

The Office of Science and Technology Policy is engaged in understanding the extraordinary promise of AI as well as its pitfalls. OSTP’s National AI Initiative Office (NAIIO) helps coordinate Federal activities in AI across government. OSTP is co-chairing the National AI Research Resource Task Force to answer Congress’s call to propose a vision for equitably expanding the research community’s access to the computing power, data, and testbed resources necessary to do AI research. OSTP has issued a call for the development of an AI Bill of Rights, and is working closely with both domestic and international partners across bilateral and multilateral venues to advance development, adoption, and oversight of AI in a manner that aligns with our democratic values.

Given the transformative potential of AI, we know it is critical that the American public have a voice in how this technology is used and governed. In late 2021, we initiated a public engagement process that included public listening sessions, a request for information on AI-enabled biometric technology, and stakeholder engagement meetings. Today, our National AI Initiative Office, in coordination with the Networking and Information Technology Research and Development Program of the National Science and Technology Council, is seeking public comments about how we should revise the National Artificial Intelligence Research and Development Strategic Plan. First published in 2016 and updated in 2019, the National AI R&D Strategic Plan identifies scientific and technological needs for AI innovation and investment priorities for Federally-funded AI research. In preparation for the Congressionally mandated 2022 Strategic Plan, this request for information seeks input on the goals, priorities, and metrics that Federal agencies should use to guide AI research and development.

OSTP’s mission is to “maximize the benefits of science and technology to advance health, prosperity, security, environmental quality, and justice for all Americans.” Our work in AI is intended to maximize its benefits while ensuring that AI-driven systems do not cause harm or impede our pursuit of American ideals.

NLM is a Leader in Using AI to Improve User Experiences

Guest post by Dianne Babski, Associate Director for Library Operations at NLM; Zhiyong Lu, PhD, Senior Investigator for the Computational Biology Branch at NLM’s National Center for Biotechnology Information (NCBI); and Donald Comeau, PhD, Staff Scientist at NCBI.

In January 2021, the Department of Health and Human Services (HHS) released its Artificial Intelligence (AI) Strategy to help agencies best use AI to advance the health and wellbeing of all Americans. NIH has long collaborated on and invested in AI-based projects to discover health solutions across research and medical settings, including the analysis of biomedical imaging to diagnose diseases such as COVID-19.

For many years, NLM has been enthusiastic about the promise and possibilities of AI. You can learn more about NLM’s awareness of and use of AI through some of our recent Musings posts, including: Artificial Intelligence, Imaging, and the Promising Future of Medicine; How NIH is Using Artificial Intelligence to Improve Operations; and NIH Strategically, and Ethically Building a Bridge to AI.

In support of the HHS AI Strategy, we’d like to share a few examples of how NLM is using AI to revolutionize our products and services to enhance usability and discovery of biomedical information.

Figure 1: Best Match algorithm depiction
Graphic by: Donald Bliss, NLM

Best Match is a relevance search algorithm for NLM’s PubMed, a free search engine for biomedical literature accessed by millions of users around the world every day. This AI technique leverages the intelligence of our users and cutting-edge machine-learning technology as an alternative to the date sort order that appears in many traditional search engines. Trained on past user searches using dozens of relevance-ranking factors, the Best Match algorithm demonstrates state-of-the-art retrieval performance and an improved user experience. Best Match increases the effectiveness of PubMed searches across the rapidly growing collection of biomedical literature to help users efficiently find the most relevant and high-quality information they need.
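To make the contrast between date sorting and relevance sorting concrete, here is a toy sketch in Python. It is not the Best Match implementation: the `Article` fields, the three ranking factors, and the hand-set `WEIGHTS` are all assumptions made up for illustration, whereas a production system would learn its weights from past searches.

```python
from dataclasses import dataclass


@dataclass
class Article:
    pmid: str          # hypothetical identifier, not a real PMID
    year: int          # publication year
    term_matches: int  # query terms found in the title or abstract
    past_clicks: int   # times past users opened it for similar queries


# Hypothetical weights, set by hand here. A learned ranker would
# estimate these from logs of past user searches.
WEIGHTS = {"term_matches": 2.0, "past_clicks": 0.5, "recency": 0.1}


def relevance_score(article, current_year=2022):
    """Combine several ranking factors into a single score."""
    recency = max(0, 10 - (current_year - article.year))
    return (WEIGHTS["term_matches"] * article.term_matches
            + WEIGHTS["past_clicks"] * article.past_clicks
            + WEIGHTS["recency"] * recency)


def sort_by_date(articles):
    """Traditional ordering: newest first."""
    return sorted(articles, key=lambda a: a.year, reverse=True)


def sort_by_relevance(articles):
    """Relevance ordering: highest combined score first."""
    return sorted(articles, key=relevance_score, reverse=True)


articles = [
    Article("A", 2021, term_matches=1, past_clicks=0),
    Article("B", 2015, term_matches=3, past_clicks=8),
    Article("C", 2019, term_matches=2, past_clicks=2),
]

print([a.pmid for a in sort_by_date(articles)])
print([a.pmid for a in sort_by_relevance(articles)])
```

In this made-up example the oldest article moves to the top under relevance sorting because it matches the query best and was opened most often by earlier searchers.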

SingleCite is another automated search algorithm designed to improve single citation searches in the PubMed database. It predicts the probability of a retrieved document being the target of a query based on predefined variables. This helps increase the effectiveness of PubMed searches by making a user’s search for a specific document in PubMed more successful.
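As a rough illustration of turning predefined variables into a probability, the sketch below uses a logistic function. The choice of a logistic model, the three variables, and the coefficients in `COEFFS` are assumptions for this example only and do not describe how SingleCite is actually built.

```python
import math

# Hypothetical coefficients chosen by hand for illustration.
COEFFS = {
    "bias": -3.0,
    "title_overlap": 4.0,  # fraction of query words found in the title (0 to 1)
    "author_match": 1.5,   # 1 if an author name in the query matches, else 0
    "year_match": 1.0,     # 1 if a year in the query matches, else 0
}


def target_probability(title_overlap, author_match, year_match):
    """Estimate the probability that a retrieved document is the one sought."""
    z = (COEFFS["bias"]
         + COEFFS["title_overlap"] * title_overlap
         + COEFFS["author_match"] * author_match
         + COEFFS["year_match"] * year_match)
    return 1.0 / (1.0 + math.exp(-z))


strong = target_probability(title_overlap=1.0, author_match=1, year_match=1)
weak = target_probability(title_overlap=0.2, author_match=0, year_match=0)
print(round(strong, 3), round(weak, 3))
```

A document that matches the title, author, and year of the query scores close to 1, while a document with only a partial title match scores close to 0, so the likely target can be surfaced first.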

Computed Author is a machine-learning method that addresses irrelevant retrieval results in PubMed caused by author name ambiguity (where different authors share the same name). When users search by author name, Computed Author uses an algorithm to sort out papers by different authors who share the same name and clusters at the top of the results the articles that are likely by the same author. Again, the result is increased effectiveness of PubMed searches, supporting NLM’s mission to advance health research and discovery.
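One simple way to picture this kind of clustering is to group papers that share a coauthor. The sketch below does only that; it is an assumption-laden toy, not the Computed Author method, and the paper IDs and coauthor names are invented.

```python
def cluster_by_shared_coauthors(papers):
    """Group papers carrying the same ambiguous author name.

    papers maps a paper ID to the set of its other coauthors.
    Papers linked by at least one shared coauthor, directly or
    through a chain of papers, end up in the same cluster.
    Returns clusters as sorted lists of IDs, largest cluster first.
    """
    clusters = []  # list of (paper IDs, coauthor names) pairs
    for paper_id, coauthors in papers.items():
        merged_ids, merged_names = {paper_id}, set(coauthors)
        untouched = []
        for ids, names in clusters:
            if names & merged_names:
                # Shared coauthor: fold this cluster into the new one.
                merged_ids |= ids
                merged_names |= names
            else:
                untouched.append((ids, names))
        untouched.append((merged_ids, merged_names))
        clusters = untouched
    return sorted((sorted(ids) for ids, _ in clusters), key=len, reverse=True)


# Four hypothetical papers, all listing an author named "J. Smith".
papers = {
    "p1": {"Lee", "Kim"},
    "p2": {"Kim", "Park"},
    "p3": {"Garcia"},
    "p4": {"Park"},
}

print(cluster_by_shared_coauthors(papers))
```

Here three papers are linked through shared coauthors and form one cluster that would be listed first, while the fourth paper, with no coauthors in common, is treated as likely belonging to a different person.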

Fully automated Medical Subject Headings (MeSH) indexing in NLM’s flagship bibliographic database, MEDLINE, is one of our most recent AI advancements. Automated indexing has been under development at NLM for many years, and the most significant outcome is the development of the NLM Medical Text Indexer (MTI). The MTI algorithm has been undergoing refinements as we move towards automation, including incorporation of deep learning approaches to improve the application of MeSH subheadings, the incorporation of rules and triggers for the indexing of publication types, and the application of Indexing Method designation. Automated indexing greatly reduces the time needed to access MeSH indexing metadata and allows NLM to scale MeSH indexing for MEDLINE to the volume of published biomedical literature.

Gene and chemical indexing are also part of automated MEDLINE indexing efforts to improve literature retrieval and information access. Currently, gene and chemical indexing is performed manually by expert indexers. To assist this process, we are using advanced natural language processing and deep learning methods to develop NLM-Gene and NLM-Chem, automatic tools for finding gene and chemical names in the biomedical literature.

We are very excited about our efforts to leverage AI and the advances we have made. Looking forward, we will strive to lead AI innovation in partnership with HHS, NIH, and the broader research community to ensure that we continue to meet our mission to accelerate biomedical discovery and data-powered health.

Do you have ideas for how we can harness AI to improve our projects? What are some of the ways you are using AI to improve products and services?

Dianne Babski is responsible for overall management of one of NLM’s largest divisions with more than 450 staff who provide health information services to a global audience of health care professionals, researchers, administrators, students, historians, patients, and the public. She oversees budget, facilities, administration, and operations, including those of a national network of more than 8,000 academic health science libraries, hospital and public libraries, and community organizations to improve access to health information.

Zhiyong Lu is a senior investigator (tenured) at the NLM Intramural Research Program, and he leads research in biomedical information retrieval, natural language processing, and machine learning. As NCBI’s Deputy Director for Literature Search, Dr. Lu directs the research and development efforts to improve PubMed search and information access. Over the years, Dr. Lu has co-authored around 300 scientific publications and mentored 40 trainees, many of whom have gone on to independent faculty/research positions. Dr. Lu is a fellow of the American College of Medical Informatics.

Donald Comeau is a Staff Scientist working in Dr. Lu’s Text Mining Research group at the NLM’s NCBI. His primary responsibilities include identifying key phrases in PMC articles and NCBI Bookshelf. His research projects focus on applying text mining, machine learning, and natural language processing techniques to improving access to NCBI’s biomedical literature collections. Dr. Comeau earned his PhD in Physical Chemistry at Ohio State University.

Help Us Modernize NIH’s Genomic Data Sharing Policy

Guest post by Taunton Paine, MA, Director of the Division of Scientific Data Sharing Policy, NIH Office of Science Policy

Behind the NIH Genomic Data Sharing Policy

In November 2021, NIH published a request for information seeking public input on the future of the NIH Genomic Data Sharing (GDS) Policy. Originally published in 2014, the NIH GDS Policy expanded and refined an existing framework for the broad and responsible sharing of genomic research data originally created for genome-wide association studies. Since this policy framework was first implemented, NIH has accepted data from more than 1,200 studies in the NIH database of Genotypes and Phenotypes (dbGaP) hosted by NLM and facilitated more than 64,000 additional research uses. Many more studies involving non-human data and human data with study participant consent for full public access have been shared as a result of the GDS Policy through a variety of additional NIH repositories, such as GenBank and the Sequence Read Archive, which are also hosted by NLM.

While the GDS Policy has been remarkably successful at spurring the timely, productive, and secure sharing of genomic data, NIH has devoted substantial effort to maintaining the relevance of this framework by issuing updates as needed. NIH has provided substantial guidance to account for trends in science, technology, and society. For example, the policy and related guidance evolved to accommodate a growing shift toward cloud computing in genomic research.

Evolving Priorities: Help Us Shape the Future of Genomic Data Sharing

In October 2020, NIH issued the Final NIH Policy for Data Management and Sharing. The final policy will be effective on January 25, 2023. To better align the GDS Policy with the NIH Policy for Data Management and Sharing, NIH is soliciting input about proposed changes to the GDS Policy. Described below are some of the key issues on which NIH is seeking comment in the request for information.

The use of genomic data in research continues to evolve. Specifically, there is growing interest in the use of human data elements that might be considered identifiable, which cannot currently be submitted to NIH genomic data repositories, and in the ability to match participants’ data across repositories or with data from other sources. The request for information seeks comment on whether NIH should permit these activities, and if so, what additional protections may be necessary.

To reduce the technical burden of analyzing genomic data, NIH has developed additional resources for storing, sharing and analyzing human genomic data in addition to dbGaP, resulting in an increasingly federated landscape of platforms and repositories hosted by NIH and awardee institutions. To ensure consistency of operations and protections, NIH is proposing core principles for NIH-supported genomic data repositories and platforms.

NIH frequently receives questions about other types of high-dimensional “omics” data, such as microbiomic or proteomic data, which describes new and comprehensive approaches for analyzing molecular profiles of humans and other organisms. In some cases, non-genomic data types may pose similar risks of re-identification as large-scale genomic data but may not be subject to the GDS Policy in all scenarios. Furthermore, the GDS Policy may not apply even when genomic data are generated in some scenarios, such as for very small studies. As a longer-term consideration, NIH is soliciting views on whether the more specific sharing expectations of the GDS Policy or the protective framework it offers should be adjusted to account for these other data types or scenarios.

We are Listening!

We are working to ensure that the framework established by the GDS Policy keeps pace with the needs of the research enterprise, research participants, and the patients it is ultimately intended to benefit. This RFI may result in updates to the GDS Policy, related guidance, or implementation. That’s why we’re asking you, the community, for your input. Please visit the request for information page today; comments are due by February 28. We look forward to hearing your input and appreciate your efforts!

Taunton Paine, MA, is the Director of the Scientific Data Sharing Policy Division in the Office of Science Policy at the NIH. Taunton has been with the Office of Science Policy since 2011. His division is responsible for issues relating to data sharing policy, including issuance of the recent NIH Data Management and Sharing Policy, oversight of the NIH Genomic Data Sharing Policy, and management of the Data Science Policy Council. He holds a dual master’s degree from Columbia University and the London School of Economics and Political Science, where he studied the history of international relations.

Turning Talent into Treasure

One of NLM’s greatest assets is its talented, creative workforce. Last year, NIH called on its 27 Institutes and Centers to step up to mount an effective response to COVID-19. Supported by Congress, NIH invested more than $2 billion to ensure rapid access to COVID-19 testing for everyone in the United States, while also funding research to accelerate access to vaccines and therapeutics and leveraging existing clinical trials and electronic health record data to characterize, monitor, and treat the long-term sequelae of COVID-19 infections.

How is NLM supporting NIH’s COVID-19 response? Well, not surprisingly, our literature and genomic repositories are key to inspiring new research and providing the reference annotated genomes used to evaluate the SARS-CoV-2 virus and help discern its variants. Our Network of the National Library of Medicine (NNLM) gives NLM a face in communities across the United States, providing trustable, community-specific health information and increasing community engagement in NIH research programs. Our researchers are developing new analytic tools to more efficiently interpret medical images and refine the taxonomy of viruses so the properties of related viruses can be better understood. All of these activities draw on the talents of our almost 1,700 staff and the extensive partnerships we have with collaborators within the government and across the country. But it’s our special knowledge of data science, library science, and informatics that is making it possible for NIH to set up many new research programs with systematic attention to data coordination, data reuse, and data integration.

I want to highlight the talents of people working diligently across NIH. When NIH receives congressional funding for new programs or innovative research, a lot of work happens behind the scenes before these funds are awarded to investigators. Program announcements are written, solicitations offered, proposals received and reviewed, and awards made. Each of these steps requires an enormous amount of human effort. NIH has staff engaged in all of these activities for our typical programs and standard research mechanisms. To date, NIH received almost $4.9 billion to fight COVID, which is about 11.4% of the NIH’s total budget of nearly $43 billion for fiscal year 2021. NIH efforts to address COVID required a legion of staff members to refocus their regular priorities to participate in this emergency response. The contributions of NLM staff in this effort were amazing, with nearly 50 people from NLM stepping up to help write funding announcements, participate in reviews, and/or manage the awards process.

In particular, I want to elevate the work of three of our NLM staff who have made significant contributions to this effort. Yanli Wang, PhD, is a program officer in our Division of Extramural Programs. Because of her expertise in data science and training in chemistry, Dr. Wang was detailed to the RADx Radical (RADx-rad) program. RADx-rad is supporting innovative approaches, including rapid detection devices and home-based testing technologies, that will address current gaps in COVID-19 testing and extend existing approaches to make them more usable, accessible, or accurate. Dr. Wang serves as the program officer for the Discoveries and Data Coordinating Center and is working to provide programmatic stewardship and make sure that data across all studies is collected in a systematic manner that fosters data integration and data reuse. A critical aspect of Dr. Wang’s work is fostering the uses of common data elements across the projects and over time.

Two NLM staff members support NIH’s Researching COVID to Enhance Recovery (RECOVER) Initiative. RECOVER is studying the post-acute experiences of the estimated 10% to 30% of people who contract COVID-19 and continue to experience a range of symptoms. Amanda J. Wilson, Chief of NLM’s Office of Engagement and Training, is our representative to the RECOVER Initiative executive and coordinating committee. In this role she helps prepare the many funding announcements that stimulate research or reuse of clinical data to best understand this complex problem. Ms. Wilson leverages the extensive resources of the NNLM in support of community-based education and the response to the COVID-19 crisis.

Another NLM staffer supporting the RECOVER Initiative is Paul Fontelo. In addition to his roles in training and research in NLM’s Intramural Research Program, Dr. Fontelo is a pathologist by training. He provides specialized expertise to the Autopsy Cohort Studies to identify tissue injury due to SARS-CoV-2 infection, delivers technical direction to awardees, and approves certain deliverables and reports as required. He also participates in the application reviews of the Autopsy Cohort and the Mobile/Digital Health platform and is a member of the Post-Acute Sequelae of SARS-CoV-2 Executive Coordination Committee.

I’m grateful to these colleagues, and many more across NLM, who are going above and beyond their usual job responsibilities to help NIH step up to the challenges of the COVID-19 pandemic! Join me in thanking them for their efforts and using the talents of the NLM to create invaluable treasures for NIH!
