The Next Normal: Supporting Biomedical Discovery, Clinical Practice, and Self-Care

As we start year three of the COVID-19 pandemic, it’s time for NLM to take stock of the parts of our past that will support the next normal and what we might need to change as we continue to fulfill our mission to acquire, collect, preserve, and disseminate biomedical literature to the world.

Today, I invite you to join me in considering the assumptions and presumptions we made about how scientists, clinicians, librarians, and patients are using critical NLM resources and how we might need to update those assumptions to meet future needs. I will give you a hint… it’s not all bad—in fact, I find it quite exciting!

Let’s highlight some of our assumptions about how people are using our services, at least from my perspective. We anticipated the need for access to medical literature across the Network of the National Library of Medicine and created DOCLINE, an interlibrary loan request routing system that quickly and efficiently links participating libraries’ journal holdings. We also anticipated that we were preparing the literature and our genomic databases for humans to read and peruse. Now we’re finding that more than half of the accesses to NLM resources are generated and driven by computers through application programming interfaces. Even our MedlinePlus resource for patients now connects tailored electronic responses through MedlinePlus Connect to computer-generated queries originating in electronic health records.
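To make that machine-driven access concrete, here is a minimal sketch of how a program, rather than a person, might query PubMed through NCBI's E-utilities API. The search term is an arbitrary example, and the sketch only builds the request URL with the standard library; no network request is made.

```python
from urllib.parse import urlencode

# Base endpoint for the NCBI E-utilities esearch service, which searches a
# database (here, PubMed) and returns matching citation IDs.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_pubmed_search_url(term: str, retmax: int = 20) -> str:
    """Return an esearch URL asking PubMed for citation IDs matching `term`."""
    params = {
        "db": "pubmed",     # search the PubMed citation database
        "term": term,       # query string, e.g. a keyword or MeSH term
        "retmode": "json",  # machine-readable JSON rather than XML
        "retmax": retmax,   # cap on the number of IDs returned
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"

# A program would fetch this URL and parse the JSON response.
print(build_pubmed_search_url("long COVID"))
```

A script calling an endpoint like this, in a loop or on a schedule, is exactly the kind of traffic that now accounts for more than half of accesses to NLM resources.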

Perhaps most importantly, we realize that while sometimes the information we present is actually read by a living person, other times the information we provide—for example, about clinical trials (ClinicalTrials.gov) or genotype and phenotype data (dbGaP)—is actually processed by computers! Increasingly, we provide direct access to the raw, machine-readable versions of our resources so those versions can be entered into specialized analysis programs, which allow natural-language processing programs to find studies with similar findings or machine-learning models to determine the similarities between two gene sequences. For example, NLM makes it possible for advocacy groups to download study information from all ClinicalTrials.gov records so anyone can use their own programs to point out trials that may be of interest to their constituents or to compare summaries of research results for related studies.
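As a toy illustration of "determining the similarities between two gene sequences," the sketch below computes an alignment-free similarity: the Jaccard index over overlapping 3-mers. This is a simplified stand-in for illustration only, not the method any particular NLM tool uses.

```python
def kmers(seq: str, k: int = 3) -> set[str]:
    """All overlapping substrings of length k in a DNA sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard_similarity(a: str, b: str, k: int = 3) -> float:
    """Share of k-mers the two sequences have in common (0.0 to 1.0)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka and not kb:
        return 1.0  # two empty sequences are trivially identical
    return len(ka & kb) / len(ka | kb)

# Identical sequences score 1.0; sequences with no shared k-mers score 0.0.
print(jaccard_similarity("ATGCGTAC", "ATGCGTAC"))  # 1.0
```

A program ingesting raw sequence downloads could apply a measure like this at scale, with no human reading involved at any step.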

Machine learning and artificial intelligence have progressed to the point that they perform reasonably well in connecting similar articles—to this end, our LitCovid open-resource literature hub has served as an electronic companion to the human curation of coronavirus literature. NLM’s LitCovid is more efficient than manual curation alone, and its sophisticated search function surfaces more relevant results and is more likely to retrieve articles that fulfill the needs of our users. Most importantly, innovations such as LitCovid help our users manage the vast and ever-growing collection of biomedical literature, now numbering more than 34 million citations in NLM’s PubMed, the most heavily used biomedical literature citation database.

Partnerships are a critical asset to bring biomedical knowledge into the hands (and eyes) of those who need it. Over the last decade, NLM moved toward a new model for managing citation data in PubMed. We released the PubMed Data Management system that allows publishers to quickly update or correct nearly all elements of their citations and that accelerates the delivery of correct and complete citation data to PubMed users.

As part of the MEDLINE 2022 Initiative, NLM transitioned to automated Medical Subject Headings (MeSH) indexing of MEDLINE citations in PubMed. Automated MeSH indexing significantly decreases the time for indexed citations to appear in PubMed without sacrificing the quality MEDLINE is known to provide. Our human indexers can focus their expertise on curation efforts to validate assigned MeSH terms, thereby continuously improving the automated indexing algorithm and enhancing discoverability of gene and chemical information in the future.
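The division of labor described above (an algorithm proposes MeSH terms, human indexers validate them) can be illustrated with a deliberately naive keyword matcher. The mapping below is hypothetical; NLM's actual automated indexing is a trained machine-learning system, not a lookup table.

```python
# Hypothetical keyword-to-MeSH mapping, for illustration only.
KEYWORD_TO_MESH = {
    "heart attack": "Myocardial Infarction",
    "covid-19": "COVID-19",
    "high blood pressure": "Hypertension",
}

def propose_mesh_terms(abstract: str) -> list[str]:
    """Propose MeSH headings whose trigger keywords appear in the abstract."""
    text = abstract.lower()
    return sorted({mesh for kw, mesh in KEYWORD_TO_MESH.items() if kw in text})

print(propose_mesh_terms("A cohort study of heart attack risk after COVID-19."))
# ['COVID-19', 'Myocardial Infarction']
```

In the real workflow, proposals like these would go to a human indexer for validation, and the corrections would feed back into improving the algorithm.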

We’re already preparing for the next normal—what do you think it will be like?

I envision making our vast resources increasingly available to those who need them and forging stronger partnerships that improve users’ ability to acquire and understand knowledge. Imagine a service, designed and run by patients, that could pull and synthesize the latest information about a disease, recommendations for managing a clinical issue, or help a young investigator better pinpoint areas ripe for new interrogation! The next normal will make the best use of human judgment and creativity by selecting and organizing relevant data to create a story that forms the foundation of new inquiry or the basis of new clinical care. Come along and help us co-create the next normal!

Meet the NLM Investigators: For Sameer Antani, PhD, Seeing is More Than Meets the Eye

It’s time for another round of introductions! Many of you may already know Sameer Antani, PhD—one of NLM’s most decorated and distinguished investigators—from his many awards and accolades. In March 2022, he was inducted into the American Institute for Medical and Biological Engineering’s College of Fellows, an impressive group that represents the top two percent of medical and biological engineers. This distinction is one of the highest honors that can be bestowed upon a medical and biological engineer. Can you tell we are proud of him?!

We selected Dr. Antani to join our NLM family after a nationwide, competitive search, and his genius was readily apparent from the start. Dr. Antani’s career spans over two decades, during which he developed an innovative research portfolio focused on machine learning and artificial intelligence (AI). His lab at NLM focuses on using these tools to analyze enormous sets of biomedical data. Through this analysis, AI technology can “learn” to detect disease and help health care professionals provide more efficient diagnoses. Examples of Dr. Antani’s work can be found in mobile radiology vehicles, which allow professionals to take chest X-rays and screen for HIV and tuberculosis using software containing algorithms developed in his lab. Check out the infographic below to learn more about the exciting research happening in Dr. Antani’s lab.

Infographic titled: Seeing is more than meets the eye. Under the title the investigator's name, title and division are listed as: Sameer Antani, PhD, Investigator, Computational Health. 

The first column of the infographic is titled: Projects. Two bullets are listed in the first column. The first bullet reads: Discovering the impact of data on automated AI and machine learning (AI/ML) processes for diagnostics. The second bullet says: Improving AI/ML algorithm decisions to be consistent, reproducible, portable, explainable, unbiased, and representative of severity.

The second column is titled: Process. The first bullet in this column reads: Using images and videos alongside AI/ML technology to identify and diagnose:
Cancers: Cervical, Oral, Skin (Kaposi Sarcoma)
Cardiomyopathy 
Cardiopulmonary diseases. 
The second bullet reads: Analyzing a variety of image types, including:
Computerized Tomography (CT), Magnetic Resonance Imaging (MRI), X-ray, ultrasound, photos, videos, microscopy. 

The third and final column in the infographic is titled: What It Looks Like. In this column there are four images of chest x-rays illustrating the detection of HIV and TB.

Now, in his own words, learn more about what makes Dr. Antani’s work so important!

What makes your team unique? Tell us more about the people working in your lab.   

The postdoctoral research fellows, long-term staff scientists, and research scientists on my team explore challenging computational health topics while simultaneously advancing topics in machine learning for medical imaging. Dr. Ghada Zamzmi, Dr. Peng Guo, and Dr. Feng Yang bring expertise and drive to our lab. The scientists on my team, Dr. Zhiyun (Jaylene) Xue and Dr. Sivarama Krishnan Rajaraman, add over two decades of combined research and mentoring experience.  

What do you enjoy about working at NLM?  

There are many positives about working at NLM. At the top of the list is the encouragement and support to explore cutting-edge problems in medical informatics, data science, and machine intelligence, among other initiatives. 

What is your advice for young scientists or people interested in pursuing a career in research?  

I urge young scientists to recognize the power of multidisciplinary teams. I would also urge them to develop skills to clearly communicate their goals and research interests with colleagues who might be from a different domain so they can effectively collaborate and arrive at mutually beneficial results. 

Where is your favorite place to travel?

I like to travel to places that exhibit the natural wonders of our planet. I hope to visit all our national parks someday. 

When you’re not in the lab, what do you enjoy doing?

I am studying and exploring different aspects of music structure.

You’ve read his words, and now you can hear him for yourself! Follow our NLM YouTube page for more exciting content from the NLM staff who make it all possible. If you’d like to learn more about our Intramural Research Program (IRP), view job opportunities, and explore research highlights, I invite you to explore our recently redesigned NLM IRP webpage.

YouTube: Sameer Antani and Artificial Intelligence

Transcript: [Antani]: I went to school for computer engineering in India. I’ve worked with image processing, computer vision, pattern recognition, machine learning. So my world was filled with developing algorithms that could extract interesting objects from images and videos. Pattern recognition is a family of techniques that looks for particular pixel characteristics or voxel characteristics inside an image and learns to recognize those objects. Deep learning is a way of capturing the knowledge inside an image and encapsulating it, and then researchers like me spend time advancing newer deep-learning networks that look more broadly into an image, recognizing these objects—recognizing organs, in my case, and diseases—and converting those visuals into numerical risk predictors that could be used by clinicians.

So my research is currently in three very different areas. One area looks at cervical cancer. A machine could look at the images and be a very solid predictor of the risk to the woman of developing cervical precancer, encouraging early treatment. Another area I work with [is] sickle cell disease. One of the risk factors in sickle cell disease is cardiomyopathy, or cardiac muscle disease, which leads to stroke and perhaps even death. Looking at cardiac echo videos and using AI to be a solid predictor, along with other blood lab tests, improves the chances of survival.

A third area that I’m interested in is understanding the expression of tuberculosis [TB] in chest X-rays, particularly for children and those who are HIV-positive. The expression of disease in that subpopulation is very different from adults with TB who are not HIV-positive. Every clinician has seen a certain number of patients in their clinical training. They perhaps have spent more time at hospitals or clinical centers, been exposed to a certain population, and they become very adept at that population. Machines, on the other hand, could be trained on data that is free of bias, from different parts of the world, different ethnicities, different age groups, so that there’s improved caregiving and, therefore, a better expectation of treatment and care.

Note: Transcript was modified for clarity.

Bridging the Resource Divide for Artificial Intelligence Research

This blog post is by Lynne Parker, Director, National AI Initiative Office, and was originally posted on the White House Office of Science and Technology Policy blog. The Office of Science and Technology Policy and the National Science Foundation are seeking comments on the initial findings and recommendations contained in the interim report of the National Artificial Intelligence Research Resource (NAIRR) Task Force (“Task Force”) and particularly on potential approaches to implement those recommendations. We encourage you to read the RFI and submit comments on Implementing Initial Findings and Recommendations of the National Artificial Intelligence Research Resource Task Force by June 30, 2022.

Artificial Intelligence (AI) is transforming our world. The field is an engine of innovation that is already driving scientific discovery, economic growth, and new jobs. AI is an integral component of solutions ranging from those that tackle routine daily tasks to societal-level challenges, while also giving rise to new challenges necessitating further study and action. Most Americans already interact with AI-based systems on a daily basis, such as those that help us find the best routes to work and school, select the items we buy, and ask our phones to remind us of upcoming appointments.

Once studied by few, AI is now among the most popular course subjects across America’s universities. AI-based companies are being founded and scaled at a rapid rate. Worldwide AI-related research publications and patent applications continue to climb.

However, this growth in the importance of AI to our future and the size of the AI community obscures the reality that the pathways to participate in AI research and development (R&D) often remain limited to those with access to certain essential resources. Progress at the current frontiers of AI is often tied to the use of large volumes of advanced computational power and data, and access to those resources today is too often limited to large technology companies and well-resourced universities. Consequently, the breadth of ideas and perspectives incorporated into AI innovations can be limited and lead to the creation of systems that perpetuate biases and other systemic inequalities.

This growing resource divide has the potential to adversely skew our AI research ecosystem, and in the process, threaten our Nation’s ability to cultivate an AI research community and workforce that reflects America’s rich diversity – and harness AI in a manner that serves all Americans. To prevent unintended consequences or disparate impacts from the use of AI, it matters who is doing the AI research and development.

Established in June 2021 pursuant to the National AI Initiative Act of 2020, the National AI Research Resource (NAIRR) Task Force has been seeking to address this resource divide. As a Congressionally-chartered Federal advisory committee, the NAIRR Task Force has been developing a plan for the establishment of a National AI Research Resource that would democratize access to AI R&D for America’s researchers and students. The NAIRR is envisioned as a broadly available and federated collection of resources, including computational infrastructure, public- and private-sector data, and testbeds. These resources would be made easily accessible in a manner that protects privacy, with accompanying educational tools and user support to facilitate their use. An important element of the NAIRR will be the expertise to design, deploy, federate, and operate these resources.

Since its establishment, the Task Force has held 7 public meetings, engaged with 39 experts on a wide range of aspects related to the design of the NAIRR, and considered 84 responses from the public to a request for information (RFI). Materials from all public meetings and responses to the RFI can be found at www.AI.gov/nairrtf.

Today, as co-chair of the Task Force and as part of OSTP’s broader work to advance the responsible research, development, and use of AI, I am proud to announce the submission of the interim report of the NAIRR Task Force to the President and Congress. This report lays out a vision for how this national cyberinfrastructure could be structured, designed, operated, and governed to meet the needs of America’s research community. In the report, the Task Force presents an approach to establishing the NAIRR that builds on existing and future Federal investments; designs in protections for privacy, civil rights, and civil liberties; and promotes diversity and equitable access. It details how the NAIRR should support the full spectrum of AI research – from foundational to use-inspired to translational – by providing opportunities for students and researchers to access resources that would otherwise be out of their reach. The vision laid out in this interim report is the first step towards a more equitable future for AI R&D in America – a future where innovation can flourish and the promise of AI can be realized in a way that works for all Americans.

Going forward, the Task Force will develop a roadmap for achieving the vision defined in the interim report. This implementation roadmap is planned for release as the final report of the Task Force at the end of this year. To inform this work, we are asking for feedback from the public on the findings and recommendations presented in the interim report as well as how those recommendations could be effectively implemented. Public responses to this request for information will be accepted through June 30, 2022. In addition, OSTP and the National Science Foundation will host a public listening session on June 23 to provide additional means for public input. Please see here for more information on how to participate.

If successful, the NAIRR would transform the U.S. national AI research ecosystem by strengthening and democratizing foundational, use-inspired, and translational AI R&D in the United States. The interim report of the NAIRR Task Force being released today represents a first step towards this future, putting forward a vision for the NAIRR for public comment and feedback.

We Can’t Go It Alone!

In February, I received the Miles Conrad Award from the National Information Standards Organization (NISO). NISO espouses a wonderful vision: “. . . a world where all can benefit from the unfettered exchange of information.” As the Director of the National Library of Medicine (NLM), this is music to my ears.

Standards are essential to NLM’s mission! Standards bring structure to information, assure common understanding, and make the products of scientific efforts—including literature and data—easier to discover. NLM’s efforts are devoted to the creation, dissemination, and use of terminology and messaging standards. These efforts include attaching indexing terms to citations in PubMed, our biomedical literature database housing over 34 million citations; using reference models to describe genome sequences; and serving as the HHS repository for the clinical terminologies needed to support health care delivery. NLM improves health and accelerates biomedical discovery by advancing the availability and use of standards. Standards are dynamic tools that must capture the context of biomedicine and health care at a given moment yet also reflect scientific developments and changes in community vernacular.

By their very nature, standards create consensus across two or more parties on how to properly name, structure, or label phenomena. No single entity can create a standard all by itself! Standards are effective because they shape the conversation between and among entities, achieving a common goal by drawing on a common representation.

NLM alone cannot create, promulgate, or enforce standards. We work in partnership with professional societies, standards development organizations, and other federal entities, including the Office of the National Coordinator for Health Information Technology, to foster interoperability of clinical data. We support the development and distribution of SNOMED CT (the Systematized Nomenclature of Medicine – Clinical Terms) and the specific extension of SNOMED in the United States. We developed the MeSH (Medical Subject Headings) thesaurus, a controlled vocabulary used to index articles in PubMed. We also support the development and distribution of LOINC (the Logical Observation Identifiers Names and Codes), a common language—that is, a set of identifiers, names, and codes—used to identify health measurements, observations, and documents. Finally, we maintain RxNorm, a normalized naming system for generic and branded drugs and their uses, to support message exchanges across pharmacy management and drug interaction software.

Partnerships help us create and deploy standard ways to make scientific literature discoverable and accessible. To this end, we were instrumental in the adoption of NISO’s JATS (Journal Article Tag Suite), an XML format for describing the content of published articles, which we encourage journals to use when submitting citations to PubMed so users can efficiently search the literature and retrieve articles as they are described. MeSH RDF (Resource Description Framework) is a linked data representation of the MeSH vocabulary on the web, and the BIBFRAME (Bibliographic Framework) Initiative—a data exchange format initiated by the Library of Congress—incorporates MeSH RDF URIs (Uniform Resource Identifiers) as linked data that will support complete bibliographic descriptions and foster resource sharing across the web and through the networked world.
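To make the markup concrete, here is a small, hand-written fragment in the spirit of JATS, parsed with Python's standard library. A real JATS document has many more required elements and is validated against the NISO tag set, so treat the tags below as an illustrative subset only.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment using a small subset of JATS-style tags (illustrative
# only; real JATS documents are far richer and follow the NISO specification).
JATS_FRAGMENT = """
<article>
  <front>
    <article-meta>
      <title-group>
        <article-title>An Example Article</article-title>
      </title-group>
      <pub-date><year>2022</year></pub-date>
    </article-meta>
  </front>
</article>
"""

root = ET.fromstring(JATS_FRAGMENT)
title = root.findtext(".//article-title")  # pull the article title
year = root.findtext(".//pub-date/year")   # pull the publication year
print(title, year)  # An Example Article 2022
```

Because every submitting journal tags its titles, authors, and dates the same way, a downstream system can extract citation data with a few generic queries like these instead of per-publisher parsing code.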

Standards provide the resources necessary to understand complex phenomena and share scientific insights. Leveraging partnerships in order to develop and deploy these standards both allows efficiencies and produces a more connected, interoperable, understandable world of knowledge. Given the speed at which biomedical knowledge is growing, leveraging these partnerships assures that the institutions charged with acquiring and disseminating all the knowledge relevant to biomedicine and health can successfully and effectively meet their missions.

Gearing Up for 2023 Part II: Implementing the NIH Data Management and Sharing Policy

This blog post is by Lyric Jorgenson, PhD, the Acting Director of the NIH Office of Science Policy. It was originally posted on May 12 on the NIH Office of Science Policy Under the Poliscope blog. We encourage you to read it and submit comments and feedback on the draft supplemental information to the NIH Policy for Data Management and Sharing: Protecting Privacy When Sharing Human Research Participant Data by June 27.

Sequels are all the rage these days.  I figure if Marvel can make endless “Avengers” movies, I could start making blog sequels.  Back in the beginning of the year, I wrote Part I of this blog series about how NIH is working to implement the new NIH Data Management and Sharing Policy (DMS Policy).  I mentioned at that time that additional resources were forthcoming.

I should note that when we started to receive comments on what was to become the NIH DMS Policy, one thing in particular stood out to us.  Many commentors told us it would be helpful to have clear information on how to protect the privacy and respect the autonomy of participants when sharing data.  Now, we all know that cliffhangers build anticipation, so without further delay, I want to share with you some of the tools NIH has been working on to answer that call.

First, if you have seen the Avengers movies, you likely will have noticed that they tend to introduce a new villain that the team needs to battle with either new tools (think of OSP with Thor’s Stormbreaker axe) or the help of new superheroes like Captain Marvel. While not exactly a new villain, the lack of consistent consent language to facilitate secondary research with data and biospecimens is certainly a challenge many of our stakeholders have raised and one that we thought we could help address.

NIH has a long history of developing consent language and, as such, our team worked across the agency – and with you! – to develop a new resource that shares best practices for developing informed consents to facilitate data/biospecimen storage and sharing for future use.  It also provides modifiable sample language that investigators and IRBs can use to assist in the clear communication of potential risks and benefits associated with data/biospecimen storage and sharing.  In developing this resource, we engaged with key federal partners, as well as scientific societies and associations.  Importantly, we also considered the 102 comments from stakeholders in response to an RFI that we issued in 2021.

As for our second resource, we are requesting public comment on protecting the privacy of research participants when data is shared. I think I need to be upfront and acknowledge that we have issued many of these types of requests over the last several months and NIH understands the effort that folks take to thoughtfully respond.  With that said, we think the research community will greatly benefit from this resource and we want to hear your thoughts on whether it hits the mark or needs adjustment.

When reviewing the document, please bear in mind that the main purpose is to provide researchers with information on:

  • Operational Principles for Protecting Participant Privacy when Sharing Scientific Data
  • Best Practices for Protecting Participant Privacy when Sharing Scientific Data
  • Points to Consider for Designating Scientific Data for Controlled Access

Comments on the draft will be accepted until June 27, 2022, and full information on how to submit a comment can be found here.

Finally, every sequel needs a twist ending! In November 2021, NIH published a request for comments on the future directions of the NIH Genomic Data Sharing Policy.  We are still reviewing the many points and perspectives that were raised, but while we consider next steps, the comments we received are now available on the OSP website.  Okay, so maybe that twist wasn’t as big as, say, Darth Vader revealing he is (spoiler alert) Luke’s father in The Empire Strikes Back, but it’s still pretty good for the science policy world.

With a little more than half a year left until the implementation date of the NIH DMS Policy, we will continue to provide updates and resources over the next several months.
