NLM Announces New Annual Lecture on Science, Technology, and Society

Guest post by Maryam Zaringhalam, PhD, National Library of Medicine Data Science and Open Science Officer, and Mike Huerta, PhD, director of the Office of Strategic Initiatives and associate director of the National Library of Medicine.

In October 2019, NLM invited award-winning science journalist Angela Saini to discuss her research on how bias and prejudice have crept into science. Her lecture examined how racist and sexist ideas have permeated science over its history, and how science, in turn, has been contorted to justify and perpetuate pseudoscientific myths of innate inferiority. Saini’s work and insights sparked a crucial conversation within NLM about our role and responsibility as the world’s largest biomedical library and a leader in data science research, situated within the nation’s premier medical research agency, to question how systemic biases affect our work and determine how we can correct them.

As advancing equity and rooting out structural discrimination in science and technology have become an increasingly urgent federal priority, NLM will build on this discussion, in part, by announcing the launch of an annual NLM Science, Technology, and Society Lecture on March 1, 2021.

Situated at the nexus of the NIH-supported research community and the public, NLM plays a vital role not only in advancing cutting-edge research, but also in acting as a steward of biomedical information in service of society. As leaders in facilitating and shaping the future of biomedical data science, we must understand the implications of our work for society as a whole. We must, for instance, question how biases may creep into algorithms that connect research results with the public and think through the ethical ramifications of emerging technologies that might reinforce and amplify those biases. As a national library, we serve as curators of the history of biomedical science, which must reflect both the great achievements made possible by research and the injustices committed within the scientific community. And as an institution with more than 8,000 points of presence through our Network of the National Library of Medicine, we have the means to fulfill our responsibility to meet the needs and understand the concerns of the communities we serve.

With these responsibilities and NLM’s unique role and capabilities in mind, the NLM Science, Technology, and Society Lecture aims to raise awareness around the societal and ethical implications of the conduct of biomedical research and the use of advanced technologies, while seeding conversations across the Library, NIH, and the broader biomedical research community. NLM sees such considerations as fundamental to advancing biomedical discovery and human health for the benefit of all.

Dr. Kate Crawford is the inaugural Visiting Chair of AI and Justice at the École Normale Supérieure, as well as a Senior Principal Researcher at Microsoft Research, and the cofounder of the AI Now Institute at New York University.

Each spring, we plan to invite a leading voice working at the intersection of biomedicine, data science, ethics, and justice to present their research and how it relates to the mission and vision of NLM, as well as NIH more broadly. This year, we are pleased to host Dr. Kate Crawford, a leading scholar of science, technology, and society, with over 20 years of experience studying large-scale data systems and artificial intelligence (AI) in the wider contexts of history, politics, labor, and the environment. Her lecture, “Atlas of AI: Mapping the social and economic forces behind AI”, will explore how machine learning systems can reproduce and intensify forms of structural bias and discrimination and offer new paths for thinking through the research ethics and policy implications of the turn to machine learning.

As the interests, priorities, and concerns of our society continue to evolve, particularly in response to emerging technologies and shifting national conversations, we hope this annual lecture, alongside established lecture series such as NLM History Talks, will provide an invaluable perspective on the societal implications of our work and further establish NLM’s leadership as a trusted partner in health.

Dr. Zaringhalam is a member of the Office of Strategic Initiatives and is responsible for monitoring and coordinating data science and open science activities and development across NLM, NIH, and beyond. She completed her PhD in molecular biology at Rockefeller University in 2017 before joining NLM as an AAAS Science and Technology Policy Fellow.

Dr. Huerta leads NLM in identifying, implementing, and assessing strategic directions of NLM, including at the intersection of data science and open science. In his 30 years at NIH, he has led many trans-NIH research initiatives and helped establish neuroinformatics as a field. Dr. Huerta joined NIH’s National Institute of Mental Health in 1991, before moving to NLM in 2011.

A Journey to Spur Innovation and Discovery

Guest post by Valerie Schneider, PhD, staff scientist at the National Library of Medicine’s National Center for Biotechnology Information, National Institutes of Health.

It’s been said that nature is the best teacher. When it comes to understanding human biology and improving health, examples abound of the advances that have been made from the study of a diverse set of non-human organisms. Over the last two centuries, the study of nematode worms has taught us about longevity and mRNAs (the biological molecule that is the basis for several COVID-19 vaccines), common fungi about cell division and cancer, and fruit flies about many things, from the role of chromosomes in heredity to our circadian rhythms. The ability to create targeted alterations in the genomes of model organisms has been transformative for studies to establish the function of specific genes in the etiology of human disease.

The modern era of genomic biology, in which genome sequencing and assembly are accessible to more researchers than ever before, provides data from an even greater range of organisms from which we might learn. Today, we rely not only on primate models, but on a whole host of species: for example, swine to understand organ transplantation, songbirds to understand vocalization and learning, and bats and pangolins to teach us about the evolution of the SARS-CoV-2 virus and how to fight its spread.

These rapidly growing collections of sequence and other data on species across the tree of life offer enormous promise for discoveries that have the potential to improve human health. To better enable such discoveries, with the support of NIH, NLM is planning a major modernization of its resources and their underlying infrastructure.

This modernization will support the needs of users engaged in data search and retrieval, gene annotation, evaluation of sequence quality, and comparative analyses. The new infrastructure, user interfaces, and tools should result in an improved experience for researchers doing a wide range of work, and also facilitate better data submissions.

This revamping aligns with NIH’s Strategic Plan for Data Science, which provides a roadmap for modernizing the NIH-funded biomedical data science ecosystem, as well as NLM’s Strategic Plan, which furthers NLM’s commitment to provide data and information to accelerate biomedical discovery and improve health. NLM and NIH are committed to providing researchers with modern, stable, and cloud-oriented technologies that support research needs.

Over the last few years, NLM has demonstrated this commitment by re-designing several flagship products, including the PubMed database for searching published biomedical literature, the database of information on privately and publicly funded clinical trials, and the Basic Local Alignment Search Tool (BLAST) for finding regions of similarity between biological sequences. As part of NIH’s Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) Initiative, NLM also made the data from its massive (36 petabyte) Sequence Read Archive (SRA) available on two commercial cloud platforms, facilitating large-scale computational research that would otherwise be difficult for many researchers. Revamping these resources has positioned them to support both the current and future needs of NLM’s diverse audience of researchers, clinicians, data scientists, educators and others.

Importantly, this current initiative to modernize NLM products, tools, and services, and concurrently develop content, will include extensive engagement with the research community, just as we’ve done with previous re-design efforts. The NLM is committed to offering interfaces accessible to both novices and experts. Additionally, NLM believes a key part of the next generation of its data resources requires an infrastructure that supports an ongoing, dynamic exchange of content, including contributions of metadata and gene functional information from knowledge builders in the community to complement and enhance NIH-provided content.

Community engagement will also ensure that externally sourced content is provided in ways that maintain the high value and trustworthiness of the datasets. Additionally, data connections that make the content of this new resource accessible to external knowledgebases containing other datatypes, such as images, will further promote integrative data analyses that support scientific discovery.

Many opportunities exist to streamline processes, look across resources, and gain insights that will provide new ways of learning. Through NLM’s continued commitment to modernization initiatives, we are ready to again improve the user experience for accessing, analyzing and visualizing sequence data and related information. Nature continues to be our best teacher — and we are now poised to learn from her in an exciting new classroom.

We invite you to come on this journey with us.

Valerie Schneider, PhD, is the deputy director of Sequence Offerings and the head of the Sequence Plus program. In these roles, she coordinates efforts associated with the curation, enhancement, and organization of sequence data, as well as oversees tools and resources that enable the public to access, analyze, and visualize biomedical data. She also manages NCBI’s involvement in the Genome Reference Consortium, the international collaboration tasked with maintaining the value of the human reference genome assembly.

Upcoming Training Opportunity: University-based Training for Research Careers in Biomedical Informatics and Data Science

Guest blog by Valerie Florance, PhD, Director of NLM’s Division of Extramural Programs

Explore the Training

NLM’s Extramural Programs Division is a principal source of NIH funding for research training in biomedical informatics, applying approaches in computer and information science to challenges in basic biomedical research, health care, and public health administration. NLM’s support fundamentally shapes the education, training, and advancement of biomedical informatics nationally. For decades, NLM has sponsored university-based training for predoctoral and postdoctoral fellows to prepare them for research careers. These programs support NLM’s long-term investment strategy to help shape the field of biomedical informatics and data science.

Last October, NLM published NOT-LM-21-001 in the NIH Guide for Grants and Contracts to allow potential applicants sufficient time to develop meaningful collaborations and responsive projects. This program, a model among NIH training programs, advances training with big data in biomedical informatics and produces interdisciplinary researchers who fully comprehend the challenges of knowledge representation, decision support, translational research, human-computer interaction, and social and organizational factors that influence effective adoption of health information technology in biomedical domains. This notice was the first step in a year-long process that will result in new 5-year grant awards that begin in July 2022. The notice outlines the expected timetable for publishing the funding opportunity announcement, accepting applications, reviewing them, and making awards.

The solicitation for new applications will be published in the NIH Guide for Grants and Contracts in March with applications due in May. For those interested in applying for an NLM training grant for the first time, we encourage a review of the previous solicitation to get a sense of the data and programmatic descriptions that are required for a training grant application.

Because issuance dates for the next competition are estimates, it is also helpful to subscribe to the weekly Table of Contents emails from the NIH Guide for Grants and Contracts. The extra benefit of this weekly mailing is that it lists all new funding issuances from NIH plus important notices about policy changes.

A Strong Foundation

NLM’s training programs offer graduate education and postdoctoral research experiences in a wide range of areas including health care informatics, translational bioinformatics, clinical research informatics, public health informatics, and biomedical data science. Each of these programs offers a combination of core curriculum and electives. In the current 5-year cycle, seven programs also offer special tracks in environmental exposure informatics supported by NIH’s National Institute of Environmental Health Sciences.

A decades-old project, the university-based training initiative is one of NLM’s signature grant programs. NLM’s training programs have produced many leaders in the field of biomedical informatics. Past trainees have taken positions in academia, industry, small businesses, health care organizations, and government. Currently, NLM supports 200 trainee positions at 16 universities around the United States and provides funding each year for up to 40 short-term trainee positions that are used to help recruit college graduates to our field by providing introductory training and research opportunities. To develop a sense of community among the trainees, NLM brings them together each year, except during pandemic years, for a conference hosted at one of the university sites.

You can find a map with links to descriptions of the current programs here. The website also provides links to information about past annual conferences – check out past agendas to get a sense of the broad scope of science across the field of biomedical informatics.

Attendees comparing notes at NLM Informatics Training Conference 2017 in La Jolla, California

Did you take part in this training? What was your favorite thing about this experience? What advice would you give to current students? How can we make the program even better?

Dr. Florance heads NLM’s Extramural Programs Division, which is responsible for the Library’s grant programs and coordinates NLM’s informatics training programs.

Making Connections and Enabling Discoverability – Celebrating 30 Years of UMLS

Guest post by NLM staff: David Anderson, UMLS Production Coordinator; Liz Amos, Special Assistant to the Chief Health Data Standards Officer; Anna Ripple, Information Research Specialist; and Patrick McLaughlin, Head, Terminology QA & User Services Unit.

Shortly after Donald A.B. Lindberg, MD was sworn in as NLM Director in 1984, he asked “What is NLM, as a government agency, uniquely positioned to do?” Through conversations with experts, Dr. Lindberg identified a looming question in the field of bioinformatics — How can machines act as if they understand biomedical meaning? At the time, the information necessary to answer this question was distributed across a variety of resources. Very few publicly available tools for processing biomedical text had been developed. NLM had experience with terminology development and maintenance (MeSH – Medical Subject Headings), coordinating distributed systems (DOCLINE), and distributing and providing access to large datasets (MEDLINE) in an era when this was a challenge.

As a national library, NLM was deeply interested in providing good answers to biomedical questions. For these reasons, NLM was uniquely positioned to develop a system — the Unified Medical Language System (UMLS) — that could lay the groundwork for machines to act as if they understand biomedical meaning. This year marks the 30th anniversary of the release of the first edition of the UMLS in November 1990.

Achieving the Unified Medical Language System

The result of a large-scale, NLM-led research and development project, the UMLS began with the audacious goal of helping computer systems behave as if they understand the meaning of the language of biomedicine and health. The UMLS was expected to facilitate the development of systems that could retrieve, integrate, and aggregate conceptually-related information from disparate electronic sources such as literature databases, clinical records, and databanks despite differences in the vocabularies and coding systems used within them, and in the terminology employed by users.  

Betsy Humphreys (left) and Dr. Lindberg (right) tout the release of the Unified Medical Language System in 1990.

Under the direction of Dr. Donald Lindberg; Betsy Humphreys, then Deputy Associate Director for Library Operations; and a multidisciplinary, international team from academia and the private sector, the UMLS evolved into an essential tool for enabling interoperability, natural language processing, information retrieval, machine learning, and other data science use cases.

UMLS Knowledge Sources

Central to the UMLS model is the grouping of synonymous names into UMLS concepts and the assignment of broad categories (semantic types) to all those concepts. Since the first UMLS release in 1990, NLM has continued to expand and update the UMLS Knowledge Sources based on feedback from testing and use.
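To make the concept-centered organization more concrete, here is a minimal sketch in Python. It is an illustration only, not the actual Metathesaurus format: the `Concept` class, the `TOY-` identifiers, and the hand-picked names are all hypothetical, chosen to show how synonymous names collapse into a single concept that carries a semantic type.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A toy stand-in for a concept: one meaning, many names."""
    concept_id: str       # hypothetical identifier, not a real UMLS ID
    semantic_type: str    # broad category assigned to the concept
    names: set = field(default_factory=set)


def normalize(name: str) -> str:
    """Crude normalization: lowercase and collapse whitespace."""
    return " ".join(name.lower().split())


def build_index(concepts):
    """Map every normalized name to the concept it belongs to."""
    index = {}
    for concept in concepts:
        for name in concept.names:
            index[normalize(name)] = concept
    return index


# Illustrative data only; the grouping is hand-written for this sketch.
heart_attack = Concept(
    concept_id="TOY-0001",
    semantic_type="Disease or Syndrome",
    names={"Myocardial Infarction", "Heart Attack", "MI"},
)
aspirin = Concept(
    concept_id="TOY-0002",
    semantic_type="Pharmacologic Substance",
    names={"Aspirin", "Acetylsalicylic Acid"},
)

index = build_index([heart_attack, aspirin])

# Two different surface strings resolve to the same concept object.
same = index[normalize("heart  attack")] is index[normalize("Myocardial infarction")]
print(same)
print(index[normalize("acetylsalicylic acid")].semantic_type)
```

In this sketch the concept, rather than any single name, is the unit of organization, so adding a new synonym only means adding one more name to an existing concept.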

The UMLS Metathesaurus was the first biomedical terminology resource organized by concept, and its development had a significant impact on subsequent medical informatics theory and practice. The broad terminology coverage, synonymy, and semantic categorization in the UMLS, in combination with its lexical tools, enable its primary use cases:

  • identifying meaning in text,
  • mapping between vocabularies, and
  • improving information retrieval.
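Two of these use cases, mapping between vocabularies and identifying meaning in text, can be sketched with a toy table. The Python below is a rough illustration under stated assumptions: the vocabulary names (`VOCAB_A`, `VOCAB_B`), the codes, and the concept identifiers are hypothetical placeholders, and the naive substring match stands in for the far more capable lexical tools mentioned above.

```python
# Each row: (concept id, source vocabulary, code in that vocabulary, name).
# All identifiers are hypothetical placeholders for illustration.
ROWS = [
    ("TOY-0001", "VOCAB_A", "A-100", "myocardial infarction"),
    ("TOY-0001", "VOCAB_B", "B-7", "heart attack"),
    ("TOY-0002", "VOCAB_A", "A-200", "aspirin"),
    ("TOY-0002", "VOCAB_B", "B-9", "acetylsalicylic acid"),
]


def map_code(source, code, target):
    """Translate a code between vocabularies via the shared concept id."""
    concept_ids = {cid for cid, vocab, c, _ in ROWS if vocab == source and c == code}
    return sorted(c for cid, vocab, c, _ in ROWS if vocab == target and cid in concept_ids)


def find_concepts(text):
    """Return concept ids whose names occur in the text (naive substring match)."""
    lowered = text.lower()
    return sorted({cid for cid, _, _, name in ROWS if name in lowered})


print(map_code("VOCAB_A", "A-100", "VOCAB_B"))
print(find_concepts("Patient given aspirin after a heart attack."))
```

Because both vocabularies point at the same concept identifier, neither needs to know about the other; the concept acts as the pivot for the mapping.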

The growth in UMLS use over the past decade reflects broad developments in health policy, including the designation of SNOMED CT, LOINC, and RxNorm (three component vocabularies included in the UMLS Metathesaurus) as U.S. national standards for clinical data for quality improvement payment programs such as CMS’s Promoting Interoperability Programs (previously known as Meaningful Use). Many UMLS source vocabularies are also referenced in the United States Core Data for Interoperability (USCDI). Researchers continue to rely on the UMLS as a knowledge base for natural language processing and data mining. The UMLS community of users has developed several tools that enhance and expand the capabilities of the UMLS.

Celebrating 30 Years

Thirty years after the initial release of the UMLS Knowledge Sources, the UMLS resources continue to be of benefit to millions of people worldwide. The UMLS is used in NLM flagship applications such as PubMed. Additionally, some researchers and system developers use the UMLS to build or enhance electronic resources, clinical data warehouses, components of electronic health record systems, natural language processing pipelines, and test collections. UMLS resources are being used primarily as intended, to facilitate the interpretation of biomedical meaning in disparate electronic information and data in many different computer systems serving scientists, health professionals, and the public.

The Journal of the American Medical Informatics Association is commemorating the 30th UMLS anniversary with a special focus issue dedicated to the memory of Dr. Lindberg (1933–2019) that also includes information on current research and applications, broader impacts, and future directions of the UMLS.

Upon her retirement from NLM in 2017, Betsy Humphreys remarked that “systems that get used, get better.” As the UMLS enters its fourth decade, a review of UMLS production methods and priorities is underway, guided by the same high standards with which it started: trailblazing into the future to improve biomedical information storage, processing, and retrieval.

As we reflect on this important milestone, we want to thank stakeholders, like you, who have provided feedback over the years to help us make the UMLS leaner, stronger, and more useful.

Top row: David Anderson, UMLS Production Coordinator and Liz Amos, Special Assistant to the Chief Health Data Standards Officer

Bottom Row: Anna Ripple, Information Research Specialist and Patrick McLaughlin, Head, Terminology QA & User Services Unit

Fostering a Culture of Scientific Data Stewardship

Guest post by Jerry Sheehan, Deputy Director, National Library of Medicine.

Making research data broadly findable, accessible, interoperable, and reusable is essential to advancing science and accelerating its translation into knowledge and innovation. The global response to COVID-19 highlights the importance and benefits of sharing research data more openly.

The National Institutes of Health (NIH) has long championed policies that make the results of research available to the public. Last week, NIH released the NIH Policy for Data Management and Sharing (DMS Policy) to promote the management and sharing of scientific data generated from NIH-funded or conducted research. This policy replaces the 2003 NIH Data Sharing Policy.

The DMS Policy was informed by public feedback and requires NIH-funded researchers to plan for the management and sharing of scientific data. It also makes clear that data sharing is a fundamental part of the research process.

Data sharing benefits the scientific community and the public.

For the scientific community, data sharing enables researchers to validate scientific results, increasing transparency and accountability. Data sharing also strengthens collaborations that allow for richer analyses. Strong data-sharing practices facilitate the reuse of hard-to-generate data, such as those acquired during complex experiments or once-in-a-lifetime events like natural disasters or pandemics.

For the public, sound data-sharing practices demonstrate good stewardship of taxpayer funds. Clear, well-written data sharing and management plans promote transparency and accountability to society. They also expand opportunities for data to be accessed and reused by clinicians, students, educators, and innovators in health care and other sectors of the economy.

As an organization dedicated to improving access to data and information to advance biomedical sciences and public health, NLM plays a key role in implementing the new policy and supporting researchers in meeting its requirements. NLM maintains a number of data repositories, such as the Sequence Read Archive, that curate, preserve, and provide access to research data. NLM also maintains a longer list of NIH-supported data repositories that accept different types of data (e.g., genomic, imaging) from different research domains (e.g., cancer, neuroscience, behavioral sciences). Where appropriate domain-specific repositories do not exist, NLM has made clear how researchers can include small datasets (<2GB) with articles deposited in NLM’s PubMed Central (PMC) under the NIH Public Access Policy.

NLM also works with the broader library community to support improved data management and sharing. Supplemental information issued with the new policy makes it clear that research budgets can include costs of data management and sharing, such as those for data curation, formatting data to accepted standards, attaching metadata to foster discoverability, and preparing data for storage in a repository. These are the kinds of services increasingly provided by libraries and librarians in universities and academic medical centers across the country. NLM, through the Network of the National Library of Medicine, offers training in data management and data literacy to health science, public, and other librarians to expand capacity for these important services.

NIH’s DMS Policy applies to all research, funded or conducted in whole or in part by NIH, that results in the generation of scientific data. This includes research funded or conducted by extramural grants, contracts, intramural research projects, or other funding agreements. The DMS Policy does not apply to research and other activities that do not generate scientific data, including training, infrastructure development, and non-research activities.

NIH will continue to engage the research community to support the change and implementation of this new policy, which will go into effect in January 2023. NLM will continue to work within NIH and across the library and information science communities to develop innovative ways to support the policy and advance the effective stewardship of research data. Let us know how else we can support this important policy advance.

Read more about this major policy release in the NIH’s Under the Poliscope blog.

As NLM Deputy Director, Jerry Sheehan shares responsibility with the Director for overall program development, program evaluation, policy formulation, direction and coordination of all Library activities. He has made major contributions to the development and implementation of NIH, HHS, and U.S. government-wide policy related to open science, public access to government-funded information, clinical trials registration, and electronic health records.