Making Connections and Enabling Discoverability – Celebrating 30 Years of UMLS

Guest post by NLM staff: David Anderson, UMLS Production Coordinator; Liz Amos, Special Assistant to the Chief Health Data Standards Officer; Anna Ripple, Information Research Specialist; and Patrick McLaughlin, Head, Terminology QA & User Services Unit.

Shortly after Donald A.B. Lindberg, MD, was sworn in as NLM Director in 1984, he asked, “What is NLM, as a government agency, uniquely positioned to do?” Through conversations with experts, Dr. Lindberg identified a looming question in the field of bioinformatics: How can machines act as if they understand biomedical meaning? At the time, the information necessary to answer this question was distributed across a variety of resources. Very few publicly available tools for processing biomedical text had been developed. NLM had experience with terminology development and maintenance (MeSH – Medical Subject Headings), coordinating distributed systems (DOCLINE), and distributing and providing access to large datasets (MEDLINE) in an era when this was a challenge.

As a national library, NLM was deeply interested in providing good answers to biomedical questions. For these reasons, NLM was uniquely positioned to develop a system — the Unified Medical Language System (UMLS) — that could lay the groundwork for machines to act as if they understand biomedical meaning. This year marks the 30th anniversary of the release of the first edition of the UMLS in November 1990.

Achieving the Unified Medical Language System

The result of a large-scale, NLM-led research and development project, the UMLS began with the audacious goal of helping computer systems behave as if they understand the meaning of the language of biomedicine and health. The UMLS was expected to facilitate the development of systems that could retrieve, integrate, and aggregate conceptually related information from disparate electronic sources such as literature databases, clinical records, and databanks, despite differences in the vocabularies and coding systems used within them and in the terminology employed by users.

Betsy Humphreys (left) and Dr. Lindberg (right) tout the release of the Unified Medical Language System in 1990.

Under the direction of Dr. Donald Lindberg; Betsy Humphreys, then Deputy Associate Director for Library Operations; and a multidisciplinary, international team from academia and the private sector, the UMLS evolved into an essential tool for enabling interoperability, natural language processing, information retrieval, machine learning, and other data science use cases.

UMLS Knowledge Sources

Central to the UMLS model is the grouping of synonymous names into UMLS concepts and the assignment of broad categories (semantic types) to all those concepts. Since the first release in 1990, NLM has continued to expand and update the UMLS Knowledge Sources based on feedback from testing and use.

The UMLS Metathesaurus was the first biomedical terminology resource organized by concept, and its development had a significant impact on subsequent medical informatics theory and practice. The broad terminology coverage, synonymy, and semantic categorization in the UMLS, in combination with its lexical tools, enable its primary use cases:

  • identifying meaning in text,
  • mapping between vocabularies, and
  • improving information retrieval.

The growth in UMLS use over the past decade reflects broad developments in health policy, including the designation of SNOMED CT, LOINC, and RxNorm (three component vocabularies included in the UMLS Metathesaurus) as U.S. national standards for clinical data for quality improvement payment programs such as CMS’s Promoting Interoperability Programs (previously known as Meaningful Use). Many UMLS source vocabularies are also referenced in the United States Core Data for Interoperability (USCDI). Researchers continue to rely on the UMLS as a knowledge base for natural language processing and data mining. The UMLS community of users has developed several tools that enhance and expand the capabilities of the UMLS.

Celebrating 30 Years

Thirty years after the initial release of the UMLS Knowledge Sources, the UMLS resources continue to be of benefit to millions of people worldwide. The UMLS is used in NLM flagship applications such as PubMed and ClinicalTrials.gov. Additionally, some researchers and system developers use the UMLS to build or enhance electronic resources, clinical data warehouses, components of electronic health record systems, natural language processing pipelines, and test collections. UMLS resources are being used primarily as intended, to facilitate the interpretation of biomedical meaning in disparate electronic information and data in many different computer systems serving scientists, health professionals, and the public.

The Journal of the American Medical Informatics Association is commemorating the 30th UMLS anniversary with a special focus issue dedicated to the memory of Dr. Lindberg (1933–2019) that also includes information on current research and applications, broader impacts, and future directions of the UMLS.

Upon her retirement from NLM in 2017, Betsy Humphreys remarked that “systems that get used, get better.” As the UMLS enters its fourth decade, a review of UMLS production methods and priorities is underway, guided by the same ambitious goal with which the project started: trailblazing into the future to improve biomedical information storage, processing, and retrieval.

As we reflect on this important milestone, we want to thank stakeholders, like you, who have provided feedback over the years to help us make the UMLS leaner, stronger, and more useful.

Top row: David Anderson, UMLS Production Coordinator and Liz Amos, Special Assistant to the Chief Health Data Standards Officer

Bottom Row: Anna Ripple, Information Research Specialist and Patrick McLaughlin, Head, Terminology QA & User Services Unit

Fostering a Culture of Scientific Data Stewardship

Guest post by Jerry Sheehan, Deputy Director, National Library of Medicine.

Making research data broadly findable, accessible, interoperable, and reusable is essential to advancing science and accelerating its translation into knowledge and innovation. The global response to COVID-19 highlights the importance and benefits of sharing research data more openly.

The National Institutes of Health (NIH) has long championed policies that make the results of research available to the public. Last week, NIH released the NIH Policy for Data Management and Sharing (DMS Policy) to promote the management and sharing of scientific data generated from NIH-funded or conducted research. This policy replaces the 2003 NIH Data Sharing Policy.

The DMS Policy was informed by public feedback and requires NIH-funded researchers to plan for the management and sharing of scientific data. It also makes clear that data sharing is a fundamental part of the research process.

Data sharing benefits the scientific community and the public.

For the scientific community, data sharing enables researchers to validate scientific results, increasing transparency and accountability. Data sharing also strengthens collaborations that allow for richer analyses. Strong data-sharing practices facilitate the reuse of hard-to-generate data, such as those acquired during complex experiments or once-in-a-lifetime events like natural disasters or pandemics.

For the public, sound data-sharing practices demonstrate good stewardship of taxpayer funds. Clear, well-written data sharing and management plans promote transparency and accountability to society. They also expand opportunities for data to be accessed and reused by clinicians, students, educators, and innovators in health care and other sectors of the economy.

As an organization dedicated to improving access to data and information to advance biomedical sciences and public health, NLM plays a key role in implementing the new policy and supporting researchers in meeting its requirements. NLM maintains a number of data repositories, such as the Sequence Read Archive and ClinicalTrials.gov, that curate, preserve, and provide access to research data. NLM also maintains a longer list of NIH-supported data repositories that accept different types of data (e.g., genomic, imaging) from different research domains (e.g., cancer, neuroscience, behavioral sciences). Where appropriate domain-specific repositories do not exist, NLM has made clear how researchers can include small datasets (<2GB) with articles deposited in NLM’s PubMed Central (PMC) under the NIH Public Access Policy.

NLM also works with the broader library community to support improved data management and sharing. Supplemental information issued with the new policy makes it clear that research budgets can include costs of data management and sharing, such as those for data curation, formatting data to accepted standards, attaching metadata to foster discoverability, and preparing data for storage in a repository. These are the kinds of services increasingly provided by libraries and librarians in universities and academic medical centers across the country. NLM, through the Network of the National Library of Medicine, offers training in data management and data literacy to health science, public, and other librarians to expand capacity for these important services.

NIH’s DMS Policy applies to all research, funded or conducted in whole or in part by NIH, that results in the generation of scientific data. This includes research funded or conducted by extramural grants, contracts, intramural research projects, or other funding agreements. The DMS Policy does not apply to research and other activities that do not generate scientific data, including training, infrastructure development, and non-research activities.

NIH will continue to engage the research community to support the change and implementation of this new policy, which will go into effect in January 2023. NLM will continue to work within NIH and across the library and information science communities to develop innovative ways to support the policy and advance the effective stewardship of research data. Let us know how else we can support this important policy advance.

Read more about this major policy release in the NIH’s Under the Poliscope blog.

As NLM Deputy Director, Jerry Sheehan shares responsibility with the Director for overall program development, program evaluation, policy formulation, direction and coordination of all Library activities. He has made major contributions to the development and implementation of NIH, HHS, and U.S. government-wide policy related to open science, public access to government-funded information, clinical trials registration, and electronic health records.

All for One…Health for All: The Role of Open Access, Evidence-Based Information to Improve Health for All Species

Guest post by Kristine M. Alpi, MLS, MPH, PhD, AHIP, OHSU Library, Oregon Health & Science University; Tova Johnson, MPH, MA, MLIS, OHSU Library, Oregon Health & Science University; and Heather K. Moberly, MSLS, AHIP, FHEA, PgCert (Vet Ed), University Libraries, Texas A&M University

Physical isolation arising from the COVID-19 pandemic has led many people to increasingly engage with the outdoor environment or bring companion animals into their lives as supports for their physical and mental wellbeing.

This connection among the health of humans, animals, and the environment exemplifies the One Health approach. 

One Health is not new, but it has gained new life amid rising concerns over COVID-19 and the environment in recent years. This model encourages collaboration across disciplines, with experts in human, animal, and environmental health, along with other specialties, to achieve better public health outcomes. While leaders often come from veterinary medicine or public health, anyone committed to keeping the world healthy is a potential partner in One Health. 


Providing high-quality, timely information to the people and professionals who need it most is critical to protecting the health of people, animals, and the environment. Under the banner of animal health literacy, the FDA provides the public with information about drug and food safety concerns that can affect both animals and humans. The joint NLM/FDA resource, DailyMed, includes listings for drugs approved for either humans or animals.

NLM’s MedlinePlus online health information resource provides robust information on animal-human interactions, but typically with a focus on those that threaten human health, such as animal bites or zoonoses (diseases that can be passed between humans and animals). For information from animal health experts, we can look to providers such as veterinary educators, whose insights reflect the interconnected One Health perspective.

Just as MedlinePlus relies, in part, on health professional societies to provide information on specialized care, veterinary medicine trains specialists in topics ranging from behavior to surgery, and provides information to support decision-making about large and small companion animal healthcare. Animal health information in multiple languages is not centrally coordinated, but the American College of Veterinary Surgeons is one example of an organization that offers information in Spanish as well as English.


Beyond personal experiences caring for animals at home or at work, One Health is a critical framework for providing timely, open, high-quality information during wildfires and other natural disasters that can affect all species. Responding to natural disasters brings together teams who work primarily with humans and teams who typically work with animals. Many veterinary schools provide emergency preparedness education in addition to deploying veterinary emergency teams to respond to emergencies, on missions that may focus on all species or primarily on human health. Central knowledge resources like the American Red Cross also provide apps and information to support people and pets during times of crisis.

Libraries that participate in the NLM-supported Network of the National Library of Medicine are essential resources for people seeking information online from trusted sources. Health sciences librarians, particularly the members of the Medical Library Association’s Animal and Veterinary Information Specialist Caucus, support the health of all species by addressing questions raised by people who live, work, and share the broader environment with companion animals and wildlife. These questions may come to public, community college, and university libraries, which rely on free and direct access to high-quality resources written for a variety of audiences.

We recently presented Health Questions for All Species as a free webinar for the Oregon Reference Summit to highlight how to use NLM and other open access, evidence-based resources to address One Health questions. We acknowledged the value of regionally and culturally specific resources for populations who are particularly challenged by certain conditions or environmental exposures, and discussed similarities and differences in human and animal information sources, terminology and readability.

We hope this information expanded your awareness about how NLM and other information resources can promote One Health through an integrated approach to searching and addressing issues that impact humans, animals and our environment.

The One Health Commission is a great place for educational resources for teachers and learners who want to take another step towards Health for All.

Did you learn something new today? What can you do to contribute to One Health?

Kristine M. Alpi, MLS, MPH, PhD, AHIP, OHSU Library, Oregon Health & Science University and Adjunct Assistant Professor, Department of Population Health & Pathobiology, North Carolina State University.

Tova Johnson, MPH, MA, MLIS, OHSU Library, Oregon Health & Science University.

Heather K. Moberly, MSLS, AHIP, FHEA, PgCert (Vet Ed), Medical Sciences Library and Department of Veterinary Integrative Biosciences, College of Veterinary Medicine and Biomedical Sciences, Texas A&M University. Heather is a 2020 recipient of the Friends of the National Library of Medicine’s Michael E. DeBakey Library Services Outreach Award.

What Health Literacy Outreach Looks Like at NLM

Guest post by M. Nichelle Midón, Project Scientist, Office of Engagement and Training, National Library of Medicine.

Earlier this year, NLM Director Dr. Patricia Flatley Brennan shared insights about how we, at NLM, support individual and organizational health literacy. As the world’s largest biomedical library, NLM provides physical and digital access to trusted, quality health information with the ability to reach people where they live, work and play.

One way we do this is through our Network of the National Library of Medicine (NNLM), which leverages more than 8,000 academic health science libraries, hospital and public libraries, and community organizations across the United States to promote health literacy and ensure that NLM resources are accessible to the public. NNLM develops and offers programs that affect communities in meaningful ways.  

One of NNLM’s recent success stories is Project TORDS (Technology Outreach to Reduce Health Disparities and Stigma). Tony Nguyen, MLIS, AHIP, executive director of the NNLM Southeastern Atlantic Region, recently described the program, saying “Project TORDS is designed to increase access to technology in rural and underserved communities in southern West Virginia by providing training on the use of technology while showing participants how to access, evaluate and use online health information, such as NLM’s MedlinePlus.”

According to Darryl Cannady, the executive director of South Central Educational Development, Inc., a local, community-based organization participating in Project TORDS, “Living in rural, poverty stricken Southern West Virginia, where residents live with many health disparities and social determinants of health, we have to create innovative ways to reach the most disenfranchised communities and provide the needed access to health education and access to quality health care. Project TORDS helps bridge gaps and connect the dots to health education and resources, while simultaneously reducing stigma through education.”

Watch all about it: Project TORDS

Click to learn more about the impact of Project TORDS.

Other NNLM health literacy outreach programs include the Wash and Learn and Promotores de Salud programs.

The Wash and Learn program transforms local laundromats into informal learning spaces where people can access early-learning literacy materials as they wait for their clothes to wash and dry.

NLM’s outreach to Promotores de Salud, the Spanish term for “community health workers,” reaches vulnerable and underserved members of the Latino/Hispanic community with health information and resources. These outreach efforts include sessions that promote awareness of culturally appropriate health information from NLM.

Watch all about it: Wash and Learn

Click to watch how NNLM supports improving health literacy at a local laundromat.

Watch all about it: Promotores de Salud

Click to watch Promotores de Salud in action.

Join us in celebrating Health Literacy Month this October – what does Health Literacy Month mean to you?

M. Nichelle Midón works with NLM’s Network of the National Library of Medicine (NNLM) to provide researchers, health professionals, the public health workforce, educators, and the public with equal access to biomedical and health information resources. She holds a Bachelor of Science in public health from the University of North Carolina at Chapel Hill, a Master of Science in library and information science from the Catholic University of America, and a Master of Science in instructional technology from Towson University.

Women in Tech at NIH: Togetherness Enables Transformation

Guest post by Susan Gregurick, PhD, associate director for data science and director of the Office of Data Science Strategy, National Institutes of Health.

There is an African proverb that says, “If you want to go fast, go alone. If you want to go far, go together.”

As I approach my first anniversary as the associate director for data science at NIH, this statement could not ring truer for me. By going together, NIH has made astonishing progress during this past year to enable more advanced data science, impressive data and computational infrastructure advances, and better FAIR data sharing.

Togetherness means collaboration that harnesses the power and strength of a diverse team. At NIH, women are using their expertise in data science and their teamwork skills to rapidly enable transformative programs.

Andrea Norris, director of the Center for Information Technology, said it well last year:

“This is such an exciting time for innovation at the intersection of biomedical, medical, and technology domains. It’s dynamic and fast moving. Whether you have scientific skills, business expertise or know technology, there’s a role — an important role — for you in this space, especially here at NIH.”

I spoke with 11 women who are significantly impacting data science activities at NIH about how they enable data science; their advice for young, aspiring women data scientists; and the data science accomplishments that make them proud.

Collaboration and the role that NIH has played in responding to the COVID-19 pandemic were common themes in our discussions. These women also spoke about the importance of having a mentor, the four antidotes to challenging times, and the necessity of diverse perspectives.

To get to know these women even better, read their full responses on the Women in Data Science page.

Jessica Mazerik, PhD, Data Science Workforce Director, Office of Data Science Strategy (ODSS)

Leads the Coding it Forward Civic Digital Fellows at NIH and NIH DATA Scholars programs

Bringing diverse talent to NIH.

I lead central fellowship programs to bring talented computer and data scientists to NIH. Our external outreach efforts encourage women and other minorities to apply for the programs we support. And, internally, we support engagement across NIH to place students in diverse positions.

Breaking down silos to advance data science.

Talented and driven staff across NIH have mobilized to lead implementation tactics under the strategic plan for data science, and we’ve built a forum for discussion in monthly town hall meetings. Most importantly, teams across NIH are working together and communicating widely to break down silos to continue advancing data science. 

Teresa Zayas Cabán, PhD, Coordinator, Fast Healthcare Interoperability Resources (FHIR) Acceleration, National Library of Medicine (NLM)

Co-leads the NIH FHIR Working Group

Advancing data standards within and beyond NIH. 

I’m leading efforts to enable the use of standardized clinical and research data sharing to advance discovery. We’re not only working collaboratively within NIH to advance data science, but also across departments, government offices, and the field itself. Together, we are leading the field in a new direction with the use in research, as appropriate, of the same standards used in health care. 

Be confident in what you know.

Don’t sell yourself short — speak up about what you know. Find good mentors who can advise you and be in your corner throughout your career. Find a good cohort of colleagues to collaborate and commiserate with. 

Belinda Seto, PhD, Deputy Director, ODSS

Co-leads the NIH FHIR Working Group

Women leading data science communities.

We all have varying perspectives and visions for data science. Nonetheless, we have become nuclei of the NIH data science community. Through our collaborations, we are emissaries for data science to extramural grantee communities. I see this as a concentric circle of expanding national and even global communities of data science.

Technical and sociocultural accomplishments in data science.

A sociocultural accomplishment is that many silos have been dismantled, and the willingness and readiness to collaborate are demonstrably strong. On the technical front, there are successful examples of progress toward an NIH data ecosystem, both at the foundational level and at the leading edge.

Lisa Federer, PhD, Data Science and Open Science Librarian, Office of Strategic Initiatives, NLM

Leads the NIH Data Science Training Committee

Be a lifelong learner.

Embrace lifelong learning — there’s always something new to learn! I’ve made it a priority to learn new things that I can bring to my work, including going back to school to get a PhD in information science with a focus on data science.

Open science practices advancing our understanding of COVID-19.

NIH has been doing impressive work in advancing our understanding of COVID-19 and has been a leader in making data related to SARS-CoV-2 widely available so that researchers around the world can help tackle this important issue. In the face of this global problem, open science practices will help us make progress toward therapies and vaccines more quickly.

Jennie Larkin, PhD, Deputy Director, Division of Neuroscience, National Institute on Aging

Co-leads the FAIR Data Repositories Team, which ran the one-year NIH Figshare instance pilot

Engage and embed data science in different programs.

Ask questions, learn, and engage. We need more bright people who can bring new perspectives, expertise, and energy to data science and help embed data science in different research programs.

Working with the community to address the COVID-19 pandemic.

The increasing breadth and depth of data science expertise across NIH and the larger biomedical enterprise has allowed us to rapidly accomplish much more than was possible just a few years ago. We have seen the best of our community, in the willingness to come together to meet the challenge of the COVID-19 pandemic.

Rebecca Rosen, PhD, Program Lead, NIMH Data Archive and Senior Advisor, Office of Technology Development and Coordination, National Institute of Mental Health

Leads the Researcher Auth Service Initiative

Learn from traditional and nontraditional resources.

I encourage young women in all biomedical science fields to incorporate data science into their career development plans. Look for data science educational resources from both traditional and nontraditional sources and network within those sources.

Collaboration to realize a data ecosystem.

The NIH data ecosystem has an increasingly tangible presence. We have growing numbers of researchers analyzing data across NIH cloud-based platforms, thanks in part to the new Office of Data Science Strategy, the NIH STRIDES Initiative, and a greater level of collaboration across NIH Institutes and Centers.

Heidi Sofia, PhD, Program Director, National Human Genome Research Institute (NHGRI)

Co-leads the Biomedical Information Science and Technology Initiative consortium and organized supplements to enhance software tools for open science (NOT-OD-20-073)

Beauty, awe, love, and humor.

I am never happier than when some brilliant young or established scientist in the community brings forward innovative, transformative science which I can endeavor to foster. In these instances, I find the first two of the four antidotes to our challenging times (beauty, awe, love, and humor). And my colleagues often provide the last one.

Use your power for good.

Among the first “computers” were women who performed the mathematical calculations needed to advance science, starting in 1757 in the search for Halley’s comet. Today, data science is a superpower for women in fields ranging from medicine to the natural sciences to business. So empower yourself, and use your power for good!

Maryam Zaringhalam, PhD, Data Science and Open Science Officer, Office of Strategic Initiatives, NLM

Organized the Webinar on Sharing, Discovering, and Citing COVID-19 Data and Code in Generalist Repositories

Women make data science better.

The lived experiences and perspectives of women — particularly women who are Black, Indigenous and People of Color (BIPOC); members of the LGBTQIA+ community; or members of the disability community — are critically important in ensuring that the products of data science have the greatest benefit for us all. Every chance I get, I tell women that they not only belong in data science, but that data science is better because of them.

Enabling researchers to make COVID-19 data available.

I was proud to be involved in quickly planning and organizing a joint NLM-ODSS webinar on sharing, discovering, and citing COVID-19 data and code using generalist repositories. It’s been inspiring to see the research community so eager to share the data and tools they’ve been generating, so this workshop felt like a timely and impactful contribution in support of researchers.

Valentina Di Francesco, MS, Lead Program Director, Computational Genomics and Data Science Program, NHGRI

Co-lead for the NIH Cloud Platform Interoperability Effort

Realizing a trans-NIH federated data ecosystem.

Among the variety of projects I am involved in, I am particularly enthusiastic about the NIH Cloud Platform Interoperability Effort, which aims to establish and implement guidelines and technical standards to empower end-user analyses across participating cloud-based platforms established across NIH in order to facilitate the realization of a trans-NIH federated data ecosystem.

Data science is a science at NIH.

After many years at NIH, only recently have I noticed a solid appreciation of the essential contributions of the statistical, mathematical, and computer science approaches to better understand biological systems. Finally, data science is respected as a field at NIH! I can’t think of a better time to join the ranks of women data scientists in biomedical research.

Kim Pruitt, PhD, Chief, Information Engineering Branch, National Center for Biotechnology Information, NLM

Co-leads the Lifecycle Metrics Working Group, which hosted the NIH Virtual Workshop on Data Metrics

Persevere, find a mentor, understand expectations, persevere.

My advice to someone entering this field is to persevere, to find an excellent mentor, to go into collaborations with a clear understanding of each member’s role and publication expectations, and to continually look for lessons learned when an analysis strategy fails (that is, cycle back to persevere).

Providing data access in the cloud.

Providing access to data on the NIH STRIDES Initiative cloud-based platform is a prerequisite to supporting and growing the biomedical data science field. Most notable to me is the significant achievement of providing the complete Sequence Read Archive data (roughly 40 PB and growing) in two formats and ahead of the planned schedule.

Jennifer Couch, PhD, Chief, Structural Biology and Molecular Applications Branch, National Cancer Institute

NIH Citizen Science Coordinator

Bringing new approaches to biomedical research.

My focus is on bringing new, diverse, and often outsider perspectives, tools, approaches, and methods into the biomedical research space. Together with many talented colleagues and collaborators, I look for ways to bring new approaches to biomedical research. Sometimes that involves creating opportunities for different research communities to come together and find ways to collaborate.

On finding the right collaborators.

Hone your skills, don’t be afraid to try out new methods, and find collaborators with interesting questions who will know the answer when they see it. Find those collaborators who appreciate that your skills and insights are critical to your joint project’s success.

Dr. Gregurick leads the implementation of the NIH Strategic Plan for Data Science through scientific, technical, and operational collaborations with the Institutes, Centers, and offices that make up NIH. She has substantial expertise in computational biology, high-performance computing, and bioinformatics.