Data Discovery at NLM

A montage of illustrations showing the use of data by researchers, industry, patients, and citizen scientists.

Guest post by David Hale, Information Technology Specialist at NLM.

Did you know that each day more than four million people use NLM resources and that every hour a petabyte of data moves in or out of our computing systems?

Those mammoth numbers indicate to me how essential NLM’s array of information products and services is to scientific progress. But as we gain more experience with providing information, particularly clinical, biologic, and genetic datasets, we’re finding that how we share data is as critical as the data itself.

To fuel the insights and solutions needed to improve public health, we must ensure data flow freely to the researchers, industry innovators, patient communities, and citizen scientists who can bring new lenses to these rich repositories of knowledge.

One way we’re opening doors to our data is through an open data portal called Data Discovery. While agencies like the Centers for Disease Control and Prevention and the Centers for Medicare & Medicaid Services already use the same platform with success, NLM is the first of NIH’s Institutes and Centers to adopt it. Our first datasets are already available, including content from such diverse resources as the Dietary Supplement Label Database, Pillbox, ToxMap, Disaster Lit, and HealthReach.

Why did NLM take this step? While many of our data resources have long been publicly available online, housing them within Data Discovery offers unconstrained access and delivers key benefits:

  • Powerful data exploration tools: By showing the dataset as a spreadsheet, the Data Discovery platform offers freedom to filter and interact with the data in novel ways.
  • Intuitive data visualizations: A picture is worth a thousand words, and nowhere is that truer than in using data visualizations to bring new perspectives to scientific questions.
  • Open data APIs: Open data alone isn’t enough to fuel a new generation of insights. Open APIs are critical to making the data understandable, accessible, and actionable, based on the unique needs of the user or audience.

What does this mean in practice?

Let’s look at the Office of Dietary Supplements’ (ODS) Dietary Supplement Label Database (DSLD) to illustrate the potential of leveraging Data Discovery.

More than half of all Americans take at least one dietary supplement a day. Reliable information about those supplements is critical to their appropriate use, making DSLD a timely and important dataset to make available in an open data platform. Through Data Discovery, researchers, academics, health care providers, and the public will be able to explore and derive insights from the labels of more than 85,000 dietary supplement products currently or formerly sold in the US.

Developers and technologists who support research, health, and medical organizations require APIs that are modern, interoperable, and standards-compliant. Data Discovery provides a powerful solution to these needs, supporting NLM’s role as a platform for biomedical discovery and data-powered health.
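To make that concrete, here is a minimal sketch of how a developer might work with an open data API of this kind: build a filtered query, fetch the rows as JSON, and summarize them. The endpoint, dataset name, query parameters, and field names below are all hypothetical and chosen for illustration only; they are not Data Discovery's actual API, so check the portal's own documentation for the real details.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint, for illustration only; not a real NLM URL.
BASE_URL = "https://data.example.org/api/datasets"


def build_query_url(dataset_id, filters=None, limit=100):
    """Return a URL requesting up to `limit` rows that match `filters`."""
    params = dict(filters or {})
    params["limit"] = limit
    return f"{BASE_URL}/{dataset_id}.json?{urlencode(params)}"


def fetch_rows(url):
    """Download and decode a JSON array of row objects (needs network access)."""
    with urlopen(url) as response:
        return json.load(response)


def count_by(rows, field):
    """Tally rows by the value of one field, e.g. to feed a simple chart."""
    counts = {}
    for row in rows:
        key = row.get(field, "unknown")
        counts[key] = counts.get(key, 0) + 1
    return counts


# Made-up sample rows standing in for an API response.
sample = json.loads("""[
  {"product": "Example Vitamin C", "form": "tablet"},
  {"product": "Example Fish Oil", "form": "softgel"},
  {"product": "Example Vitamin D", "form": "tablet"}
]""")

print(build_query_url("supplement-labels", {"form": "tablet"}, limit=10))
print(count_by(sample, "form"))
```

Against a live service, the result of `fetch_rows(build_query_url(...))` would take the place of the sample rows. Keeping the URL building and the tallying separate from the network call makes each piece easy to test without a connection.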

Beyond fueling scientific discovery, open access to data holds another benefit for advancing public health: contributing to the professional development of data and informatics specialists. An increasingly important part of the health care workforce, informaticists help researchers extract the most meaningful insights from data, driving new developments in the lab and better management of patients and populations.

I invite you to explore the new Data Discovery portal. It’s an exciting step forward in achieving key aspects of the NLM Strategic Plan—to advocate for open science, further democratize access to data, and support the training and development of the data science workforce.

headshot of David Hale
Credit: Jacie Lee Almira Photography

David Hale is an Information Technology Specialist at the National Library of Medicine. In addition to leading Data Discovery, David is project lead for NLM’s Pillbox, a drug identification, reference, and image resource. He received his Bachelor of Science in Physical Science from the University of Maryland.

5 thoughts on “Data Discovery at NLM”

    1. Thank you for your question, Hal. Data Discovery is a stand-alone platform that complements other data repositories at NLM. Its ease of use by a broad audience and the functionality it provides for each dataset (e.g., filtering, visualizations, API) create a variety of novel uses. In this pilot stage, NLM is evaluating those use cases. Additionally, the platform is not focused on hosting any particular type of data, but rather on empowering a broad variety of programs that are not part of NLM’s other well-established and highly used platforms.
