What Did You Do with Your Summer Vacation?

Well, if you are spending the summer at the NIH, you’ve likely been engaged in one of our many activities designed to access critical data and advance our understanding of the human experience by linking data sets together. Today, we are inviting you to engage in some additional best practices in accessing controlled data in ways that support science and preserve privacy.

In 2020, the NIH Scientific Data Council charged its Working Group for Streamlining Access to Controlled Data to spend a year engaging in dialogue within the NIH and with our extramural colleagues to better understand the experiences of scientists and the strategies that both facilitate and impede access to data. The group also considered where in the research process NIH should inform, engage, and gain consent of participants sufficiently to support science driven by access to controlled datasets.

NIH stores and facilitates access to many datasets, both open and controlled, with the goal of accelerating new discoveries and thereby maximizing taxpayer return on investment in the collection of these datasets. Data derived from humans that are shared through controlled-access mechanisms reflect NIH’s commitment to protect sensitive data and honor the informed consent provided by research participants in NIH-supported studies.

NIH has supported multiple controlled-access data repositories that uphold appropriate data protections for both human data and other sensitive data, while meeting the needs of various researcher communities. However, as data access requests increase, new repositories are established, and new mechanisms of providing access to data are developed, it is apparent that opportunities remain to improve efficiency and harmonization among repositories, both to make NIH-supported controlled-access data more FAIR (Findable, Accessible, Interoperable, and Reusable) and to ensure appropriate oversight when data from different resources are combined. These trends are enabling datasets and datatypes to be combined in new ways that advance the science. At the same time, datasets and datatypes that may or may not be controlled on their own may, when combined, create inadvertent re-identification risks.
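To make the re-identification concern concrete, here is a minimal Python sketch of how linking two datasets on shared fields can single out a person. Everything in it is an assumption made for illustration: the records are invented, the field names (`zip3`, `birth_year`, `sex`) are hypothetical, and this is not how any NIH repository assesses risk.

```python
from collections import Counter

# Invented records for illustration only; not drawn from any real repository.
# Dataset A: a controlled-access extract with no names attached.
clinical = [
    {"zip3": "208", "birth_year": 1950, "sex": "F", "diagnosis": "condition-1"},
    {"zip3": "208", "birth_year": 1950, "sex": "M", "diagnosis": "condition-2"},
    {"zip3": "212", "birth_year": 1982, "sex": "F", "diagnosis": "condition-3"},
    {"zip3": "212", "birth_year": 1982, "sex": "F", "diagnosis": "condition-4"},
]

# Dataset B: an openly available list that does carry names.
public = [
    {"name": "Person A", "zip3": "208", "birth_year": 1950, "sex": "F"},
    {"name": "Person B", "zip3": "212", "birth_year": 1982, "sex": "F"},
]

# Hypothetical fields shared by both datasets.
QUASI_IDENTIFIERS = ("zip3", "birth_year", "sex")


def key(record):
    """Combine the shared fields into one linkage key."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def unique_links(sensitive, named):
    """Return (name, diagnosis) pairs where a named person matches exactly one sensitive record."""
    counts = Counter(key(r) for r in sensitive)
    by_key = {key(r): r for r in sensitive}
    links = []
    for person in named:
        k = key(person)
        if counts[k] == 1:  # only one candidate record, so the match is unambiguous
            links.append((person["name"], by_key[k]["diagnosis"]))
    return links


links = unique_links(clinical, public)
print(links)
```

In this toy data, Person A matches a single clinical record and is therefore linked to a diagnosis, while Person B matches two records and stays ambiguous. Neither dataset exposes that link on its own, which is why oversight of combined data matters.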

To help the agency address these issues in a way that is responsive to community needs, we are hosting a series of webinars through the end of July. We call these “breakout sessions” because they follow an outstanding webinar presented on July 9, available here. Richard Hodes, MD, director of the National Institute on Aging, launched the 3-hour seminar with a talk titled Opportunities for Advancing Research Through Better Access to Controlled Data. Ana Navas-Acien, MD, PhD, brought the perspective of Indigenous communities and other communities traditionally underrepresented in research, and she emphasized themes of community engagement and broadening the consent framework to consider community-level accountabilities as well as individual assent. Lucila Ohno-Machado, MD, MBA, PhD, addressed privacy-preserving distributed analytics as a strategy to promote science while preserving privacy of data. Hoon Cho, PhD, described privacy-enhancing computational approaches.

You can find the schedule for the breakout sessions below. These sessions are specifically designed to listen to the expectations, hopes, and concerns from researchers and participants. These webinars are free and open to the public; registration is required.

Breakout Session on “Making Controlled-Access Data Readily Findable and Accessible” on July 22 from 3 pm to 5:30 pm EST

Breakout Session on “General Opportunities for Streamlining Access to Controlled Data” on July 26 from 12:30 pm to 2 pm EST

Breakout Session on “Addressing Oversight, Governance, and Privacy Issues in Linking Controlled Access Data from Different Resources” on July 28 from 3 pm to 5:30 pm EST

To generate interest and hear from the broadest possible group of stakeholders, NIH has released a Request for Information on Streamlining Access to Controlled Data from NIH Data Repositories. Please note the closing date is August 9. We look forward to hearing from you! Please visit Streamlining Access to Controlled Data at the NIH for all of the information described in this post.

Finally, we would like to personally thank the many NIH staff members who serve on the working group:

  • Shu Hui Chen
  • Alicia Chou
  • Valentina Di Francesco
  • Greg Farber
  • Jamie Guidry Auvil
  • Nicole Garbarini
  • Lyric Jorgenson
  • Punam Mathur
  • Vivian Ota Wang
  • Jonathan Pollock
  • Rebecca Rodriguez
  • Alex Rosenthal
  • Steve Sherry
  • Julia Slutsman
  • Erin Walker
  • Alison Yao

We hope your summer vacation was as productive as ours!

(left to right)
Patricia Flatley Brennan, RN, PhD, NLM Director
Susan Gregurick, PhD, Associate Director for Data Science at NIH
Hilary S. Leeds, JD, Senior Health Science Policy Analyst for the Office of Science Policy at NIH

Common Data Elements: Increasing FAIR Data Sharing

Guest post by Carolina Mendoza-Puccini, MD, CDE Program Officer, Division of Clinical Research, National Institute of Neurological Disorders and Stroke (NINDS) and Kenneth J. Wilkins, PhD, Mathematical Statistician, Biostatistics Program and Office of Clinical Research Support, Office of the Director, National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK)

Previous posts published in Musings from the Mezzanine have explained the importance of health data standards and their role as the backbone of interoperability. Common Data Elements (CDEs) are a type of health data standard that is commonly used and reused in both clinical and research settings. CDEs capture complex phenomena, like depression or recovery, through standardized, well-defined questions (variables) that are paired with a set of allowable responses (values) and used in a standardized way across studies or trials.

CDEs provide a way to standardize data collection—ensuring that data are collected consistently, and otherwise-avoidable variability is minimized.
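As a rough illustration of the idea, a CDE can be thought of as a question paired with a fixed set of allowable responses. The Python sketch below is an assumption made for this post, not the NIH CDE Repository's data model: the class, the `sleep_trouble` element, and its response codes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CommonDataElement:
    """A hypothetical, simplified CDE: one question plus its allowable responses."""
    name: str
    question: str
    permissible_values: dict  # response code -> human-readable label

    def accepts(self, response):
        """True when the response is one of the allowable values."""
        return response in self.permissible_values


def invalid_responses(cde, responses):
    """List the responses that fall outside the element's allowable values."""
    return [r for r in responses if not cde.accepts(r)]


# Hypothetical element; the wording and codes are invented for illustration.
sleep_trouble = CommonDataElement(
    name="sleep_trouble",
    question="Over the past two weeks, how often did you have trouble sleeping?",
    permissible_values={
        0: "Not at all",
        1: "Several days",
        2: "More than half the days",
        3: "Nearly every day",
    },
)

study_a = [0, 2, 3]        # collected with the shared codes
study_b = [1, "often", 4]  # free text and an out-of-range code slipped in

print(invalid_responses(sleep_trouble, study_a))
print(invalid_responses(sleep_trouble, study_b))
```

Because both studies are checked against the same definition, the stray values in the second study are caught at collection time rather than discovered later when the data are pooled.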

Where possible, CDEs are linked to controlled vocabularies and terminologies commonly used in health care, such as SNOMED-CT and LOINC, and CDEs can provide a route to harmonize with non-prospective clinical research designs. Such links leverage common data entities, like clinical concepts underlying common data models, to align evidence of clinical studies with evidence from ‘real-world data’ such as electronic health records (EHRs), mobile/wearables, and patient-reported outcomes, what’s become known in recent years as ‘real world evidence’.

Importance of CDEs for Interoperability and Consistency of Evidence Across Settings

FAIR Data Principles (Source: National Institute of Environmental Health Sciences)

NIH’s response to the COVID-19 pandemic highlighted the importance of developing CDEs that can be used and endorsed across NIH-funded COVID-19 research so that resulting, urgently-needed data would be FAIR: Findable, Accessible, Interoperable, and Reusable.

Many groups across NIH identified, or are in the process of identifying, CDEs that are both COVID-19-related, and related to the needs of specific research projects such as NIH’s Disaster Research Response (DR2) program, Rapid Acceleration of Diagnostics—Underserved Populations (RADx-UP) and Researching COVID to Enhance Recovery (RECOVER) initiatives. There was also a need to develop a process for indicating NIH endorsement of CDEs that meet meaningful criteria, are made available through a common discovery platform (the NIH CDE Repository), and avoid duplicating functions of resources that already exist.

NIH’s Scientific Data Council charged a group of members of the NIH CDE Task Force, the CDE Governance Committee (Governance Committee), to develop this endorsement process based on the following criteria:

  • Clear definition of variable/measure with prompt and response 
  • Documented evidence of reliability and validity, where applicable
  • Human- and machine-readable formats
  • Recommended/designated by a recognized NIH body (Institute, Center, Office, Program/Project Committee, etc.)
  • Clear Licensing and Intellectual Property status (prefer Creative Commons or open source)

The role of the Governance Committee is to ensure that the evidence of acceptability, reusability, and validity is properly presented and documented.

Submission of CDEs for Endorsement

The Governance Committee determined that CDEs will be submitted either as “Individual CDEs” or “Bundles.” Individual CDEs can be collected separately. Bundles are a group of questions or variables with specified sets of allowable responses that are grouped together and used as a set. Bundles may include standardized instruments, such as the Patient Health Questionnaire 9 (PHQ-9) Depression Scale, or a number of questions that must be collected as a group to maintain their meaning as individual elements (e.g., demographic features).
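One way to picture the difference between an Individual CDE and a Bundle is that a Bundle is only meaningful when every member element is present. The Python sketch below rests on stated assumptions: the `Bundle` class, the three-item `mood_screen` scale, and its scoring rule are hypothetical and are not taken from the PHQ-9 or from the Governance Committee's process.

```python
from dataclasses import dataclass


@dataclass
class Bundle:
    """A hypothetical bundle: a named set of elements that must be collected together."""
    name: str
    element_names: tuple

    def missing_elements(self, record):
        """Names of bundle members that are absent (or None) in a collected record."""
        return [n for n in self.element_names if record.get(n) is None]

    def total_score(self, record):
        """Sum the item responses, refusing to score an incomplete bundle."""
        missing = self.missing_elements(record)
        if missing:
            raise ValueError(f"{self.name} is incomplete; missing: {missing}")
        return sum(record[n] for n in self.element_names)


# Invented three-item scale for illustration only.
mood_screen = Bundle(name="mood_screen", element_names=("item_1", "item_2", "item_3"))

complete = {"item_1": 2, "item_2": 1, "item_3": 0}
partial = {"item_1": 2, "item_3": 1}

print(mood_screen.total_score(complete))
print(mood_screen.missing_elements(partial))
```

An Individual CDE, by contrast, could be validated and reused one question at a time, with no dependency on its neighbors.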

The Governance Committee will conduct a review of submissions based on the endorsement criteria approved. Once endorsed, Individual CDEs and possibly Bundles will be published in the NIH CDE Repository with an endorsement badge.

Reuse of NIH-endorsed CDEs Going Forward

With these governance-endorsed additions to the NIH CDE Repository, its role as a unified resource for common data entities and semantic concepts (the conceptual underpinnings of common data elements themselves) will lay the groundwork for researchers (NIH-funded or otherwise) to plan on interoperable data features. With the endorsement criteria and NLM-led efforts to enhance the NIH CDE Repository as an NIH-wide research resource, its role can grow along with those of related public and private sector alignment efforts. These include standards ranging from the United States Core Data for Interoperability for routine health care to the FDA submission standards within the Clinical Data Interchange Standards Consortium (CDISC) for treatments and preventive therapeutics, like vaccines, that we all rely upon for quality care.

Features of the NIH CDE Repository will continue to be enhanced, whether to search for semantically-related concepts or to highlight subtle distinctions among closely related CDEs. The NIH CDE Repository can also serve as a clearinghouse for interoperability in data from across a broad range of research, from prospectively-designed studies to those making use of data captured in the course of clinical care (such as EHRs) yet repurposed for real-world evidence.

In the wake of lessons learned from the most challenging aspects of early COVID-19 research, CDE use can increase FAIR data sharing across the research ecosystem in the near-seamless fashion envisioned by legislators when they enacted the 21st Century Cures Act. CDE governance processes are poised to adapt accordingly and to keep working toward greater data interoperability in this post-COVID-19 pandemic era.

CDE Governance Committee Members: Matt McAuliffe (Center for Information Technology), Kerry Goetz (National Eye Institute), Denise Warzel (National Cancer Institute), Erin Ramos (National Human Genome Research Institute), Jyoti Dayal (National Human Genome Research Institute), Deborah Duran (National Institute on Minority Health and Health Disparities), Janice Knable (National Cancer Institute). Chairs: Carolina Mendoza-Puccini (National Institute of Neurological Disorders and Stroke) and Kenneth Wilkins (National Institute of Diabetes and Digestive and Kidney Diseases). Ex Officio members: Robin Taylor, Mike Huerta, Lisa Federer (National Library of Medicine). Collaborator: Greg Farber (National Institute of Mental Health).

To learn more about the NIH Common Data Elements (CDE) Repository, watch this short video.

Dr. Mendoza-Puccini leads the NINDS Common Data Elements Project and is a Program Officer at the NINDS Division of Clinical Research.

Dr. Wilkins is a member of both the NIH-wide and NIDDK-specific Data Science and Data Management Working Groups and engages with researchers from across intramural and extramural programs on quantitative aspects of design and analysis.

Data Science @ NLM Journey Continues and What We Have Learned!

Guest post by the Data Science @ NLM Training Program team.

As part of our effort to advance Goal 3 of the NLM Strategic Plan (“Build a workforce for data driven research and health”), NLM launched the Data Science @ NLM (DS@NLM) Training Program in 2019 to help ensure that all staff are prepared to engage with and participate in NLM’s developing data science efforts.

Our efforts have stayed on track despite the changes caused by the COVID-19 pandemic, and we’re proud to highlight DS@NLM events held during the past year. We’re also sharing lessons learned throughout the training program, which are applicable to any individual or organization trying to help develop data science skills in the fields of health and biomedical information.

Earlier this month, we marked two years of the DS@NLM Training Program with a Spring Fling series of virtual events celebrating the data science training achievements of NLM staff.

Our Spring Fling kicked off with “lightning talk” presentations featuring several graduates of our intensive Data Science Fundamentals course, who shared their final class projects with NLM colleagues. Participants in our year-long Data Science Mentorship program also had the opportunity to present their Capstone projects. Our program mentees, who were mentored by NLM staff members, developed their data science skills by completing projects that applied data science techniques to help improve NLM operations.

What We’ve Learned:

Be responsive to specific needs; one size does NOT fit all.

Data plays a role in virtually everything we do at NLM, and as we aim to provide data training opportunities for staff working in many different areas, we recognize that different staff members have unique training needs. New training opportunities for some staff, such as our researchers, may hinge on their knowledge of machine learning. Metadata specialists may have more need for data cleaning or text processing skills, while administrators may benefit more from learning about data visualization.

People also learn in different ways, be it through shorter webinars and workshops, longer intensive courses, or self-directed learning. The DS@NLM program provides a variety of activities to meet these needs, including opportunities for various skill levels and topics, from short webinars to on-demand classes to ten-week intensive training courses.

Be responsive to staff feedback; give people what they ask for.

To help us determine what to offer, we engaged directly with our audience, asking NLM staff what they needed and listening to their responses. Because of the wide variety of work done at NLM, receiving feedback from staff helped us better understand their specific training needs. While we cannot always offer individualized programs to meet every need, staff feedback always helps us discover new ideas for future programming.

Teaching skills is just the beginning; applying new skills is essential.

A key lesson learned from staff feedback is that teaching new data skills is important but not enough on its own; teaching staff how to put newly acquired data skills to use in the real world and apply them to their work is just as important. Helping staff learn to apply data science techniques to their work transforms this new knowledge from theoretical to practical. The Data Science Mentorship Program, with its concluding Capstone project, is a great example of an opportunity for staff to both develop skills and practice applying them.

We applaud and celebrate all the hardworking staff from across NLM who have taken advantage of these training opportunities to advance the goal of building a workforce for data driven research and health, both at NLM and throughout the biomedical and health sciences information world.

Share with us and others how you are helping your staff apply data science skills in your organization—do you have any lessons learned?

Data Science @ NLM Training Program team
 
Top Row (left to right):
Dianne Babski, Associate Director, Library Operations
Maria Collins, Data & Systems Liaison, Office of the Associate Director for Library Operations
Peter Cooper, Strategic Communications Team Lead, National Center for Biotechnology Information

Bottom Row (left to right):
Mike Davidson, Librarian, Office of Engagement and Training, Division of Library Operations
Lisa Federer, NLM Data Science and Open Science Librarian, Office of Strategic Initiatives
Anna Ripple, Information Research Specialist, Lister Hill National Center for Biomedical Communications

Into the future: What NNLM has in store

Guest post by Martha Meacham, MA, MLIS, NNLM Project Director

It’s a time of transformation and growth for the NLM’s Network of the National Library of Medicine (the Network or NNLM). Throughout its 61-year history, the Network has excelled at reaching people in communities throughout the United States. Today, the Network comprises more than 8,800 academic health science libraries, hospital and public libraries, and community organizations. NNLM has endured because of its ability to adapt and respond to changes in support of its mission to advance the progress of medicine and improve public health by serving librarians, researchers, clinicians, and the public. Today, approximately 90% of the U.S. population lives in a county with at least one NNLM member, and 93% of people in minority populations in the U.S. live in a county with at least one NNLM member.

Leveraging the strength and expertise of its member organizations, NNLM offers funding for community-based projects that improve access to health information, increase engagement with research and data, expand professional knowledge, and support outreach that promotes awareness and use of NLM resources in local communities. Through the extraordinary work done by Network staff, NNLM has successfully developed and demonstrated effective engagement strategies in communities across the country. For example, the recently funded project “Surviving COVID-19, In A Virtual World” partnered with a local beauty salon to educate, train, and inform the community about COVID-19 and ways to prevent contracting and spreading the disease. The “Combatting COVID-19 Misinformation with Health Literacy Microcontent” project involved working with a community-based organization to provide easily accessible, culturally appropriate “microcontent,” short-form imagery and video content that can be consumed in 10-30 seconds or less, aimed at dispelling misinformation around COVID-19 and vaccines. And the “Informacion para tu Salud en tu Casa” project involved working with a local non-profit organization to improve the health and wellbeing of the Hispanic community by providing health information and resources, and by connecting people to health services through community health workers.

Strengthening the Network’s reach and impact requires continued evaluation and improvement. This includes reassessment of NNLM’s Regional Medical Libraries (RMLs). RMLs coordinate the operations of regional and national programs, as well as ensure a continuity of quality service for core programs of the NNLM.

Over the years, NNLM has evolved from a maximum of ten institutions serving as RMLs, to the long-held eight RMLs, and now to a new configuration of seven RMLs.

This reorganization reconfigures regions and reduces disparities between regions in two respects:

  1. Total population served
  2. Number of member libraries and organizations supported

Balancing the regional areas of coverage and populations served allows for deeper connections and greater impact. Also, in addition to our traditional library partners, NNLM membership has expanded to include a wider variety of community-based organizations including faith-based organizations and K-12 schools, among others. As of April 2021, nearly 40% (3,482) of the total number of NNLM member organizations are not what we would have traditionally called libraries. This diversity supports NNLM’s new goal to “advance health equity through information,” with a focus on serving underrepresented populations.

RMLs will meaningfully engage with current and future audiences to increase information access, prioritizing underrepresented populations, including those experiencing health disparities by nature of race and ethnicity; biological sex, gender identity or expression, and sexual orientation; cognitive and physical abilities; religious background or identification; socioeconomic status (past and current); education level; health literacy; linguistic needs; geographic location, including underrepresented populations from medically underserved areas; and other factors that create unequal access to health care. The Network is positioned to address health inequities that contribute to health disparities.

NLM is committed to addressing the challenge of health disparities and seeks new ways to provide understandable and trusted health information resources to support diverse and underserved populations.

A number of creative projects are underway to support NNLM’s new goal.

Working with medical professionals in various specialties, NNLM’s “Educating Healthcare Professionals and the Sighted Community of Worcester County on the Health Disparities among VIPs 2020-2021” project seeks to improve health literacy and reduce health disparities for individuals who are blind or who have low vision. “Kina (Together)” is a program serving Native Americans (Ojibwe) in northern Minnesota, and “Unidos en Nuestra Salud – Providing Capacity Building to our Community Members as Well as Public Health Education Regarding COVID-19” provides health education and health literacy skills to the Spanish-speaking community including, but not limited to, health information intermediaries such as librarians, community health workers, public health professionals, and community members that represent medically underserved populations.

For more than 60 years, NNLM has provided a trusted local platform for community outreach and engagement to promote health. As we look to the future, new possibilities and an agile approach will maintain, build upon, and grow this successful and valued program.

Martha Meacham is the Project Director of NNLM. Martha is a passionate advocate for improving the health of all through access to and understanding of health information.

A Peek into the Inner Workings of NLM’s Health Information Services

Guest post by Dianne Babski, Associate Director for Library Operations at NLM

How does an organization like NLM build and deploy 21st century products and services to support a global user audience? I’d like to give you a behind-the-scenes glimpse into NLM’s ever-evolving operations, and how we continue to develop the health information resources that you know and love, such as MEDLINE/PubMed, Medical Subject Headings (MeSH), and MedlinePlus.

Agile Product Development

NLM continues to move towards agile product development and digital unification. Where we used to release enhancements and features once or twice a year, we now develop incrementally and release product enhancements frequently. NLM supports innovation in our workforce by empowering product owners to make data-driven decisions through usability reviews and analytics of features, page views, and user requests to inform future actions.

We encourage staff to ask, “Are we meeting users’ needs—now and into the future?”

We have seen the success of this approach in the rollout of DOCLINE, our interlibrary loan request routing system, and the redesign of PubMed. We are in the planning phase of modernizing our flagship clinical trials registry and repository, ClinicalTrials.gov, to deliver an improved user experience on an updated platform to accommodate growth and enhance efficiency. We have also begun acting on the recommendations of several studies to increase the automation of MEDLINE Indexing. This involves incorporating machine learning and computational algorithms to apply MeSH terms to PubMed citations. As a result, the time it takes for MEDLINE citations to become searchable with MeSH indexing in PubMed will be dramatically reduced, and, more importantly, automation will let us better leverage NLM staff expertise around chemical and gene names to enhance discoverability.

Data-Driven and Data-Informed

NLM uses data to balance our portfolio of products and offerings. I like to use the analogy of thinning garden beds to make room for healthier and stronger plants. We created evaluation measures to review our products and services, which allow us to make data-driven and data-informed decisions to streamline, simplify, and optimize NLM’s portfolio of offerings.

NLM Herb Garden

One key principle is to consolidate information into fewer platforms for improved user experience, discoverability, and efficiency. Pruning our garden allows us to focus on products that are unique, high-quality, and trusted resources. I think we can all agree that it’s more difficult to find what you need when information is scattered and disparate. This has informed the retirement of some products that are no longer sustainable, have a succession plan, or have low or declining usage. And while a product may no longer exist as a stand-alone product, we have ensured that data and information from those products are integrated into others, made available for download, or both. For example, by integrating Genetics Home Reference and GeneEd data, we enhanced MedlinePlus and made it more robust.

Other agencies or organizations sometimes offer comparable information and resources, which means our efforts duplicate theirs. For example, this is true for the resources held in our Disaster Information Management Research Center (DIMRC), which we have begun retiring by limiting updates to select resources, such as Disaster Lit. This resource is currently only updated with COVID-19-related information as the product (or data) transitions ownership to other organizations. Meanwhile, much of the grey literature from Disaster Lit will remain available in the Digital Collections or the NLM Bookshelf.

To help users navigate NLM collections, we are upgrading our Integrated Library System infrastructure with a cloud-based library services platform. The new platform will allow for better systems integration, collaborative functionality, and community features to keep pace with the data demands of a digital ecosystem and enable better distribution to libraries worldwide. Stay tuned for a new and improved Catalog!

A Common Data Language

As a standards organization, NLM designs and integrates products to make information Findable, Accessible, Interoperable, and Reusable (FAIR). Following the FAIR data principles, an interconnected ecosystem of biomedical data, tools and software enables faster research conclusions and resulting publication(s).

NLM’s goal is to link different but related digital research objects, such as articles, data sets, visualization tools, and predictive models, to advance discovery within our vast collection and resources beyond NLM. For example, in response to the global COVID-19 pandemic, we quickly processed provisional out-of-cycle codes and terms from terminology sources in UMLS, RxNorm, SNOMED CT, and VSAC, added new MeSH and supplemental concept records, and new COVID-19-related Common Data Elements (CDEs) in the NIH CDE Repository. NLM also convened a trans-NIH team to identify NIH-endorsed data elements. We are extremely proud of the role we played in accelerating the interoperability and discoverability of critical COVID-19-related information to help solve a global health crisis.

Looking ahead to January 2023, NIH will adopt a new NIH Policy for Data Management and Sharing, requiring NIH-funded researchers to prospectively submit a plan outlining how scientific data from their research will be managed and shared. In response, NLM developed the Dataset Metadata Model (DATMM), designed to describe biomedical research datasets to drive discoverability and re-use of shared research data.

Serving Society

NLM connects globally to a large and diverse mix of stakeholders both in public and private sectors. Our products and services—no matter how agile, digital, or interconnected—would be nothing without our valued users.

We intentionally aggregate diverse data and analytical tools into our collections to advance research on factors such as biological, genomic, social, behavioral, and environmental impacts on health, and characteristics such as sex, gender, age, race and ethnicity. Working with other standards development groups, we are actively involved in efforts to represent sex, gender, race, and social determinants of health in their resources. We develop reliable health information in visual ways that are accessible to broad audiences, including users with low literacy. For example, MedlinePlus offers a series of brief videos (in English and Spanish) covering several popular health topics, and maintains a Health Information in Multiple Languages Collection featuring more than 60 languages to support the information needs of a global audience.

In its 2021-26 funding cycle, the NLM-supported Network of the National Library of Medicine has a new goal to “advance health equity through information,” and will focus on serving underrepresented populations. NLM remains committed to addressing the challenge of health disparities and seeks new ways to provide understandable and trusted health information resources to support a broad spectrum of users.

I hope this peek inside of NLM gives you a sense of the ways that our dedicated staff are striving to meet the digital demands of the 21st century. Using our strategic plan as a roadmap, we continue to evaluate and develop products with our diverse user base in mind, and recognize that sometimes we need to rethink, rebuild, and reduce our presentation structures.

We’d love to hear how you are reimagining your services. Until next time, may your garden of health and knowledge blossom this spring!

Dianne Babski is responsible for the overall management of one of NLM’s largest divisions, Library Operations, with more than 450 staff providing health information services to a global audience of health care professionals, researchers, administrators, students, historians, patients, and the public. She oversees budget, facilities, administration, and operations, including those of a national network of more than 8,000 academic health science libraries, hospital and public libraries, and community organizations to improve access to health information.