Common Data Elements: Increasing FAIR Data Sharing

Guest post by Carolina Mendoza-Puccini, MD, CDE Program Officer, Division of Clinical Research, National Institute of Neurological Disorders and Stroke (NINDS) and Kenneth J. Wilkins, PhD, Mathematical Statistician, Biostatistics Program and Office of Clinical Research Support, Office of the Director, National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK)

Previous posts published in Musings from the Mezzanine have explained the importance of health data standards and their role as the backbone of interoperability. Common Data Elements (CDEs) are a type of health data standard that is commonly used and reused in both clinical and research settings. CDEs capture complex phenomena, such as depression or recovery, through standardized, well-defined questions (variables) paired with a set of allowable responses (values) that are used in a standardized way across studies or trials.

CDEs provide a way to standardize data collection—ensuring that data are collected consistently, and otherwise-avoidable variability is minimized.
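To make the question-plus-allowable-responses idea concrete, here is a minimal Python sketch of how a single CDE might be represented and checked. It is an illustration only: the class name, field names, and the example element are hypothetical and are not taken from the NIH CDE Repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonDataElement:
    """One standardized question paired with its allowable responses."""
    name: str                      # hypothetical identifier for this sketch
    question: str                  # the standardized prompt (variable)
    permissible_values: frozenset  # the allowable responses (values)

    def is_valid(self, response: str) -> bool:
        """Return True only if the response is one of the allowable values."""
        return response in self.permissible_values


# Illustrative element; the wording and values are made up for this sketch.
smoking_status = CommonDataElement(
    name="smoking_status",
    question="What is the participant's current smoking status?",
    permissible_values=frozenset({"Never", "Former", "Current", "Unknown"}),
)

print(smoking_status.is_valid("Former"))  # an allowable response
print(smoking_status.is_valid("quit"))    # free text outside the value set
```

Two studies that collect this element the same way can pool their answers directly, because every recorded value comes from the same fixed set rather than from free text.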

Where possible, CDEs are linked to controlled vocabularies and terminologies commonly used in health care, such as SNOMED-CT and LOINC, and CDEs can provide a route to harmonize with non-prospective clinical research designs. Such links leverage common data entities, like the clinical concepts underlying common data models, to align evidence from clinical studies with evidence from ‘real-world data’ such as electronic health records (EHRs), mobile and wearable devices, and patient-reported outcomes, which has become known in recent years as ‘real-world evidence’.

Importance of CDEs for Interoperability and Consistency of Evidence Across Settings

FAIR Data Principles (Source: National Institute of Environmental Health Sciences)

NIH’s response to the COVID-19 pandemic highlighted the importance of developing CDEs that can be used and endorsed across NIH-funded COVID-19 research so that resulting, urgently-needed data would be FAIR: Findable, Accessible, Interoperable, and Reusable.

Many groups across NIH identified, or are in the process of identifying, CDEs that are both COVID-19-related, and related to the needs of specific research projects such as NIH’s Disaster Research Response (DR2) program, Rapid Acceleration of Diagnostics—Underserved Populations (RADx-UP) and Researching COVID to Enhance Recovery (RECOVER) initiatives. There was also a need to develop a process for indicating NIH endorsement of CDEs that meet meaningful criteria, are made available through a common discovery platform (the NIH CDE Repository), and avoid duplicating functions of resources that already exist.

NIH’s Scientific Data Council charged the CDE Governance Committee (Governance Committee), a group of members of the NIH CDE Task Force, with developing this endorsement process based on the following criteria:

  • Clear definition of variable/measure with prompt and response 
  • Documented evidence of reliability and validity, where applicable
  • Human- and machine-readable formats
  • Recommended/designated by a recognized NIH body (Institute, Center, Office, Program/Project Committee, etc.)
  • Clear Licensing and Intellectual Property status (prefer Creative Commons or open source)

The role of the Governance Committee is to assure that the evidence of acceptability, reusability, and validity is properly presented and documented.

Submission of CDEs for Endorsement

The Governance Committee determined that CDEs will be submitted either as “Individual CDEs” or “Bundles.” Individual CDEs can be collected separately. Bundles are a group of questions or variables with specified sets of allowable responses that are grouped together and used as a set. Bundles may include standardized instruments, such as the Patient Health Questionnaire 9 (PHQ-9) Depression Scale, or a number of questions that must be collected as a group to maintain their meaning as individual elements (e.g., demographic features).
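The difference between an Individual CDE and a Bundle can be sketched in a few lines of Python. This is an assumption-laden illustration, not the Repository's data model: the `Element` and `Bundle` classes, their methods, and the two demographic items are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """A single question with its allowable responses."""
    question: str
    permissible_values: tuple


@dataclass(frozen=True)
class Bundle:
    """Elements that must be collected together to keep their meaning."""
    name: str
    elements: tuple

    def missing(self, responses: dict) -> list:
        """List the questions in the bundle that have no response."""
        return [e.question for e in self.elements if e.question not in responses]

    def is_complete(self, responses: dict) -> bool:
        """True only if every element has a response from its allowable set."""
        return all(
            responses.get(e.question) in e.permissible_values
            for e in self.elements
        )


# Hypothetical demographic bundle; items and values are made up for this sketch.
demographics = Bundle(
    name="demographics",
    elements=(
        Element("Age group", ("18-39", "40-64", "65+")),
        Element("Employment status", ("Employed", "Not employed", "Retired")),
    ),
)

complete = {"Age group": "40-64", "Employment status": "Employed"}
partial = {"Age group": "40-64"}

print(demographics.is_complete(complete))
print(demographics.missing(partial))
```

An Individual CDE would be validated on its own, whereas a Bundle is only usable when the whole set has been answered, which is why the completeness check lives on the bundle rather than on each element.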

The Governance Committee will conduct a review of submissions based on the endorsement criteria approved. Once endorsed, Individual CDEs and possibly Bundles will be published in the NIH CDE Repository with an endorsement badge.

Reuse of NIH-endorsed CDEs Going Forward

With these governance-endorsed additions to the NIH CDE Repository, its role as a unified resource for common data entities and semantic concepts (the conceptual underpinnings of common data elements themselves) will lay the groundwork for researchers (NIH-funded or otherwise) to plan on interoperable data features. With the endorsement criteria and NLM-led efforts to enhance the NIH CDE Repository as an NIH-wide research resource, its role can grow along with those of related public and private sector alignment efforts. These include standards ranging from the United States Core Data for Interoperability for routine health care to the FDA submission standards within the Clinical Data Interchange Standards Consortium (CDISC) for treatments and preventive therapeutics, like vaccines, that we all rely upon for quality care.

Features of the NIH CDE Repository will continue to be enhanced, whether to search for semantically related concepts or to highlight subtle distinctions among closely related CDEs. The NIH CDE Repository can also serve as a clearinghouse for interoperability in data from across a broad range of research, from prospectively designed studies to those making use of data captured in the course of clinical care (such as EHRs) yet repurposed for real-world evidence.

In the wake of lessons learned from the most challenging aspects of early COVID-19 research, CDE use can increase FAIR data sharing across the research ecosystem in the near-seamless fashion that legislators envisioned when they enacted the 21st Century Cures Act. CDE governance processes are poised to adapt accordingly and to keep working toward greater data interoperability in the era following the COVID-19 pandemic.

CDE Governance Committee Members: Matt McAuliffe (Center for Information Technology), Kerry Goetz (National Eye Institute), Denise Warzel (National Cancer Institute), Erin Ramos (National Human Genome Research Institute), Jyoti Dayal (National Human Genome Research Institute), Deborah Duran (National Institute on Minority Health and Health Disparities), Janice Knable (National Cancer Institute). Chairs: Carolina Mendoza-Puccini (National Institute of Neurological Disorders and Stroke) and Kenneth Wilkins (National Institute of Diabetes and Digestive and Kidney Diseases). Ex Officio members: Robin Taylor, Mike Huerta, Lisa Federer (National Library of Medicine). Collaborator: Greg Farber (National Institute of Mental Health).

To learn more about the NIH Common Data Elements (CDE) Repository, watch this short video.

Dr. Mendoza-Puccini leads the NINDS Common Data Elements Project and is a Program Officer at the NINDS Division of Clinical Research.

Dr. Wilkins is a member of both the NIH-wide and NIDDK-specific Data Science and Data Management Working Groups and engages with researchers from across intramural and extramural programs on quantitative aspects of design and analysis.

40 Years of Progress: It’s Time to End the HIV Epidemic

Guest post by Maureen M. Goodenow, PhD, Associate Director for AIDS Research and Director, Office of AIDS Research, National Institutes of Health

On June 5th, the National Institutes of Health (NIH) Office of AIDS Research (OAR) joined colleagues worldwide to commemorate the 40th anniversary of the landmark 1981 Centers for Disease Control and Prevention (CDC) Morbidity and Mortality Weekly Report (MMWR) that first recognized the syndrome of diseases later named AIDS. June 5th also marks HIV Long-Term Survivors Awareness Day. 

Forty years ago, the CDC’s MMWR described five people who were diagnosed with Pneumocystis carinii pneumonia—catalyzing a global effort that led to the identification of AIDS, and later, the virus that causes AIDS.

Over the years, much of the progress to guide the response to HIV has emerged from research funded by the NIH, and helped turn a once fatal disease into a now manageable chronic illness. This progress is attributable in large part to the nation’s longstanding HIV leadership and contributions at home and abroad.

NIH is taking action to recognize the milestones achieved through science, pay tribute to more than 32 million people who have died from AIDS-related illness globally (including 700,000 Americans), and support the goal of Ending the HIV Epidemic in the U.S. (EHE) and worldwide. OAR is coordinating with NIH Institutes, Centers, and Offices (ICOs) to share messaging that will continue through NIH’s World AIDS Day commemoration on December 1, 2021.

The NIH remains committed to supporting basic, clinical, and translational research to develop cutting-edge solutions for the ongoing challenges of the HIV epidemic. The scientific community has achieved groundbreaking advances in the understanding of basic virology, human immunology, and HIV pathogenesis and has led the development of safe, effective antiretroviral medications and effective interventions to prevent HIV acquisition and transmission.

Nevertheless, HIV remains a serious public health issue.

NIH established the OAR in 1988 to ensure that NIH HIV/AIDS research funding is directed at the highest priority research areas, and to facilitate maximum return on the investment. OAR’s mission is accomplished in partnership within the NIH through the ICs that plan and implement specific HIV programs or projects, coordinated by the NIH HIV/AIDS Executive Committee. As I reflect on our progress against HIV/AIDS, I would like to note the collaboration, cooperation, innovation, and other activities across the NIH ICOs in accelerating HIV/AIDS research.

Key scientific advances using novel methods and technologies have emerged in the priority areas of the NIH HIV research portfolio. Many of these advances stem from NIH-funded efforts, and all point to important directions for the NIH HIV research agenda in the coming years, particularly in the areas of new formulations of current drugs, new delivery systems, dual use of drugs for treatment and prevention, and new classes of drugs with novel strategies to treat viruses with resistance to current drug regimens.

Further development of long-lasting HIV prevention measures and treatments remains at the forefront of the NIH HIV/AIDS research portfolio.

NIH-funded investigators continue to uncover new details about the virus life cycle, which is crucial for the development of next generation HIV treatment approaches. Additionally, the NIH is focused on developing novel diagnostics to detect the virus as early as possible after infection.

Results in the next two years from ongoing NIH-supported HIV clinical trials will have vital implications for HIV prevention, treatment, and cure strategies going forward. For example, two NIH-funded clinical trials for HIV vaccines, Imbokodo and Mosaico, are evaluating an experimental HIV vaccine regimen designed to protect against a wide variety of global HIV strains. These studies comprise a crucial component of the NIH’s efforts to end the HIV/AIDS epidemic.

As we close on four decades of research, I look forward to the new advances aimed at prevention and treatment in the years to come.

You can help raise awareness and get involved with efforts to end the HIV epidemic. Visit OAR’s 40 Years of Progress: It’s Time to End the HIV Epidemic webpage, and use the toolkit of ready-to-go resources.

Dr. Goodenow leads the OAR in coordinating the NIH HIV/AIDS research agenda to end the HIV pandemic and improve the health of people with HIV. In addition, she is Chief of the Molecular HIV Host Interactions Laboratory at the NIH.

NLM Announces New Annual Lecture on Science, Technology, and Society

Guest post by Maryam Zaringhalam, PhD, National Library of Medicine Data Science and Open Science Officer and Mike Huerta, PhD, director of the Office of Strategic Initiatives and associate director of the National Library of Medicine.

In October 2019, NLM invited award-winning science journalist Angela Saini to discuss her research on how bias and prejudice have crept into science. Her lecture examined how racist and sexist ideas have permeated science over its history, and how science, in turn, has been contorted to justify and perpetuate pseudoscientific myths of innate inferiority. Saini’s work and insights sparked a crucial conversation within NLM about our role and responsibility as the world’s largest biomedical library and a leader in data science research, situated within the nation’s premier medical research agency, to question how systemic biases affect our work and determine how we can correct them.

As advancing equity and rooting out structural discrimination in science and technology have become an increasingly urgent federal priority, NLM will build on this discussion, in part, by announcing the launch of an annual NLM Science, Technology, and Society Lecture on March 1, 2021.

Situated at the nexus of the NIH-supported research community and the public, NLM plays a vital role not only in advancing cutting-edge research, but also in acting as a steward of biomedical information in service of society. As leaders in facilitating and shaping the future of biomedical data science, we must understand the implications of our work for society as a whole. We must, for instance, question how biases may creep into algorithms that connect research results with the public and think through the ethical ramifications of emerging technologies that might reinforce and amplify those biases. As a national library, we serve as curators of the history of biomedical science, which must reflect both the great achievements made possible by research and the injustices committed within the scientific community. And as an institution with more than 8,000 points of presence through our Network of the National Library of Medicine, we have the means to fulfill our responsibility to meet the needs and understand the concerns of the communities we serve.

With these responsibilities and NLM’s unique role and capabilities in mind, the NLM Science, Technology, and Society Lecture aims to raise awareness around the societal and ethical implications of the conduct of biomedical research and the use of advanced technologies, while seeding conversations across the Library, NIH, and the broader biomedical research community. NLM sees such considerations as fundamental to advancing biomedical discovery and human health for the benefit of all.

Dr. Kate Crawford is the inaugural Visiting Chair of AI and Justice at the École Normale Supérieure, as well as a Senior Principal Researcher at Microsoft Research, and the cofounder of the AI Now Institute at New York University.

Each spring, we plan to invite a leading voice working at the intersection of biomedicine, data science, ethics, and justice to present their research and how it relates to the mission and vision of NLM, as well as NIH more broadly. This year, we are pleased to host Dr. Kate Crawford, a leading scholar of science, technology, and society, with over 20 years of experience studying large-scale data systems and artificial intelligence (AI) in the wider contexts of history, politics, labor, and the environment. Her lecture, “Atlas of AI: Mapping the social and economic forces behind AI”, will explore how machine learning systems can reproduce and intensify forms of structural bias and discrimination and offer new paths for thinking through the research ethics and policy implications of the turn to machine learning.

As the interests, priorities, and concerns of our society continue to evolve, particularly in response to emerging technologies and shifting national conversations, we hope this annual lecture, alongside established lecture series such as NLM History Talks, will provide an invaluable perspective on the societal implications of our work and further establish NLM’s leadership as a trusted partner in health.

Dr. Zaringhalam is a member of the Office of Strategic Initiatives and is responsible for monitoring and coordinating data science and open science activities and development across NLM, NIH, and beyond. She completed her PhD in molecular biology at Rockefeller University in 2017 before joining NLM as an AAAS Science and Technology Policy Fellow.

Dr. Huerta leads NLM in identifying, implementing, and assessing strategic directions of NLM, including at the intersection of data science and open science. In his 30 years at NIH, he has led many trans-NIH research initiatives and helped establish neuroinformatics as a field. Dr. Huerta joined NIH’s National Institute of Mental Health in 1991, before moving to NLM in 2011.

A Journey to Spur Innovation and Discovery

Guest post by Valerie Schneider, PhD, staff scientist at the National Library of Medicine’s National Center for Biotechnology Information, National Institutes of Health.

It’s been said that nature is the best teacher. When it comes to understanding human biology and improving health, examples abound of the advances that have been made from the study of a diverse set of non-human organisms. Over the last two centuries, the study of nematode worms has taught us about longevity and mRNAs (the biological molecule that is the basis for several COVID-19 vaccines), common fungi about cell division and cancer, and fruit flies about many things, from the role of chromosomes in heredity to our circadian rhythms. The ability to create targeted alterations in the genomes of model organisms has been transformative for studies to establish the function of specific genes in the etiology of human disease.

The modern era of genomic biology, in which genome sequencing and assembly are accessible to more researchers than ever before, provides data from an even greater range of organisms from which we might learn. Today, we rely not only on primate models, but on a whole host of species: for example, swine to understand organ transplantation, songbirds to understand vocalization and learning, and bats and pangolins to teach us about the evolution of the SARS-CoV-2 virus and how to fight its spread.

These rapidly growing collections of sequence and other data on species across the tree of life offer enormous promise for discoveries that have the potential to improve human health. To better enable such discoveries, with the support of NIH, NLM is planning a major modernization of its resources and their underlying infrastructure.

This modernization will support the needs of users engaged in data search and retrieval, gene annotation, evaluation of sequence quality, and comparative analyses. The new infrastructure, user interfaces, and tools should result in an improved experience for researchers doing a wide range of work, and also facilitate better data submissions.

This revamping aligns with NIH’s Strategic Plan for Data Science, which provides a roadmap for modernizing the NIH-funded biomedical data science ecosystem, as well as NLM’s Strategic Plan, which furthers NLM’s commitment to provide data and information to accelerate biomedical discovery and improve health. NLM and NIH are committed to providing researchers with modern, stable, and cloud-oriented technologies that support research needs.

Over the last few years, NLM has demonstrated this commitment by re-designing several flagship products, including the PubMed database for searching published biomedical literature, the ClinicalTrials.gov database of information on privately and publicly funded clinical trials, and the Basic Local Alignment Search Tool (BLAST) for finding regions of similarity between biological sequences. As part of NIH’s Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) Initiative, NLM also made the data from its massive (36 petabyte) Sequence Read Archive (SRA) available on two commercial cloud platforms, facilitating large-scale computational research that would otherwise be difficult for many researchers. Revamping these resources has positioned them to support both the current and future needs of NLM’s diverse audience of researchers, clinicians, data scientists, educators and others.

Importantly, this current initiative to modernize NLM products, tools, and services, and concurrently develop content, will include extensive engagement with the research community, just as we’ve done with previous re-design efforts. The NLM is committed to offering interfaces accessible to both novices and experts. Additionally, NLM believes a key part of the next generation of its data resources requires an infrastructure that supports an ongoing, dynamic exchange of content, including contributions of metadata and gene functional information from knowledge builders in the community to complement and enhance NIH-provided content.

Community engagement will also ensure that externally sourced content is provided in ways that maintain the high value and trustworthiness of the datasets. Additionally, data connections that make the content of this new resource accessible to external knowledgebases containing other datatypes, such as images, will further promote integrative data analyses that support scientific discovery.

Many opportunities exist to streamline processes, look across resources, and gain insights that will provide new ways of learning. Through NLM’s continued commitment to modernization initiatives, we are ready to again improve the user experience for accessing, analyzing and visualizing sequence data and related information. Nature continues to be our best teacher — and we are now poised to learn from her in an exciting new classroom.

We invite you to come on this journey with us.

Valerie Schneider, PhD, is the deputy director of Sequence Offerings and the head of the Sequence Plus program. In these roles, she coordinates efforts associated with the curation, enhancement, and organization of sequence data, as well as oversees tools and resources that enable the public to access, analyze, and visualize biomedical data. She also manages NCBI’s involvement in the Genome Reference Consortium, the international collaboration tasked with maintaining the value of the human reference genome assembly.

Upcoming Training Opportunity: University-based Training for Research Careers in Biomedical Informatics and Data Science

Guest blog by Valerie Florance, PhD, Director of NLM’s Division of Extramural Programs

Explore the Training

NLM’s Extramural Programs Division is a principal source of NIH funding for research training in biomedical informatics, applying approaches in computer and information science to challenges in basic biomedical research, health care, and public health administration. NLM’s support fundamentally shapes the education, training, and advancement of biomedical informatics nationally. For decades, NLM has sponsored university-based training for predoctoral and postdoctoral fellows to prepare them for research careers. These programs support NLM’s long-term investment strategy to help influence and impact the field of biomedical informatics and data science.

Last October, NLM published NOT-LM-21-001 in the NIH Guide for Grants and Contracts to allow potential applicants sufficient time to develop meaningful collaborations and responsive projects. This program, a model among NIH training programs, advances training with big data in biomedical informatics and produces interdisciplinary researchers who fully comprehend the challenges of knowledge representation, decision support, translational research, human-computer interaction, and the social and organizational factors that influence effective adoption of health information technology in biomedical domains. This notice was the first step in a year-long process that will result in new 5-year grant awards that begin in July 2022. You’ll find the notice outlines the expected timetable for publishing the funding opportunity announcement, accepting applications, reviewing them, and making awards.

The solicitation for new applications will be published in the NIH Guide for Grants and Contracts in March with applications due in May. For those interested in applying for an NLM training grant for the first time, we encourage a review of the previous solicitation to get a sense of the data and programmatic descriptions that are required for a training grant application.

Because issuance dates for the next competition are estimates, it is also helpful to subscribe to the weekly Table of Contents emails from the NIH Guide for Grants and Contracts. The extra benefit of this weekly mailing is that it lists all new funding issuances from NIH plus important notices about policy changes.

A Strong Foundation

NLM’s training programs offer graduate education and postdoctoral research experiences in a wide range of areas including health care informatics, translational bioinformatics, clinical research informatics, public health informatics, and biomedical data science. Each of these programs offers a combination of core curriculum and electives. In the current 5-year cycle, seven programs also offer special tracks in environmental exposure informatics supported by NIH’s National Institute of Environmental Health Sciences.

A decades-old project, the university-based training initiative is one of NLM’s signature grant programs. NLM’s training programs have produced many leaders in the field of biomedical informatics. Past trainees have taken positions in academia, industry, small businesses, health care organizations, and government. Currently, NLM supports 200 trainee positions at 16 universities around the United States and provides funding each year for up to 40 short-term trainee positions that are used to help recruit college graduates to our field by providing introductory training and research opportunities. To develop a sense of community among the trainees, NLM brings its trainees together each year, apart from those falling within a pandemic year, for an annual conference hosted at one of the university sites.

You can find a map with links to descriptions of the current programs here. The website also provides links to information about past annual conferences – check out past agendas to get a sense of the broad scope of science across the field of biomedical informatics.

Attendees comparing notes at NLM Informatics Training Conference 2017 in La Jolla, California

Did you take part in this training? What was your favorite thing about this experience? What advice would you give to current students? How can we make the program even better?

Dr. Florance heads NLM’s Extramural Programs Division, which is responsible for the Library’s grant programs and coordinates NLM’s informatics training programs.