Common Data Elements: Increasing FAIR Data Sharing

Guest post by Carolina Mendoza-Puccini, MD, CDE Program Officer, Division of Clinical Research, National Institute of Neurological Disorders and Stroke (NINDS) and Kenneth J. Wilkins, PhD, Mathematical Statistician, Biostatistics Program and Office of Clinical Research Support, Office of the Director, National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK)

Previous posts published in Musings from the Mezzanine have explained the importance of health data standards and their role as the backbone of interoperability. Common Data Elements (CDEs) are a type of health data standard that is used and reused in both clinical and research settings. CDEs capture complex phenomena, such as depression or recovery, through standardized, well-defined questions (variables), each paired with a set of allowable responses (values), that are used in the same way across studies or trials.
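To make the idea of a question paired with allowable responses concrete, here is a minimal Python sketch. It is an illustration only, not the NIH CDE Repository's data model: the class name, field names, and the example element (loosely modeled on a depression screening item) are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class CommonDataElement:
    """A question (variable) paired with its allowable responses (values)."""
    name: str                 # hypothetical variable name
    question: str             # the standardized prompt
    permissible_values: dict  # coded value -> human-readable label

    def is_valid(self, response):
        """Return True if the response is one of the allowable values."""
        return response in self.permissible_values


# Illustrative element; the wording and coding are assumptions for this sketch.
low_interest = CommonDataElement(
    name="low_interest",
    question="Over the last 2 weeks, how often have you had little "
             "interest or pleasure in doing things?",
    permissible_values={
        0: "Not at all",
        1: "Several days",
        2: "More than half the days",
        3: "Nearly every day",
    },
)

print(low_interest.is_valid(2))  # 2 is in the value set
print(low_interest.is_valid(7))  # 7 is not in the value set
```

Because every study that adopts the element shares the same prompt and the same coded value set, a response of 2 means the same thing wherever it was collected.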

CDEs provide a way to standardize data collection—ensuring that data are collected consistently, and otherwise-avoidable variability is minimized.

Where possible, CDEs are linked to controlled vocabularies and terminologies commonly used in health care, such as SNOMED CT and LOINC, and they can provide a route to harmonize with non-prospective clinical research designs. Such links leverage common data entities, like the clinical concepts underlying common data models, to align evidence from clinical studies with evidence from ‘real-world data’ sources such as electronic health records (EHRs), mobile and wearable devices, and patient-reported outcomes. Evidence derived from such data has become known in recent years as ‘real-world evidence’.

Importance of CDEs for Interoperability and Consistency of Evidence Across Settings

FAIR Data Principles (Source: National Institute of Environmental Health Sciences)

NIH’s response to the COVID-19 pandemic highlighted the importance of developing CDEs that can be used and endorsed across NIH-funded COVID-19 research so that the resulting, urgently needed data would be FAIR: Findable, Accessible, Interoperable, and Reusable.

Many groups across NIH have identified, or are in the process of identifying, CDEs that are both COVID-19-related and relevant to the needs of specific research efforts, such as NIH’s Disaster Research Response (DR2) program and the Rapid Acceleration of Diagnostics-Underserved Populations (RADx-UP) and Researching COVID to Enhance Recovery (RECOVER) initiatives. There was also a need to develop a process for indicating NIH endorsement of CDEs that meet meaningful criteria, are made available through a common discovery platform (the NIH CDE Repository), and avoid duplicating functions of resources that already exist.

NIH’s Scientific Data Council charged a group of members of the NIH CDE Task Force, the CDE Governance Committee (Governance Committee), with developing this endorsement process based on the following criteria:

  • Clear definition of variable/measure with prompt and response 
  • Documented evidence of reliability and validity, where applicable
  • Human- and machine-readable formats
  • Recommended/designated by a recognized NIH body (Institute, Center, Office, Program/Project Committee, etc.)
  • Clear licensing and intellectual property status (Creative Commons or open source preferred)
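
The criteria above lend themselves to a simple checklist. The Python sketch below shows one way a submitter might self-check a CDE before submission. It is not the Governance Committee's actual review process: the dictionary keys, the function name, and the `not_applicable` parameter are hypothetical, and the labels only paraphrase the criteria.

```python
# Keys are hypothetical field names; labels paraphrase the criteria listed above.
ENDORSEMENT_CRITERIA = {
    "clear_definition": "Clear definition with prompt and response",
    "reliability_validity": "Documented evidence of reliability and validity",
    "machine_readable": "Human- and machine-readable formats",
    "nih_recommended": "Recommended by a recognized NIH body",
    "clear_licensing": "Clear licensing and intellectual property status",
}


def unmet_criteria(submission, not_applicable=()):
    """List the labels of criteria the submission has not documented.

    Criteria named in not_applicable are skipped, mirroring "where applicable".
    """
    return [
        label
        for key, label in ENDORSEMENT_CRITERIA.items()
        if key not in not_applicable and not submission.get(key, False)
    ]


# A made-up submission whose licensing status is still unresolved.
submission = {
    "clear_definition": True,
    "machine_readable": True,
    "nih_recommended": True,
    "clear_licensing": False,
}
gaps = unmet_criteria(submission, not_applicable=("reliability_validity",))
print(gaps)
```

Here the only gap reported is the licensing criterion, since reliability and validity evidence was declared not applicable for this made-up submission.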

The role of the Governance Committee is to ensure that the evidence of acceptability, reusability, and validity is properly presented and documented.

Submission of CDEs for Endorsement

The Governance Committee determined that CDEs will be submitted either as “Individual CDEs” or as “Bundles.” Individual CDEs can be collected separately. A Bundle is a group of questions or variables, each with a specified set of allowable responses, that are used together as a set. Bundles may include standardized instruments, such as the Patient Health Questionnaire 9 (PHQ-9) Depression Scale, or a number of questions that must be collected as a group to maintain their meaning as individual elements (e.g., demographic features).
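The PHQ-9 illustrates why some elements only make sense as a set: its nine items are each scored 0 to 3, and the total score (0 to 27) is meaningful only when all nine are answered. The Python sketch below enforces that rule. The item names (`phq9_item1` through `phq9_item9`) and the function are assumptions made for this example, not identifiers from the NIH CDE Repository.

```python
# Hypothetical variable names for the nine PHQ-9 items.
PHQ9_ITEMS = [f"phq9_item{i}" for i in range(1, 10)]
ALLOWED_SCORES = {0, 1, 2, 3}


def phq9_total(responses):
    """Sum the nine item scores, refusing incomplete or out-of-range input."""
    missing = [item for item in PHQ9_ITEMS if item not in responses]
    if missing:
        raise ValueError(f"Bundle incomplete, missing: {missing}")
    invalid = [item for item in PHQ9_ITEMS
               if responses[item] not in ALLOWED_SCORES]
    if invalid:
        raise ValueError(f"Responses outside the allowed values: {invalid}")
    return sum(responses[item] for item in PHQ9_ITEMS)


# A complete record scores normally.
complete = {item: 1 for item in PHQ9_ITEMS}
print(phq9_total(complete))

# A record with only one item answered is rejected instead of being scored.
try:
    phq9_total({"phq9_item1": 2})
except ValueError as err:
    print(err)
```

An Individual CDE, by contrast, would need only the single-value check, because its meaning does not depend on any other element being collected alongside it.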

The Governance Committee will review submissions against the approved endorsement criteria. Once endorsed, Individual CDEs and possibly Bundles will be published in the NIH CDE Repository with an endorsement badge.

Reuse of NIH-endorsed CDEs Going Forward

With these governance-endorsed additions, the NIH CDE Repository’s role as a unified resource for common data entities and semantic concepts (the conceptual underpinnings of common data elements themselves) will lay the groundwork for researchers, NIH-funded or otherwise, to plan for interoperable data features. With the endorsement criteria and NLM-led efforts to enhance the NIH CDE Repository as an NIH-wide research resource, that role can grow along with those of related public and private sector alignment efforts. These include standards ranging from the United States Core Data for Interoperability for routine health care to the FDA submission standards of the Clinical Data Interchange Standards Consortium (CDISC) for treatments and preventive therapeutics, like vaccines, that we all rely upon for quality care.

Features of the NIH CDE Repository will continue to be enhanced, whether to search for semantically related concepts or to highlight subtle distinctions among closely related CDEs. The NIH CDE Repository can also serve as a clearinghouse for interoperability of data from across a broad range of research, from prospectively designed studies to those that repurpose data captured in the course of clinical care (such as EHR data) for real-world evidence.

In the wake of lessons learned from the most challenging aspects of early COVID-19 research, CDE use can increase FAIR data sharing across the research ecosystem in the near-seamless fashion that legislators envisioned when they enacted the 21st Century Cures Act. CDE governance processes are poised to adapt accordingly and to keep working toward greater data interoperability in the era following the COVID-19 pandemic.

CDE Governance Committee Members: Matt McAuliffe (Center for Information Technology), Kerry Goetz (National Eye Institute), Denise Warzel (National Cancer Institute), Erin Ramos (National Human Genome Research Institute), Jyoti Dayal (National Human Genome Research Institute), Deborah Duran (National Institute on Minority Health and Health Disparities), Janice Knable (National Cancer Institute). Chairs: Carolina Mendoza-Puccini (National Institute of Neurological Disorders and Stroke) and Kenneth Wilkins (National Institute of Diabetes and Digestive and Kidney Diseases). Ex Officio members: Robin Taylor, Mike Huerta, Lisa Federer (National Library of Medicine). Collaborator: Greg Farber (National Institute of Mental Health).

To learn more about the NIH Common Data Elements (CDE) Repository, watch this short video.

Dr. Mendoza-Puccini leads the NINDS Common Data Elements Project and is a Program Officer at the NINDS Division of Clinical Research.

Dr. Wilkins is a member of both the NIH-wide and NIDDK-specific Data Science and Data Management Working Groups and engages with researchers from across intramural and extramural programs on quantitative aspects of design and analysis.
