Traveling a Bridge2AI in a Quest for High-Quality, FAIR Data Sets

This blog was authored by NIH staff who serve on the Bridge to Artificial Intelligence (Bridge2AI) Working Group.

In April 2021, we introduced the NIH Common Fund’s Bridge to Artificial Intelligence (Bridge2AI) program to tap the potential of artificial intelligence (AI) to revolutionize biomedical discovery, increase our understanding of human health, and improve the practice of medicine. In the past year, Bridge2AI researchers have been creating guidance and standards for the development of ethically sourced, state-of-the-art, AI-ready data sets to help solve some of the most pressing challenges in human health, such as uncovering how genetic, behavioral, and environmental factors influence health and wellness. The program will also support the training required to enable the broader biomedical and behavioral research community to leverage AI technologies.

The NIH initiative will support diverse teams and tools to ensure that data sets adhere to FAIR (Findable, Accessible, Interoperable, and Reusable) data principles. Beyond ensuring compliance with FAIR principles, Bridge2AI will develop and disseminate best practices that promote a culture of diversity and continuous ethical inquiry into how data are collected.

The Bridge2AI program will support innovative data-generation projects nationwide to collect complex AI-ready data in four biomedical areas:

Clinical Care Informatics—Intensive care units treat patients with urgent medical conditions such as sepsis and cardiac arrest. This data generation project will collect, integrate, annotate, and share high-resolution physiological data from adult and pediatric critical care patients across 14 health systems. AI technologies can then use these data to identify approaches to improve recovery from acute illness.

Functional Genomics—Within each cell in the human body lies a wealth of information about health, disease, and the impact of environmental factors. This project will generate richly detailed proteomic, genomic, and cellular imaging data to help predict disease mechanisms and associated gene pathways and networks for a variety of health outcomes.

Precision Public Health—The human voice is as unique as a fingerprint and has been found to contain acoustic signatures of human health and disease. This project will collect large-scale multimodal data sets containing voice, genomic, and clinical data, which AI technologies can use to help improve screening for and the diagnosis and treatment of a variety of developmental, neurological, and mental health conditions.

Return to Health—Much can be learned by uncovering how individuals move from a less healthy to a healthier state, a process called salutogenesis. This project will collect data from a diverse population with varying stages of type 2 diabetes to help improve our understanding of chronic disease progression and recovery. To learn more about Bridge2AI and salutogenesis, please view Bridging Our Way to Health Restoration by Helene M. Langevin, MD, director of the National Center for Complementary and Integrative Health.

To support these data generation projects, the Bridge2AI program includes a BRIDGE Center with a range of expertise to support interdisciplinary team science. The center will facilitate development of cross-cutting products such as standards harmonization, ethical AI best practices, and workforce development opportunities for the research community.

One of the goals of Bridge2AI is to foster a culture that will identify, assess, and address ethical issues as an integral part of creating AI-ready data sets. Ethical considerations include informed consent; data privacy; bias in data and its impact on the fairness and trustworthiness of AI applications; equity and justice; and inclusion and transparency in design.

Every component of the Bridge2AI program includes a plan for incorporating diverse perspectives at every step. The BRIDGE Center will serve as a hub for supporting ethical and trustworthy AI development across Bridge2AI with the goal of providing tools, best practices, and resources to address cross-cutting biomedical challenges.

Learn more about Bridge2AI in the press release and video. Find the latest news by visiting the Bridge2AI website and following @NIH_CommonFund on Twitter.

Top Row (left to right):
Patricia Flatley Brennan, RN, PhD, Director, National Library of Medicine
Michael F. Chiang, MD, Director, National Eye Institute
Eric D. Green, MD, PhD, Director, National Human Genome Research Institute

Bottom Row (left to right):
Helene M. Langevin, MD, Director, National Center for Complementary and Integrative Health
Bruce J. Tromberg, PhD, Director, National Institute of Biomedical Imaging and Bioengineering

Is Age Really Just a Number?

Last week I turned 69! Can you believe that??? This is so amazing to me—how could I be THAT OLD?? Two years ago (when I was just 67!), I shared that…

In midlife, I think I’m where I’m supposed to be, because I feel like I’m 39, think I look like I’m 49, believe I have a career worthy of someone who’s 59, and am approaching the wisdom of someone who’s 69.

So now that I am 69, I still believe all those things are true—particularly the wisdom part. I am wiser about the speed of change, the value of tempering my vision with a dose of realism, and the importance of understanding people clearly. I still feel youthful, look pretty good for a woman my age, and remain proud of my career.

But suppose I want to pick the number that really represents my age. Age is a very important descriptor of patients and research participants. Across all types of clinical research, a participant’s age is one of the most commonly collected variables. Age is an important indicator of many aspects of being human, from the physical capabilities that determine a person’s likely response to a treatment to potential behavioral or mental health challenges. Knowing participants’ ages helps guide the interpretation of research results, allowing scientists and clinicians to determine the relevance of those results to specific groups of people or to better understand the clinical manifestation of a disease. And knowing participants’ ages provides evidence that our NIH studies appropriately engage people across their lifespan.

You might be surprised to know that there are many ways to represent age. For most of us, age is estimated by counting the number of years since our birth. However, for babies, it may be more important to know the number of days, weeks, or months since birth. Some studies compute age as the difference between the date of birth and the date that the data are collected. In fact, in the PhenX Toolkit, a web-based catalogue of expert-recommended measurement protocols, there are almost 200 different ways to measure age in a research study. Sometimes information about age is acquired through self-report of the participant, and other times it is obtained from an existing document such as a patient’s clinical record. The PhenX Toolkit enumerates a wide range of measurement approaches, providing broad coverage so that a researcher can pick the measure that best represents the phenomenon of interest to their study.
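Computing age as the difference between two dates sounds simple, but the details matter: whether the birthday has already come around in the collection year changes the answer by a whole year, and the right unit differs for adults and infants. The following is a minimal sketch, assuming both the date of birth and the collection date are known exactly. The function names are hypothetical illustrations, not part of the PhenX Toolkit or any NIH system.

```python
from datetime import date


def _check_order(birth: date, collected: date) -> None:
    # A collection date before the birth date signals a data-entry error.
    if collected < birth:
        raise ValueError("collection date precedes date of birth")


def age_in_days(birth: date, collected: date) -> int:
    """Elapsed days between birth and data collection."""
    _check_order(birth, collected)
    return (collected - birth).days


def age_in_completed_weeks(birth: date, collected: date) -> int:
    """Whole weeks elapsed; useful for very young infants."""
    return age_in_days(birth, collected) // 7


def age_in_completed_months(birth: date, collected: date) -> int:
    """Whole calendar months elapsed.

    The current month is counted only once the collection day-of-month
    reaches the birth day-of-month.
    """
    _check_order(birth, collected)
    months = (collected.year - birth.year) * 12 + (collected.month - birth.month)
    if collected.day < birth.day:
        months -= 1
    return months


def age_in_completed_years(birth: date, collected: date) -> int:
    """Whole years elapsed, the usual 'age' reported for adults."""
    _check_order(birth, collected)
    years = collected.year - birth.year
    # Subtract one if the birthday has not yet occurred in the collection year.
    if (collected.month, collected.day) < (birth.month, birth.day):
        years -= 1
    return years


if __name__ == "__main__":
    infant_birth = date(2023, 11, 20)
    visit = date(2024, 2, 10)
    print(age_in_days(infant_birth, visit), "days")
    print(age_in_completed_weeks(infant_birth, visit), "weeks")
    print(age_in_completed_months(infant_birth, visit), "months")
```

The same pair of dates yields different numbers depending on the unit and the counting convention chosen. For example, with this sketch a person born on 29 February gains a year of age on 1 March in non-leap years; other conventions are equally defensible, which is one reason a study benefits from stating exactly how it measured age.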

Over the past decade, NLM has supported the creation, identification, and distribution of Common Data Elements (CDEs). CDEs are standardized ways to measure concepts common to two or more research projects in a manner that is consistent across studies. Using a similar approach to measure similar concepts sounds like a no-brainer, right? It improves the rigor and reproducibility of research and allows data collected in different studies to be grouped together, adding power to the interpretation of research efforts. The COVID-19 pandemic illustrated the value of common approaches to measuring research concepts by allowing us to track this deadly virus and its manifestations across time and people.

NLM established the NIH CDE Repository to serve as a one-stop location for research programs and for NIH Institutes and Centers to house CDEs and make them available to other researchers. Each record includes the definition of the variable as an indicator of the concept, a way to measure the variable (usually a question-and-answer pair with acceptable responses), and machine-readable codes where possible. Recently, the NIH CDE Repository began supporting an NIH governance process that identifies which submitted CDEs are described with sufficient rigor to be designated as NIH-endorsed. This endorsement helps potential users who are seeking good ways to measure complex concepts. NIH-endorsed CDEs support FAIR (findable, accessible, interoperable, and reusable) data sharing. Adherence to FAIR principles provides high-quality, “computation-ready” data with standardized vocabularies and readable metadata retrievable by identifiers; together, these help modernize the NIH data ecosystem. When data are collected consistently across studies using CDEs, it’s possible to integrate data from multiple studies, which can make it easier to get meaningful results. CDEs can also make it easier to reuse data for future research by improving the data quality.
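To make the three parts of a CDE record concrete, here is a minimal sketch in Python of how a shared element lets answers from different studies be pooled. Everything in it is hypothetical: the class and field names, the smoking-status example, and the `EX:` codes are placeholders, not the NIH CDE Repository's actual schema and not codes from any real terminology. The point it illustrates is that when every study measures a variable against the same permissible values, each answer can be checked and mapped to the same machine-readable code before the data are combined.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CommonDataElement:
    """A simplified, hypothetical CDE record (not the repository's real schema)."""

    name: str
    definition: str  # what the variable indicates about the concept
    question: str  # how the variable is measured
    permissible_values: dict[str, str]  # acceptable answer -> machine-readable code

    def encode(self, answer: str) -> str:
        """Return the code for an acceptable answer; reject anything else."""
        try:
            return self.permissible_values[answer]
        except KeyError:
            raise ValueError(
                f"{answer!r} is not a permissible value for {self.name}"
            ) from None


def pool_studies(cde: CommonDataElement, studies: dict[str, list[dict]]) -> list[dict]:
    """Combine rows from several studies that all collected the same CDE."""
    pooled = []
    for study_id, rows in studies.items():
        for row in rows:
            pooled.append({
                "study": study_id,
                "participant": row["participant"],
                # Every answer is validated and coded the same way in every study.
                cde.name: cde.encode(row[cde.name]),
            })
    return pooled


# Hypothetical example element and codes, for illustration only.
smoking = CommonDataElement(
    name="smoking_status",
    definition="Participant's current relationship to cigarette smoking.",
    question="Which best describes your cigarette smoking?",
    permissible_values={"Never": "EX:0", "Former": "EX:1", "Current": "EX:2"},
)

pooled = pool_studies(smoking, {
    "study_A": [{"participant": "A-01", "smoking_status": "Never"}],
    "study_B": [
        {"participant": "B-07", "smoking_status": "Current"},
        {"participant": "B-09", "smoking_status": "Former"},
    ],
})
print(pooled)
```

An answer outside the agreed list (say, "Sometimes") raises an error instead of silently entering the combined data set, which is a small-scale version of the data-quality benefit described above.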

So if I wanted to be “counted” according to the years-alive mode of assessing age, I guess I am 69. But if you really want to know something else, like how happy I am in my career or how I’m feeling, don’t be surprised if I give a different number!
