Using Comparative Genomics to Advance Scientific Discoveries

Guest post by Valerie Schneider, PhD, staff scientist at the National Library of Medicine’s National Center for Biotechnology Information, National Institutes of Health.

In a post from earlier this year, A Journey to Spur Innovation and Discovery, I shared news of an exciting NIH-supported NLM initiative, now known as the NIH Comparative Genomics Resource (CGR). CGR, which focuses on eukaryotic organisms, is modernizing NIH resources and infrastructure to support research involving non-human organisms. This initiative will improve the data that underpin analyses comparing diverse genomes in NLM databases, increase the connectivity of those data to related content, and make this information easier to discover and retrieve. Just as researchers look to data from these organisms to learn about a wide range of fundamental biological processes underpinning human health, NLM relies on the research community to help inform the development and delivery of organism-agnostic core tools and interfaces for CGR so that it can best support these analyses.

Stakeholder feedback and engagement are central to the vision and ethos of the NLM Strategic Plan 2017-2027. Since the plan’s inception, NLM efforts undertaken in support of our three primary goals have placed heavy emphasis on community connections in both their planning and execution. Likewise, understanding stakeholder needs is a fundamental element of CGR. With more than 19,000 genomes from over 8,500 species (excluding bacteria and viruses) in our Assembly database, CGR’s user base will clearly come from a large and diverse collection of research organism communities. Within each community, the role CGR plays will vary with the amount of genomic sequence available and with whether organism-specific data resources, such as community knowledge bases, already exist. Data consumers themselves are a heterogeneous population, with different research interests, levels of education and bioinformatics expertise, and analysis needs.

CGR is using a multi-tiered, multi-faceted approach to ensure that stakeholder requirements are understood and appropriately prioritized throughout the project. As part of this effort, CGR is working to identify community-supplied genome-related data that can be integrated to enhance content supplied by NLM. Two governance bodies play important roles. A trans-NIH CGR steering committee provides strategic oversight, guiding CGR according to the priorities of NIH institutional stakeholders, and an NLM Board of Regents CGR working group is charged with helping NLM engage the scientific community and enlist its members as partners in the development effort. Working group members have expertise in topics relevant to the CGR initiative, such as comparative genomic analysis, emerging large-scale genomics approaches, organism-centered research into general biological or disease processes, biology education, and workforce development.

We are building a presence for CGR at scientific conferences and workshops to connect with attendees and encourage partnerships with members of research communities. A CGR-related talk at the BioDiversity Genomics 2021 conference in September introduced a new cloud-based tool for improving genome quality, to be released in 2022, and identified researchers to serve as beta testers. We will also conduct targeted outreach independent of conferences to gather feedback and inform development.

The CGR project uses an iterative development process in which user testing is integral: feedback gathered through each round of testing is incorporated into the next development cycle. This approach keeps us engaged with the CGR target audience throughout the project, so that we understand their needs and provide a resource that is valuable to their research. For example, recent user testing of a prototype Basic Local Alignment Search Tool (BLAST) database, designed so that sequence queries return results from a broad distribution of organisms, showed us what additional content users will need to interpret those results properly.

NLM is poised to learn great things from our users as part of the CGR project. You can learn more about engagement opportunities by contacting us at info@ncbi.nlm.nih.gov. We value your input as we continue this journey together.

Valerie Schneider, PhD, is the deputy director of Sequence Offerings and the head of the Sequence Plus program. In these roles, she coordinates efforts associated with the curation, enhancement, and organization of sequence data, and oversees tools and resources that enable the public to access, analyze, and visualize biomedical data. She also manages NCBI’s involvement in the Genome Reference Consortium, the international collaboration tasked with maintaining the value of the human reference genome assembly.

Request for Public Comment: Seeking Input on Nationwide AI Research Resource Implementation Plan

Guest post by Lynne E. Parker, PhD, Director of the White House National Artificial Intelligence Initiative Office, and Erwin Gianchandani, PhD, National Science Foundation Senior Advisor for Translation, Innovation, and Partnerships

The White House Office of Science and Technology Policy and the National Science Foundation are seeking your input to shape the work of the National Artificial Intelligence Research Resource (NAIRR) Task Force. This Task Force is taking on a critically important initiative: building an implementation plan for a national infrastructure that would democratize access to artificial intelligence (AI) research and development (R&D).

As directed by Congress in the National AI Initiative Act of 2020, the Task Force is serving as a Federal advisory committee to help create a blueprint for the NAIRR, which is envisioned as a shared computing and data infrastructure that would provide AI researchers and students across all scientific disciplines with access to computational resources, high-quality data, educational tools, and user support. This capability would help make AI R&D accessible to all Americans by lowering the barriers to entry for traditionally underserved communities, institutions, and regions. It would also fuel innovation by making it easier than ever before for Americans to pursue bold, visionary applications for AI.

The Task Force will provide recommendations for establishing and sustaining the NAIRR, including technical capabilities, governance, administration, and assessment, as well as requirements for security, privacy, civil rights, and civil liberties. The Task Force will submit two reports to Congress presenting a comprehensive strategy and implementation plan: an interim report in May 2022 and a final report in November 2022.

To get this right, we want to tap into the deep technical expertise of the community and bring in a range of perspectives. We invite you to submit a response to our Request for Information before the comment period closes on October 1, and we ask that you help spread the word. This effort could set us on the path to transform our nation’s ability to harness AI across fields of science and engineering and across economic sectors, and your insights could help shape our approach.

We appreciate your contributions and look forward to receiving your input.

Dr. Parker is the Founding Director of the National Artificial Intelligence (AI) Initiative Office and Assistant Director of AI in the White House Office of Science and Technology Policy (OSTP). In these roles, she leads national AI policy efforts and coordinates AI activities across the Federal agencies in support of the National AI Initiative. Dr. Parker is a professor of computer science at the University of Tennessee, on assignment to OSTP. She received her PhD from the Massachusetts Institute of Technology and is a Fellow of the American Association for the Advancement of Science and of the Institute of Electrical and Electronics Engineers.

Dr. Gianchandani is the National Science Foundation (NSF) Senior Advisor for Translation, Innovation, and Partnerships. For six years, he was the NSF Deputy Assistant Director for Computer and Information Science and Engineering. In this role, he contributed to the leadership and management of NSF’s CISE directorate, including formulation and implementation of the directorate’s $1 billion annual budget, strategic and human capital planning, and oversight of day-to-day operations. He received a BS in computer science and MS and PhD degrees in biomedical engineering from the University of Virginia.

Going Back to School Safely

Guest post by Diana W. Bianchi, MD, Director, Eunice Kennedy Shriver National Institute of Child Health and Human Development at the National Institutes of Health

Originally released on the Director’s Corner blog.

As schools across the United States resume full-time, in-person education, I am hopeful that this academic year may be a more typical one. The in-person school environment and the wide range of services offered there are critical for the development and well-being of our nation’s young people. Without in-person schooling, many children miss out on school-based meals, speech or occupational therapy, and after-school programs. Loss of these services disproportionately affects children from racial and ethnic minority groups, socially and economically disadvantaged children, children with disabilities, and children with medical complexities.

Generating robust scientific data to inform policies to return children to the classroom safely and equitably during the ongoing COVID-19 pandemic is of paramount importance not only for children, but also to allow their parents to return to work. We now have safe and effective vaccines available for adults and children ages 12 years and older, as well as established public health measures to prevent transmission of SARS-CoV-2, the virus that causes COVID-19. Yet the emergence of the more transmissible delta SARS-CoV-2 variant and the rising COVID-19 cases across the country remind us that we must remain vigilant and adaptable to changing circumstances.

NICHD manages the Safe Return to School Diagnostic Testing Initiative, launched earlier this year as part of the NIH Rapid Acceleration of Diagnostics-Underserved Populations (RADx-UP) program. This initiative addresses the needs of children who have unequal access to COVID-19 testing and who face barriers to attending school remotely, including those who lack adequate equipment, internet access, or adult supervision at home.

The RADx-UP Return to School projects combine frequent COVID-19 testing with proven safety measures to reduce the spread of SARS-CoV-2. They are also exploring the effects of vaccination among eligible staff and students, addressing vaccine hesitancy, and gathering information on circulating SARS-CoV-2 variants and breakthrough infections. Funding for the first set of projects was announced in April, and additional projects were funded in July. Currently, 16 projects are ongoing at schools across the country, including public, charter, tribal, early education, and special education schools. Participating schools serve racially and ethnically diverse populations, including African Americans, American Indians/Alaska Natives, Latinos/Latinas, Asian Americans, and Native Hawaiians and other Pacific Islanders. Critically, several projects include children with medical complexities and/or intellectual and developmental disabilities who may not be able to use COVID-19 mitigation measures such as wearing masks or social distancing.

On August 9, NIH hosted a virtual workshop that brought together RADx-UP Return to School investigators and other researchers conducting school-based research on COVID-19 diagnostic testing to present data acquired to date, learn from each other, and support the safe return of children to in-person school.

Among other topics, workshop participants discussed the critical importance of engaging local communities in this research. Communities need access to the most up-to-date scientific evidence to weigh the benefits and challenges of implementing different COVID-19 mitigation strategies. Scientists need community input to understand the attitudes, knowledge, and barriers that may influence individual choices to enroll in COVID-19 testing programs and return to in-person learning. By working together, we can ensure a safe return to school for all.

Common Data Elements: Increasing FAIR Data Sharing

Guest post by Carolina Mendoza-Puccini, MD, CDE Program Officer, Division of Clinical Research, National Institute of Neurological Disorders and Stroke (NINDS) and Kenneth J. Wilkins, PhD, Mathematical Statistician, Biostatistics Program and Office of Clinical Research Support, Office of the Director, National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK)

Previous posts in Musings from the Mezzanine have explained the importance of health data standards and their role as the backbone of interoperability. Common Data Elements (CDEs) are a type of health data standard that is widely used and reused in both clinical and research settings. CDEs capture complex phenomena, such as depression or recovery, through standardized, well-defined questions (variables) paired with sets of allowable responses (values), which are applied in the same way across studies or trials.

CDEs provide a way to standardize data collection, ensuring that data are collected consistently and that otherwise-avoidable variability is minimized.
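To make the "variable plus allowable values" idea concrete, here is a minimal Python sketch of how a single CDE might be modeled and enforced at data entry. The class name, the `validate` method, and the example element (`smoking_status`, its prompt, and its values) are all hypothetical illustrations, not the NIH CDE Repository's actual schema or API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CommonDataElement:
    """Hypothetical model of a CDE: one standardized question (variable)
    paired with its set of allowable responses (values)."""
    variable: str                         # machine-readable variable name
    prompt: str                           # question text shown to respondents
    permissible_values: tuple[str, ...]   # the only responses accepted

    def validate(self, response: str) -> str:
        """Return the response unchanged if allowed; otherwise raise ValueError."""
        if response not in self.permissible_values:
            raise ValueError(
                f"{response!r} is not an allowable value for {self.variable}"
            )
        return response


# Illustrative CDE; the name, prompt, and values are hypothetical.
smoking_status = CommonDataElement(
    variable="smoking_status",
    prompt="Have you smoked at least 100 cigarettes in your entire life?",
    permissible_values=("Yes", "No", "Unknown"),
)

# Two studies that use the same CDE produce records that can be pooled directly.
study_a = [smoking_status.validate(r) for r in ["Yes", "No"]]
study_b = [smoking_status.validate(r) for r in ["No", "Unknown"]]
pooled = study_a + study_b
```

Rejecting anything outside the permissible values (for example, a free-text "Y") at collection time is what removes the otherwise-avoidable variability between studies.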

Where possible, CDEs are linked to controlled vocabularies and terminologies commonly used in health care, such as SNOMED CT and LOINC, which lets CDEs provide a route to harmonize with non-prospective clinical research designs. Such links draw on common data entities, like the clinical concepts underlying common data models, to align evidence from clinical studies with ‘real-world data’ such as electronic health records (EHRs), mobile and wearable devices, and patient-reported outcomes; evidence derived from such data has become known in recent years as ‘real-world evidence’.

Importance of CDEs for Interoperability and Consistency of Evidence Across Settings

FAIR Data Principles (Source: National Institute of Environmental Health Sciences)

NIH’s response to the COVID-19 pandemic highlighted the importance of developing CDEs that can be used and endorsed across NIH-funded COVID-19 research so that the resulting, urgently needed data would be FAIR: Findable, Accessible, Interoperable, and Reusable.

Many groups across NIH have identified, or are in the process of identifying, CDEs that are both COVID-19-related and relevant to the needs of specific research efforts, such as NIH’s Disaster Research Response (DR2) program and the Rapid Acceleration of Diagnostics-Underserved Populations (RADx-UP) and Researching COVID to Enhance Recovery (RECOVER) initiatives. NIH also needed a process for indicating its endorsement of CDEs that meet meaningful criteria and are made available through a common discovery platform (the NIH CDE Repository), without duplicating the functions of resources that already exist.

NIH’s Scientific Data Council charged the CDE Governance Committee (Governance Committee), a group of members of the NIH CDE Task Force, with developing this endorsement process based on the following criteria:

  • Clear definition of variable/measure with prompt and response
  • Documented evidence of reliability and validity, where applicable
  • Human- and machine-readable formats
  • Recommended/designated by a recognized NIH body (Institute, Center, Office, Program/Project Committee, etc.)
  • Clear licensing and intellectual property status (Creative Commons or open-source licensing preferred)

The role of the Governance Committee is to ensure that evidence of acceptability, reusability, and validity is properly presented and documented.

Submission of CDEs for Endorsement

The Governance Committee determined that CDEs will be submitted either as “Individual CDEs” or as “Bundles.” Individual CDEs can be collected on their own. Bundles are groups of questions or variables, each with a specified set of allowable responses, that are grouped together and used as a set. Bundles may include standardized instruments, such as the Patient Health Questionnaire 9 (PHQ-9) Depression Scale, or questions that must be collected together to retain their meaning as individual elements (e.g., demographic features).
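The practical difference between an Individual CDE and a Bundle is that a Bundle is only meaningful when all of its elements are collected together. The minimal Python sketch below expresses that rule as a validation check. The `CommonDataElement` and `Bundle` classes, the `validate` method, and the example "demographics" bundle with its variables and values are hypothetical, chosen for illustration; they do not reflect the NIH CDE Repository's data model or any endorsed CDE.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CommonDataElement:
    """Hypothetical model: a question (variable) plus its allowable responses."""
    variable: str
    permissible_values: tuple[str, ...]


@dataclass(frozen=True)
class Bundle:
    """Hypothetical model of a Bundle: CDEs that must be collected as a set."""
    name: str
    elements: tuple[CommonDataElement, ...]

    def validate(self, record: dict[str, str]) -> dict[str, str]:
        """Accept a record only if every element in the bundle is present
        with an allowable value; a partial record loses the bundle's meaning."""
        expected = {e.variable for e in self.elements}
        missing = expected - set(record)
        if missing:
            raise ValueError(f"{self.name}: missing {sorted(missing)}")
        for e in self.elements:
            if record[e.variable] not in e.permissible_values:
                raise ValueError(f"{self.name}: invalid value for {e.variable}")
        return record


# Illustrative bundle; variable names and values are hypothetical.
demographics = Bundle(
    name="demographics",
    elements=(
        CommonDataElement(
            "ethnicity", ("Hispanic or Latino", "Not Hispanic or Latino", "Unknown")
        ),
        CommonDataElement("sex_at_birth", ("Female", "Male", "Unknown")),
    ),
)

complete = demographics.validate({"ethnicity": "Unknown", "sex_at_birth": "Female"})
```

Submitting only one of the two demographic answers raises an error, which mirrors the idea that a Bundle's elements keep their meaning only as a group. A scored instrument such as the PHQ-9 would follow the same pattern, with each item restricted to its own response scale.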

The Governance Committee will review submissions against the approved endorsement criteria. Once endorsed, Individual CDEs, and possibly Bundles, will be published in the NIH CDE Repository with an endorsement badge.

Reuse of NIH-endorsed CDEs Going Forward

With these governance-endorsed additions, the NIH CDE Repository’s role as a unified resource for common data entities and semantic concepts (the conceptual underpinnings of common data elements themselves) will lay the groundwork for researchers, NIH-funded or otherwise, to plan for interoperable data features. With the endorsement criteria in place and NLM-led efforts to enhance the NIH CDE Repository as an NIH-wide research resource, its role can grow alongside related public- and private-sector alignment efforts. These range from the United States Core Data for Interoperability (USCDI) for routine health care to the Clinical Data Interchange Standards Consortium (CDISC) standards used in FDA submissions for treatments and preventive therapeutics, like vaccines, that we all rely upon for quality care.

The NIH CDE Repository’s features will continue to be enhanced, whether to support searching for semantically related concepts or to highlight subtle distinctions among closely related CDEs. The Repository can also serve as a clearinghouse for interoperability across a broad range of research data, from prospectively designed studies to studies that repurpose data captured in the course of clinical care (such as EHRs) to generate real-world evidence.

In the wake of lessons learned from the most challenging aspects of early COVID-19 research, CDE use can increase FAIR data sharing across the research ecosystem in the near-seamless fashion that legislators envisioned when they enacted the 21st Century Cures Act. CDE governance processes are poised to adapt accordingly and to keep working toward greater data interoperability in the post-COVID-19 pandemic era.

CDE Governance Committee Members: Matt McAuliffe (Center for Information Technology), Kerry Goetz (National Eye Institute), Denise Warzel (National Cancer Institute), Erin Ramos (National Human Genome Research Institute), Jyoti Dayal (National Human Genome Research Institute), Deborah Duran (National Institute on Minority Health and Health Disparities), Janice Knable (National Cancer Institute). Chairs: Carolina Mendoza-Puccini (National Institute of Neurological Disorders and Stroke) and Kenneth Wilkins (National Institute of Diabetes and Digestive and Kidney Diseases). Ex Officio members: Robin Taylor, Mike Huerta, Lisa Federer (National Library of Medicine). Collaborator: Greg Farber (National Institute of Mental Health).

To learn more about the NIH Common Data Elements (CDE) Repository, watch this short video.

Dr. Mendoza-Puccini leads the NINDS Common Data Elements Project and is a Program Officer at the NINDS Division of Clinical Research.

Dr. Wilkins is a member of both the NIH-wide and NIDDK-specific Data Science and Data Management Working Groups and engages with researchers from across intramural and extramural programs on quantitative aspects of design and analysis.

40 Years of Progress: It’s Time to End the HIV Epidemic

Guest post by Maureen M. Goodenow, PhD, Associate Director for AIDS Research and Director, Office of AIDS Research, National Institutes of Health

On June 5th, the National Institutes of Health (NIH) Office of AIDS Research (OAR) joined colleagues worldwide to commemorate the 40th anniversary of the landmark 1981 Centers for Disease Control and Prevention (CDC) Morbidity and Mortality Weekly Report (MMWR) that first recognized the syndrome of diseases later named AIDS. June 5th also marks HIV Long-Term Survivors Awareness Day. 

Forty years ago, the CDC’s MMWR described five people diagnosed with Pneumocystis carinii pneumonia, catalyzing a global effort that led to the identification of AIDS and, later, of the virus that causes it.

Over the years, much of the progress guiding the response to HIV has emerged from NIH-funded research and has helped turn a once-fatal disease into a manageable chronic illness. This progress is attributable in large part to the nation’s longstanding HIV leadership and contributions at home and abroad.

NIH is taking action to recognize the milestones achieved through science, pay tribute to more than 32 million people who have died from AIDS-related illness globally (including 700,000 Americans), and support the goal of Ending the HIV Epidemic in the U.S. (EHE) and worldwide. OAR is coordinating with NIH Institutes, Centers, and Offices (ICOs) to share messaging that will continue through NIH’s World AIDS Day commemoration on December 1, 2021.

The NIH remains committed to supporting basic, clinical, and translational research to develop cutting-edge solutions for the ongoing challenges of the HIV epidemic. The scientific community has achieved groundbreaking advances in the understanding of basic virology, human immunology, and HIV pathogenesis and has led the development of safe, effective antiretroviral medications and effective interventions to prevent HIV acquisition and transmission.

Nevertheless, HIV remains a serious public health issue.

NIH established the OAR in 1988 to ensure that NIH HIV/AIDS research funding is directed at the highest-priority research areas and to facilitate maximum return on the investment. OAR accomplishes its mission in partnership with the NIH Institutes and Centers (ICs) that plan and implement specific HIV programs or projects, with coordination by the NIH HIV/AIDS Executive Committee. As I reflect on our progress against HIV/AIDS, I would like to note the collaboration, cooperation, innovation, and other activities across the NIH ICOs that have accelerated HIV/AIDS research.

Key scientific advances using novel methods and technologies have emerged in the priority areas of the NIH HIV research portfolio. Many of these advances stem from NIH-funded efforts, and all point to important directions for the NIH HIV research agenda in the coming years, particularly in the areas of new formulations of current drugs, new delivery systems, dual use of drugs for treatment and prevention, and new classes of drugs with novel strategies to treat viruses with resistance to current drug regimens.

Further development of long-lasting HIV prevention measures and treatments remains at the forefront of the NIH HIV/AIDS research portfolio.

NIH-funded investigators continue to uncover new details about the virus life cycle, which is crucial for the development of next generation HIV treatment approaches. Additionally, the NIH is focused on developing novel diagnostics to detect the virus as early as possible after infection.

Results expected over the next two years from ongoing NIH-supported HIV clinical trials will have vital implications for HIV prevention, treatment, and cure strategies going forward. For example, two NIH-funded clinical trials for HIV vaccines, Imbokodo and Mosaico, are evaluating an experimental HIV vaccine regimen designed to protect against a wide variety of global HIV strains. These studies are a crucial component of NIH’s efforts to end the HIV/AIDS epidemic.

As we mark four decades of research, I look forward to new advances in prevention and treatment in the years to come.

You can help raise awareness and get involved in efforts to end the HIV epidemic. Visit OAR’s 40 Years of Progress: It’s Time to End the HIV Epidemic webpage and use its toolkit of ready-to-go resources.

Dr. Goodenow leads the OAR in coordinating the NIH HIV/AIDS research agenda to end the HIV pandemic and improve the health of people with HIV. In addition, she is Chief of the Molecular HIV Host Interactions Laboratory at the NIH.