Using Comparative Genomics to Advance Scientific Discoveries

Guest post by Valerie Schneider, PhD, staff scientist at the National Library of Medicine’s National Center for Biotechnology Information, National Institutes of Health.

In a post from earlier this year, A Journey to Spur Innovation and Discovery, I shared news of an exciting NIH-supported NLM initiative, now known as the NIH Comparative Genomics Resource (CGR). CGR, which supports eukaryotic organisms, is modernizing NIH resources and infrastructure to support research involving non-human organisms. This initiative will improve the data foundational to analyses that rely on comparisons of diverse genomes in NLM databases, increase its connectivity to related content, and facilitate the discovery and retrieval of this information. Just as researchers look to the data from these organisms to teach them about a wide range of fundamental biological processes underpinning human health, NLM relies on the research community to help inform the development and delivery of organism-agnostic core tools and interfaces for CGR so that it can best support these analyses.

Stakeholder feedback and engagement are central to the vision and ethos of the NLM Strategic Plan 2017-2027. Since the plan’s inception, NLM enterprises undertaken in support of our three primary goals have placed heavy emphasis on community connections in both their planning and execution. Likewise, understanding stakeholder needs is a fundamental element of CGR. With more than 19,000 genomes from over 8,500 species (excluding bacteria and viruses) found in our Assembly database, it’s clear that CGR’s user base will hail from a large and diverse collection of research organism communities. Within each community, the role CGR plays will vary with the amount of genomic sequence available and with the existence of organism-specific data resources, such as community knowledge bases. Data consumers themselves are a heterogeneous population, representing different levels of research interest, education, bioinformatics expertise, and analysis needs.

CGR is using a multi-tiered and multi-faceted approach to ensure stakeholder requirements are understood and appropriately prioritized throughout the project duration. CGR is working to identify community-supplied genome-related data that can be integrated to enhance content supplied by NLM. Two governance bodies are playing important roles in this effort. A trans-NIH CGR steering committee provides strategic oversight by guiding CGR with respect to the priorities of NIH institutional stakeholders, and an NLM Board of Regents CGR working group is charged with helping engage with the scientific community and enlist them as partners in the development effort. Working group members have expertise in topics relevant to the CGR initiative, such as comparative genomic analysis, emerging large-scale genomics approaches, organism-centered research into general biological or disease processes, biological education, and workforce development.

We are developing a presence for CGR at scientific conferences and workshops to encourage partnerships with members of research communities and connect with attendees. A CGR-related talk given at the BioDiversity Genomics 2021 conference in September introduced a new cloud-based tool for improving genomic quality to be released in 2022 and identified researchers to serve as beta testers. Additional targeted outreach will be held independent of conferences to gather feedback and inform development.

The CGR project utilizes an iterative development process in which user testing is an integral element. Feedback gathered through these testing exercises is incorporated into the next development cycle. This approach ensures we remain engaged with the CGR target audience throughout the project by understanding their needs and providing a resource that is valuable to their research pursuits. For example, recent user testing of a prototype Basic Local Alignment Search Tool (BLAST) database engineered to support sequence queries seeking a broad distribution of organisms in the results taught us about other content that will need to be provided for proper interpretation of results.

NLM is poised to learn great things from our users as part of the CGR project. You can learn more about engagement opportunities by contacting us at info@ncbi.nlm.nih.gov. We value your input as we continue this journey together.

Valerie Schneider, PhD, is the deputy director of Sequence Offerings and the head of the Sequence Plus program. In these roles, she coordinates efforts associated with the curation, enhancement, and organization of sequence data, and oversees tools and resources that enable the public to access, analyze, and visualize biomedical data. She also manages NCBI’s involvement in the Genome Reference Consortium, the international collaboration tasked with maintaining the value of the human reference genome assembly.
