Using Comparative Genomics to Advance Scientific Discoveries

Guest post by Valerie Schneider, PhD, staff scientist at the National Library of Medicine’s National Center for Biotechnology Information, National Institutes of Health.

In a post from earlier this year, A Journey to Spur Innovation and Discovery, I shared news of an exciting NIH-supported NLM initiative, now known as the NIH Comparative Genomics Resource (CGR). CGR, which supports eukaryotic organisms, is modernizing NIH resources and infrastructure to support research involving non-human organisms. This initiative will improve the data foundational to analyses that rely on comparisons of diverse genomes in NLM databases, increase its connectivity to related content, and facilitate the discovery and retrieval of this information. Just as researchers look to the data from these organisms to teach them about a wide range of fundamental biological processes underpinning human health, NLM relies on the research community to help inform the development and delivery of organism-agnostic core tools and interfaces for CGR so that it can best support these analyses.

Stakeholder feedback and engagement are central to the vision and ethos of the NLM Strategic Plan 2017-2027. Since the plan’s inception, NLM enterprises undertaken in support of our three primary goals have placed heavy emphasis on community connections in both their planning and execution. Likewise, understanding stakeholder needs is a fundamental element of CGR. With more than 19,000 genomes from over 8,500 species (excluding bacteria and viruses) found in our Assembly database, it’s clear that CGR’s user base will hail from a large and diverse collection of research organism communities. Within each community, there is diversity in the role CGR will play due to variability in the amount of genomic sequence available, as well as the existence of organism-specific data resources, such as community knowledge bases. Data consumers themselves are a heterogeneous population and represent different levels of research interests, education, bioinformatics expertise, and analysis needs.

CGR is using a multi-tiered and multi-faceted approach to ensure stakeholder requirements are understood and appropriately prioritized throughout the project duration. CGR is working to identify community-supplied genome-related data that can be integrated to enhance content supplied by NLM. Two governance bodies are playing important roles in this effort. A trans-NIH CGR steering committee provides strategic oversight by guiding CGR with respect to the priorities of NIH institutional stakeholders, and an NLM Board of Regents CGR working group is charged with helping engage with the scientific community and enlist them as partners in the development effort. Working group members have expertise in topics relevant to the CGR initiative, such as comparative genomic analysis, emerging large-scale genomics approaches, organism-centered research into general biological or disease processes, biological education, and workforce development.

We are developing a presence for CGR at scientific conferences and workshops to encourage partnerships with members of research communities and connect with attendees. A CGR-related talk given at the BioDiversity Genomics 2021 conference in September introduced a new cloud-based tool for improving genomic quality to be released in 2022 and identified researchers to serve as beta testers. Additional targeted outreach will be held independent of conferences to gather feedback and inform development.

The CGR project utilizes an iterative development process in which user testing is an integral element. Feedback gathered through these testing exercises is incorporated into the next development cycle. This approach ensures we remain engaged with the CGR target audience throughout the project by understanding their needs and providing a resource that is valuable to their research pursuits. For example, recent user testing of a prototype Basic Local Alignment Search Tool (BLAST) database engineered to support sequence queries seeking a broad distribution of organisms in the results taught us about other content that will need to be provided for proper interpretation of results.

NLM is poised to learn great things from our users as part of the CGR project. You can learn more about engagement opportunities by contacting us at info@ncbi.nlm.nih.gov. We value your input as we continue this journey together.

Valerie Schneider, PhD, is the deputy director of Sequence Offerings and the head of the Sequence Plus program. In these roles, she coordinates efforts associated with the curation, enhancement, and organization of sequence data, as well as oversees tools and resources that enable the public to access, analyze, and visualize biomedical data. She also manages NCBI’s involvement in the Genome Reference Consortium, the international collaboration tasked with maintaining the value of the human reference genome assembly.

Imagination: A Process, Not a Moment

Part 3 of a three-part series discussing the importance of imagination. Part 1 is here and part 2 is here.

Over the past two months, I’ve been sharing my ideas about the importance of cultivating imagination to stimulate innovation. Most of this is great fun, and I hope I’ve enticed you to do some of your own daydreaming, and maybe you’ve begun to see some of the impact in your own efforts. Imagination – the ability to envision that which has never been seen, heard, or experienced – is pleasurable, and adds collateral benefits, such as a reduced tendency to interpret unfamiliar stimuli as a threat, and an improved ability to generate novel solutions on the fly. Imagination doesn’t have to end with an inspirational idea. In this post, I’m encouraging you to consider imagination as a partner to help you implement those inspirational ideas and sustain their impact.

Take a look with me through the lens of imagination to see the impact that your imagination can have on the future of technology. Learn how NLM fosters this creative process and how we continue to support health care innovation with our tools and services.

As I reflect on my five years as the NLM Director, I realize that the most important contribution I can make to NLM extends beyond the generation of new ideas. It’s about building in the financial and human resources, as well as the processes to sustain the change envisioned through those new ideas. I need to share my vision with my leadership team and listen to the ideas of our NLM staff. To do this, I need to stimulate imagination in those around me. Novel ideas must also be evaluated for their fit with NLM’s mission. From there, we can create an implementation pathway, identify responsible parties, and develop a plan of action. Along the way, anticipated and unanticipated glitches may occur, and may require that we take a step back, revise, or recommit to the plan. Eventually, streams of ideas become programs that we sustain or sunset; new opportunities abound, and the process starts over again.

Imagination is my companion. Cultivating my own imagination improves my ability to learn from others whose world views differ from my own, recognizing the difference not as a threat, but as an alternative. Imagination helps me envision a range of future states, conducting the mental ‘what if we did . . . .’ exercise and engaging others to join me in that exercise. Imagination-fueled innovation helps me determine whether a lack of ‘fit for the mission’ heralds a need to re-think the innovative idea or a recognition that we must re-examine our mission. And building the skill of imagination augments my practical problem-solving skills so that anticipated and unanticipated glitches can be addressed with creative strategies. Finally, imagination contributes to my (and others’) ability to foresee a future without a familiar and much beloved program, as well as one in which a fledgling program becomes a sustainable core of our enterprise.

One of the practical ways we built the capacity for sustaining innovation into the fabric of NLM was through the creation of the NLM Strategic Plan Implementation Council, led by Mike Huerta, PhD, Director of the Office of Strategic Initiatives and Associate Director of the National Library of Medicine. Mike led the development of the NLM Strategic Plan 2017-2027 and leads our ongoing evaluation of the plan and its implementation. But he doesn’t do this alone – he convened a group of 18 staff from across all divisions and all levels within NLM. Once a month this council meets and gathers information from all areas of NLM regarding how the Strategic Plan is guiding our work. The council systematically examines new projects, raises considerations about modifications that may make the plan more useful to us, and provides a forum for ensuring that the cool ideas envisioned in the Strategic Plan realize their full potential for NLM.

When I began this exploration of imagination and innovation, I found myself focused on the spark, the new idea, the act of innovation. As I have reflected over the weeks, highly engaged with my leadership team in a wide range of efforts addressing our core mission and positioning us towards the future, I realized that imagination unaccompanied by strategies of sustainability was foolhardy for the director of a large organization. Yet still, the move from fostering innovation to sustaining innovation does not require one to abandon the effort to imagine; it requires a continuous refreshing of imagination. This leads not only to the initial innovation but to the myriad steps needed to guide the innovation towards its full contribution.  

So – don’t fear that the value of cultivating imagination ends once the inaugural innovation is envisioned – you’ll need that skill all along the journey!

Innovation through Imagination — Envisioning the Future of Technology-Supported Care

Part 2 of a three-part series discussing the importance of imagination.

I’ve been thinking a lot about imagination lately and how essential it is for stimulating innovative approaches to complex problems. We need innovation in health information technology (health IT) now more than ever with what we’ve been through — a global pandemic, rising calls for eliminating racial biases that contribute to health disparities, wildfires, and other perils. Imagination (the ability to envision what one has never seen, experienced, or heard about) helps transfer the recognition of the power and importance of medical informatics into real innovations that can improve the care of patients and reduce clinician burden.

Enormous patient needs for rapid diagnosis and treatment of unfamiliar and unpredictable diseases increasingly tax an overburdened health care system. Biomedical informatics professionals need to rise to the challenge of systems redesign, new architectures that account for distributed data structures, and the almost insatiable need for information in the moment — decision support under immense urgency and uncertainty. I believe that these new challenges require new ways of action.

In a previous blog post, I encouraged nurses to develop the skill of imagination because it

… stimulates innovation through the experience of a mental what-if, unconstrained by the realities of physics or finance. Imagination is a talent that can be learned and refined over time, benefiting from the reinforcement of envisioning that which might be, and using that vision as a test case for that which can be. 

Imagination expands the human repertoire of planning skills, moving beyond reflexive action and problem solving. Reflexive thought may lead to speedy solutions, and effective problem solving may contribute creative solutions that are responsive to identified constraints. I believe we need to meet tomorrow’s challenges now with solutions that will work into the future – a future that is likely to continue to be characterized by uncertainty and urgency. The future calls for creativity to stimulate innovation through imagination. Imagination may hold the key to devising biomedical informatics solutions that are rigorous enough to be relied upon in life-threatening situations, and robust enough to accommodate team approaches to unpredictable needs for innovative care strategies.

Philosopher Edward Casey recognized two types of imagination: spontaneous and controlled. Both are mental activities, engaging our active consciousness. Spontaneous imagination is characterized by surprise and instantaneity, like the playful stories of children or mental woolgathering while sitting in a beautiful garden. Controlled imagination is a purposeful strategy in which you focus on a specific idea or concept, and use mental powers of reasoning and forethought to anticipate future scenarios. While both types of imagination are important for effective design for biomedical informatics innovation, I am encouraging my colleagues to pay particular attention to growing their capacity for spontaneous imagination.

How does one grow the capacity for spontaneous imagination?

Contrary to the fast-paced, ‘get-it-done’ mindset that has characterized much of past years’ health IT efforts, a measured, slower pace is needed to create the right conditions for spontaneous imagination to emerge. This means intentionally setting aside time, short or long (without distractions or commitments), and placing yourself in a pleasant environment. It’s not necessary to come to this moment with a specific knotty problem or challenge to think through. In fact, such thoughts are likely to hamper the generation of spontaneous ideas. Spontaneous thoughts that may seem far removed from your daily pursuits hold great value in training your mind to attend to new ideas and new fascinations. Avoid appraisals and self-criticism – there are many ways to train our mind to be attentive and aware, and setting aside time, perhaps 2-3 times a week, to just let your mind wander is a great start.

Why am I encouraging what sounds like new-age mantras during a time when we need solutions FAST? I am convinced by the research that cultivating open-ended periods of imagination complements already well-honed mental skills of planning and design. Opening your mind to better connect with what feels creative and interesting increases confidence in judgments about what is relevant in a situation. There is some evidence that spontaneous imagination evokes mental processes similar to meditation and results in improved problem solving and creative solution generation. Nobel laureate Daniel Kahneman advocates that decision makers balance the human tendency to think fast with deliberately thinking slowly to make better decisions. Developing the skill of spontaneous imagination is one way to improve one’s ability to think slow.

Fueling innovation through imagination will improve your ability to recognize nuances and triggers in situations, avoiding the pitfalls of reflexive thinking and expanding the design space. Imagination helps the innovator consider “what if . . .” rather than “how to,” defining the future state before designing the pathway to get there and illuminating consequences not previously recognized. Cultivating imagination increases one’s ability to tolerate uncertainty and to resist both the impulse towards premature closure and the temptation to settle for adequate but potentially less-than-optimal solutions.

NLM does many things to help cultivate imagination-fueled innovation. We provide access to inspirational literature, and through effective use of the features of the My NCBI tool, you can customize your experience based on previous search interests and receive alerts when related articles appear in the biomedical literature. We fund research to discover new ways to help clinicians envision patients’ response to therapeutics. This includes the work of Antonina Mitrofanova, who is developing and sharing, through a web portal, a bioinformatics analytics system that identifies therapeutic resistance and predicts patients at risk of treatment failure. We promote open access to scientific data through our vast genomic and molecular databases, including our Sequence Read Archive, now freely available through commercial cloud services. And, through our Network of the National Library of Medicine, we work to connect communities around the country to research opportunities and trusted health information.

Imagination-fueled innovation will accelerate the design and deployment of biomedical informatics solutions to the challenges of responding to patient needs under increasingly unpredictable and demanding situations, from pandemics to natural disasters. Let’s partner with you to cultivate imagination and be the innovator only you can be!

Making Connections and Enabling Discoverability – Celebrating 30 Years of UMLS

Guest post by NLM staff: David Anderson, UMLS Production Coordinator; Liz Amos, Special Assistant to the Chief Health Data Standards Officer; Anna Ripple, Information Research Specialist; and Patrick McLaughlin, Head, Terminology QA & User Services Unit.

Shortly after Donald A.B. Lindberg, MD was sworn in as NLM Director in 1984, he asked “What is NLM, as a government agency, uniquely positioned to do?” Through conversations with experts, Dr. Lindberg identified a looming question in the field of bioinformatics — How can machines act as if they understand biomedical meaning? At the time, the information necessary to answer this question was distributed across a variety of resources. Very few publicly available tools for processing biomedical text had been developed. NLM had experience with terminology development and maintenance (MeSH – Medical Subject Headings), coordinating distributed systems (DOCLINE), and distributing and providing access to large datasets (MEDLINE) in an era when this was a challenge.

As a national library, NLM was deeply interested in providing good answers to biomedical questions. For these reasons, NLM was uniquely positioned to develop a system — the Unified Medical Language System (UMLS) — that could lay the groundwork for machines to act as if they understand biomedical meaning. This year marks the 30th anniversary of the release of the first edition of the UMLS in November 1990.

Achieving the Unified Medical Language System

The result of a large-scale, NLM-led research and development project, the UMLS began with the audacious goal of helping computer systems behave as if they understand the meaning of the language of biomedicine and health. The UMLS was expected to facilitate the development of systems that could retrieve, integrate, and aggregate conceptually related information from disparate electronic sources such as literature databases, clinical records, and databanks despite differences in the vocabularies and coding systems used within them, and in the terminology employed by users.

Betsy Humphreys (left) and Dr. Lindberg (right) tout the release of the Unified Medical Language System in 1990.

Under the direction of Dr. Donald Lindberg; Betsy Humphreys, then Deputy Associate Director for Library Operations; and a multidisciplinary, international team from academia and the private sector, the UMLS evolved into an essential tool for enabling interoperability, natural language processing, information retrieval, machine learning, and other data science use cases.

UMLS Knowledge Sources

Central to the UMLS model is the grouping of synonymous names into UMLS concepts and the assignment of broad categories (semantic types) to all those concepts. Since the first release in 1990, NLM has continued to expand and update the UMLS Knowledge Sources based on feedback from testing and use.

The UMLS Metathesaurus was the first biomedical terminology resource organized by concept, and its development had a significant impact on subsequent medical informatics theory and practice. The broad terminology coverage, synonymy, and semantic categorization in the UMLS, in combination with its lexical tools, enable its primary use cases:

  • identifying meaning in text,
  • mapping between vocabularies, and
  • improving information retrieval.

The growth in UMLS use over the past decade reflects broad developments in health policy, including the designation of SNOMED CT, LOINC, and RxNorm (three component vocabularies included in the UMLS Metathesaurus) as U.S. national standards for clinical data for quality improvement payment programs such as CMS’s Promoting Interoperability Programs (previously known as Meaningful Use). Many UMLS source vocabularies are also referenced in the United States Core Data for Interoperability (USCDI). Researchers continue to rely on the UMLS as a knowledge base for natural language processing and data mining. The UMLS community of users has developed several tools that enhance and expand the capabilities of the UMLS.

Celebrating 30 Years

Thirty years after the initial release of the UMLS Knowledge Sources, the UMLS resources continue to be of benefit to millions of people worldwide. The UMLS is used in NLM flagship applications such as PubMed and ClinicalTrials.gov. Additionally, some researchers and system developers use the UMLS to build or enhance electronic resources, clinical data warehouses, components of electronic health record systems, natural language processing pipelines, and test collections. UMLS resources are being used primarily as intended, to facilitate the interpretation of biomedical meaning in disparate electronic information and data in many different computer systems serving scientists, health professionals, and the public.

The Journal of the American Medical Informatics Association is commemorating the 30th UMLS anniversary with a special focus issue dedicated to the memory of Dr. Lindberg (1933–2019) that also includes information on current research and applications, broader impacts, and future directions of the UMLS.

Upon her retirement from NLM in 2017, Betsy Humphreys remarked that “systems that get used, get better.” As the UMLS enters its fourth decade, a review of UMLS production methods and priorities is underway, guided by the same high standards and goals with which the project started – trailblazing into the future to improve biomedical information storage, processing, and retrieval.

As we reflect on this important milestone, we want to thank stakeholders, like you, who have provided feedback over the years to help us make the UMLS leaner, stronger, and more useful.

Top row: David Anderson, UMLS Production Coordinator and Liz Amos, Special Assistant to the Chief Health Data Standards Officer

Bottom Row: Anna Ripple, Information Research Specialist and Patrick McLaughlin, Head, Terminology QA & User Services Unit

Asking the right questions and receiving the most useful answers

Guest post by Lawrence M. Fagan, Associate Director (retired), Biomedical Informatics Training Program, Stanford University.

As online resources proliferate, it becomes harder to figure out which resources—and which parts of those resources—will best answer patients’ questions about their medical care. Patients have access to multiple websites that summarize information about a particular disease, myriad patient communities, and many online research databases.

This resource overload problem isn’t new, however. As more and more data became available in the Intensive Care Units of the 1980s, it grew increasingly difficult to determine the most important measurement to track for optimal care.

In short, more information isn’t necessarily the best solution when it comes to answering patient questions.

After a career in informatics, I now moderate an online community for patients with a particular subtype of lymphoma. Many questions that arise in the group can easily be answered by reviewing existing online content. Health librarians are excellent resources to help guide patients to the correct resources and articles.  However, some queries are less straightforward than others, such as: “What is the one thing you wished you knew before the procedure?” Rather than asking for a recitation of the steps in the procedure, this question is asking what was unexpected—or what step would have benefited from more patient or caregiver preparation. Correspondingly, these types of questions are hard to organize, store, and retrieve from patient-oriented databases.

Sometimes, the community of patients can recognize patterns that escape the notice of medical providers. For example, a lymphoma patient may complain of repeated sinus infections. It’s worth noting that patients often turn to their primary care provider to treat their sinus infections, and those visits may lead to antibiotic prescriptions. In this scenario, group members have pointed out the potential link between treatment with the drug Rituximab and a decrease in the body’s immunoglobulin levels. This connection leads to suggestions to explore an alternative treatment for chronic sinus infections (in this special context) using immunoglobulin replacement therapy rather than antibiotics.

Specialized online communities can also provide help with detailed care issues, including the treatment of side effects for uncommonly used drugs with which local healthcare providers might not be familiar.

Online communities can also suggest researching databases to answer patient questions. ClinicalTrials.gov helps locate experimental treatments for specific medical conditions. Some community discussions about trials go beyond what’s included in the ClinicalTrials.gov database. For instance, group members may discuss the optimal order of clinical trials in a specific medical area, based on an analysis of the inclusion criteria for the various trials. In addition, there are ancillary questions about trial logistics that aren’t found in the database, such as, “I live in the San Francisco area—is it feasible to participate in Trial X at City of Hope in Southern California?” Setting up comprehensive links between the clinical databases and discussions in patient communities would help patients access the answers to their questions more efficiently.

The answers to these specialized questions are often found in the archives of online communities or in the memories of group participants. Yet, it is not easy to find the right community for a particular medical problem, and to my knowledge there is no central repository of links to online communities. Moreover, while many community links can be found in MedlinePlus, static links to community websites often become stale, as sites may migrate locations over time. Some of the ACOR cancer communities, for example, have migrated to SmartPatients.com. As patients find a community of interest, it is important that they determine whether the conversations are ongoing and whether the participants are knowledgeable and supportive. The Mayo Clinic offers a short discussion detailing the pros and cons of support groups.

Researchers have examined patient and clinician information needs for more than a quarter century. These models, however, have only rarely been incorporated into information retrieval systems. One successful example (aimed at providers) is the use of “clinical queries” in PubMed, designed for searching the scientific literature. This brings us to a critical question: What would it take to reengineer the patient-oriented retrieval systems so that these focused queries drive most patient sites?

For now, we have communities of patients and dedicated professionals who are ready and willing to help point to the most useful answers.

Please note: The mention of any commercial or trade name is for information and does not imply endorsement on the part of the author or the National Library of Medicine.

Many thanks to Dave deBronkart, Janet Freeman-Daily, Robin Martinez, Tracie Tavel, and Roni Zeiger who reviewed earlier versions of this blog post.

Lawrence Fagan, MD, PhD, retired in 2012 from his role as Associate Director of the Stanford University Biomedical Informatics Training Program. He is a Fellow of the American College of Medical Informatics. His current interests are in patient engagement, precision health, and preventing medical errors.