Using Large Datasets to Improve Health Outcomes

Guest post by Lyn Hardy, PhD, RN, Program Officer, Division of Extramural Programs, National Library of Medicine, National Institutes of Health.

Before the advent of algorithms to determine the best way to treat and prevent heart disease, a health care provider looking for best practices for their patients may not have had the resources to find that best method. Today, health care decision-making for individuals and their health care providers is made easier by predictive and preventive models, which were developed with the goal of guiding the decision-making process. One example is the Patient Level Prediction of Clinical Outcomes and Cost-Effectiveness project led by Columbia University Health Sciences.

These models are created using computer algorithms (a set of rules for problem-solving) based on data science methods that analyze large amounts of data. While computers can analyze facts within the data, they rely on human programming to define what pieces of data or what data types are important to include in the analysis to create a valid algorithm and model. The results are translated into information that health care providers can use to understand patterns and provide methods for predicting and preventing illness. If a health care provider is looking for ways to prevent heart disease, an accurate model might describe methods—like exercise, diet, and mindfulness practices—that can achieve that goal.

Algorithms and models have benefited the world by using data science methods and techniques to uncover patterns that guide clinical decisions, but practitioners still need to be conscious of which data were used to develop them and how that choice shapes the results. Research has shown that algorithms and models can be misleading or biased if they do not account for population differences like gender, race, and age. These biases, the central concern of algorithmic fairness, can adversely affect the health of underserved populations by failing to give individuals and health care providers information that is specific to, and directly addresses, their diversity. An example of potential algorithmic bias is an algorithm for treating hypertension that does not include different treatments for women or consider life-related stress or the environment.
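To make the point about population differences concrete, here is a minimal sketch of one way to check a model's results group by group. The records, group labels, and the `accuracy_by_group` helper are all invented for illustration; they do not come from any NLM-funded project, and real fairness audits typically use richer metrics than plain accuracy.

```python
from collections import defaultdict

# Hypothetical (group, true_label, predicted_label) records, invented for illustration.
records = [
    ("men", 1, 1), ("men", 0, 0), ("men", 1, 1), ("men", 0, 0),
    ("men", 1, 1), ("men", 0, 0), ("men", 1, 1), ("men", 0, 0),
    ("women", 1, 0), ("women", 1, 0), ("women", 0, 0), ("women", 1, 1),
]


def accuracy_by_group(rows):
    """Return overall accuracy and a dict mapping each group to its accuracy."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, predicted in rows:
        total[group] += 1
        correct[group] += int(truth == predicted)
    overall = sum(correct.values()) / sum(total.values())
    per_group = {group: correct[group] / total[group] for group in total}
    return overall, per_group


overall, per_group = accuracy_by_group(records)
print(f"overall: {overall:.2f}")
for group, accuracy in sorted(per_group.items()):
    print(f"{group}: {accuracy:.2f}")
```

In this made-up data the overall figure looks reasonable while the smaller group fares much worse, which is the kind of gap a single aggregate number can hide.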

Researchers are focusing on methods to create fair and equitable algorithms and models to provide all populations with the best and most appropriate health care decisions. Researchers in our NLM Extramural Programs analyze this data through NLM funding opportunities that foster scientific inquiry so we better understand algorithmic effects on minority and marginalized populations. Some of those funding opportunities include NLM Research Grants in Biomedical Informatics and Data Science (R01 Clinical Trial Optional) and the NIH Research Project Grant (Parent R01 Clinical Trial Not Allowed).

NLM is interested in state-of-the-art methods and approaches to address problems using large health data sets and tools to analyze them. Specific areas of interest include:

  • Developing and testing computational or statistical approaches to apply to large or merged health data sets containing human and non-human data, with a focus on understanding and characterizing the gaps, errors, biases, and other limitations in the data or inferences based on the data.
  • Exploring approaches to correct these biases or compensate for missing data, including introducing debiasing techniques and policies or using synthetic data.
  • Testing new statistical algorithms or other computational approaches to strengthen research designs using specific types of biomedical and social/behavioral data.
  • Generating metadata that adequately characterizes the data, including its provenance, intended use, and processes by which it was collected and verified.
  • Improving approaches for integrating, mining, and analyzing health data in a way that preserves that data’s confidentiality, accuracy, completeness, and overall security.

These funding opportunities encourage inquiry into algorithmic fairness to improve health care for all individuals, especially those who are underserved. By using new research models that account for diverse populations, we will be able to provide data that will support the best treatment outcomes for everyone.

Dr. Hardy’s work and expertise focus on using health informatics to improve public health and health care decision-making. Dr. Hardy has held positions as a researcher and academician and is active in national informatics organizations. She has written and edited books on informatics and health care.

Informing Success from the Outside In: Introducing the NLM Board of Regents CGR Working Group

Guest post by Valerie Schneider, PhD, staff scientist at the National Library of Medicine (NLM) National Center for Biotechnology Information (NCBI), National Institutes of Health (NIH), and Kristi Holmes, PhD, Director of Galter Health Sciences Library & Learning Center and Professor of Preventive Medicine at Northwestern University Feinberg School of Medicine.

Last year, in two blog posts, we described how NLM is developing the NIH Comparative Genomics Resource (CGR), a project that offers content, tools, and interfaces for genomic data resources associated with eukaryotic research organisms.

A eukaryote is any single-celled or multicellular organism whose cells contain a distinct, membrane-bound nucleus. Since all eukaryotes likely evolved from a common ancestor, studying them can grant us insight into how other eukaryotes, including humans, work, which makes CGR and its resources that much more important to eukaryotic research.

CGR aims to:

  • Promote high-quality eukaryotic genomic data submission.
  • Enrich NLM’s genomic-related content with community-sourced content.
  • Facilitate comparative biological analyses.
  • Support the development of the next generation of scientists.

Since our last two posts, the team at NCBI has been hard at work making important technical and content updates to CGR’s suite of tools and socializing those tools with the community. For instance, they published new webpages that organize genome-related data by taxonomy, making it available for browsing and immediate download. They also created the ClusteredNR Database, a new database for the Basic Local Alignment Search Tool (BLAST), to provide results with greater taxonomic context for sequence searches, and incorporated into Gene new gene information from the Alliance of Genome Resources, an organization that unites data and information for model organisms’ unique aspects. NCBI is also engaging with genomics communities to understand their needs and requirements for comparative genomics through the NLM Board of Regents Comparative Genomics Working Group.

The working group is lending their perspective and extensive expertise to the project, activities that are essential to CGR’s success and development. We have charged working group members with guiding the development of a new approach to scientific discovery that relies on genomic-related data from research organisms, helping project teams keep pace with changes in the field, and understanding the scientific community’s needs and expectations for key functionalities. To do this, working group members help NLM set development priorities such as exploring CGR’s integration with existing infrastructures and related workforce development opportunities.

Projects like CGR highlight how critical interdisciplinary collaboration is to modern research and how success requires community perspectives and involvement. Working group members will be sharing more information about this project at upcoming conferences and in biomedical literature, and our team at NCBI will also share events and resources through our NIH Comparative Genomics Resource website.

If you are a member of a model organism community, are working on emerging eukaryotic research models, or support eukaryotic genomic data—whether you are a researcher, educator, student, scholarly society member, librarian, data scientist, database resource manager, developer, epidemiologist, or other stakeholder in our progress—we encourage you to reach out and get involved. Here are a few suggestions:

  • Invite us to join you at a conference, teach a workshop, partner on a webinar, or discuss other ideas you may have to foster information sharing and feedback.
  • Use CGR’s suite of tools, share them with colleagues, and send us your feedback.
  • Be on the lookout for project updates and events on the CGR website or follow @NCBI on Twitter.

We’re always excited to get feedback through CGR listening sessions and user testing for tool and resource updates. Email cgr@nlm.nih.gov to learn all the ways you can participate.

Thank you to the members of the NLM Board of Regents CGR Working Group!

Alejandro Sanchez Alvarado, PhD
Executive Director and Chief Scientific Officer
Priscilla Wood Neaves Chair in the Biomedical Sciences
Stowers Institute for Medical Research

Hannah Carey, PhD
Professor, Department of Comparative Biosciences, School of Veterinary Medicine
University of Wisconsin-Madison

Wayne Frankel, PhD
Professor, Department of Genetics & Development
Director of Preclinical Models, Institute of Genomic Medicine
Columbia University Medical Center

Kristi L. Holmes, PhD (Chair)
Director, Galter Health Sciences Library & Learning Center
Professor of Preventive Medicine (Health & Biomedical Informatics)
Northwestern University Feinberg School of Medicine

Ani W. Manichaikul, PhD
Associate Professor, Center for Public Health Genomics
University of Virginia School of Medicine

Len Pennacchio, PhD
Senior Scientist
Lawrence Berkeley National Laboratory

Valerie Schneider, PhD (Executive Secretary)
Program Head, Sequence Enhancements, Tools and Delivery (SeqPlus)
HHS/NIH/NLM/NCBI

Kenneth Stuart, PhD
Professor, Center of Global Infectious Disease Research
Seattle Children’s Research Institute

Tandy Warnow, PhD
Grainger Distinguished Chair in Engineering
Associate Head of Computer Science
University of Illinois Urbana-Champaign

Rick Woychik, PhD (NIH CGR Steering Committee Liaison)
Director, National Institute of Environmental Health Sciences (NIEHS) and the National Toxicology Program (NTP)

Cathy Wu, PhD
Unidel Edward G. Jefferson Chair in Engineering and Computer Science
Director, Center for Bioinformatics & Computational Biology
Director, Data Science Institute
University of Delaware

Dr. Schneider is the deputy director of Sequence Offerings and the head of the Sequence Plus program. In these roles, she coordinates efforts associated with the curation, enhancement, and organization of sequence data and oversees tools and resources that enable the public to access, analyze, and visualize biomedical data. She also manages NCBI’s involvement in the Genome Reference Consortium, which is the international collaboration tasked with maintaining the value of the human reference genome assembly.

Dr. Holmes is dedicated to empowering discovery and equitable access to knowledge through the development of computational and social architectures to support these goals. She also serves on the leadership team of the Northwestern University Clinical and Translational Sciences Institute.

Bridging the Resource Divide for Artificial Intelligence Research

This blog post is by Lynne Parker, Director, National AI Initiative Office and was originally posted on the White House Office of Science and Technology Policy blog. The Office of Science and Technology Policy and the National Science Foundation are seeking comments on the initial findings and recommendations contained in the interim report of the National Artificial Intelligence Research Resource (NAIRR) Task Force (“Task Force”) and particularly on potential approaches to implement those recommendations. We encourage you to read the RFI and submit comments on Implementing Initial Findings and Recommendations of the National Artificial Intelligence Research Resource Task Force by June 30, 2022.

Artificial Intelligence (AI) is transforming our world. The field is an engine of innovation that is already driving scientific discovery, economic growth, and new jobs. AI is an integral component of solutions ranging from those that tackle routine daily tasks to societal-level challenges, while also giving rise to new challenges necessitating further study and action. Most Americans already interact with AI-based systems on a daily basis, such as those that help us find the best routes to work and school, select the items we buy, and let us ask our phones to remind us of upcoming appointments.

Once studied by few, AI courses are now among the most popular across America’s universities. AI-based companies are being founded and scaled at a rapid rate. Worldwide AI-related research publications and patent applications continue to climb. 

However, this growth in the importance of AI to our future and the size of the AI community obscures the reality that the pathways to participate in AI research and development (R&D) often remain limited to those with access to certain essential resources. Progress at the current frontiers of AI is often tied to the use of large volumes of advanced computational power and data, and access to those resources today is too often limited to large technology companies and well-resourced universities. Consequently, the breadth of ideas and perspectives incorporated into AI innovations can be limited, which can lead to the creation of systems that perpetuate biases and other systemic inequalities.

This growing resource divide has the potential to adversely skew our AI research ecosystem, and in the process, threaten our Nation’s ability to cultivate an AI research community and workforce that reflects America’s rich diversity – and harness AI in a manner that serves all Americans. To prevent unintended consequences or disparate impacts from the use of AI, it matters who is doing the AI research and development.

Established in June 2021 pursuant to the National AI Initiative Act of 2020, the National AI Research Resource (NAIRR) Task Force has been seeking to address this resource divide. As a Congressionally-chartered Federal advisory committee, the NAIRR Task Force has been developing a plan for the establishment of a National AI Research Resource that would democratize access to AI R&D for America’s researchers and students. The NAIRR is envisioned as a broadly available and federated collection of resources, including computational infrastructure, public- and private-sector data, and testbeds. These resources would be made easily accessible in a manner that protects privacy, with accompanying educational tools and user support to facilitate their use. An important element of the NAIRR will be the expertise to design, deploy, federate, and operate these resources.

Since its establishment, the Task Force has held 7 public meetings, engaged with 39 experts on a wide range of aspects related to the design of the NAIRR, and considered 84 responses from the public to a request for information (RFI). Materials from all public meetings and responses to the RFI can be found at www.AI.gov/nairrtf.

Today, as co-chair of the Task Force and as part of OSTP’s broader work to advance the responsible research, development, and use of AI, I am proud to announce the submission of the interim report of the NAIRR Task Force to the President and Congress. This report lays out a vision for how this national cyberinfrastructure could be structured, designed, operated, and governed to meet the needs of America’s research community. In the report, the Task Force presents an approach to establishing the NAIRR that builds on existing and future Federal investments; designs in protections for privacy, civil rights, and civil liberties; and promotes diversity and equitable access. It details how the NAIRR should support the full spectrum of AI research – from foundational to use-inspired to translational – by providing opportunities for students and researchers to access resources that would otherwise be out of their reach. The vision laid out in this interim report is the first step towards a more equitable future for AI R&D in America – a future where innovation can flourish and the promise of AI can be realized in a way that works for all Americans.

Going forward, the Task Force will develop a roadmap for achieving the vision defined in the interim report. This implementation roadmap is planned for release as the final report of the Task Force at the end of this year. To inform this work, we are asking for feedback from the public on the findings and recommendations presented in the interim report as well as how those recommendations could be effectively implemented. Public responses to this request for information will be accepted through June 30, 2022. In addition, OSTP and the National Science Foundation will host a public listening session on June 23 to provide additional means for public input. Please see here for more information on how to participate.

If successful, the NAIRR would transform the U.S. national AI research ecosystem by strengthening and democratizing foundational, use-inspired, and translational AI R&D in the United States. The interim report of the NAIRR Task Force being released today represents a first step towards this future, putting forward a vision for the NAIRR for public comment and feedback.

We Can’t Go It Alone!

In February, I received the Miles Conrad Award from the National Information Standards Organization (NISO). NISO espouses a wonderful vision: “. . . a world where all can benefit from the unfettered exchange of information.” As the Director of the National Library of Medicine (NLM), this is music to my ears.

Standards are essential to NLM’s mission! Standards bring structure to information, assure common understanding, and make the products of scientific efforts—including literature and data—easier to discover. NLM’s efforts are devoted to the creation, dissemination, and use of terminology and messaging standards. These efforts include attaching indexing terms to citations in PubMed, our biomedical literature database housing over 34 million citations; using reference models to describe genome sequences; and serving as the HHS repository for the clinical terminologies needed to support health care delivery. NLM improves health and accelerates biomedical discovery by advancing the availability and use of standards. Standards are dynamic tools that must capture the context of biomedicine and health care at a given moment yet reflect the scientific development and changes in community vernacular.

By their very nature, standards create consensus across two or more parties on how to properly name, structure, or label phenomena. No single entity can create a standard all by itself! Standards are effective because they shape the conversation between and among entities, achieving a common goal by drawing on a common representation.

NLM alone cannot create, promulgate, or enforce standards. We work in partnership with professional societies, standards development organizations, and other federal entities, including the Office of the National Coordinator for Health Information Technology, to foster interoperability of clinical data. We support the development and distribution of SNOMED CT (the Systematized Nomenclature of Medicine – Clinical Terms) and the specific extension of SNOMED in the United States. We developed the MeSH (Medical Subject Headings) thesaurus, a controlled vocabulary used to index articles in PubMed. We also support the development and distribution of LOINC (the Logical Observation Identifiers Names and Codes), a common language (that is, a set of identifiers, names, and codes) used to identify health measurements, observations, and documents. Finally, we maintain RxNorm, a normalized naming system for generic and branded drugs and their uses, to support message exchanges across pharmacy management and drug interaction software.

Partnerships help us create and deploy standard ways to make scientific literature discoverable and accessible. To this end, we were instrumental in the adoption of NISO’s JATS (Journal Article Tag Suite), an XML format for describing the content of published articles, which we encourage journals to use when submitting citations to PubMed so users can efficiently search the literature and articles as they are described. MeSH RDF (Resource Description Framework) is a linked data representation of the MeSH vocabulary on the web, and the BIBFRAME (Bibliographic Framework) Initiative—a data exchange format initiated by the Library of Congress—adds MeSH RDF URIs (Uniform Resource Identifiers) to link data that will support complete bibliographic descriptions and foster resource sharing across the web and through the networked world.
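As a rough illustration of why a shared XML format helps, the sketch below reads a title and author list out of a small JATS-style fragment using only Python's standard library. The fragment is a simplified, hypothetical example (the article title and author names are invented), and `extract_citation` is an illustrative helper, not part of any NLM or NISO tool. The element names used here (`article-title`, `contrib`, `surname`, `given-names`) follow the JATS tag set, but a real article carries far more metadata.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical JATS-style fragment; invented for illustration.
JATS_SAMPLE = """
<article>
  <front>
    <article-meta>
      <title-group>
        <article-title>An Example Article Title</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name><surname>Doe</surname><given-names>Jane</given-names></name>
        </contrib>
        <contrib contrib-type="author">
          <name><surname>Roe</surname><given-names>Richard</given-names></name>
        </contrib>
      </contrib-group>
    </article-meta>
  </front>
</article>
"""


def extract_citation(xml_text):
    """Pull the article title and author names out of a JATS-style document."""
    root = ET.fromstring(xml_text.strip())
    title = root.findtext(".//article-title")
    authors = []
    for contrib in root.iterfind(".//contrib[@contrib-type='author']"):
        surname = contrib.findtext("name/surname")
        given = contrib.findtext("name/given-names")
        authors.append(f"{given} {surname}")
    return {"title": title, "authors": authors}


citation = extract_citation(JATS_SAMPLE)
print(citation["title"])
print(", ".join(citation["authors"]))
```

Because every publisher using the format tags titles and contributors the same way, the same few lines can read citations from any conforming journal, which is the practical payoff of agreeing on a standard.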

Standards provide the resources necessary to understand complex phenomena and share scientific insights. Leveraging partnerships in order to develop and deploy these standards both allows efficiencies and produces a more connected, interoperable, understandable world of knowledge. Given the speed at which biomedical knowledge is growing, leveraging these partnerships assures that the institutions charged with acquiring and disseminating all the knowledge relevant to biomedicine and health can successfully and effectively meet their missions.

What is the Role of a Mentor?

Guest post by Karmen S. Williams, DrPH, MBA, Assistant Professor at City University of New York Graduate School of Public Health and Health Policy, and Meera G. Subash, MD, Assistant Professor and Division Quality Officer for the Division of Rheumatology at the University of Texas Health Science Center, McGovern Medical School.

“Everyone, at every point in their career, has the potential to be a mentor as well as [to] seek a mentor. It is the combination of being and doing in mentorship that makes it such a rewarding and important part of a professional career.”

Medical informatics pioneer and NLM Director Patricia Flatley Brennan, RN, PhD, recently spoke these words when she joined us for a special podcast hosted by the American Medical Informatics Association (AMIA)—a crossover episode between For Your Informatics, led by the Women in AMIA Initiative, and ACIF Go-Live, directed by the AMIA Clinical Informatics Fellows.

Bryan McConomy, MD, began our inaugural episode with an introduction to medical informatics, highlighting the early work of Dr. G. Octo Barnett and his team’s development of the MUMPS integrated programming language at Massachusetts General Hospital in the 1960s. Because medical informatics is a relatively young field, we can still look to the trailblazers who first used computers to augment clinical decision-making and improve health care discovery and delivery. We pay homage to the rich tapestry of innovative leaders and educators, such as Homer Warner, MD, PhD; Reed Gardner, PhD; Clement McDonald, MD; Margo Cook, RN; Lawrence Weed, MD; and Edward Shortliffe, MD, PhD, to name a few.

We started the History of Medical Informatics joint podcast series with those two AMIA podcasts with the understanding that we need to connect our past with the present. This ongoing series catalogs this history through the eyes of pioneers in the field of health informatics. By highlighting how historical events merge with contemporary topics of interest in health informatics, we intend to strengthen the bridge for new and upcoming professionals both in and outside of informatics.

In our episode titled “History of Medical Informatics – Mentorship” with Dr. Brennan, we focus on how mentorship was established in a field that, until recently, was virtually nonexistent. Dr. Brennan was not only our first guest on the joint series, but she was also featured in a March 2020 episode of For Your Informatics titled “Training the Next Generation of Informaticians,” which also offers valuable information on mentorship. She has been a full-circle guest by highlighting the past, present, and future of mentorship in health informatics.

Dr. Brennan will also be our keynote speaker at this week’s 2022 AMIA Clinical Informatics Conference, which will give us an opportunity to reflect on the real meaning of mentorship. What is mentorship? How did health informatics pioneers build mentorship in a new field? What is the role of a mentor?

Dr. Brennan recalls some of the best parts of her mentorship experience, including having the freedom to explore, engage with like-minded individuals, establish trust, push boundaries beyond your starting point, and open new doors. Mentors are there for your failures in life, for the deeply embarrassing moments, and to help pick you up when you hit a bump in your career.

However, not all mentorships are created equal. There are some that are lifelong, while some are short term. Some aren’t always mutually beneficial, while others are mutually uplifting. Some mentors come from other fields, while others may be in the same field. The commitment to mentorship may be formalized or just a passing activity.

The style of mentorship can also vary. Some may bring a mentee into a research group to work side by side with them while some may only have periodic conversations. Either way, the mentor must be ready and willing to go through the process.

We’d like to share some wisdom we’ve received over the years: seek out people for a cup of coffee and find someone with whom you can share your successes and challenges. This is important because not all skills are learned in the classroom. For example, academicians need to know how to interpret faculty governance, engage with management, and position research and teaching. Dr. Brennan points out that “these things are difficult to learn on your own, and that’s where mentors can come in.”

The point is that mentorship must be purposeful and built on the trust needed to guide the direction of mentees’ careers and important life choices. It is a decision that should not be taken lightly. Mentorship in any arena is pertinent to career development, but it is especially valuable in groundbreaking fields like health informatics.

What is the best advice you’ve received from a mentor?

Dr. Williams completed a postdoctoral fellowship in public and population health informatics at Indiana University and Regenstrief Institute, where she focused on systemic informatics integration. Dr. Williams serves as the director of AMIA’s For Your Informatics podcast, which features individuals at all career stages to reveal the diverse world of biomedical and health informatics professions. She is a member of the AMIA Diversity, Equity, and Inclusion Committee; Women in AMIA Pathways Subcommittee; and AMIA Dental Informatics Working Group.

Dr. Subash received her undergraduate degree from Stanford University and her medical degree from Texas Tech University Health Sciences Center School of Medicine. She went on to the University of California, San Francisco, to complete both her Rheumatology and Clinical Informatics Fellowships. She is Epic Physician Builder certified, and her interest area is implementing and evaluating health IT and electronic health record tools to improve patient care in rheumatology and ambulatory care.
