Meet the Next Generation of Leaders Advancing Data Science and Informatics at NLM

Guest post by Virginia Meyer, PhD, Training Director for the Intramural Research Program, National Library of Medicine, National Institutes of Health.

Working at NLM means being at the forefront of innovation in the rapidly evolving fields of data science and informatics. Within that environment, the NLM Intramural Research Program (IRP) is dedicated to supporting individuals looking to develop and apply computational approaches to a broad range of problems in biomedicine, molecular biology, and health.

NLM understands that contributions from people of diverse backgrounds, cultures, and histories enable research that has the greatest impact and reaches the widest possible audience. Such a workforce is necessary to drive innovation and scientific advancement and is imperative to ensuring that computational tools and data sets are free from bias. To that end, the Diversity in Data Science and Informatics (DDSI) Summer Internship, a program of the NLM IRP now in its inaugural year, was developed to support and engage young scientists who are dedicated to careers in computational biology and biomedical informatics. It is our hope that time spent in the DDSI program and with Principal Investigators (PIs) will encourage trainees to continue along the path toward becoming leaders in their chosen fields.

Meet four of this year’s DDSI interns and learn about the work they are doing in the NLM IRP!

Will Hibbard
Graduate Student in Biomedical Informatics
University at Buffalo

PI: Olivier Bodenreider, MD, PhD, Computational Health Research Branch, Lister Hill National Center for Biomedical Communications at NLM
Research Area: Natural Language Processing

What interested you most about the DDSI program?
I found out about the program when a teacher recommended it to me out of the blue, and after looking into it, I found a lot of fun research projects I could join. The program offered an opportunity to join research projects in familiar and unfamiliar fields. Ultimately, it was pleasantly outside of my comfort zone and presented the kind of challenge that makes me love research.

What research project are you working on and why?
I ended up working with Dr. Olivier Bodenreider using neural networks to better develop natural language processing in medical databases. I applied to this project because it involved two areas in which I had less experience: ontology and data structures. I pursued this research area because it allowed me the chance to improve in fields that I did not understand well at the time.

Why might someone want to apply to the DDSI program in the future?
This is the kind of experience with challenges that allow you to grow as a person and as a professional. Whether you know the area of research well or have trouble understanding it, this program will give you an opportunity to learn through a practical research project.

What is next for you after you complete your internship?
I will be taking a gap year while I apply to medical school. I am hoping to work in my local oncology institute and medical corridor.

MG Hirsch
PhD Student in Computer Science
University of Maryland, College Park

PI: Teresa Przytycka, PhD, Computational Biology Branch, National Center for Biotechnology Information at NLM
Research Area: Evolutionary Genomics

What interested you most about the DDSI program?
Evolution of gene expression and modeling different modes of evolution is something that I had yet to explore in my PhD research. I thought a summer program would be perfect to learn about it. It also gives me the opportunity to get a feel for working at the NIH and to see whether I would want to consider the NIH Graduate Partnerships Program.

What research project are you working on and why?
I am evaluating the possibility of different modes of gene expression evolution within a tumor. Previous work in the lab considers different models of gene expression evolution between animal species. Many models of evolution assume neutral evolution, that mutations occur and persist randomly; however, we know that mutations that change phenotypes undergo various selective pressures from the environment. Considering this, previous work, resulting in the software EvoGeneX, has fit computational models using Ornstein-Uhlenbeck processes to evaluate potential divergence of gene expression within fly species. My research project is applying this same concept to cancer tumors. After tumorigenesis, cancer cells rapidly accumulate further mutations and diversify into subclones within the same tumor. Owing to the different sets of mutations, these subclones evolve differently. We can hypothesize then that the evolution of the gene expression of subclones can be modeled using the same computational models.
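
For readers unfamiliar with Ornstein-Uhlenbeck (OU) processes, the short Python sketch below may help. It is not the EvoGeneX implementation and it does not fit anything to data. It only simulates a single trait value (think of one gene's expression level) that fluctuates randomly while being pulled back toward an optimum. The names `simulate_ou`, `theta` (optimum), `alpha` (strength of the pull), and `sigma` (size of the random fluctuations), along with every numeric value, are illustrative assumptions.

```python
import math
import random


def simulate_ou(x0, theta, alpha, sigma, dt, n_steps, rng):
    """Simulate one path of dX = alpha * (theta - X) dt + sigma dW.

    Uses the exact one-step transition of the OU process, so the step
    size dt does not introduce discretization error. Requires alpha > 0.
    """
    decay = math.exp(-alpha * dt)
    # Standard deviation of the random part of a single step.
    step_sd = sigma * math.sqrt((1.0 - decay ** 2) / (2.0 * alpha))
    path = [x0]
    for _ in range(n_steps):
        # The expected next value moves a fraction of the way toward theta.
        mean = theta + (path[-1] - theta) * decay
        path.append(rng.gauss(mean, step_sd))
    return path


def stationary_variance(alpha, sigma):
    """Long-run variance of the process around theta."""
    return sigma ** 2 / (2.0 * alpha)


# Hypothetical values: start far from the optimum and watch the pull-back.
rng = random.Random(0)
path = simulate_ou(x0=5.0, theta=1.0, alpha=2.0, sigma=0.5,
                   dt=0.01, n_steps=5000, rng=rng)
tail = path[1000:]  # discard the initial approach to the optimum
tail_mean = sum(tail) / len(tail)
print(f"long-run mean is about {tail_mean:.2f}; optimum is 1.0")
```

With `alpha` close to zero the pull toward the optimum fades and the trait behaves more like a neutral random walk, which is, loosely speaking, the kind of contrast between modes of evolution described above.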

Why might someone want to apply to the DDSI program in the future?
The DDSI program offers extra speaker talks and networking opportunities.

What is next for you after you complete your internship?
I will be finishing my PhD in computer science at UMD.

Sirisha Koirala
Undergraduate Student in Computer Science
University of Maryland, College Park

PI: Zhiyong Lu, PhD, Computational Biology Branch, National Center for Biotechnology Information at NLM
Research Area: Natural Language Processing and Computational Biology

What interested you most about the DDSI program?
I was most interested in the unique ongoing research projects that students had the opportunity to participate in, which I would not have been able to find at other programs. It was very interesting to learn about the ways that artificial intelligence (AI) could be applied to medical practices, and this stood out to me as medicine and AI are two of my main interests.

What research project are you working on and why?
I am working on AI in the prediction of progression in age-related macular degeneration. In my first year of college, I was on the pre-medicine track; however, while gaining greater exposure, I realized that I have a stronger passion for computer science. Within the field of computer science, I have a particular interest in AI, and this project specifically allowed me to combine both of my interests and backgrounds.

Why might someone want to apply to the DDSI program in the future?
The DDSI program provides students who come from underrepresented backgrounds a chance to gain real hands-on experience. As a student who came from a small, all-women’s university where I did not have access to such opportunities, this program has helped me significantly. I have been able to get the real-world experience I need to help me excel further in my career preparations, and students who are in similar positions should consider applying for this reason.

What is next for you after you complete your internship?
After I complete my internship, I will be starting my second year of college at University of Maryland, College Park where I am pursuing a major in computer science.

Tochi Oguguo
Undergraduate Student in Computer Science and Information Systems
University of Maryland, Baltimore County

PI: Sameer Antani, PhD, Computational Health Research Branch, Lister Hill National Center for Biomedical Communications at NLM
Research Area: Bias in Machine Learning

What interested you most about the DDSI program?
What interests me the most about this program is the amount of experience you gain during the summer. You leave understanding concepts at a higher level and applying lessons to your life outside of research.

What research project are you working on and why?
My research project is about bias in machine learning. By using fair active learning, we teach the machine how to give accurate responses when diagnosing or classifying a dataset or image. Bias is one of the biggest issues in machine learning, especially in health care where inaccurate judgment can be dangerous.

Why might someone want to apply to the DDSI program in the future?
DDSI is a great program for helping students and interns learn more about the career paths available to them and become more resilient people and scientists outside of research.

What is next for you after you complete your internship?
I plan to apply again next summer and keep working in research and machine learning! Also, I will take more classes in information science to help me become a better programmer.

Using Large Datasets to Improve Health Outcomes

Guest post by Lyn Hardy, PhD, RN, Program Officer, Division of Extramural Programs, National Library of Medicine, National Institutes of Health.

Before the advent of algorithms to determine the best way to treat and prevent heart disease, a health care provider looking for best practices for their patients may not have had the resources to find them. Today, health care decision-making for individuals and their health care providers is made easier by predictive and preventive models, which were developed with the goal of guiding the decision-making process. One example is the Patient Level Prediction of Clinical Outcomes and Cost-Effectiveness project led by Columbia University Health Sciences.

These models are created using computer algorithms (a set of rules for problem-solving) based on data science methods that analyze large amounts of data. While computers can analyze facts within the data, they rely on human programming to define what pieces of data or what data types are important to include in the analysis to create a valid algorithm and model. The results are translated into information that health care providers can use to understand patterns and provide methods for predicting and preventing illness. If a health care provider is looking for ways to prevent heart disease, an accurate model might describe methods—like exercise, diet, and mindfulness practices—that can achieve that goal.

Algorithms and models have benefited the world by using data science methods to uncover patterns that guide clinical decisions, but practitioners still need to be mindful of how the data chosen to develop them shape the results. Research has shown that algorithms and models can be misleading or biased if they do not account for population differences like gender, race, and age. These biases, which are the focus of research on algorithmic fairness, can adversely affect the health of underserved populations by failing to give individuals and health care providers information that is specific to, and directly addresses, their diversity. An example of potential algorithmic bias is an algorithm for treating hypertension that does not include varied treatments for women or consider life-related stress or the environment.
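
One simple way to look for this kind of problem is to compare a model's performance across population groups rather than only in aggregate. The Python sketch below is a minimal illustration under stated assumptions: the function name `per_group_rates`, the group labels, and the toy records are all made up for this example and do not come from any NLM-funded project or real data set.

```python
from collections import defaultdict


def per_group_rates(records):
    """Compute accuracy and true positive rate separately for each group.

    records: iterable of (group, y_true, y_pred) with binary 0/1 labels.
    """
    counts = defaultdict(lambda: {"n": 0, "correct": 0, "pos": 0, "true_pos": 0})
    for group, y_true, y_pred in records:
        c = counts[group]
        c["n"] += 1
        c["correct"] += int(y_true == y_pred)
        if y_true == 1:
            c["pos"] += 1
            c["true_pos"] += int(y_pred == 1)
    rates = {}
    for group, c in counts.items():
        rates[group] = {
            "accuracy": c["correct"] / c["n"],
            # Undefined when a group has no positive cases.
            "true_positive_rate": c["true_pos"] / c["pos"] if c["pos"] else None,
        }
    return rates


# Made-up predictions for two hypothetical groups, "A" and "B".
toy_records = [
    ("A", 1, 1), ("A", 1, 1), ("A", 0, 0), ("A", 0, 1),
    ("B", 1, 0), ("B", 1, 1), ("B", 0, 0), ("B", 1, 0),
]
rates = per_group_rates(toy_records)
tpr_gap = rates["A"]["true_positive_rate"] - rates["B"]["true_positive_rate"]
for group, r in sorted(rates.items()):
    print(group, r)
```

In this toy data the model catches every true case in group A but only one in three in group B, a gap that a single overall accuracy figure would hide. A real audit would need far more data, uncertainty estimates, and metrics suited to the clinical question.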

Researchers are focusing on methods to create fair and equitable algorithms and models to provide all populations with the best and most appropriate health care decisions. Researchers in our NLM Extramural Programs analyze this data through NLM funding opportunities that foster scientific inquiry so we better understand algorithmic effects on minority and marginalized populations. Some of those funding opportunities include NLM Research Grants in Biomedical Informatics and Data Science (R01 Clinical Trial Optional) and the NIH Research Project Grant (Parent R01 Clinical Trial Not Allowed).

NLM is interested in state-of-the-art methods and approaches to address problems using large health data sets and tools to analyze them. Specific areas of interest include:

  • Developing and testing computational or statistical approaches to apply to large or merged health data sets containing human and non-human data, with a focus on understanding and characterizing the gaps, errors, biases, and other limitations in the data or inferences based on the data.
  • Exploring approaches to correct these biases or compensate for missing data, including introducing debiasing techniques and policies or using synthetic data.
  • Testing new statistical algorithms or other computational approaches to strengthen research designs using specific types of biomedical and social/behavioral data.
  • Generating metadata that adequately characterizes the data, including its provenance, intended use, and processes by which it was collected and verified.
  • Improving approaches for integrating, mining, and analyzing health data in a way that preserves that data’s confidentiality, accuracy, completeness, and overall security.

These funding opportunities encourage inquiry into algorithmic fairness to improve health care for all individuals, especially those who are underserved. By using new research models that account for diverse populations, we will be able to provide data that will support the best treatment outcomes for everyone.

Dr. Hardy’s work and expertise focus on using health informatics to improve public health and health care decision-making. Dr. Hardy has held positions as a researcher and academician and is active in national informatics organizations. She has written and edited books on informatics and health care.
