Addressing Social Determinants of Health with FHIR Technology

Guest post by Clem McDonald, MD, Chief Health Data Standards Officer at NLM; Jessica Tenenbaum, PhD, Chief Data Officer for North Carolina’s Department of Health and Human Services; and Liz Amos, MLIS, Special Assistant to the Chief Health Data Standards Officer at NLM.

We all know that whether you get an annual flu shot or smoke affects your health. But nonmedical social and economic factors are also large influences on health. For example, individuals will struggle to control their diabetes if they can’t afford healthy food or are sleeping on the street. Healthy People 2020 describes such circumstances as social determinants of health (SDOH). As our health system shifts to value-based payment models, health care systems are prioritizing outcomes, such as the level of glucose control, rather than how much care is delivered (e.g., the number of visits or tests). To achieve better health outcomes, leading organizations are working to identify and address SDOH needs as well as medical needs.

The North Carolina Department of Health and Human Services (NCDHHS) Healthy Opportunities program identifies four priority domains of non-medical needs that can be detected using the answers to screening questions. Screening for needs in these domains will be a standard operating procedure for all Medicaid beneficiaries as the state transitions its Medicaid program to managed care from fee-for-service. Health care providers will be able to refer individuals to community resources such as food pantries, homeless shelters, transportation services, interpersonal violence counselors, and other services that can address some of these nonmedical needs, and the organizations can then be reimbursed for approved services under Medicaid. A computer-based “closed-loop” referral system will enable the collection of information from social service organizations about the services provided, allowing NCDHHS to facilitate reimbursement, monitor the program, and assess its effectiveness. Electronic systems like the one being used in North Carolina are essential to capturing answers to the SDOH screening questions, triaging individuals to appropriate community resources for intervention, and tracking the effects of those interventions. North Carolina is building a “learning” Department of Health and Human Services, similar to a learning health system, with data collected through services provided used to inform future policy decisions.

The SDOH needs being addressed in North Carolina exist across the country, so there is considerable interest in developing standards-based systems for capturing SDOH data anywhere in the United States without the need for separate development efforts at each stage. A powerful mechanism called Fast Healthcare Interoperability Resources®, or FHIR®, has emerged to enable standardization across a broad spectrum of health care processes. Developed by Health Level Seven International, FHIR is a modern, web-based technology for exchanging health care data that has strong and growing support from various stakeholders in the field of health care, including major electronic health record vendors; the tech industry, including Apple, Microsoft, Google, and Amazon; and federal agencies such as NIH, the Office of the National Coordinator for Health Information Technology, the Centers for Medicare and Medicaid Services, the Food and Drug Administration, and the Agency for Healthcare Research and Quality. NCDHHS is exploring the use of a FHIR-based data-capture tool for collecting SDOH information about nonmedical health needs and delivering the survey results to health care providers who can address the needs identified.

Created in the spirit of collaboration, NLM’s FHIR questionnaire app (an open-source tool that can be used, modified, or incorporated into existing tools by anyone) instantly converts a questionnaire that follows FHIR’s technical specifications into a live web form. It leverages the FHIR standard to collect questionnaire data, and generating a different form is just a matter of feeding the tool a different set of questions. FHIR forms can implement skip logic, the nesting of repeated groups of questions, calculations, validation checks, the prepopulation of questions with answers from the individual’s FHIR medical record, and more. Of course, the same tool can also implement many other kinds of forms for capturing health care data, such as surveys that measure patient-reported outcomes. You can search more than 2,000 available questionnaires in NLM’s FHIR questionnaire demo app. Other NLM-developed, open-source FHIR-based tools for managing health care data are available here.
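To make the idea of "feeding the tool a different set of questions" concrete, here is a minimal sketch in Python of what a FHIR Questionnaire with skip logic might look like. The field names (`linkId`, `type`, `enableWhen`, `answerBoolean`) follow the FHIR R4 Questionnaire resource. The two questions, their `linkId` values, and the helper `enabled_items` are hypothetical illustrations, not the NCDHHS screening content or NLM's implementation, and the helper handles only the `=` operator on boolean answers.

```python
import json

# A hypothetical two-question form. Only the structure follows FHIR R4;
# the wording and linkIds are invented for illustration.
questionnaire = {
    "resourceType": "Questionnaire",
    "status": "draft",
    "title": "Example screening form (illustrative only)",
    "item": [
        {
            "linkId": "food-1",
            "text": "In the past 12 months, did you worry that food would run out?",
            "type": "boolean",
        },
        {
            "linkId": "food-2",
            "text": "Would you like information about local food pantries?",
            "type": "boolean",
            # Skip logic: show this item only if food-1 was answered true.
            "enableWhen": [
                {"question": "food-1", "operator": "=", "answerBoolean": True}
            ],
        },
    ],
}


def enabled_items(form, answers):
    """Return the linkIds of items that should be shown, given answers so far.

    Simplified on purpose: only top-level items, only the "=" operator with
    answerBoolean. An item without enableWhen is always shown; an item with
    conditions is shown when any one of them holds.
    """
    shown = []
    for item in form["item"]:
        conditions = item.get("enableWhen", [])
        if not conditions or any(
            c["operator"] == "=" and answers.get(c["question"]) == c["answerBoolean"]
            for c in conditions
        ):
            shown.append(item["linkId"])
    return shown


if __name__ == "__main__":
    # The JSON printed here is the kind of document a form renderer consumes.
    print(json.dumps(questionnaire, indent=2))
    print(enabled_items(questionnaire, {"food-1": True}))
```

Swapping in a different `item` list would yield a different form without any change to the rendering logic, which is the point the paragraph above makes.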

NLM and NCDHHS have worked together to develop an open-source, FHIR-based implementation of North Carolina’s Healthy Opportunities screening questions (see Figure 1). Anyone with a FHIR-ready server will be able to download the form, enter data, and then route those data to the appropriate health information technology system.
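As a rough illustration of how entered data might be routed to a FHIR server, the Python sketch below packages answers as a FHIR R4 `QuestionnaireResponse` and prepares the HTTP POST that a FHIR "create" interaction uses. The server base URL, the questionnaire URL, the `linkId` values, and both helper functions are assumptions made for this example; nothing is actually sent over the network.

```python
import json
import urllib.request

# Hypothetical FHIR server base URL; replace with a real endpoint.
FHIR_BASE = "https://fhir.example.org/baseR4"


def build_response(questionnaire_ref, answers):
    """Build a minimal QuestionnaireResponse from {linkId: bool} answers."""
    return {
        "resourceType": "QuestionnaireResponse",
        "questionnaire": questionnaire_ref,
        "status": "completed",
        "item": [
            {"linkId": link_id, "answer": [{"valueBoolean": value}]}
            for link_id, value in answers.items()
        ],
    }


def build_create_request(resource, base=FHIR_BASE):
    """Prepare (but do not send) the HTTP POST that would create the resource."""
    return urllib.request.Request(
        url=f"{base}/{resource['resourceType']}",
        data=json.dumps(resource).encode("utf-8"),
        headers={"Content-Type": "application/fhir+json"},
        method="POST",
    )


if __name__ == "__main__":
    response = build_response(
        "https://example.org/Questionnaire/sdoh-example",  # hypothetical URL
        {"food-1": True, "food-2": False},
    )
    request = build_create_request(response)
    print(request.get_method(), request.full_url)
    # Sending is left out on purpose; with a real server one would call
    # urllib.request.urlopen(request) and handle authentication and errors.
```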

Let’s get to work screening patients broadly while minimizing clinical documentation burdens through the use of standardized application programming interfaces!

 

Figure 1: North Carolina Department of Health and Human Services (NCDHHS)’s Social Determinants of Health (SDOH) Screening Form as a live FHIR Questionnaire demo.

 


Clem McDonald, MD

Clem McDonald, MD, is the Chief Health Data Standards Officer at NLM. In this role, he coordinates standards efforts across NLM and NIH, including the FHIR interoperability standard and vocabularies specific to clinical care (LOINC, SNOMED CT, and RxNorm). Dr. McDonald developed one of the nation’s first electronic medical record systems and the first community-wide clinical data repository, the Indiana Network for Patient Care. Dr. McDonald previously served 12 years as Director of the Lister Hill National Center for Biomedical Communications and as scientific director of its intramural research program.

Jessica Tenenbaum, PhD

Jessica Tenenbaum, PhD, is the Chief Data Officer for North Carolina’s Department of Health and Human Services. In this role, Dr. Tenenbaum is responsible for the development and oversight of departmental data governance and strategy to enable data-driven policy for improving the health and well-being of North Carolinians. Dr. Tenenbaum is also an Assistant Professor in Duke University’s Department of Biostatistics and Bioinformatics. Dr. Tenenbaum is a member of the Board of Directors for the American Medical Informatics Association and serves on the Board of Scientific Counselors for NLM.

Liz Amos, MLIS

Liz Amos, MLIS, is Special Assistant to the Chief Health Data Standards Officer at NLM. She is a graduate of the University of Tulsa and the University of Oklahoma.

The Healing Nature of NLM’s Herb Garden

Guest post by Kathryn McKay, writer-editor at the National Library of Medicine 

Stressed?

Perhaps the scent of lavender or the sight of flowers could soothe you.

That’s what a group of gardeners have discovered while tending to the NLM Herb Garden. “When the herbs grasp your soul, you can’t just walk past them,” says Pat Keeney, who helped the garden bloom into what it is today. 

Volunteers from the Montgomery County Master Gardeners care for the more than 125 different herbs, right in front of the Library. Every Monday morning, about a dozen women and men plant and prune herbs and yank weeds. 

Between wiping sweat off their brows and sipping lavender tea, a few of the master gardeners told stories about their love affair with NLM’s Herb Garden. Each of the gardeners has a health story, whether as NIH patients, employees, or plant medicine historians. Started in 1976, the garden began as a way to highlight the healing power of nature.

In the 1980s, when then-NIH employee Pat stumbled upon it, the garden had few plants: lavender, thyme, boxwood, a few snapdragons. So Pat started caring for them. She recruited friends to help, with varying degrees of success. When her friends started retiring, recruitment got easier. Now, she says, they are “luxuriating in gardeners.”

Summertime in the NLM Herb Garden.
Photo by Karen Kim

One of those gardeners is Jeanne Weiss. In 2014, Jeanne was diagnosed with pheochromocytoma, a rare condition in which tumors, usually noncancerous, develop in the adrenal gland. This diagnosis, along with Jeanne’s Cushing’s syndrome, led her to the NIH Clinical Center, where she received care for six weeks. This world-renowned research hospital provides care for people with rare and unusual diseases, mysterious illnesses, and health conditions that are under clinical investigation at NIH.

“I got the best care in the world. NIH saved my life,” Jeanne says. As a way of giving back, she started volunteering in the garden. Years later, that’s not what keeps her coming back. Jeanne volunteers to enjoy the plants and share her research into the history of the herbs.

She turns to a Lenten rose and explains how it was once used as a method of chemical warfare. “The Greeks put the roots in the water supply, which made people so ill, they couldn’t fight,” she says.

Holding up a leaf of the betony plant, Jeanne says with a wink and a smile that, according to folklore, “if you put a leaf in each nostril, one under your tongue, and a leaf in each hand and under each foot, you might cure your insomnia but not just any kind of insomnia—the kind you get from heartbreak.”

The NLM Herb Garden in front of the National Library of Medicine.
Photo by Kathryn McKay

Gardener Selma Deleon enjoys unearthing trivia on women’s health. “Lady’s mantle appealed to me because of its beauty, but it also amuses me,” she says. “It was called a ‘woman’s best friend’ because it was thought to stimulate the uterus, restore ‘feminine beauty’ after breastfeeding, and more.” She mentions how black cohosh was thought to minimize menopausal night sweats and hot flashes and how mountain mint brewed into a tea was drunk to cure menstrual disorders.

Selma sees the “circle of life in the garden” and “the joy of starting something and seeing its growth.”

Sandy Occhipinti understands. She says, “It’s therapeutic to work in and sit in a garden.” Sandy remembers a morning when a young patient from Peru needed exactly this kind of therapy.

As Sandy recalls, the Peruvian girl came to the garden with her father and her nurse. Because her immune system was compromised, this girl couldn’t play with her peers, but she could touch and smell the herbs. Sandy says, “The garden is a respite for so many different people of different nationalities.”

Sandy’s statement is as true for NLM’s staff as it is for visitors. We come from all over the world and we provide access to health resources used by people all over the world.

Thanks to the volunteers who care for it, the NLM Herb Garden provides a sanctuary for us all to relax and rejoice in the healing power of Mother Nature.

Kathryn McKay, MA, is a writer-editor at NLM. She is a graduate of the University of Delaware and Johns Hopkins University.

 

Biomedical Discovery through SRA and the Cloud

Guest post by Jim Ostell, PhD, Director of the National Library of Medicine’s National Center for Biotechnology Information, National Institutes of Health.

NLM’s Sequence Read Archive (SRA) is used by more than 100,000 researchers every month, and those researchers now have a tremendous new opportunity to query this database of high-throughput sequence data in new ways for novel discovery: via the cloud. NLM has just finished moving SRA’s public data to the cloud, completing the first phase of an ongoing effort to better position these data for large-scale computing.  

To understand the importance of this move, it’s helpful to consider the analogy of how humans slowly improved their knowledge of the surface of the Earth.

The first simple maps allowed knowledge of terrain to be passed from people who had been there to those who hadn’t. Over the centuries, we learned to sail ships over the oceans and capture new knowledge in navigation charts and techniques. And we learned to fly airplanes over an area of interest and automatically capture detailed images of not only terrain, but also buildings and reservoirs, and assess the conditions of forest, field, and agricultural resources.

Today, with Earth-orbiting satellites, we no longer need to determine in advance what we want to view. We just photograph the whole Earth, all day, every day, and store all the data in a big database. Then we mine the data afterward. The significant change here is that not only can we follow, in great detail, locations or features on the Earth that we already know we’re interested in, as in aerial photography, but we can also discover new things of interest. Examples abound: for instance, noticing a change in a military base, and going back in time to see when the change began or how it developed; or seeing a decline in a forest or watershed, going back in time to see how this decline developed, and then looking geographically to see if it’s happening in other places in the world.

Scientists also can develop new algorithms to extract information from the corpus, or collection, of information. For example, archeologists looking for faint straight-line features indicative of ancient walls or foundations can apply new algorithms to the huge body of existing data to suddenly reveal ancient buildings and cities that were previously unknown.

DNA sequencing has had a similar history, starting from the laborious sequencing of tiny bits of known genomes that could be analyzed by eye (like hand-drawn maps), to the targeting of specific organism genomes to be completely sequenced and then analyzed (similar to aerial photography), to the modern practice of high-throughput sequencing, in which researchers might sequence an entire bacterial genome to study only one gene because it’s easier and cheaper to just measure the whole thing.

However, the significant difference in this analogy is that the ability to search, analyze, or develop new algorithms to explore the huge corpus of high-throughput sequence data is not yet a routine practice accessible to most scientists — as it is for Earth-orbiting satellite data.

Today, scientists expect to be able to routinely explore the entire corpus of targeted genome sequence data through tools such as NLM’s Basic Local Alignment Search Tool (BLAST); very little of the scientific work with genome data is looking for a specific GenBank record. The major scientific work is done by exploring the data in fast, meaningful ways, asking questions such as “Has anyone else seen a protein like this before?”; “What organism is most like the organism I’m working on?”; “Where else has a piece of sequence like this been collected?”; “Is anything known about the function of a piece of sequence like this?” But it has not been possible to do that for the high-throughput, unassembled sequence data, across all such sequences, because that corpus of data has been too big for all but a few places in the world to hold, or to compute across.
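A question such as "Where else has a piece of sequence like this been collected?" can be illustrated with a toy sketch. The Python below ranks a handful of made-up reads by how many k-mers (length-k substrings) they share with a query sequence. This is not BLAST or any NCBI method; the function names, the read IDs and sequences, and the choice of k = 4 are all assumptions for illustration, and a real search across petabytes of reads would need indexing and distributed computing.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_reads(query, reads, k=4):
    """Rank reads by the number of distinct k-mers shared with the query.

    reads maps a (hypothetical) read ID to its sequence. Reads that share
    no k-mer with the query are dropped; ties are broken by read ID.
    """
    query_kmers = kmers(query, k)
    scores = {rid: len(query_kmers & kmers(seq, k)) for rid, seq in reads.items()}
    hits = [(rid, n) for rid, n in scores.items() if n > 0]
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


if __name__ == "__main__":
    # Invented reads, standing in for unassembled high-throughput data.
    reads = {
        "read-A": "ACGTACGTGG",
        "read-B": "TTTTTTTTTT",
        "read-C": "GGACGTTT",
    }
    for read_id, shared in rank_reads("ACGTAC", reads):
        print(read_id, shared)
```

Even this naive approach has to touch every read, which hints at why computing across the whole corpus has been out of reach for most institutions.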

This is now changing.

With support from the National Institutes of Health (NIH) Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) Initiative, NLM’s National Center for Biotechnology Information (NCBI) has moved the publicly available high-throughput sequence data from SRA onto two commercial cloud platforms, Google Cloud and Amazon Web Services. For the first time in history, it’s now possible for anyone to compute across this entire 5-petabyte corpus at will, with their own ideas and tools, opening the door to the kind of revolution that was sparked by the availability of a complete corpus of Earth-orbiting satellite images.

The public SRA data include genomes of viruses, bacteria, and nonhuman higher organisms, as well as gene expression data, metagenomes, and a small amount of human genome data that is consented to be public (from the 1000 Genomes Project). NCBI has held, and will continue to hold, codeathons to introduce small groups of scientists to exploring these data in the cloud. For example, during a recent codeathon, participants worked with a set of metagenomes to try to identify known and novel viruses. Other upcoming codeathon cloud topics include RNA-seq, pangenomics, haplotype annotation, and prokaryotic annotation.

Now that the publicly available SRA data are in the cloud, the next milestone is to make all of SRA’s controlled-access human genomic data available on both cloud platforms. Providing access to these data requires a higher level of security and oversight than is required for the nonhuman and publicly available human data, and access must be accompanied by a platform for the authentication and authorization of users, which creates a host of other issues to address. This effort is being undertaken in concert with other major NIH human genome repositories, with guidance from NIH leadership, and with international groups such as the Global Alliance for Genomics and Health (GA4GH).

But, already, the publicly available SRA data are there for biological and computational scientists to take their first dive into the new world of sequence-based “Earth-orbiting satellite photography.” More and more — in research, in clinical practice, in epidemiology, in public health surveillance, in agriculture, in ecology and species diversity — we’ve seen the movement to “just sequence the whole thing.” Now we’ve taken the first step toward the necessary corollary: to “analyze it all afterward.”

In the coming weeks and months, NLM will be making further announcements about SRA in the cloud, with tutorials and updates on the availability of controlled-access human data. For those already familiar with operating on commercial clouds who would like a look at the SRA data in the cloud, you can get started today via the updated SRA toolkit.


Dr. Ostell has had a leadership position at NCBI since its inception in 1988. Before assuming the role of Director in 2017, he was Chief of NCBI’s Information Engineering Branch, where he was responsible for designing, developing, building, and deploying the majority of production resources at NCBI, including flagship products such as PubMed and GenBank. Dr. Ostell was inducted into the United States National Academies, Institute of Medicine, in 2007 and made an NIH Distinguished Investigator in 2011.

To stay up to date on NCBI projects and research, follow us on Twitter.


Enhancing Data Sharing, One Dataset at a Time

Guest post by Susan Gregurick, PhD, Associate Director for Data Science and Director, Office of Data Science Strategy, National Institutes of Health

Vision of the NIH Strategic Plan for Data Science

The National Institutes of Health (NIH) has an ambitious vision for a modernized, integrated biomedical data ecosystem. How we plan to achieve this vision is outlined in the NIH Strategic Plan for Data Science, and the long-term goal is to have NIH-funded data be findable, accessible, interoperable, and reusable (FAIR). To support this goal, we have made enhancing data access and sharing a central theme throughout the strategic plan.

While the topic of data sharing itself merits greater discussion, in this post I’m going to focus on one primary method for sharing data, which is through domain-specific and generalist repositories.

The landscape of biomedical data repositories is vast and evolving. Currently, NIH supports many repositories for sharing biomedical data. These data repositories all have a specific focus, either by data type (e.g., sequence data, protein structure, continuous physiological signals) or by biomedical research discipline (e.g., cancer, immunology, or clinical research data associated with a specific NIH institute or center), and often form a nexus of resources for their research communities. These domain-specific, open-access data-sharing repositories, whether funded by NIH or other sources, are good first choices for researchers, and NIH encourages their use.

NIH’s PubMed Central is a solution for storing and sharing datasets directly associated with publications and publication-related supplemental materials (up to 2 GB in size). On the other end of the spectrum, “big” datasets, comprising petabytes of data, are now starting to leverage cloud service providers (CSPs), including through the NIH Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) Initiative. These are still the early days of data sharing through CSPs, and we anticipate that this will be an active area of research.

There are, however, instances in which researchers are unable to find a domain-specific repository applicable to their research project. In these cases, a generalist repository that accepts data regardless of data type or discipline may be a good fit. Biomedical researchers already share data, software code, and other digital research products via many generalist repositories hosted by various institutions—often in collaboration with a library—and recommended by journals, publishers, or funders. While NIH does not have a recommended generalist repository, we are exploring the roles and uses of generalist repositories in our data repository landscape.

NIH Figshare homepage https://nih.figshare.com

For example, as part of our exploratory strategy, NIH recently launched an NIH Figshare instance, a short-term pilot project with the generalist repository Figshare. This pilot provides NIH-funded researchers with a generalist repository option for up to 100 GB of data per user. The NIH Figshare instance complies with FAIR principles; supports a wide range of data and file types; captures customized metadata; and provides persistent unique identifiers with the ability to track attention, use, and reuse.

NIH Figshare is just one part of our approach to understanding the role of generalist repositories in making biomedical research data more discoverable. We recognize that making data more FAIR is no small task and certainly not one that we can accomplish on our own. Through this pilot project, and other related projects associated with implementing NIH’s strategy for data science, we look forward to working with the biomedical community—researchers, librarians, publishers, and institutions, as well as other funders and stakeholders—to understand the evolving data repository ecosystem and how to best enable useful and usable data sharing.

Together we can strengthen our data repository ecosystem and, ultimately, accelerate data-driven research and discovery. We invite you to join our efforts by sending your ideas and needs to datascience@nih.gov.

Susan Gregurick, PhD

Dr. Gregurick leads the NIH Strategic Plan for Data Science through scientific, technical, and operational collaboration with the institutes, centers, and offices that make up NIH. She has substantial expertise in computational biology, high performance computing, and bioinformatics.

Defining the Path Forward for NLM’s New Office of Engagement and Training

Guest post by Amanda J. Wilson, Chief, Office of Engagement and Training, NLM.

During the NLM Board of Regents (BOR) meeting held last week, I had the distinct honor of introducing the new Office of Engagement and Training (OET). This office brings together many of the outreach, training, and capacity-building staff, programs, and services from across the Library.

Since OET was established in June 2019, our team has been occupied with moving into our new space, getting to know one another, exploring the depth and capacity of the resources we have to accomplish our goals, discussing what the future holds for our role in coordinating engagement activities, and reflecting as a team on the niche we fill for NLM. In the midst of this summer flurry of activity and, quite frankly, the more mundane tasks of figuring out the fastest way to answer the door to our offices and the mechanics of mail distribution, some themes surrounding what we can, and hope to, become rose to the top.

Our vision for OET is a resource that will serve the NLM community as a strategic connector between NLM and our audiences, as well as across the Library, as a trusted authority on the NLM experience when engaging with Library resources. We are also an incubator for new approaches to engagement.

What, exactly, does that mean?

It means we understand the broad range of both new and existing NLM users, their needs, and the most effective pathways to reach them. And it also means we are closely connected to NLM researchers, developers, information professionals, program managers, and product owners, including knowing what information is most important to them and has the greatest impact on their work.

This vision also involves knowing how all segments of NLM’s audiences respond to different types of engagement activities. That knowledge will position OET to use our expertise, capabilities, and connections to bring NLM’s trusted resources to communities when and where those resources are needed most. And, considering our unique position, it means we can be a catalyst for exploring novel, effective ways to connect, build, and enhance opportunities for all audiences to engage with NLM.

But that’s not all.

As we started working toward these goals and aspirations, we asked the BOR for advice and thoughts to guide us. For some activities that we currently engage in, such as surveys, webinars, meetings, and exhibits, the BOR provided encouragement for us to continue. The BOR also challenged OET to explore new strategies for engagement, such as working with U.S. Public Health Service Commissioned Corps officers who are part of the Prevention through Active Community Engagement (PACE) program in the Office of the Surgeon General. Another suggestion was to engage in community theater productions to help convey our message.

The possibilities that BOR members provided, as well as input from our colleagues at NLM and other partners, have given OET much to consider as we chart our path forward.

What does this vision of OET mean to you?

I’ve been called corny by one of my colleagues (said with a smile) for my obvious enthusiasm about the future of OET. But I absolutely embrace that sentiment! I’m enthusiastic because I have an opportunity to lead a wonderful team of experienced, knowledgeable colleagues dedicated to our mission. I’m also enthusiastic because OET has the support of NLM leadership and the BOR to continue creating an office that supports NLM’s goals with evidence-based engagement and training, built on collaboration and inclusivity and with an eye to the future.

This is an exciting time, and I look forward to all that we can do together! I invite you to join us along the way.

Amanda J. Wilson is Chief of the NLM Office of Engagement and Training (OET), bringing together general engagement, training, and outreach staff from across NLM to focus on the Library’s presence across the U.S. and internationally. OET is also home to the Environmental Health Information Partnership for NLM and coordinates the National Network of Libraries of Medicine. Wilson first came to NLM when appointed Head, National Network Coordinating Office, in January 2017.

NLM Scientists Contribute to AI for Medical Image Interpretation

Guest post by Sameer Antani, PhD, Staff Scientist, Acting Branch Chief for the Communications Engineering Branch and Computer Science Branch at the National Library of Medicine’s Lister Hill National Center for Biomedical Communications, National Institutes of Health.

Artificial intelligence (AI) has become one of the hottest fields of the 21st century. But AI isn’t a new concept. It’s older than I am!

AI—or, more specifically, machine learning-based automated intelligent decision support—is making inroads in applications that we could only dream about just a few decades ago, such as automated check recognition, movie and video recommendations, and self-driving vehicles.

And in the near term, the role of AI may be as computer-based applications that use data-derived knowledge to support or advance human activities that are tedious, repetitive, and relatively deterministic, especially where expert resources are lacking. In other words, AI may not only help solve budget issues, it may also help reduce boredom.

The idea of an artificial brain was initially promoted by a handful of scientists from different fields, resulting in the founding of AI research as an academic discipline in 1956. After some initial discoveries, and a clearer understanding of the challenges involved, the field lost steam during the last decades of the 20th century. However, advances continued in the form of various statistical pattern recognition and machine learning techniques.

Then, in 2012, a breakthrough in deep learning was published. The image-classification error rate had been cut in half for the ImageNet dataset in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). By 2017 the best AI algorithms were detecting and recognizing objects in photographic images at an impressive accuracy rate of more than 97%, surpassing human performance.

Since then, AI in imaging has become a relatively mature field. But the use of AI in medical imaging continues to challenge us. We need to recognize that much of the field’s success in medical imaging has been within a narrow focus on specific tasks for which AI has been trained, and that this success depends on the data to which AI has been exposed.

Here at NLM, we’ve been working on image informatics research and advancing computational science techniques and information retrieval using traditional machine-learning methods for many years, even before the advent of deep learning. 

Some of AI’s most exciting applications are happening in underserved and under-resourced regions, and imaging-based AI can help fill the gaps where medical expertise may be limited. My fellow NLM scientists and I have applied and contributed to advancing AI techniques to predict tuberculosis and other pulmonary diseases in digital chest X-ray images, screen for malaria parasites in microscopic blood smear images, and detect age-related eye diseases. A recent landmark paper showed that an AI algorithm was superior to human experts in identifying cervical precancer in women.

These findings are consistent with other AI advances in medical imaging reported in the scientific literature, including reading CT scans for lung-cancer screening, detecting brain tumors, screening for diabetic retinopathy, digital pathology applications for precision oncology, and performing radiologist-level pneumonia detection on chest X-rays. While many of these exemplify amazing advances in medical imaging AI, some are built on or have humble beginnings in outcomes from ImageNet’s object localization and recognition challenge.

NLM’s strategic plan for building a platform for data-driven discovery and health guides our research efforts. We’re developing novel AI algorithms; gaining a deeper understanding of AI decision-making (also known as explainable AI); measuring the impact of data variety, volume, and quality; and identifying more ways to address gaps in translating technical advances to have a positive impact on biomedical research and clinical care.

Our research interests also include intelligent ensembles of deep learning networks where each type of network learns something different from the data, and the learned knowledge is then transferred and fused into other sets. This effort is particularly important for rare diseases, where the number of samples in the population tends to be smaller. Unlike humans, AI struggles to learn key patterns from only a few samples. But we’re trying to develop this capacity.

Breakthroughs in modern AI techniques in medical imaging are empowering, but these are still early days. For now, realizing AI’s potential as a smart assistive technology appears more imminent than replacing human expertise with AI.

I continue to dream of a future in which AI makes our lives healthier and our health care delivery more effective.

 

Sameer Antani, PhD

Dr. Antani is a versatile lead researcher advancing the role of computational sciences and automated decision-making in biomedical research, education, and clinical care. His research interests include topics in medical imaging and informatics, machine learning, data science, artificial intelligence, and global health. His primary areas of research and development include cervical cancer, HIV/TB, and visual information retrieval, among others.

Taking Flight: NLM’s Data Science Journey

Guest post by the Data Science @NLM Training Program team.

Data science at NLM is ready to soar!

In 2018, we embarked on a journey to build a workforce ready to take on the challenges of data-driven research and health, and earlier this year we shared our plans for accelerating data science expertise at NLM. Now, it’s time to reflect on our progress and recognize our accomplishments.

Our Data Science @NLM Training Program Open House, held last week, showcased some of the great data science work happening across the Library. We learned from each other and discovered new opportunities to strengthen the Library’s proficiencies in working with data and using analytic tools, furthering NLM’s research practices and services.

Data Science @NLM Poster Gallery

A poster gallery featuring 77 research posters and data visualizations provided a snapshot of the many ways that NLM staff apply data science to their work. It was great to see so many NLM staff sharing their work and engaging in stimulating conversations about innovation.

Three “lightning” presentations gave a glimpse of how NLM staff use data science. NLM Data Science and Open Science Librarian Lisa Federer, PhD, MLIS, talked about building a librarian workforce to engage with researchers on open science and data science. NLM’s Rezarta Islamaj, PhD, and Donald Comeau, PhD, presented their perspectives on enriching gene and chemical links in PubMed and PubMed Central and on evaluating Medical Subject Headings (MeSH) indexing for literature retrieval in PubMed.

The open house was also an opportunity for NLM staff who participated in an intensive 120-hour data science fundamentals course to share what they learned and how they’re applying their new skills.  

But this event was more than a celebration of accomplishments. It provided space to reflect on lessons learned, how to use what we’ve learned on a daily basis, and hopes for the future of data science at NLM. Dina Demner-Fushman, MD, PhD, of NLM dove into data science methodologies in her discussion of the Biomedical Citation Selector (BmCS), a high-recall machine-learning system that identifies articles requiring indexing in journals that MEDLINE indexes selectively.

Data Science @NLM Ideas Booth

NLM staff brainstormed over 60 ideas to bring data science solutions to new and ongoing projects and talked with data science experts at the open house “ideas booth.” Staff also shared how they will learn, or continue to use, data science in support of their individual career goals.

We were delighted to see over 300 NLM staff participating in the open house, which is just one of the ways that NLM is working to achieve goal 3 of the NLM strategic plan to “build a workforce for data-driven research and health.”

The Data Science @NLM Training Program has helped increase NLM staff awareness of and expertise in data science. NLM staff are now better prepared than ever to demonstrate the Library’s commitment to accelerating biomedical discovery and data-powered health.   

Our data science journey continues, as does the growth of the data science community at NLM. For a recap of the day, follow the experience at #datareadynlm.

We’re taking off!


Data Science @NLM Training Program team (left to right):
Dianne Babski, Deputy Associate Director, Library Operations
Peter Cooper, Strategic Communications Team Lead, National Center for Biotechnology Information
Lisa Federer, Data Science and Open Science Librarian, Office of Strategic Initiatives
Anna Ripple, Information Research Specialist, Lister Hill National Center for Biomedical Communications