Next-Generation Data Science Research Challenges

NIH-funded research is rapidly becoming more and more data-driven. This is true whether that research is intramural or extramural or whether it is focused on solving concrete problems or advancing methodologies for specific domains.

Right now, NLM’s role in this data-driven research centers on developing scalable, sustainable, and generalizable methods for making biomedical data FAIR: Findable, Accessible, Interoperable, and Reusable.

Toward this end, NLM, on behalf of NIH, released a Request for Information last fall on Next-Generation Data Science Challenges in Health and Biomedicine. We sought community input on data science research initiatives that could address the key challenges researchers, clinicians, administrators, and others currently face. We invited suggestions for new data science research in six areas:

  • Data-driven Discovery
  • Data-driven Health Improvement
  • Advanced Data Management
  • Intelligent and Learning Systems for Health
  • Workforce Development and Diversity
  • New Stakeholder Partnerships

Fifty-three responses provided more than 180 pages of ideas and suggestions.

The topic “Data-driven Discovery” prompted input focused on developing methods and tools to help researchers derive insights from data. These suggestions fell into a number of areas particularly relevant to NLM, including help with natural language processing; predictive analytics to help generate hypotheses from hidden patterns; ways to extract and formalize scientific claims and causal statements from publications; and improved ontologies.

Ideas related to improving health through data recommended developing algorithms tied to patient similarity to drive comparative effectiveness research; nuanced characterizations of phenotypes (including severity, degree and certainty); and strategies to address bias in health records used for research purposes.

Suggestions concerning managing data revealed a need to better capture and curate that data. These included smoothly integrating personal data from mobile devices into clinical workflows; automatically assigning standardized metadata to existing data sets and digital files; sharing open-source analytic methods; and developing technological platforms to help scientists store and analyze data.

Ideas for intelligent learning systems ranged widely, from brain science research focused on learning and retention, to approaches for engaging users with health data, to building flexible learning modules.

Many contributors recognized the need to develop a data-skilled workforce, and their suggestions extended beyond simply increasing the number of data scientists. They called for reaching out to high school and undergrad students to equip them earlier with the foundational skills and education that would make training in data science interesting and feasible; creating core informatics and data science skills for all researchers; and infusing the PhD in health informatics with required coursework in computer science and statistics, along with health and biomedicine.

Suggestions for stakeholder partnerships included the names of specific associations, federal agencies, and companies, as well as a shout-out for working more closely with those citizen scientists interested in taking advantage of the growing supply of publicly available health data.

Clearly the research challenges of the future will need strong investments from across NIH.

The Library’s early contributions will be three-fold:

  1. Serve as an honest broker to create a trans-NIH statement of the data science and informatics skills essential for all federally supported trainees (in NIH-funded research training programs located at universities as well as career grants).
  2. Stimulate research in advanced curation and information-integration methods.
  3. Accelerate the development of scalable, reusable, and generalizable visualization tools and analytical approaches.

What else do you think belongs in our portfolio? Chime in below.

Now is your chance to shape the next-generation research agenda for data science.

Request a summary of the responses to the RFI referenced in this post.

Dr. Valerie Florance, co-author of this post, serves as Director of the NLM Division of Extramural Programs. She also coordinates NLM’s informatics training programs.

The Research Ecosystem

On the future of scientific communication

This week I will be part of the research ecosystem panel at the National Academy of Sciences’ Journal Summit. The theme of the summit is “The Evolving Ecosystem of Scientific Publishing.”

The summit encourages audience discussion and debate, so I’m hoping my fellow panelists and I can elicit (or even incite!) lively and fruitful discussions among scientific editors, nonprofit publishers, researchers, funders, academic directors, librarians, and IT specialists. It will be great fun, I am sure.

Great fun and serious business.

As NLM’s Director, it’s my job to think about the Library’s role in fostering scientific communication. But what does “scientific communication” actually entail? Is it the same as the research literature?

Many think so, and, of course, NLM is well-known for providing access to the research literature through PubMed and PubMed Central—resources we build consciously and intentionally to ensure the included publications meet key criteria and the content is reliably available, whether online or in print.

But scientific communication is rapidly expanding beyond journal articles. In our corner of the world, for example, PubMed Central has been accepting small data files (less than 2 GB) along with submitted manuscripts since last October.

But I believe the realm of scientific communication will expand even further.

I see on the horizon an era of scientific communication influenced by the principles of open science, which support sharing not only the answer to a research hypothesis but also the products and processes used to get to that answer.

Two key practices will help usher in this new era:

  1. Communicating early and often. For example, because allows investigators to upload a range of research elements, including data collection instruments, analytical plans, and human subjects agreements, interim products of a research process can be available to others long before the final article is in place.
  2. Sharing all components of the research process, not simply summative reports. Last fall I suggested a library of models, properly documented and vetted, to allow researchers to apply existing and trustable models to their data. Data visualizations, source code, and videos might also prove useful. And you might have other ideas. (Please comment below and tell us about them.)

Such sharing of tools, products, and processes will save valuable time and money while also enhancing the rigor and reproducibility of the research itself by opening for examination all the procedural and methodological details. It also promises to speed innovation and knowledge transfer, which, you might say, are two of the key reasons for scientific communication in the first place.

But we still have so much to learn and to discuss. And, of course, NLM can’t shape the future of scientific communication alone.

That’s what makes this week’s Journal Summit so exciting. It’ll bring together many of the stakeholders in the research process to brainstorm strategies for tackling the numerous challenges that stand between us and open science.

But since most of you won’t be able to attend, I invite your comments regarding the research ecosystem and scientific communication below.

Among the questions I’d appreciate your input on are the following:

  • Should the scientific literature remain at the center of the discovery process, with the related research elements accessible from there?
  • How might preprints, which provide early looks into studies’ findings, serve as a model for the early disclosure and discussion of research methods?
  • What roles and imprimaturs could be afforded by a “publisher” of data?
  • What services should NLM institute to help make its collections and data FAIR (Findable, Accessible, Interoperable, and Reusable)?
  • Where should NLM invest its resources to accelerate the discovery, use, and impact of scientific communication?

Let’s spark a debate here that will rival the best the Journal Summit has to offer!


Help NLM and NIH Shape a Data-Driven Future

It’s a busy and exciting time for the National Library of Medicine and the National Institutes of Health.

This week we released NLM’s strategic plan, A Platform for Biomedical Discovery and Data-Powered Health. Concurrently, the National Institutes of Health announced a draft Strategic Plan for Data Science. The intersection of these two important documents demonstrates the alignment of the NLM vision with the overall thrust at NIH to transform discovery into health.

Positioning NLM for the Future

Representing the work of hundreds of NLM staff, national experts, and commenters from around the world, the NLM strategic plan lays out our current challenges and positions us to address these and emerging issues in biomedical research and public health.

From the need to be present in all environments where health and health care occur—and not just in structured, clinical settings—to the changing nature of libraries and how people pursue information, NLM is ready to embrace the spirit of open science and deliver on the promise of data-driven discovery.

As I’ve noted in previous blog posts, we’re going to get there by building on three pillars:

  • Establishing NLM as a platform for data-driven discovery and health
  • Reaching new users in new ways
  • Enhancing workforce excellence from citizens to scientists

So, what does that mean we will be doing?

We’ve already begun making data more accessible by allowing researchers to deposit data files as supplements to manuscripts they submit to PubMed Central. We’re helping to build the NIH Data Commons and working across NIH to improve identity and access management.

We’ve launched a new research program to devise ways to bring the power of data science into the hands of patients, and we’ll be investing further in data science training for librarians, biomedical researchers, and the bioinformatics community.

We’re also envisioning new research horizons.

We will be investing in novel approaches to curating data and literature, so we can make both more accessible more quickly and efficiently. We’re working with investigators to build needed analytical and visualization tools that can be applied to many different data types. We will be stimulating research in how health information can be presented to the public in fresh and innovative ways. And we will be devising new methods for exploring the literature and linking the key research elements: proposals, data, literature, models, and pipelines.

But that’s just the beginning.

As you read the NLM Strategic Plan, let us know if you see yourself in it.

Are your needs around health information and data represented? Does our vision of a data-driven future sound like something that will energize your research or simplify your work?  Will we be delivering something you need and can use—whether that’s genomic databases and the tools to interrogate them; open resources for citizen-scientists; clear, interactive interfaces for librarians and their patrons; or insights into health care’s tech future for students? What more might we do?

Your comments are welcome and encouraged. Please submit them via the NLM Strategic Plan page.

NIH Strategic Plan for Data Science

NLM does not venture into a data-focused future alone. NIH also works in and advocates for a research world that is increasingly data-driven, and NIH leadership clearly sees and appreciates the scientific opportunities presented by advances in data science.

To capitalize on those opportunities, NIH is developing a Strategic Plan for Data Science. As Dr. Jon Lorsch explained recently to NLM’s Board of Regents, this plan addresses NIH’s overarching goals, strategic objectives, and implementation tactics to modernize what he termed the “NIH-funded biomedical data science ecosystem.”

NIH just published a draft of the strategic plan, along with a Request for Information, to seek input from stakeholders, including members of the scientific and academic communities; health professionals; patient, professional, and advocacy groups; the private sector; and interested members of the public.

I encourage your comments and suggestions on the NIH draft plan. Submit your responses online by March 30, 2018.


Power and Finesse: The NLM Board of Regents

Running an operation as large and complex as the National Library of Medicine is a big job, but I don’t do it alone. In addition to my leadership team, I am privileged to have the NLM Board of Regents to help me.

Established in 1956 by the same Act that created the Library, the Board of Regents advises me on matters ranging from the acquisition of materials for the Library to the scope, content, and organization of NLM’s services to the rules governing access to those materials and services. The Board also makes recommendations for funding research and training in bioinformatics and educational technologies, suggests demonstration projects, and proposes ways to expand or enhance the biomedical communications network of which we are a part.

In short, the Board helps guide the overall work of the Library.

Given the diverse work we do and the breadth of topics we address, the Board’s membership includes leaders from across the library and life sciences, including medicine, public health, and health communications technology. They are joined by nine ex officio members whose positions read like a Who’s Who in health and librarianship, including four Surgeons General (Public Health Service, Army, Navy, and Air Force) and two national library directors (Library of Congress and National Agricultural Library).

Being in the room with them is like driving a Ferrari—things are moving fast but with finesse. And the power under the hood? Phenomenal.

It’s a blast.

With their various areas of expertise and different perspectives, Board members raise questions, highlight issues, or suggest innovations we hadn’t previously considered. Clinicians typically advocate for improvements to information management and delivery. Researchers point us towards important unsolved challenges. Consumer representatives voice the concerns and interests of patients and caregivers. Delegates from business help us leverage cutting-edge solutions coming out of private industry. And our ex officio members, as Federal partners, connect us to other parts of the government whose problems and constraints are similar to our own.

But the value of the Board is more than the individual members’ perspectives.

It’s the synergy that builds by bringing them together three times a year. It’s the lively conversations their close collaboration sparks, as they discuss NLM’s programs, services, and research initiatives. It’s their careful, considered deliberation of our research investments. And, most recently, it’s their collective effort in crafting our strategic plan for the coming decade.

Last week, after 16 months of activities involving over 500 experts and stakeholders, the Board endorsed that plan, positioning NLM for its third century. The plan envisions NLM as a platform for data-driven discovery and data-powered health, built upon three pillars:

  1. Accelerating discovery and advancing health through data-driven research
  2. Reaching more people in more ways through enhanced dissemination and engagement
  3. Building a workforce for data-driven research and health

Now the hard work begins.

Implementing the strategic plan will require fresh perspectives, new talents, and expanded resources. We will need to build a model of trust and accountability among our 1,700 women and men, encouraging them to fully contribute their skills and ideas and to envision their work in novel ways. We will have to make tradeoffs and set priorities. And as we work to make NLM’s bright future a reality, we will need to advocate for and embrace boldness and risk-taking.

Fortunately, we have the NLM Board of Regents to guide the way.

As their work proves, multiple perspectives spur innovation and creative problem-solving; collegiality supports accountability; and respectful advocacy—whether to each other, to the NIH Director, or to the Secretary of Health and Human Services—can lead to tremendous change for the greater good.  What more could we need to accelerate the progress towards our third century?!

Matters of the Heart

The Library’s role in heart health and understanding

February is American Heart Month. Begun in 1964 by presidential proclamation, this annual event focuses the nation’s attention on the importance of heart health.

That focus is well-placed.

According to the CDC, heart disease remains the leading cause of death for both men and women, with about one in four deaths in the United States attributed to its many forms.

With numbers like that, it’s no wonder NLM takes an active role in advancing knowledge about the prevention, diagnosis, and treatment of heart disease and cardiovascular problems. Of course, we’re not doing heart surgery, but in ways great and small (whether through our research, our products, our collection, our funding, or our writing) we help people learn about the heart, how to treat it, and how to keep it healthy.

That knowledge, as with everything we do, is grounded in science and research, so let’s start there.

PubMed, our flagship database of peer-reviewed biomedical literature, delivers the latest findings on heart disease and its prevention. helps people interested in participating in research on heart diseases find a study. And that’s true regardless of how old you are. Over 15% of the nearly 2,900 active, recruiting clinical trials focused on heart disease are open to children.

NLM’s computational science contributed to CRISPR-Cas, a revolutionary tool for editing DNA that researchers in Oregon used last year to fix a heart-damaging genetic defect in human embryos, and our medical illustrators have helped NIH’s Human BioMolecular Atlas Program (HuBMAP) create a 3-D visualization of cardiac tissue.

Elsewhere, our research funding supports others as they look to better understand and treat heart disease through data mining, machine learning, and bioinformatics. All across the country, NLM dollars help drive the science forward, funding Dr. Karina Davidson’s work at Columbia that will accelerate the design of personalized solutions to cardiovascular issues; supporting UCLA’s Corey Arnold, PhD, who is trying to help physicians manage heart disease by establishing an automated system to summarize patient records; and financing Dr. James Blum’s group at the University of North Carolina-Wilmington as they try to predict which patients are at risk for developing post-operative complications like atrial fibrillation. Together they—and other NLM grantees—are doing innovative work that is expected to make a difference in preventing, diagnosing, or treating heart disease.

NLM staff then leverage the findings of our grantees and other researchers to deliver the most current guidance to patients, families, and caregivers through MedlinePlus, NIH MedlinePlus magazine, and HealthReach. Together, these consumer-oriented resources (available in English, Spanish, and sometimes in other languages) provide practical information to help lay people understand heart disease, learn about risks, diagnosis, and treatment, and find experts to guide them.

But our cardiac contributions don’t end there.

We also collect materials and share insights regarding how medicine’s understanding of the heart and the circulatory system, along with our treatment of heart disease, evolved over time.

Our History of Medicine Division’s blog, Circulating Now, recently explored William Harvey’s ground-breaking anatomical discoveries published in his 1628 work Exercitatio anatomica de motu cordis et sanguinis in animalibus. Harvey, the personal physician to England’s King Charles I, presented experimental proof of blood’s circulation through the body, elucidating the complex and beautiful interplay between arteries, veins, lungs, and the heart.

And on a more contemporary note, NLM’s Profiles in Science collection shares the papers of notable physicians whose work has transformed the treatment of heart disease, including legendary heart surgeons Michael E. DeBakey, Adrian Kantrowitz, and Henry Swan.

These examples, which range from the 17th century to today, highlight just some of the ways NLM celebrates American Heart Month and encourages good heart health throughout the year.

You can do your part, too. Take the #MoveWithHeart pledge and get moving. It all adds up to a healthy heart!