AI is coming. Are the data ready?

The artificial intelligence (AI) revolution is upon us. You can barely read the paper, watch TV, or see a movie without encountering AI and how it promises to change society. In fact, last month, the President signed an executive order directing the US government to prioritize artificial intelligence in its research and development spending to help drive economic growth and benefit the American people.

Artificial intelligence refers to a suite of computer analysis methods—including machine learning, neural networks, deep learning models, and natural language processing—that can enable machines to function as if possessing human reasoning. With AI, computer systems ingest and analyze vast amounts of data and then “learn” through high-volume repetition how to do the task better and better, “reasoning” or “self-modifying” to improve the analytics that shape the outcome.

That learning process results in some pretty amazing stuff. In the health care field alone, AI can determine the presence or absence of abnormalities in clinical images, predict which patients are at risk for rare disorders, and detect irregular heartbeats.

Making all that happen requires data, massive amounts of data.

But like the computer-era quip, “garbage in, garbage out,” the data need to be good to yield valid analyses. What does “good” mean? Two things:

  • The data are accurate, truly representing the underlying phenomena.
  • The data are unbiased, i.e., the observations reflect the complete experience and no inherent errors were introduced anywhere along the chain from data capture to coding to processing.

As much as we’d like to think otherwise, we already know data are biased. Human genetic sequences drawn from studies of white males of Northern European descent do not adequately represent the genetic diversity within women or people from other parts of the globe. Image data generated by different X-ray machines might show slight variations depending upon how the machines were calibrated. Electrical pathways collected from neurological studies conducted as recently as a decade ago do not reflect the level of resolution possible today.
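One rough way to spot the first kind of gap is to compare how often each group appears in a data set against the share you would expect in the population of interest. The sketch below is only an illustration under stated assumptions: the function name `representation_gaps`, the toy records, and the 50/50 reference shares are all hypothetical and do not describe any NLM tool or real data set.

```python
from collections import Counter


def representation_gaps(records, key, reference_shares):
    """Return observed share minus reference share for each group.

    A positive value means the group is over-represented in the records;
    a negative value means it is under-represented.
    """
    counts = Counter(r[key] for r in records)
    total = sum(counts.values())
    gaps = {}
    for group, expected in reference_shares.items():
        observed = counts.get(group, 0) / total if total else 0.0
        gaps[group] = observed - expected
    return gaps


# Hypothetical toy sample: 8 male records and 2 female records.
sample = [{"sex": "male"}] * 8 + [{"sex": "female"}] * 2

# Assumed reference shares for the population of interest (illustrative only).
reference = {"male": 0.5, "female": 0.5}

gaps = representation_gaps(sample, "sex", reference)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.2f}")
```

A check like this only flags who is missing from the data. It says nothing about calibration differences between machines or changes in measurement resolution over time, which need their own kinds of review.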

So, what can we do?

It doesn’t make sense to throw out existing data and start anew, but it can be misleading to apply AI to data known to be biased. And it can be risky. Bias in underlying data can result in algorithms that propagate the same bias, leading to inaccurate findings.

That’s why NLM is working to develop computational approaches to account for bias in existing data sets and why we’re investing in this line of research. In fact, we’re actively encouraging grant applications focused on reducing or mitigating gaps and errors in health data research sets.

I have confidence that researchers will crack the puzzle, but until then, let’s look at how the business intelligence community is approaching the issue.

Concerned with reducing the effect of biases in management decision-making, business intelligence specialists have identified strategies to help uncover patterns and probabilities in data sets. They pair these patterns with AI algorithms to create calibration tools informed by human judgment while taking advantage of the algorithms’ power. That same approach might work with biomedical data.

In addition, our colleagues in business now approach data analysis in ways that help detect bias and limit its impact. They:

  • invest more human resources in interpreting the results of AI analytics, not relying exclusively on the algorithms;
  • challenge decision makers to consider plausible alternative explanations for the generated results; and
  • train decision makers to be skeptical and to anticipate aberrant findings.

There’s no reason we can’t adopt that approach in biomedical research.

So, as you read and think more about the potential of artificial intelligence, remember that AI applications are only as good as the data upon which they are trained and built. Remember, too, that the results of an AI-powered analysis should only factor into the final decision; they should not be the final arbiter of that decision. After all, the findings may sound good, but they may not be real, just an artifact of biased, imperfect data.

Author: Patti Brennan

Director, US National Library of Medicine

3 thoughts on “AI is coming. Are the data ready?”

  1. Excellent article, with its assessment of the areas of risk and potential for harm, but also of the potential benefits and ways to mitigate the problem areas. Thanks so much for writing it. That phenomenon, for which some of us used the variant phrase, “Garbage In, Gospel Out”, is a real risk in a healthcare setting. One other technique, much related to yours above, is to add another step to the analytics pipeline. Once you have made the customary assessment of which AI analytics techniques work best and which parameters should be used with them, in order to get the best fit and best predictions (based on what you know so far), then ask the question of why these techniques might have worked best. What does this tell you about your underlying data and the real world phenomena which are beneath that?


  2. I am taking a business analytics course now, which has introduced AI in business intelligence. Interesting to learn that the biomedical field may be learning from the field of business. I’d like to learn more about AI in the biomedical field and how the data cleanup and analysis works. It’d be nice to see a demo. Well, since this is a time-consuming process, maybe more of a video montage would be appropriate. 🙂

