It Takes a Whole Library to Create a World of Data-powered Health

Data-powered health heralds a revolution in medical research and health care.

Data-powered health relies upon knowing more—more input in the moment, more details across systems, more people (and their data) contributing to the overall picture.

Data-powered health ushers in a new biomedical research paradigm in which patient-generated data complements clinical, observational, and experimental data to create a boundless pool we can explore. New tools based in text mining, deep learning, and artificial intelligence will allow researchers to probe that vast data pool to isolate patterns, determine trends, and predict outcomes, all while preserving patient privacy.

As a result, data-powered health promises personalized health care at a level never before seen. It signals a time when tracking one’s own health data becomes the foundation of personal health management, with sensors—coupled with something like a smartphone—delivering tailored, up-to-the-moment health coaching.

The National Library of Medicine will play an important role in the future of data-powered health. Each of our divisions has something to contribute. NCBI’s identity and access management systems will ensure a solid core for the NIH gateway to data sharing. Researchers in the Lister Hill Center can apply machine learning, computational linguistics, and natural language processing to make sense out of large, diverse data sources, whether that’s the text within medical records or large numbers of X-rays. Library Operations staff will manage the extensive terminologies that support the necessary interoperability. Specialized Information Services’ experience with disaster information management will help us ensure data remains available even with limited or no internet access. And the National Network of Libraries of Medicine will continue to partner with libraries across the country to support the public as they join this strange, new world.

Together these and other areas of excellence give NLM a solid foundation, but NLM itself must grow and develop to become the NIH hub for data science. We must develop data management skills and knowledge among the Library’s workforce. We must also partner with the other NIH institutes and centers, and with scientists around the country, to complement, not duplicate, data science efforts; to build the technical infrastructure for finding and linking data sets stored in the cloud; to shape best practices for curating data; and to craft policies that support exploration and inquiry while preserving patient privacy.

The ultimate goal is for NLM to do for data what we have already done for the literature—formulating sound, systematic approaches to acquiring and curating data sets, devising the technical platforms to ensure the data’s permanence, and creating human and computer-targeted interfaces that deliver these data sets to those around the world who need them.

We continue to discuss how best to create an organizational home for data science at NLM, and I welcome your ideas. How would you establish a visible, accessible, and stable home for data science at NLM while building upon our expertise and our tradition of collaboration?

One Library, Many Worlds

Back in January, I wrote about One NLM, an idea that acknowledges the particular contributions of each division within the Library while supporting greater engagement across our programs, all aligned toward a common vision.

I wrote that post primarily for NLM staff, but in the intervening nine months, I’ve discovered I need to take the message of One NLM to those outside the Library as much as to those within it.

As I attend conferences and meet members of the many groups NLM serves, I’ve learned the role of the Library is in the eye of the beholder. Librarians see bibliographic resources. Scientists see tools for discovery, clinicians tools for diagnosis and care. Potential post-docs see opportunities for training, and teachers see resources for learning. Even though we are one NLM, we are viewed from those various perspectives more as parts than a whole.

That’s not necessarily a bad thing, but I am working to make our stakeholders aware of both the parts they don’t naturally see and the single purpose that unites those parts.

Our core services are undeniably diverse. We acquire and preserve health and biomedical knowledge across disciplines and across the ages, and then devise platforms and processes to make this knowledge available to clinicians, researchers, and patients. We conduct research to develop more efficient ways to search the literature and to apply computational approaches, such as machine learning and natural language processing, to clinical data and published works to extract specific information. We also take advantage of our own genomic and other sequence databases to discover the structure and functions of various genes and to create models of functional domains in proteins.

Given that diversity, it makes sense that those who use the Library might focus on one or a few of those services more than others, but for me and for the 1,700 women and men who work here, these services all contribute to a single vision: NLM as a platform for discovery.

Sometimes discovery comes by exploring PubMed’s literature citations to ground a new research program, other times by extracting gene sequences and their respective phenotypes from dbGaP, and yet other times by finding the perfect exercise to supplement a lesson plan.

In the end, perhaps the lesson for all of us is that NLM is both its parts and its whole.

And my role is to help our many audiences better understand their favorite parts while learning more about the totality of who we are and how we serve society.

The Rise of Computational Linguistics Geeks

Guest post by Dina Demner-Fushman, MD, PhD, staff scientist at NLM.

“So, what do you do for a living?”

It’s a natural dinner party question, but my answer can prompt that glazed-over look we all dread.

I am a computational linguist, also known (arguably) as a specialist in natural language processing (NLP), and I work at the National Library of Medicine.

If I strike the right tone of excitement and intrigue, I might buy myself a few minutes to explain.

My work combines computer science and linguistics, and since I focus on biomedical and clinical texts, it also requires adding some biological, medical, and clinical know-how to the mix.

I work specifically in biomedical natural language processing (BioNLP). The definition of BioNLP has varied over the years, with the spotlight shifting from one task to another—from text mining to literature-based discovery to pharmacovigilance, for example—but the core purpose has remained essentially unchanged: training computers to automatically understand natural language to speed discovery, whether in service of research, patient care, or public health.

The field has been around for a while. In 1969, NIH researchers Pratt and Pacak described the early hope for what we now call BioNLP in the paper “Automated processing of medical English,” which they presented at a computational linguistics conference:

The development of a methodology for machine encoding of diagnostic statements into a file, and the capability to retrieve information meaningfully from [a] data file with a high degree of accuracy and completeness, is the first phase towards the objective of processing general medical text.

NLM became involved in the field shortly thereafter, first with the Unified Medical Language System (UMLS) and later with tools to support text processing, such as MetaMap and TextTool, all of which we’ve improved and refined over the years. The more recent Indexing Initiative combines these tools with other machine learning methods to automatically apply MeSH terms to PubMed journal articles. (A human checks the computer’s work, revising as needed.)
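To make that lookup idea concrete, here is a minimal sketch of dictionary-based concept mapping, the kind of step MetaMap performs against the UMLS at vastly greater scale and sophistication. The mini-vocabulary and the suggest_mesh_terms helper below are invented for illustration; they are not the Indexing Initiative’s actual code or data.

    # A toy dictionary-based concept matcher (Python). The vocabulary is a
    # hypothetical stand-in for the UMLS; real MeSH indexing also uses
    # machine learning, and a human indexer reviews every suggestion.
    MINI_VOCAB = {
        "myocardial infarction": "Myocardial Infarction",
        "heart attack": "Myocardial Infarction",
        "aspirin": "Aspirin",
        "adverse reaction": "Drug-Related Side Effects and Adverse Reactions",
    }

    def suggest_mesh_terms(text):
        """Return candidate descriptors whose synonyms appear in the text."""
        lowered = text.lower()
        return {descriptor for phrase, descriptor in MINI_VOCAB.items()
                if phrase in lowered}

    abstract = ("Low-dose aspirin reduced recurrence of myocardial "
                "infarction, with few adverse reactions reported.")
    print(sorted(suggest_mesh_terms(abstract)))
    # ['Aspirin', 'Drug-Related Side Effects and Adverse Reactions',
    #  'Myocardial Infarction']

Even this crude string matching hints at why controlled vocabularies matter: two very different surface forms (“heart attack,” “myocardial infarction”) resolve to a single descriptor.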

These and NLM’s other NLP developments help improve the Library’s services, but they are also freely shared with the world, broadening our impact and, more importantly, helping to handle the global proliferation of scientific and clinical text.

It’s that last piece that makes NLP so hot right now.

NLP, we’re finding, can take in large numbers of documents and locate relevant content, summarize text, apply appropriate descriptors, and even answer questions.

It’s every librarian’s—and every geek’s—dream.

But how can we use it?

Imagine, for example, the ever-expanding volume of health information around patients’ adverse reactions to medications. At least four different—and prolific—content streams feed into that pool of information:

  • the reactions reported in the literature, frequently in pre-market research (e.g., in the results of clinical trials);
  • the labeled reactions, i.e., the reactions described in the official drug labels provided by manufacturers;
  • the reactions noted in electronic health records and clinical progress notes; and
  • the reactions described by patients in social media.

NLM’s work in NLP—and its funding of extramural research in NLP—is helping develop approaches and resources for extracting and synthesizing adverse drug reactions from all four streams, giving a more complete picture of how people across the spectrum are responding to medications.
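To see why merging these streams is hard, consider a minimal sketch of the normalization problem: each stream names the same reaction differently. The synonym table and text snippets below are invented for this sketch; real systems map mentions onto standard terminologies such as MedDRA or the UMLS using far more sophisticated NLP.

    # Toy normalization across the four streams (Python). Each register
    # uses a different surface form for the same underlying reaction.
    SYNONYMS = {                      # surface form -> normalized term
        "somnolence": "drowsiness",   # literature / drug-label register
        "sedation": "drowsiness",     # clinical-note register
        "drowsy": "drowsiness",       # patient / social-media register
    }

    def normalize_mentions(text):
        """Map raw mentions in free text onto normalized reaction terms."""
        lowered = text.lower()
        return {norm for surface, norm in SYNONYMS.items()
                if surface in lowered}

    streams = {
        "literature": "Somnolence was reported in some trial participants.",
        "drug_label": "Common adverse reactions include somnolence.",
        "clinical":   "Pt reports sedation after the evening dose.",
        "social":     "This med makes me so drowsy by lunchtime.",
    }

    for source, text in streams.items():
        print(source, "->", normalize_mentions(text))
    # All four streams resolve to the same term: {'drowsiness'}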

It’s a challenging task. Researchers must contend with different vocabularies and language structures to extract the information, but NLP, and my fellow computational linguists, will, I predict, prove up to it.

Now imagine parents seeking health information regarding their sick child.

NLP can answer their question, first by understanding key elements in the incoming question and then by providing a response, either by drawing upon a database of known answers (e.g., FAQs maintained by the NIH institutes) or by summarizing relevant PubMed or MedlinePlus articles. Such quick access to accurate and trustworthy health information has the potential to save time and to save lives.
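As a rough illustration of that two-step pattern, here is a bare-bones question-answering sketch: extract a (topic, question type) frame from the incoming question, then match it against a small set of curated answers. The frames, FAQ entries, and helper functions are invented placeholders, not NLM software; production systems rely on trained NLP models and resources like MedlinePlus.

    # Toy consumer-health question answering (Python).
    import re

    FAQ = {  # hypothetical curated answers, keyed by (topic, question type)
        ("fever", "treatment"): "For a child's fever, offer fluids and rest; "
                                "call your pediatrician if it persists.",
        ("measles", "symptoms"): "Measles typically begins with high fever, "
                                 "cough, and runny nose.",
    }

    def parse_question(question):
        """Extract a (topic, question type) frame from a consumer question."""
        q = question.lower()
        qtype = ("treatment" if re.search(r"\b(treat|help|do for)\b", q)
                 else "symptoms")
        topic = next((t for t, _ in FAQ if t in q), None)
        return topic, qtype

    def answer(question):
        """Look up a curated answer; fall back to a literature search."""
        return FAQ.get(parse_question(question),
                       "No curated answer found; searching MedlinePlus...")

    print(answer("What can I do for my daughter's fever?"))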

We’re not fully there yet, but as our research continues, we get closer.

Maybe it’s time I reconsider how I answer that perennial dinner party question: “I’m a computational linguist, and I help improve health.”

Dina Demner-Fushman, MD, PhD, is a staff scientist in NLM’s Lister Hill National Center for Biomedical Communications. She leads research in information retrieval and natural language processing focused on clinical decision-making, answering clinical and consumer health questions, and extracting information from clinical text.

Words that mean a lot—reflections on swearing in

“I, Patricia Flatley Brennan, do solemnly swear that I will support and defend the Constitution of the United States against all enemies, foreign and domestic; that I will bear true faith and allegiance to the same; that I take this obligation freely, without any mental reservation or purpose of evasion; and that I will well and faithfully discharge the duties of the office on which I am about to enter. So help me God.”

On September 12, 2016, I placed my left hand on Claude Pepper’s copy of the Constitution, raised my right hand, and took this oath to become the 19th director of the National Library of Medicine. These words meant a lot to me then, and they continue to guide me. My oath is to support and defend the Constitution, bearing true faith and allegiance to the same.

Of all the words in the Constitution that I must support and defend, the most meaningful to me are “We, the people,” for it is the responsibility of the National Library of Medicine to support biomedical discovery and translate those discoveries into information for the health of people—all the people.

More importantly, as a member of the executive branch of the government, I am responsible for implementing legislation that directs the National Library of Medicine to:

  • acquire and preserve books, periodicals, prints, films, recordings and other library materials pertinent to medicine;
  • organize the materials by appropriate cataloging, indexing, and bibliographical listing;
  • publish catalogs, indexes, and bibliographies;
  • make available through loans, photographic, or other copying procedures materials in the library;
  • provide reference and research assistance;
  • engage in such other activities in furtherance of the purposes of this part as (the Surgeon General) deems appropriate and the library’s resources permit;
  • publicize the availability from the Library of the above products and services; and
  • promote the use of computers and telecommunications.

Thank goodness there are over 1,700 women and men to make sure this happens!

During this past year, other words and phrases have influenced and inspired me:

  • Public access

NLM leads the nation and the world in ensuring that everyone, from almost anywhere, can access our resources—from our bibliographic database PubMed to the genetics information in GenBank. Ensuring public access means creating vast computer systems and interfaces that allow humans and computers to use our resources. It means helping shape the policies that protect copyright, promote openness, and preserve confidentiality. It means considering the public’s interest as we acquire new resources and design new applications. And, importantly, it means that we provide training and coaching to make our resources accessible, understandable, and actionable.

  • Third century

We date our beginning to books collected by a surgeon in an Army field hospital in 1836. Our first century laid the foundation for purposeful collection of biomedical knowledge, including creating catalogs and devising indexes. Our second century saw the digitization of knowledge and internet communication, delivering our resources at lightning speed around the world. In less than two decades, we begin our third century.

I can only imagine what our third century might bring! What I do know is that it is my job now to put in place a robust human, technical, and policy platform to prepare for our third century.

  • One NLM

It is a common engineering principle that a strong whole depends on strong parts. Indeed, NLM has very strong parts—NCBI with its genomic resources, Library Operations with the power and skill to index the world’s biomedical knowledge, the Lister Hill Center with its machine learning to accelerate the interpretation of images, and more.

During the past year, I have begun to see the crosswalks between our parts—for example, the partnership between our Office of Computer and Communications Systems and Library Operations to serve up vocabularies and the Value Set Authority Center, which supports quality care monitoring, and the engagement between Specialized Information Services and the Lister Hill Center to build the PubChem and TOXNET services.

We are poised to address the challenges laid out for us in 1956 not by building a single service to address each one, but by knitting together the best of several services to efficiently and effectively advance health and biomedical discovery through information.

The ideas of Nina Matheson have helped shape my entire career. Now that I am Director of NLM, her words have taken on increased importance to me. In 1982, she talked about librarians as tool builders, system developers, and solvers of information problems.

Inspired by these words through my first year, I embrace the idea—and, indeed, the ideal—that the library is the solution engine that will accelerate discovery in support of health for everyone.

As the Constitution says, it all starts with “We the people.”

Larry Weed’s Legacy and the Next Generation of Clinical Decision Support

Guest post by Lincoln Weed, son of the late Dr. Lawrence L. Weed and co-author with him of the book Medicine in Denial and other publications. Dr. Weed, who died June 3, 2017, was the originator of “knowledge coupling” tools for clinical decision support and the problem-oriented medical record, including its problem list and SOAP note components.

“Patients are sitting on a treasure trove of data about their own medical conditions.”

My late father, Dr. Lawrence L. Weed (LLW), made this point the day before he died. He was talking about the lost wealth of neglected patient data—readily available, richly detailed data that too often go unidentified and unexamined. Why does that happen, and what can be done about it?

The risk of missed information

From the very outset of medical problem-solving, LLW argued, patients and practitioners face greater risk of loss and harm than they may realize. The risk arises as soon as a patient starts an internet search about a medical problem, or as soon as a practitioner starts questioning the patient about the problem (whether diagnostic or therapeutic).

Ideally, these initial inquiries would somehow take into account the entire universe of collectible patient data and vast medical knowledge about what the data mean. But such thoroughness is more than the human mind can deliver.

This gap creates high risk that information crucial to solving the patient’s problem will be missed. And whatever information the mind does deliver is not recorded and harvested in a manner that permits organized feedback and continuous improvement.

Guidance tools set standard of care

The only secure way to proceed, LLW concluded, is to begin investigation of medical problems (the “initial workup”) using guidance tools external to the mind. These tools must couple patient-specific data with general knowledge as follows:

  • Link the initial data point (i.e., the patient’s presenting problem) with (1) medical knowledge about potentially relevant options and (2) readily available data for identifying those options (see the outer circle in the diagram below);
  • Link the data in (2), once collected, with the knowledge in (1) to show how well the data match up with the combinations of data points defining each relevant option—this matching indicates which options are worth considering for the individual (see the middle circle in the diagram below); and
  • Organize this information (data coupled with knowledge) into options and evidence—that is, diagnostic possibilities or therapeutic alternatives, the combined findings (positive, negative, or uncertain) on each alternative, and additional knowledge useful for assessing the best option to pursue (see the inner circle in the diagram below).
[Diagram: three concentric circles showing (outer circle) potentially relevant options; (middle circle) options worth investigating; and (center circle) the best options for this individual.]
For further explanation of the above diagram, see pp. 72-74 of the book Medicine in Denial.
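The matching step in the middle circle can be made concrete with a schematic sketch: each option is defined by a combination of findings, and the collected patient data are scored against every option. The options, findings, and the couple function below are invented placeholders for illustration, not content or code from LLW’s actual knowledge coupling tools.

    # Schematic "coupling" of patient data with medical knowledge (Python).
    OPTIONS = {  # hypothetical knowledge base: option -> suggestive findings
        "Option A": {"finding_1", "finding_2", "finding_3"},
        "Option B": {"finding_2", "finding_4"},
        "Option C": {"finding_5"},
    }

    def couple(patient_findings):
        """Score every option by positive, negative, and uncertain findings."""
        results = []
        for option, expected in OPTIONS.items():
            positive = {f for f in expected
                        if patient_findings.get(f) is True}
            negative = {f for f in expected
                        if patient_findings.get(f) is False}
            uncertain = expected - positive - negative  # not yet collected
            results.append((option, sorted(positive), sorted(negative),
                            sorted(uncertain)))
        # Options matching the most collected findings are listed first.
        return sorted(results, key=lambda r: len(r[1]), reverse=True)

    # Initial workup data for one patient: True = present, False = absent.
    patient = {"finding_1": True, "finding_2": True, "finding_4": False}
    for row in couple(patient):
        print(row)

Because every finding is recorded as positive, negative, or uncertain, the output shows not only which options best match but also exactly which data remain to be collected: the organized feedback LLW found missing from unaided judgment.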

Tools to carry out these steps would define best practices and make them enforceable as high standards of care for the initial workup (i.e., patient history, physical exam, and basic lab tests). That threshold task is pivotal. It lays the informational foundation for follow-up thought and action by the patient and practitioner. That foundation is also needed for feedback activities to and from third parties. (See the diagram on p. 13 of Medicine in Denial.)

Patient-driven tools

In carrying out the initial workup, the patient’s role is always central. The tools should enable patients to enter history data, which is often the most detailed component of the initial workup. Moreover, the patient necessarily participates in the physical exam conducted by the practitioner, and reviews history, physical, and lab findings with the practitioner.

Tools for the initial workup must thus be used by patients and practitioners jointly. But patients must be able to initiate use of the tools unilaterally. They can’t rely on practitioners to recognize when serious medical investigation is needed. Patients are the ones who experience symptoms—who notice changes from what feels normal. To investigate whether these symptoms might be medically significant, patients need web-based tools for problem-specific inquiries. So do healthy persons who may simply require periodic screening checkups for unidentified problems (plus initial workup of any problems discovered).

Overcoming the medical Tower of Babel

Whether it is patients or practitioners seeking guidance for the initial workup, traditional medical practice leaves them both in a vacuum. Once that vacuum was filled solely by practitioners’ idiosyncratic judgments. Now the vacuum is also being filled with a plethora of practice guidelines and clinical decision support tools, not to mention internet search engine tools.

But the very multiplicity of all these resources defeats the purpose of defining generally accepted, enforceable best practices for initial workups. And the multiplicity is increasing with new patient-generated health data from sensors, wearables, and smartphone-connected physical exam devices. Moreover, the universe of needed guidance is expanding with vast new genomic/molecular data and knowledge.

The outcome of this multiplicity is not useful diversity but a Tower of Babel.

What we need instead are information tools with a unified design and trustworthy medical content, tools that guide users through the basic steps for inquiry into all medical problems, tools that take into account relevant information from all specialties without intellectual or financial biases. Users should not have to switch back and forth among different tools and interfaces for different medical problems, different specialties, different practice settings, different data types, different vendors, and different classes of users. The medical content captured in the tools must be problem-specific, but the tools’ basic design (see the three bullets above) should generalize to all problems in all contexts, as much as possible. This generality enables intuitive ease-of-use at the user level and powerful synergies at the software development level.

NLM’s role for the 21st century

LLW saw NLM as key to developing tools of this kind.

Drawing on its uniquely comprehensive electronic repository of medical content, NLM could create a new repository of distilled, structured knowledge. Drawing on its connections with the NIH research institutes and federal health agencies such as the CDC and FDA, NLM could rapidly incorporate new knowledge into that specialized repository. Outside parties and NLM itself could use that repository to build user-level tools with a unified design for conducting initial workups on specific medical problems.

By enabling creation of such a knowledge infrastructure for the public, NLM would seize an “opportunity to modernize the conceptualization of a ‘library.’” Beyond its current electronic repository, NLM could be “demonstrating how information and knowledge can best be developed, assimilated, organized, applied, and disseminated in the 21st century.”  [NIH Advisory Committee to the Director, NLM Working Group, Final Report, p. 12 (June 11, 2015).]

This new infrastructure will encounter a barrier to its use—the medical practice status quo. Not all practitioners (or their overseers) will accept the data collection demands defined by the tools.

Patients at the center

Here we return to the central role of patients.

Patients who unilaterally use NLM tools to complete the history portion of the initial workup can then seek out practitioners who are willing (and permitted) to use the same tools for the physical exam and basic lab test portions. By creating demand for those innovative practitioners and using the tools jointly with them, patients can drive medical practice toward a foundational reform.

* * *

Readers who have questions about the above are referred to the fuller discussion of these ideas in the book Medicine in Denial (PDF | published work), especially parts IV.E, F, and G, pages 192-194, and the diagram on page 13. The author also invites comments below.

Lincoln Weed, JD, Dr. Lawrence Weed’s son, practiced employee benefits law in Washington, DC for 26 years. He then joined a consulting firm where he specialized in health information privacy. He is now retired.