The Rise of Computational Linguistics Geeks

Guest post by Dina Demner-Fushman, MD, PhD, staff scientist at NLM.

“So, what do you do for a living?”

It’s a natural dinner party question, but my answer can prompt that glazed-over look we all dread.

I am a computational linguist, also known (arguably) as a specialist in natural language processing (NLP), and I work at the National Library of Medicine.

If I strike the right tone of excitement and intrigue, I might buy myself a few minutes to explain.

My work combines computer science and linguistics, and since I focus on biomedical and clinical texts, it also requires adding some biological, medical, and clinical know-how to the mix.

I work specifically in biomedical natural language processing (BioNLP). The definition of BioNLP has varied over the years, with the spotlight shifting from one task to another—from text mining to literature-based discovery to pharmacovigilance, for example—but the core purpose has remained essentially unchanged: training computers to automatically understand natural language to speed discovery, whether in service of research, patient care, or public health.

The field has been around for a while. In 1969 NIH researchers Pratt and Pacak described the early hope for what we now call BioNLP in the paper, “Automated processing of medical English,” which they presented at a computational linguistics conference:

The development of a methodology for machine encoding of diagnostic statements into a file, and the capability to retrieve information meaningfully from [a] data file with a high degree of accuracy and completeness, is the first phase towards the objective of processing general medical text.

NLM became involved in the field shortly thereafter, first with the Unified Medical Language System (UMLS) and later with tools to support text processing, such as MetaMap and TextTool, all of which we’ve improved and refined over the years. The more recent Indexing Initiative combines these tools with other machine learning methods to automatically apply MeSH terms to PubMed journal articles. (A human checks the computer’s work, revising as needed.)

These and NLM’s other NLP developments help improve the Library’s services, but they are also freely shared with the world, broadening our impact and, more importantly, helping to handle the global proliferation of scientific and clinical text.

It’s that last piece that makes NLP so hot right now.

NLP, we’re finding, can take in large numbers of documents and locate relevant content, summarize text, apply appropriate descriptors, and even answer questions.

It’s every librarian’s—and every geek’s—dream.

But how can we use it?

Imagine, for example, the ever-expanding volume of health information around patients’ adverse reactions to medications. At least four different—and prolific—content streams feed into that pool of information:

  • the reactions reported in the literature, frequently in pre-market research (e.g., in the results of clinical trials);
  • the labeled reactions, i.e., the reactions described in the official drug labels provided by manufacturers;
  • the reactions noted in electronic health records and clinical progress notes; and
  • the reactions described by patients in social media.

NLM’s work in NLP—and its funding of extramural research in NLP—is helping develop approaches and resources for extracting and synthesizing adverse drug reactions from all four streams, giving a more complete picture of how people across the spectrum are responding to medications.

It’s a challenging task. Researchers must address different vocabularies and language structures to extract the information, but NLP, and my fellow computational linguists, will, I predict, prove up to it.

Now imagine parents seeking health information regarding their sick child.

NLP can answer their question, first by understanding key elements in the incoming question and then by providing a response, either by drawing upon a database of known answers (e.g., FAQs maintained by the NIH institutes) or by summarizing relevant PubMed or MedlinePlus articles. Such quick access to accurate and trustworthy health information has the potential to save time and to save lives.
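For readers who like to see the moving parts, here is a minimal sketch of the two steps just described: pulling the key elements out of an incoming question, then looking for a match in a database of known answers. Everything in it is an assumption made for illustration. The `FAQ` entries are placeholders rather than real NIH content, the names `key_terms` and `answer` are hypothetical, and simple keyword overlap stands in for the far richer language understanding a real system needs. The fallback of summarizing relevant articles is not shown.

```python
import re

# Common words that carry little meaning for matching (illustrative list only).
STOPWORDS = {
    "a", "an", "the", "is", "my", "what", "how", "do", "i",
    "for", "of", "to", "has", "should", "in", "and", "can",
}

# Hypothetical stand-in for a database of known answers.
# The answer strings are placeholders, not medical advice.
FAQ = [
    ("What should I do when my child has a fever?",
     "PLACEHOLDER: answer text for the fever question."),
    ("How is an ear infection treated in children?",
     "PLACEHOLDER: answer text for the ear infection question."),
]


def key_terms(text):
    """Step 1: reduce a question to its key elements (lowercased content words)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def answer(question, faq=FAQ, min_overlap=0.3):
    """Step 2: return the stored answer whose question overlaps most, or None."""
    asked = key_terms(question)
    best_score, best_answer = 0.0, None
    for known_question, known_answer in faq:
        known = key_terms(known_question)
        union = asked | known
        # Jaccard overlap between the two sets of key terms.
        score = len(asked & known) / len(union) if union else 0.0
        if score > best_score:
            best_score, best_answer = score, known_answer
    # Below the threshold, admit we have no known answer rather than guess.
    return best_answer if best_score >= min_overlap else None


if __name__ == "__main__":
    print(answer("My child has a fever, what should I do?"))
```

Returning `None` when nothing matches well is a deliberate choice in this sketch: for health questions, no answer is safer than a poorly matched one.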

We’re not fully there yet, but as our research continues, we get closer.

Maybe it’s time I reconsider how I answer that perennial dinner party question: “I’m a computational linguist, and I help improve health.”

Dina Demner-Fushman, MD, PhD, is a staff scientist in NLM’s Lister Hill National Center for Biomedical Communications. She leads research in information retrieval and natural language processing focused on clinical decision-making, answering clinical and consumer health questions, and extracting information from clinical text.

Words that mean a lot—reflections on swearing in

“I, Patricia Flatley Brennan, do solemnly swear that I will support and defend the Constitution of the United States against all enemies, foreign and domestic; that I will bear true faith and allegiance to the same; that I take this obligation freely, without any mental reservation or purpose of evasion; and that I will well and faithfully discharge the duties of the office on which I am about to enter. So help me God.”

On September 12, 2016, I placed my left hand on Claude Pepper’s copy of the Constitution, raised my right hand, and took this oath to become the 19th director of the National Library of Medicine. These words meant a lot to me then, and they continue to guide me. My oath is to support and defend the Constitution, bearing true faith and allegiance to the same.

Of all the words in the Constitution that I must support and defend, the most meaningful to me are “We, the people”. . . for it is the responsibility of the National Library of Medicine to support biomedical discovery and translate those discoveries into information for the health of people—all the people.

More importantly, as a member of the executive branch of the government, I am responsible for implementing legislation that directs the National Library of Medicine to:

  • acquire and preserve books, periodicals, prints, films, recordings and other library materials pertinent to medicine;
  • organize the materials by appropriate cataloging, indexing, and bibliographical listing;
  • publish catalogs, indexes, and bibliographies;
  • make available through loans, photographic, or other copying procedures materials in the library;
  • provide reference and research assistance;
  • engage in such other activities in furtherance of the purposes of this part as (the Surgeon General) deems appropriate and the library’s resources permit;
  • publicize the availability from the Library of the above products and services; and
  • promote the use of computers and telecommunications.

Thank goodness there are over 1,700 women and men to make sure this happens!

During this past year, other words and phrases have influenced and inspired me:

  • Public access

NLM leads the nation and the world in ensuring that everyone, from almost anywhere, can access our resources, from our bibliographic database PubMed to the genetics information in GenBank. Assuring public access means creating vast computer systems and interfaces that allow humans and computers to use our resources. It means helping shape the policies that protect copyright, promote openness, and preserve confidentiality. It means considering the public’s interest as we acquire new resources and design new applications. And, importantly, it means that we provide training and coaching to make our resources accessible, understandable, and actionable.

  • Third century

We date our beginning to books collected by a surgeon in an Army field hospital in 1836. Our first century laid the foundation for purposeful collection of biomedical knowledge, including creating catalogs and devising indexes. Our second century saw the digitization of knowledge and internet communication, delivering our resources at lightning speed around the world. In less than two decades, we begin our third century.

I can only imagine what our third century might bring! What I do know is that it is my job now to put in place a robust human, technical, and policy platform to prepare for our third century.

  • One NLM

It is a common engineering principle that a strong whole depends on strong parts. Indeed, NLM has very strong parts—NCBI with its genomic resources, Library Operations with the power and skill to index the world’s biomedical knowledge, the Lister Hill Center with its machine learning to accelerate the interpretation of images, and more.

During the past year, I have begun to see the crosswalks between our parts: for example, the partnership between our Office of Computer and Communications Systems and Library Operations to serve up vocabularies and the Value Set Authority Center that supports quality care monitoring, and the engagement between Specialized Information Services and the Lister Hill Center to build PubChem and TOXNET services.

We are poised to address the challenges laid out for us in 1956 not by building a single service to address each one, but by knitting together the best of several services to efficiently and effectively advance health and biomedical discovery through information.

The ideas of Nina Matheson have helped shape my entire career. Now that I am Director of NLM, her words have taken on increased importance to me. In 1982, she talked about librarians as tool builders and system developers and solvers of information problems.

Inspired by these words through my first year, I embrace the idea—and, indeed, the ideal—that the library is the solution engine that will accelerate discovery in support of health for everyone.

As the Constitution says, it all starts with “We the people.”

Larry Weed’s Legacy and the Next Generation of Clinical Decision Support

Guest post by Lincoln Weed, son of the late Dr. Lawrence L. Weed and co-author with him of the book Medicine in Denial and other publications. Dr. Weed, who died June 3, 2017, was the originator of “knowledge coupling” tools for clinical decision support and the problem-oriented medical record, including its problem list and SOAP note components.

“Patients are sitting on a treasure trove of data about their own medical conditions.”

My late father, Dr. Lawrence L. Weed (LLW), made this point the day before he died. He was talking about the lost wealth of neglected patient data—readily available, richly detailed data that too often go unidentified and unexamined. Why does that happen, and what can be done about it?

The risk of missed information

From the very outset of medical problem-solving, LLW argued, patients and practitioners face greater risk of loss and harm than they may realize. The risk arises as soon as a patient starts an internet search about a medical problem, or as soon as a practitioner starts questioning the patient about the problem (whether diagnostic or therapeutic).

Ideally, these initial inquiries would somehow take into account the entire universe of collectible patient data and vast medical knowledge about what the data mean. But such thoroughness is more than the human mind can deliver.

This gap creates high risk that information crucial to solving the patient’s problem will be missed. And whatever information the mind does deliver is not recorded and harvested in a manner that permits organized feedback and continuous improvement.

Guidance tools set standard of care

The only secure way to proceed, LLW concluded, is to begin investigation of medical problems (the “initial workup”) using guidance tools external to the mind. These tools must couple patient-specific data with general knowledge as follows:

  • Link the initial data point (i.e., the patient’s presenting problem) with (1) medical knowledge about potentially relevant options and (2) readily available data for identifying those options (see the outer circle in the diagram below);
  • Link the data in (2), once collected, with the knowledge in (1) to show how well the data match up with the combinations of data points defining each relevant option—this matching indicates which options are worth considering for the individual (see the middle circle in the diagram below); and
  • Organize this information (data coupled with knowledge) into options and evidence—that is, diagnostic possibilities or therapeutic alternatives, the combined findings (positive, negative, or uncertain) on each alternative, and additional knowledge useful for assessing the best option to pursue (see the inner circle in the diagram below).

Diagram: three concentric circles showing (outside) potentially relevant options; (middle) options worth investigating; and (center) best options for this individual.

For further explanation of the above diagram, see pp. 72-74 of the book Medicine in Denial.
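The three coupling steps above can be sketched in a few lines of code. This is only an illustration under stated assumptions, not the design of LLW’s actual tools. The `KNOWLEDGE` table holds placeholder names with no medical meaning, the function and class names are hypothetical, and ordering options by the fraction of matched findings is a simplification introduced here so the output has a stable order.

```python
from dataclasses import dataclass

# Hypothetical placeholder knowledge: for one presenting problem, each option
# is defined by a combination of findings. Names carry no medical meaning.
KNOWLEDGE = {
    "presenting_problem_x": {
        "option_a": {"finding_1", "finding_2", "finding_3"},
        "option_b": {"finding_2", "finding_4"},
        "option_c": {"finding_5"},
    }
}


@dataclass
class OptionEvidence:
    """One option together with the combined findings for and against it."""
    option: str
    positive: list
    negative: list
    uncertain: list

    @property
    def match_fraction(self):
        total = len(self.positive) + len(self.negative) + len(self.uncertain)
        return len(self.positive) / total if total else 0.0


def findings_to_collect(problem):
    """Step 1 (outer circle): link the problem to its options and the data needed."""
    options = KNOWLEDGE[problem]
    return sorted(set().union(*options.values()))


def couple(problem, patient_data):
    """Steps 2 and 3: match collected data to each option, then organize evidence.

    patient_data maps finding -> True (present) or False (absent);
    findings that were never collected are treated as uncertain.
    """
    results = []
    for option, defining in KNOWLEDGE[problem].items():
        positive = sorted(f for f in defining if patient_data.get(f) is True)
        negative = sorted(f for f in defining if patient_data.get(f) is False)
        uncertain = sorted(f for f in defining if patient_data.get(f) is None)
        results.append(OptionEvidence(option, positive, negative, uncertain))
    # Middle circle: keep options with at least one positive finding.
    worth_considering = [r for r in results if r.positive]
    # Illustrative ordering only (an assumption of this sketch).
    return sorted(worth_considering, key=lambda r: r.match_fraction, reverse=True)


if __name__ == "__main__":
    data = {"finding_1": True, "finding_2": True, "finding_3": False, "finding_5": False}
    for evidence in couple("presenting_problem_x", data):
        print(evidence)
```

The point of the sketch is that the output keeps positive, negative, and uncertain findings side by side for every option, so the patient and practitioner see the evidence itself rather than a single verdict.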

Tools to carry out these steps would define best practices and make them enforceable as high standards of care for the initial workup (i.e., patient history, physical exam, and basic lab tests). That threshold task is pivotal. It lays the informational foundation for follow-up thought and action by the patient and practitioner. That foundation is also needed for feedback activities to and from third parties. (See the diagram on p. 13 of Medicine in Denial.)

Patient-driven tools

In carrying out the initial workup, the patient’s role is always central. The tools should enable patients to enter history data, which is often the most detailed component of the initial workup. Moreover, the patient necessarily participates in the physical exam conducted by the practitioner, and reviews history, physical, and lab findings with the practitioner.

Tools for the initial workup must thus be used by patients and practitioners jointly. But patients must be able to initiate use of the tools unilaterally. They can’t rely on practitioners to recognize when serious medical investigation is needed. Patients are the ones who experience symptoms—who notice changes from what feels normal. To investigate whether these symptoms might be medically significant, patients need web-based tools for problem-specific inquiries. So do healthy persons who may simply require periodic screening checkups for unidentified problems (plus initial workup of any problems discovered).

Overcoming the medical Tower of Babel

Whether it is patients or practitioners seeking guidance for the initial workup, traditional medical practice leaves them both in a vacuum. Once that vacuum was filled solely by practitioners’ idiosyncratic judgments. Now the vacuum is also being filled with a plethora of practice guidelines and clinical decision support tools, not to mention internet search engine tools.

But the very multiplicity of all these resources defeats the purpose of defining generally accepted, enforceable best practices for initial workups. And the multiplicity is increasing with new patient-generated health data from sensors, wearables, and smartphone-connected devices for physical exam data.  Moreover, the universe for needed guidance is expanding with vast new genomic/molecular data and knowledge.

The outcome of this multiplicity is not useful diversity but a Tower of Babel.

What we need instead are information tools with a unified design and trustworthy medical content, tools that guide users through the basic steps for inquiry into all medical problems, tools that take into account relevant information from all specialties without intellectual or financial biases. Users should not have to switch back and forth among different tools and interfaces for different medical problems, different specialties, different practice settings, different data types, different vendors, and different classes of users. The medical content captured in the tools must be problem-specific, but the tools’ basic design (see the three bullets above) should generalize to all problems in all contexts, as much as possible. This generality enables intuitive ease-of-use at the user level and powerful synergies at the software development level.

NLM’s role for the 21st century

LLW saw NLM as key to developing tools of this kind.

Drawing on its uniquely comprehensive electronic repository of medical content, NLM could create a new repository of distilled, structured knowledge. Drawing on its connections with the NIH research institutes and federal health agencies such as the CDC and FDA, NLM could rapidly incorporate new knowledge into that specialized repository. Outside parties and NLM itself could use that repository to build user-level tools with a unified design for conducting initial workups on specific medical problems.

By enabling creation of such a knowledge infrastructure for the public, NLM would seize an “opportunity to modernize the conceptualization of a ‘library.’” Beyond its current electronic repository, NLM could be “demonstrating how information and knowledge can best be developed, assimilated, organized, applied, and disseminated in the 21st century.”  [NIH Advisory Committee to the Director, NLM Working Group, Final Report, p. 12 (June 11, 2015).]

This new infrastructure will encounter a barrier to its use—the medical practice status quo. Not all practitioners (or their overseers) will accept the data collection demands defined by the tool.

Patients at the center

Here we return to the central role of patients.

Patients who unilaterally use NLM tools to complete the history portion of the initial workup can then seek out practitioners who are willing (and permitted) to use the same tools for the physical exam and basic lab test portions. By creating demand for those innovative practitioners and using the tools jointly with them, patients can drive medical practice toward a foundational reform.

* * *

Readers who have questions about the above are referred to the fuller discussion of these ideas in the book Medicine in Denial (PDF | published work), especially parts IV.E, F, and G, pages 192-194, and the diagram on page 13. The author also invites comments below.


Lincoln Weed, JD, Dr. Lawrence Weed’s son, practiced employee benefits law in Washington, DC for 26 years. He then joined a consulting firm where he specialized in health information privacy. He is now retired.

A Year of Connections

A look back on the Associate Fellowship year

Guest post by the 2016-2017 NLM Associate Fellows.

NLM Associate Fellowship Coordinator Kathel Dunn introduced our program in her post “NLM Associate Fellows Spark Library Alchemy,” speaking to the process of transformation this program can facilitate. As we, the 2016-2017 NLM Associate Fellows, prepare to bid NLM a fond farewell, we would like to share some reflections from our year and how we have been changed by our time here.

From our arrival on September 1, 2016, we knew this year would be different, as we began our fellowship only a few weeks after the arrival of Dr. Patricia Flatley Brennan, NLM’s then recently appointed director. One of our first activities as a cohort was to attend Dr. Brennan’s swearing-in ceremony, and the issues that shaped her first year (strategic planning and envisioning NLM’s future) became threads woven throughout our formative curriculum sessions and have been intertwined with our experiences throughout the program.

The heart of this program resides in the people of NLM and their willingness to share their time and knowledge with us. Throughout the curriculum, and later during the project phase of our year, NLM staff from all levels of the organization made themselves available for conversation, connection, and collaboration. As a result, we were able to build relationships across the Library and see connections and interdependencies across departments.

The capacity for professional relationship building also extends beyond the walls of NLM. Each member of this year’s cohort led a spring project that relied on collaboration with an external partner. Megan Fratta conducted focus groups with cancer researchers across NIH and NCI to assess their PubMed-related training needs. Kendra Godwin interviewed open science policy makers, advocates, and innovators from across the global research community in her efforts to define open science at NLM. Tyler Moses conducted an information needs assessment for the residents of the Children’s Inn, the hospitality house for children and their families who participate in research trials at NIH. And, as part of an interagency collaboration between NLM and the FDA, Candace Norton investigated enhancements to search filters to support pharmacovigilance.

Among the other connections fostered through this program are those between the members of the cohort and NLM’s senior leadership. As a group, we met with each senior leader to discuss what makes NLM unique, what makes an exemplary leader, and how best to prepare for a career in a rapidly evolving profession. Their collective wisdom and insight are invaluable at this stage in our careers.

Perhaps the most important connections are those we’ve formed with each other, thanks to the program’s cohort learning model. We’ve been a fellowship of four, learning more because of each other and the collective insights of our shared experience, and from the conversations this year has inspired. The program under its current name has existed since 1966, and we’ve been impressed with the level of support from the Associate Fellows who preceded us, and the significant contributions they’ve made to the program, to NLM, and to the profession.

As we conclude our fellowship year at NLM and make space for the incoming 2017-2018 cohort’s arrival on September 1, we leave you with our respect and gratitude for making this opportunity possible. Thank you for a fascinating and life-changing year at NLM!

Guest bloggers (from left) Candace Norton, Megan Fratta, Kendra Godwin, and Tyler Moses served as 2016-2017 NLM Associate Fellows.

Ah, but I was so much older then / I’m younger than that now

I’m just passing the one-year mark in my tenure as Director of the National Library of Medicine. It has been an exciting year for me, filled with many learnings and lessons, and with each week I grow more delighted with this outstanding organization. I have the great good fortune of having taken a leap into an uncertain-but-promising opportunity and finding it to be more rewarding, more delightful, and more engaging than I had anticipated—and I took this position with very high hopes!

I have grown a lot since I arrived here in August 2016, and as the master balladeer Bob Dylan noted, “I was so much older then / I’m younger than that now.” A deep passion for NLM, its mission, its resources, and the people who work here replaces the early hope and excitement that accompanied me on my move to Bethesda. The bravado of vision is supplanted by the realities of working in the federal system. Acronyms and abbreviations now evoke people and processes for me. I have learned to appreciate the rich tapestry of scholarship, service, citations, and collections that make up the NLM. I have met many of our stakeholders and have come to see them as collaborators. And I’ve developed a new appreciation of the Library, not simply as a collection of resources, but also as a dynamic interaction of health and information.

Here are some surprises. I am struck by a sense of patriotism I found resting quietly deep in my soul. As the director of the only federally funded health library, I am responsible for ensuring our resources are expended in support of the public’s health—supporting discovery, knowledge delivery, and personal health management. I am proud of the 1,700 women and men who choose to work here, applying their knowledge and talents in service of society. And I am committed to weaving the tenets of open science through the mantle of government service.

I am amazed at how big the Library is—not just our buildings, with their byzantine hallways and underground spaces, but the human and electronic reach. Because of our 6,500-member National Network of Libraries of Medicine, the NLM has a footprint in almost every single county, and in American Samoa, Guam, Northern Mariana Islands, Puerto Rico, and the Virgin Islands. There is no country in the world that our resources can’t touch. We have 26 million citations in our PubMed bibliographic repositories, and petabytes of data moving in and out of our NCBI resources EVERY DAY!

And I am grateful—to the security guards who help protect our precious holdings, to our scientists who are finding ways to use literature and data to help our nation meet health crises such as the opioid epidemic, to our technical services team who keep our resources available 24 hours a day. I am grateful to the staff who have greeted me with welcome and patiently reminded me of their names. I am making progress, but I’ve still got a lot to learn.

Dylan’s words appeal to me because they characterize the arc of a journey, from initial awareness through growing familiarity to deep realization that the National Library of Medicine is truly a national treasure, and I am both humbled and proud to be guiding it towards its third century of service.

Photo credit (hourglass, top): Scott Schrantz [Flickr (CC BY-NC-SA 2.0)]