Can “Nudging” Help?: Improving Clinical Trial Access Using Artificial Intelligence for Standardization

Guest post by Presidential Innovation Fellows Justin Koufopoulos and Gil Alterovitz, PhD.

Getting into a clinical trial is challenging for patients. Researchers estimate that only 5% of patients eligible to participate in a cancer clinical trial actually take part in one. Many factors affect this statistic, including how findable and accessible information about clinical trials is.

Patients often learn about clinical trials from their doctors or through patient advocacy groups like the American Cancer Society. They then typically search for trials on the internet, often ending up on websites like the NIH-run or

Once on these websites, patients still face challenges to access. Prime among them: what search terms to use to find relevant trials.

The terms a patient or doctor uses may not match how researchers running a trial describe the focus of their study, for example “breast cancer” vs. “ductal carcinoma.” While the NIH clinical trials databases track synonyms and work to make the proper matches, users cannot escape this recurring mismatch in language that challenges access.

This challenge becomes even more pronounced with clinical trial eligibility criteria. These criteria describe who can and cannot participate in a study. For example, an eligibility criterion might be “age 18 years or older” or “confirmed breast lesions that can proceed to removal without chemotherapy.” While a computer can easily match a patient to the first criterion, the second involves many more concepts that are harder to separate, understand, and match.
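
To illustrate why the first kind of criterion is easy for a computer, here is a minimal Python sketch. It is not our prototype's code; the pattern and the function names `parse_min_age` and `meets_min_age` are assumptions made for illustration. It recognizes only the one simple phrasing and defers on everything else.

```python
import re
from typing import Optional

# Hypothetical pattern covering one simple, well-structured phrasing only.
AGE_MIN_PATTERN = re.compile(r"age\s+(\d+)\s+years?\s+or\s+older", re.IGNORECASE)


def parse_min_age(criterion: str) -> Optional[int]:
    """Return the minimum age stated in the criterion, or None if not recognized."""
    match = AGE_MIN_PATTERN.search(criterion)
    return int(match.group(1)) if match else None


def meets_min_age(patient_age: int, criterion: str) -> Optional[bool]:
    """Return True/False when the criterion parses, None when it needs human review."""
    min_age = parse_min_age(criterion)
    if min_age is None:
        return None
    return patient_age >= min_age
```

Given “age 18 years or older,” the sketch can answer yes or no for any patient age. Given the breast lesion criterion, it returns `None`, because no simple pattern captures the concepts involved.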

Artificial intelligence can be part of the solution, particularly “machine learning,” which leverages data to teach a program how to make predictions on chosen topics.

Various technology companies have already used machine learning to address language translation problems. As a result, computers can now translate English to Japanese with few errors, and speech-to-text applications can translate human speech to computer inputs and can even reply.

We adopted a similar, albeit scaled back, approach to translate diverse clinical trials eligibility criteria into standardized and structured language. We also drew inspiration from writing tools that help writers improve their text’s readability and grammar.

Instead of highlighting repeated words or sentences in the passive voice, our prototype nudges researchers toward writing eligibility criteria in a way more easily translated by machine. It offers feedback and suggestions, almost like an English language tutor, and proposes alternative ways to write the criteria that would make them more standard and eventually, more translatable.
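
The general shape of such a nudge can be sketched in a few lines of Python. This is only an illustration: it swaps in plain string similarity from the standard library's `difflib` in place of machine learning, and the list of standardized phrasings and the function name `suggest_phrasings` are hypothetical.

```python
import difflib
from typing import List

# Hypothetical standardized phrasings; a real tool would derive these from data.
STANDARD_PHRASINGS = [
    "age 18 years or older",
    "no prior chemotherapy",
    "histologically confirmed breast cancer",
]


def suggest_phrasings(draft: str, limit: int = 2, cutoff: float = 0.5) -> List[str]:
    """Return standardized phrasings that closely resemble the draft criterion."""
    return difflib.get_close_matches(
        draft.lower(), STANDARD_PHRASINGS, n=limit, cutoff=cutoff
    )
```

The function only offers suggestions. An empty list means nothing similar was found, and the researcher's original wording stands either way.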

Figure: Sample Word text with track changes, and a screenshot from within the eligibility criteria normalizer showing alternate phrasings for a sample criterion.
We drew inspiration from revision tracking and grammar-type tools to design our standardization tool for researchers.

This shift toward more standardized language can make it easier to match content across databases, such as matching a list of patients with a set of conditions and prior treatments.

The prototype also helps researchers understand the consequences of their word choices. It looks at previous studies with similar eligibility criteria and notes how many participants they recruited. Input from consensus-based standards may also be presented. While not a perfect metric for inclusiveness, this feedback shows someone running a study how their word choices compare to others’ and the potential impact of those choices on their study’s overall success.

Research by academic psychologists has shown that nudging works in a wide variety of settings. To the best of our knowledge, this is the first time a nudge has been used to coach researchers. These nudges are not requirements: researchers can still write their eligibility criteria in the way they think makes the most sense. However, by moving researchers toward standardized phrasings, our prototype can help computers match patient characteristics with eligibility criteria and potentially get more eligible patients into clinical trials.

More work is needed before we can fully implement our tool and test at scale, but we are making progress. We recently completed a pilot study with non-federal groups to determine whether the structured data (so, not the nudging agent but the data our tool learns from) could be used to create tools to help with clinical trials access. Our findings were positive, confirming that private industry and academia need more data like ours for building artificial intelligence tools. The work was featured by the White House on as an example for “Industries of the Future.”

The Health Sprint piloting effort included physicians and patient advocates as well as data stewards and experts in the relevant domain areas from within government. For example, Rick Bangs, MBA, PMP, a patient advocate, has worked with various organizations including the National Cancer Institute and the development team. Regarding clinical trial matching, Bangs noted, “The solution here will require vision, and that vision will cross capabilities that no one supplier will individually have.”

Next up, we need to evaluate whether this tool helps researchers write eligibility criteria in the “real world,” where all innovations must live.

Justin Koufopoulos is a Presidential Innovation Fellow and product manager working to make clinical research more patient-centered. He has worked with the White House, CIO Council, National Library of Medicine, General Services Administration, Department of Commerce, and Veterans Administration on issues ranging from internet access to artificial intelligence.

Gil Alterovitz, PhD, FACMI, is a Presidential Innovation Fellow who has worked on bridging data ecosystems and artificial intelligence at the interface of several federal organizations, including the White House, National Cancer Institute, General Services Administration, CIO Council, and Veterans Administration.

The Presidential Innovation Fellowship brings together top innovators and their insights from outside of government, including the private sector, non-profits, and academia. Their insights are brought to bear on some of the most challenging problems within government and its agencies. The goal is to challenge existing paradigms by rethinking problems and leveraging novel, agile approaches. PIF was congressionally mandated under HR 39, the Tested Ability to Leverage Exceptional National Talent (TALENT) Act. The program is administered as a partnership between the White House Office of Science and Technology Policy, the White House Office of Management and Budget, and the General Services Administration.

Expanding Access, Improving Health

Guest post by Kathryn Funk, program manager for NLM’s PubMed Central.

Last week, National Library Week celebrated how libraries and library workers make our communities stronger. In the spirit of building strong communities, NLM has committed to “democratiz[ing] access to the products and processes of scientific research.”

NLM delivers on that commitment by supporting the NIH Public Access Policy. This policy, passed by Congress in 2008, requires authors funded by NIH to make publicly accessible in PubMed Central (PMC) any peer-reviewed paper accepted for publication. Now, over a decade after the NIH Public Access Policy went into effect, PMC makes more than 1 million NIH-funded papers available to the research community and the public. This volume of publicly accessible, NIH-funded papers represents a clear return on investment for the public, but numbers alone don’t provide the full story.

A quick dive into NIH Research Matters, a weekly update of NIH research highlights, offers a much richer and more personal picture of how the NIH Public Access Policy and NLM’s support of it can strengthen and empower communities. Making NIH-funded papers publicly accessible in PMC means that the public has free and direct access to research that touches on some of the most critical public health concerns facing our community, including studies that:

  • Suggest a method for detecting breast tumors earlier and more often, creating a higher chance of survival for patients (NIH Research Matters | PMC);
  • Identify treatment options for reducing the risk of death for people who’d previously had a non-fatal opioid overdose (NIH Research Matters | PMC);
  • Explore how maternal nutrition supplements can increase infant birth size and potentially improve children’s life-long health (NIH Research Matters | PMC);
  • Identify young people with suicidal thoughts by using machine learning to analyze brain images (NIH Research Matters | PMC);
  • Gauge exercise’s impact on the growth of new nerve cells in the brains of mice, which could potentially reduce memory problems in people with Alzheimer’s disease (NIH Research Matters | PMC); and
  • Develop blood tests to detect signs of eight common types of cancer (NIH Research Matters | PMC).

These examples illustrate that access, while essential, is not the Library’s end goal. Improved health is.

NLM supports public access to research outputs to accelerate scientific discovery and advance the health of individuals and our communities. It is the best way we can honor the investment made by the American people in scientific research and the surest way to make our communities stronger.

Kathryn Funk is the program manager for PubMed Central. She is responsible for PMC policy as well as PMC’s role in supporting the public access policies of numerous funding agencies, including NIH. Katie received her master’s degree in library and information science from The Catholic University of America.

Building Data Science Expertise at NLM

Guest post by the Data Science @NLM Training Program team.

Regular readers of this blog probably know that NLM staff are expanding their expertise beyond library science and computer science to embrace data science. As a result, NLM—in alignment with strategic plan Goal 3 to “build a workforce for data-driven research and health”—is taking steps to improve the entire staff’s facility and fluency with this field so critical to our future.

The Library is rolling out a new Data Science @NLM Training Program that will provide targeted training to all of NLM’s 1,700 staff members. We are also inviting staff from the National Network of Libraries of Medicine (NNLM) to participate so that everyone in the expanded NLM workforce has the opportunity to become more aware of data science and how it is woven into so many NLM products and services.

For some of our staff, data science is already a part of their day-to-day activities; for others, data science may be only a concept, a phrase in the strategic plan—and that’s okay. Not everyone needs to be a data scientist, but we can all become more data savvy, learning from one another along the way and preparing to play our part in NLM’s data-driven future. (See NLM in Focus for a glimpse into how seven staff members already see themselves supporting data science.)

Over the course of this year, the data science training program will help strengthen and empower our diverse and data-centric workforce. The program will provide opportunities for all staff to participate in a variety of data science training events targeted to their specific interests and needs. These events range from the all-hands session we had in late January that helped establish a common data science vocabulary among staff to an intensive, 120-hour data science fundamentals course designed to give select NLM staff the skills and tools needed to use data to answer critical research questions. We’re also assessing staff members’ data science skill levels and creating skill development profiles that will guide staff in taking the steps necessary to build their capacity and readiness for working with data.

At the end of this process, we’ll better understand the range of data science expertise across the Library. We’ll also have a much clearer idea of what more we can do to develop staff’s facility and fluency with data science and how to better recruit new employees with the knowledge and skills needed to advance our mission.

In August, the training program will culminate with a data science open house where staff can share their data science journey, highlight group projects from the fundamentals course, and find partners with whom they can collaborate on emerging projects throughout the Library.

But that final phase of the training initiative doesn’t mean NLM’s commitment to data science is over. In fact, it will be just the beginning.

In the coming years, staff will apply their new and evolving skills and knowledge to help NLM achieve its vision of serving as a platform for biomedical discovery and data-powered health.

How are you supporting the data science development of your staff? Let’s share ideas to keep the momentum going!

Co-authored by the Data Science @NLM Training Program team (left to right):

    • Dianne Babski, Deputy Associate Director, Library Operations
    • Peter Cooper, Strategic Communications Team Lead, National Center for Biotechnology Information
    • Lisa Federer, Data Science and Open Science Librarian, Office of Strategic Initiatives
    • Anna Ripple, Information Research Specialist, Lister Hill National Center for Biomedical Communications

National Public Health Week 2019: How NLM Brings Together Libraries and Public Health

Guest post by Derek Johnson, MLIS, Health Professionals Outreach Specialist for the National Network of Libraries of Medicine Greater Midwest Region

Recent articles in Preventing Chronic Disease and The Nation’s Health chronicle how public libraries can complement the efforts of public health workers in community outreach and engagement. Data tell us that more Americans visit public libraries in a year (1.39 billion) than they do health care providers (990 million). Moreover, more than 40% of computer-using patrons report using libraries to search for health information. However, we also know many individuals struggle with accessing and understanding the health information they encounter every day.

This challenge raises the question, “How does the National Library of Medicine (NLM) increase access to trustworthy health information to improve the health of communities across the United States?”

It’s an important question, and, as we celebrate National Public Health Week, it gives us an opportunity to reflect on the incredible work NLM is doing through its National Network of Libraries of Medicine (NNLM) to bring libraries and public health together.

Take, for example, Richland County Public Health in Ohio. Richland County is approximately 33% rural. Many rural areas have been identified as “internet deserts.” In addition, adults in the county have lower rates of high school and college-level education compared to state averages. Seeking to address these disparities, Richland County Public Health applied for a funding award from NNLM’s Greater Midwest Region to develop an Interactive Health Information Kiosk in partnership with the county public library system.

With funding in hand, Richland County Public Health loaded select NLM resources onto specially configured iPads and installed them in the nine branches of the Richland County Libraries. A health educator trained library staff, local healthcare providers, and the public on how to use those resources to access trustworthy health information. Moving forward, librarians will be able to help patrons use the health kiosks. As a result, Richland County Public Health is helping improve health literacy among adult residents and, ultimately, enabling them to make more informed decisions about their health.

Another example of a public health and public library collaboration comes from NNLM’s Middle Atlantic Region (MAR). The Philadelphia Department of Public Health recognized the need to engage individuals in neighborhoods most vulnerable to severe weather events to increase their knowledge of disaster and emergency preparedness.

With funding from MAR, the Philadelphia Department of Public Health partnered with four branches of the Free Library of Philadelphia to train both librarians and local residents on emergency preparedness. Participants learned how to make use of the NLM Disaster Information Management Research Center and where to find local resources during weather-related emergencies.

These are just two of the many projects that NNLM helps facilitate across the country through its network of more than 7,500 library, public health, community-based, and other organizational members.

And, while NNLM continues to identify partnerships for funding public health and library projects, it also engages health educators by offering continuing education credit for Certified Health Education Specialists (CHES). CHES-certified professionals work in a variety of health care and public health settings where they help community members adopt and maintain healthy lifestyles. Health educators can earn continuing education credits by attending specially designated NNLM webinars on topics such as health statistics and evidence-based public health, with more courses in the works.

As communities continue to rely on the public health workforce to sustain and build healthy environments, know that the National Library of Medicine and its National Network of Libraries of Medicine are here to support the work they do!

Derek Johnson, MLIS, is the Health Professionals Outreach Specialist for the National Network of Libraries of Medicine Greater Midwest Region. In this capacity, he conducts training and outreach to public health professionals on a variety of topics, including evidence-based public health, health disparities, and community outreach.


An Introduction to Authority-based Security

Guest post by Kurt W. Rodarmer, a software security architect in NLM’s National Center for Biotechnology Information.

NLM is working to unleash the potential of data and information to accelerate and transform biomedical discovery. Foundational to that goal lie the data themselves. We assess their value, collect and curate them, and then make them accessible.

But access has its risks. Big risks. Especially when it comes to personal medical data or hard-earned, grant-funded proprietary data. We need to find a way to deliver access while simultaneously controlling and protecting the data.

That’s where security comes in.

We’re all familiar with “identity-based security,” evolution’s primitive mechanism that predates our species. It starts by using our eyes, ears, and nose to identify someone or something and ends with an immediate risk-assessment. Not surprisingly, this mechanism was modeled in modern cybersecurity and is virtually ubiquitous across consumer and industrial-grade systems.

For all their efforts though, these systems sure seem to fail—a lot. Common wisdom suggests breaches are inevitable, but that’s not entirely true. There are other approaches.

Authority-based security is one. With that, authority, permissions, and trust are explicitly modeled, and policy decisions are made up front. We create objects that embody these ethereal concepts and make them tangible. These objects can then be stored, transmitted, accessed, sub-divided, transferred, etc. The discipline of modeling and managing authority is called Authority Management.

Identity- and authority-based approaches achieve several common goals. They each have strengths and weaknesses. Where they differ, the stronger, more effective, and more elegant of the two is nearly always authority-based.

Both approaches grant permissions based upon security policy. Authority-based security captures the result of policy evaluation as permissions in unforgeable and unmodifiable tokens. Since these tokens come from a known source of authority and are tamper-evident, the permissions they contain require no further scrutiny. They are as trustworthy as the authority that issued them. A permission token typically contains only a small subset of the overall permissions available to an individual, ideally never more than are needed within the current dynamic context.
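
One common way to make a token tamper-evident is to attach a keyed hash to it. The Python sketch below shows that idea using the standard library's `hmac` module. It is an illustration under stated assumptions, not a description of any NLM system: the key, the token layout, and the function names `issue_token` and `verify_token` are all hypothetical, and a production design might well use public-key signatures so that verifiers never hold the signing key.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical signing key, held only by the issuing authority.
AUTHORITY_KEY = b"demo-key-for-illustration-only"


def issue_token(permissions: list) -> str:
    """Serialize the permissions and append an HMAC tag computed by the authority."""
    payload = json.dumps({"permissions": sorted(permissions)}).encode()
    tag = hmac.new(AUTHORITY_KEY, payload, hashlib.sha256).digest()
    return (
        base64.urlsafe_b64encode(payload).decode()
        + "."
        + base64.urlsafe_b64encode(tag).decode()
    )


def verify_token(token: str):
    """Return the permission list if the tag checks out, otherwise None."""
    try:
        payload_b64, tag_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        tag = base64.urlsafe_b64decode(tag_b64)
    except ValueError:
        return None  # malformed token
    expected = hmac.new(AUTHORITY_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None  # payload or tag was altered after issuance
    return json.loads(payload)["permissions"]
```

If anyone edits the permission list after issuance, the tag no longer matches and `verify_token` rejects the token. That is the sense in which the permissions need no further scrutiny once the tag has been checked.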

By contrast, identity-based techniques make permission decisions based upon global attributes or provide crude static mechanisms. In most cases, they reflect zero context sensitivity. That means, for example, that if I run a program on a stock Linux system, that program executes using 100% of my permissions, even though it may need only read access to one file and write access to one directory. For all I know the program could be surreptitiously stealing my most sensitive data in the background, and I’d have no awareness or protection against it. Without my permission? That’s the point—I just gave it ALL my permissions!

In an authority-managed system, I would have given that same program permissions to access only the file and directory needed, leaving it powerless to read other sensitive files, much less phone home and exfiltrate them.

So, if identity-based security is so far behind the curve, what accounts for its continued use? It has one highly prized strength: its ability to revoke permissions on the spot. Since permissions are granted at the moment they are going to be exercised, any permission can be immediately denied as the result of updating policy. Since this policy update is often reactive, coming about once damage has already occurred and possibly delayed by weeks or months, the value of its immediacy is questionable. Tokens have a built-in timeout making them self-revoking, and in practice perform similarly.
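
The built-in timeout can be pictured with a very small Python sketch. The class name `TimedToken` and its fields are assumptions made for illustration. The point is only that an expired token grants nothing, without anyone having to revoke it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedToken:
    """A permission token that stops working once its expiry time passes."""

    permissions: frozenset
    expires_at: float  # seconds since the epoch

    def allows(self, permission: str, now: float) -> bool:
        """A permission is usable only while the token has not expired."""
        return now < self.expires_at and permission in self.permissions
```

In use, `now` would typically come from `time.time()`. Passing it in explicitly keeps the check easy to test.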

Here’s how it works. To do anything of substance in a system, you need permissions. You may have those permissions already stored on some device, such as your phone. Or, you may need to go through the process of identifying yourself to some part of the system that is storing permissions on your behalf, accessible once your identity has been authenticated. In either case, the first step is to get ahold of a token containing your set of pre-approved permissions.

The permission set you now hold represents the complete permissions you have within the system you have just entered, e.g., dbGaP, a grant administration system, etc. It is unlikely to represent all the permissions you have within every system you can access. Even so, it’s probably too permissive for what you have in mind. Your next step would typically be to subset your permissions to only those needed to limit the potential damage should the token fall into the wrong hands.

Sometimes you need to share your permissions, such as when a grant-funded investigator delegates most of the research documentation to lab assistants. She can take her permission tokens received with the grant, subset, and delegate them to her lab as appropriate, so everyone can work.
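
Subsetting and delegation can be sketched in Python as operations that only ever narrow a permission set. The class `PermissionToken`, its methods, and the permission strings in the usage note are hypothetical names chosen for illustration. The sketch also leaves out the tamper-evidence and timeout that a real token would carry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PermissionToken:
    """An immutable set of permissions held by one party."""

    holder: str
    permissions: frozenset

    def subset(self, wanted: set) -> "PermissionToken":
        """Return a narrower token; refuse any permission the parent lacks."""
        extra = set(wanted) - self.permissions
        if extra:
            raise PermissionError(f"cannot grant permissions not held: {sorted(extra)}")
        return PermissionToken(self.holder, frozenset(wanted))

    def delegate(self, new_holder: str, wanted: set) -> "PermissionToken":
        """Hand a subset of these permissions to someone else."""
        narrowed = self.subset(wanted)
        return PermissionToken(new_holder, narrowed.permissions)
```

An investigator holding `{"read:data", "write:docs", "submit:report"}` could call `delegate("lab assistant", {"write:docs"})` so that the assistant can write documentation and nothing more. Asking for a permission the investigator does not hold raises `PermissionError`.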

What else can you do with them? Literally anything that can be done in an information system! Beyond implementing the traditional security processes of Identity and Access Management (IAM, a proper subset of Authority Management), tokens are also used to protect resources in other ways. They can be used to model spending accounts and quotas, control access to consumable or metered resources, mitigate DoS attacks, provide audit trails, and eliminate the use of passwords and multiple logins.
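
As one example, a quota can be modeled as a token that carries a budget and is spent down with use. The Python sketch below is a minimal illustration. The class name `QuotaToken` and its interface are assumptions, and it omits the signing and expiry a real token would need.

```python
class QuotaToken:
    """A token that carries a fixed budget of uses for one metered resource."""

    def __init__(self, resource: str, remaining: int):
        self.resource = resource
        self.remaining = remaining

    def spend(self, amount: int = 1) -> bool:
        """Consume part of the budget; return False if it would be exceeded."""
        if amount <= 0 or amount > self.remaining:
            return False
        self.remaining -= amount
        return True
```

Once the budget reaches zero, every further `spend` call returns `False`. Bounding each caller's use of a resource in this way is also how such a token could help blunt a flood of requests.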

Because tokens carry permissions whose source of authority is irrefutable, they are the mechanism for implementing the fundamental principles of security. We can bring some of their benefits to bear right now and help lay the groundwork for secure, accessible biomedical data.

Kurt Rodarmer started work on military-grade secure operating systems over 20 years ago in Silicon Valley, working with the architect of KeyKOS, Norman Hardy. He is an expert in secure software and language design and has formalized the field of Authority Management. Kurt previously worked for Apple and Oracle and was a consultant to IBM and Sun, among others.

Data Discovery at NLM

Guest post by David Hale, Information Technology Specialist at NLM.

Did you know that each day more than four million people use NLM resources and that every hour a petabyte of data moves in or out of our computing systems?

Those mammoth numbers indicate to me how essential NLM’s array of information products and services is to scientific progress. But as we gain more experience with providing information, particularly clinical, biologic, and genetic datasets, we’re finding that how we share data is as critical as the data itself.

To fuel the insights and solutions needed to improve public health, we must ensure data flow freely to the researchers, industry innovators, patient communities, and citizen scientists who can bring new lenses to these rich repositories of knowledge.

One way we’re opening doors to our data is through an open data portal called Data Discovery. While agencies like the Centers for Disease Control and the Centers for Medicare and Medicaid Services are already utilizing the same platform with success, NLM is the first of NIH’s Institutes and Centers to adopt the platform. Our first datasets are already available, including content from such diverse resources as the Dietary Supplement Label Database, Pillbox, ToxMap, Disaster Lit, and HealthReach.

Why did NLM take this step? While many of our data resources have long been publicly available online, housing them within Data Discovery offers unconstrained access and delivers key benefits:

  • Powerful data exploration tools—By showing the dataset as a spreadsheet, the Data Discovery platform offers freedom to filter and interact with the data in novel ways.
  • Intuitive data visualizations—A picture is worth a thousand words, and nowhere is that truer than in using data visualizations to bring new perspectives to scientific questions.
  • Open data APIs—Open data alone isn’t enough to fuel a new generation of insights. Open APIs are critical to making the data understandable, accessible, and actionable, based on the unique needs of the user or audience.
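
To give a feel for what spreadsheet-style filtering over API results looks like, here is a small Python sketch. It makes no network calls and does not use the Data Discovery API. The sample rows, field names, and the `filter_rows` helper are all made up for illustration.

```python
import json

# Made-up sample payload standing in for rows an open data API might return.
SAMPLE_RESPONSE = """
[
  {"product_name": "Example Vitamin D", "form": "tablet", "on_market": true},
  {"product_name": "Example Fish Oil", "form": "softgel", "on_market": false},
  {"product_name": "Example Calcium", "form": "tablet", "on_market": true}
]
"""


def filter_rows(rows, **criteria):
    """Keep only the rows whose fields equal every keyword criterion."""
    return [
        row for row in rows
        if all(row.get(field) == value for field, value in criteria.items())
    ]


rows = json.loads(SAMPLE_RESPONSE)
tablets_on_market = filter_rows(rows, form="tablet", on_market=True)
```

Because the data arrive as structured rows rather than as web pages, the same few lines work for any column a user cares about.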

What does this mean in practice?

Let’s look at the Office of Dietary Supplements’ (ODS) Dietary Supplement Label Database (DSLD) to illustrate the potential of leveraging Data Discovery.

More than half of all Americans take at least one dietary supplement a day. Reliable information about those supplements is critical to their appropriate use, making DSLD a timely and important dataset to make available in an open data platform. Through Data Discovery, researchers, academics, health care providers, and the public will be able to explore and derive insights from the labels of more than 85,000 dietary supplement products currently or formerly sold in the US.

Developers and technologists who support research, health, and medical organizations require APIs that are modern, interoperable, and standards-compliant. Data Discovery provides a powerful solution to these needs, supporting NLM’s role as a platform for biomedical discovery and data-powered health.

Beyond fueling scientific discovery, open access to data holds another benefit for advancing public health: contributing to the professional development of data and informatics specialists. An increasingly important part of the health care workforce, informaticists help researchers extract the most meaningful insights from data, driving new developments in the lab and better management of patients and populations.

I invite you to explore the new Data Discovery portal. It’s an exciting step forward in achieving key aspects of the NLM Strategic Plan—to advocate for open science, further democratize access to data, and support the training and development of the data science workforce.

Photo of David Hale (credit: Jacie Lee Almira Photography)

David Hale is an Information Technology Specialist at the National Library of Medicine. In addition to leading Data Discovery, David is also project lead for NLM’s Pillbox, a drug identification, reference, and image resource. He received his Bachelor of Science in Physical Science from the University of Maryland.

Keeping Up with the Information Onslaught

Organizing your resources sustainably

Guest post by Helen-Ann Brown Epstein, MLS, MS, AHIP, FMLA, informationist at the Health Sciences Library Virtua in Mt Laurel, New Jersey.

I am of the generation that fondly remembers when the comedian George Carlin mused about our obsession with stuff.

“That’s all you need in life, a little place for your stuff,” he said. “That’s all your house is: a place to keep your stuff.”

And having a place for our stuff, he observed, allows us to relax, whether we’re at home or traveling.

But what about the stuff that matters to us as health information professionals? How can we sustainably organize all that while keeping up with the literature for both our customers and ourselves?

The information explosion keeps creating more and more stuff. Currently, PubMed has more than 29 million citations, and the growth isn’t stopping. On average, NLM adds about 1.1 million citations per year to PubMed. That’s nearly 92,000 citations per month or over 21,000 citations per week. Who can keep up with that?!

Once upon a time, we used index card files of relevant citations, clustered by MeSH or our favorite terms, to organize key references. Sometimes, we ripped out relevant articles or photocopied them, building stacks of stuff we promised ourselves we’d read.

Today, online databases make it possible to retrieve smaller, more precise result sets. We’re also able to create online alerts focused on special topics or specific journals. We can then store these citations in My NCBI accounts that can be exported into bibliographic citation management software. Some of these software packages even allow us to download PDFs, add notes to them, and then share them with colleagues.

We’ve come a long way.

In my everyday life as a health sciences librarian, I work solo for a large three-hospital system. My virtual library frees me up to make house calls to help my customers set up their own current awareness alerts that will deliver the important literature and key tables of contents to their inboxes. I also use my visits to encourage them to set up their own My NCBI accounts and to leverage the power of bibliographic software to manage their citations. And I talk about how crucial it is to decide how to best organize their literature and other sources of information at the start of any project, not later, when the volume gets too big to manage.

As part of the first cohort of the Medical Library Association Research Training Institute, I’m learning from experience the benefits of that last bit of wisdom. Following the advice of our expert faculty, I have created my alerts and determined the headings for my collections of citations. Though I’m at the early stages, I expect taking these important first steps will help ensure that I’m not missing relevant articles as they come out and might even help me unearth applicable research from disciplines I had not previously considered. I also expect to find saved articles more quickly when I need them and possibly uncover connections I had not previously seen. At minimum, I know that building a collection of resources from the beginning will give me the freedom to get to articles when I’m ready for them, knowing they’ll be there waiting.

Ultimately though, by establishing now how I will manage the information, I’ve discovered that George Carlin was right. Now that I have a “house” for my stuff, I can relax. Instead of stressing out over where that stuff is going to go, I can focus on the research, knowing that I have a system in place to keep my resources organized and to keep me on track as I evaluate online journal club formats and their role in an interprofessional patient care team.

How do you keep your information stuff organized? I welcome your comments and questions.

Helen-Ann Brown Epstein, MLS, MS, AHIP, FMLA, currently serves as the informationist at the Health Sciences Library Virtua in Mt Laurel, New Jersey. She spent the previous 22 years as a clinical librarian at Weill Cornell Medical Library. Helen-Ann is active in the Medical Library Association and has authored or co-authored several articles on medical librarianship.