An Introduction to Authority-based Security

A guest post by Kurt W. Rodarmer, a software security architect in NLM’s National Center for Biotechnology Information.

NLM is working to unleash the potential of data and information to accelerate and transform biomedical discovery. Foundational to that goal lie the data themselves. We assess their value, collect and curate them, and then make them accessible.

But access has its risks. Big risks. Especially when it comes to personal medical data or hard-earned, grant-funded proprietary data. We need to find a way to deliver access while simultaneously controlling and protecting the data.

That’s where security comes in.

We’re all familiar with “identity-based security,” evolution’s primitive mechanism that predates our species. It starts by using our eyes, ears, and nose to identify someone or something and ends with an immediate risk-assessment. Not surprisingly, this mechanism was modeled in modern cybersecurity and is virtually ubiquitous across consumer and industrial-grade systems.

For all their efforts, though, these systems sure seem to fail—a lot. Common wisdom suggests breaches are inevitable, but that’s not entirely true. There are other approaches.

Authority-based security is one. Under this approach, authority, permissions, and trust are explicitly modeled, and policy decisions are made up front. We create objects that embody these ethereal concepts and make them tangible. These objects can then be stored, transmitted, accessed, subdivided, transferred, and so on. The discipline of modeling and managing authority is called Authority Management.

Identity- and authority-based approaches achieve several common goals. They each have strengths and weaknesses. Where they differ, the stronger, more effective, and more elegant of the two is nearly always authority-based.

Both approaches grant permissions based upon security policy. Authority-based security captures the result of policy evaluation as permissions in unforgeable and unmodifiable tokens. Since these tokens come from a known source of authority and are tamper-evident, the permissions they contain require no further scrutiny. They are as trustworthy as the authority that issued them. A permission token typically contains only a small subset of the overall permissions available to an individual, ideally never more than are needed within the current dynamic context.
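To make the idea concrete, here is a minimal sketch of such a token in Python. It is an illustration only, not how any NLM system is built: the HMAC key handling, names, and permission strings are all invented, and a production system would use asymmetric signatures and hardened key storage.

```python
import hashlib
import hmac
import json
import time

AUTHORITY_KEY = b"issuer-secret"  # held only by the issuing authority

def issue_token(subject, permissions, ttl_seconds=3600):
    """Mint a tamper-evident token carrying a narrow set of permissions."""
    body = {
        "sub": subject,
        "perms": permissions,                    # e.g., ["read:/data/study42.csv"]
        "exp": int(time.time()) + ttl_seconds,   # built-in timeout: self-revoking
    }
    payload = json.dumps(body, sort_keys=True).encode()
    sig = hmac.new(AUTHORITY_KEY, payload, hashlib.sha256).hexdigest()
    return {"payload": payload.decode(), "sig": sig}

def verify_token(token):
    """Reject any token that was forged, altered, or has timed out."""
    expected = hmac.new(AUTHORITY_KEY, token["payload"].encode(),
                        hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, token["sig"]):
        raise ValueError("token was tampered with or forged")
    body = json.loads(token["payload"])
    if body["exp"] < time.time():
        raise ValueError("token expired (self-revoked)")
    return body["perms"]
```

Anyone can read the payload, but changing a single permission invalidates the signature, so the permissions need no further scrutiny once the signature checks out; the "exp" field gives the token the built-in timeout discussed below.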

By contrast, identity-based techniques make permission decisions based upon global attributes or provide crude static mechanisms. In most cases, they reflect zero context sensitivity. That means, for example, that if I run a program on a stock Linux system, that program executes using 100% of my permissions, even though it may need only read access to one file and write access to one directory. For all I know the program could be surreptitiously stealing my most sensitive data in the background, and I’d have no awareness or protection against it. Without my permission? That’s the point—I just gave it ALL my permissions!

In an authority-managed system, I would have given that same program permissions to access only the file and directory needed, leaving it powerless to read other sensitive files, much less phone home and exfiltrate them.

So, if identity-based security is so far behind the curve, what accounts for its continued use? It has one highly prized strength: its ability to revoke permissions on the spot. Since permissions are granted at the moment they are to be exercised, any permission can be immediately denied by updating policy. But since that policy update is often reactive, coming only once damage has already occurred and possibly delayed by weeks or months, the value of its immediacy is questionable. Tokens, for their part, carry a built-in timeout that makes them self-revoking, and in practice they perform similarly.

Here’s how it works. To do anything of substance in a system, you need permissions. You may have those permissions already stored on some device, such as your phone. Or, you may need to go through the process of identifying yourself to some part of the system that is storing permissions on your behalf, accessible once your identity has been authenticated. In either case, the first step is to get ahold of a token containing your set of pre-approved permissions.

The permission set you now hold represents the complete permissions you have within the system you have just entered, e.g., dbGaP, a grant administration system, etc. It is unlikely to represent all the permissions you have within every system you can access. Even so, it’s probably too permissive for what you have in mind. Your next step would typically be to subset your permissions to only those needed, limiting the potential damage should the token fall into the wrong hands.

Sometimes you need to share your permissions, such as when a grant-funded investigator delegates most of the research documentation to lab assistants. She can take her permission tokens received with the grant, subset, and delegate them to her lab as appropriate, so everyone can work.
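Continuing the sketch above (still purely illustrative), subsetting and delegation amount to verifying the parent token and minting a narrower one; the permission set can only shrink:

```python
# Uses issue_token / verify_token from the earlier sketch.
grant_token = issue_token("investigator",
                          ["read:/protocols", "write:/lab-notebook", "read:/budget"])

def attenuate(token, keep, ttl_seconds=900):
    """Derive a narrower token; permissions may only be removed, never added."""
    perms = verify_token(token)              # reject forged or expired parents
    subset = [p for p in perms if p in keep]
    return issue_token("lab-delegate", subset, ttl_seconds)

lab_token = attenuate(grant_token, keep=["read:/protocols", "write:/lab-notebook"])
# lab_token cannot touch /budget, no matter where it travels.
```

In this sketch the issuing authority mints the narrower token; full capability systems also support holder-side attenuation through chained signatures.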

What else can you do with them? Literally anything that can be done in an information system! Beyond implementing the traditional security processes of Identity and Access Management (IAM, a proper subset of Authority Management), tokens are also used to protect resources in other ways. They can be used to model spending accounts and quotas, control access to consumable or metered resources, mitigate DoS attacks, provide audit trails, and eliminate the use of passwords and multiple logins.

Because tokens carry permissions whose source of authority is irrefutable, they are the mechanism for implementing the fundamental principles of security. We can bring some of their benefits to bear right now and help lay the groundwork for secure, accessible biomedical data.

Kurt Rodarmer started work on military-grade secure operating systems over 20 years ago in Silicon Valley, working with the architect of KeyKOS, Norman Hardy. He is an expert in secure software and language design and has formalized the field of Authority Management. Kurt previously worked for Apple and Oracle and was a consultant to IBM and Sun, among others.

Data Discovery at NLM

Guest post by David Hale, Information Technology Specialist at NLM.

Did you know that each day more than four million people use NLM resources and that every hour a petabyte of data moves in or out of our computing systems?

Those mammoth numbers indicate to me how essential NLM’s array of information products and services are to scientific progress. But as we gain more experience with providing information, particularly clinical, biologic, and genetic datasets, we’re finding that how we share data is as critical as the data itself.

To fuel the insights and solutions needed to improve public health, we must ensure data flow freely to the researchers, industry innovators, patient communities, and citizen scientists who can bring new lenses to these rich repositories of knowledge.

One way we’re opening doors to our data is through an open data portal called Data Discovery. While agencies like the Centers for Disease Control and Prevention and the Centers for Medicare & Medicaid Services are already using the same platform with success, NLM is the first of NIH’s Institutes and Centers to adopt it. Our first datasets are already available, including content from such diverse resources as the Dietary Supplement Label Database, Pillbox, TOXMAP, Disaster Lit, and HealthReach.

Why did NLM take this step? While many of our data resources have long been publicly available online, housing them within Data Discovery offers unconstrained access and delivers key benefits:

  • Powerful data exploration tools—By showing the dataset as a spreadsheet, the Data Discovery platform offers freedom to filter and interact with the data in novel ways.
  • Intuitive data visualizations—A picture is worth a thousand words, and nowhere is that truer than in using data visualizations to bring new perspectives to scientific questions.
  • Open data APIs—Open data alone isn’t enough to fuel a new generation of insights. Open APIs are critical to making the data understandable, accessible, and actionable, based on the unique needs of the user or audience.

What does this mean in practice?

Let’s look at the Office of Dietary Supplements’ (ODS) Dietary Supplement Label Database (DSLD) to illustrate the potential of leveraging Data Discovery.

More than half of all Americans take at least one dietary supplement a day. Reliable information about those supplements is critical to their appropriate use, making DSLD a timely and important dataset to make available in an open data platform. Through Data Discovery, researchers, academics, health care providers, and the public will be able to explore and derive insights from the labels of more than 85,000 dietary supplement products currently or formerly sold in the US.

Developers and technologists who support research, health, and medical organizations require APIs that are modern, interoperable, and standards-compliant. Data Discovery provides a powerful solution to these needs, supporting NLM’s role as a platform for biomedical discovery and data-powered health.
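As a concrete taste, here is a hedged sketch of an open-data API call. Platforms of this kind commonly expose a Socrata-style SODA endpoint that accepts SQL-like parameters; the dataset identifier and field names below are placeholders, not the real DSLD values, so check the portal itself for the actual ones.

```python
import requests

# Hypothetical SODA-style query against the Data Discovery portal.
# "abcd-1234" and the field names are placeholders, not real DSLD values.
BASE = "https://datadiscovery.nlm.nih.gov/resource/abcd-1234.json"

params = {
    "$select": "product_name, brand_name",
    "$where": "upper(product_name) like '%VITAMIN D%'",
    "$limit": 10,
}
for row in requests.get(BASE, params=params, timeout=30).json():
    print(row)
```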

Beyond fueling scientific discovery, open access to data holds another benefit for advancing public health: contributing to the professional development of data and informatics specialists. An increasingly important part of the health care workforce, informaticists help researchers extract the most meaningful insights from data, driving new developments in the lab and better management of patients and populations.

I invite you to explore the new Data Discovery portal. It’s an exciting step forward in achieving key aspects of the NLM Strategic Plan—to advocate for open science, further democratize access to data, and support the training and development of the data science workforce.


David Hale is an Information Technology Specialist at the National Library of Medicine. In addition to leading Data Discovery, David is also project lead for NLM’s Pillbox, a drug identification, reference, and image resource. He received his Bachelor of Science in Physical Science from the University of Maryland.

Keeping Up with the Information Onslaught

Organizing your resources sustainably

Guest post by Helen-Ann Brown Epstein, MLS, MS, AHIP, FMLA, informationist at the Health Sciences Library Virtua in Mt Laurel, New Jersey.

I am of the generation that fondly remembers when the comedian George Carlin mused about our obsession with stuff.

“That’s all you need in life, a little place for your stuff,” he said. “That’s all your house is: a place to keep your stuff.”

And having a place for our stuff, he observes, allows us to relax, whether we’re at home or traveling.

But what about the stuff that matters to us as health information professionals? How can we sustainably organize all that while keeping up with the literature for both our customers and ourselves?

The information explosion keeps creating more and more stuff. Currently, PubMed has more than 29 million citations, but they’re not stopping. On average, NLM adds about 1.1 million citations per year to PubMed. That’s nearly 92,000 citations per month or over 21,000 citations per week. Who can keep up with that?!

Once upon a time, we used index card files of relevant citations, clustered by MeSH or our favorite terms, to organize key references. Sometimes, we ripped out relevant articles or photocopied them, building stacks of stuff we promised ourselves we’d read.

Today, online databases make it possible to retrieve smaller, more precise results sets. We’re also able to create online alerts focused on special topics or specific journals. We can then store these citations in My NCBI accounts that can be exported into bibliographic citation management software. Some of these software packages even allow us to download PDFs, add notes to them, and then share them with colleagues.
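For those who like scripts as well as saved searches, the same alerting idea can be expressed directly against NCBI’s E-utilities. A minimal sketch (the search topic is just an example):

```python
import requests

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

params = {
    "db": "pubmed",
    "term": "health literacy[Title/Abstract]",  # example topic
    "reldate": 7,                               # only items from the past 7 days
    "datetype": "edat",                         # by the date added to PubMed
    "retmax": 100,
    "retmode": "json",
}
result = requests.get(ESEARCH, params=params, timeout=30).json()["esearchresult"]
print(result["count"], "new citations this week")
for pmid in result["idlist"]:
    print(f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/")
```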

We’ve come a long way.

In my everyday life as a health sciences librarian, I work solo for a large three-hospital system. My virtual library frees me up to make house calls to help my customers set up their own current awareness alerts that will deliver the important literature and key tables of contents to their inboxes. I also use my visits to encourage them to set up their own My NCBI accounts and to leverage the power of bibliographic software to manage their citations. And I talk about how crucial it is to decide how to best organize their literature and other sources of information at the start of any project, not later, when the volume gets too big to manage.

As part of the first cohort of the Medical Library Association Research Training Institute, I’m learning from experience the benefits of that last bit of wisdom. Following the advice of our expert faculty, I have created my alerts and determined the headings for my collections of citations. Though I’m at the early stages, I expect taking these important first steps will help ensure that I’m not missing relevant articles as they come out and might even help me unearth applicable research from disciplines I had not previously considered. I also expect to find saved articles more quickly when I need them and possibly uncover connections I had not previously seen. At minimum, I know that building a collection of resources from the beginning will give me the freedom to get to articles when I’m ready for them, knowing they’ll be there waiting.

Ultimately though, by establishing now how I will manage the information, I’ve discovered that George Carlin was right. Now that I have a “house” for my stuff, I can relax. Instead of stressing out over where that stuff is going to go, I can focus on the research, knowing that I have a system in place to keep my resources organized and to keep me on track as I evaluate online journal club formats and their role in an interprofessional patient care team.

How do you keep your information stuff organized? I welcome your comments and questions.

Helen-Ann Brown Epstein, MLS, MS, AHIP, FMLA, currently serves as the informationist at the Health Sciences Library Virtua in Mt Laurel, New Jersey. She spent the previous 22 years as a clinical librarian at Weill Cornell Medical Library. Helen-Ann is active in the Medical Library Association and has authored or co-authored several articles on medical librarianship.

Technology and Data in Mental Health: Applications for Suicide Prevention

Guest post by Elizabeth Chen, PhD, Associate Director of the Center for Biomedical Informatics, Associate Professor of Medical Science, and Associate Professor of Health Services, Policy & Practice at Brown University.

Biomedical informatics as a discipline is broadly concerned with the effective use of data, information, and knowledge to improve human health. Since its origins in the 1950s, we have watched this discipline evolve with advances in health information and communications technology as well as the explosion of electronic health data. During this time, we have also seen the emergence of sub-disciplines reflecting areas of specialization. In fact, a 2015 study uncovered almost 300 different “types” of informatics! Among these was mental health informatics, which first appeared in the title of a 1995 article indexed in PubMed.

Using technology to understand and support mental health dates to the 1950s when specialized television broadcasts delivered mental health training. In the 1960s, computers analyzed data for psychological diagnoses and housed “artificial intelligence” systems that simulated communication with a psychotherapist. More recently, with the rapid adoption of electronic health record (EHR) systems that can collect longitudinal patient information such as diagnoses and medications, we are observing the increased use of EHR technology and data for improving health care, including mental health care.

Mental health remains a global crisis. In the United States alone, mental health conditions affect 1 in 5 adults and children. These conditions are among the factors that contribute to making suicide the 10th leading cause of death overall and 2nd leading cause among 10- to 34-year-olds nationally. With suicide rates having increased by nearly 30% since 1999, the National Strategy for Suicide Prevention calls for a comprehensive and coordinated approach that includes data-driven strategic planning and evidence-based programs.

There are numerous and wide-ranging applications of mental health informatics and EHRs contributing to these efforts, including the following:

  • Analyses of two independent datasets, one including EHR and biobank data from Vanderbilt University Medical Center, have characterized the role of common genetic variants among those who have attempted suicide. These large-scale genetic analyses support a heritable component to suicide attempts and an incomplete genetic relationship with psychiatric and sleep disorders.
  • At the Parkland Health & Hospital System in Texas, a Universal Suicide Screening Program, initiated in 2012, led to implementing the Columbia-Suicide Severity Rating Scale in the EHR system for adults. The integration of this clinical decision support tool into the clinical workflow demonstrates how technology may be used to improve suicide risk recognition.
  • Researchers across the country are developing models for predicting patients’ future risk of suicidal behavior using “machine learning” techniques, state death certificates, and longitudinal EHR data from a range of health systems, including Partners HealthCare in Massachusetts [PubMed], HealthPartners in Minnesota, Henry Ford Health System in Michigan, and five different Kaiser Permanente locations [PubMed]. Implementing these predictive models as clinical decision support tools in EHR systems has the potential to improve screening, detection, and treatment of suicide risk (a toy sketch of this kind of pipeline appears after this list).
  • In Connecticut, EHR data from the statewide health information exchange and five clinical partners are being used to identify patients at risk of suicide. Claims data from the All-Payer Claims Database and mortality data from the State Department of Public Health will be used to assess the outcomes and impact of the quality improvement efforts.
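None of the published models is reproduced here, but the following toy sketch shows the general shape of such a predictive pipeline, with synthetic stand-ins for EHR-derived features. Real models draw on far richer longitudinal data and require careful clinical validation before any use.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins for EHR-derived features (e.g., counts of prior
# diagnoses, medication classes, utilization events). Entirely made up.
n = 5000
X = rng.poisson(lam=[2.0, 1.0, 3.0], size=(n, 3)).astype(float)
logits = 0.8 * X[:, 0] + 0.5 * X[:, 1] - 3.5
y = rng.random(n) < 1 / (1 + np.exp(-logits))   # synthetic outcome labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)
print("toy AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```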

And these are just a few examples.

Technology and data will continue to play important roles in advancing mental health care. We have already seen the contributions of mental health informatics over the years and those of related areas such as behavioral health informatics and computational psychiatry. There is much more to come in the development of effective and innovative solutions for improving diagnosis, treatment, and prevention of mental health conditions, including those related to suicidal thoughts and behaviors.

Elizabeth S. Chen, PhD is the founding Associate Director of the Center for Biomedical Informatics, Associate Professor of Medical Science, and Associate Professor of Health Services, Policy & Practice at Brown University. She leads the Clinical Informatics Innovation and Implementation (CI3) Laboratory that is focused on leveraging EHR technology and data to improve healthcare delivery and biomedical discovery. Dr. Chen is an elected fellow of the American College of Medical Informatics and is a member of NLM’s Biomedical Informatics, Library and Data Sciences Review Committee.

Dr. Chen will deliver the next NLM Biomedical Informatics & Data Science Lecture on Wednesday, November 14, 2018, at 2:00 pm in the Natcher Conference Center (Building 45), Balcony A. Her talk, “Knowledge Discovery in Clinical and Biomedical Data: Case Studies in Pediatrics and Mental Health,” is free and open to the public. It will also be broadcast live globally and archived via NIH Videocast.

Data in the Scholarly Communications Solar System

Guest post by Kathryn Funk, program manager for NLM’s PubMed Central.

The Library of the Future. What will it look like? The NLM Strategic Plan envisions it partly as “one of connections between and among literature, data, models, and analytical tools.” In this future, journal articles are no longer lone objects drifting in space, but, rather, each a solar system waiting to be explored. Indeed, we’re already seeing the published literature associated with datasets, clinical trials, protocols, software, earlier versions (including preprints), peer review documents, and so on through consistent identifiers and standardized publishing and archival practices.

To help researchers and the public navigate this new solar system, PubMed Central (PMC), NLM’s full-text archive of journal literature, has been collaborating with publishers and funders for the last year to support efficient ways of linking journal articles with associated data. We’re encouraging authors to cite their open datasets and publishers to archive and make available those data citations in a machine-readable format. Though data citations represent only a small percentage of how PMC articles are linked to data (supplementary material continues to be the predominant method for associating data with articles in the archival record), the growth in data citations in the last year has been promising, nearly doubling the previous year’s total (i.e., 850 articles with data citations in 2017 vs. approximately 440 in 2016). NLM is also supporting the public access policy requirements of our research funder partners by encouraging authors to deposit datasets as supporting documents via the NIH Manuscript Submission (NIHMS) system.
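What does a machine-readable data citation look like? The fragment below is a simplified JATS-style example (the DOI, title, and repository are invented), with a few lines of Python showing how the publication-type="data" attribute lets software pick data citations out of a reference list:

```python
import xml.etree.ElementTree as ET

# Simplified JATS-style data citation; title, repository, and DOI are invented.
jats = """
<ref id="bib7">
  <element-citation publication-type="data">
    <data-title>Example RNA-seq dataset</data-title>
    <source>Dryad</source>
    <year>2017</year>
    <pub-id pub-id-type="doi">10.5061/dryad.example</pub-id>
  </element-citation>
</ref>
"""

ref = ET.fromstring(jats)
cite = ref.find("element-citation")
if cite is not None and cite.get("publication-type") == "data":
    print("dataset DOI:", cite.findtext("pub-id[@pub-id-type='doi']"))
```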

But solar systems, even the metaphorical kind, are meant to be explored, so we’re also working to expose each journal article solar system in a way that promotes discoverability. We want to make it easier to discover articles in PMC with associated data citations, data availability statements, and supplementary data, through improved record displays and new search facets, leveraging the data-related search filters announced earlier this year.

NLM is also looking beyond datasets to archive and expose articles’ key satellites, including, for example, comments generated during the peer review process. As the effort to expand the openness of peer review gains traction, PMC staff have been collaborating with publishers and Crossref on standardized ways to make those peer review materials readily available.

As with any exploration of new solar systems, it’s our hope that taking these steps will help generate new knowledge, and in so doing drive research that is reproducible, robust, transparent, and reusable. And as we move toward becoming the Library of the Future, how can we best support your research needs in connecting the literature with the rest of the research universe? Please let us know.

With thanks to Jeff Beck for the solar system analogy. 

Kathryn Funk is the program manager for PubMed Central. She is responsible for PMC policy as well as PMC’s role in supporting the public access policies of numerous funding agencies, including NIH. Katie received her master’s degree in library and information science from The Catholic University of America.

Clarity Across Languages

The art and science of translating health information

Guest post by Fedora Braverman, team lead for the MedlinePlus en español website.

Communication can be tricky, regardless of what language you speak. Take, for example, this conversation I had some time ago with a Hispanic acquaintance:

Him (in a panic): “My friend has a tumor. The tumor is benign!”

Me (not understanding why he is panicking): “That’s great. That’s (kind of) good news.”

Him (looking at me like I had two heads and with his eyes wide open): “It was BENIGN!”

He thought “benign” (benigno in Spanish) meant cancer. He thought his friend had cancer.

I work for MedlinePlus en español, and moments like these make my job so rewarding because I can ease that man’s worries. By pointing him to our website with reputable and reliable health information, I can help him understand his friend’s condition.

But the conversation also made me think: if he thought “benign” meant cancer, then others might, too.

We work in the largest biomedical library. We are used to these words. Our audience is not.

Because our goal is to reach out to the Hispanic population as a whole, regardless of health literacy levels, our site needs to address disparities in health literacy by striving for clarity. So, the next day, after relaying this conversation to my team, we updated all instances of the word benigno on the MedlinePlus en español site, clarifying that it meant “non-cancerous.”

We are constantly learning and improving, refining our translation of the MedlinePlus en español site to enhance its cultural sensitivity and accessibility. It’s an art and a science. We use tools—from print dictionaries to Google searches and everything in between—to determine which word is used across Latin America for a particular ailment, condition, or medical term. But translating text calls for far more than just swapping an English word for a Spanish equivalent. Word choice matters, as we try to accommodate regional linguistic and cultural differences along with the subtlety and nuance inherent in any language. That’s the art, as we not only translate the words but also adapt the text to our audience’s culture. Only then can we expect readers to connect with and understand the information.

Understanding the culture of our audience is imperative to building a site like MedlinePlus en español. Being knowledgeable about how Hispanics talk about their health issues (e.g., referring to diabetes as “this condition that affects the pancreas”), how they deal with certain hot topics or sensitive issues like sexually transmitted diseases, and what their health challenges are is crucial. That’s why my team and I work together, going back and forth until the text is as understandable and as culturally relevant as possible—even if that means re-working text we had previously translated.

So, whether you call it gripe, trancazo, influenza, or gripa, benigno or no canceroso, NLM’s MedlinePlus en español is the trusted website for you.

Fedora Braverman leads the MedlinePlus en español website’s operations, outreach activities, and social media platforms. She previously worked as a consultant for the US State Department and for the Library of Congress Hispanic Division. She has also served as an information specialist at the US Embassy in Buenos Aires, Argentina.

Asking the right questions and receiving the most useful answers

Guest post by Lawrence M. Fagan, Associate Director (retired), Biomedical Informatics Training Program, Stanford University.

As online resources proliferate, it becomes harder to figure out which resources—and which parts of those resources—will best answer patients’ questions about their medical care. Patients have access to multiple websites that summarize information about a particular disease, myriad patient communities, and many online research databases.

This resource overload problem isn’t new, however. As more and more data became available in the intensive care units of the 1980s, it grew increasingly difficult to determine the most important measurement to track for optimal care.

In short, more information isn’t necessarily the best solution when it comes to answering patient questions.

After a career in informatics, I now moderate an online community for patients with a particular subtype of lymphoma. Many questions that arise in the group can easily be answered by reviewing existing online content. Health librarians are excellent guides to the right resources and articles. However, some queries are less straightforward than others, such as: “What is the one thing you wished you knew before the procedure?” Rather than asking for a recitation of the steps in the procedure, this question is asking what was unexpected—or what step would have benefited from more patient or caregiver preparation. Consequently, these types of questions are hard to organize, store, and retrieve from patient-oriented databases.

Sometimes, the community of patients can recognize patterns that escape the notice of medical providers. For example, a lymphoma patient may complain of repeated sinus infections. It’s worth noting that patients often turn to their primary care provider to treat their sinus infections, and those visits may lead to antibiotic prescriptions. In this scenario, group members have pointed out the potential link between treatment with the drug rituximab and a decrease in the body’s immunoglobulin levels. This connection leads to suggestions to explore an alternative treatment for chronic sinus infections (in this special context) using immunoglobulin replacement therapy rather than antibiotics.

Specialized online communities can also provide help with detailed care issues, including the treatment of side effects for uncommonly used drugs with which local healthcare providers might not be familiar.

Online communities can also point patients toward research databases. ClinicalTrials.gov helps locate experimental treatments for specific medical conditions. Some community discussions about trials go beyond what’s included in the ClinicalTrials.gov database. For instance, group members may discuss the optimal order of clinical trials in a specific medical area, based on an analysis of the inclusion criteria for the various trials. In addition, there are ancillary questions about trial logistics that aren’t found in the database, such as, “I live in the San Francisco area—is it feasible to participate in Trial X at City of Hope in Southern California?” Setting up comprehensive links between the clinical databases and discussions in patient communities would help patients access the answers to their questions more efficiently.
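For the database half of that equation, a programmatic lookup is straightforward. Here is a hedged sketch against ClinicalTrials.gov’s public REST API; the condition is just an example, and the JSON field paths should be checked against the current API documentation:

```python
import requests

# Sketch against ClinicalTrials.gov's v2 REST API; field paths reflect its
# JSON layout but should be verified against the published API docs.
URL = "https://clinicaltrials.gov/api/v2/studies"

resp = requests.get(URL, params={
    "query.cond": "follicular lymphoma",   # example condition
    "pageSize": 5,
}, timeout=30).json()

for study in resp.get("studies", []):
    ident = study["protocolSection"]["identificationModule"]
    print(ident["nctId"], "-", ident["briefTitle"])
```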

The answers to these specialized questions are often found in the archives of online communities or in the memories of group participants. Yet, it is not easy to find the right community for a particular medical problem, and to my knowledge there is no central repository of links to online communities. Moreover, while many community links can be found in MedlinePlus, static links to community websites often become stale, as sites may migrate locations over time. Some of the ACOR cancer communities, for example, have migrated to SmartPatients.com. As patients find a community of interest, it is important that they determine whether the conversations are ongoing and whether the participants are knowledgeable and supportive. The Mayo Clinic offers a short discussion detailing the pros and cons of support groups.

Researchers have examined patient and clinician information needs for more than a quarter century. These models, however, have only rarely been incorporated into information retrieval systems. One successful example (aimed at providers) is the use of “clinical queries” in PubMed, designed for searching the scientific literature. This brings us to a critical question: What would it take to reengineer the patient-oriented retrieval systems so that these focused queries drive most patient sites?
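Under the hood, PubMed’s clinical queries are curated search filters ANDed onto a topic, so the same E-utilities pattern sketched earlier applies. A brief illustration (the topic is an example; the filter is one of the published clinical query filters):

```python
import requests

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# A clinical query is a curated methodologic filter combined with the topic.
term = "chronic sinusitis AND therapy/narrow[filter]"

ids = requests.get(ESEARCH, params={
    "db": "pubmed", "term": term, "retmax": 20, "retmode": "json",
}, timeout=30).json()["esearchresult"]["idlist"]
print(ids)
```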

For now, we have communities of patients and dedicated professionals who are ready and willing to help point to the most useful answers.

Please note: The mention of any commercial or trade name is for information and does not imply endorsement on the part of the author or the National Library of Medicine.

Many thanks to Dave deBronkart, Janet Freeman-Daily, Robin Martinez, Tracie Tavel, and Roni Zeiger who reviewed earlier versions of this blog post.

Lawrence Fagan, MD, PhD, retired in 2012 from his role as Associate Director of the Stanford University Biomedical Informatics Training Program. He is a Fellow of the American College of Medical Informatics. His current interests are in patient engagement, precision health, and preventing medical errors.