What Did You Do with Your Summer Vacation?

Well, if you are spending the summer at the NIH, you’ve likely been engaged in one of our many activities designed to access critical data and advance our understanding of the human experience by linking data sets together. Today, we are inviting you to engage in some additional best practices in accessing controlled data in ways that support science and preserve privacy.

In 2020, the NIH Scientific Data Council charged its Working Group for Streamlining Access to Controlled Data to spend a year engaging in dialogue within the NIH and with our extramural colleagues to better understand the experiences of scientists and the strategies that both facilitate and impede access to data. The group also considered where in the research process NIH should inform, engage, and gain consent of participants sufficiently to support science driven by access to controlled datasets.

NIH stores and facilitates access to many datasets, both open and controlled, with the goal of accelerating new discoveries and thereby maximizing taxpayer return on investment in the collection of these datasets. Data derived from humans that are shared through controlled-access mechanisms reflect NIH’s commitment to protect sensitive data and honor the informed consent provided by research participants in NIH-supported studies.

NIH has supported multiple controlled-access data repositories that uphold appropriate data protections for both human data and other sensitive data, while meeting the needs of various researcher communities. However, as data access requests increase, new repositories are established, and new mechanisms of providing access to data are developed, it is apparent that opportunities remain to improve efficiency and harmonization among repositories, both to make NIH-supported controlled-access data more FAIR (Findable, Accessible, Interoperable, and Reusable) and to ensure appropriate oversight when data from different resources are combined. These trends are enabling datasets and datatypes to be combined in new ways that advance the science. At the same time, datasets and datatypes that may or may not be controlled may, when combined, create inadvertent re-identification risks.

To help the agency address these issues in a way that is responsive to community needs, we are hosting a series of webinars through the end of July. We call these “breakout sessions” because they follow an outstanding webinar presented on July 9 available here. Richard Hodes, MD, director of the National Institute on Aging, launched the 3-hour seminar with a talk titled Opportunities for Advancing Research Through Better Access to Controlled Data. Ana Navas-Acien, MD, PhD, brought the perspective of Indigenous communities and other communities traditionally underrepresented in research, and she emphasized themes of community engagement and broadening the consent framework to consider community-level accountabilities as well as individual assent. Lucila Ohno-Machado, MD, MBA, PhD, addressed privacy-preserving distributed analytics as a strategy to promote science while protecting data. Hoon Cho, PhD, described privacy-enhancing computational approaches.

You can find the schedule for the breakout sessions below. These sessions are specifically designed to listen to the expectations, hopes, and concerns from researchers and participants. These webinars are free and open to the public; registration is required.

Breakout Session on “Making Controlled-Access Data Readily Findable and Accessible” on July 22 from 3 pm to 5:30 pm EST

Breakout Session on “General Opportunities for Streamlining Access to Controlled Data” on July 26 from 12:30 pm to 2 pm EST

Breakout Session on “Addressing Oversight, Governance, and Privacy Issues in Linking Controlled Access Data from Different Resources” on July 28 from 3 pm to 5:30 pm EST

To generate interest and hear from the broadest possible group of stakeholders, NIH has released a Request for Information on Streamlining Access to Controlled Data from NIH Data Repositories. Please note the closing date is August 9. We look forward to hearing from you! Please visit Streamlining Access to Controlled Data at the NIH for all of the information described in this post.

Finally, we would like to personally thank the many NIH staff members who serve on the working group:

  • Shu Hui Chen
  • Alicia Chou
  • Valentina Di Francesco
  • Greg Farber
  • Jamie Guidry Auvil
  • Nicole Garbarini
  • Lyric Jorgenson
  • Punam Mathur
  • Vivian Ota Wang
  • Jonathan Pollock
  • Rebecca Rodriguez
  • Alex Rosenthal
  • Steve Sherry
  • Julia Slutsman
  • Erin Walker
  • Alison Yao

We hope your summer vacation was as productive as ours!

Patricia Flatley Brennan, RN, PhD, NLM Director
Susan Gregurick, PhD, Associate Director for Data Science at NIH
Hilary S. Leeds, JD, Senior Health Science Policy Analyst for the Office of Science Policy at NIH

Five Years and Counting!

On August 13, 2016, I became the first woman, nurse, and industrial engineer to serve as director of the National Library of Medicine (NLM). From its beginning in 1836 as a small collection of books in the library of the U.S. Army Surgeon General’s office, NLM has become a global force in accelerating biomedical discovery and fostering evidence-based practices. I am proud to direct this esteemed organization and delighted to guide it towards its third century beginning in 2036. 

This has been an exciting five years for NLM.

We accelerated data-driven discoveries and advanced training in analytics and data science across NIH and around the world. Our genomic resources played a crucial role in supporting NIH and the scientific community’s ability to understand a novel virus and address the COVID-19 pandemic. NLM investigators developed innovative uses of deep learning and artificial intelligence and applied them to a wide range of problems, from interpreting clinical images to improving search and retrieval of highly relevant citations from NLM’s PubMed biomedical literature database.

NLM pioneered strategies to link data sets to articles through our PubMed Central (PMC) digital archive, and doubled the size of the NLM-supported Network of the National Library of Medicine—reaching almost every congressional district in the United States with the capacity to connect NLM resources to communities in need.

We provided technical expertise to develop a secure single sign-on to a wide range of controlled data resources, and redeployed our research infrastructure to help public health authorities detect foodborne outbreaks and track the emergence of coronavirus variants. We also advanced our use of automated-first indexing to make sure that the published literature is available to our stakeholders as quickly as possible.

With the support and collaboration of other components of NIH, we are building a 21st century digital library that uses our collections to offer literature, data, analytical models, and new approaches to scientific communications that are accessible, sustainable, and available 24 hours a day and 7 days a week.

NLM’s archival collections continue to grow and evolve as the archival records of individuals, organizations, and other communities in health and medicine are increasingly created and communicated electronically or digitally. We expanded the formats and types of records we collect, and make accessible and usable, to include born-digital formats such as websites, social media, and data sets. For example, NLM deployed innovative techniques to prospectively curate and add COVID-19-related information from traditional news, social media, and other sources to our Digital Collections. These collections preserve for future research the ephemeral online record of modern health crises, documenting the work and experiences of health care providers, researchers, government agencies, news agencies, patients, and caregivers.

As a nurse and an industrial engineer specializing in health systems engineering applied to patient self-management, I bring a perspective to NLM that expands its mandate from supporting biomedical researchers and clinical practitioners to one that aggressively supports the health of the nation.

During my tenure, NLM’s footprint has expanded by:

  • Growing our research enterprise in support of data-driven discovery;
  • Supporting key priorities of the NIH in data science, access to secure data repositories, and community engagement;
  • Strengthening the integrity and efficiency of our internal resources to accelerate the acquisition, preservation, and dissemination of biomedical data; and
  • Expanding our commitment to public outreach and engagement.

Two guiding principles have shaped my work:   

One NLM

I initiated the One NLM concept as an organizing framework during my first year as director of NLM. One NLM creates a rallying point, making explicit that all our offices and divisions work in concert and in support of NLM’s mission. As described in my January 2017 blog post entitled One NLM:


One NLM emphasizes the integration of all our valuable divisions and services under a single mantle and acknowledges the interdependency and engagement across our programs. Certainly, each of our stellar divisions . . . have important, well-refined missions that will continue to serve science and society into the future. The moniker of One NLM weaves the work of each division into a common whole. Our strategic plan will set forth the direction for all of the National Library of Medicine, building on and augmenting the particular contributions of each division.

Strengthening the NLM Senior Leadership Team

I employ a team model of leadership, engaging the deputy director, four division directors, and four office directors in biweekly meetings. With the support of external consultants, we engaged in a one-year leadership development activity focused on building capacity for joint decision making, improving risk tolerance, and creating an environment that supports trans-NLM collaborative problem solving. I found that continued engagement with individual members and the leadership team established an organizational milieu that led to improved trust in each other. And the team, which stood us in good stead during a period of maximum telework in response to COVID-19, ensured the innovative mobilization of NLM resources to help NIH rapidly assume new research programs, respond to public health needs, and most importantly serve as a trusted source of information.

What I’ve Learned

While I remain true to my core values and beliefs, I’m not the same Patti Brennan as I was when I entered the ‘Mezzanine’ floor of NLM’s Building 38 nearly five years ago. I’ve learned to mobilize and reward the talents of the 1,700 people working at NLM to achieve common goals. I figured out how to work with a boss, something few academics ever actually face. I’m better at finding the niche in NIH conversations and policy-setting meetings where the talents of NLM and our deep understanding of data science accelerate NIH’s mission to turn discovery into health. I’ve created space in conversations for the voices of others, particularly the members of my leadership team who, I’ve learned, complement my vision and drive with their knowledge and discernment. It’s been a great ride!

How does the you of 2021 compare to the you of 2016?  

Dr. Isaac Kohane: Making Our Data Work for Us!

Last weekend, Isaac Kohane, MD, PhD, FACMI, Marion V. Nelson Professor of Biomedical Informatics and Chair of the Department of Biomedical Informatics at Harvard Medical School, received the 2020 Morris F. Collen Award of Excellence at the AMIA 2020 Virtual Annual Symposium. This award, the highest honor in informatics, is bestowed on an individual whose personal commitment and dedication to medical informatics have made a lasting impression on the field.

Throughout his career, Dr. Kohane has worked to extract meaning out of large sets of clinical and genomic data to improve health care. His efforts mining medical data have contributed to the identification of harmful side-effects associated with drug therapy, recognition of early warning signs of domestic abuse, and detection of variations and patterns among people with conditions such as autism.

As the lead investigator of the i2b2 (Informatics for Integrating Biology & the Bedside) project, a National Institutes of Health-funded National Center for Biomedical Computing initiative, Dr. Kohane’s work has led to the creation of a comprehensive software and methodological framework to enable clinical researchers to accelerate the translation of genomic and “traditional” clinical findings into novel diagnostics, prognostics, and therapeutics.

Dr. Kohane is a visionary with a motto:  Make Our Data Work for Us! Please join me in congratulating Dr. Kohane, recipient of the 2020 Morris F. Collen Award of Excellence.

Hear more from Dr. Kohane in this video.

Video transcript (below)

The vision that has driven my research agenda is that we were not doing our patients any favors by not embracing information technology to accelerate our ability to both discover new findings in medicine, and to improve the way we deliver the medicine.

What does “make our data work for us” mean? It means that let’s not just use it for the real reason most of it is accumulated at present, which is in order to satisfy administrative or reimbursement processes. Let’s use it to improve health care.

Using just our claims data, we can actually predict – better than genetic tests – recurrence rates for autism. It’s the ability to show, with these same data, that drugs used for preventing premature birth in the generic form are just as effective as those that are brand name and 40 times as expensive. It’s, as we’ve seen most recently, the ability to pull together data around pandemics within weeks, if and only if we understand the data that’s spun off our health care systems in the course of care.

And finally, as exemplified by work on FHIR, which was funded by the Office of the National Coordinator and then the National Library of Medicine, the ability to flow the data directly to the patient to finally allow patients access to their data in a computable format to allow decision support for the patient without going through the long loop of the health care system.

Because the NIH and NLM have invested in working on real-world sized experiments in biomedical informatics, on supporting the education of the individuals who drive those projects, and in supporting the public standards that are necessary for these projects to work and to scale, they’ve established an ecosystem that now is able to deliver true value to decision makers, to clinicians, and now to patients, as we’re seeing with a SMART on FHIR implementation on smartphones.

So, for those of you — the biomedical informaticians of the future who are clinicians — I strongly recommend that you don’t wait for someone else to fix the system. You have the most powerful tools to affect medicine, information processing tools. So, don’t wait to get old. Don’t wait to be recognized. You have the tools. Get in there, help change medicine. We all depend on you!

Introducing the NIH Guide Notice Encouraging Researchers to Adopt U.S. Core Data for Interoperability Standard

Recently, NIH issued a guide notice (NOT-OD-20-146) encouraging NIH-supported clinical programs and researchers to adopt and use the standardized set of healthcare data classes, data elements, and associated vocabulary standards in the U.S. Core Data for Interoperability (USCDI) standard. This standard will make it easier to exchange health information for research and clinical care, and is required under the Office of the National Coordinator for Health Information Technology (ONC) Cures Act Final Rule to support seamless and secure access, exchange, and use of electronic health information.

USCDI standardizes health data classes and data elements that make sharing health information across the country interoperable, expands on data long required to be supported by certified EHRs, and incorporates health data standards developed.

NLM is proud to support USCDI through continued efforts to establish and maintain clinical terminology standards within the Department of Health and Human Services.

Standardized health data classes and elements enable collaboration, make it easier to aggregate research data, and enhance the discoverability of groundbreaking research. USCDI adoption will allow care delivery and research organizations to use the same coding systems for key data elements that are part of the USCDI data classes.

I encourage you to read more about the new guide notice in a joint post developed in collaboration with my NIH and ONC colleagues titled: “Leveraging Standardized Clinical Data to Advance Discovery.” And I ask you to consider, what could this notice mean for you? 

Some Insights on the Roles and Uses of Generalist Repositories

Guest post by Susan Gregurick, PhD, Associate Director for Data Science and Director, Office of Data Science Strategy, NIH

Data repositories are a useful way for researchers to both share data and make their data more findable, accessible, interoperable, and reusable (that is, aligned with the FAIR Data Principles).

Generalist repositories can house a vast array of data. This kind of repository does not restrict data by type, format, content, or topic. NIH has been exploring the roles and uses of generalist repositories in our data repository landscape through three activities, which I describe below, garnering valuable insights over the last year.

A pilot project with a generalist repository

NIH Figshare archive

Last September, I introduced Musings readers to the one-year Figshare pilot project, which was recently completed. Information about the NIH Figshare instance — and the outcomes of the project — is available on the Office of Data Science Strategy’s website. This project gave us an opportunity to uncover how NIH-funded researchers might utilize a generalist repository’s existing features. It also allowed us to test some specific options, such as a direct link to grant information, expert guidance, and metadata improvements.

There are three key takeaways from the project:

  • Generalist repositories are growing. More researchers are depositing data in, and more publications are linking to, generalist repositories.
  • Researchers need more education and guidance on where to publish data and how to effectively describe datasets using detailed metadata.
  • Better metadata enables greater discoverability. Expert metadata review proved to be one of the most impactful and unique features of the pilot instance, which we determined through two key metrics. When compared to data uploaded to the main Figshare repository by NIH-funded investigators, the NIH Figshare instance had files with more descriptive titles (e.g., twice as long) and metadata descriptions that were more than three times longer.
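
As a rough illustration of how length-based metrics like these might be computed, here is a minimal Python sketch. It is not the method used in the pilot: the function name, the `title` and `description` field names, and the sample records are all hypothetical and invented for this example.

```python
from statistics import mean


def metadata_length_ratios(pilot, baseline):
    """Compare mean metadata lengths between two sets of dataset records.

    Each record is a dict with "title" and "description" strings
    (hypothetical field names). Returns a tuple
    (title_ratio, description_ratio), where each ratio is the mean
    character length in `pilot` divided by the mean in `baseline`.
    """
    def mean_length(records, field):
        return mean(len(record[field]) for record in records)

    title_ratio = mean_length(pilot, "title") / mean_length(baseline, "title")
    description_ratio = (
        mean_length(pilot, "description") / mean_length(baseline, "description")
    )
    return title_ratio, description_ratio


# Invented toy records, for illustration only (not data from the pilot).
baseline_records = [
    {"title": "Dataset 1", "description": "Raw files."},
]
pilot_records = [
    {
        "title": "Mouse liver RNA-seq counts, 2019 cohort",
        "description": "Gene-level read counts for 24 samples, "
                       "with a sample sheet and processing notes.",
    },
]

title_ratio, description_ratio = metadata_length_ratios(
    pilot_records, baseline_records
)
print(f"Titles are {title_ratio:.1f}x as long on average")
print(f"Descriptions are {description_ratio:.1f}x as long on average")
```

Character length is only a proxy for descriptiveness, so a comparison like this would normally sit alongside expert review rather than replace it.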

The NIH Figshare instance is now an archive, but the data are still discoverable and reusable. Although this specific pilot has concluded, we encourage NIH-funded researchers to use a generalist repository that meets the White House Office of Science and Technology Policy criteria when a domain-specific or institutional repository is not available.

A community workshop on the role of generalist repositories

In February, the Office of Data Science Strategy hosted the NIH Workshop on the Role of Generalist and Institutional Repositories to Enhance Data Discoverability and Reuse, bringing together representatives of generalist and institutional repositories for a day and a half of rich discussion. The conversations centered around the concept of “coopetition,” the importance of people in the broader data ecosystem, and the importance of code. A full workshop summary is available, and our co-chairs and the workshop’s participating generalist repositories recently published a generalist repository comparison chart as one of the outcomes of this event.

We plan to keep engaging with this community to better enable coopetition among repositories while working collaboratively with repositories to ensure that researchers can share data effectively.

An independent assessment of the generalist repository landscape

We completed an independent assessment to understand the generalist repository landscape, discover where we were in tune with the community, and identify our blind spots. Key findings include the following:

  • There is a clear need for the services that generalist repositories provide.
  • Many researchers currently view generalist repository platforms as a place to deposit their own data, rather than a place to find and reuse other people’s data.
  • Repositories and researchers alike are looking to NIH to define its data sharing requirements, so each group knows what is expected of them.
  • The current lack of recognition and rewards for data sharing helps reinforce the focus on publications as the key metric of scientific output and therefore may be a disincentive to data sharing.

The pilot, workshop, and assessment provided us with a deeper understanding of the repository landscape.

We are committed to advancing progress in this important area of the data ecosystem of which we are all a part. We are currently developing ways to continue fostering coopetition among generalist repositories; strategies for increasing engagement with researchers, institutional repositories, and data librarians; and opportunities to better educate the biomedical research community on the value of effective data management and sharing.

The Office of Data Science Strategy will announce specific next steps in the near future. In the meantime, we invite you to share your ideas with us at datascience@nih.gov.

Dr. Gregurick leads the implementation of the NIH Strategic Plan for Data Science through scientific, technical, and operational collaboration with the institutes, centers, and offices that make up NIH. She has substantial expertise in computational biology, high performance computing, and bioinformatics.