Fostering a Culture of Scientific Data Stewardship

Guest post by Jerry Sheehan, Deputy Director, National Library of Medicine.

Making research data broadly findable, accessible, interoperable, and reusable is essential to advancing science and accelerating its translation into knowledge and innovation. The global response to COVID-19 highlights the importance and benefits of sharing research data more openly.

The National Institutes of Health (NIH) has long championed policies that make the results of research available to the public. Last week, NIH released the NIH Policy for Data Management and Sharing (DMS Policy) to promote the management and sharing of scientific data generated from NIH-funded or conducted research. This policy replaces the 2003 NIH Data Sharing Policy.

The DMS Policy was informed by public feedback and requires NIH-funded researchers to plan for the management and sharing of scientific data. It also makes clear that data sharing is a fundamental part of the research process.

Data sharing benefits the scientific community and the public.

For the scientific community, data sharing enables researchers to validate scientific results, increasing transparency and accountability. Data sharing also strengthens collaborations that allow for richer analyses. Strong data-sharing practices facilitate the reuse of hard-to-generate data, such as those acquired during complex experiments or once-in-a-lifetime events like natural disasters or pandemics.

For the public, sound data-sharing practices demonstrate good stewardship of taxpayer funds. Clear, well-written data-sharing and management plans promote transparency and accountability to society. They also expand opportunities for data to be accessed and reused by clinicians, students, educators, and innovators in health care and other sectors of the economy.

As an organization dedicated to improving access to data and information to advance biomedical sciences and public health, NLM plays a key role in implementing the new policy and supporting researchers in meeting its requirements. NLM maintains a number of data repositories, such as the Sequence Read Archive and ClinicalTrials.gov, that curate, preserve, and provide access to research data. NLM also maintains a longer list of NIH-supported data repositories that accept different types of data (e.g., genomic, imaging) from different research domains (e.g., cancer, neuroscience, behavioral sciences). Where appropriate domain-specific repositories do not exist, NLM has made clear how researchers can include small datasets (<2GB) with articles deposited in NLM’s PubMed Central (PMC) under the NIH Public Access Policy.

NLM also works with the broader library community to support improved data management and sharing. Supplemental information issued with the new policy makes it clear that research budgets can include costs of data management and sharing, such as those for data curation, formatting data to accepted standards, attaching metadata to foster discoverability, and preparing data for storage in a repository. These are the kinds of services increasingly provided by libraries and librarians in universities and academic medical centers across the country. NLM, through the Network of the National Library of Medicine, offers training in data management and data literacy to health science, public, and other librarians to expand capacity for these important services.

NIH’s DMS Policy applies to all research, funded or conducted in whole or in part by NIH, that results in the generation of scientific data. This includes research funded or conducted by extramural grants, contracts, intramural research projects, or other funding agreements. The DMS Policy does not apply to research and other activities that do not generate scientific data, including training, infrastructure development, and non-research activities.

NIH will continue to engage the research community to support the change and implementation of this new policy, which will go into effect in January 2023. NLM will continue to work within NIH and across the library and information science communities to develop innovative ways to support the policy and advance the effective stewardship of research data. Let us know how else we can support this important policy advance.

Read more about this major policy release in the NIH’s Under the Poliscope blog.

As NLM Deputy Director, Jerry Sheehan shares responsibility with the Director for overall program development, program evaluation, policy formulation, direction and coordination of all Library activities. He has made major contributions to the development and implementation of NIH, HHS, and U.S. government-wide policy related to open science, public access to government-funded information, clinical trials registration, and electronic health records.

Bridging the Gap: From Research to Policy

Guest post by Ellen T. Kurtzman, PhD, MPH, RN, FAAN, associate professor, School of Nursing, The George Washington University

As a health services researcher, I have always been interested in how to bridge the divide between research and policy. I constantly ask myself, “Which of my research questions will inform today’s most pressing policy debates?” and “How can I teach the next generation of nurse scientists to conduct policy-relevant research?” I recently left my academic position and spent a year working on Capitol Hill as one of eight 2018–2019 Robert Wood Johnson Foundation Health Policy Fellows. In this blog, I offer a few key lessons from my time as a fellow that influenced my scholarship.

Lessons from my fellowship year

  • Right place, right time. The policymaking environment is fast paced. New issues emerge quickly, moving others lower on the priority list. The deck is constantly being reshuffled. Perhaps there is no better example of this than COVID-19. Who knew a year ago that a pandemic would draw decision makers’ attention away from other pressing policy issues? When a policy issue like this emerges unexpectedly, the need for evidence is virtually instantaneous. But the research process is methodical and cannot easily be accelerated. Randomized studies and clinical trials take time, which means that the scientific process and policymaking timelines do not naturally mesh. Recognizing that available evidence needs to be ready at precisely the moment that a policy issue is being contemplated suggests that the relationship between science and policymaking should be reframed.
  • Positioning researchers to contribute. Because there are so many policy issues being contemplated simultaneously, deep subject matter expertise from authoritative and independent sources is highly valued. Scientists and academics are ideally situated to be honest brokers, yet it is not always easy for policy staff to find expertise on short notice. Researchers need to better position themselves and their science during a noncrisis period so that they are ‘top-of-mind’ when urgent needs emerge.
  • All about trade-offs. Harold Lasswell, an influential political scientist and theorist, helped define “politics” by asking, “Who gets what, when, and how?” Public policy is the art of allocating scarce resources to competing parties. I have always been interested in research questions about health care quality and value, but many of the secondary data sources I rely on lack the variables that would enable me to examine price or cost outcomes. In the short time I spent on Capitol Hill, it became abundantly clear to me why research that examines quality in the absence of cost considerations is insufficient.

Possible solutions

  • Policy in all things. Nursing, medical, and health sciences programs typically include a single health policy course and/or rotation. Rather than relegating policy to just one course, why not see “policy in all things”? During OB-GYN grand rounds, why not discuss policy solutions that address maternal mortality? What keeps us from asking our psychiatric nursing students to debate mental health parity issues or veteran suicide rates? If we incorporate policy into every course, our students will leave their programs better prepared to bridge the divide between science and policy.
  • New definitions of scholarship. Historically, academia has viewed scholarship in narrow terms. For example, criteria for appointments, promotion, and tenure (APT) reward refereed journal articles and colloquia, yet these materials are not generally accessible or readily available outside of academic circles. To bridge the divide between science and policy, academics might consider adopting a broader definition of scholarship and creating incentives for deliverables that appeal to decision makers. Could we, for example, adjust APT criteria so that the process rewards policy papers, issue briefs, and congressional testimony equally? By encouraging scholarship that reaches decision makers, we would be optimizing the policy impact of our science.
  • Enhanced dissemination and outreach. Policymakers need the deep expertise that scientists and academics possess, but we are often siloed from one another. With rare exceptions, we tend not to attend the same meetings or conferences, read the same journals or books, or consume the same news or other media. I now realize that, for my work to inform policy, I need to reconsider how I package and disseminate my findings as well as how I position myself as a subject matter expert. By understanding and following key policy issues, learning how to communicate with policymakers, and investing time and energy in building relationships during times of calm, I will be facilitating swifter adoption of my science and more meaningful dialogue with policy staff when there is a critical need for information.

Dr. Kurtzman is a health services researcher and a tenured associate professor of nursing with secondary appointments in the university’s Milken Institute School of Public Health and Trachtenberg School of Public Policy & Public Administration. Her investigator-initiated research explores the impact of federal, state, and institutional policies on health care quality and the role of the health care workforce in achieving higher value care. She is currently exploring the impact of states’ cannabis policies on health outcomes including the consequences for pregnant women and their infants. 

Share Your Thoughts on NIH’s Research Priorities

Guest post by Leigh Samsel, MS, NLM Planning and Evaluation Officer and NLM representative to the NIH-Wide Strategic Plan Working Group.

The National Institutes of Health (NIH) is developing its next NIH-Wide Strategic Plan, and we’re asking for your input. This plan will help NIH capitalize on new opportunities for scientific exploration.

Building on the previous NIH-Wide Strategic Plan, the new plan will guide NIH’s research efforts for Fiscal Years 2021–2025. The framework articulates NIH’s priorities in the following key areas:

  • Biomedical and behavioral science research
  • Scientific research capacity
  • Scientific integrity, public accountability, and social responsibility in the conduct of science

In addition, the framework identifies several cross-cutting themes that span the scope of these priorities.

The goal of this NIH-Wide Strategic Plan is to highlight major themes that encompass all of NIH. It is not intended to outline the numerous important research opportunities for specific disease applications, which are covered in the existing strategic plans developed by the 27 Institutes, Centers, and Offices that make up NIH.

I hope you’ll review the strategic plan framework described in the Request for Information (RFI) and provide feedback using the RFI submission site.

Responses to the RFI will be accepted through March 25**. NIH is encouraging stakeholder organizations (e.g., patient advocacy groups, professional societies) to submit a single response reflective of the views of the organization/membership as a whole.

** Update: The deadline has been extended until 11:59 pm, April 1, 2020.

Want to Learn More?

NIH is hosting two webinars in March to describe the planning process and answer questions. Those dates are:

  • Monday, March 9 – 1:30 pm – 2:30 pm EST
  • Monday, March 16 – 10:00 am – 11:00 am EST

Additional details about the webinars can be found on the NIH-Wide Strategic Plan webpage.

Your input is vital to ensuring that the NIH-Wide Strategic Plan for Fiscal Years 2021–2025 puts biomedical research on a promising and visionary path. I appreciate your time and consideration in assisting NIH with this effort.  

Leigh Samsel, MS, is responsible for formal reporting of NLM activities and for providing staff leadership to strategic planning activities.

Everyone’s Voice Matters: Making Science Open and Accessible to the Public

Last month, the National Institutes of Health (NIH) released its Draft NIH Policy for Data Management and Sharing and Supplemental Draft Guidance (Draft NIH Policy), making it available for public comment. Comments are due by January 10, 2020. Because everyone’s voice matters, I’m calling on the Musings audience to review the draft and offer your perspectives on this policy now! 

The Draft NIH Policy arises from NIH’s deep commitment to fostering a culture of scientific data stewardship.

Data stewardship is a research responsibility that includes systematically acquiring data, carefully documenting data, securely storing data, and, where possible, making data available for use by other scientists and society as a whole. This last activity, often referred to as “data sharing,” is essential for accelerating the translation of science into knowledge and ensuring that the full value of the data collected becomes the substrate for future discoveries.

NIH’s Long-Standing Commitment to Make Research Results Available

In 2003, NIH released its original data sharing policy, which established the expectation that research data from large NIH-supported awards will be shared to the extent allowed by scientific protocol and human subjects considerations. Since 2008, the NIH Public Access Policy has ensured that the public has free access to the published results of NIH-funded research. NLM’s PubMed Central, a free, full-text archive of peer-reviewed biomedical and life sciences journal literature, serves as the repository for these articles.

In 2014, NIH updated its Genome-Wide Association Studies Policy with an expanded NIH Genomic Data Sharing Policy to ensure the broad and responsible sharing of genomic research data. And in 2016, the NIH published the NIH Policy on the Dissemination of NIH-Funded Clinical Trial Information, which established expectations for registering and submitting the results of all NIH-funded clinical trials on ClinicalTrials.gov. Individual Institutes, Centers, and programs have also established expectations for managing and sharing data resulting from their funded research.

Data Sharing Principles

NIH recognizes that all scientific data need to be managed according to sound principles. The Draft NIH Policy would require researchers to develop explicit data management and sharing plans that describe their approaches for preserving and sharing data. Reasonable, allowable costs for data curation and preservation would be permitted as direct expenses for the project. Proposed guidance about the allowable costs of data management and sharing and about the elements of a good data management and sharing plan was released along with the draft policy and can be found on the NIH Data Management and Sharing Activities Related to Public Access and Open Science web page.

While promoting broad sharing of data, the Draft NIH Policy is deliberately designed to be flexible and allow researchers to propose approaches that address legal, ethical, and other practical considerations that may limit data sharing. The policy proposes that data management and sharing plans be submitted “just in time” and evaluated by NIH program staff. Agreed plans will be incorporated into Terms and Conditions of the Award, and NIH staff will monitor compliance with the plans at regular reporting intervals.

Data Sharing Benefits the Scientific Community and the Public

For the scientific community, data sharing enables the validation of scientific results by both the originator of the data and other scientists, increasing transparency and accountability. Data sharing also strengthens collaborations, which allows for richer analyses. Strong data-sharing practices facilitate the reuse of hard-to-generate data, such as those acquired during complex experiments or once-in-a-lifetime events like natural disasters. And, finally, data sharing promotes scientific progress and accelerates future research.

For the public, sound data-sharing practices demonstrate good stewardship of taxpayer funds. Clear, well-written data-sharing and management plans promote transparency and accountability to society. And for research involving human subjects, data sharing honors participants’ efforts by maximizing the contribution of the data acquired through their participation.

Tell Us What You Think!

NIH acknowledges that this draft policy offers new opportunities for advancing science while also creating new expectations and responsibilities for librarians, scientists, trainees and graduate students, and institutional research management offices. And I’ve highlighted some of the benefits of data sharing to the scientific community and the public.

As I emphasized earlier in this post, everyone’s voice matters — so we’d like to hear from all of you about the approach NIH is proposing. You can share your comments on the purpose of the policy, its key definitions, the scope and requirements for the plans, and the effective dates until Friday, January 10, 2020.

Want to Learn More?

NIH is hosting an informational webinar on the Draft NIH Policy for Data Management and Sharing and Supplemental Draft Guidance on Monday, December 16, 2019, from 12:30 p.m. to 2:00 p.m. EST. The purpose of the webinar is to provide information on the draft policy and answer any questions about the public comment process.

Please note that public comments will not be accepted during the webinar; they must be submitted here.

Accessing the Webinar

If you would like to attend the December 16 webinar, please see the instructions below:

  • To view the webinar presentation, click here.
  • To join the webinar by phone:
    • U.S. and Canadian participants can dial 866-844-9416 and enter passcode 4009108.

Please note that while you will be able to view the webinar through Webex, you must use one of the specified phone lines to connect to the audio. You will not be able to dial in to the webinar via your computer.

You may also send questions in advance of the webinar to SciencePolicy@od.nih.gov.

Taking NLM’s Story to Capitol Hill

Last month, I had the honor of joining National Institutes of Health (NIH) Director Francis Collins, MD, PhD, and four other NIH Institute Directors to provide testimony before the U.S. House Congressional Subcommittee on Appropriations for NIH Investments in Medical Research. This was the first time in 12 years that NLM provided testimony to Congress.

Each of us was given the opportunity to deliver a three-minute opening statement. As you can imagine, distilling our many successes and contributions down to a three-minute statement was incredibly challenging. I wish that there had been more time because we have so many wonderful stories to share. We were also able to submit a written statement, which is provided later in this post.

It is my hope that NLM will have more opportunities to share with Congress further insights and details about how NLM’s biomedical informatics and data science research play an integral role in supporting the mission of NIH and how we — true to the NIH tagline — turn discovery into health.

Below is the written testimony that was submitted:

PREPARED STATEMENT OF PATRICIA FLATLEY BRENNAN, RN, PhD, DIRECTOR, NATIONAL LIBRARY OF MEDICINE

Madam Chairwoman and Members of the Subcommittee: I am pleased to have this opportunity to speak to you about the exciting work taking place at the National Library of Medicine of the National Institutes of Health (NIH).

ACCELERATING BIOMEDICAL DISCOVERY & DATA-POWERED HEALTH

The National Library of Medicine (NLM) plays an essential role in catalyzing basic biomedical science through its cutting-edge data science and informatics research, comprehensive information systems, and extensive research training programs. As the world’s largest biomedical library, NLM acquires, organizes, and delivers up-to-date biomedical information across the United States and around the globe. NLM operates some of the most heavily used Federal websites.

Millions of data scientists, health professionals, and members of the public use NLM’s electronic information sources every day to translate research results into new treatments, products, and practices and provide the foundation for clinical decision making by health professionals and patients.

Leveraging its 180-year history of organizing and disseminating biomedical literature, NLM is committed to the application of emerging data science capabilities to challenges in biomedical research and public health.

It does this by enhancing its data and information resources and providing leadership in both the acquisition and analysis of data for discovery. It continues to expand its core biomedical literature and genomic collections to include a broad array of health, clinical, and biological data types. It makes these data findable, accessible, interoperable, and reusable (FAIR) for research.

NLM is investing in new research programs to systematically characterize and curate data describing complex health phenomena and to devise new methods to uncover the knowledge held in data. It has restructured its 16 biomedical informatics training programs to address data science as they continue to foster excellence and support a diverse workforce. NLM is in the process of developing an efficient organizational structure to accommodate emerging directions in research and services.

RESEARCH IN BIOMEDICAL INFORMATICS AND DATA SCIENCE

NLM’s research programs support pioneering research and development to advance knowledge in biomedical informatics and data science. Its research portfolio spans such areas as artificial intelligence, computational biology, clinical decision support, public health surveillance, visualization, and discovery mining in digital data sets. This research encompasses areas of high importance to NIH and society at large, and for audiences ranging from clinicians and scientists to consumers and patients.

Research in data science produces novel analytical approaches and visualization tools that help scientists accelerate discovery from data and translate these findings to clinical solutions. It also aims to solve problems consumers face in accessing, storing, using, and understanding their own health data and to produce tools that make precision medicine discoveries available and more understandable to patients.

Biomedical informatics research is yielding advanced analytical methods and tools for use against large scale data generated from clinical care, leading to fuller understanding of the effects of medications and procedures as well as individual factors important in the prevention and treatment of disease processes.

Recognized as a leader in clinical information analytics, NLM supports and conducts research in areas such as medical language processing, high-speed access to biomedical information, analysis and use of high-quality imaging data, health data standards, and analysis of large databases of clinical and administrative data to predict patient outcomes and validate findings from clinical research studies. Leveraging extensive machine learning experience and field-based projects, NLM is now advancing analytical tools and deep learning techniques for application in image analysis research.

NLM’s biomedical informatics research also addresses issues in computational biology. Research creates new ways to represent and link together genomic and biological data and biomedical literature and produces analytic software tools for gaining insights in areas such as genetic mutational patterns and factors in disease, molecular binding, and protein structure and function.

Last year, NLM established a new partnership with the National Science Foundation to support research on advanced analytical methods specifically applied to health.

BIOMEDICAL INFORMATION SYSTEMS FOR RESEARCH AND HEALTH

NLM develops and operates a set of richly linked databases that promote scientific breakthroughs and play an essential role in all phases of research and innovation.

Every day, NLM receives up to 15 terabytes of new data and information, enhances their quality and consistency, and integrates them with other NLM information. It responds to millions of inquiries per day from individuals and computer systems, serving up some 115 terabytes of information. This includes genomic data, such as that contained in the Sequence Read Archive, as well as citations to more than 30 million journal article records in PubMed.

On any given day, more than 2.5 million people use NLM’s PubMed Central (PMC) to retrieve more than 5 million full-text biomedical journal articles. PMC serves as the repository for NIH’s Public Access Policy and includes more than one million articles summarizing the results of NIH-funded research. Additionally, ten other federal agencies use PMC as the repository for publications collected under their public access policies.

Recently, NLM enhanced the ability to connect articles in PMC to openly available datasets that support reported research findings. Currently, more than 300,000 articles in PMC include datasets as supplemental materials. Others link to datasets hosted in other trusted repositories. The addition of this information has resulted in a 30 percent increase in daily downloads of supplementary material from PMC.

NLM also offers sophisticated retrieval methods and analysis tools to mine this wealth of data, many of which grow out of NLM’s research and development programs.

For example, NLM tools are used to mine journal articles and electronic health records (EHRs) to discover adverse drug reactions, analyze high throughput genomic data to identify promising drug targets, and detect transplant rejection earlier so interventions to help clinical research participants can begin more quickly. Data analysis tools also support complex analyses of richly annotated genomics data resources, yielding important molecular biology discoveries and health advances for applications to clinical care. Such applications demonstrate how the benefits of big data critically depend upon the existence of algorithms that can transform such data into information.

NLM has been a major force in health data standards for more than 30 years, and its investments have led to major advances in the ways high volume research and clinical data are collected, structured, standardized, mined, and delivered.

In close collaboration with other HHS agencies, NLM develops, funds, and disseminates clinical terminologies designated as essential for demonstrating meaningful use of EHRs and health information exchange. The goal is to ensure that clinical data created in one system can be transmitted, interpreted, and aggregated appropriately in other systems to support health care, public health, and research. NLM produces a range of tools to help EHR developers and users implement these standards and makes them available in multiple formats, including via application programming interfaces or APIs.

NLM is now providing support to develop tools to facilitate research use of the Fast Healthcare Interoperability Resources, or FHIR, standard that is being widely adopted for use in electronic health records.

ENGAGING THE PUBLIC WITH HEALTH INFORMATION

NLM uses multiple channels to reach the public with health information, including development of consumer-friendly websites, direct contact, and human networks that reach out to communities.

Direct-to-consumer information is made available in lay language through MedlinePlus, which covers more than 1,000 health topics. EHR systems can connect directly with MedlinePlus to deliver information to patients and health care providers at the point of need in healthcare systems. In collaboration with other NIH Institutes and Centers and other partners, NLM produces the print and online NIH MedlinePlus magazine, and its Spanish counterpart, NIH Salud.

The National Network of Libraries of Medicine (NNLM) engages more than 7,000 academic health sciences libraries, hospital libraries, public libraries, and community-based organizations as valued partners in conducting outreach to ensure the availability of health information and efficient access to NLM services. The NNLM provides a community-level resource for NIH’s All of Us program, ensuring a point of presence in almost every county in the U.S. The NNLM provides a robust network that reaches communities that are often underrepresented in biomedical research.

NNLM partners with local, state, and national disaster preparedness and response efforts to promote more effective use of libraries and librarians and ensure access to health information in disasters and emergencies. NNLM also plays an important role in increasing the capacity of research libraries and librarians to support data science and improve institutional capacity in management and analysis of biomedical data.

CONCLUSION

To conclude, through its research, information systems and public engagement, NLM supports discovery and the clinical application of knowledge to improve health. Its programs provide important foundations for the field of biomedical informatics and data science, bringing the methods and concepts of computational, informational, quantitative, social, behavioral, and engineering sciences to bear on problems related to basic biomedical and behavioral research, health care, public health, and consumer use of health-related information.

To watch the entire proceedings, click here: https://appropriations.house.gov/events/hearings/investments-in-medical-research-at-five-institutes-and-centers-of-the-national