Some Insights on the Roles and Uses of Generalist Repositories

Guest post by Susan Gregurick, PhD, Associate Director for Data Science and Director, Office of Data Science Strategy, NIH

Data repositories are a useful way for researchers to both share data and make their data more findable, accessible, interoperable, and reusable (that is, aligned with the FAIR Data Principles).

Generalist repositories can house a vast array of data. This kind of repository does not restrict data by type, format, content, or topic. NIH has been exploring the roles and uses of generalist repositories in our data repository landscape through three activities, which I describe below, garnering valuable insights over the last year.

A pilot project with a generalist repository


Last September, I introduced Musings readers to the one-year Figshare pilot project, which was recently completed. Information about the NIH Figshare instance — and the outcomes of the project — is available on the Office of Data Science Strategy’s website. This project gave us an opportunity to uncover how NIH-funded researchers might utilize a generalist repository’s existing features. It also allowed us to test some specific options, such as a direct link to grant information, expert guidance, and metadata improvements.

There are three key takeaways from the project:

  • Generalist repositories are growing. More researchers are depositing data in, and more publications are linking to, generalist repositories.
  • Researchers need more education and guidance on where to publish data and how to effectively describe datasets using detailed metadata.
  • Better metadata enables greater discoverability. Expert metadata review proved to be one of the most impactful and unique features of the pilot instance, which we determined through two key metrics. When compared to data uploaded to the main Figshare repository by NIH-funded investigators, the NIH Figshare instance had files with more descriptive titles (e.g., twice as long) and metadata descriptions that were more than three times longer.
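As a rough illustration of the two metrics in the last takeaway, the sketch below compares average title and description length between two sets of records. The records are invented for the example, the function and field names are hypothetical, and counting words is an assumption: the pilot report summarized here does not say how length was measured.

```python
from statistics import mean

# Made-up example records, NOT data from the NIH Figshare pilot.
main_repository = [
    {"title": "Dataset 1", "description": "Raw data."},
    {"title": "Supplementary data", "description": "See paper."},
]
pilot_instance = [
    {"title": "Mouse liver RNA-seq counts",
     "description": "Gene-level read counts from twelve liver samples."},
    {"title": "Zebrafish heart imaging stacks",
     "description": "Confocal image stacks of embryonic hearts, annotated."},
]

def word_count(text):
    """Length in whitespace-separated words (an assumed measure)."""
    return len(text.split())

def length_ratio(curated, baseline, field):
    """Ratio of mean word counts for one metadata field."""
    curated_mean = mean(word_count(r[field]) for r in curated)
    baseline_mean = mean(word_count(r[field]) for r in baseline)
    return curated_mean / baseline_mean

title_ratio = length_ratio(pilot_instance, main_repository, "title")
description_ratio = length_ratio(pilot_instance, main_repository, "description")
print(f"titles: {title_ratio:.1f}x, descriptions: {description_ratio:.1f}x")
```

Length is only a proxy for descriptiveness, so a comparison like this would likely need to be paired with a qualitative review of the metadata.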

The NIH Figshare instance is now an archive, but the data are still discoverable and reusable. Although this specific pilot has concluded, we encourage NIH-funded researchers to use a generalist repository that meets the White House Office of Science and Technology Policy criteria when a domain-specific or institutional repository is not available.

A community workshop on the role of generalist repositories

In February, the Office of Data Science Strategy hosted the NIH Workshop on the Role of Generalist and Institutional Repositories to Enhance Data Discoverability and Reuse, bringing together representatives of generalist and institutional repositories for a day and a half of rich discussion. The conversations centered around the concept of “coopetition,” the importance of people in the broader data ecosystem, and the importance of code. A full workshop summary is available, and our co-chairs and the workshop’s participating generalist repositories recently published a generalist repository comparison chart as one of the outcomes of this event.

We plan to keep engaging with this community to better enable coopetition among repositories while working collaboratively with repositories to ensure that researchers can share data effectively.

An independent assessment of the generalist repository landscape

We completed an independent assessment to understand the generalist repository landscape, discover where we were in tune with the community, and identify our blind spots. Key findings include the following:

  • There is a clear need for the services that generalist repositories provide.
  • Many researchers currently view generalist repository platforms as a place to deposit their own data, rather than a place to find and reuse other people’s data.
  • Repositories and researchers alike are looking to NIH to define its data sharing requirements, so each group knows what is expected of them.
  • The current lack of recognition and rewards for data sharing helps reinforce the focus on publications as the key metric of scientific output and therefore may be a disincentive to data sharing.

The pilot, workshop, and assessment provided us with a deeper understanding of the repository landscape.

We are committed to advancing progress in this important area of the data ecosystem of which we are all a part. We are currently developing ways to continue fostering coopetition among generalist repositories; strategies for increasing engagement with researchers, institutional repositories, and data librarians; and opportunities to better educate the biomedical research community on the value of effective data management and sharing.

The Office of Data Science Strategy will announce specific next steps in the near future. In the meantime, we invite you to share your ideas with us at datascience@nih.gov.

Dr. Gregurick leads the implementation of the NIH Strategic Plan for Data Science through scientific, technical, and operational collaboration with the institutes, centers, and offices that make up NIH. She has substantial expertise in computational biology, high performance computing, and bioinformatics.

It’s My Birthday: An Ode to Aging and to Lifespan Research

When you grow up in a family of 10 kids, like I did, your birthday is a very special day. My mom and dad made sure that it was always a celebration, with breakfast pancakes, a picnic lunch in the park, and favorite foods for dinner. It’s a day that’s just yours. By now, I’ve had more than 65 birthdays, and I have to say, each one is better than the last! 

Like most people, I find that birthdays are a time for reflection, when you can pause and pinpoint where you are in the arc of your life. As a kid, it was a time of pure pleasure; as an adolescent, I wanted to be further along that arc. In midlife, I think I’m where I’m supposed to be, because I feel like I’m 39, think I look like I’m 49, believe I have a career worthy of someone who’s 59, and am approaching the wisdom of someone who’s 69. And although my reflections — both my thoughts and my likeness in the mirror — have changed over time, they all still reside within me, and every stage of my life informs each moment of my present.  

NIH also recognizes the value of the various life stages and their potential contributions to clinical research. Generating new knowledge for health through biomedical research stands to have the greatest impact if individuals from across the lifespan are included, as appropriate, in a study. 

This idea was deemed so important that NIH released the NIH Policy and Guidelines on the Inclusion of Individuals Across the Lifespan as Participants in Research Involving Human Subjects. In essence, this Inclusion Across the Lifespan policy states that individuals of all ages, including children (i.e., individuals under the age of 18) and older adults, must be included in all human subjects research conducted or supported by NIH unless there are scientific or ethical reasons not to. NIH is also hosting a virtual workshop, NIH Inclusion Across the Lifespan II, in September to review the lessons learned from this policy and examine evidence-based techniques for meeting the needs of this policy in research.

This commitment encourages individuals of all ages and at every stage of life to participate in clinical research, as appropriate, where innovations and therapeutics are being created, tested, and evaluated. As a result, we’ll learn more about the effectiveness of new medicines on children or why some older adults maintain robust physical function well into their 90s, and participants will benefit from being involved in leading-edge research.

Along with requiring the inclusion of participants from across the lifespan in research studies, NIH also requires that summary results of those studies be made available to the public. Submission of the complete results for any clinical trial to ClinicalTrials.gov must include information on the age of the enrolled participants overall and in each study arm. (A study arm is a group of people in a clinical study who receive a specific drug, medical device, or other intervention.) Age may be reported as a categorical variable (data that can be divided into discrete groups such as children, adults, and older adults) or as a summary statistic, such as mean age with a standard deviation or median age with minimum and maximum values.
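To make the reporting options concrete, here is a minimal sketch that summarizes age overall and for each study arm, both as categories and as summary statistics. The participant records are invented, the function names are hypothetical, and the cut point of 65 for older adults is an assumption made for illustration. Only the under-18 definition of children comes from the policy described above, and none of this reflects the actual ClinicalTrials.gov submission format.

```python
from statistics import mean, median, stdev

# Hypothetical enrollment records: (study arm, age in years). Not real trial data.
participants = [
    ("drug", 8), ("drug", 34), ("drug", 71), ("drug", 45),
    ("placebo", 12), ("placebo", 29), ("placebo", 66), ("placebo", 80),
]

def age_category(age):
    """Assign an age group; the 65-year cut point is an illustrative assumption."""
    if age < 18:
        return "children"
    if age < 65:
        return "adults"
    return "older adults"

def summarize(ages):
    """Report age both as category counts and as summary statistics."""
    counts = {"children": 0, "adults": 0, "older adults": 0}
    for age in ages:
        counts[age_category(age)] += 1
    return {
        "n": len(ages),
        "categories": counts,
        "mean": mean(ages),
        "sd": stdev(ages),  # sample standard deviation; needs at least two ages
        "median": median(ages),
        "min": min(ages),
        "max": max(ages),
    }

def summarize_by_arm(records):
    """Summarize each study arm separately, then all participants together."""
    arms = {}
    for arm, age in records:
        arms.setdefault(arm, []).append(age)
    report = {arm: summarize(ages) for arm, ages in arms.items()}
    report["overall"] = summarize([age for _, age in records])
    return report

report = summarize_by_arm(participants)
for group, stats in report.items():
    print(group, stats)
```

A real submission would report whichever form the study protocol specifies; the sketch computes both only to show how they differ.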

As we include individuals from across the lifespan in research, it’s important to be clear about which variables are being tested or measured. One way that NIH supports this is by encouraging researchers to employ similar concepts and terms across a range of research programs, for example, by using common data elements in clinical research, patient registries, and other human subjects research, in order to improve data quality and opportunities for comparison. NLM hosts NIH’s Common Data Element (CDE) Resource Portal, which provides access to information about NIH-supported CDEs, as well as tools and resources to assist investigators in developing protocols for data collection.

NIH’s focus on age in research endeavors is yielding positive results, and I’m happy to be able to share articles from NLM’s own PubMed collection that highlight research addressing various age groups. Here are a few specific to older adult populations:

And here are several specific to the pediatric population and those under age 18:

At this point in my life, I’m pleased to know that NIH and the National Library of Medicine support science that is inclusive of populations across the lifespan, as well as literature and other accounts that record and make available the results of this type of research!

In the future, we will not only have a better understanding of what makes someone healthy or responsive to treatment at a given age, but we’ll be better able to use health data collected in the early stages of life to predict outcomes in older populations.

This scientific crystal ball will benefit all of us. What would you like to ask it?

A New Era of Health Communications

I’ve been reflecting on how communication technology has transformed our lives, particularly since the COVID-19 pandemic radically changed our ability to interact with others.

Before NLM’s physical workspace shifted to maximum telework, I was walking to work when I passed a strange sight — the last vestiges of pay phones on the National Institutes of Health campus! Those decommissioned pay phones got me thinking about how technology changes over time, how essential communication technology has become, and how NLM’s approach to providing trustworthy biomedical data and health information must evolve as methods of delivery change. As technology advances, we have more choices and greater sophistication in the methods we use to meet our responsibility to deliver biomedical data and health information, as well as in the tools we use to interrogate that information.

Payphones sit outside of the National Library of Medicine, having been removed from use in the building.

The Lister Hill National Center for Biomedical Communications (LHNCBC), now more than 50 years old, provides a case study of how NLM’s efforts to communicate information have been transformed.

LHNCBC was established by a joint congressional resolution in 1968 to stimulate the application of modern communications technologies to the challenges of delivering health information worldwide to support health care services and enhance medical education.

In that same decade, push-button telephone pads were replacing rotary dials, and the Trimline telephone, with the earpiece, mouthpiece, and dial pad in the handset, was introduced. The ARPANET, the early version of the packet-switching internet, appeared soon after. Just as the Trimline phone presaged the design of mobile phones, the early design of LHNCBC laid the foundation for robust innovation in the use of telecommunications tools, computer networks, and high-performance visualization to deliver health information and ensure its use.   

An intramural division of NLM, LHNCBC develops advanced health information resources and software tools that are widely used in biomedical research and by health information technology professionals, health care providers, and consumers. As it seeks to improve access to biomedical information for individuals around the world, LHNCBC conducts and supports research and development on the dissemination of high-quality imagery, medical language processing, high-speed access to biomedical information, intelligent database systems development, multimedia visualization, knowledge management, data mining, and machine-assisted indexing.

In 1994, it launched the Visible Human Project, a landmark accomplishment that made complete, anatomically detailed, three-dimensional representations of a human male body and a human female body publicly available.

Current LHNCBC researchers come from a variety of disciplines, including medicine, computer science, library and information science, linguistics, engineering, and education. The Biomedical Informatics Training Program brings together talented individuals to learn from and collaborate with research staff.

Research and development conducted by the interdisciplinary teams across LHNCBC has led to many advances in biomedical communication and information dissemination, such as:

  • Consumer Health Question Answering — This project involves research on both the automatic classification of customer requests and the automatic answering of consumer health questions.
  • Discoveries from Mimic II/III and Other Sources — This effort examines and attempts to validate controversial findings from smaller-scale clinical studies through the interrogation of de-identified medical records and information from health information exchanges. Researchers also conduct retrospective epidemiological studies in areas that lack clinical trials.
  • Open-i — This experimental multimedia search engine retrieves and displays bibliographic citations and their related images by linking to images based on image features.
  • Unified Medical Language System® (UMLS) — This tool integrates key health care terminologies, classifications, and coding systems used by clinicians, billing systems, insurance companies, and researchers. Sources developed include the Metathesaurus®, Semantic Network, and SPECIALIST Lexicon. The UMLS supports health care communication through interoperability, specifically, the mapping of key terms from one vocabulary system to another.

The changes to LHNCBC since its creation in 1968 parallel changes in telecommunications over the past 50 years. Early work at LHNCBC demonstrated how technological advances such as fiber optic networks and semiconductors could be put to best use by the health care sector. Today, LHNCBC continues to improve health through methodological advances in clinical data science and health informatics. We recognize that contemporary communication relies on interoperable data, scalable methods, and the translation of discovery into operations.

As health care becomes more highly distributed and NLM resources are increasingly used by individuals around the world and beyond, LHNCBC will continue to be a partner in accelerating health communication.

What trends in health communication do you see ahead? How do you think COVID-19 will shape health communications?

Bridging the Gap: From Research to Policy

Guest post by Ellen T. Kurtzman, PhD, MPH, RN, FAAN, associate professor, School of Nursing, The George Washington University

As a health services researcher, I have always been interested in how to bridge the divide between research and policy. I constantly ask myself, “Which of my research questions will inform today’s most pressing policy debates?” and “How can I teach the next generation of nurse scientists to conduct policy-relevant research?” I recently left my academic position and spent a year working on Capitol Hill as one of eight 2018–2019 Robert Wood Johnson Foundation Health Policy Fellows. In this post, I offer a few key lessons from my time as a fellow that influenced my scholarship.

Lessons from my fellowship year

  • Right place, right time. The policymaking environment is fast paced. New issues emerge quickly, moving others lower on the priority list. The deck is constantly being reshuffled. Perhaps there is no better example of this than COVID-19. Who knew a year ago that a pandemic would draw decision makers’ attention away from other pressing policy issues? When a policy issue like this emerges unexpectedly, the need for evidence is virtually instantaneous. But the research process is methodical and cannot easily be accelerated. Randomized studies and clinical trials take time, which means that the timelines of science and policymaking do not naturally mesh. Because evidence needs to be ready at precisely the moment that a policy issue is being contemplated, the relationship between science and policymaking should be reframed.
  • Positioning researchers to contribute. Because there are so many policy issues being contemplated simultaneously, deep subject matter expertise from authoritative and independent sources is highly valued. Scientists and academics are ideally situated to be honest brokers, yet it is not always easy for policy staff to find expertise on short notice. Researchers need to better position themselves and their science during noncrisis periods so that they are top of mind when urgent needs emerge.
  • All about trade-offs. Harold Lasswell, an influential political scientist and theorist, helped define “politics” by asking, “Who gets what, when, and how?” Public policy is the art of allocating scarce resources to competing parties. I have always been interested in research questions about health care quality and value, but many of the secondary data sources I rely on lack the variables that would enable me to examine price or cost outcomes. In the short time I spent on Capitol Hill, it became abundantly clear to me why research that examines quality in the absence of cost considerations is insufficient.

Possible solutions

  • Policy in all things. Nursing, medical, and health sciences programs typically include a single health policy course and/or rotation. Rather than relegating policy to just one course, why not see “policy in all things”? During OB-GYN grand rounds, why not discuss policy solutions that address maternal mortality? What keeps us from asking our psychiatric nursing students to debate mental health parity issues or veteran suicide rates? If we incorporate policy into every course, our students will leave their programs better prepared to bridge the divide between science and policy.
  • New definitions of scholarship. Historically, academia has viewed scholarship in narrow terms. For example, criteria for appointments, promotion, and tenure (APT) reward refereed journal articles and colloquia, yet these materials are not generally accessible or readily available outside of academic circles. To bridge the divide between science and policy, academics might consider adopting a broader definition of scholarship and creating incentives for deliverables that appeal to decision makers. Could we, for example, adjust APT criteria so that the process rewards policy papers, issue briefs, and congressional testimony equally? By encouraging scholarship that reaches decision makers, we would be optimizing the policy impact of our science.
  • Enhanced dissemination and outreach. Policymakers need the deep expertise that scientists and academics possess, but we are often siloed from one another. With rare exceptions, we tend not to attend the same meetings or conferences, read the same journals or books, or consume the same news or other media. I now realize that, for my work to inform policy, I need to reconsider how I package and disseminate my findings as well as how I position myself as a subject matter expert. By understanding and following key policy issues, learning how to communicate with policymakers, and investing time and energy in building relationships during times of calm, I will be facilitating swifter adoption of my science and more meaningful dialogue with policy staff when there is a critical need for information.

Dr. Kurtzman is a health services researcher and a tenured associate professor of nursing with secondary appointments in the university’s Milken Institute School of Public Health and Trachtenberg School of Public Policy & Public Administration. Her investigator-initiated research explores the impact of federal, state, and institutional policies on health care quality and the role of the health care workforce in achieving higher value care. She is currently exploring the impact of states’ cannabis policies on health outcomes including the consequences for pregnant women and their infants.