How much does it cost to keep data?

Study to forecast long-term costs

Guest post by Elizabeth Kittrie, NLM’s Senior Planning and Evaluation Officer.

As scientific research becomes more data-intensive, scientists and their institutions are increasingly faced with complex questions about which data to retain, for how long, and at what cost.

The decision to preserve and archive research data should not be posed as a yes or no question. Instead, we should ask, “For how many years should this subset of data be preserved or archived?” (By the way, “forever” is not an acceptable response.)

Answering questions about research data preservation and archiving is neither straightforward nor uniform. Certain types of research data may derive value from their unique qualities or because of the costs associated with the original data collection. Other types of research data are relatively easy to collect at low cost; yet once collected, they are rarely re-used.

To create a sustainable data ecosystem, as outlined in both the NLM Strategic Plan and the NIH Strategic Plan for Data Science, we need strategies to address fundamental questions like:

  • What is the future value of research data?
  • For how long must a dataset be preserved before it should be reviewed for long-term archiving?
  • What are the resources necessary to support persistent data storage?

We believe that economic approaches—including forecasting long-term costs, balancing economic considerations with non-monetary factors, and determining the return on public investment from data availability—can help us make preservation and archiving decisions.

To that end, NLM has contracted with the National Academies of Sciences, Engineering, and Medicine (NASEM) for a study on forecasting the long-term costs for preserving, archiving, and promoting access to biomedical data. For this study, NASEM will appoint an ad hoc committee that will develop and demonstrate a framework for forecasting these costs and estimating potential benefits to research. In so doing, the committee will examine and evaluate the following:

  • Economic factors to be considered when examining the life-cycle cost for data sets (e.g., data acquisition, preservation, and dissemination);
  • Cost consequences for various practices in accessioning and de-accessioning data sets;
  • Economic factors to be considered in designating data sets as high value;
  • Assumptions built in to the data collection and/or modeling processes;
  • Anticipated technological disruptors and future developments in data science in a 5- to 10-year horizon; and
  • Critical factors for successful adoption of data forecasting approaches by research and program management staff.

The committee will provide a consensus report and two case studies illustrating the framework’s application to different biomedical contexts relevant to NLM’s data resources. Relevant life-cycle costs will be delineated, as will any assumptions underlying the models. To the extent practicable, NASEM will identify strategies to communicate results and gain acceptance of the applicability of these models.

As part of its information gathering, NASEM will host a two-day public workshop in late June 2019 to generate ideas and approaches for the committee to consider. We will provide further details on the workshop and how you can participate in the coming months.

As a next step in advancing this study, we are supporting NASEM’s efforts to solicit names of committee members, as well as topics for the committee to consider. If you have suggestions, please contact Michelle Schwalbe, Director of the Board on Mathematical Sciences and Analytics at NASEM.

Elizabeth Kittrie is NLM’s Senior Planning and Evaluation Officer. She previously served as a Senior Advisor to the Associate Director for Data Science at the National Institutes of Health and as Senior Advisor to the Chief Technology Officer of the US Department of Health and Human Services. Prior to joining HHS, she served as the first Associate Director for the Department of Biomedical Informatics at Arizona State University.

Health Disparities: Big Data to the Rescue?

Guest post by Dr. Fred Wood, Outreach and Evaluation Scientist in the Office of Health Information Programs Development.

Socially disadvantaged populations have fewer opportunities to achieve optimal health. They also experience preventable differences when facing disease or injury. These inequities, known collectively as health disparities, significantly impact personal and public health.

Despite decades of research on health disparities, researchers, clinicians, and public health specialists have not seen the changes we were hoping for. Instead, many health disparities are proving difficult to reduce or eliminate.

With that in mind, the National Institutes of Health (NIH) National Institute on Minority Health and Health Disparities (NIMHD) launched a Science Visioning Process in 2015 with the goal of producing a scientific research plan that would spark major breakthroughs in addressing disparities in health and health care. NIMHD defines health disparities populations as including racial or ethnic minorities, gender or sexual minorities, those with low socioeconomic status, and underserved rural populations.

Through a mix of staff research and trans-NIH work groups—of which the National Library of Medicine is a part—NIMHD is gathering input on the current state of the science on minority health and health disparities.

Prompted in part by the NIH All of Us precision medicine initiative, one key visioning area—methods and measures for studying health disparities—includes big data.

We expect big data to bring significant benefits and changes to health care, but can it also play a part in reducing health disparities?

Last month the journal Ethnicity & Disease published a special issue focused on big data and its applications to health disparities research (Vol. 27, No. 2).

The issue includes a paper co-authored by the current NIMHD director, several NIH researchers (including me), and several academic partners. Titled “Big Data Science: Opportunities and Challenges to Address Minority Health and Health Disparities in the 21st Century,” the paper identified three major opportunities for big data to reduce health disparities:

  1. Incorporate social determinants of health disparities information—such as race/ethnicity, socioeconomic status, and genomics—in electronic health records (EHRs) to facilitate research into the underlying causes of health disparities.
  2. Include in public health surveillance systems environmental, economic, health services, and geographic data on targeted populations to help focus public health interventions.
  3. Expand data-driven research to include genetic, exposure, health history, and other information, to better understand the etiology of health disparities and guide effective interventions.

But using big data for health disparities research has its challenges, including ethics and privacy issues, inadequate data, limited data access, and the need for a skilled, diverse workforce.

The paper offered eight recommendations to counteract those challenges:

  1. Incorporate standardized collection and input of race/ethnicity, socioeconomic status, and other social determinants of health measures in all systems that collect health data.
  2. Enhance public health surveillance by incorporating geographic variables and social determinants of health for geographically defined populations.
  3. Advance simulation modeling and systems science using big data to understand the etiology of health disparities and guide intervention development.
  4. Build trust to avoid historical concerns and current fears of privacy loss and “big brother surveillance” through sustainable long-term community relationships.
  5. Invest in data collection on area-relevant small sample populations to address incompleteness of big data.
  6. Encourage data sharing to benefit under-resourced minority-serving institutions and underrepresented minority researchers in research intensive institutions.
  7. Promote data science in training programs for underrepresented minority scientists.
  8. Assure active efforts are made up front during both the planning and implementing stages of new big data resources to address disparities reduction.

Big data, it seems, is the classic double-edged sword. It offers tremendous opportunities to understand and reduce health disparities, but without deliberate and concerted action to address its inherent challenges and without the active engagement of minority communities in that process, those disparities could widen, keeping the benefits of precision medicine—including improved diagnosis, treatment, and prevention—from millions of those who need them.

How do you think big data will inform health disparities research? And what else might we do to ensure the disparities gap continues to close?

It IS your father’s Big Data–and your mother’s, and your sibling’s, and even yours!

Let’s make it useful to them.

You’ve probably been hearing about big data everywhere—traffic patterns, video streams, genome sequences—and how it is changing lives, accelerating commerce, and even improving health. But most of the time the conversation focuses on what business professionals and scientists might need, want, or do with big data. It’s time to consider how the ordinary person can benefit from this data revolution.

But first, what exactly is big data and why should you (and your father, mother, siblings, and friends) care about it?

The term “big data” can be used to describe data with a range of characteristics. It covers high volume data (like the whole human genome) or data that streams at a high velocity (like the constant flow of image data from space exploring satellites). It also includes high variety data (such as the mix of chemical process, electrical potential, and blood flow observed during brain studies) that may have high levels of variability (like around-the-clock monitoring of traffic flows through busy highways). Ultimately, a key to big data is its high value, whether that’s important to commerce or to the discovery of new cancer drugs. Scientists are learning how to make discoveries through data, and businesses are learning to leverage big data to glean key customer insights.

But big data can be and is of value to the everyday person as well. It already helps us navigate through a new city using map and traffic apps and to find interesting information through search engines, among other things.

Here at the NLM we want to find ways to help people use big data to manage their health and health concerns. It may help them know what to do in an emergency, to better understand their family risks for heart disease, or to learn just how much exercise might ward off Alzheimer’s disease.

Toward that end, we are funding a grant award, Data Science Research: Personal Health Libraries for Consumers and Patients (R01) (PAR-17-159).

We’re looking for researchers who want to partner with lay people to discover how to bring the power of big data into their lives. To do that, we need fresh approaches to biomedical informatics and data science, shaped to meet the needs of consumers and patients, whose health literacy, language skills, technical sophistication, education, and cultural traditions affect how they find, understand, and use personal health information. Novel data science approaches are needed to help individuals at every step, from harvesting to storing to using data and information in a personal health library.

If you’re a researcher interested in discovering new biomedical informatics knowledge to help consumers and patients make use of big data, this opportunity is for YOU! If you’re a clinician or a librarian, reach out to your science colleagues to form a partnership. If you’re a patient, find a researcher at your local university and invite yourself into the process of citizen science.

Much of the data behind the big data revolution originates from everyday people. Many of the benefits of the big data revolution could help improve the lives of everyday people. In other words, it is your father’s, mother’s, siblings’, and friends’ big data. Let’s make it useful to them!

We’re Witnessing a Health Data Explosion

How can the world’s largest medical library harness data to improve public health?

Guest post by Dana Casciotti, PhD, Public Health Policy Analyst

You don’t have to be a scientist or health professional to know that information is at the heart of every biomedical advancement and clinical decision. And it’s equally obvious that authoritative health information does not appear out of the blue. Medical knowledge emerges from a process that begins with basic research into how organisms work and ends with carefully tested determinations of what treatments work best for the symptoms, disorders, and diseases humans face.

Along this bench-to-bedside continuum from discovery to practice is the work of the National Library of Medicine. Since its creation, NLM has been committed to making its vast store of information available to the public, including lay individuals, communities, medical and public health professionals, and researchers. Our simple but important mission is to acquire, organize, disseminate, and preserve the biomedical knowledge of the world for the benefit of public health.

Let’s face it, though—there are challenges. Information access may be the first step to improving health outcomes, but we know that having access to information alone is not sufficient. Think of all those who continue to smoke cigarettes despite the Surgeon General’s warning about the dangers plainly stated on the package. Or story after story in the news about the benefits of exercise, passively taken in—and ignored—by the couch potato. Certainly other factors—social, behavioral, economic, and environmental—influence whether and to what extent individuals use health information.

Additionally, although the internet and social media have expanded access to health information and built meaningful communities around medical topics, those tools have also spread a disturbing amount of inaccurate information.

So, we have our work cut out for us.

From a public health perspective, I am interested in how NLM can foster new approaches to interpreting and using information so individuals can have more productive health care interactions and improved health decision-making. Along that bench-to-bedside continuum, I’m focused on the end, on what happens bedside, in the doctor’s office, or at the kitchen table as patients decide what to do.

Thanks to new apps and wearable devices, people can be more aware than ever of their own health data, especially data related to behaviors like diet, exercise, and sleep. In addition to personally collected information, there is a vast array of health data generated from various sources: Electronic Health Records, research studies, and insurance claims data, just to name a few, along with the newest kid on the block, the NIH Precision Medicine Initiative. Its All of Us program aims to build a national, large-scale research enterprise with one million or more volunteers to extend precision medicine to all diseases. Imagine the size and promise of that data!

So how can the Library capitalize on this data explosion? Can we facilitate data collection and use at the point-of-care in a way that is manageable—and actionable—for busy clinicians and often overwhelmed patients? How can we use health information technology to create links between health data and consumer health information? Can we guide people effectively to sources of actionable information tailored to them? Can we help individuals take greater ownership of and be more active in health decision-making?

These are the questions my colleagues and I are wrestling with now, and the answers will certainly build upon the promise of recent advancements in information science and data science, along with cultural shifts brought about through the open science and citizen science movements. If we can effectively channel the tsunami of personal data being generated by each of us every day, then maybe we can employ new strategies to engage the couch potato and continued smoker. NLM can continue to be the gold standard for information about health and medicine, and make sure the public can find and use us.

If so, we can positively impact health, both for the individual and for society at large.


Guest blogger Dana Casciotti, PhD, is a Public Health Policy Analyst at the National Library of Medicine. Dr. Casciotti has over 10 years of experience in the public health field working in academic, government, and nonprofit sectors. Her training has focused on behavioral and social factors related to health, especially cancer prevention and control, and health communication. Dr. Casciotti holds an MPH from the University of Pittsburgh and a PhD from the Johns Hopkins Bloomberg School of Public Health. 
