In the movie The Graduate, Benjamin receives one word, whispered in hushed tones, as guidance to a successful future: “plastics.” Today, the National Library of Medicine, and the NIH as a whole, would whisper “data.”
The future of health and health care rests on data: genomic data, environmental sensor-generated data, electronic health records data, patient-generated data, and research-collected data.
Why is data worth our attention now? Because data generated in one research project could be analyzed by others and help grow knowledge more quickly.
The data originating from research projects is becoming as important as the answers those research projects are providing. Various kinds of data originate from research, including genomic assays, responses to surveys, and environmental assessments of air quality and temperature. Making sure these data are effectively used in the original study is the responsibility of the investigators. But who will make sure that relevant parts of these very complex and expensive-to-generate data will remain available for use by other investigators? And maybe even more important, who will pay for making those data discoverable, secure, available, and actionable?
We believe the NLM must play a key role in preserving data generated in the course of research, whether conducted by professional scientists or citizen scientists. We know how to purposefully create collections of information and organize them for viewing and use by the public. We can extend this skill set to the curation of research data. We also have the utilities in place to protect the data by making sure only those individuals with permission to access data can actually do so.
We have much to learn along the way. Handling data is not straightforward, and the analytical methods that will best help us learn from data await future development. But we have the foundation on which to build, the knowledge to get us going, and the tradition of service-inspired research that enables us to learn as we go.
Over the next few months I will outline NLM’s plan to become what the ACD report recommended—the “epicenter of data science for the NIH.” I look forward to your comments.
Data is information waiting to happen.