Guest post by Dr. William Hersh, professor and chair of the Department of Medical Informatics and Clinical Epidemiology, School of Medicine, Oregon Health & Science University.
Earlier this year, the National Institutes of Health (NIH) issued a Request for Information (RFI) soliciting input on its draft Strategic Plan for Data Science. As I did for the National Library of Medicine’s (NLM) RFI concerning next-generation data science challenges in health and biomedicine, I shared my comments on the data science plan both through the formal submission mechanism and on my blog. (See also my blog comments on the NLM RFI.) I appreciate the invitation to update my comments on the draft NIH data science plan in this guest post.
The draft NIH data science plan is a well-motivated and well-written overview of the path NIH should follow to ensure that data science is leveraged for maximum benefit to biomedical research and human health. The goals of connecting all NIH and other relevant data, modernizing the data ecosystem, developing tools and the workforce skills to use them, and making the whole enterprise sustainable are all important, and the draft plan articulates them well.
However, collecting and analyzing the data, along with building tools and training the workforce to use the data, are not enough. Three additional aspects not adequately addressed in the draft are critical to achieving the value of data science in biomedical research.
The first of these is the establishment of a research agenda around data science itself. We still do not understand all the best practices and other nuances of using data science optimally in biomedical research and human health. Questions remain about how best to standardize data for use and re-use. What standards are needed for the best use of data? Where are the gaps in our current standards that we can address to improve the use of data in biomedical research, especially data not originally collected for research purposes (such as clinical data from electronic health records and patient data that come from wearables and sensors or are entered directly by patients)?
We must also research more extensively the human factors around data use. How do we organize workflows for optimal input, extraction, and utilization of data? What are the best human-computer interfaces for such work? How do we balance personal privacy and security against the public good of learning from such data? What ethical issues must be addressed?
The second inadequately addressed aspect concerns the workforce for data science. While the draft properly notes the critical need to train specialists in data science, it does not explicitly mention the discipline that was at the forefront of “data science” before the term came into widespread use, namely, biomedical informatics. NLM has helped train a wide spectrum of those who work in data science, from the specialists who carry out the direct work to the applied professionals who work with researchers, the public, and other implementers. NIH should acknowledge and leverage this workforce, which will analyze and apply the results of data science work. The many programs in biomedical informatics (and its related flavors) should expand their established role in translating data science from research to practice.
The final underspecified aspect concerns the organizational home for data science within NIH. Many traditional NLM grantees, including this author, have been funded under the NIH Big Data to Knowledge (BD2K) program launched several years ago. The newly released NLM Strategic Plan includes a focus on data science and addresses some of the limitations of the draft NIH data science plan described above, making NLM the logical home for data science within NIH.
By addressing these concerns, the NIH data science plan can make an important contribution to realizing the potential of data science to improve human health and to prevent and treat disease.
William Hersh, MD, FACMI, serves as professor and chair of the Department of Medical Informatics & Clinical Epidemiology, School of Medicine, Oregon Health & Science University. His current work is focused on the workforce needed to implement health information technology, especially in clinical settings, and he is active in clinical and translational research informatics.