The Evolution of Data Science Training in Biomedical Informatics

Guest post by Dr. George Hripcsak, MD, MS, Vivian Beaumont Allen Professor and Chair of Columbia University’s Department of Biomedical Informatics and Director of Medical Informatics Services for New York-Presbyterian Hospital/Columbia Campus.

Biomedical informatics is an exciting field that addresses information in biomedicine. At over half a century, it is older than many realize. Looking back, I am struck that in one sense, its areas of interest have remained stable. As a trainee in the 1980s, I published on artificial neural networks, clinical information systems, and clinical information standards. In 2018, I published on deep learning (neural networks), electronic health records (clinical information systems), and terminology standards. I believe this stability reflects the maturity of the field and the difficult problems we have taken on.

On the other hand, we have made enormous progress. In the 1980s we dreamed of adopting electronic health records and the widespread use of decision support fueled by computational techniques. Nowadays we celebrate and bemoan the widespread adoption of electronic health records, although we still look forward to more widespread decision support.

Data science has filled the media lately, but it has been part of biomedical informatics throughout the field’s history. Progress here has been especially notable.

Take the Observational Health Data Sciences and Informatics (OHDSI) project as an example: a billion patient records from about 400 million unique patients, with 200 researchers from 25 countries. This scale would not have been possible in the 1980s. A combination of improved health record adoption, improved clinical data standards, more computing power and data storage, advanced data science methods (regularized regression, Bayesian approaches), and advanced communications has made it possible. For example, you can now look up any side effect of any drug on the world market, review a 17,000-hypothesis study (publication forthcoming) comparing the side effects caused by different treatments for depression, and study how three chronic diseases are actually treated around the world.

How we teach data science in biomedical informatics has also evolved. Take as an example Columbia University’s Department of Biomedical Informatics training program, which has been funded by the National Library of Medicine for about three decades. It initially focused on clinical information systems under its founding chair, Paul Clayton, and while researchers individually worked on what today would be called data science, the curriculum focused heavily on techniques related to clinical information systems. For the first decade, our data science methods were largely pulled in from computer science and statistics courses, with the department focusing on the application of those techniques. During that time, I filled a gap in my own data science knowledge by obtaining a master’s degree in biostatistics.

In the second decade, as presented well by Ted Shortliffe and Stephen Johnson in the 2002 IMIA Yearbook of Medical Informatics, the department shifted to take on a greater responsibility for teaching its own methods, including data science. Our core courses focused on data representation, information systems, formal models, information presentation, decision making, evaluation, and specialization in application tracks. The Methods in Medical Informatics course focused mainly on how to represent knowledge (using Sowa’s 1999 Knowledge Representation textbook), but it also included numeric data science components like Bayesian inference, Markov models, and machine learning algorithms, with the choice between symbolic and statistical approaches to solving problems as a recurring theme. We also relied on computer science and statistics faculty to teach data management, software engineering, and basic statistics.

In the most recent decade, the department expanded its internal focus on data science and made it more explicit, with the content from the original methods course split among three courses: computational methods, symbolic methods, and research methods. The computational methods course covered the numerical methods commonly associated with data science, and the symbolic methods course included the representational structures that support the data.

This expansion into data science continued four years ago when Noemie Elhadad created a data science track (with supplemental funding from the National Library of Medicine) that encouraged interested students to dive more deeply into data science through additional departmental and external courses. At present, all students get a foundation in data science through the computational methods class and required seminars, and those with additional interest can engage as deeply as any computer science or statistics trainee.

We encourage our students not just to apply data science methods but to develop new methods, including supplying the theoretical foundation for the work. While this may not be for every informatics trainee, we believe that our field must be as rigorous as the methodological fields we pull from. Examples include work on deep hierarchical families by Ranganath, Blei, and colleagues, and work on remaking survival analysis with Perotte and Elhadad.

To survive, a department must look forward. Our department invested heavily in data science and in electronic health record research in 2007. A decade later, what is on the horizon?

I believe informatics will come full circle, returning at least in part to its physiological modeling origins that predated our department. As we reach the limits of what our noisy and sparse data can provide for deep learning, we will learn to exploit pre-existing biomedical knowledge in different forms of mechanistic models. I believe these hybrid empirical-mechanistic methods can produce patient-specific recommendations and realize the dream of precision medicine. And we have begun to teach our trainees how to do it.

Dr. George Hripcsak, MD, MS, is Vivian Beaumont Allen Professor and Chair of Columbia University’s Department of Biomedical Informatics and Director of Medical Informatics Services for New York-Presbyterian Hospital/Columbia Campus. He has more than 25 years of experience in biomedical informatics with a special interest in the clinical information stored in electronic health records and the development of next-generation health record systems. He is an elected member of the Institute of Medicine and an elected fellow of the American College of Medical Informatics and the New York Academy of Medicine. He has published more than 250 papers and previously chaired NLM’s Biomedical Library and Informatics Review Committee.

The Role of Libraries in Precision Health

Eric Dishman delivers the 2018 Leiter Lecture

The NIH All of Us Research Program and NLM’s National Network of Libraries of Medicine have partnered to boost awareness of the All of Us program that seeks to collect health data from one million people nationwide to accelerate research and improve health. The resulting data set will help researchers take into account individual differences in lifestyle, environment, and biology as they chart a path toward precision health.

Eric Dishman, Director of the All of Us Research Program, came to NLM in early May to tell us all about it during the annual Joseph Leiter Lecture. Eric’s remarks, titled “Precision Communication for Precision Health: Challenges & Strategies for Reaching All of Us,” got personal, sparked laughter, and electrified the audience by shaping a crystal-clear vision of what precision medicine can do for us.

I was thrilled. Eric is a longtime friend, colleague, and co-conspirator in pushing the world of biomedical informatics and clinical research out into the community. Together, we make quite the team, bringing different professional and academic perspectives to the effort: he is an anthropologist who previously worked in commercial R&D, and I am a nurse and industrial engineer coming out of academic research. But we share a passion for improving health, and we’re both blessed with the gift of gab, which we’ve used to persuade others that true patient-centered care requires bringing technology to the point of health, which, of course, is often in people’s bedrooms or kitchens.

Ultimately, Eric and I sing from the same song book: technology should support health in everyday living. And now we’re both singing that song at NIH, helping turn discovery into health.

Eric’s journey to NIH actually began—as these things sometimes do—in college, but not in the way you might think.

Eric was diagnosed with kidney cancer as a college sophomore and fought a 23-year battle with the disease before getting a clean bill of health. He survived, in part, he said, by actively pursuing his medical records, keeping up with emerging research, and tapping into the wisdom of other patients.

Until he began preparing for the Leiter Lecture, however, he had not fully recognized how important a role his mother played in that particular part of the fight. As he struggled to be an informed and proactive patient, she taught him essential survival skills, including recognizing and searching trusted sources and using information as a foundation for self-advocacy.

Eric’s mother, you see, was a librarian, and, according to him, “a master of precision communication.” In thinking about her and the way she shaped his journey through the health care system, Eric identified several things that everyone needs to know—if they don’t already—about librarians and personal health:

  • Information really can lead to ability or dis-ability, life or death.
  • Libraries can provide anywhere from the 2nd to the 152nd opinion in a complex health situation.
  • The impact a library or librarian can have on someone’s quality of life is probably immeasurable.
  • Clinician appraisal is only part of the view. None of us is the average patient, and there can be huge evidence gaps.
  • While resources like PubMed, ClinicalTrials.gov, and MEDLINE are crucial, libraries located in communities provide “safe, neutral spaces” and their amazing curators/navigators help “tame the overwhelm” of too many books, too many articles, and too many URLs.

These benefits highlight why the All of Us Research Program is partnering with the National Network of Libraries of Medicine (NNLM) to boost community awareness of and engagement in the program and to improve health literacy, both within communities and among program participants. Libraries are natural advocates for privacy, transparency, authority, and objectivity, all key elements in biomedical research. And as trusted and active members of their towns and cities, open to all comers, they provide an established way to reach groups historically underrepresented in biomedical research, such as women, minorities, and people with disabilities.

The All of Us initiative is driven by three underlying principles:

  • All health care (and research) is local.
  • Meet people where they are.
  • Deliver double value.

All three point to the essential role for public libraries within the All of Us program. The connection of the first two to libraries is obvious, but the third warrants a little elaboration.

To Eric, “double value” means that the research undertaken and the data collected should address the needs of both the scientists using the data and the participants producing them. That is why, in addition to collecting data that will feed generations of research, the All of Us Program promises to return all personal data collected to the individual participants who provided them.

While clinicians may help participants interpret those data, libraries will play a key role in giving participants a neutral, trusted, life-positive place to better understand and apply their health data in the context of their lives.

This year’s Leiter Lecture truly delivered “double value.” We got to hear an inspirational leader deliver an inspiring message grounded in life experience and in his own discovery of an important life influence. (Doesn’t that sound like Joe Leiter himself?!)

So, it’s fitting that Eric Dishman delivered the 2018 Joseph Leiter Lecture. These two giants have transformed the health information landscape. Both recognized the power of information and the power of computing and brought them together to make something wonderful!

You might say that’s double value—squared.

More information
2018 Joseph Leiter Lecture by Eric Dishman (video), “Precision Communication for Precision Health: Challenges & Strategies for Reaching All of Us” (National Library of Medicine, Wednesday, May 9, 2018)


The annual Joseph Leiter Lecture, jointly sponsored by the National Library of Medicine and the Medical Library Association, was established to stimulate the intellectual liaison between the two organizations on topics relating to biomedical communications. The lecture’s venue alternates between the MLA annual meeting and the NLM campus. I delivered the Leiter Lecture last year during MLA. (Note: Recording available only to members of MLA.)

Public Libraries: Partners in Health

Are you in Atlanta today? If not, you’re missing MLA ’18, a four-day extravaganza of all things related to medical librarianship. Sponsored by the Medical Library Association (MLA), the conference offers training, policy presentations, and career-building strategies, plus endless opportunities to network and to reconnect with colleagues and friends.

This year’s conference also offers something special: a symposium dedicated to health information for public librarians. The symposium is designed to help public librarians develop skills in providing consumer health information to enhance health and well-being and to encourage and expand health literacy throughout their communities.

Running concurrently with the final day and a half of the conference, the symposium began this morning and will run through midday tomorrow, when attendees will get to hear two powerful keynotes from Dr. Dara Richardson-Heron, Chief Engagement Officer of the All of Us Research Program at NIH, and Dr. David Satcher, the 16th Surgeon General of the United States and founding director of the Satcher Health Leadership Institute at Morehouse School of Medicine.

Funded by grants through the National Network of Libraries of Medicine (NNLM), the symposium was organized by MLA in collaboration with the Public Library Association, NNLM’s Greater Midwest Region, and members of the MLA Consumer and Patient Health Information Section.

These kinds of partnerships are invaluable, often making the impossible possible.

NLM’s ongoing partnership with public libraries is one more example. Partnering with public libraries has been essential to NLM for decades. Along with hospital and health sciences libraries, public libraries provide NLM with points of presence around the country. In fact, through the NNLM’s 6,800+ members, NLM reaches into almost every county in the United States.

That reach is powerful. By serving as community centers and information hubs, public libraries provide us with a special pathway to the American public. And with their unique knowledge of their communities, public librarians help us understand how best to serve the people living in those communities.

That is why public libraries play a key role in advancing NIH’s new precision medicine program, All of Us.

This ambitious initiative will recruit one million people—especially those historically underrepresented in biomedical research—into a new kind of scientific discovery effort that engages people in the research process and improves health by taking into account individual differences in biology, environment, and lifestyle. Participants will help researchers make discoveries that may help future generations, and in turn, participants will receive all the data and information collected about themselves, as well as certain study results.

That’s where public libraries step in. Public librarians will work to engage their local communities, raise awareness about and understanding of the program, boost overall health literacy, and help program participants comprehend and interpret their own information.

In turn, NLM will support the librarians, training them to use our resources and providing health information they can share. We will also listen to them to learn about the health concerns of their communities, using those insights to improve our resources and to provide information that is responsive, culturally sensitive, and contextually relevant.

So today, while we activate and celebrate our partnerships with public libraries at the MLA meeting and symposium, we remember and recognize that NLM’s real value comes not from what we do on the NIH campus in Bethesda but from the information that reaches people everywhere and becomes part of their health lives. Public libraries help make that possible.


By the way, if you can’t be in Atlanta but want to feel like you are, follow the hashtag #mlanet18 on Twitter. And later this month, check the NLM website for recordings of all our theater presentations.

NIH Draft Strategic Plan for Data Science: Suggestions for Optimizing Value

Guest post by Dr. William Hersh, professor and chair of the Department of Medical Informatics and Clinical Epidemiology, School of Medicine, Oregon Health & Science University.

Earlier this year, the National Institutes of Health (NIH) issued a Request for Information (RFI) soliciting input on its draft Strategic Plan for Data Science. As I did for the National Library of Medicine’s (NLM) RFI concerning next-generation data science challenges in health and biomedicine, I shared my comments on the data science plan through both the formal submission mechanism and my blog. (See also my blog comments on the NLM RFI.) I appreciate being asked to update my comments on the draft NIH data science plan in this guest post.

The draft NIH data science plan is a well-motivated and well-written overview of the path NIH should follow to ensure that data science is leveraged to maximize its benefit to biomedical research and human health. The goals of connecting all NIH and other relevant data, modernizing the ecosystem, developing tools and the workforce skills to use them, and making the whole enterprise sustainable are all important and well articulated in the draft plan.

However, collecting and analyzing the data, along with building tools and training the workforce to use the data, are not enough. Three additional aspects not adequately addressed in the draft are critical to achieving the value of data science in biomedical research.

The first of these is the establishment of a research agenda around data science itself. We still do not understand all the best practices and other nuances around the optimal use of data science in biomedical research and human health. Questions remain regarding how best to standardize data for use and re-use. What standards are needed for best use of data? Where are the gaps in our current standards that we can address to improve the use of data in biomedical research, especially data not originally collected for research purposes (such as clinical data from electronic health records and patient data that come from wearables and sensors or are entered directly by patients)?

We must also research more extensively the human factors around data use. How do we organize workflows for optimal input, extraction, and utilization of data? What are the best human-computer interfaces for such work? How do we balance personal privacy and security against the public good of learning from such data? What ethical issues must be addressed?

The second inadequately addressed aspect concerns the workforce for data science. While the draft properly notes the critical need to train specialists in data science, it does not explicitly mention the discipline that was at the forefront of “data science” before the term came into widespread use, namely, biomedical informatics. NLM has helped train a wide spectrum of those who work in data science, from the specialists who carry out the direct work to the applied professionals who work with researchers, the public, and other implementers. NIH should acknowledge and leverage this workforce, which will analyze and apply the results of data science work. The many biomedical informatics programs (and programs in related flavors of informatics) should expand their established role in translating data science from research to practice.

The final underspecified aspect concerns the organizational home for data science within NIH. Many traditional NLM grantees, including this author, have been funded under the NIH Big Data to Knowledge (BD2K) program launched several years ago. The newly released NLM Strategic Plan includes a focus on data science and goes beyond some of the limitations of the draft NIH data science plan described above, making the NLM the logical home for data science within NIH.

By addressing these concerns, the NIH data science plan can make an important contribution to realizing the potential for data science in improving human health as well as preventing and treating disease.

William Hersh, MD, FACMI, serves as professor and chair of the Department of Medical Informatics & Clinical Epidemiology, School of Medicine, Oregon Health & Science University. His current work is focused on the workforce needed to implement health information technology, especially in clinical settings, and he is active in clinical and translational research informatics.

Health and the Economy

Insights from the Surgeon General

I’ve written before about the intellectual and personal thrill I get working with the NLM Board of Regents. Their expertise, their unique perspectives, and their passion make our two-day meetings fly by, and their ideas drive the Library forward.

At this week’s Board meeting, some powerful insights came from the new US Surgeon General, Vice Admiral Jerome Adams, MD, MPH.

As the Surgeon General, Dr. Adams is one of nine ex officio members on the Board of Regents, but he also enjoys a special privilege—namely, presenting to the Board at every meeting. That long-standing tradition helps ensure our work aligns with the Surgeon General’s priorities to improve the country’s health.

Over the years, those priorities have focused on such issues as reducing health disparities, preventing skin cancer, and going tobacco-free. But Dr. Adams brings a fresh agenda: the connection between health and the economy.

I can’t do justice to his vision and his passion, so let’s jump into his recent post on the HealthAffairs blog, which lays out his thinking and calls for private and public institutions to come together “to maximize quality, health-nurturing employment opportunities for all US citizens who are able to work.”

Improving Individual and Community Health Through Better Employment Opportunities
by Jerome M. Adams | May 8, 2018

Employment and job creation build prosperity and carry important health benefits, both for individuals and entire communities. There is a large and growing body of literature demonstrating a positive correlation between employment and individual and community health.

Employment can be defined as a contractual relationship between the worker and an employer for financial or other reward that is sustained over a period of time. It can be used as a socially acceptable means of earning a living and may involve a set of technical and social tasks performed within certain physical and social contexts. In the US, employment serves as the main source of income of the country’s residents.

Across multiple studies, higher income was consistently associated with better health, including a reduced overall risk of mortality and reduced rates of such chronic diseases as heart disease, diabetes, and stroke. Mortality rates are lower among those who are employed compared to the unemployed. Re-employment after a period of being out of a job has beneficial effects on physical health, psychological distress, and certain psychiatric conditions. Employment also reduces the risk of depression and psychological distress, improves general mental health, and, over time, predicts a positive trend in perceived health and physical functioning in both women and men. Quality employment can be beneficial to people with physical and mental disabilities who are able to work. One important caveat is that the relationship between employment and health and well-being is moderated by job quality and there is a growing literature that low-security, high-stress, or long-hour/shift jobs may not benefit and could actually harm employees’ health.

[Read the full post on HealthAffairs]