NLM Announces New Annual Lecture on Science, Technology, and Society

Guest post by Maryam Zaringhalam, PhD, National Library of Medicine Data Science and Open Science Officer and Mike Huerta, PhD, director of the Office of Strategic Initiatives and associate director of the National Library of Medicine.

In October 2019, NLM invited award-winning science journalist Angela Saini to discuss her research on how bias and prejudice have crept into science. Her lecture examined how racist and sexist ideas have permeated science over its history, and how science, in turn, has been contorted to justify and perpetuate pseudoscientific myths of innate inferiority. Saini’s work and insights sparked a crucial conversation within NLM about our role and responsibility as the world’s largest biomedical library and a leader in data science research, situated within the nation’s premier medical research agency, to question how systemic biases affect our work and determine how we can correct them.

As advancing equity and rooting out structural discrimination in science and technology have become an increasingly urgent federal priority, NLM will build on this discussion, in part, by announcing the launch of an annual NLM Science, Technology, and Society Lecture on March 1, 2021.

Situated at the nexus of the NIH-supported research community and the public, NLM plays a vital role not only in advancing cutting-edge research, but also in acting as a steward of biomedical information in service of society. As leaders in facilitating and shaping the future of biomedical data science, we must understand the implications of our work for society as a whole. We must, for instance, question how biases may creep into algorithms that connect research results with the public and think through the ethical ramifications of emerging technologies that might reinforce and amplify those biases. As a national library, we serve as curators of the history of biomedical science, which must reflect both the great achievements made possible by research and the injustices committed within the scientific community. And as an institution with more than 8,000 points of presence through our Network of the National Library of Medicine, we have the means to fulfill our responsibility to meet the needs and understand the concerns of the communities we serve.

With these responsibilities and NLM’s unique role and capabilities in mind, the NLM Science, Technology, and Society Lecture aims to raise awareness of the societal and ethical implications of the conduct of biomedical research and the use of advanced technologies, while seeding conversations across the Library, NIH, and the broader biomedical research community. NLM sees such considerations as fundamental to advancing biomedical discovery and human health for the benefit of all.

Dr. Kate Crawford is the inaugural Visiting Chair of AI and Justice at the École Normale Supérieure, as well as a Senior Principal Researcher at Microsoft Research, and the cofounder of the AI Now Institute at New York University.

Each spring, we plan to invite a leading voice working at the intersection of biomedicine, data science, ethics, and justice to present their research and how it relates to the mission and vision of NLM, as well as NIH more broadly. This year, we are pleased to host Dr. Kate Crawford, a leading scholar of science, technology, and society, with over 20 years of experience studying large-scale data systems and artificial intelligence (AI) in the wider contexts of history, politics, labor, and the environment. Her lecture, “Atlas of AI: Mapping the social and economic forces behind AI,” will explore how machine learning systems can reproduce and intensify forms of structural bias and discrimination and offer new paths for thinking through the research ethics and policy implications of the turn to machine learning.

As the interests, priorities, and concerns of our society continue to evolve, particularly in response to emerging technologies and shifting national conversations, we hope this annual lecture, alongside established lecture series such as NLM History Talks, will provide an invaluable perspective on the societal implications of our work and further establish NLM’s leadership as a trusted partner in health.

Dr. Zaringhalam is a member of the Office of Strategic Initiatives and is responsible for monitoring and coordinating data science and open science activities and development across NLM, NIH, and beyond. She completed her PhD in molecular biology at Rockefeller University in 2017 before joining NLM as an AAAS Science and Technology Policy Fellow.

Dr. Huerta leads NLM in identifying, implementing, and assessing strategic directions of NLM, including at the intersection of data science and open science. In his 30 years at NIH, he has led many trans-NIH research initiatives and helped establish neuroinformatics as a field. Dr. Huerta joined NIH’s National Institute of Mental Health in 1991, before moving to NLM in 2011.

A Journey to Spur Innovation and Discovery

Guest post by Valerie Schneider, PhD, staff scientist at the National Library of Medicine’s National Center for Biotechnology Information, National Institutes of Health.

It’s been said that nature is the best teacher. When it comes to understanding human biology and improving health, examples abound of the advances that have been made from the study of a diverse set of non-human organisms. Over the last two centuries, the study of nematode worms has taught us about longevity and mRNAs (the biological molecule that is the basis for several COVID-19 vaccines), common fungi about cell division and cancer, and fruit flies about many things, from the role of chromosomes in heredity to our circadian rhythms. The ability to create targeted alterations in the genomes of model organisms has been transformative for studies to establish the function of specific genes in the etiology of human disease.

The modern era of genomic biology, in which genome sequencing and assembly are accessible to more researchers than ever before, provides data from an even greater range of organisms from which we might learn. Today, we rely not only on primate models, but on a whole host of species: for example, swine to understand organ transplantation, songbirds to understand vocalization and learning, and bats and pangolins to teach us about the evolution of the SARS-CoV-2 virus and how to fight its spread.

These rapidly growing collections of sequence and other data on species across the tree of life offer enormous promise for discoveries that have the potential to improve human health. To better enable such discoveries, with the support of NIH, NLM is planning a major modernization of its resources and their underlying infrastructure.

This modernization will support the needs of users engaged in data search and retrieval, gene annotation, evaluation of sequence quality, and comparative analyses. The new infrastructure, user interfaces, and tools should result in an improved experience for researchers doing a wide range of work, and also facilitate better data submissions.

This revamping aligns with NIH’s Strategic Plan for Data Science, which provides a roadmap for modernizing the NIH-funded biomedical data science ecosystem, as well as NLM’s Strategic Plan, which furthers NLM’s commitment to provide data and information to accelerate biomedical discovery and improve health. NLM and NIH are committed to providing researchers with modern, stable, and cloud-oriented technologies that support research needs.

Over the last few years, NLM has demonstrated this commitment by re-designing several flagship products, including the PubMed database for searching published biomedical literature, the ClinicalTrials.gov database of information on privately and publicly funded clinical trials, and the Basic Local Alignment Search Tool (BLAST) for finding regions of similarity between biological sequences. As part of NIH’s Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) Initiative, NLM also made the data from its massive (36 petabyte) Sequence Read Archive (SRA) available on two commercial cloud platforms, facilitating large-scale computational research that would otherwise be difficult for many researchers. Revamping these resources has positioned them to support both the current and future needs of NLM’s diverse audience of researchers, clinicians, data scientists, educators, and others.

Importantly, this current initiative to modernize NLM products, tools, and services, and concurrently develop content, will include extensive engagement with the research community, just as we’ve done with previous re-design efforts. The NLM is committed to offering interfaces accessible to both novices and experts. Additionally, NLM believes a key part of the next generation of its data resources requires an infrastructure that supports an ongoing, dynamic exchange of content, including contributions of metadata and gene functional information from knowledge builders in the community to complement and enhance NIH-provided content.

Community engagement will also ensure that externally sourced content is provided in ways that maintain the high value and trustworthiness of the datasets. Additionally, data connections that make the content of this new resource accessible to external knowledgebases containing other datatypes, such as images, will further promote integrative data analyses that support scientific discovery.

Many opportunities exist to streamline processes, look across resources, and gain insights that will provide new ways of learning. Through NLM’s continued commitment to modernization initiatives, we are ready to again improve the user experience for accessing, analyzing and visualizing sequence data and related information. Nature continues to be our best teacher — and we are now poised to learn from her in an exciting new classroom.

We invite you to come on this journey with us.

Valerie Schneider, PhD, is the deputy director of Sequence Offerings and the head of the Sequence Plus program. In these roles, she coordinates efforts associated with the curation, enhancement, and organization of sequence data, as well as oversees tools and resources that enable the public to access, analyze, and visualize biomedical data. She also manages NCBI’s involvement in the Genome Reference Consortium, the international collaboration tasked with maintaining the value of the human reference genome assembly.

Health Data Standards: A Common Language to Support Research and Health Care

Guest post by Dianne Babski, Associate Director for Library Operations and Robin Taylor, MLIS, National Library of Medicine

Every day we benefit from data standards, and every day most of us don’t even notice it! Did you wear a seatbelt today? Take a precise dose of medicine? Send an email? Plug a laptop into an outlet? These are examples of activities that are made possible through data standards. At NLM, we think a lot about data standards, particularly health data standards.

NLM partners with organizations such as the Office of the National Coordinator for Health Information Technology (ONC) to promote health data standards for data captured in electronic health records (EHRs), clinical research, and other health information systems. With a focus on how health data are collected, stored, described, and retrieved, health data standards make up the backbone of interoperability. Interoperability is the ability to connect and seamlessly share data between computerized systems, and it allows information to be exchanged with other applications and databases.

Let’s look at a current example where health data standards, a common data language, have had a real impact. When SARS-CoV-2, and the disease it causes, COVID-19, emerged in late 2019, researchers around the world began planning studies to figure out how to combat this global pandemic. Research questions, such as, “What date did the patient first display COVID-19 symptoms?” arose continuously. It sounds like a simple question, but there are so many ways to ask the question, and even more possible responses. If researchers apply health data standards in their investigations — if they ask questions and collect responses in a standardized way — the data they collect can be combined and compared with data from other COVID-19 studies and EHRs. This enables reuse of data across multiple sources, which increases statistical power and accelerates our understanding of this disease.  

For more than 20 years, NLM has served as the central coordinating body for clinical terminology standards nationally. Our long-standing efforts to establish common health terminology supported the COVID-19 response by allowing access to near-real time clinical information to guide the diagnosis, treatment, and prevention of this disease.

NLM supports multiple vocabulary standards and mappings, like RxNorm, SNOMED CT, and the UMLS, as well as terminology tools like AccessGUDID, DailyMed, MedlinePlus Connect, MetaMap, the Value Set Authority Center (VSAC), and the NIH CDE Repository, a database that provides access to structured human and machine-readable definitions of common data elements, more commonly referred to as CDEs.

CDEs are one type of health data standard that can help researchers normalize data across studies. CDEs are standardized, precisely defined questions that are paired with a set of specific allowable responses, then used systematically across different sites, studies, or clinical trials to ensure consistent data collection.
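The pairing of a question with a fixed set of allowable responses can be sketched in a few lines of Python. This is only an illustration of the idea, not the NIH CDE Repository's data model: the class name, the identifier `EXAMPLE-0001`, the question wording, and the permissible values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonDataElement:
    """A standardized question paired with its allowable responses."""

    identifier: str  # hypothetical ID, not a real NIH CDE Repository identifier
    question: str
    permissible_values: frozenset

    def validate(self, response: str) -> str:
        """Return the response unchanged if it is allowed; otherwise raise."""
        if response not in self.permissible_values:
            raise ValueError(
                f"{response!r} is not an allowable response for {self.identifier}"
            )
        return response


# Hypothetical element, for illustration only.
SYMPTOM_STATUS = CommonDataElement(
    identifier="EXAMPLE-0001",
    question="Has the patient displayed COVID-19 symptoms?",
    permissible_values=frozenset({"Yes", "No", "Unknown"}),
)


def pool_responses(cde, *studies):
    """Combine responses from several studies, checking each against one CDE."""
    return [cde.validate(response) for study in studies for response in study]


# Two sites that used the same element can be pooled directly.
site_a = ["Yes", "No"]
site_b = ["Unknown", "Yes"]
pooled = pool_responses(SYMPTOM_STATUS, site_a, site_b)
```

Because both sites collected answers against the same element, their responses can be combined without any recoding. A free-text answer such as "maybe" would be rejected at collection time rather than discovered later during analysis.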

CDEs are in use across NIH, to varying degrees. Some NIH Institutes and Centers have had mature CDE programs for years; others are just beginning to develop them. NLM has been involved with CDEs since 2012 and plays a key role in encouraging CDE adoption across NIH by:

  • Hosting the NIH CDE Task Force (CDETF), a trans-NIH community of practice.
  • Forming a CDE Governance Committee that reports to the CDETF. The committee’s primary charge is to decide whether common data elements submitted to it by NIH-recognized bodies (NIH Institutes, offices, etc.) meet the criteria that merit their recommendation for use in NIH-funded research.
  • Maintaining the NIH CDE Repository, a central access point to data elements that have been recommended or required by NIH Institutes and Centers for use in research and for other purposes. In 2020, we completed a usability study of the NIH CDE Repository and have been implementing enhancements based on the recommendations.

This year, while continuing to enhance the usability of the NIH CDE Repository, we will also engage with users through a CDE awareness and training campaign.

Ms. Babski is responsible for overall management of one of NLM’s largest divisions with more than 450 staff who provide health information services to a global audience of health care professionals, researchers, administrators, students, historians, patients, and the public.

Robin Taylor, MLIS, joined NLM in 2016. Since 2018, she has been the lead for the NIH Common Data Elements Repository.

DOCLINE: Connecting Medical Libraries for 35 Years

Guest post by Lisa Theisen, Head of NLM’s Collection Access Section and Elisabeth (Lis) Unger, NLM DOCLINE Team Lead

It’s been 35 years since NLM’s interlibrary loan (ILL) request routing system, DOCLINE®, was launched with a goal of enabling medical libraries to get biomedical literature into the hands of people who need it as efficiently and quickly as possible. Today, DOCLINE continues to be used daily by nearly 2,000 hospital, academic, military, public, and other libraries that place approximately one million requests a year, including requests for newly published research not freely available online.

DOCLINE’s foundation and success stem from NLM’s collaboration with the Regional Medical Libraries of the Network of the National Library of Medicine (NNLM) to support resource sharing among the medical library community. Resource sharing through ILL means that participating libraries don’t have to own as many books and journals or collect as broad a range of topics because they can borrow from each other. Full participation is limited to libraries in the NNLM and Canada, but some international libraries use the system to place requests directly with NLM.

DOCLINE service is fast and use of the system is free. This service allows a wide range of libraries, including hospital libraries (which account for 60% of DOCLINE participants), to obtain articles for their patrons that are not in their own collections.

This is where DOCLINE fills a critical gap by connecting a wide network of librarians who are always ready to help each other out, often without charge. Without DOCLINE, access to literature outside of a library’s collection is severely curtailed.

When DOCLINE first launched on mainframe computers in 1985, finding a ‘copy’ of an article or a library with the right issue of a print journal was not as easy as performing a simple search online. If you had a modem and access to an NLM account, you might check SERHOLD, the NLM database of medical libraries’ serial holdings – or journal titles libraries report subscribing to. Then you could mail, or maybe fax, an ILL form to the library and request that they mail your library a photocopy of the article. 

Over the decades, DOCLINE evolved in response to technological advancements and user needs. Features and enhancements have been added to DOCLINE throughout the years to make the system faster and easier to use. DOCLINE has grown to include new ways to send copies of articles, such as emailing PDFs, and adapted to new ways that publishers offer content, including electronic journals and “epub ahead of print” articles found in NLM’s PubMed biomedical literature citation database, and borrowers now see alerts to free, full-text articles found in NLM’s PubMed Central (PMC) digital archive.

Around the turn of the century, DOCLINE 1.0 moved to the world wide web – at the same time email use was becoming more widespread. In 2003, DOCLINE 2.0 was released with a new user-friendly look and feel; in 2006 it was updated to allow a library to indicate “Urgent Patient Care” to expedite service for use in emergencies in the hospital setting. The latest version, DOCLINE 6.0, debuted in November 2018. The three core system components, 1) the user library records, 2) their collective biomedical journal listings, and 3) ILL requests, would still be familiar to a user of the original system, even though the website looks very different today. DOCLINE also includes indicators for supplementary data sets and journal embargoes which didn’t exist in its early days.

What made DOCLINE remarkable in 1985, and what remains its most intricate feature, is the efficient way in which requests are automatically matched to appropriate lenders based on their reported journal holdings. This ensures that DOCLINE’s average time to fill a request and its percentage of filled requests continue to compare favorably with other ILL systems – advancing NLM’s mission of enabling biomedical research and supporting health care and public health. This means that clinicians who rely on medical librarians to obtain the most relevant and latest research articles cited in PubMed, for instance on COVID-19 treatments, can rely on DOCLINE.
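To make the idea of holdings-based matching concrete, here is a minimal Python sketch. It is not DOCLINE's actual routing algorithm, which this post does not describe in detail. The `Library` class, the `route_request` function, and the library and journal names are hypothetical, and the sketch assumes holdings are reported simply as journal titles with the years held.

```python
from dataclasses import dataclass, field


@dataclass
class Library:
    name: str
    # Maps a journal title to the set of years the library reports holding.
    holdings: dict = field(default_factory=dict)

    def holds(self, journal: str, year: int) -> bool:
        """True if the library reports holding this journal for this year."""
        return year in self.holdings.get(journal, set())


def route_request(journal, year, borrower, libraries):
    """Return names of potential lenders, in list order, that report the issue.

    The borrowing library is excluded from its own list of lenders.
    """
    return [
        library.name
        for library in libraries
        if library.name != borrower and library.holds(journal, year)
    ]


# Hypothetical libraries and journals, for illustration only.
libraries = [
    Library("Library A", {"Journal X": {2019, 2020}}),
    Library("Library B", {"Journal X": {2020}, "Journal Y": {2018}}),
    Library("Library C", {"Journal Y": {2018, 2019}}),
]

lenders = route_request("Journal X", 2020, borrower="Library A", libraries=libraries)
```

Here the request from Library A is routed only to Library B, the one other library that reports holding the 2020 volume of the journal. A production system would also have to weigh factors this sketch ignores, such as lender preferences, embargoes, and urgency.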

Continued updates to DOCLINE underscore the commitment to advance NLM’s strategic goals to reach more people in more ways through enhanced dissemination and engagement, and to engage a wide range of audiences to ensure the “right information gets delivered to them at the right time.” For instance, in April of this year, a ‘Print Resources Available’ filter was added to the system to enable user librarians working remotely from home to connect with libraries that still had access to their physical collection.

In its 35-year history, over 65 million ILL requests have been completed by libraries using DOCLINE. NLM is proud to provide the system and values the work of libraries that generously and unflaggingly share with one another, making DOCLINE a system that has been widely embraced by the user community over the years. We are looking forward to what the next 35 years mean for DOCLINE – teleporting articles, anyone?

Are you a part of the DOCLINE community? How has ILL helped you?

Lisa Theisen began serving as Head of the Collection Access Section in the Public Services Division in March 2020. Ms. Theisen has been at NLM for 13 years, supporting DOCLINE and NLM’s Interlibrary Loan (ILL) operation.

Elisabeth Unger, MLIS, joined NLM’s Public Access Division, Collection Access Section, Systems Unit in 2008 to support DOCLINE and NLM ILL after working at the National Agricultural Library. In 2005 she became DOCLINE Team Lead where she was responsible for the latest redesign and relaunch of the esteemed system.

Dr. Isaac Kohane: Making Our Data Work for Us!

Last weekend, Isaac Kohane, MD, PhD, FACMI, Marion V. Nelson Professor of Biomedical Informatics and Chair of the Department of Biomedical Informatics at Harvard Medical School, received the 2020 Morris F. Collen Award of Excellence at the AMIA 2020 Virtual Annual Symposium. This award – the highest honor in informatics – is bestowed on an individual whose personal commitment and dedication to medical informatics have made a lasting impression on the field.

Throughout his career, Dr. Kohane has worked to extract meaning out of large sets of clinical and genomic data to improve health care. His efforts mining medical data have contributed to the identification of harmful side-effects associated with drug therapy, recognition of early warning signs of domestic abuse, and detection of variations and patterns among people with conditions such as autism.

As the lead investigator of the i2b2 (Informatics for Integrating Biology & the Bedside) project, a National Institutes of Health-funded National Center for Biomedical Computing initiative, Dr. Kohane’s work has led to the creation of a comprehensive software and methodological framework to enable clinical researchers to accelerate the translation of genomic and “traditional” clinical findings into novel diagnostics, prognostics, and therapeutics.

Dr. Kohane is a visionary with a motto: Make Our Data Work for Us! Please join me in congratulating Dr. Kohane, recipient of the 2020 Morris F. Collen Award of Excellence.

Hear more from Dr. Kohane in this video.

Video transcript (below)

The vision that has driven my research agenda is that we were not doing our patients any favors by not embracing information technology to accelerate our ability to both discover new findings in medicine, and to improve the way we deliver the medicine.

What does “make our data work for us” mean? It means that let’s not just use it for the real reason most of it is accumulated at present, which is in order to satisfy administrative or reimbursement processes. Let’s use it to improve health care.

Using just our claims data, we can actually predict – better than genetic tests – recurrence rates for autism. It’s the ability to show, with these same data, that drugs used for preventing premature birth in the generic form are just as effective as those that are brand name and 40 times as expensive. It’s, as we’ve seen most recently, the ability to pull together data around pandemics within weeks, if and only if we understand the data that’s spun off our health care systems in the course of care.

And finally, as exemplified by work on FHIR, which was funded by the Office of the National Coordinator and then the National Library of Medicine, the ability to flow the data directly to the patient to finally allow patients access to their data in a computable format to allow decision support for the patient without going through the long loop of the health care system.

Because the NIH and NLM have invested in working on real-world sized experiments in biomedical informatics, on supporting the education of the individuals who drive those projects, and in supporting the public standards that are necessary for these projects to work and to scale, they’ve established an ecosystem that now is able to deliver true value to decision makers, to clinicians, and now to patients, as we’re seeing with a SMART on FHIR implementation on smartphones.

So, for those of you — the biomedical informaticians of the future who are clinicians — I strongly recommend that you don’t wait for someone else to fix the system. You have the most powerful tools to affect medicine, information processing tools. So, don’t wait to get old. Don’t wait to be recognized. You have the tools. Get in there, help change medicine. We all depend on you!