Building a 21st Century Archival Collection at the National Library of Medicine

Across NLM, strategic planning is underway and many of our colleagues are thinking about what the next century will bring for the Library. Among the many questions in mind is “what is a 21st-century collection?”

In the Images and Archives Section in NLM’s History of Medicine Division we have seen our archival collections, and our profession, grow and evolve over the past few decades as the archival records of individuals, organizations, and other communities in health and medicine are increasingly created and communicated electronically and online.

But there is more to building a 21st century collection than being digital.

Before we dive into what NLM’s archival collections look like in the 21st century, we realize some readers of this blog may be thinking, “NLM has archives?”  Indeed we do.  Our archival collections consist of over 12,000 audio-visual titles, over 150,000 prints and photographs, 18,000 linear feet of archival and manuscript collections, 6.8 terabytes of born-digital content, and 5.1 terabytes of web archives that document, among other things, biotechnology; drugs; health policy; public health; and the research of leading biomedical scientists such as Marshall Nirenberg, Joshua Lederberg, and Michael E. DeBakey.

We consider three broad areas relevant to building 21st-century archival collections:

  1.  Acquiring materials that document both the past and current history of medicine and the health sciences;
  2.  Preserving those materials; and
  3.  Providing access to those materials.

These are not new areas of consideration for archivists, but what we acquire and how we preserve it and provide access to it has been rapidly changing, which means our thinking has to change as well.

So what makes a 21st-century collection different from one from just 50 years ago?

Expanded formats

While we continue to collect analog materials, we have expanded the formats and types of records we collect to include born-digital files—everything from email to word processing documents to digital photographs and videos.  NLM, through its web archiving program, also collects online materials such as blogs, government web sites, online news, and others related to topics such as Ebola, Zika, and bioethics.

Long-term access

The addition of born-digital materials brings preservation and access challenges. To understand the scope of these challenges, consider how you would access a file today you had saved on a floppy disk in 1992. You’d need both the right hardware—a computer with a floppy disk reader—and the right software to read the file. Neither is easy to find. To overcome these types of challenges, archivists are collaborating with IT professionals and others on ways to preserve born-digital content so it will be accessible for decades or even centuries from now.

Enhanced utility

Until recently, access to archival collections has primarily meant being able to read or view content, but we envision a 21st-century collection that offers new forms of access that allow for running queries across many items and collections. Researchers may be looking for the initial occurrence of something, for patterns in how it was applied, or for the response to it or its impact. By providing tools and systems that allow this kind of analysis, we will not only accelerate discovery and glean insights; we will also deepen the collection’s usefulness.

How can we make all this happen?

Building 21st-century archival collections ideally means working with the creators of content to acquire and preserve materials before they disappear. Web and social media content in particular is in a constant state of change and at high risk for loss. We will also need tools and systems that support collecting and managing this content on a large scale, and policies and processes for making this content available to researchers who want not only to dive into individual documents, but also to run queries across collections. Interestingly, these issues and others parallel those faced by data scientists, ranging from provenance to stewardship, intellectual control, privacy, and long-term access for both anticipated and unanticipated research needs.

As we collect and preserve these archival materials, we aim to make them broadly accessible to researchers, medical professionals, educators, students, and the general public.

We invite you to learn more about the NLM’s archival collections and explore some of our online resources from the History of Medicine Division, including the following:

Guest bloggers Rebecca Warlow and Christie Moffatt work in the Images and Archives Section, History of Medicine Division, Library Operations.

Further readings

Embracing the Future as Stewards of the Past, A View from NLM’s History of Medicine Division, Jeffrey S. Reznick, PhD, Chief of the NLM History of Medicine Division

Responding to a Call to Action: Preserving Blogs and Discussion Forums in Science, Medicine, Mathematics, and Technology, post on the Library of Congress’ The Signal by Christie Moffatt

National Digital Stewardship Alliance 2015 National Digital Agenda

It IS your father’s Big Data–and your mother’s, and your sibling’s, and even yours!

Let’s make it useful to them.

You’ve probably been hearing about big data everywhere—traffic patterns, video streams, genome sequences—and how it is changing lives, accelerating commerce, and even improving health. But most of the time the conversation focuses on what business professionals and scientists might need, want, or do with big data. It’s time to consider how the ordinary person can benefit from this data revolution.

But first, what exactly is big data and why should you (and your father, mother, siblings, and friends) care about it?

The term “big data” can be used to describe data with a range of characteristics. It covers high volume data (like the whole human genome) or data that streams at a high velocity (like the constant flow of image data from space exploring satellites). It also includes high variety data (such as the mix of chemical process, electrical potential, and blood flow observed during brain studies) that may have high levels of variability (like around-the-clock monitoring of traffic flows through busy highways). Ultimately, a key to big data is its high value, whether that’s important to commerce or to the discovery of new cancer drugs. Scientists are learning how to make discoveries through data, and businesses are learning to leverage big data to glean key customer insights.

But big data can be and is of value to the everyday person as well. It already helps us navigate through a new city using map and traffic apps and to find interesting information through search engines, among other things.

Here at the NLM we want to find ways to help people use big data to manage their health and health concerns. It may help them know what to do in an emergency, better understand their family risks for heart disease, or learn just how much exercise might ward off Alzheimer’s disease.

Toward that end, we are funding a grant award, Data Science Research: Personal Health Libraries for Consumers and Patients (R01) (PAR-17-159).

We’re looking for researchers who want to partner with lay people to discover how to bring the power of big data into their lives. To do that, we need fresh approaches to biomedical informatics and data science, shaped to meet the needs of consumers and patients, whose health literacy, language skills, technical sophistication, education, and cultural traditions affect how they find, understand, and use personal health information. Novel data science approaches are needed to help individuals at every step, from harvesting to storing to using data and information in a personal health library.

If you’re a researcher interested in discovering new biomedical informatics knowledge to help consumers and patients make use of big data, this opportunity is for YOU! If you’re a clinician or a librarian, reach out to your science colleagues to form a partnership. If you’re a patient, find a researcher at your local university and invite yourself into the process of citizen science.

Much of the data behind the big data revolution originates from everyday people. Many of the benefits of the big data revolution could help improve the lives of everyday people.  In other words, it is your father’s, mother’s, siblings’, and friends’ big data—let’s make it useful to them!

A Strategy for Strategic Planning

Focusing on the reach

I’ve noted before that NLM has launched a year-long strategic planning process.

Well, we’re in the thick of things now.

We’re hearing from and reaching out to researchers, clinicians, librarians, teachers, first responders, and myriad other stakeholders (including you!), from all over the world.

Hundreds of people from scientific communities to our own staff submitted comments regarding NLM’s priorities and future directions. More than 500 people who work at NLM attended a town hall meeting. We’ve visited three of the universities that host our biomedical informatics research training programs (and plan to visit all of them over the next 18 months). We’ve launched several functional audits, which include an examination of our level of investment in outreach and how best to use the Medical Subject Headings to tag the scientific literature.

Then last week we hosted the first of four stakeholder panels. Each of these panels comprises 15-20 experts in biomedical research, medical informatics, library science, and consumer health. The first panel explored—and will continue exploring—how NLM can accelerate basic biomedical and translational sciences.

Led by Art Levine, Dean of Medicine at the University of Pittsburgh, and guided by our Board of Regents’ strategic planning co-chairs Dan Masys and Jill Taylor, this group is considering:

  • the evolving role of NLM in support of global biomedical research;
  • how to accommodate the explosive growth of research data in the life sciences;
  • providing access to the research data sets underlying publications, to support reproducibility, meta-analysis, and optimal return on public investment in science;
  • NLM’s relationship to other NIH Institutes and Centers, including the Library’s role in the Precision Medicine Initiative and other large-scale prospective cohort projects;
  • collaborations with extramural organizations to move science forward; and
  • future directions of key data resources.

And that’s just one of the panels!

Obviously, this is a big task, so here’s where our strategy comes in.

Often people get concerned that if something doesn’t appear in the strategic plan, it will be lost, but that is not the case here. NLM’s core mission remains strong and vibrant. We’re being ambitious and asking these panels to focus on big gains, audacious goals, the reach-beyond-the-reach that will make NLM an even better platform for discovery. That way, the precious time spent together can focus more on that reach than on reassurance.

So what’s in our core mission?

Looking back to our authorizing legislation, we are charged “to assist the advancement of medical and related sciences and to aid the dissemination and exchange of scientific and other information important to the progress of medicine and to the public health.”

This means that we’ll continue producing PubMed, PubChem, dbGaP, ClinVar, and all the databases used by over 4 million users a day. We’ll continue sponsoring research and training, and we’ll continue outreach so that everyone, everywhere, can use our resources. We’ll continue supporting the application of terminology and messaging standards to ensure the interoperability of health data. And we’ll continue vigorously advocating for knowledge management across the NIH.

Our core mission remains central to who we are.

Our strategy is moving us beyond our core mission to figure out what only NLM can and should do to accelerate the progress of medicine and public health.

Get ready for great visions!

Think You’re Seeing Double?

The astute among you (or the inveterate blog watchers) caught me blogging on a new outlet last week: DataScience@NIH. You’re not seeing double, and I haven’t abandoned NLM Musings from the Mezzanine.

In January 2017, I assumed the role of the NIH Interim Associate Director for Data Science (iADDS), as Dr. Phil Bourne stepped down from his position as the inaugural Associate Director for Data Science.

I now have two distinct but related roles: iADDS and the Director of NLM. As the iADDS, I have responsibilities to the whole of NIH to work with my fellow institute and center directors to guide NIH’s future investments in data science. As the NLM Director I must lead the Library in its day-to-day operations and the creation of its new strategic plan.

The two jobs are a natural pairing.

The challenges of big data—its size, variability, and accessibility—align with the strengths of the library. In fact, recognizing how big data’s technical complexity leveraged NLM’s core strengths, the Advisory Committee to the NIH Director recommended making NLM the intellectual hub for data science in June 2015. They wrote, “NLM is now poised to build on its activities in computational-based research, data dissemination, and training to assume the NIH leadership role in data science.”

My taking on the iADDS role is one step toward making that recommendation a reality. Fortunately I am guided by an outstanding NLM leadership team (I’ll introduce you to them later), and terrific colleagues across the NIH, particularly those in the Scientific Data Council.

It’s an exciting time and having a view from above (iADDS) and a view from within (Director, NLM) helps me see both the challenges and the pathways to resolution. I’ll keep these two blogs going but sometimes point back and forth, so that you can see where my thoughts are going.

I welcome your thoughts. How should the NLM help NIH meet the challenges of data science? What is the role of the NLM in ensuring that data are FAIR (Findable, Accessible, Interoperable, and Reusable)? And most importantly, how do we accelerate discovery through data? The NLM needs your questions and your ideas!

Photo credit: Doug Racine, US Fish & Wildlife Service [Flickr]

NLM Associate Fellows Spark Library Alchemy

Their accomplishments push us forward

Guest post by Kathel Dunn, PhD, NLM Associate Fellowship Coordinator

We’ve nearly reached the halfway point in the NLM Associate Fellowship Program year. The Fellows, recent library science graduates in residence at NLM for a year, are in the midst of selecting the projects they will work on this spring. Each year staff propose projects from which the Associate Fellows choose two to lead. The proposed projects encompass the full range of work at NLM.

The opportunity to choose their own project work—to choose their own problem, rather than a problem handed to them—is a rare experience for an early career or even mid-career professional. The Associate Fellows make a choice based on their interests, their skills and abilities, and their assessment of the impact of the project, both for them and for NLM and the profession.

Former Associate Fellows Erin Foster, Ariel Deardorff, and Kevin Read

Three years ago one of the Associate Fellows, Kevin Read, now the Knowledge Management Librarian at NYU Health Sciences Library, took on the task of building a web portal to NIH data-sharing repositories and data-sharing policies. He followed that with a second project that involved estimating the number and type of datasets generated annually by NIH-funded research, work that led to presentations and a publication in PLoS One. Two years later two of the Associate Fellows worked on data science projects. Ariel Deardorff, now the Assessment and Data Librarian at UCSF, tackled a project on data visualization, while Erin Foster, currently the Data Services Librarian at Indiana University School of Medicine, dove into a project on common data elements. Since then, Erin and Ariel have co-founded a Data Interest Group within the Medical Library Association, bringing together other talented librarians in data science to advance the field and make a difference.

What will this year’s Associate Fellows accomplish? How will they impact the health information environment?

Six months ago I introduced them at various welcome events at NLM; six months from now I will watch them present the results of their work to NLM staff. There is something about the alchemy of smart people—both staff and Associate Fellows—and mentoring, high expectations, and support that yields accomplishments that often move NLM, information science, and library science forward.

To many people that can appear a magical process of transformation or creation, but for those in the know it’s simply what librarians do. As NLM Deputy Director Betsy Humphreys wisely observed when asked about the future of librarians in our ever-shifting information landscape, “Librarians are problem-solvers. And one thing you wouldn’t do is get rid of the problem-solvers.”

It seems a quasi-tradition of this blog to ask a question of its readers. So I ask you now, what are the problems NLM should solve? To what tasks should we apply our unique alchemy of smarts, hard work, and high expectations?

Your answers might give current and future Associate Fellows, librarians, and library leaders the opportunity to work their magic and improve the information ecology in libraries, healthcare, science, and technology.