NLM Celebrates Fair Use

Guest post by NLM Associate Fellow Gabrielle Barr and NLM Copyright Group co-chairs Christie Moffatt and Rebecca Goodwin.

It’s Fair Use Week 2018, an annual event coordinated by the Association of Research Libraries (ARL) to celebrate the opportunities of fair use, including the many ways it supports biomedical research and the work we do here at NLM.

Fair use is a legal doctrine that asserts the right to use materials under copyright in a limited manner without the copyright holder first granting permission. In practice, fair use is a balance between the rights of copyright holders and the rights of researchers, authors, educators, students, artists, and others, as we work as a society to promote science, education, and the arts.

Section 107 of the US Copyright Act provides the details of fair use, but the University of Virginia Library nicely summed it up in only seven words: “Use fairly. Not too much. Have reasons.”

Infographic: Fair Use Promotes the Creation of New Knowledge

Libraries regularly champion fair use not only because it supports research and education, but also because it enables libraries to fulfill their primary mission of providing and preserving information.

The same holds true here at NLM.

NLM’s fair use policies, based on ARL’s best practices, support access to library resources, encourage teaching and learning, allow preserving at-risk materials and collecting web-based content for future scholarship, and facilitate new modes of computational research and data-mining.

From digitizing content to building institutional repositories to creating physical and digital exhibitions, NLM applies fair use in a variety of ways. We maintain the NLM Digital Collections to provide access to historical books, photographs, videos, manuscripts, and maps. We collect web-based “born digital” content documenting major global health events such as the 2014 Ebola Outbreak. We digitize films for the History of Medicine Division’s (HMD) collection of Medical Movies on the Web, showcase materials in physical and online exhibitions, and promote our collections via blogs such as HMD’s Circulating Now. We incorporate copyrighted content into online courses and tutorials for NLM systems such as MEDLINE®, PubMed®, the Unified Medical Language System®, and the Value Set Authority Center. And we include stubs of proprietary clinical assessment instruments in the NIH Common Data Elements Repository to help researchers standardize clinical data.

Now NLM is considering how fair use can accommodate our evolving needs in the technology-rich and data-driven future.

The Library strongly supports the FAIR Data Principles, which affirm that data and other digital objects representing the products and processes of modern biomedical science are Findable, Accessible, Interoperable, and Reusable (FAIR). And we rely increasingly on algorithms, APIs, computer software, searchable databases, and search engines that enable data mining for intellectual purposes.
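To make that concrete, here is a minimal sketch of the kind of API-driven data mining described above, using NCBI’s public E-utilities service to search PubMed. The query term is an arbitrary example chosen for this illustration, not an NLM-endorsed workflow:

```python
import requests

# Minimal sketch: search PubMed through NCBI's public E-utilities API.
# The query term is an arbitrary example chosen for this illustration.
resp = requests.get(
    "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
    params={
        "db": "pubmed",                      # search the PubMed database
        "term": "data mining AND copyright", # arbitrary example query
        "retmode": "json",                   # ask for a JSON response
        "retmax": 10,                        # return at most 10 record IDs
    },
    timeout=30,
)
resp.raise_for_status()

result = resp.json()["esearchresult"]
print("Total matching records:", result["count"])
print("First PMIDs:", result["idlist"])
```

The returned PMIDs can then be fed to the companion efetch endpoint to retrieve full citation records, which is the usual starting point for text- and data-mining pipelines built on MEDLINE content.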

While fair use can ensure access to and use of these tools and data, recent federal court decisions indicate that the intersection of copyright law with APIs and computer software remains part of the fair use frontier. Each new ruling has the potential to redefine current practice and requirements.

In this time of shifting sands, it’s no surprise that ARL’s forthcoming Code of Best Practices in Fair Use for Software Preservation (expected this fall) involves extensive research and interviews with software preservation experts and other stakeholders. The code’s ability to articulate the complex issues surrounding software and fair use could significantly impact libraries’ future work preserving today’s digital record.

In the meantime, NLM is forging ahead, applying fair use to advance medical education, biomedical research and discovery, and data-powered health.

We’d love to hear from other institutions on how you employ fair use and the steps you take to balance the rights of copyright holders with those of researchers, educators, and artists. Comment below or drop a note to the NLM Copyright Group.

casual headshot of Gabrielle Barr

Gabrielle Barr, MSI, is an NLM Associate Fellow. Before coming to NLM, she worked in the special collections of Norfolk Public Library and as a project assistant for the Health Sciences Library at the University of North Carolina at Chapel Hill. She received her master of science in information and a certificate in science, technology, and society from the University of Michigan in 2015.


casual headshot of Christie Moffatt

Christie Moffatt, MLS, serves as co-chair of the NLM Copyright Group, manager of the Digital Manuscripts Program in the History of Medicine Division, and chair of the NLM Web Collecting and Archiving Working Group. She earned her master’s degree in library science at the University of North Carolina at Chapel Hill, with a concentration in archives and manuscripts.


headshot of Rebecca Goodwin

Rebecca Goodwin, JD, serves as co-chair of the NLM Copyright Group and as a data science specialist in the Office of Health Information Programs Development. Previously, she served as special assistant to the director of the Lister Hill National Center for Biomedical Communications. She came to NIH in 2007 as a Presidential Management Fellow after earning her JD from the University of Florida Levin College of Law.

Education, Health, and Basketball

Guest post by David L. Nash, NLM’s Education and Outreach Liaison.

A few weeks ago, in observance of African American History Month, five former Harlem Globetrotters spoke at a program in Silver Spring, Maryland, associated with a screening of the documentary “The Game Changers: How the Harlem Globetrotters Battled Racism.”

Following the short documentary and a brief ball-handling demonstration, we sat down to discuss our current careers and how we each got to where we are.

Those participating were:

  • David Naves of Bowie, Maryland, currently an engineer at NASA Goddard Space Flight Center;
  • Bobby Hunter from Harlem, New York, a businessman and fundraiser for charitable events, cancer awareness, and community basketball;
  • Larry Rivers from Atlanta, Georgia, who directs an organization that provides clothing, housing, career opportunities, and other services to temporarily disadvantaged people in the greater Atlanta area;
  • Charles Smith of Baltimore, Maryland, the president of a non-profit that provides a haven for urban youth to learn and enjoy sports; and
  • me, David L. Nash, NLM’s Education and Outreach Liaison.
David Nash slam-dunks as a Harlem Globetrotter in the early 1970s.

As we each shared our journeys from basketball to the boardroom, we focused on messages of health and education, driving home the idea that education is the key that unlocks the door to whatever you want to be.

I spoke about my experiences as a colon cancer survivor, emphasizing the need for early screening and regular doctor’s visits. And I noted the importance of family history as a risk factor for colon cancer.

I also gave out copies of NIH MedlinePlus magazine featuring such health topics as cancer, diabetes, and asthma.

The crowd numbered well over 600 people, about double what we expected, with many of the adults bringing along their children and grandchildren. They were receptive and attentive.

Those in attendance appreciated the focus on education and wellness, and I enjoyed working with people of color to improve their understanding of important health information.

casual headshot of David Nash

David L. Nash serves as the Education and Outreach Liaison at the National Library of Medicine. After finishing his collegiate basketball career at the University of Kansas, he was drafted by the Chicago Bulls in the 1969 NBA Draft and played with the Harlem Globetrotters from 1970 to 1972. He has worked at NLM since 1990.

Connecting Computing Research with National Priorities

Guest post by Mark D. Hill from the Computing Community Consortium and the University of Wisconsin-Madison. The content originally appeared in The CCC Blog on January 23, 2018. It is reprinted with permission.

Jim Kurose talks to the audience about CS+X.

For weeks [The CCC Blog has] been recapping the Computing Community Consortium (CCC) Symposium from the perspective of the researchers and industry representatives who presented their work on each panel.

This week, we are getting a different perspective. The goal of the final panel, called Connecting Computing Research with National Priorities and moderated by CCC Vice Chair Mark D. Hill, was to hear from people who have served or are currently serving in government.

The panelists included:

  • Will Barkis, from Orange Silicon Valley, shared a Silicon Valley perspective and called for increasing investment in basic research and development to benefit society as well as support innovation in industry. He emphasized that collaboration between academia, the public sector, and the private sector is critical for long-term impact.
  • Patti Brennan, from the National Institutes of Health (NIH), talked about a number of healthcare issues in the country that we need to be aware of and start addressing, such as the accelerating mental health crisis. If we develop computational services and fine-grained access control, we might be able to address some of these issues sooner rather than later.
  • Jim Kurose, from the National Science Foundation (NSF), discussed smart and connected communities and how they serve the people who live in them. He also highlighted the importance of interdisciplinary work and gave the example of biologists and computer scientists coming together in the field of bioinformatics.
  • Bill Regli, from the Defense Advanced Research Projects Agency (DARPA), explained the Heilmeier Catechism. George H. Heilmeier, a former DARPA director, crafted a set of questions to help Agency officials think through and evaluate proposed research programs.
Bill Regli explains the DARPA Heilmeier Catechism.

During the Q&A session, one audience member asked whether we should have computational specialists in all science fields, since many are becoming more interdisciplinary. Dr. Brennan said that if we put computation in all fields, we run the risk of losing its impact. She does think some of the training programs are a start, but it takes time for them to run smoothly. Dr. Kurose praised a number of CS+X programs around the country. These programs reach out to a different set of students: those interested in computing but currently in other disciplines, who understand that taking computational classes within their own discipline will only open more doors.

To read all the recaps from each panel, see below:

Intelligent Infrastructure for our Cities and Communities

AI and Amplifying Human Abilities

Security and Privacy for Democracy

Data, Algorithms, and Fairness Panel

See the videos from all panels here.

headshot of Mark Hill

Mark D. Hill is the Computing Community Consortium (CCC) Vice Chair and the John P. Morgridge Professor and Gene M. Amdahl Professor of Computer Sciences at the University of Wisconsin-Madison.

ClinicalTrials.gov Moves Toward Increased Transparency

Guest post by Kevin M. Fain, JD, MPH, DrPH, Senior Advisor for Policy and Research, ClinicalTrials.gov.

ClinicalTrials.gov is the largest public clinical research registry and results database in the world—and the most heavily used. As of today, it contains registration information for more than 260,000 studies in 202 different countries and results information on more than 29,000 of those studies. Each week, the content grows by approximately 560 new registrations and 110 new results submissions. The system averages more than 162 million page views per month and 93,000 unique visitors daily.

ClinicalTrials.gov enables users to: (1) search for clinical trials of drugs, biologics, devices, and other interventions; (2) obtain summary information about these studies (e.g., purpose, design, and facility locations); (3) track the progress of a study from initiation to completion; and (4) obtain summary results, often before they are published elsewhere.
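For illustration, here is a minimal sketch of how a researcher might run such a search programmatically. The endpoint and parameter names reflect the current public ClinicalTrials.gov REST API (v2) and are assumptions rather than details from this post; the condition and status filters are arbitrary examples:

```python
import requests

# Minimal sketch: search ClinicalTrials.gov via its public REST API (v2).
# Endpoint and parameter names are assumptions based on the current API;
# the condition and status filters are arbitrary examples.
resp = requests.get(
    "https://clinicaltrials.gov/api/v2/studies",
    params={
        "query.cond": "diabetes",              # condition/disease to search
        "filter.overallStatus": "RECRUITING",  # only actively recruiting trials
        "pageSize": 5,                         # first five matches
    },
    timeout=30,
)
resp.raise_for_status()

# Print each study's NCT Number and brief title.
for study in resp.json()["studies"]:
    ident = study["protocolSection"]["identificationModule"]
    print(ident["nctId"], "-", ident.get("briefTitle", ""))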

In addition, the unique identifier assigned to each registered trial (commonly referred to as the “NCT Number”) has become the de facto standard for referencing trials and is widely and routinely used in medical journal articles, MEDLINE citations, and the mass media.
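Because every NCT Number follows the same fixed pattern (“NCT” followed by eight digits), extracting trial references from free text is straightforward. A minimal sketch in Python; the sample sentence and NCT values are invented for illustration:

```python
import re

# An NCT Number is "NCT" followed by exactly eight digits.
NCT_PATTERN = re.compile(r"\bNCT\d{8}\b")

# Invented sample text; the NCT values are placeholders, not real trials.
text = ("Outcomes were consistent with the pivotal trial (NCT01234567) "
        "and its open-label extension (NCT07654321).")

print(NCT_PATTERN.findall(text))  # ['NCT01234567', 'NCT07654321']
```

This fixed format is what makes the identifier so useful as a cross-reference key between journal articles, MEDLINE citations, and the registry itself.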

Federal law underlies the database’s requirements and content. NIH launched the database in 2000, following passage of the Food and Drug Administration Modernization Act of 1997. The FDA Amendments Act of 2007 then expanded the database’s scope and purpose by requiring registration and results reporting for certain clinical trials of FDA-regulated drugs, biological products, and medical devices. Importantly, the 2007 law included legal consequences for noncompliance, including civil monetary penalties.

More recently, in an effort to make information about clinical trials more widely available to the public, the US Department of Health and Human Services issued a final rule in September 2016 that specifies requirements for registering certain clinical trials and submitting summary results information to ClinicalTrials.gov. The rule’s final form was shaped by over 900 public comments.

The new rule, which became effective one year ago (January 18, 2017), clarifies and expands the reporting requirements for clinical trials, including trial results for drug, biologic, and device products not approved by FDA. At the same time, NIH issued a policy establishing the expectation that all investigators conducting clinical trials funded in whole or in part by NIH will ensure these trials are registered at ClinicalTrials.gov and that results information for these trials is submitted to ClinicalTrials.gov.

The expanded reporting requirements are expected to yield important scientific, medical, and public health benefits—from improving the clinical research enterprise itself to maintaining the public’s trust in clinical research. Having access to complete study results, including negative or inconclusive data, can help counteract publication bias, reduce duplication in research, improve the focus and design of future studies, and protect patients from undue risk or ineffective interventions. That additional information, in the context of other research, can also help inform health care providers and patients regarding medical decisions.

As a repository for study results, ClinicalTrials.gov helps deliver those benefits.

Recent research indicates that the results of many clinical trials—including those funded by NIH—are never published. And even when results are published, they can be limited, focusing on findings of most interest rather than all outcomes. In contrast, studies have found that results reported in ClinicalTrials.gov are more complete than those in the published literature. The new reporting requirements are expected to strengthen that characteristic and enhance the benefits ClinicalTrials.gov brings.

It is important to understand that listing a study on ClinicalTrials.gov does not mean it has been evaluated by the US Federal Government. The website emphasizes this point for the public through prominent disclaimer statements, including one on the importance of discussing any clinical trial with a health care provider before participating. ClinicalTrials.gov allows for the registration of any human biomedical study that conforms with prevailing laws and regulations, including an indication that recruiting studies were approved by an ethics review committee. As a result, the database is more comprehensive, which can better serve the public in critical ways. For example, potential participants can see the full range of studies being conducted, not just those funded or sponsored by NIH. Ethics committees, funders, and others can also view the wider scope of studies, which can help them more effectively oversee new research.

Aside from legislative and policy changes, ClinicalTrials.gov has also focused on enhancing the site’s usability, addressing design and layout issues and improving the ability to search, display, and review information about the studies registered on the site. The latest set of updates, released last month, included new search options (such as by recruitment status and distance from a geographic location), refinements to the display of search results, and additional information regarding study results and key record dates. These changes, plus those brought about by the final rule, will help maximize the value of clinical trials and, by extension, advance knowledge and improve health.

From finding trials actively recruiting participants to identifying new experimental drug or device interventions to analyzing study design and results, ClinicalTrials.gov delivers key benefits to patients, clinicians, and researchers and puts into action NIH’s core mission: turning discovery into health. It also reflects one more way NLM makes medical and health information available for public use and patient health.

headshot of Kevin Fain

Kevin Fain, JD, MPH, DrPH, has served as senior advisor for policy and research at ClinicalTrials.gov since 2015. He was an attorney with the FDA from 1995 to 2010, specializing in clinical trial and drug regulatory matters. He earned his doctorate in epidemiology from Johns Hopkins University in 2015.

Exploring the Brave New World of Metagenomics

See last week’s post, “Adventures of a Computational Biologist in the Genome Space,” for Part 1 of Dr. Koonin’s musings on the importance of computational analysis in biomedical discovery.

While the genomic revolution rolls on, a new one has been quietly fomenting over the last decade or so, only to take over the science of microbiology in the last couple of years.

The name of this new game is metagenomics.

Metagenomics is concerned with the complex communities of microbes.

Traditionally, microbes have been studied in isolation, but to do that, a microbe or virus has to be grown in a laboratory. While that might sound easy, only 0.1% of the world’s microbes will grow in artificial media, with the success rate for viruses even lower.

Furthermore, studying microbes in isolation can be somewhat misleading because they commonly thrive in nature as tightly knit communities.

Metagenomics addresses both problems by exhaustively sequencing all the microbial DNA or RNA from a given environment. This powerful, direct approach immensely expands the scope of biological diversity accessible to researchers.

But the impact of metagenomics is not just quantitative. Over and again, metagenomic studies—because they look at microbes in their natural communities and are not restricted by the necessity to grow them in culture—result in discoveries with major biological implications and open up fresh experimental directions.

In virology, metagenomics has already become the primary route to new virus discovery. In fact, in a dramatic break from tradition, such discoveries are now formally recognized by the International Committee on Taxonomy of Viruses. This decision all but officially ushers in a new era, I think.

Here is just one striking example that highlights the growing power of metagenomics.

In 2014, Rob Edwards and colleagues at San Diego State University achieved a remarkable metagenomic feat. By sequencing multiple human gut microbiomes, they managed to assemble the genome of a novel bacteriophage, named crAssphage (for cross-Assembly). They then went on to show that crAssphage is, by a wide margin, the most abundant virus associated with humans.

This discovery was both a sensation and a shock. We had been completely blind to one of the key inhabitants of our own bodies—apparently because the bacterial host of the crAssphage would not grow in culture. Thus, some of the most common microbes in our intestines, and their equally ubiquitous viruses, represent “dark matter” that presently can be studied only by metagenomics.

But the crAssphage genome was dark in more than one way.

Once sequenced, it looked like nothing in the world. For most of its genes, researchers found no homologs in sequence databases, and even the few homologs identified shed little light on the biology of the phage. Furthermore, no links to other phages could be established, nor could anyone tell which proteins formed the crAssphage particle.

Such results understandably frustrate experimenters, but computational biologists see opportunity.

A few days after the crAssphage genome was published, Mart Krupovic of Institut Pasteur visited my lab, where we attempted to decipher the genealogies and functions of the crAssphage proteins using all computational tools available to us at the time. The result was sheer disappointment. We detected some additional homologies but could not shed much light on the phage evolutionary relationships or reproduction strategy.
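For readers curious what such a search involves, here is a minimal sketch of a first-pass protein homology search against NCBI’s non-redundant database, using Biopython’s interface to the public BLAST web service. The query sequence is an invented placeholder, not a real crAssphage gene, and analyses like the one described here rely on far more sensitive profile-based methods:

```python
from Bio.Blast import NCBIWWW, NCBIXML

# Placeholder query: an invented protein fragment, not a real crAssphage gene.
query = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQ"

# Submit a protein-vs-protein (blastp) search against NCBI's non-redundant
# (nr) protein database via the public BLAST web service.
handle = NCBIWWW.qblast("blastp", "nr", query, hitlist_size=10)
record = NCBIXML.read(handle)

# Report hits whose best high-scoring pair is reasonably significant.
for alignment in record.alignments:
    best_hsp = alignment.hsps[0]
    if best_hsp.expect < 1e-5:
        print(f"{alignment.title[:70]}  E={best_hsp.expect:.2e}")
```

When a search like this returns nothing significant, as it largely did for crAssphage, the next steps are the more sensitive profile and structure-based comparisons that ultimately cracked the problem.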

We moved on. With so many other genomes to analyze, crAssphage dropped from our radar.

Then, in April 2017, Anca Segall, a sabbatical visitor in my lab, invited Rob Edwards to give a seminar at NCBI about crAssphage. After listening to Rob’s stimulating talk—and realizing that the genome of this remarkable virus remains a terra incognita—we could not resist going back to the crAssphage genome armed with some new computational approaches and, more importantly, vastly expanded genomic and metagenomic sequence databases.

This time we got better results.

After about eight weeks of intensive computational analysis by Natalya Yutin, Kira Makarova, and me, we had fairly complete genomic maps for a vast new family of crAssphage-related bacteriophages. For all these phages, we predicted with good confidence the main structural proteins, along with those involved in genome replication and expression. Our work led to a paper we recently published in the journal Nature Microbiology. We hope and believe our findings provide a roadmap for experimental study of these undoubtedly important viruses.

Apart from the immediate importance of the crAss-like phages, this story delivers a broader lesson. Thanks to the explosive growth of metagenomic databases, the discovery of a new virus or microbe does not stop there. It brings with it an excellent chance to discover a new viral or microbial family. In addition, analyzing the gene sequences can yield interesting and tractable predictions of new biology. However, to take advantage of the metagenomic treasure trove, we must creatively apply the most powerful sequence analysis methods available, and novel ones may be required.

Put another way, if you know where and how to look, you have an excellent chance to see wonders.

As a result, I cannot help being unabashedly optimistic about the future of metagenomics. Fueled by the synergy between increasingly high-quality, low-cost sequencing, improved computational methods, and emerging high-throughput experimental approaches, the prospects appear boundless. There is a realistic chance we will know the true extent of the diversity of life on earth and get unprecedented insights into its ecology and evolution within our lifetimes. This is something to work for.

casual headshot of Dr. Koonin

Eugene Koonin, PhD, has served as a senior investigator at NLM’s National Center for Biotechnology Information since 1996, after working for five years as a visiting scientist. He has focused on the fields of computational biology and evolutionary genomics since 1984.