What Will 2020 Bring?

I don’t have a crystal ball, but as director of NLM, I need to keep an eye on the future.

Last month, I highlighted a few of NLM’s many accomplishments in 2019. Today, I want to devote some time to musing about what might happen at NLM in 2020.

I know that I’ll be in a new office, but I don’t know where just yet! No, I’m not leaving NLM, but as we prepare for major renovations to our Building 38, most of the staff in the building, including me, will move to other office space on campus for about two years. That will be enough time to implement a major redesign of the first floor of our 60-year-old, architecturally dramatic but not really fit-for-purpose workspace to make more efficient use of the space, add modern office layouts and meeting spaces, and modernize our HVAC systems. I’ll keep musing throughout the renovations; I just won’t be sitting on the mezzanine while I do it.

I know that NLM will continue to grow our Intramural Research Program (IRP), which focuses on computational biomedical and health sciences. We hired two new tenure-track investigators this past year and expect to add one or two more in 2020. The IRP brings together two NLM divisions, the National Center for Biotechnology Information, specifically the Computational Biology Branch, and the Lister Hill National Center for Biomedical Communications, which emphasize discovery based on molecular phenomena and clinical information. I also expect to see greater alignment of our training efforts, including an expansion of the public-facing parts of our training.

I know that we’ll continue to make biomedical and health information literature available to the public, scientists, and clinicians. I anticipate a greater emphasis on public access and open science. Our entire PubMed Central (PMC) repository of full-text literature is already freely available to the world, and with the increasing interest in open access to government-supported research findings, I expect that this repository will grow. PMC will grow in new ways, too, such as enhancing the discoverability of data sets in support of published results made available with articles as supplementary material or in open repositories, and supporting greater transparency in scientific communication through the archiving of peer review documents.

I know that we’ll move many NLM resources to the cloud and continue to support efforts to make strides through the National Institutes of Health (NIH) Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) Initiative to accelerate discovery by harnessing the power of commercial cloud computing. This will not only offer some logistical savings, it will also increase the discoverability of our resources.

I know that NLM will play a bigger and more vital role in big science as it unfolds at NIH. Our intramural researchers are expanding the application of deep learning technologies to clinical, biological, and image data. In collaboration with the NIH Office of Data Science Strategy, we’ll build and release new tools to help researchers leverage the FHIR standard to make clinical data more accessible for research, and to improve phenotype characterization. These initiatives will accelerate data sharing by advancing standard approaches to research data representation.

I know that NLM will advance its impact on and outreach to professional and lay communities around the country. Our National Network of Libraries of Medicine has exciting plans to expand its training in research data management and to provide local health information education and support to help health care providers working with American Indian and Alaska Native populations address challenges such as mental health and HPV-related cancer.

I know that we’ll continue to improve health by improving access to data and information. Stay tuned to my Musings posts in 2020 to see what we accomplish!

“What 2019 NLM Accomplishment Makes You Most Proud?”

I was asked this question during a recent “brown bag” conversation with NLM staff. While it’s tempting to launch into my list of accomplishments, I turned the question back to those present. I was surprised, proud and intrigued by what they had to say.

First, let me tell you a little bit about our lunchtime brown bag conversations. We have a large staff (almost 1,700 women and men) and use a variety of formal and informal approaches to foster discussion: Town Hall meetings held twice a year; regular email messages to share timely information; our NLM In Focus blog, which provides a look inside NLM; and supervisor-led meetings. I host brown bag conversations about once a month and am usually joined by 2-3 members of the NLM leadership team. Almost always, staff from various parts of the Library attend – mingling together our scientists, librarians, administrators and communications staff. Conversations are lively, and I get to learn a lot about what is on the minds of our staff.

So, it was instructive, and enjoyable, to hear different views about NLM accomplishments. Some people talked about greater engagement with and accountability by NLM leadership, while others focused on specific scientific advances. Still others noted our many advances with data science, particularly in upskilling our workforce.

I want to point out a few of these accomplishments.

Teresa Przytycka, PhD, senior investigator in NLM’s National Center for Biotechnology Information, shared her team’s accomplishment in creating a new algorithm called scPopCorn (acronym for single-cell sub-Populations Comparison) to understand the differences between populations of cells from single-cell experiments. This approach helps researchers identify different cell types and helps to differentiate between sexes, disease status, animal type, and more.

Olivier Bodenreider, MD, PhD, senior scientist and chief of the Cognitive Science Branch of NLM’s Lister Hill National Center for Biomedical Communications (LHNCBC), described how he is leading the re-envisioning of the research and research and development efforts within his center. One LHNCBC staff scientist, Vojtech Huser, MD, PhD, described the success his team has had in generating new publications this year.

Several people talked about the journey to prepare NLM and its staff for data science. Our Data Science @NLM Training Program team set up a year-long process of preparing our workforce for the future. Over 750 people completed a data science skills assessment and developed individual learning plans. Over the summer, NLM staff participated in an intensive 120-hour data science fundamentals course, culminating in a wide variety of projects that were showcased during our Data Science Open House. Over 300 people attended this exciting and energizing showcase of our talents!

Several people talked about accomplishments that made our entire NLM operations work better, such as greater engagement with staff, better use of project management strategies to improve efficiencies, and smooth integration of staff into new work teams.

Taking the writer’s privilege of identifying more accomplishments, I am exceptionally proud of the efforts of staff across the NLM who designed or participated in the Data Science initiatives. I am honored to work with a great leadership team who are making bold and sometimes difficult decisions to prepare the NLM for its future. We made a huge advancement in open science by moving our entire Sequence Read Archive public data to the cloud, completing the first phase of an ongoing effort to better position these data for large-scale computing. This work represents both a technological feat and a major contribution to biological discovery.

Reflecting on our discussion about 2019 accomplishments, I learned that every person across the NLM has something that he or she is proud of. I also learned that some of us experience NLM as a tight-knit research team, while others take a more broad-brush view of activities and events. Most importantly, I learned that there are many things to celebrate in this wonderful institution we call the National Library of Medicine!

The Holiday Season – Whatever Way Works Best for You

As the Andy Williams song goes, “It’s the holiday season / So whoop-de-do and hickory dock / And don’t forget to hang up your sock.”

This song from my childhood matches my mood and warms my soul. It brings back memories of growing up in a house full of kids, making presents for parents and cards for grandparents, and enjoying the sounds and smells of the holiday season.

In high school, I learned that not every home had a Christmas like mine. My best friend’s grandfather died in the hospital on Christmas morning when we were freshmen. For her, holidays became a poignant reminder of loss. And I began to realize that some families had other celebrations, or even multiple celebrations.

I entered high school in 1967, the year after Dr. Maulana Karenga created the festival called Kwanzaa, a pan-African holiday that celebrates family, community, and culture. As time went on, I developed an appreciation of the many ways that different people mark holidays, from the winter solstice celebrations of the Wiccans in central Wisconsin to the celebration of Diwali around the world.

Here at NLM, our resources offer interesting and helpful information related to holiday seasons.

If you enter the word “Christmas” or “solstice” in our PubMed search box, you’ll retrieve over 3,000 citations. One of these is Dr. Jori Bogetz’s article in the Journal of Palliative Medicine reflecting on why she works on Christmas. An article from the British Veterinary Association describes how to choose a holiday meal that supports animal health and welfare. A third, in the Medical Journal of Australia, warns about the risks inherent in Christmas celebrations, and the journal Nature provides an unusual description of a winter solstice celebration. Some investigators sought to uncover evidence of a Christmas spirit through functional magnetic resonance imaging, while others examined the surge in myocardial infarction during certain holiday periods.

Indeed, this time of year can be complicated.

Another of NLM’s resources, MedlinePlus, provides guidance on a range of health topics — everything from managing seasonal affective disorder to encouraging healthy holiday eating to coping with sadness and grief — both for the people affected and for those around them who are wondering how to help.

In many ways, holidays allow us time to pause amid our everyday lives. Ideally, we can use the moment to be more observant and more mindful, of both ourselves and others.

I hope you find the joy and peace that the season holds, and that you extend some of that joy and peace to those around you, throughout the holidays and beyond.

Meet Our Newest Investigator: Xiaofang Jiang, PhD, Seeks a Greater Understanding of the Human Microbiome To Improve Health

In this week’s installment of Musings, I’d like to introduce you to Xiaofang Jiang, PhD, who recently joined NLM’s Intramural Research Program as a tenure-track investigator.

Dr. Jiang’s research focuses on the development of computational methods to advance our understanding of the human microbiome, which plays a very important role in our health. Her lab is using bioinformatic methods to predict what the trillions of microbes living in and on the human body do, how they spread between people, and which kinds of genes the microbiome community shares.

Turning data on the human microbiome into usable insights is a challenge that demands both knowledge of the biological literature and skill in bioinformatics. Dr. Jiang’s lab is developing approaches intended to do just that — bridge the gap between information and action.

We are fortunate to have added another strong and curious investigator to our team. I know Dr. Jiang will play an important role in accelerating data-driven discovery here at NLM!


Video Transcript (below)

I’ve had a long interest in physics and math ever since I was in middle school. But I was discouraged from choosing math or physics as a major when I went to college. That’s because my family and friends thought that I would have a hard time finding a good job as a female based on what they saw, at that time, in China.

In the end, I chose Biology as my major, which opened a new door for me. It provides the foundation for my current research and led me to a beautiful world of evolution and life science.

For my Ph.D., I chose computational biology as my major because it is a major that combines my passion in computer science as well as biology.

For a long time, I observed that, for computer scientists, if they wanted to understand biomedical data they needed to have a good understanding of biology. For biologists, if they wanted to speed discovery, they required the help of computer scientists. And my background sort of bridges this gap.

I think we’re at a great stage where we can actually have the ability to turn data into actionable items that can be directly applied to medical decision-making. Data science and the microbiome combined to improve our health.

NLM is one of the few places where I can start my research program in data science. There is a critical mass of truly exceptional and top-notch scientists here. And I also find people in NLM are approachable. From the Director to the top scientist, you can just knock on their door and talk with them, and they are always willing to help.

NLM is the place where I can do the research that I love and enjoy, and also make a difference at the same time.

Everyone’s Voice Matters: Making Science Open and Accessible to the Public

Last month, the National Institutes of Health (NIH) released its Draft NIH Policy for Data Management and Sharing and Supplemental Draft Guidance (Draft NIH Policy), making it available for public comment. Comments are due by January 10, 2020. Because everyone’s voice matters, I’m calling on the Musings audience to review the draft and offer your perspectives on this policy now! 

The Draft NIH Policy arises from NIH’s deep commitment to fostering a culture of scientific data stewardship.

Data stewardship is a research responsibility that includes systematically acquiring data, carefully documenting data, securely storing data, and, where possible, making data available for use by other scientists and society as a whole. This last activity, often referred to as “data sharing,” is essential for accelerating the translation of science into knowledge and ensuring that the full value of the data collected becomes the substrate for future discoveries.

NIH’s Long-Standing Commitment to Make Research Results Available

In 2003, NIH released its original data sharing policy, which established the expectation that research data from large NIH-supported awards will be shared to the extent allowed by scientific protocol and human subjects considerations. Since 2008, the NIH Public Access Policy has ensured that the public has free access to the published results of NIH-funded research. NLM’s PubMed Central, a free, full-text archive of peer-reviewed biomedical and life sciences journal literature, serves as the repository for these articles.

In 2014, NIH updated its Genome-Wide Association Studies Policy with an expanded NIH Genomic Data Sharing Policy to ensure the broad and responsible sharing of genomic research data. And in 2016, the NIH published the NIH Policy on the Dissemination of NIH-Funded Clinical Trial Information, which established expectations for registering and submitting the results of all NIH-funded clinical trials on ClinicalTrials.gov. Individual Institutes, Centers, and programs have also established expectations for managing and sharing data resulting from their funded research.

Data Sharing Principles

NIH recognizes that all scientific data need to be managed according to sound principles. The Draft NIH Policy would require researchers to develop explicit data management and sharing plans that describe their approaches for preserving and sharing data. Reasonable, allowable costs for data curation and preservation would be permitted as direct expenses for the project. Proposed guidance about allowable costs of data management and sharing, and the elements of a good data management and sharing plan, was released along with the draft policy and can be found on the NIH Data Management and Sharing Activities Related to Public Access and Open Science web page.

While promoting broad sharing of data, the Draft NIH Policy is deliberately designed to be flexible and allow researchers to propose approaches that address legal, ethical, and other practical considerations that may limit data sharing. The policy proposes that data management and sharing plans be submitted “just in time” and evaluated by NIH program staff. Agreed plans will be incorporated into Terms and Conditions of the Award, and NIH staff will monitor compliance with the plans at regular reporting intervals.

Data Sharing Benefits the Scientific Community and the Public

For the scientific community, data sharing enables the validation of scientific results by both the originator of the data and other scientists, increasing transparency and accountability. Data sharing also strengthens collaborations, which allows for richer analyses. Strong data-sharing practices facilitate the reuse of hard-to-generate data, such as those acquired during complex experiments or once-in-a-lifetime events like natural disasters. And, finally, data sharing promotes scientific progress and accelerates future research.

For the public, sound data-sharing practices demonstrate good stewardship of taxpayer funds. Clear, well-written data-sharing and management plans promote transparency and accountability to society. And for research involving human subjects, data sharing honors participants’ efforts by maximizing the contribution of the data acquired through their participation.

Tell Us What You Think!

NIH acknowledges that this draft policy offers new opportunities for advancing science while also creating new expectations and responsibilities for librarians, scientists, trainees and graduate students, and institutional research management offices. And I’ve highlighted some of the benefits of data sharing to the scientific community and the public.

As I emphasized earlier in this post, everyone’s voice matters — so we’d like to hear from all of you about the approach NIH is proposing. You can share your comments on the purpose of the policy, its key definitions, the scope and requirements for the plans, and the effective dates until Friday, January 10, 2020.

Want to Learn More?

NIH is hosting an informational webinar on the Draft NIH Policy for Data Management and Sharing and Supplemental Draft Guidance on Monday, December 16, 2019, from 12:30 p.m. to 2:00 p.m. EST. The purpose of the webinar is to provide information on the draft policy and answer any questions about the public comment process.

Please note that public comments will not be accepted during the webinar; they must be submitted here.

Accessing the Webinar

If you would like to attend the December 16 webinar, please see the instructions below:

  • To view the webinar presentation, click here.
  • To join the webinar by phone:
    • U.S. and Canadian participants can dial 866-844-9416 and enter passcode 4009108.

Please note that while you will be able to view the webinar through Webex, you must use one of the specified phone lines to connect to the audio. You will not be able to dial in to the webinar via your computer.

You may also send questions in advance of the webinar to SciencePolicy@od.nih.gov.

Thanksgiving – What I am Giving Thanks for This Year

This time of year reminds us to reflect with gratitude on our lives and our work. This week, I want to share what I am thankful for as the director of NLM. I could go on forever, as evidenced by this list. While I tried to do a top 10, that wasn’t enough — which is yet another thing to be thankful for!  

I am particularly thankful for

  • the 1,700 women and men working at NLM to advance biomedical science and improve access to trustable health information. There isn’t a day that passes that I’m not touched, moved, and impressed by the efforts of those around me.
  • the countless staff who are advancing NLM’s strategic plan, leading to the creation of our new Office of Engagement and Training, an expanded and more responsive Office of Communications and Public Liaison, and a stronger Office of Computer and Communications Systems. With great change can come great challenges, but the Library’s staff have gone above and beyond to create an even more efficient, effective, and impactful organization.
  • our budget office, which is working across NLM to improve our budget management process and bring together our program staff and acquisitions office to better manage the contracts and services required to make NLM offerings available 24 hours a day, seven days a week.
  • my team in the Office of the Director, who are managing the increased workflow with competence and goodwill. I know I can lean on them, and that makes every day easier.
  • the renovation team, who are engaging with designers and architects to oversee our many important and necessary building improvement and reconstruction projects. This is a daunting task, but they are facing it head on, with great mindfulness and vision.
  • the people around the world who communicate with me directly via Twitter (@NLMdirector), letting me know how they like our products and offering suggestions for improvements. Their feedback goes a long way.
  • the Public Services Division staff, who promptly respond to the wide-ranging questions we receive about our resources and services.
  • the innovative Lister Hill National Center for Biomedical Communications investigators who, through their collaborations across NIH, are developing advanced Artificial Intelligence models that interpret complex images, from blood smears to diagnostic samples.
  • our growing Extramural Programs Division in biomedical informatics and data science. The strategic investments being made are positioning NLM as a key contributor to data science developments across NIH.
  • NLM’s National Center for Biotechnology Information staff, who have used NLM’s platform and expertise to help guide NIH as it accelerates access to big data, while devising ways to ensure that data rights management and patient privacy considerations are respected.
  • our Division of Library Operations staff, who guide the selection, acquisition, preservation, and management of more than 11 centuries of health and biomedical literature.
  • our building maintenance staff, who keep our space clean and make it a pleasant place to work.
  • my NIH Institute and Center director colleagues, all 27 of them. When I became director, I was encouraged to manage up, manage down, but — most importantly — treasure and cultivate peer relationships. What sound advice!
  • my leadership team, whose counsel not only helps me set the course, but keeps me from veering off course as we move NLM toward its third century.

Finally, I’m grateful for my friends and family, particularly my sisters and my son, Conor, who provide me with the personal sustenance that gives me the energy and drive to lead this amazing organization!

Best wishes for the Thanksgiving holiday to you and all of yours. And, as I mentioned, your input means a lot. So, let me know what NLM provides that you’re thankful for!

How NIH Is Using Artificial Intelligence To Improve Operations

Artificial intelligence (AI) is everywhere, from the online marketplace to the laboratory! When you read an article or shop online, the experience is probably supported by AI. And scientists are applying AI methods to find indications of disease, to design experiments, and to make discovery processes more efficient.

The National Institutes of Health (NIH) has been using AI to improve science and health, too, but it’s also using AI in other ways.

Earlier this fall, the White House Office of Science and Technology Policy hosted a summit to highlight ways that the Federal Government uses AI to achieve its mission and improve services to the American people. I was proud to represent NIH and provide examples of how AI is being used to make NIH more effective and efficient in its work.

For example, each year NIH faces the challenge of assigning the more than 80,000 grant applications it receives to the proper review group.

Here’s how the process works now: Applications that address specific funding opportunity announcements are assigned directly by division directors. Then the Integrated Review Groups (clusters of study sections grouped around general scientific areas) assign the applications to the correct division or scientific branch. A triage officer handles assignments without an identified liaison. This process takes several weeks and may involve passing an application through multiple staff reviews.

Staff at NIH’s National Institute of General Medical Sciences (NIGMS) creatively addressed this challenge by developing and deploying natural language processing and machine learning to automate the process for their Institute. This approach uses a machine learning algorithm, trained on historical data, to find a relationship between the text (title, abstract, and specific aims) and the scientific research area of an application. The trained algorithm can then determine the most likely scientific area of a new application and automatically assign it a program officer who is a subject matter expert in that area.

The new process works impressively well, with 92% of applications referred to the correct scientific division and 84% assigned to the correct program officer, matching the accuracy rate routinely achieved by manual referrals. This change has resulted in substantial time savings, reducing the process from two to three weeks to less than one day. The new approach ensures the efficient and consistent referral of grant applications and liberates program officers from the labor-intensive and monotonous manual referral process, allowing them to focus on higher-value work. It even allows for related institutional knowledge to be retained after staff departures. NIGMS is currently working with the NIH electronic Research Administration (eRA) to incorporate the process into the enterprise database for NIH-wide use.
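To make the idea concrete, here is a minimal sketch of text-based referral. It is not the NIGMS system, whose model and features are not described here. It assumes a simple multinomial naive Bayes classifier, and the training examples, research areas, and program officer names are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesReferral:
    """Toy multinomial naive Bayes: application text -> research area."""

    def fit(self, texts, areas):
        self.word_counts = defaultdict(Counter)  # area -> word -> count
        self.area_counts = Counter(areas)        # area -> number of applications
        self.vocab = set()
        for text, area in zip(texts, areas):
            tokens = tokenize(text)
            self.word_counts[area].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.area_counts.values())
        best_area, best_score = None, -math.inf
        for area, n_docs in self.area_counts.items():
            counts = self.word_counts[area]
            denom = sum(counts.values()) + len(self.vocab)  # add-one smoothing
            score = math.log(n_docs / total_docs)           # log prior
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_area, best_score = area, score
        return best_area


# Hypothetical "historical" applications (text, research area); not real data.
HISTORY = [
    ("Structural basis of enzyme catalysis and protein folding", "biophysics"),
    ("Cryo-EM structure of membrane protein complexes", "biophysics"),
    ("Gene regulation and chromatin remodeling in yeast", "genetics"),
    ("Genome-wide mapping of transcription factor binding", "genetics"),
    ("Anesthesia mechanisms and sepsis response in trauma patients", "pharmacology"),
    ("Drug metabolism and pharmacokinetics of novel analgesics", "pharmacology"),
]

# Hypothetical mapping from research area to a subject matter expert.
OFFICERS = {
    "biophysics": "Officer A",
    "genetics": "Officer B",
    "pharmacology": "Officer C",
}


def refer(model, title, abstract, aims):
    """Predict the research area from title, abstract, and specific aims,
    then look up the program officer for that area."""
    area = model.predict(" ".join([title, abstract, aims]))
    return area, OFFICERS[area]


model = NaiveBayesReferral().fit(*zip(*HISTORY))
area, officer = refer(
    model,
    title="Chromatin structure and gene expression",
    abstract="We map transcription factor binding genome-wide",
    aims="Study gene regulation",
)
print(area, officer)
```

A production system would be trained on many thousands of past applications and checked against manual referrals, as the accuracy figures above suggest NIGMS did.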

Now for a second example that’s more pertinent to NLM.

Our PubMed repository receives over 1.2 million new citations each year, and over 2.3 million people conduct about 2.5 million searches using PubMed every day. An average query returns hundreds to thousands of results presented in reverse chronological order of the date the record is added. Yet our internal process-monitoring determined that 80% of the people using PubMed do not go beyond the first page of results, a behavior also seen in general web searches. This means that even if a more relevant citation is on page 4 or page 18, the user may never know.

Zhiyong Lu, PhD, and his team from NLM’s National Center for Biotechnology Information applied machine learning strategies to improve the way PubMed presents search results. Their goals were to increase the effectiveness of PubMed searches by helping users efficiently find the most relevant and high-quality information and to improve usability and the user experience through a focus on the literature search behaviors and needs of users. Their approach is called the Best Match algorithm, and the technical details can be found in a paper by Fiorini N, Canese K, Starchenko G, et al., PLoS Biol. 2018.

The Best Match algorithm works like this: In preparation for querying, all articles in the PubMed repository are tagged with key information and metadata, including the publication date, and with an indicator of how often the article has been returned and accessed by previous searches, as part of a model-training process called Learning-to-Rank (L2R). Then, when a user enters a query phrase in the search box on the PubMed website, the phrase is mapped using the PubMed syntax, and the search is launched. In a traditional search, the results are selected based on keyword matching and are presented in reverse chronological order. Through Best Match, the top 500 results, returned via a classic term-weighting algorithm, are re-sorted according to dozens of features of the L2R algorithm, including the past usage of an article, publication date, relevance score, and type of article. At the top of the page, the search results are clearly marked as being sorted by “Best Match.”

Image showing preparing, matching, ranking, and refining articles in NLM's PubMed
Picture by Donald Bliss of NLM

Articles prepared prior to user searches; 1. Queries changed to PubMed syntax; 2. Initial matching hits presented in reverse chronological order; 3. Results are re-sorted according to the L2R algorithm to present the Best Match; and 4. The L2R algorithm is updated based on users’ top choices
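The two-stage idea described above can be sketched in a few lines. This is an illustration under stated assumptions, not PubMed's implementation: the first stage uses a simple TF-IDF sum as a stand-in for the term-weighting step, and the second stage uses hand-set weights in place of a trained L2R model. The articles, identifiers, and weights are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Article:
    pmid: str         # hypothetical identifier
    title: str
    year: int
    past_clicks: int  # how often earlier searches led users to this article
    is_review: bool


def term_weight_score(query, article, corpus):
    """Stage 1: TF-IDF sum over query terms (stand-in for term weighting)."""
    words = article.title.lower().split()
    score = 0.0
    for term in query.lower().split():
        tf = words.count(term)
        df = sum(term in a.title.lower().split() for a in corpus)
        if tf and df:
            score += tf * math.log(1 + len(corpus) / df)
    return score


# Hand-set weights standing in for a trained learning-to-rank model (assumption).
WEIGHTS = {"relevance": 1.0, "recency": 0.5, "usage": 0.8, "review": 0.3}


def rerank_score(relevance, article, current_year=2019):
    """Stage 2: combine relevance, recency, past usage, and article type."""
    recency = 1.0 / (1 + current_year - article.year)
    usage = math.log1p(article.past_clicks)
    return (WEIGHTS["relevance"] * relevance
            + WEIGHTS["recency"] * recency
            + WEIGHTS["usage"] * usage
            + WEIGHTS["review"] * article.is_review)


def best_match(query, corpus, top_k=500):
    """Retrieve by term weighting, then re-sort only the top_k hits."""
    scored = [(term_weight_score(query, a, corpus), a) for a in corpus]
    hits = [(s, a) for s, a in scored if s > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    top, rest = hits[:top_k], hits[top_k:]
    top.sort(key=lambda pair: rerank_score(*pair), reverse=True)
    return [a.pmid for _, a in top + rest]


def most_recent(query, corpus):
    """Traditional ordering: keyword match, then reverse chronological."""
    hits = [a for a in corpus if term_weight_score(query, a, corpus) > 0]
    return [a.pmid for a in sorted(hits, key=lambda a: a.year, reverse=True)]


# Hypothetical toy corpus; not real PubMed records.
CORPUS = [
    Article("A1", "holiday myocardial infarction risk", 2019, 2, False),
    Article("A2", "myocardial infarction outcomes a systematic review", 2016, 900, True),
    Article("A3", "seasonal affective disorder treatment", 2018, 50, False),
    Article("A4", "myocardial infarction in young adults", 2012, 10, False),
]

print(most_recent("myocardial infarction", CORPUS))
print(best_match("myocardial infarction", CORPUS))
```

In this toy corpus the heavily used review article moves to the top under the Best Match ordering, even though it is not the newest record, which is the behavior the re-sorting step is meant to produce.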

This new approach will become the core of a new implementation of PubMed, due out by the spring of 2020.

In addition to the examples I described above, NIH is exploring other ways to use AI. For example, AI can help determine whether the themes of research projects align with the stated priorities of a specific Institute, and it can provide a powerful tool to accelerate government business practices. Because of AI’s novelty, NIH engages in many steps to validate the results of these new approaches, ensuring that unanticipated problems do not arise.

In future posts, I look forward to sharing more about how NIH is improving operations through innovative analytics.