“What 2019 NLM Accomplishment Makes You Most Proud?”

I was asked this question during a recent “brown bag” conversation with NLM staff. While it was tempting to launch into my own list of accomplishments, I turned the question back to those present. I was surprised, proud, and intrigued by what they had to say.

First, let me tell you a little bit about our lunchtime brown bag conversations. We have a large staff (almost 1,700 women and men) and use a variety of formal and informal approaches to foster discussion: Town Hall meetings held twice a year; regular email messages to share timely information; our NLM In Focus blog, which provides a look inside NLM; and supervisor-led meetings. I host brown bag conversations about once a month and am usually joined by 2-3 members of the NLM leadership team. Almost always, staff from various parts of the Library attend – mingling together our scientists, librarians, administrators and communications staff. Conversations are lively, and I get to learn a lot about what is on the minds of our staff.

So, it was instructive, and enjoyable, to hear different views about NLM accomplishments. Some people talked about greater engagement with and accountability by NLM leadership, while others focused on specific scientific advances. Still others noted our many advances in data science, particularly in upskilling our workforce.

I want to point out a few of these accomplishments.

Teresa Przytycka, PhD, senior investigator in NLM’s National Center for Biotechnology Information, shared her team’s creation of a new algorithm called scPopCorn (short for single-cell sub-Populations Comparison) to understand the differences between populations of cells from single-cell experiments. This approach helps researchers identify different cell types and helps to differentiate between sexes, disease status, animal type, and more.

Olivier Bodenreider, MD, PhD, senior scientist and chief of the Cognitive Science Branch of NLM’s Lister Hill National Center for Biomedical Communications (LHNCBC), described how he is leading the re-envisioning of the research and development efforts within his center. One LHNCBC staff scientist, Vojtech Huser, MD, PhD, described the success his team has had in generating new publications this year.

Several people talked about the journey to prepare NLM and its staff for data science. Our Data Science @NLM Training Program team set up a year-long process of preparing our workforce for the future. Over 750 people completed a data science skills assessment and developed individual learning plans. Over the summer, NLM staff participated in an intensive 120-hour data science fundamentals course, culminating in a wide variety of projects that were showcased during our Data Science Open House. Over 300 people attended this exciting and energizing showcase of our talents!

Several people talked about accomplishments that made our entire NLM operations work better, such as greater engagement with staff, better use of project management strategies to improve efficiencies, and smooth integration of staff into new work teams.

Taking the writer’s privilege of identifying more accomplishments, I am exceptionally proud of the efforts of staff across the NLM who designed or participated in the Data Science initiatives. I am honored to work with a great leadership team who are making bold and sometimes difficult decisions to prepare the NLM for its future. We made a huge advancement in open science by moving our entire Sequence Read Archive public data to the cloud, completing the first phase of an ongoing effort to better position these data for large-scale computing. This work represents both a technological feat and a major contribution to biological discovery.

Reflecting on our discussion about 2019 accomplishments, I learned that every person across the NLM has something that he or she is proud of. I also learned that some of us experience NLM as a tight-knit research team, while others take a more broad-brush view of activities and events. Most importantly, I learned that there are many things to celebrate in this wonderful institution we call the National Library of Medicine!

The Holiday Season – Whatever Way Works Best for You

As the Andy Williams song goes, “It’s the holiday season / So whoop-de-do and dickory dock / And don’t forget to hang up your sock.”

This song from my childhood matches my mood and warms my soul. It brings back memories of growing up in a house full of kids, making presents for parents and cards for grandparents, and enjoying the sounds and smells of the holiday season.

In high school, I learned that not every home had a Christmas like mine. My best friend’s grandfather died in the hospital on Christmas morning when we were freshmen. For her, holidays became a poignant reminder of loss. And I began to realize that some families had other celebrations, or even multiple celebrations.

I entered high school in 1967, the year after Dr. Maulana Karenga created the festival called Kwanzaa, a pan-African holiday that celebrates family, community, and culture. As time went on, I developed an appreciation of the many ways that different people mark holidays, from the winter solstice celebrations of the Wiccans in central Wisconsin to the celebration of Diwali around the world.

Here at NLM, our resources offer interesting and helpful information related to holiday seasons.

If you enter the word “Christmas” or “solstice” in our PubMed search box, you’ll retrieve over 3,000 citations. One of these is Dr. Jori Bogetz’s article in the Journal of Palliative Medicine reflecting on why she works on Christmas. An article from the British Veterinary Association describes how to choose a holiday meal that supports animal health and welfare. A third, in the Medical Journal of Australia, warns about the risks inherent in Christmas celebrations, and the journal Nature provides an unusual description of a winter solstice celebration. Some investigators sought to uncover evidence of a Christmas spirit through functional magnetic resonance imaging, while others examined the surge in myocardial infarction during certain holiday periods.

Indeed, this time of year can be complicated.

Another of NLM’s resources, MedlinePlus, provides guidance on a range of health topics — everything from managing seasonal affective disorder to encouraging healthy holiday eating to coping with sadness and grief — both for the people affected and for those around them who are wondering how to help.

In many ways, holidays allow us time to pause amid our everyday lives. Ideally, we can use the moment to be more observant and more mindful, of both ourselves and others.

I hope you find the joy and peace that the season holds, and that you extend some of that joy and peace to those around you, throughout the holidays and beyond.

Meet Our Newest Investigator: Xiaofang Jiang, PhD, Seeks a Greater Understanding of the Human Microbiome To Improve Health

In this week’s installment of Musings, I’d like to introduce you to Xiaofang Jiang, PhD, who recently joined NLM’s Intramural Research Program as a tenure-track investigator.

Dr. Jiang’s research focuses on the development of computational methods to advance our understanding of the human microbiome, which plays a very important role in our health. Her lab is using bioinformatic methods to predict what the trillions of microbes living in and on the human body do, how they spread between people, and which kinds of genes the microbiome community shares.

Turning data on the human microbiome into usable insights is a challenge that demands both knowledge of the biological literature and skill in bioinformatics. Dr. Jiang’s lab is developing approaches intended to do just that — bridge the gap between information and action.

We are fortunate to have added another strong and curious investigator to our team. I know Dr. Jiang will play an important role in accelerating data-driven discovery here at NLM!


Video Transcript (below)

I’ve had a long interest in physics and math ever since I was in middle school. But, I was discouraged to choose math or physics as major when I went to college. That’s because my family and friends thought that I would have a hard time finding a good job as a female based on what they saw, at that time, in China.

In the end, I chose Biology as my major, which opened a new door for me. It provides the foundation for my current research and led me to a beautiful world of evolution and life science.

For my Ph.D., I chose computational biology as my major because it is a major that combines my passion in computer science as well as biology.

For a long time, I observed that, for computer scientists, if they wanted to understand biomedical data they needed to have a good understanding of biology. For biologists, if they wanted to speed discovery, they required the help of computer scientists. And my background sort of bridges this gap.

I think we’re at a great stage where we can actually have the ability to turn data into actionable items that can be directly applied to medical decision-making. Data science and the microbiome combined to improve our health.

NLM is one of the few places where I can start my research program in data science. There is a critical mass of truly exceptional and top-notch scientists here. And I also find people in NLM are approachable. From the Director to the top scientist, you can just knock on their door and talk with them, and they are always willing to help.

NLM is the place where I can do the research that I love and enjoy, and also make a difference at the same time.

Everyone’s Voice Matters: Making Science Open and Accessible to the Public

Last month, the National Institutes of Health (NIH) released its Draft NIH Policy for Data Management and Sharing and Supplemental Draft Guidance (Draft NIH Policy), making it available for public comment. Comments are due by January 10, 2020. Because everyone’s voice matters, I’m calling on the Musings audience to review the draft and offer your perspectives on this policy now! 

The Draft NIH Policy arises from NIH’s deep commitment to fostering a culture of scientific data stewardship.

Data stewardship is a research responsibility that includes systematically acquiring data, carefully documenting data, securely storing data, and, where possible, making data available for use by other scientists and society as a whole. This last activity, often referred to as “data sharing,” is essential for accelerating the translation of science into knowledge and ensuring that the full value of the data collected becomes the substrate for future discoveries.

NIH’s Long-Standing Commitment to Make Research Results Available

In 2003, NIH released its original data sharing policy, which established the expectation that research data from large NIH-supported awards will be shared to the extent allowed by scientific protocol and human subjects considerations. Since 2008, the NIH Public Access Policy has ensured that the public has free access to the published results of NIH-funded research. NLM’s PubMed Central, a free, full-text archive of peer-reviewed biomedical and life sciences journal literature, serves as the repository for these articles.

In 2014, NIH updated its Genome-Wide Association Studies Policy with an expanded NIH Genomic Data Sharing Policy to ensure the broad and responsible sharing of genomic research data. And in 2016, the NIH published the NIH Policy on the Dissemination of NIH-Funded Clinical Trial Information, which established expectations for registering and submitting the results of all NIH-funded clinical trials on ClinicalTrials.gov. Individual Institutes, Centers, and programs have also established expectations for managing and sharing data resulting from their funded research.

Data Sharing Principles

NIH recognizes that all scientific data need to be managed according to sound principles. The Draft NIH Policy would require researchers to develop explicit data management and sharing plans that describe their approaches for preserving and sharing data. Reasonable, allowable costs for data curation and preservation would be permitted as direct expenses for the project. Proposed guidance about the allowable costs of data management and sharing, and about the elements of a good data management and sharing plan, was released along with the draft policy and can be found on the NIH Data Management and Sharing Activities Related to Public Access and Open Science web page.

While promoting broad sharing of data, the Draft NIH Policy is deliberately designed to be flexible and allow researchers to propose approaches that address legal, ethical, and other practical considerations that may limit data sharing. The policy proposes that data management and sharing plans be submitted “just in time” and evaluated by NIH program staff. Agreed plans will be incorporated into Terms and Conditions of the Award, and NIH staff will monitor compliance with the plans at regular reporting intervals.

Data Sharing Benefits the Scientific Community and the Public

For the scientific community, data sharing enables the validation of scientific results by both the originator of the data and other scientists, increasing transparency and accountability. Data sharing also strengthens collaborations, which allows for richer analyses. Strong data-sharing practices facilitate the reuse of hard-to-generate data, such as those acquired during complex experiments or once-in-a-lifetime events like natural disasters. And, finally, data sharing promotes scientific progress and accelerates future research.

For the public, sound data-sharing practices demonstrate good stewardship of taxpayer funds. Clear, well-written data-sharing and management plans promote transparency and accountability to society. And for research involving human subjects, data sharing honors participants’ efforts by maximizing the contribution of the data acquired through their participation.

Tell Us What You Think!

NIH acknowledges that this draft policy offers new opportunities for advancing science while also creating new expectations and responsibilities for librarians, scientists, trainees and graduate students, and institutional research management offices. And I’ve highlighted some of the benefits of data sharing to the scientific community and the public.

As I emphasized earlier in this post, everyone’s voice matters, so we’d like to hear from all of you about the approach NIH is proposing. Until Friday, January 10, 2020, you can share your comments on the purpose of the policy, its key definitions, the scope and requirements for the plans, and the effective dates.

Want to Learn More?

NIH is hosting an informational webinar on the Draft NIH Policy for Data Management and Sharing and Supplemental Draft Guidance on Monday, December 16, 2019, from 12:30 p.m. to 2:00 p.m. EST. The purpose of the webinar is to provide information on the draft policy and answer any questions about the public comment process.

Please note that public comments will not be accepted during the webinar; they must be submitted here.

Accessing the Webinar

If you would like to attend the December 16 webinar, please see the instructions below:

  • To view the webinar presentation, click here.
  • To join the webinar by phone:
    • U.S. and Canadian participants can dial 866-844-9416 and enter passcode 4009108.

Please note that while you will be able to view the webinar through Webex, you must use one of the specified phone lines to connect to the audio. You will not be able to dial in to the webinar via your computer.

You may also send questions in advance of the webinar to SciencePolicy@od.nih.gov.

The Pursuit and Power of Alignment

Guest post by Valerie Schneider, PhD, staff scientist at the National Library of Medicine’s National Center for Biotechnology Information, National Institutes of Health.

As a staff scientist at NLM, I’ve found that our strategic plan has become a valuable framework for organizing our mission and providing direction and focus—especially when we’re talking about data science.

A recent project at NLM’s National Center for Biotechnology Information (NCBI) highlights why it’s important to ensure alignment between projects and strategy.

As host to the world’s largest repository of biological sequence data, NCBI provides access to data that are critical to understanding and advancing human health. Because users were searching NCBI’s sequence databases long before the strategic plan was developed, it might be easy to overlook what an effort like the strategic plan has to do with the larger picture. When you look, though, it’s easy to see the relationship.

Providing a Common Search Experience

Connecting the resources of a digital research enterprise and advancing research and development in biomedical informatics and data science are just a few of the important objectives in NLM’s strategic plan. We’ve improved the experience of users searching for several types of common sequence-associated data by providing a more comprehensive interpretation of their queries and a new results interface that provides easy access to NCBI’s best results, regardless of the database in which they search.

Our team tackled this effort by conducting extensive user interviews, iteratively developing solutions, and monitoring the usage of those solutions.

We improved searches for the reference set of genes and genomes in all species across multiple NCBI databases by supporting common language queries and using features like auto-suggest. We enhanced the ability to search and access clinically important datasets, such as human variations housed in ClinVar and dbSNP, NCBI’s variation databases, as well as resources with information about antimicrobial resistance genes and viral pathogens.

We also created displays that aggregate the results from different databases and enable easy downloads of data and access to analysis tools. Our new interactive graphics and web page displays allow for the visualization of sequences and the analysis of homologous gene sets. Knowing that NLM users rely on different technologies to access data, we ensured that the displays work on both traditional computers and mobile devices.

Since their first release in late 2018, these search enhancements have come to be triggered in a quarter of all searches in the scoped databases. We’ve seen a 300% increase in their use, with more than 300,000 users clicking on the content they offer in just the month of October 2019. These products have provided results for over 500,000 searches that previously would have returned no content. Regular monitoring of their use helps us make sure that we continue to facilitate search and deliver high-value data.

NLM’s strategic plan gave us the user-centered framework in which to execute the goals of this project. So much of the work we do at NLM is consistent with the goals and specific objectives of the plan — it provides a structure for evaluating our work and making sure that we continue to be forward-looking.

And the strategic plan helps me, as a staff scientist, to identify new areas for work that will best enable NLM to continue delivering a platform for biomedical discovery and data-powered health.

To stay up to date on NCBI projects and research, follow us on Twitter.


Valerie Schneider, PhD, is the deputy director of Sequence Offerings and the interim head of the Sequence Delivery Program. In these roles, she coordinates efforts associated with the curation, enhancement, and organization of sequence data, and oversees tools and resources that enable the public to access, analyze, and visualize biomedical data. She also manages NCBI’s involvement in the Genome Reference Consortium, the international collaboration tasked with maintaining the value of the human reference genome assembly.

Thanksgiving – What I am Giving Thanks for This Year

This time of year reminds us to reflect with gratitude on our lives and our work. This week, I want to share what I am thankful for as the director of NLM. I could go on forever, as evidenced by this list. While I tried to do a top 10, that wasn’t enough — which is yet another thing to be thankful for!  

I am particularly thankful for

  • the 1,700 women and men working at NLM to advance biomedical science and improve access to trustable health information. There isn’t a day that passes that I’m not touched, moved, and impressed by the efforts of those around me.
  • the countless staff who are advancing NLM’s strategic plan, leading to the creation of our new Office of Engagement and Training, an expanded and more responsive Office of Communications and Public Liaison, and a stronger Office of Computer and Communications Systems. With great change can come great challenges, but the Library’s staff have gone above and beyond to create an even more efficient, effective, and impactful organization.
  • our budget office, which is working across NLM to improve our budget management process and bring together our program staff and acquisitions office to better manage the contracts and services required to make NLM offerings available 24 hours a day, seven days a week.
  • my team in the Office of the Director, who are managing the increased workflow with competence and goodwill. I know I can lean on them, and that makes every day easier.
  • the renovation team who are engaging with designers and architects to oversee our many important and necessary building improvement and reconstruction projects. This is a daunting task, but they are facing it head on, with great mindfulness and vision.
  • the people around the world who communicate with me directly via Twitter (@NLMdirector), letting me know how they like our products and offering suggestions for improvements. Their feedback goes a long way.
  • the Public Services Division staff, who promptly respond to the wide-ranging questions we receive about our resources and services.
  • the innovative Lister Hill National Center for Biomedical Communications investigators who, through their collaborations across NIH, are developing advanced Artificial Intelligence models that interpret complex images, from blood smears to diagnostic samples.
  • our growing Extramural Programs Division in biomedical informatics and data science. The strategic investments being made are positioning NLM as a key contributor to data science developments across NIH.
  • NLM’s National Center for Biotechnology Information staff, who have used NLM’s platform and expertise to help guide NIH as it accelerates access to big data, while devising ways to ensure that data rights management and patient privacy considerations are respected.
  • our Division of Library Operations staff, who guide the selection, acquisition, preservation, and management of more than 11 centuries of health and biomedical literature.
  • our building maintenance staff, who keep our space clean and make it a pleasant place to work.
  • my NIH Institute and Center director colleagues, all 27 of them. When I became director, I was encouraged to manage up, manage down, but — most importantly — treasure and cultivate peer relationships. What sound advice!
  • my leadership team, whose counsel not only helps me set the course, but keeps me from veering off course as we move NLM toward its third century.

Finally, I’m grateful for my friends and family, particularly my sisters and my son, Conor, who provide me with the personal sustenance that gives me the energy and drive to lead this amazing organization!

Best wishes for the Thanksgiving holiday to you and all of yours. And, as I mentioned, your input means a lot. So, let me know what NLM provides that you’re thankful for!

How NIH Is Using Artificial Intelligence To Improve Operations

Artificial intelligence (AI) is everywhere, from the online marketplace to the laboratory! When you read an article or shop online, the experience is probably supported by AI. And scientists are applying AI methods to find indications of disease, to design experiments, and to make discovery processes more efficient.

The National Institutes of Health (NIH) has been using AI to improve science and health, too, but it’s also using AI in other ways.

Earlier this fall, the White House Office of Science and Technology Policy hosted a summit to highlight ways that the Federal Government uses AI to achieve its mission and improve services to the American people. I was proud to represent NIH and provide examples of how AI is being used to make NIH more effective and efficient in its work.

For example, each year NIH faces the challenge of assigning the more than 80,000 grant applications it receives to the proper review group.

Here’s how the process works now:  Applications that address specific funding opportunity announcements are assigned directly by division directors. Then the Integrated Review Groups (clusters of study sections grouped around general scientific areas) assign the applications to the correct division or scientific branch. A triage officer handles assignments without an identified liaison. This process takes several weeks and may involve passing an application through multiple staff reviews.

Staff at NIH’s National Institute of General Medical Sciences (NIGMS) creatively addressed this challenge by developing and deploying natural language processing and machine learning to automate the process for their Institute. This approach uses a machine learning algorithm, trained on historical data, to find a relationship between the text (title, abstract, and specific aims) and the scientific research area of an application. The trained algorithm can then determine the most likely scientific area of a new application and automatically assign it a program officer who is a subject matter expert in that area.
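To make the idea concrete, here is a minimal sketch of text-based referral. It is not the NIGMS system: the post does not say which model NIGMS uses, so a small naive Bayes classifier stands in for it, and the training texts, the area names, and the `OFFICERS` table are all invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesReferral:
    """Toy multinomial naive Bayes with add-one smoothing (a stand-in model)."""

    def fit(self, texts, areas):
        self.word_counts = defaultdict(Counter)  # area -> word frequencies
        self.area_counts = Counter(areas)        # area -> number of applications
        self.vocab = set()
        for text, area in zip(texts, areas):
            tokens = tokenize(text)
            self.word_counts[area].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.area_counts.values())
        best_area, best_score = None, -math.inf
        for area, n_docs in self.area_counts.items():
            counts = self.word_counts[area]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_area, best_score = area, score
        return best_area


# Hypothetical historical applications (title/abstract text, scientific area).
HISTORY = [
    ("Structural basis of protein folding and enzyme catalysis", "biophysics"),
    ("Cryo-EM structure of a membrane protein complex", "biophysics"),
    ("Gene regulation and chromatin remodeling in yeast", "genetics"),
    ("Genome-wide mapping of transcription factor binding", "genetics"),
]
# Hypothetical mapping from scientific area to a subject matter expert.
OFFICERS = {"biophysics": "Officer A", "genetics": "Officer B"}

model = NaiveBayesReferral().fit(
    [text for text, _ in HISTORY], [area for _, area in HISTORY]
)


def refer(text):
    """Return (predicted area, assigned program officer) for a new application."""
    area = model.predict(text)
    return area, OFFICERS[area]
```

Calling `refer("Chromatin structure and gene expression in yeast")` sends the application to the genetics area and its officer. A production system would train on far more history and would measure accuracy against manual referrals, as the figures below describe.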

The new process works impressively well, with 92% of applications referred to the correct scientific division and 84% assigned to the correct program officer, matching the accuracy rate routinely achieved by manual referrals. This change has resulted in substantial time savings, reducing the process from two to three weeks to less than one day. The new approach ensures the efficient and consistent referral of grant applications and liberates program officers from the labor-intensive and monotonous manual referral process, allowing them to focus on higher-value work. It even allows for related institutional knowledge to be retained after staff departures. NIGMS is currently working with the NIH electronic Research Administration (eRA) to incorporate the process into the enterprise database for NIH-wide use.

Now for a second example that’s more pertinent to NLM.

Our PubMed repository receives over 1.2 million new citations each year, and over 2.3 million people conduct about 2.5 million searches using PubMed every day. An average query returns hundreds to thousands of results presented in reverse chronological order of the date the record was added. Yet our internal process-monitoring determined that 80% of the people using PubMed do not go beyond the first page of results, a behavior also seen in general web searches. This means that even if a more relevant citation is on page 4 or page 18, the user may never know.

Zhiyong Lu, PhD, and his team from NLM’s National Center for Biotechnology Information applied machine learning strategies to improve the way PubMed presents search results. Their goals were to increase the effectiveness of PubMed searches by helping users efficiently find the most relevant and high-quality information, and to improve usability and the user experience through a focus on the literature search behaviors and needs of users. Their approach is called the Best Match algorithm, and the technical details can be found in a paper by Fiorini N, Canese K, Starchenko G, et al., PLoS Biol. 2018.

The Best Match algorithm works like this:  In preparation for querying, all articles in the PubMed repository are tagged with key information and metadata, including the publication date, and with an indicator of how often the article has been returned and accessed by previous searches, as part of a model-training process called Learning-to-Rank (L2R). Then, when a user enters a query phrase in the search box on the PubMed website, the phrase is mapped using the PubMed syntax, and the search is launched. In a traditional search, the results are selected based on keyword matching and are presented in reverse chronological order. Through Best Match, the top 500 results—returned via a classic term-weighting algorithm—are re-sorted according to dozens of features of the L2R algorithm, including the past usage of an article, publication date, relevance score, and type of article. At the top of the page, the search results are clearly marked as being sorted by “Best Match.”
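The two-stage flow described above can be sketched in a few lines. This is an illustration under stated assumptions, not PubMed’s implementation: a simple TF-IDF sum stands in for the classic term-weighting step, hand-picked weights stand in for the trained L2R model, only four of the dozens of features are shown, and the `Article` fields and sample corpus are invented.

```python
import math
from typing import NamedTuple


class Article(NamedTuple):
    pmid: int          # hypothetical identifier
    title: str
    year: int
    past_clicks: int   # how often earlier searchers opened the article
    is_review: bool


def term_weight_score(query, article, corpus):
    """First stage: a simple TF-IDF sum over query terms (assumed stand-in)."""
    words = article.title.lower().split()
    score = 0.0
    for term in query.lower().split():
        tf = words.count(term)
        df = sum(term in a.title.lower().split() for a in corpus)
        if tf and df:
            score += tf * math.log(1 + len(corpus) / df)
    return score


# Hand-picked weights standing in for a trained Learning-to-Rank model.
WEIGHTS = {"relevance": 1.0, "recency": 0.5, "usage": 0.8, "review": 0.3}


def rerank_score(article, relevance, newest_year):
    """Second stage: combine several features into one ranking score."""
    recency = 1.0 / (1 + newest_year - article.year)
    usage = math.log1p(article.past_clicks)
    return (WEIGHTS["relevance"] * relevance
            + WEIGHTS["recency"] * recency
            + WEIGHTS["usage"] * usage
            + WEIGHTS["review"] * article.is_review)


def best_match(query, corpus, top_n=500):
    """Retrieve by term weighting, keep the top_n hits, then re-sort them."""
    scored = [(term_weight_score(query, a, corpus), a) for a in corpus]
    hits = sorted((p for p in scored if p[0] > 0),
                  key=lambda p: p[0], reverse=True)[:top_n]
    if not hits:
        return []
    newest = max(a.year for _, a in hits)
    hits.sort(key=lambda p: rerank_score(p[1], p[0], newest), reverse=True)
    return [a.pmid for _, a in hits]


def most_recent(query, corpus):
    """Traditional ordering: matching articles, newest first."""
    hits = [a for a in corpus if term_weight_score(query, a, corpus) > 0]
    return [a.pmid for a in sorted(hits, key=lambda a: a.year, reverse=True)]


# Invented sample corpus.
CORPUS = [
    Article(1, "aspirin and heart attack prevention", 2010, 900, True),
    Article(2, "aspirin dosing note", 2019, 2, False),
    Article(3, "statin therapy outcomes", 2018, 300, False),
]
```

For the query `"aspirin heart"`, `most_recent` puts the 2019 note first, while `best_match` promotes the older, heavily used review. That is the behavior the re-sorting step is meant to produce.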

Figure: Preparing, matching, ranking, and refining articles in NLM’s PubMed
Picture by Donald Bliss of NLM

Articles are prepared prior to user searches. 1. Queries are changed to PubMed syntax; 2. Initial matching hits are presented in reverse chronological order; 3. Results are re-sorted according to the L2R algorithm to present the Best Match; and 4. The L2R algorithm is updated based on users’ top choices.

This new approach will become the core of a new implementation of PubMed, due out by the spring of 2020.

In addition to the examples I described above, NIH is exploring other ways to use AI. For example, AI can help determine whether the themes of research projects align with the stated priorities of a specific Institute, and it can provide a powerful tool to accelerate government business practices. Because of AI’s novelty, NIH engages in many steps to validate the results of these new approaches, ensuring that unanticipated problems do not arise.

In future posts, I look forward to sharing more about how NIH is improving operations through innovative analytics.