The Research Ecosystem

[Image: abstract of a human dissolving into data bytes and chemical symbols]

This week I will be part of the research ecosystem panel at the National Academy of Sciences’ Journal Summit. The theme of the summit is “The Evolving Ecosystem of Scientific Publishing.”

The summit encourages audience discussion and debate, so I’m hoping my fellow panelists and I can elicit (or even incite!) lively and fruitful discussions among scientific editors, nonprofit publishers, researchers, funders, academic directors, librarians, and IT specialists. It will be great fun, I am sure.

Great fun and serious business.

As NLM’s Director, it’s my job to think about the Library’s role in fostering scientific communication. But what does “scientific communication” actually entail? Is it the same as the research literature?

Many think so, and, of course, NLM is well-known for providing access to the research literature through PubMed and PubMed Central—resources we build consciously and intentionally to ensure the included publications meet key criteria and the content is reliably available, whether online or in print.

But scientific communication is rapidly expanding beyond journal articles. In our corner of the world, for example, PubMed Central has been accepting small data files (less than 2 GB) along with submitted manuscripts since last October.

But I believe the realm of scientific communication will expand even further.

I see on the horizon an era of scientific communication influenced by the principles of open science, which support sharing not only the answer to a research question but also the products and processes used to reach that answer.

Two key practices will help usher in this new era:

  1. Communicating early and often. For example, because ClinicalTrials.gov allows investigators to upload a range of research elements, including data collection instruments, analytical plans, and human subjects agreements, interim products of a research process can be available to others long before the final article is in place.
  2. Sharing all components of the research process, not simply summative reports. Last fall I suggested a library of models, properly documented and vetted, to allow researchers to apply existing, trustworthy models to their data. Data visualizations, source code, and videos might also prove useful. And you might have other ideas. (Please comment below and tell us about them.)

Such sharing of tools, products, and processes will save valuable time and money while also enhancing the rigor and reproducibility of the research itself by opening for examination all the procedural and methodological details. It also promises to speed innovation and knowledge transfer, which, you might say, are two of the key reasons for scientific communication in the first place.

But we still have so much to learn and to discuss. And, of course, NLM can’t shape the future of scientific communication alone.

That’s what makes this week’s Journal Summit so exciting. It’ll bring together many of the stakeholders in the research process to brainstorm strategies for tackling the numerous challenges that stand between us and open science.

But since most of you won’t be able to attend, I invite your comments regarding the research ecosystem and scientific communication below.

Among the questions I’d appreciate your input on are the following:

  • Should the scientific literature remain at the center of the discovery process, with the related research elements accessible from there?
  • How might preprints, which provide early looks into studies’ findings, serve as a model for the early disclosure and discussion of research methods?
  • What roles and imprimaturs could be afforded by a “publisher” of data?
  • What services should NLM institute to help make its collections and data FAIR (Findable, Accessible, Interoperable, and Reusable)?
  • Where should NLM invest its resources to accelerate the discovery, use, and impact of scientific communication?

Let’s spark a debate here that will rival the best the Journal Summit has to offer!

 

6 thoughts on “The Research Ecosystem”

  1. As a school librarian and patient advocate, I would also say that communication needs to be delivered in a way that patients and other members of the nonscientific community (caregivers, students, etc.) can access and understand.

    1. Absolutely! Patients and their families are at the center of health care, so communicating clearly with them is paramount. We also need to help students, along with other non-scientists, better grasp and engage with scientific information by making it as accessible to them as possible. Thanks for the reminder!

  2. For this question: What services should NLM institute to help make its collections and data FAIR (Findable, Accessible, Interoperable, and Reusable)?

    Offer non-XML formats for downloading some data.
    1. PubMed: Some PubMed data could be made available as .CSV files, e.g., a CSV file with PMID, year of publication, and title (see the sketch after this list). EUtils works, but it does not reach all customers (it may be too complex for them), and the current FTP site with .gz files is not user friendly: https://www.nlm.nih.gov/databases/download/pubmed_medline.html. Downloading the annual data (800+ files) just to get all titles is tedious.

    2. ClinicalTrials.gov: The same goes for data on a single trial. The current CSV download is limited to 20 columns; an alternative to the XML format at https://clinicaltrials.gov/ct2/show/NCT02312713?resultsxml=true could be offered.
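    To make the PubMed point concrete, here is a minimal sketch in Python of the flattening that a ready-made CSV offering would spare users from writing themselves. It uses the documented E-utilities ESearch and ESummary endpoints; the query term and output file name are placeholders, and this is an illustration, not an official NLM tool.

    ```python
    import csv
    import requests

    EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

    def pubmed_to_csv(term, outfile, retmax=100):
        # ESearch: find PMIDs matching a query term.
        search = requests.get(f"{EUTILS}/esearch.fcgi", params={
            "db": "pubmed", "term": term, "retmax": retmax, "retmode": "json",
        }).json()
        pmids = search["esearchresult"]["idlist"]
        if not pmids:
            return

        # ESummary: fetch lightweight record summaries for those PMIDs.
        result = requests.get(f"{EUTILS}/esummary.fcgi", params={
            "db": "pubmed", "id": ",".join(pmids), "retmode": "json",
        }).json()["result"]

        # Flatten to the three columns suggested above: PMID, year, title.
        with open(outfile, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["pmid", "year", "title"])
            for pmid in pmids:
                rec = result[pmid]
                year = rec.get("pubdate", "")[:4]  # pubdate begins "YYYY ..."
                writer.writerow([pmid, year, rec.get("title", "")])

    pubmed_to_csv("open science[Title]", "pubmed_sample.csv")
    ```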

    Vojtech Huser

    1. Thank you for chiming in so quickly, Vojtech. I agree that we need to find ways to make our data easier to work with all around, whether that’s different file formats or better interface design. I appreciate your suggestions and specific examples.

  3. Should the scientific literature remain at the center of the discovery process, with the related research elements accessible from there?

    >>Not necessarily. Mellon (which funded the early XML DTD developments for scholarly communications) is funding experiments in the use of software containers; see: https://thetartan.org/2017/9/25/news/digits

    >>What would it mean for NLM to support the submission of a software container as the center of the discovery process? Could it mean developing PubMed into a metadata repository of these research objects, with linkouts to them in the data repositories chosen in an investigator’s research data plan at the time of a grant proposal? Or could it mean that NLM develops a single submission process for the objects and then 1) archives the entire container for others to use according to FAIR data principles, 2) pulls out from the container other highly relevant data sets or literature types to archive in the respective databases for data mining, curation, linking (by standard identifiers), and comparative analysis, or does some combination of the two? Librarians and curators would then want to set out the metadata models to be provided when these containers are submitted (a hypothetical sketch of such a record follows below) and help define guidelines and quality criteria for funded data management plans. Clearly investigators need help from librarians on how to put together a data management plan (https://www.nature.com/articles/d41586-018-03071-1); this seems a role funders, and then libraries, should play.
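    As a purely hypothetical illustration of such a metadata model, the Python sketch below names the kinds of fields a curator might require at container-submission time. Every field name and value here is an assumption for discussion, not an existing NLM schema.

    ```python
    from dataclasses import dataclass, field

    @dataclass
    class ContainerSubmission:
        """Hypothetical metadata record to accompany a submitted research container."""
        container_id: str       # persistent identifier for the container, e.g., a DOI
        title: str
        creators: list          # contributor names and/or ORCID iDs
        funder_award: str       # grant named in the data management plan
        datasets: list = field(default_factory=list)      # identifiers of bundled data sets
        repositories: list = field(default_factory=list)  # linkout targets named in the DMP
        preprint_doi: str = ""  # linked preprint, if any
        license: str = ""
        provenance: list = field(default_factory=list)    # version / audit-trail entries

    # Example record; every value is a placeholder.
    record = ContainerSubmission(
        container_id="doi:10.0000/example-container",
        title="Analysis container for an example study",
        creators=["Investigator, A."],
        funder_award="R01-EXAMPLE",
    )
    print(record)
    ```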

    How might preprints, which provide early looks into studies’ findings, serve as a model for the early disclosure and discussion of research methods?

    >>Preprints can now be linked by DOI to the funding agency, author(s), associated data, and the final published article. One article reported very little difference between preprints and final published articles in the field of physics:
    https://t.co/cjXKkhZRhr
    >>If further studies replicate this finding in other disciplines, should the funded preprint be the item required for submission to a “software container” or even a literature repository for public access? Any final publication could then be linked later, similar to how public access manuscripts are linked.

    What roles and imprimaturs could be afforded by a “publisher” of data?

    >>By publisher, do you also mean a “repository” of data? I would see these roles as being responsible for authentication/authority checks on the metadata and any shared identifiers. They should also provide credible tracking of provenance, versioning, permissions, and research object relationships. Lastly, they should set quality criteria for the data models and how they are managed, e.g., that they do in fact meet the FAIR criteria.

    What services should NLM institute to help make its collections and data FAIR (Findable, Accessible, Interoperable, and Reusable)?

    >>Develop metadata and data submission standards; authentication and security services in data submission systems; scripted tools to check the integrity of submitted metadata and data (see the sketch below); user-friendly search indexes and displays so that data are discoverable; and data mining and curation services to provide even further discoverability.
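    As one small example of the “scripted tools to check integrity” idea, here is a minimal Python sketch that verifies submitted files against a manifest of SHA-256 checksums. The manifest format (a JSON map of file name to checksum) is an assumption for illustration, not an existing submission standard.

    ```python
    import hashlib
    import json
    from pathlib import Path

    def verify_manifest(manifest_path):
        """Return the names of files whose checksum does not match the manifest."""
        manifest = json.loads(Path(manifest_path).read_text())
        failures = []
        for name, expected in manifest.items():
            digest = hashlib.sha256(Path(name).read_bytes()).hexdigest()
            if digest != expected:
                failures.append(name)
        return failures

    # Example usage (paths are placeholders):
    # bad_files = verify_manifest("submission/manifest.json")
    ```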

    Where should NLM invest its resources to accelerate the discovery, use, and impact of scientific communication?

    >>Partner with other funders doing research and development to test new technologies, modeling approaches, and the delivery of new data models. Hold stakeholder meetings to discuss these initiatives and how they may best be implemented and delivered to serve a wide set of players in scientific communication, including what impact they would have on players in this network.

  4. Wow, sml, thank you! I appreciate your thorough and thoughtful reply, and your great insights. I have incorporated these and your fellow commenters’ ideas into my remarks for today at the Journal Summit.
