George Box, a famous statistician, once remarked, “All models are wrong, but some are useful.”
As representations or approximations of real-world phenomena, models, when done well, can be very useful. In fact, they serve as the third leg of the stool that is data-driven discovery, joining the published literature and its underlying data to give investigators the materials necessary to explore important dynamics in health and biomedicine.
By isolating and replicating key aspects within complex phenomena, models help us better understand what’s going on and how the pieces or processes fit together.
Because of the complexity within biomedicine, health care research must employ different kinds of models, depending on what’s being looked at.
Regardless of the type used, however, models take time to build, because the model builder must first understand the elements of the phenomena that must be represented. Only then can she select the appropriate modeling tools and build the model.
Tracking and storing existing models can help shorten that process.
Not only would tracking models enable re-use—saving valuable time and money—but doing so would enhance the rigor and reproducibility of the research itself by giving scientists the ability to see and test the methodology behind the data.
As we’ve done for the literature, libraries can help document and preserve models and make them discoverable.
The first step is identifying and collecting useful models.
Second, we’d have to apply metadata to describe the models. Among the essential elements to include in such descriptions might be model type, purpose, key underlying assumptions, referent scale, and indicators of how and when the model was used.
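To make that list concrete, here is one way such a descriptive record might be laid out. This is only a sketch: the class and field names (ModelRecord, ModelUsage, referent_scale, and so on) are hypothetical, not an existing NLM schema, and the sample values are placeholders rather than a real model.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class ModelUsage:
    """One indicator of how and when a model was used (hypothetical shape)."""
    study: str  # free-text citation or identifier of the study
    year: int   # year the model was used


@dataclass
class ModelRecord:
    """Descriptive metadata for a research model (field names are assumptions)."""
    title: str
    model_type: str         # e.g. "compartmental", "agent-based"
    purpose: str            # what question the model was built to explore
    assumptions: List[str]  # key underlying assumptions
    referent_scale: str     # e.g. "cell", "individual", "population"
    usage: List[ModelUsage] = field(default_factory=list)


# Placeholder values for illustration only; not a real model or study.
record = ModelRecord(
    title="Example epidemic model (placeholder)",
    model_type="compartmental",
    purpose="Explore how an infection spreads through a population",
    assumptions=["closed population", "homogeneous mixing"],
    referent_scale="population",
    usage=[ModelUsage(study="Example study (placeholder)", year=2020)],
)

# Convert to a plain dictionary, e.g. for storage or export as JSON.
record_dict = asdict(record)
print(record_dict["model_type"], record_dict["referent_scale"])
```

Keeping the assumptions and usage history as lists lets a record grow as a model is re-used, without changing its overall shape.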
We’d then need to apply one or more unique identifiers to help with curation. Currently, two different schemes provide principled ways to identify models: the Digital Object Identifier (DOI) and the Research Resource Identifier (RRID). The former provides a persistent, unique code to track an item or entity at an overarching level (e.g., an article or book). The latter documents the main resources used to produce the scientific findings in that article or book (e.g., antibodies, model organisms, computational models).
Just as clicking on an author’s name in PubMed can bring up all the articles he or she has written, these interoperable identifiers, once assigned to research models, make it possible to connect the studies employing those models. Effectively, these identifiers can tie together the three components that underpin data-driven discovery—the literature, the supporting data, and the analytical tools—thus enhancing discoverability and streamlining scientific communication.
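The linking idea can be sketched in a few lines. The example below assumes each study record lists the identifiers of the models it used; the function name build_model_index, the record layout, and the identifier strings are all hypothetical placeholders, not real DOIs or RRIDs.

```python
from collections import defaultdict


def build_model_index(studies):
    """Map each model identifier to the titles of the studies that cite it."""
    index = defaultdict(list)
    for study in studies:
        for model_id in study["model_ids"]:
            index[model_id].append(study["title"])
    return dict(index)


# Placeholder records; the identifiers are made up for illustration.
studies = [
    {"title": "Study A (placeholder)", "model_ids": ["model:0001"]},
    {"title": "Study B (placeholder)", "model_ids": ["model:0001", "model:0002"]},
    {"title": "Study C (placeholder)", "model_ids": []},
]

index = build_model_index(studies)

# Look up every study that employed a given model.
for title in index.get("model:0001", []):
    print(title)
```

Looking up a model identifier then works much like clicking an author's name: one key returns every study connected to it.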
NLM’s long-standing role in collecting, organizing, and making available the biomedical literature positions us well to take on the task of tracking research models, but is that something we should do?
If so, what might that library of models look like? What else should it include? And how useful would this library of models be to you?