Adventures of a Computational Biologist in the Genome Space

Guest post by Dr. Eugene Koonin, NLM National Center for Biotechnology Information.

More than 30 years ago, when I started my research in computational biology (yes, it has been a while), it was not at all clear that one could do biological research using computers alone. Indeed, the common perception was that real insights into how organisms function and evolve could only be gained in the lab or in the field.

As so often happens in the history of science, that all changed when a new type of data arrived on the scene. Genetic sequence information, the blueprint for building all organisms, gave computational biology a foothold it has never relinquished.

As early as the 1960s, some prescient researchers, first among them Margaret Dayhoff at Georgetown University, foresaw genetic sequences becoming a key source of biological information, though this view was far from mainstream biology at the time. Through the 1980s, however, the trickle of sequences grew into a steady stream, and by the mid-1990s, the genomic revolution was upon us.

I still remember as if it were yesterday the excitement that overwhelmed me and my NCBI group in the waning days of 1995, when J. Craig Venter’s team released the first couple of complete bacterial genomes. Suddenly, the sequence analysis methods on which we and others had been working in relative obscurity had a role in trying to understand the genetic core of life. Soon after, my colleague, Arcady Mushegian, and I reconstructed a minimal cellular genome that attracted considerable attention, stimulating experiments that confirmed how accurate our purely computational effort had been.

Now, 22 years after the appearance of those first genomes, GenBank and related databases contain hundreds of thousands of genomic sequences encompassing millions of genes, and the utility and importance of computational biology are no longer a matter of debate. Indeed, biologists cannot possibly study even a sizable fraction of those genes experimentally, so, at the moment, computational analysis provides the only way to infer their biological functions.

Computational approaches have also made possible many crucial biological discoveries. Two examples in which my NCBI colleagues and I have been actively involved are elucidating the architecture of the BRCA1 protein that, when impaired, can lead to breast cancer, and predicting the mode of action of CRISPR systems. Both findings sparked extensive experimentation in numerous laboratories all over the world. And, in the case of CRISPR, those experiments culminated in the development of a new generation of powerful genome-editing tools that have opened up unprecedented experimental opportunities and are likely to have major therapeutic potential.

But science does not stand still. Quite the contrary, it moves at an ever-accelerating pace and is prone to taking unexpected turns. Next week, I’ll explore one recent turn that has set us on a new path of discovery and understanding.

Eugene Koonin, PhD, has served as a senior investigator at NLM’s National Center for Biotechnology Information since 1996, after working for five years as a visiting scientist. He has focused on the fields of computational biology and evolutionary genomics since 1984.