In medicine, “There’s a quiet revolution happening at the moment,” says professor of systems biology Debora Marks. Most people have become familiar with artificial intelligence through chatbots such as ChatGPT, which function by predicting the next word in a sequence based on patterns learned from vast amounts of Internet text. But researchers at Harvard Medical School are applying generative AI’s predictive capabilities to biological and evolutionary data, creating models that can predict viral evolution, design never-before-seen proteins, and anticipate the effects of genetic mutations. “The coming together of these new AI methods with the power of evolutionary information and biological data,” Marks says, “is giving us an opportunity to do things that were really closed doors before.”
Researchers in Marks’s lab made a breakthrough in the use of AI to study biological data in 2021, when they developed EVE, short for Evolutionary model of Variant Effect. They trained EVE to detect patterns of genetic variation across the genomes of hundreds of thousands of nonhuman species—then to predict, based on that data, whether similar human genetic mutations would cause disease. This addressed a longstanding challenge in biological research: though scientists have developed increasingly advanced technology for sequencing human genomes, they have struggled to discern the significance of many of the genetic variations they identified. Which are benign, and which are disease-causing? In a 2021 paper, Marks and colleagues found that EVE could make that distinction in genes related to conditions such as cancer and heart rhythm disorders.
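The idea of reading evolutionary data for clues about mutations can be illustrated with a toy sketch. This is not EVE itself, which the lab describes as a generative AI model trained across hundreds of thousands of species. It is a much simpler stand-in that counts how often each amino acid appears at each position in a small, made-up alignment and scores a mutation by how rare the new residue is there. The sequences, the `column_frequencies` and `variant_score` helpers, and the pseudocount are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy alignment: the same short protein region in six species.
ALIGNMENT = [
    "MKTAY",
    "MKTAY",
    "MKSAY",
    "MKTAF",
    "MKTAY",
    "MRTAY",
]
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(alignment, pseudocount=1.0):
    """Return smoothed amino-acid frequencies for each alignment position.

    The pseudocount (an assumption of this sketch) gives residues never
    observed at a position a small nonzero probability.
    """
    length = len(alignment[0])
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    freqs = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in alignment)
        freqs.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return freqs


def variant_score(freqs, pos, ref, alt):
    """Log-ratio of the mutant residue's frequency to the reference residue's.

    More negative means the mutant residue is rarer across species at this
    position, which this toy treats as a hint that it is less tolerated.
    """
    return math.log(freqs[pos][alt] / freqs[pos][ref])


freqs = column_frequencies(ALIGNMENT)
tolerated = variant_score(freqs, 2, "T", "S")  # S appears in one species
unseen = variant_score(freqs, 0, "M", "W")     # W never appears at position 0
print(f"T->S at position 2: {tolerated:.3f}")
print(f"M->W at position 0: {unseen:.3f}")
```

In this toy, the mutation that evolution has never tried at a fully conserved position scores lower than one already seen in another species. A model like EVE goes well beyond such per-position counting, but the underlying intuition of using variation across species as evidence is the same.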
During the COVID pandemic, Marks and her lab colleagues realized this technology could also help them respond to the quickly evolving virus. They adapted EVE to create EVEscape, a tool designed to predict viral variants before they emerge. EVEscape consists of two parts: an AI model trained on evolutionary sequences—which reveal how similar viruses have evolved in the past—and biological and structural information about the current virus. Had EVEscape been used at the beginning of the pandemic, lab members reported in a 2023 paper, it would have anticipated the most frequent mutations and the most consequential variants of the COVID virus that actually developed and spread.
This work is a major break from traditional vaccine and therapeutic design, which relies either on costly, slow experiments based on animal testing or on data generated during a disease outbreak in humans. The limitations of the traditional approach became evident during the pandemic, says Noor Youssef, a researcher who works with Marks. “We’ve had to resort to these annual boosters, where every year we’re getting a new vaccine that matches the current strain,” she says. “What these generative models allow us to do is see ahead of time where the virus is going to evolve, so you can make a vaccine that is future-proof,” that is, responsive to both current and potential future variants. Marks and her colleagues have modified EVEscape to create EVEvax, which designs vaccines tailored to predicted mutations, and are using this technology to develop a vaccine against sarbecoviruses, the subgenus that includes SARS-CoV-2. The new vaccine would be effective against COVID and other commonly circulating coronaviruses that cause the common cold.
They have also received funding from Project CEPI (the Coalition for Epidemic Preparedness Innovations) to develop a long-lasting vaccine for bird flu. That disease hasn’t yet spread widely in humans, but when it does, it will likely evolve rapidly to overcome human immunity. The scientists aim to develop, as early as next spring, a vaccine responsive to those future changes. “There are already FDA-approved vaccines in the freezer, based on the strains from a few years ago,” Youssef says. But with the help of EVEscape, “you can have something in the freezer that’s going to work for the strains that are around now, but also work against things that might arise in the future.”
Generative AI has also enabled researchers to design new proteins, such as antibodies that attack certain viral mutations. Using the AI technology from EVE and EVEscape, the Marks lab developed AI models that are trained on protein sequences. These models generate new sequences tailored to designated goals and also assess whether those generated sequences will result in functional proteins. Similarly, when ChatGPT is trained on text data, it learns not only which words are associated with each other but also the structure of language: how grammar rules constrain the shape of its outputs. Like large language models, AI protein design models are “going to try to understand the biochemical constraints that underpin the function of those proteins,” says Pascal Notin, a machine-learning specialist in the Marks lab.
In addition to creating new virus-specific antibodies, these protein-design models can be used to combat genetic diseases that cause a loss or malfunction of enzymes, proteins that catalyze biochemical reactions and enable the body to break down biological waste. Patients with such conditions are typically treated with enzyme replacement therapy (ERT); the AI tools can help design more stable, effective enzymes for such treatment.
Marks says these models signal a fundamental shift in how research is conducted because, for the first time, “we’ve been able to make predictions without [the preliminary experimentation] process”—predictions that can then be tested and refined by more focused experiments. Researchers have long had access to the data on which such models are trained: the billions of DNA and RNA sequences that make up the genomes of hundreds of thousands of species and viral strains. But this trove of data was simply too large for individuals to fully parse. By detecting patterns and making predictions, generative AI has enabled scientists to unlock that data’s value. “Evolutionary information, human population sequencing, and viral sequencing,” Marks emphasizes, “are much more powerful than anybody thought they would be.”