Using Generative AI to Predict Viral Mutations and Develop Vaccines

Biomedical researchers are using AI to survey the viral landscape.

Illustration by Matt Chinworth

“There’s a quiet revolution happening at the moment” in medicine, says professor of systems biology Debora Marks. Most people have become familiar with artificial intelligence through chatbots such as ChatGPT, which function by predicting the next word in a sequence based on patterns learned from vast amounts of Internet text. But researchers at Harvard Medical School are applying generative AI’s predictive capabilities to biological and evolutionary data, creating models that can predict viral evolution, design never-before-seen proteins, and anticipate the effects of genetic mutations. “The coming together of these new AI methods with the power of evolutionary information and biological data,” Marks says, “is giving us an opportunity to do things that were really closed doors before.”

Researchers in Marks’s lab made a breakthrough in the use of AI to study biological data in 2021, when they developed EVE, short for Evolutionary model of Variant Effect. They trained EVE to detect patterns of genetic variation across the genomes of hundreds of thousands of nonhuman species—then to predict, based on that data, whether similar human genetic mutations would cause disease. This addressed a longstanding challenge in biological research: though scientists have developed increasingly advanced technology for sequencing human genomes, they have struggled to discern the significance of many of the genetic variations they identified. Which are benign, and which are disease-causing? In a 2021 paper, Marks and colleagues found that EVE could make that distinction in genes related to conditions such as cancer and heart rhythm disorders.
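To make the underlying intuition concrete, here is a deliberately simplified sketch in Python. It is not EVE itself, which is a far more sophisticated generative model; it only illustrates the general idea that a mutation rarely seen across species at a given position is more suspect than one that evolution has tolerated. The aligned sequences, the function names, and the scoring rule are all invented for illustration.

```python
import math
from collections import Counter

# Hypothetical toy alignment: the same short protein region from six species.
# These sequences are invented; real models learn from vastly larger alignments.
ALIGNMENT = [
    "MKVLA",
    "MKVLG",
    "MKILA",
    "MRVLA",
    "MKVLA",
    "MKVLS",
]

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(alignment, position, pseudocount=1.0):
    """Estimate how often each amino acid appears at one aligned position.

    The pseudocount keeps unseen amino acids from getting a frequency of zero.
    """
    counts = Counter(seq[position] for seq in alignment)
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def variant_score(alignment, position, wild_type, mutant):
    """Log-ratio of mutant to wild-type frequency at a position.

    A strongly negative score means the mutant residue is rare there compared
    with the usual one, which this toy treats as a hint of possible harm.
    """
    freqs = column_frequencies(alignment, position)
    return math.log(freqs[mutant] / freqs[wild_type])


if __name__ == "__main__":
    # First position: every species has M, so swapping in W looks risky.
    print(variant_score(ALIGNMENT, 0, "M", "W"))
    # Last position: A, G, and S all occur, so A to G looks more tolerable.
    print(variant_score(ALIGNMENT, 4, "A", "G"))
```

In this toy, the change from M to W at the fully conserved first position scores lower than the change from A to G at the variable last position. A model like EVE goes much further, learning how positions depend on one another rather than treating each in isolation.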

During the COVID pandemic, Marks and her lab colleagues realized this technology could also help them respond to the quickly evolving virus. They adapted EVE to create EVEscape, a tool designed to predict viral variants before they emerge. EVEscape consists of two parts: an AI model trained on evolutionary sequences—which reveal how similar viruses have evolved in the past—and biological and structural information about the current virus. Had EVEscape been used at the beginning of the pandemic, lab members reported in a 2023 paper, it would have anticipated the most frequent mutations and the most consequential variants of the COVID virus that actually developed and spread.

This work is a major break from traditional vaccine and therapeutic design, which relies either on costly, slow animal experiments or on data generated during a disease outbreak in humans. The limitations of the traditional approach became evident during the pandemic, says Noor Youssef, a researcher who works with Marks. “We’ve had to resort to these annual boosters, where every year we’re getting a new vaccine that matches the current strain,” she says. “What these generative models allow us to do is see ahead of time where the virus is going to evolve, so you can make a vaccine that is future-proof,” meaning one that is responsive to both current and potential future variants. Marks and her colleagues have modified EVEscape to create EVEvax, which designs vaccines tailored to predicted mutations, and are using this technology to develop a vaccine for sarbecoviruses, the subgenus that includes SARS-CoV-2. The new vaccine would be effective against COVID and other commonly circulating coronaviruses that cause the common cold.

They have also received funding from Project CEPI (the Coalition for Epidemic Preparedness Innovations) to develop a long-lasting vaccine for bird flu. That disease hasn’t yet spread widely in humans, but when it does, it will likely evolve rapidly to overcome human immunity. The scientists aim to develop, as early as next spring, a vaccine responsive to those future changes. “There are already FDA-approved vaccines in the freezer, based on the strains from a few years ago,” Youssef says. But with the help of EVEscape, “you can have something in the freezer that’s going to work for the strains that are around now, but also work against things that might arise in the future.”

Generative AI has also enabled researchers to design new proteins, such as antibodies that attack certain viral mutations. Building on the AI technology from EVE and EVEscape, the Marks lab developed AI models that are trained on protein sequences. These models generate new sequences tailored to designated goals and also assess whether those generated sequences will result in functional proteins. Similarly, when ChatGPT is trained on text data, it learns not only which words are associated with each other but also the structure of language: how grammar rules constrain the shape of its outputs. Like large language models, AI protein design models are “going to try to understand the biochemical constraints that underpin the function of those proteins,” says Pascal Notin, a machine-learning specialist in the Marks lab.

In addition to creating new virus-specific antibodies, these protein-design models can be used to combat genetic diseases that cause a loss or malfunction of enzymes, proteins that catalyze biochemical reactions and enable the body to break down biological waste. Patients with such conditions are typically treated with enzyme replacement therapy (ERT); the AI tools can help design more stable, effective enzymes for such treatment.

Marks says these models signal a fundamental shift in how research is conducted because, for the first time, “we’ve been able to make predictions without [the preliminary experimentation] process”—predictions that can then be tested and refined by more focused experiments. Researchers have long had access to the data on which such models are trained: the billions of DNA and RNA sequences that make up the genomes of hundreds of thousands of species and viral strains. But this trove of data was simply too large for individuals to fully parse. By detecting patterns and making predictions, generative AI has enabled scientists to unlock that data’s value. “Evolutionary information, human population sequencing, and viral sequencing,” Marks emphasizes, “are much more powerful than anybody thought they would be.”

By Nina Pasquini
