Zip Code vs. Genetic Code

Illustration by Dan Page

When considering the risk of a given disease—cancer, cardiovascular problems, Alzheimer’s—what matters more: the genes inherited from parents and grandparents, or the environment? Is disease influenced more by DNA, or by factors such as air pollution levels, socioeconomic status, or even regional weather conditions?

It’s common to think of disease and health “as this tension of ZIP code versus genetic code,” explains Chirag Patel, assistant professor of biomedical informatics at Harvard Medical School.

But a study by Patel and his research team challenges this “either-or” thinking, using Big Data to tease apart the complex interplay of environment, genes, and other factors in disease. They analyzed an insurance database of almost 45 million people in the United States, Patel explains, zeroing in on 700,000 pairs of non-twin siblings and 56,000 pairs of twins, in what is likely the largest study of twin pairs to date. Studying identical twins is a common way to consider nature-versus-nurture questions because such siblings have identical genes and often grow up in the same environment. In typical twin studies, researchers must recruit participants and examine just one or two diseases at a time. But this massive preexisting database enabled Patel and his team to consider 560 different diseases at the same time.

“You have this huge sample size, which we all love in science,” he says, “but these types of data are not meant for this work.” Preparing the database for study was therefore a challenge. Because the data did not specify which siblings were twins, for example, postdoctoral fellow Chirag Lakhani, who led the analyses, isolated the twins by searching for family members born on the same day. The team also had to determine which twins were identical (with identical DNA) and which fraternal. Male-female twin pairs cannot be identical, but same-sex twins have an equal chance of being identical or fraternal. Working with colleagues at the University of Queensland in Australia, the Harvard team developed a statistical technique for estimating which of the same-sex pairs were identical. When they compared their findings with previous small-scale studies on twins and disease, “we found by and large that there was a strong correlation with the things that we were seeing,” Patel says.
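The twin-finding step described above can be sketched in a few lines of Python. This is an illustration only, not the team's code: the record layout (family ID, person ID, birth date, sex) and both function names are assumptions, and the estimate of identical pairs uses Weinberg's differential rule, a classical shortcut, rather than the statistical technique the Harvard and Queensland researchers developed. The rule assumes fraternal pairs are as likely to be same-sex as opposite-sex, so the fraternal same-sex count is taken to equal the opposite-sex count.

```python
from collections import defaultdict


def find_twin_pairs(members):
    """Return (person_a, person_b, same_sex) for family members sharing a birth date.

    `members` is an iterable of (family_id, person_id, birth_date, sex) tuples;
    this layout is hypothetical, not the insurance database's real schema.
    """
    groups = defaultdict(list)
    for family_id, person_id, birth_date, sex in members:
        groups[(family_id, birth_date)].append((person_id, sex))

    pairs = []
    for group in groups.values():
        if len(group) == 2:  # keep pairs only; triplets and larger are skipped
            (id_a, sex_a), (id_b, sex_b) = group
            pairs.append((id_a, id_b, sex_a == sex_b))
    return pairs


def weinberg_identical_fraction(pairs):
    """Estimate the share of same-sex pairs that are identical (Weinberg's rule)."""
    same_sex = sum(1 for _, _, same in pairs if same)
    opposite_sex = len(pairs) - same_sex
    if same_sex == 0:
        return 0.0
    # Fraternal same-sex pairs are assumed to be as numerous as opposite-sex pairs.
    identical = max(same_sex - opposite_sex, 0)
    return identical / same_sex
```

A rule like this yields only a population-level proportion; it cannot say whether any particular same-sex pair is identical, which is part of why a more careful statistical method was needed.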

Of the 560 diseases studied, 40 percent had some genetic component, while the shared environment (elements such as air quality and average temperatures) played a role in at least 20 percent of the diseases. Unsurprisingly, most diseases involved a mix of genetic and environmental factors. But some conditions stood out for the strength of their genetic links, including pervasive developmental disorders such as attention deficit hyperactivity disorder, and psychiatric diseases such as schizophrenia or depression. In contrast, lead poisoning and eye diseases such as myopia and astigmatism were the most heavily influenced by environment.

The researchers acknowledge some gaps in their work. For example, all people in the study were covered by employer-sponsored health insurance, so at least one person in the family had a job, which made it complicated to sort out the influence of income on disease. “Trying to dig deeper into that question is a priority for us,” Patel says. In the future he hopes to do similar work with Medicare or Medicaid data, “which has coverage for people who would be facing health disparities.” Moreover, none of the subjects were more than 24 years old, so the study couldn’t capture how the influence of genes and environment might change as people enter middle age and beyond. Nor could the researchers explore how changes in an environment over time might influence health.

The work is important for confirming that large datasets can help researchers examine how numerous genetic and environmental factors interact at the same time, although Lakhani stresses that it takes painstaking effort to ensure that the data are used accurately. But the research also raises intriguing questions about additional disease factors. “For diseases that have neither a large shared environment, nor genetic, component,” Patel says, “we, the scientific community, need to get more serious about measuring specific environmental factors, such as diet, that can make twins different, or figure out how much is actually due to random chance.” 

Read more articles by Erin O'Donnell
