When considering the risk of a given disease—cancer, cardiovascular problems, Alzheimer’s—what matters more: the genes inherited from parents and grandparents, or the environment? Is disease influenced more by DNA, or by factors such as air pollution levels, socioeconomic status, or even regional weather conditions?
It’s common to think of disease and health “as this tension of ZIP code versus genetic code,” explains Chirag Patel, assistant professor of biomedical informatics at Harvard Medical School.
But a study by Patel and his research team challenges this “either-or” thinking, using Big Data to tease apart the complex interplay of environment, genes, and other factors in disease. They analyzed an insurance database of almost 45 million people in the United States, Patel explains, zeroing in on 700,000 pairs of non-twin siblings and 56,000 pairs of twins, in what is likely the largest study of twin pairs to date. Studying identical twins is a common way to consider nature-versus-nurture questions because such siblings have identical genes and often grow up in the same environment. In typical twin studies, researchers must recruit participants and examine just one or two diseases at a time. But this massive preexisting database enabled Patel and his team to consider 560 different diseases at the same time.
“You have this huge sample size, which we all love in science,” he says, “but these types of data are not meant for this work.” Preparing the database for study was therefore a challenge. Because the data did not specify which siblings were twins, for example, postdoctoral fellow Chirag Lakhani, who led the analyses, isolated the twins by searching for family members born on the same day. The team also had to determine which twins were identical (with identical DNA) and which fraternal. Male-female twin pairs cannot be identical, but same-sex twins have an equal chance of being identical or fraternal. Working with colleagues at the University of Queensland in Australia, the Harvard team developed a statistical technique for estimating which of the same-sex pairs were identical. When they compared their findings with previous small-scale studies on twins and disease, “we found by and large that there was a strong correlation with the things that we were seeing,” Patel says.
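The birth-date step described above can be illustrated with a short sketch. This is not the team's actual code: the record fields (`family_id`, `person_id`, `birth_date`, `sex`) and the function name `find_twin_pairs` are hypothetical, and the statistical technique for estimating which same-sex pairs are identical is not reproduced here.

```python
from collections import defaultdict
from itertools import combinations


def find_twin_pairs(members):
    """Return candidate twin pairs: family members who share a birth date.

    `members` is an iterable of dicts with the hypothetical keys
    'family_id', 'person_id', 'birth_date', and 'sex'.
    """
    # Group people by (family, birth date); a group of two or more people
    # is a candidate set of twins (or triplets, and so on).
    groups = defaultdict(list)
    for m in members:
        groups[(m["family_id"], m["birth_date"])].append(m)

    pairs = []
    for same_day in groups.values():
        # A group of three would yield three pairs; singletons yield none.
        for a, b in combinations(same_day, 2):
            # Opposite-sex pairs cannot be identical. Same-sex pairs may be
            # identical or fraternal, which this sketch cannot determine.
            kind = "opposite-sex" if a["sex"] != b["sex"] else "same-sex"
            pairs.append((a["person_id"], b["person_id"], kind))
    return pairs
```

Pairs labeled `opposite-sex` can be treated as fraternal; pairs labeled `same-sex` would still need a separate estimate of how likely each is to be identical.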
Of the 560 diseases studied, 40 percent had some genetic component, while the shared environment (elements such as air quality and average temperatures) played a role in at least 20 percent of the diseases. Unsurprisingly, most diseases involved a mix of genetic and environmental factors. But some conditions stood out for the strength of their genetic links, including pervasive developmental disorders such as attention deficit hyperactivity disorder, and psychiatric diseases such as schizophrenia or depression. In contrast, lead poisoning and eye diseases such as myopia and astigmatism were the most heavily influenced by environment.
The researchers acknowledge some gaps in their work. For example, all people in the study were covered by employer-sponsored health insurance, so at least one person in the family had a job, which made it complicated to sort out the influence of income on disease. “Trying to dig deeper into that question is a priority for us,” Patel says. In the future he hopes to do similar work with Medicare or Medicaid data, “which has coverage for people who would be facing health disparities.” Moreover, none of the subjects were more than 24 years old, so the study couldn’t capture how the influence of genes and environment might change as people enter middle age and beyond. Nor could the researchers explore how changes in an environment over time might influence health.
The work is important for confirming that large datasets can help researchers examine how numerous genetic and environmental factors interact at the same time, although Lakhani stresses that it takes painstaking effort to ensure that the data are used accurately. But the research also raises intriguing questions about additional disease factors. “For diseases that have neither a large shared environment, nor genetic, component,” Patel says, “we, the scientific community, need to get more serious about measuring specific environmental factors, such as diet, that can make twins different, or figure out how much is actually due to random chance.”