As of Thursday 14 December, 2016, a new batch of data from Enroll-HD is now available through the website. This is the third periodic dataset release—a set of de-identified records prepared for research—but it’s substantially different from the previous two. It is more than double the size, including information from 8,714 people. And because it folds in information collected during REGISTRY, the European study preceding Enroll-HD, it now stretches far into the past—for some participants, as far back as 15 years. It’s a longitudinal dataset, meaning that it includes repeat data points on the same people over time.
This large longitudinal dataset enables a new category of research: Projects that analyze how and why people with the HD gene expansion change with time. “It gives you a much more detailed picture of the progression of this disease,” says Jen Ware, director of experimental design at CHDI. “That’s the goal of a longitudinal study: Helping us understand the dynamic spectrum of the disease.” It makes it possible to ask how symptoms, biological characteristics, thinking and behavior change, and why some people change more quickly than others.
Each participant’s data in Enroll-HD is de-identified and recoded twice. The dataset includes medical information and results from tests of cognition, memory, physical ability and quality of life. It is made available to anyone affiliated with a recognized research institute, university, nonprofit organization or biomedical company; researchers must apply for an online user account, provide a description of their proposed project, and sign an agreement promising not to attempt to identify study participants.
The first two Enroll-HD releases included only data collected since 2012, when the study launched. Although Enroll-HD now includes more than 12,000 participants, many of those people have only had one or two study visits so far.
Starting in late 2013, European study sites that had been part of REGISTRY began transitioning to Enroll-HD. Once a site officially makes the transition, its REGISTRY participants can join Enroll-HD. As each person comes in for their annual visit and signs the official consent forms to become part of Enroll-HD, their legacy REGISTRY data is added to the Enroll-HD database. When these longer-term participants join Enroll-HD, they bring years of data with them.
This most recent dataset includes data from 3,598 people who were originally part of REGISTRY; almost 1,300 of these participants have at least five visits’ worth of data. “The total number of study visits is growing rapidly,” says Eileen Neacy, chief operating officer at CHDI.
More data, new questions
Studies that include long histories for each participant are rare in medical research because they are difficult and expensive to maintain. But they are extremely valuable because they offer a dimension of information that can’t be captured any other way. In HD, understanding why the disease takes a unique path in each person may provide clues regarding how to slow the progression of the disease.
Enroll-HD data is already being used by researchers. Many are building new models of the disease—developing ways to predict and explain how HD advances.
Others are looking at how certain symptoms emerge and progress. For example, one research group is using the data to understand how the relationship between depression and cognitive function affects the course of HD. They’d studied that relationship among their own HD clinic patients and wanted to explore the pattern in a larger group of people. “Our analyses were promising but certainly limited by our small sample, and so we are currently in the process of obtaining the third Enroll-HD dataset and re-running our analyses using this much larger sample,” says Kevin Manning, assistant professor of psychiatry at UConn Health. “The richness of these data will be a great help in addressing our questions.” Another project explores whether drugs to treat motor symptoms or psychiatric problems might also affect cognition.
Because Enroll-HD is a rich and well-organized source of data, it allows researchers to ask interesting questions. Steve Horvath, professor of human genetics and biostatistics at the David Geffen School of Medicine at University of California, Los Angeles, recently made a remarkable discovery: Certain markers (called methylation sites) on DNA can be read like a body clock, and can indicate not only how old someone is but also predict how long they are likely to live. In order to show that the pattern was real and not just a coincidence, he and his collaborators needed to analyze a lot of data from a large number of people over time. A team of researchers from around the world analyzed blood samples from 13,000 people enrolled in a range of studies. In September this year they reported that this method, when combined with information about the types of blood cells, accurately predicts how long people will live, a result that made newspaper headlines.
Until now, no major studies have looked at these markers in people with HD. In a pilot study, Horvath’s group found that HD accelerates this body clock, at least in tissue taken from the brain. Next they used nearly a thousand blood samples from Enroll-HD, as well as samples from earlier HD studies, to find that the same pattern is detectable in blood cells as well, a result that they’re getting ready to publish in a scientific journal. In effect, blood cells of people with HD are biologically slightly older than they should be. Horvath now wants to explore whether this is cause or effect: Do these altered methylation patterns play a role in the progression of the disease? The finding also suggests a new approach to fighting HD: slowing the process of biological aging. Obviously, we don’t yet know how to do that.
Getting the job done
Protecting participants’ privacy, checking the quality of the data, and merging the two studies together amount to a major effort. To make it happen, an international team talks every week and meets in person three times a year. A statistical team flies in from Lisbon, the programmers come from Germany, and the trial management group travels from Philadelphia. The group spends several days in marathon meetings to resolve all the questions and problems. “It’s an intense process,” says Ware.
The statistical team’s job is to make sure that it’s extremely unlikely that any participant can be re-identified through the data, and to flag and withhold any records that might pose that risk. As the study grows larger, the chance that any one participant could be re-identified shrinks, simply because more people of every possible description have joined the study. For the first data release in early 2015, data from fewer than one-third of the total participants were included, in part to make sure that people with unusual characteristics couldn’t be re-identified. This week’s data release includes more than 70 percent of all participants in the study, and the percentage will increase with future releases.
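For readers who want a concrete picture of what flagging risky records can look like, here is a minimal sketch in Python. It is an illustration only and not the Enroll-HD statistical team’s actual procedure, which this article does not describe. It assumes a simple rule: a record is flagged if fewer than k people in the dataset share its combination of descriptive traits. The function name, the field names (age_band, region) and the threshold k are all hypothetical.

```python
from collections import Counter

def flag_rare_records(records, traits, k=5):
    """Return the positions of records whose combination of traits
    is shared by fewer than k records in the dataset.

    Hypothetical helper for illustration; not the study's real method.
    """
    # Build one tuple of trait values per record, e.g. ("40-49", "Europe").
    keys = [tuple(record[t] for t in traits) for record in records]
    counts = Counter(keys)
    return [i for i, key in enumerate(keys) if counts[key] < k]

# Made-up example data: three people with a common description, one unusual.
records = [
    {"age_band": "40-49", "region": "Europe"},
    {"age_band": "40-49", "region": "Europe"},
    {"age_band": "40-49", "region": "Europe"},
    {"age_band": "80-89", "region": "Australasia"},
]

flagged = flag_rare_records(records, ["age_band", "region"], k=2)
print(flagged)  # [3]: only the last record has a unique description

# As the study grows, another person with the same description joins,
# and the previously unusual record no longer stands out.
records.append({"age_band": "80-89", "region": "Australasia"})
flagged_after_growth = flag_rare_records(records, ["age_band", "region"], k=2)
print(flagged_after_growth)  # []
```

The second call shows the effect described above: once more people of the same description have joined, a record that was withheld can be included.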
The programmers are responsible for the data extraction: making sure the right information is selected and that it’s structured properly for scientific study. The study management team combs through the data for missing or incorrect entries. “You want to make sure you’re working with quality data,” says Ware. “If there are data entry errors, you don’t want something being included that could skew results.” If a record has bits of information missing, or contradictory entries, the team will flag it to be resolved. These problems must be fixed by hand, sometimes even by checking with the paper records at the study site.
Each data release also includes supporting documents such as tutorial guides, a data dictionary that explains each term, and protocols for each study included in the database. “Everything you need to see how these data have been collected is provided,” says Ware. “It takes a village to get all this together.”
Enroll-HD’s growth spurt isn’t over. The process of migrating participants and their data from REGISTRY to Enroll-HD is still ongoing; 50 European sites have yet to finalize the transition. Neacy anticipates that once these sites officially join Enroll-HD, the study will add another 6,000 to 8,000 participants. It’s possible this will happen as soon as the end of 2017, meaning it would end the year at close to 20,000 participants—the original goal of the study.