Benefits and Challenges of Working with Observational (Enroll-HD) Data
How the characteristics of Enroll-HD help and hinder robust observational research.
A key objective in HD research is identifying causal mechanisms that can modify disease progression. Evidence about factors that affect disease progression comes from both randomized and observational methods, each with different strengths, limitations, and sources of bias. Well-conducted randomized controlled trials (RCTs) are the ‘gold standard’ for demonstrating causality, but they are not always feasible or ethical. Observational studies are an alternative source of evidence to identify factors that affect HD progression.
There are several types of observational studies – cross-sectional, case-control, and cohort. Here we focus on cohort studies, in which a group of participants is studied at intervals over time. Gathering data prospectively affords several advantages over retrospective data collection, including greater accuracy (temporal and otherwise) of the data gathered. The longitudinal nature of cohort studies and the temporal ordering of possible exposures and outcomes also provide insight into factors that may influence onset or disease trajectory.
Strengths of observational research
Observational studies offer several benefits over RCTs, including increased generalizability of findings given that the RCT target population is typically much narrower than that of a cohort study. Observational studies can also address questions that could not be studied in RCTs for feasibility or ethical reasons, e.g., the effect of smoking or drug use on age of motor onset. Observational studies can also address multiple hypotheses, dependent on the breadth of data and number of observations available. In contrast, an RCT is designed (and powered) to address a limited number of hypotheses.
Limitations and challenges of observational studies
Observational studies can suffer from bias because exposures are not randomly allocated. Sources of bias, including confounding, can introduce spurious (i.e., non-causal) associations between variables, leading to erroneous conclusions. Such sources include confounding by indication or prognosis, survival bias, and selection bias. Additional information on these sources of bias is available here.
It is also important to ensure that the sample size of your analysis dataset affords adequate power to detect your effect of interest. Low power not only reduces the chance of detecting a true effect, but also reduces the likelihood that a statistically significant result reflects a true effect¹. We talk about this more in “Before you Begin”.
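The link between power and the trustworthiness of a significant result can be made concrete with the positive predictive value (PPV) formula discussed by Button et al.: PPV = (power × R) / (power × R + α), where R is the pre-study odds that a tested effect is real. The sketch below is illustrative only; the function name and the values chosen for R and α are assumptions, not Enroll-HD estimates.

```python
def positive_predictive_value(power: float, alpha: float, prior_odds: float) -> float:
    """Probability that a statistically significant finding reflects a true effect.

    power      -- probability of detecting a true effect (1 - beta)
    alpha      -- type I error rate (significance threshold)
    prior_odds -- pre-study odds R that a tested effect is real (assumed value)
    """
    if not (0 < power <= 1 and 0 < alpha < 1 and prior_odds > 0):
        raise ValueError("power in (0, 1], alpha in (0, 1), prior_odds > 0 required")
    true_positive_weight = power * prior_odds
    return true_positive_weight / (true_positive_weight + alpha)


# Illustrative values only: R = 0.25 means 1 in 5 tested effects is real.
for power in (0.8, 0.5, 0.2):
    ppv = positive_predictive_value(power, alpha=0.05, prior_odds=0.25)
    print(f"power={power:.1f} -> PPV={ppv:.2f}")
```

Under these assumed inputs, dropping power from 0.8 to 0.2 lowers the PPV from 0.80 to 0.50, so half of the significant findings would be false positives.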
Observational studies are a real candy-box for analysts, providing a broad range of measurements on a large sample of people. However, this cornucopia of opportunity also opens us up to vulnerabilities. The ability to test multiple hypotheses simultaneously also raises the (often subconscious) temptation to ‘cherry pick’ results, which is why it is essential to clearly lay out your objectives, hypotheses, and analysis plan in advance of data analysis (see “Before you Begin”).
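A short calculation shows why testing many hypotheses without a pre-specified plan is risky. The sketch below assumes the tests are independent and that every null hypothesis is true, which is a simplification; the function name is hypothetical.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across n independent tests.

    Assumes every null hypothesis is true and the tests are independent.
    """
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return 1 - (1 - alpha) ** n_tests


# With more hypotheses tested, a spurious 'hit' becomes increasingly likely.
for n in (1, 5, 20, 50):
    print(f"{n:>2} tests -> P(at least one false positive) = {familywise_error_rate(n):.2f}")
```

Under these assumptions, 20 unplanned tests at α = 0.05 carry roughly a 64% chance of at least one false positive, which is the kind of result that is tempting to cherry pick.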
As with all research, the quality of the data—in terms of accuracy, consistency, completeness, and timeliness—is important. Alongside robust and rigorous analytical and reporting practices, high-quality data is central to generating high-quality evidence. Temporal resolution is also important to accurately characterize exposure, outcome, and confounding factors.
Observational cohort studies in Huntington’s disease
The HD research field has amassed a number of large prospective observational cohort studies, including TRACK-HD², PREDICT-HD³, COHORT⁴, PHAROS⁵, REGISTRY⁶, and Enroll-HD⁷. These studies differ in terms of geographical coverage, sample size, length of follow-up, HD population studied, and data collected. In this article we focus on Enroll-HD, the world’s largest, active international observational study of people with HD.
Strengths of working with Enroll-HD data
Sample size and global representation. Enroll-HD is a global cohort study and clinical research platform designed to facilitate HD clinical research. Enroll-HD study sites operate in over 20 countries across North America, Latin America, Europe, and Australasia. Since the study began in July 2012, ~24,000 participants have been recruited, ~20,000 are currently enrolled, and ~19,000 are actively attending visits⁸.
Comprehensive enrolment of participants. The conceptual target population of Enroll-HD comprises HD gene-expansion carriers (HDGECs), regardless of clinical symptomatology, age, sex, or ethnicity, alongside HD family members and genotype-negative individuals, who provide ‘normative’ data for research studies requiring such a comparator group. This all-comers approach to recruitment maximizes generalizability and minimizes selection bias. Enroll-HD features broad representation of participants across the disease spectrum (see Figure 1).
Figure 1. UHDRS®’99 Total Functional Capacity (TFC) score at baseline Enroll-HD visit in HDGECs in PDS5 (release 2020-10-R1).
Breadth of available participant data. The data available in Enroll-HD are broad and comprehensive. The study protocol, including the assessment battery, was designed by clinicians and other HD specialists. It features both ‘core’ and ‘extended’ components. Core data components (which are mandatory and must be completed, or reviewed and updated, at each annual visit) include patient demographic information, HD clinical characteristics, co-morbid conditions, disease-related treatments and other therapies, and assessments of motor, functional, behavioral, and cognitive performance, including several Unified Huntington’s Disease Rating Scale (UHDRS®’99) component scales. CAG-repeat length (determined at a central laboratory) is assessed at the baseline visit for every participant, and genome-wide SNP data are available for a large subset of participants. Data on reportable events (e.g., suicide attempt) and mortality are also captured. Extended assessments (which are optional at each visit) comprise additional assessments of motor, behavioral, and cognitive function, along with quality of life assessments and health and economic impact measures. If the participant consents, family history (pedigree charting) may also be recorded, and participant biosamples (lymphoblast cell lines, PBMCs) may be collected for future research. Assessments were selected based on a systematic and comprehensive evaluation of HD. Many were selected to ensure continuity with contemporaneous HD cohort studies (e.g., REGISTRY and COHORT) and were informed by the findings of TRACK-HD and PREDICT-HD, both of which are prospective, multi-national studies designed to identify sensitive, reliable outcome measures of early-stage HD.
Frequency of data collection. Participants attend in-person visits annually during which the core Enroll-HD assessment battery is administered; optional extended assessments may also be conducted.
Participant follow-up. Per the Enroll-HD study protocol, participants are asked to participate in as many annual study visits as possible. Retention (analyzed as survival in the study) of HDGECs who are premanifest at study entry is good, with an 80% survival probability at 7 years after study entry (the event being discontinuation due to withdrawal, death, or loss to follow-up). For HDGECs who are manifest at study entry, a survival probability of ~55% is observed at 7 years.
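The text does not state how these survival probabilities were estimated. One common approach for this kind of time-to-discontinuation data is the Kaplan-Meier estimator, sketched below under that assumption. Participants still enrolled at their last observation are treated as censored. The function names and the toy data are hypothetical and are not Enroll-HD values.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier curve as a list of (time, survival probability) pairs.

    durations -- time in study for each participant (e.g., years)
    events    -- 1 if the participant discontinued at that time,
                 0 if still enrolled when last observed (censored)
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


def survival_at(curve, t):
    """Estimated probability of remaining in the study beyond time t."""
    probability = 1.0
    for time, surv in curve:
        if time > t:
            break
        probability = surv
    return probability


# Toy data (not Enroll-HD): years in study and discontinuation indicator.
durations = [1, 2, 2, 3, 5, 7, 7, 8, 8, 9]
events = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
curve = kaplan_meier(durations, events)
print(f"Estimated retention at 7 years: {survival_at(curve, 7):.2f}")
```

Treating still-enrolled participants as censored rather than dropping them avoids understating retention for those recruited more recently.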
Common data collection platform. Enroll-HD operates under a single study protocol aligned with a centrally managed electronic data capture (EDC) platform; this provides a common data collection and reporting platform for every Enroll-HD site, ensuring that the data elements captured, as well as the format and definitions of data entered, are consistent, both within and across sites. This, in turn, facilitates the implementation of centralized data QC procedures to ensure data completeness and accuracy. Internationally recognized nomenclature systems (e.g., MedDRA, ICD-10) are used.
Data quality management. Ensuring data quality and integrity is fundamental to the Enroll-HD study. EDC edit checks (including automated field completion, e.g., scale item totals) and prompts during data entry, remote centralized statistical monitoring of data (at participant and site level), onsite data monitoring, and centrally coordinated training for assessors/raters and monitors, are all designed to maximize data accuracy, consistency, and completeness.
Data accessibility and support. The Enroll-HD periodic dataset is available to any interested researcher through the Enroll-HD website, which serves as a hub to explore, access, and download the current dataset release. It also provides information to explore the dataset structure, understand and interpret the data, and publish using Enroll-HD data. Information on data quality management and participant privacy, as well as general study documents (study protocol, electronic case report forms, data dictionary) are also accessible.
Limitations and challenges of working with Enroll-HD data
Frequency of data collection. The annual periodicity of Enroll-HD participant visits can mean the data are suboptimal for addressing hypotheses that relate to a shorter time frame (e.g., the effects of an exposure over days/weeks/months). Missed visits are also observed in Enroll-HD. Some participants voluntarily miss visits – sometimes for several consecutive years – then return, while others are actively paused, for example while participating in clinical trials. The median period between in-person visits is 374 days (i.e., 1.02 years) – in line with protocol – although these data are positively skewed, with values as extreme as 2488 days (i.e., 6.82 years) observed.
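Before modeling longitudinal outcomes, it can help to inspect the spacing of visits in your own analysis dataset. The sketch below computes gaps between consecutive visits per participant and flags unusually long ones. The input layout (participant identifier and visit day), the 548-day threshold (about 1.5 years), and the toy data are all assumptions for illustration and do not reflect the actual dataset structure.

```python
from collections import defaultdict
from statistics import mean, median


def visit_intervals(visits):
    """Gaps in days between consecutive visits, pooled across participants.

    visits -- iterable of (participant_id, visit_day) pairs, where visit_day
              is days since that participant's baseline (assumed layout)
    """
    days_by_participant = defaultdict(list)
    for participant_id, visit_day in visits:
        days_by_participant[participant_id].append(visit_day)

    gaps = []
    for days in days_by_participant.values():
        days.sort()
        gaps.extend(later - earlier for earlier, later in zip(days, days[1:]))
    return gaps


def flag_long_gaps(gaps, threshold_days=548):
    """Gaps longer than the (arbitrary, assumed) threshold of ~1.5 years."""
    return [gap for gap in gaps if gap > threshold_days]


# Toy data (not Enroll-HD): participant B misses visits, then returns.
visits = [
    ("A", 0), ("A", 370), ("A", 745),
    ("B", 0), ("B", 380), ("B", 1500),
    ("C", 0), ("C", 365),
]
gaps = visit_intervals(visits)
print("median gap:", median(gaps), "days; mean gap:", mean(gaps), "days")
print("long gaps:", flag_long_gaps(gaps))
```

In a positively skewed distribution like this one, the mean gap exceeds the median, so the median is the more representative summary of typical visit spacing.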
Treatment dates. Treatment start and stop dates can be incomplete or missing. The EDC allows incomplete entries for these date variables (e.g., MM/YYYY, YYYY), as participants are often unable to recall the exact dates on which they began or ended medications and/or therapies. In Europe, Latin America, and Australasia, onsite monitors have access to participants’ medical records in addition to Enroll-HD-specific paper source documents to verify treatment dates (alongside other critical dates and data elements). This is not permitted under the current Enroll-HD informed consent form (ICF) in North America. Also, the reason for treatment discontinuation is not systematically captured; these treatment decisions are likely to be non-random.
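One cautious way to work with incomplete dates is to carry the uncertainty forward as an earliest and latest possible date, rather than silently imputing a single day. The sketch below does this for the MM/YYYY and YYYY patterns mentioned above. The DD/MM/YYYY layout for complete dates, the string input format, and the function name are assumptions for illustration; check the data dictionary for how dates are actually represented in your release.

```python
import calendar
from datetime import date


def date_bounds(raw):
    """Earliest and latest calendar dates consistent with a partial date.

    Accepts 'DD/MM/YYYY' (assumed layout for full dates), 'MM/YYYY', or
    'YYYY'. Returns None for missing values.
    """
    text = raw.strip() if raw else ""
    if not text:
        return None

    parts = [int(part) for part in text.split("/")]
    if len(parts) == 3:
        day, month, year = parts
        exact = date(year, month, day)
        return exact, exact
    if len(parts) == 2:
        month, year = parts
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    if len(parts) == 1:
        year = parts[0]
        return date(year, 1, 1), date(year, 12, 31)
    raise ValueError(f"Unrecognized date format: {raw!r}")


# Toy inputs: the width of each interval shows how uncertain the date is.
for raw in ("15/03/2017", "02/2016", "2015", ""):
    print(repr(raw), "->", date_bounds(raw))
```

Sensitivity analyses can then be run using the earliest and the latest bound in turn, to check whether conclusions about exposure timing depend on how partial dates are resolved.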
Adverse events. Since Enroll-HD is an observational study (as opposed to an RCT), adverse events and severe adverse events are not routinely captured. However, ‘reportable events’ (suicide attempt, suicide, mental health event resulting in hospitalization, and death) are recorded.
Selection bias. It is likely that Enroll-HD, like other cohort studies, suffers from a degree of selection bias, including known biases towards individuals of higher socioeconomic status and better health. Individuals with multiple or severe comorbid conditions may be unable to participate, may opt not to participate, or may not be actively recruited.
Limited ethnic diversity. Ethnic diversity in the Enroll-HD cohort is limited. Prevalence of HD differs dramatically by ethnicity and is notably lower in individuals of Asian and African descent than in individuals of Caucasian descent. However, this difference in prevalence may not fully account for the magnitude of the differences observed in Enroll-HD.
Participant follow-up. While participant retention in Enroll-HD is good, particularly for premanifest individuals, there is attrition, most notably in the manifest cohort. Annual visits can become burdensome for participants and their families, especially in the later disease stages. Missed visits are also observed in Enroll-HD, as discussed above. Participants can miss visits – sometimes over many years – then return.
¹ Button, K., Ioannidis, J., Mokrysz, C. et al. Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci 14, 365–376 (2013). https://doi.org/10.1038/nrn3475
² TRACK-HD; Tabrizi SJ, Langbehn DR, Leavitt BR, et al. Biological and clinical manifestations of Huntington’s disease in the longitudinal TRACK-HD study: cross-sectional analysis of baseline data. Lancet Neurol. 2009;8(9):791-801. doi:10.1016/S1474-4422(09)70170-X
³ PREDICT-HD; Paulsen JS, Hayden M, Stout JC, et al. Preparing for preventive clinical trials: the Predict-HD study. Arch Neurol. 2006;63(6):883-890. doi:10.1001/archneur.63.6.883
⁴ COHORT: Cooperative Huntington’s Observational Research Trial; Huntington Study Group COHORT Investigators, Dorsey E. Characterization of a large group of individuals with huntington disease and their relatives enrolled in the COHORT study [published correction appears in PLoS One. 2012;7(8). doi: 10.1371/annotation/25881bc7-922d-4472-9efd-f0896b1a3499]. PLoS One. 2012;7(2):e29522. doi:10.1371/journal.pone.0029522
⁵ PHAROS: Prospective Huntington At Risk Observational Study; The Huntington Study Group PHAROS Investigators*. At Risk for Huntington Disease: The PHAROS (Prospective Huntington At Risk Observational Study) Cohort Enrolled. Arch Neurol. 2006;63(7):991–996. doi:10.1001/archneur.63.7.991
⁶ REGISTRY: Orth M, Handley OJ, Schwenke C, et al. Observing Huntington’s Disease: the European Huntington’s Disease Network’s REGISTRY. PLoS Curr. 2010;2:RRN1184. Published 2010 Sep 28. doi:10.1371/currents.RRN1184
⁷ Enroll-HD; Landwehrmeyer GB, Fitzer-Attas CJ, Giuliano JD, et al. Data Analytics from Enroll-HD, a Global Clinical Research Platform for Huntington’s Disease. Mov Disord Clin Pract. 2016;4(2):212-224. Published 2016 Jun 22. doi:10.1002/mdc3.12388