Volume 20, Issue 2 p. 1102-1111
RESEARCH ARTICLE
Open Access

Precision medicine analysis of heterogeneity in individual-level treatment response to amyloid beta removal in early Alzheimer's disease

Menglan Pang
Biogen Digital Health, Biogen, Cambridge, Massachusetts, USA
Biogen, Cambridge, Massachusetts, USA

Audrey Gabelle
Biogen Digital Health, Biogen, Cambridge, Massachusetts, USA
Biogen, Cambridge, Massachusetts, USA

Paramita Saha-Chaudhuri
Biogen Digital Health, Biogen, Cambridge, Massachusetts, USA
Biogen, Cambridge, Massachusetts, USA

Willem Huijbers
Biogen Digital Health, Biogen, Cambridge, Massachusetts, USA
Biogen, Cambridge, Massachusetts, USA

Arie Gafson
Biogen Digital Health, Biogen, Cambridge, Massachusetts, USA
Biogen, Cambridge, Massachusetts, USA

Paul M. Matthews
Department of Brain Sciences, Faculty of Medicine, Imperial College London, London, UK
UK Dementia Research Institute at Imperial College London, London, UK

Lu Tian
Biomedical Data Science and Statistics, Stanford University School of Medicine, Stanford, California, USA

Ivana Rubino
Biogen, Cambridge, Massachusetts, USA

Richard Hughes
Biogen Digital Health, Biogen, Cambridge, Massachusetts, USA
Biogen, Cambridge, Massachusetts, USA

Carl de Moor
Biogen Digital Health, Biogen, Cambridge, Massachusetts, USA
Biogen, Cambridge, Massachusetts, USA

Shibeshih Belachew
Biogen Digital Health, Biogen, Cambridge, Massachusetts, USA
Biogen, Cambridge, Massachusetts, USA

Changyu Shen (Corresponding Author)
Biogen Digital Health, Biogen, Cambridge, Massachusetts, USA
Biogen, Cambridge, Massachusetts, USA

Correspondence
Changyu Shen, Biogen, 225 Binney St, Cambridge, MA 02142, USA.
E-mail: [email protected]

First published: 26 October 2023

Abstract

INTRODUCTION

Alzheimer's disease (AD) is a neurological disorder with variability in pathology and clinical progression. AD patients may differ in individual-level benefit from amyloid beta removal therapy.

METHODS

Random forest models were applied to the EMERGE trial to create an individual-level treatment response (ITR) score which represents individual-level benefit of high-dose aducanumab relative to the placebo. This ITR score was used to test the existence of heterogeneity in treatment effect (HTE).

RESULTS

We found statistical evidence of HTE in the Clinical Dementia Rating–Sum of Boxes (CDR-SB; P = 0.034). The observed CDR-SB benefit was 0.79 points greater in the group with the top 25% of ITR score compared to the remaining 75% (P = 0.020). Of note, the highest treatment responders had lower hippocampal volume, higher plasma phosphorylated tau 181, and a shorter duration of clinical AD at baseline.

DISCUSSION

This ITR analysis provides a proof of concept for precision medicine in future AD research and drug development.

Highlights

  • Emerging trials have shown a population-level benefit from amyloid beta (Aβ) removal in slowing cognitive decline in early Alzheimer's disease (AD).
  • This work demonstrates significant heterogeneity of individual-level treatment effect of aducanumab in early AD.
  • The greatest clinical responders to Aβ removal therapy have a pattern of more severe neurodegenerative process.

1 BACKGROUND

Alzheimer's disease (AD) is a progressive disorder with heterogeneous symptomatology, pathology, and individual disease courses.1, 2 Recent clinical trials demonstrate that early AD modification using anti-amyloid therapies can have clinical benefit. This includes the positive phase 3 trials of aducanumab3 (EMERGE ClinicalTrials.gov Identifier: NCT02484547) and lecanemab4 (CLARITY AD ClinicalTrials.gov Identifier: NCT03887455), both meeting their primary endpoint and all key secondary endpoints. The evidence for a causal effect of amyloid depletion in delaying cognitive decline was consolidated in a recent instrumental-variable meta-analysis (integrating data from 16 clinical trials) providing further support for this class of therapy in AD.5

The heterogeneity of AD is likely to influence the clinical benefit that individual patients derive from disease-modifying treatment. Nonetheless, the conventional “responder analysis,” which examines what type of patients have a change in clinical outcome exceeding a pre-specified threshold under a new treatment, is fundamentally flawed for parallel randomized controlled trials (RCTs) of AD. The key problem is that the longitudinal change in the clinical outcome of a given patient under a treatment could be attributed to factors other than the treatment.6 To isolate the treatment effect at the individual patient level, we would need to know the change in the outcome had the patient been assigned to the control arm (e.g., placebo), which is not observable in a parallel RCT. Therefore, the “response” in a typical responder analysis does not accurately capture the individual-level treatment benefit and cannot be used to study heterogeneous treatment effects (HTE). Fortunately, there have been significant developments in statistical methodologies to assess HTE from both randomized and observational studies, which serve as a foundation for the precision medicine paradigm (see Kosorok and Laber7 and Kent et al.8 for a comprehensive review). One statistical framework to investigate HTE is the estimation of an individual-level treatment response (ITR) score,9 which essentially is the predicted value of individual patient–level treatment effects. The methodology builds prediction models for an outcome of interest, separately in the treatment and placebo groups, using baseline patient characteristics as predictors. The ITR “score” is then derived at the patient level representing the difference between the predicted outcomes under the treatment and placebo settings.
The ITR framework permits the incorporation of various prediction modeling strategies, including machine learning, and provides interpretability through the identification of patient characteristics that are associated with a stronger treatment benefit. The approach has been used to examine HTE in multiple sclerosis,10 cancer,11 cardiovascular diseases,12 and other conditions.13

In this study, we applied the ITR methodology to patients enrolled in the EMERGE clinical trial (randomized, double-blind, placebo-controlled, phase 3 study of aducanumab [NCT02484547]). We evaluated whether there is heterogeneity in the reduction of cognitive decline in response to amyloid-reducing therapy and identified patient characteristics driving the observed treatment effect heterogeneity.

2 METHODS

2.1 Study population

The study design of EMERGE (ClinicalTrials.gov Identifier: NCT02484547) has been previously described.3 Briefly, EMERGE (n = 1638) was a randomized, placebo-controlled, double-blind, global, phase 3 study of aducanumab in patients with confirmed amyloid beta (Aβ) pathology, aged 50 to 85 years, who met clinical criteria for mild cognitive impairment (MCI) due to AD or mild AD dementia. Participants were randomly assigned 1:1:1 to receive aducanumab low dose (3 or 6 mg/kg target dose), high-dose (10 mg/kg target dose), or placebo via intravenous (IV) infusion once every 4 weeks over 76 weeks. The primary clinical endpoint was the change from baseline to week 78 on the Clinical Dementia Rating–Sum of Boxes (CDR-SB). Secondary clinical outcome measures included the Mini-Mental State Examination (MMSE), the Alzheimer's Disease Assessment Scale-Cognitive Subscale 13-item scale (ADAS-Cog13) and the Alzheimer's Disease Cooperative Study Activities of Daily Living Inventory-Mild Cognitive Impairment (ADCS-ADL-MCI). Tertiary clinical outcome measures included the Neuropsychiatric Inventory-10 (NPI-10).

2.2 Outcome and baseline variables included in ITR score model development

The primary endpoint for the HTE analysis was CDR-SB change at week 78. We included patients from EMERGE who were randomized to receive high-dose of aducanumab or placebo and who had CDR-SB measured at both baseline and week 78.

The following baseline characteristics were pre-specified for development of separate prediction models in the high-dose aducanumab and placebo groups: age, sex, years of formal education, clinical stage (MCI due to AD, or mild AD), body mass index (BMI), apolipoprotein E (APOE) ε4 status (carrier or non-carrier), years since first AD symptoms, years since AD diagnosis, AD symptomatic medication use (yes or no), regional brain magnetic resonance imaging (MRI) volume (frontal cortex, parietal cortex, lateral temporal cortex, medial temporal cortex, left hippocampus, right hippocampus, anterior cingulate cortex, posterior cingulate cortex, dorsal medial prefrontal cortex [PFC] default-mode network [DMN], medial temporal cortex DMN) normalized by the total intracranial volume (TIV), plasma phosphorylated tau (p-tau) 181, and medical history (defined as yes or no for categories of vascular, cardiac, or psychiatric disorders and microhemorrhage). Data preprocessing with respect to missing values, skewed distributions, and extreme values is described in Appendix A in supporting information.

2.3 Statistical methods

2.3.1 ITR score

We used random forest models14 to predict the change in CDR-SB from baseline to week 78 in the high-dose aducanumab and placebo groups using the baseline characteristics described above. A random forest model was used because of its ability to handle a large number of predictors and to take into account non-linear relationships and interactions. Once the prediction models were built in each group, a patient's ITR score was calculated as the difference in the predicted change in CDR-SB at week 78 between the high-dose aducanumab and placebo prediction models (i.e., how an individual patient's clinical score would have differed had they received treatment vs. placebo). Hence, a lower ITR score value corresponds to a greater predicted individual treatment benefit, that is, less cognitive worsening associated with high-dose aducanumab compared to placebo.
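
The two-model construction described above can be sketched as follows. This is a minimal illustration under stated assumptions and not the authors' implementation: a one-covariate least-squares line stands in for the random forest, scores are computed in-sample rather than on cross-validation hold-out sets, and the function names (`fit_line`, `itr_scores`) and the toy data are hypothetical.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares line y = a + b*x (a simple stand-in for a random forest)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    return lambda v: a + b * v

def itr_scores(x, treated, y):
    """ITR score = predicted change under treatment minus predicted change under placebo."""
    # One prediction model per arm, each fitted only on that arm's patients
    model_trt = fit_line([xi for xi, t in zip(x, treated) if t],
                         [yi for yi, t in zip(y, treated) if t])
    model_pbo = fit_line([xi for xi, t in zip(x, treated) if not t],
                         [yi for yi, t in zip(y, treated) if not t])
    # Every patient is scored with both models, whichever arm they were in
    return [model_trt(xi) - model_pbo(xi) for xi in x]

# Hypothetical toy data: one baseline covariate, arm indicator, change in outcome
x = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
treated = [True] * 4 + [False] * 4
y = [1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 3.0, 4.0]
scores = itr_scores(x, treated, y)
```

In this toy example the placebo arm worsens with the covariate while the treated arm does not, so patients with larger covariate values receive lower (more favorable) scores, matching the sign convention used in the text.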

2.3.2 Measure of the HTE

The HTE is graphically depicted by plotting the average treatment difference (ATD) curve.9 The ATD reflects the average observed treatment effect for the subgroup with an ITR score (i.e., the predicted individual-level treatment benefit) below the qth percentile. This subgroup also corresponds to the q percent of the population with the highest predicted treatment benefit. Details about the ATD curve construction are described in Appendix A. In general, the ATD curve provides a visual overall assessment of the magnitude of HTE. The area between the horizontal line representing the overall treatment effect and the ATD curve, denoted as the area between the curves (ABC), can provide a quantitative summary statistic for the overall magnitude of the HTE.9 In addition, to aid in interpreting the magnitude of the HTE, we a priori defined highest responders as the subgroup with the highest 25% ITR score–predicted treatment response and standard responders as the remaining 75%. The average of the estimated treatment effect within both subgroups, as well as the reported ATD curve and the ABC statistic, were evaluated using 5-fold cross-validation repeated 200 times (see Figure S1 in supporting information for demonstration).
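
To make the ATD curve and the ABC statistic concrete, the sketch below computes both on a small toy data set. The exact construction is given in Appendix A, which is not reproduced here, so the discretization (averaging the gap over a grid of q values), the function names (`atd`, `abc`), and the data are assumptions made for illustration only.

```python
from statistics import mean

def atd(scores, treated, y, q):
    """Observed effect (treated mean - placebo mean) among the q% lowest ITR scores."""
    n_keep = max(1, int(len(scores) * q / 100))
    keep = sorted(range(len(scores)), key=lambda i: scores[i])[:n_keep]
    trt = [y[i] for i in keep if treated[i]]
    pbo = [y[i] for i in keep if not treated[i]]
    return mean(trt) - mean(pbo)

def abc(scores, treated, y, grid):
    """Average gap between the ATD curve and the overall-effect line over a grid of q."""
    overall = atd(scores, treated, y, 100)
    return mean(atd(scores, treated, y, q) - overall for q in grid)

# Hypothetical toy data: lower score = larger predicted benefit
scores = [-3, -3, -2, -2, -1, -1, 0, 0]
treated = [True, False, True, False, True, False, True, False]
y = [1, 4, 1, 3, 1, 2, 1, 1]  # change in outcome (higher = more worsening)

curve = [atd(scores, treated, y, q) for q in (25, 50, 75, 100)]
abc_value = abc(scores, treated, y, (25, 50, 75, 100))
```

With these data the curve rises from −3 at q = 25 to the overall effect of −1.5 at q = 100, and the ABC is negative because the curve lies below the reference line, which is the direction reported in the Results.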

The results of sensitivity analyses of other thresholds are presented in Appendix B (Table S2) in supporting information. We then performed a permutation test for the existence of HTE using the ABC statistics and for the treatment effect differences between the highest and standard responders. The use of the permutation testing procedure protected against a potentially inflated Type I error rate associated with over-fitting of the prediction models.15 More details about the permutation test are provided in Appendix A.
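
The logic of a permutation test for the difference in treatment effect between the highest and standard responders can be sketched as below. This is a deliberately simplified illustration and not the procedure of Appendix A: it shuffles fixed ITR scores across patients, whereas a faithful version would presumably refit the prediction models and repeat the cross-validation within every permutation. All names (`effect`, `effect_gap`, `permutation_p`) and the toy data are hypothetical.

```python
import random
from statistics import mean

def effect(idx, treated, y):
    """Treated mean minus placebo mean within a subgroup; None if an arm is empty."""
    trt = [y[i] for i in idx if treated[i]]
    pbo = [y[i] for i in idx if not treated[i]]
    if not trt or not pbo:
        return None
    return mean(trt) - mean(pbo)

def effect_gap(scores, treated, y, q=25):
    """Effect in the q% lowest-score group minus effect in the remaining patients."""
    n_top = max(1, int(len(scores) * q / 100))
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    top = effect(order[:n_top], treated, y)
    rest = effect(order[n_top:], treated, y)
    return None if top is None or rest is None else top - rest

def permutation_p(scores, treated, y, n_perm=500, seed=0):
    """One-sided P value: share of shuffles with a gap at least as negative as observed."""
    rng = random.Random(seed)
    observed = effect_gap(scores, treated, y)
    shuffled = list(scores)
    hits = valid = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between score and patient
        gap = effect_gap(shuffled, treated, y)
        if gap is None:        # skip shuffles that leave a subgroup with one arm only
            continue
        valid += 1
        hits += gap <= observed
    return (hits + 1) / (valid + 1)

# Hypothetical toy data: only the 10 lowest-score patients benefit from treatment
treated = [i % 2 == 0 for i in range(40)]
scores = [float(i) for i in range(40)]
y = [1 if treated[i] else (4 if i < 10 else 1) for i in range(40)]
p_value = permutation_p(scores, treated, y)
```

The add-one correction in the returned ratio keeps the estimated P value strictly positive, a common convention for Monte Carlo permutation tests.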

RESEARCH IN CONTEXT

  1. Systematic review: Recent success in clinical trials of pharmacologic treatment of early Alzheimer's disease (AD), including EMERGE for aducanumab, CLARITY-AD for lecanemab, and TRAILBLAZER-ALZ2 for donanemab, demonstrated population-level benefit in slowing clinical progression. As AD is a disease with heterogeneous pathology and clinical manifestation, individual-level heterogeneity in treatment effect may also exist in the target population.

  2. Interpretation: There is statistically significant evidence that patients in the EMERGE trial derived heterogeneous benefit from aducanumab in slowing the worsening of the Clinical Dementia Rating Scale–Sum of Boxes. A further exploratory analysis suggested that several routinely collected patient characteristics are associated with the level of benefit.

  3. Future directions: These findings provide a proof of concept for the application of precision medicine in AD. More external data testing is warranted to determine the generalizability and enable implementation of such precision medicine models in future drug development through population enrichment and optimization of the use of pharmacologic interventions in AD clinical practice.

2.3.3 Association between treatment effect and baseline variables

To classify each individual in the analysis study population as a highest or standard responder, we assigned the ITR score for each individual by taking the average score from the 200 repeated cross-validation procedure in which the individual was a hold-out test sample. Patients were classified into the highest responder versus standard responder category based on the 25% threshold of the assigned ITR scores. To identify the patients’ baseline variables that contributed to the ITR score, we used a multipronged approach. First, we compared baseline characteristics between the highest and standard responders using summary statistics, standardized mean differences, and calculated P values from two sample t tests or chi-squared tests as appropriate. Second, a variable importance analysis was conducted based on an application of conditional random forests16 to the ITR scores. Finally, a regression tree17 was fitted to the ITR scores to depict the potentially complex relationship between the most important baseline characteristics and the ITR score. The depth of the regression tree was limited to two levels to achieve easily interpretable results that focus on the most important variables.
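
For the univariate comparison, a standardized mean difference for a continuous variable can be computed from group summary statistics as sketched below. The text does not state which SMD convention was used, so the unweighted pooled standard deviation here is an assumption, and the function name `smd` is hypothetical; the inputs are the age summary statistics reported in Table 2.

```python
from math import sqrt

def smd(mean_a, sd_a, mean_b, sd_b):
    """Standardized mean difference using the unweighted pooled SD (assumed convention)."""
    return (mean_a - mean_b) / sqrt((sd_a ** 2 + sd_b ** 2) / 2)

# Age in highest vs. standard responders, mean (SD) as reported in Table 2
age_smd = smd(73.29, 8.31, 69.38, 6.76)  # roughly 0.52
```

Because the SMD is expressed in standard deviation units, it allows variables measured on very different scales (for example, age and normalized hippocampal volume) to be ranked on a common footing.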

2.3.4 Treatment benefits for other cognitive and functional measurements using the ITR scores

We compared the observed longitudinal change from baseline in the high-dose aducanumab versus placebo patients across all study time points (weeks 26, 50, 78) for CDR-SB (internal validation) and for each of the secondary and tertiary cognitive and behavioral outcome measures (ADAS-Cog13, MMSE, ADCS-ADL-MCI, and NPI-10) as well as between the stratified highest and standard responders. Although these clinical outcome measures are correlated with CDR-SB, there are important non-overlapping domains among these scales measuring different aspects of cognitive and behavioral functions,18, 19 thus enabling additional validation of the model. We used the same mixed model for repeated measures (MMRM) as specified in the EMERGE primary analysis to analyze the change from the baseline score across all study timepoints. As in the primary RCT analysis, the fixed effects of the MMRM comprised treatment group, visit, treatment group by visit interaction, baseline clinical measure score, baseline clinical measure score by visit interaction, baseline MMSE, AD symptomatic medication use, region (United States, Europe/Canada/Australia, and Asia) and APOE ε4 status. An unstructured covariance matrix was used to account for the correlation within a patient.

3 RESULTS

3.1 Study population

The analysis study population consisted of 587 patients, with 299 patients from the high-dose aducanumab group and 288 from the placebo group. Patients’ characteristics were highly comparable between the two treatment arms with respect to all baseline variables. Summary statistics by treatment arm are presented in Table S1 in supporting information.

3.2 Assessment of heterogeneity of treatment effect

The ATD curve is shown in Figure 1. The red horizontal dashed line represents the overall average treatment effect (i.e., −0.37 [95% confidence interval (CI): −0.71, −0.04]) in the analysis study population. This line offers a benchmark for no HTE, that is, when every patient had the same reduction in CDR-SB decline associated with high-dose aducanumab versus placebo. The blue curve shows the ATD curve derived from the ITR score based on random forest prediction models. For example, q = 50 on the x axis represents the subgroup of patients with the highest 50% ITR score–predicted treatment response, for which the average treatment benefit estimate of high-dose aducanumab relative to placebo in CDR-SB change at week 78 was −0.6 (on the y axis). The ABC statistic between this blue line and the dashed red reference line was −0.246, which was statistically significant (permutation P value = 0.034), indicating that the ATD curve significantly deviated from the null hypothesis for HTE (horizontal reference line).

FIGURE 1. ATD curve (blue) constructed from the 200 repeated 5-fold cross validation. The x axis represents the q percentage threshold of the ITR score used to select a subgroup with high predicted treatment benefit, and the y axis represents the observed treatment effect on change in CDR-SB (high-dose aducanumab – placebo) for the corresponding q percentage subgroup. The flat dashed line in red represents the ATD curve in the absence of heterogeneity of treatment effect, with y coordinate indicating the average treatment effect (reduction in CDR-SB decline under high-dose aducanumab vs. placebo) in the entire sample. Among the subgroup of patients with the ITR score–predicted highest 25% treatment response (q = 25%, dashed arrow line), the observed treatment effect of high-dose aducanumab versus placebo was −0.97 CDR-SB points. ABC: area between curves (i.e., green area between the blue and red curves). ATD, average treatment difference; CDR-SB, Clinical Dementia Rating Sum of Boxes; ITR, individual-level treatment response.

The ITR score–predicted highest 25% responder group had an average observed reduction of 0.97 points of the CDR-SB scale worsening (corresponding to a 44% relative reduction) between baseline and week 78 comparing the high-dose aducanumab group to placebo (Figure 1), whereas the corresponding estimate for treatment benefit in the standard responder group was −0.18 (Table 1). Thus, the observed CDR-SB benefit from high-dose aducanumab versus placebo was 0.79 points greater in the ITR score–predicted highest 25% responder group compared to the standard responder group (permutation P = 0.020). Results based on q = 12.5% and q = 50% thresholds for the definition of highest responder group were consistent with the primary analysis and are presented in Table S2 in supporting information.

TABLE 1. Cross-validated observed treatment effect in the ITR score–predicted highest (25% lowest ITR score) and standard (all of the others) responder groups.

| Subgroup | Change in CDR-SB: placebo | Change in CDR-SB: high-dose | Change in CDR-SB: treatment effect | Relative change from baseline compared to placebo |
|---|---|---|---|---|
| 25% Highest responders | 2.20 | 1.23 | −0.97 | 44.1% |
| 75% Standard responders | 1.57 | 1.39 | −0.18 | 11.5% |
| Treatment effect difference (P value†) | | | −0.79 (0.020) | |

  • Note: Results were obtained as average across the 1000 hold-out sets.
  • Abbreviations: CDR-SB, Clinical Dementia Rating Sum of Boxes; ITR, individual-level treatment response.
  • † P value is derived from the permutation test.
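
As a consistency check, the derived quantities in Table 1 can be recovered from the arm-level mean changes. The arithmetic below uses the rounded values reported in the table, so the last digit reflects rounding rather than the underlying unrounded estimates.

```latex
\begin{align*}
\text{Highest responders:}\quad & 1.23 - 2.20 = -0.97, & \frac{0.97}{2.20} &\approx 44.1\% \\
\text{Standard responders:}\quad & 1.39 - 1.57 = -0.18, & \frac{0.18}{1.57} &\approx 11.5\% \\
\text{Treatment effect difference:}\quad & -0.97 - (-0.18) = -0.79
\end{align*}
```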

3.3 Uni- and multivariate analysis of the association between treatment effect and patients’ baseline characteristics

Baseline characteristics distinguishing the ITR score–predicted highest 25% responders and standard responders were ranked by the magnitude of the group-wise standardized mean difference (SMD; Table 2 and Figure 2). The largest SMDs were found for right hippocampal volume (SMD = 0.75, P < 0.001) and left hippocampal volume (SMD = 0.67, P < 0.001). In this univariate analysis, eight variables were significantly different between the two responder groups (Table 2) with an SMD that exceeded 0.20 in absolute scale (Figure 2). Overall, patients in the ITR score–predicted highest 25% responder group had lower hippocampal and medial temporal cortical volumes, were older, had higher baseline plasma p-tau181, had a shorter time since AD symptom onset and diagnosis, and were more likely to have baseline microhemorrhages (Table 2 and Figure 2). In the sensitivity analysis based on alternative thresholds (q = 12.5% and q = 50%), the significantly different variables between ITR score–predicted highest versus standard responders, as well as their ranking by the SMD, were generally consistent with the primary analysis.

TABLE 2. Baseline characteristics for the ITR score–predicted highest versus standard responders (q = 25 percentile of ITR score distribution as the threshold).

| Covariate | Highest responders (N = 147) | Standard responders (N = 440) | P value |
|---|---|---|---|
| Right hippocampal volume (%TIV) | 0.0014 (0.00024) | 0.0016 (0.00025) | <0.001* |
| Left hippocampal volume (%TIV) | 0.0014 (0.00024) | 0.0016 (0.00024) | <0.001* |
| Age | 73.29 (8.31) | 69.38 (6.76) | <0.001* |
| Baseline p-tau181 | 3.65 (2.06) | 3.09 (1.17) | <0.001* |
| Time since first AD symptom (years) | 3.28 (2.88) | 4.12 (2.72) | 0.001* |
| Medial temporal cortex volume (%TIV) | 0.0043 (0.00051) | 0.0045 (0.00059) | 0.01* |
| Time since diagnosis of AD (years) | 1.07 (1.26) | 1.45 (1.69) | 0.01* |
| Microhemorrhage^a | 29 (19.73%) | 49 (11.14%) | 0.01* |
| Cardiac disorder^a | 43 (29.25%) | 97 (22.05%) | 0.10 |
| Sex (male)^a | 80 (54.42%) | 207 (47.05%) | 0.15 |
| Posterior cingulate cortex volume (%TIV) | 0.0037 (0.00040) | 0.0037 (0.00048) | 0.14 |
| Frontal cortex volume (%TIV) | 0.098 (0.0062) | 0.097 (0.0063) | 0.17 |
| Clinical stage (mild AD)^a | 25 (17.01%) | 55 (12.5%) | 0.21 |
| AD symptomatic medication used^a | 73 (49.66%) | 244 (55.45%) | 0.26 |
| Anterior cingulate cortex volume (%TIV) | 0.0040 (0.00046) | 0.0040 (0.00050) | 0.30 |
| Baseline body mass index (kg/m^2) | 25.25 (4.62) | 25.7 (4.38) | 0.29 |
| Psychiatric disorder^a | 66 (44.9%) | 219 (49.77%) | 0.35 |
| Vascular disorder^a | 77 (52.38%) | 210 (47.73%) | 0.38 |
| Lateral temporal cortex volume (%TIV) | 0.058 (0.0044) | 0.058 (0.0051) | 0.47 |
| Parietal cortex volume (%TIV) | 0.063 (0.0057) | 0.064 (0.0054) | 0.52 |
| APOE ε4 status (carrier)^a | 102 (69.39%) | 294 (66.82%) | 0.64 |
| Medial temporal cortex DMN volume (%TIV) | 0.020 (0.0020) | 0.020 (0.0020) | 0.70 |
| Dorsal medial PFC DMN volume (%TIV) | 0.037 (0.0026) | 0.037 (0.0033) | 0.76 |
| Years of education | 14.69 (3.9) | 14.78 (3.43) | 0.80 |

  • Note: Variables in the table were ordered by decreasing values of standardized mean difference (top to bottom).
  • Abbreviations: AD, Alzheimer's disease; APOE, apolipoprotein E; ITR, individual-level treatment response; DMN, default-mode network; PFC, prefrontal cortex; p-tau, phosphorylated tau; TIV, total intracranial volume.
  • ^a Summary statistics are presented as N (%) for the categorical variables and as mean (standard deviation) for the continuous variables.
  • *P value < 0.05.

FIGURE 2. Absolute standardized mean difference in baseline characteristics for ITR score–predicted highest versus standard responders (q = 25 percentile of ITR score distribution as the threshold). Variables were ordered by decreasing values of standardized mean difference (top to bottom). AD, Alzheimer's disease; APOE, apolipoprotein E; ITR, individual-level treatment response; p-tau, phosphorylated tau.

Figure 3 shows the baseline characteristics ranked by the importance score obtained from the conditional random forest model fitted to the ITR score. The top four most important baseline characteristics were left hippocampal volume, right hippocampal volume, frontal cortex volume, and time since first AD symptom. The variable importance scores for left and right hippocampal volume, and to a lesser degree frontal cortex volume, were far larger than those of the remaining baseline characteristics. In addition, 8 of the 10 most important variables in this multivariate analysis were related to regional cortical atrophy measures. Finally, the directionality of the effect of frontal cortical volume is opposite to that of hippocampal volume, that is, the ITR score–predicted highest responders have a lower hippocampal volume and a higher frontal cortex volume relative to the rest of the population (standard responders).

FIGURE 3. Variable importance of the baseline characteristics obtained from the conditional random forest model fitted to the ITR score. Variables were ordered by decreasing values of the importance score that represents the mean decrease in accuracy when each covariate was permuted in a conditional random forest model with the ITR score as the outcome (top to bottom). AD, Alzheimer's disease; APOE, apolipoprotein E; ITR, individual-level treatment response; p-tau, phosphorylated tau.

To further assess how individual baseline characteristics are related to HTE, and how they interact with each other in predicting the ITR score, we fitted a single regression tree with the ITR score as the outcome. The final tree included three baseline characteristics and yielded four patient groups with distinct predicted treatment benefit compared to placebo (Figure 4). The baseline characteristics included left hippocampal volume, frontal cortex volume, and time since first AD symptom, with the first split occurring in left hippocampal volume.

FIGURE 4. Regression tree fitted to the ITR score. The regression tree includes labels for the baseline characteristics and the thresholds used to split the patients into non-overlapping groups. Boxes include the predicted magnitude of the average treatment effect on change in CDR-SB from baseline to week 78 (high-dose aducanumab – placebo) and the proportion (%) of patients in each branch of the tree. For example, patients with left hippocampal volume (normalized using volume to total intracranial volume fraction) < 0.0013, comprising 22% of the study population, had a predicted reduction of −0.71 in CDR-SB worsening compared to placebo at week 78. AD, Alzheimer's disease; CDR-SB, Clinical Dementia Rating Sum of Boxes; ITR, individual-level treatment response.

3.4 Assessment of the ITR score model prediction on longitudinal clinical and functional outcome measures

In the ITR score–predicted highest 25% responder subgroup, the adjusted mean differences between high-dose aducanumab and placebo in the worsening of the cognitive and functional clinical endpoints from baseline to week 78 were −1.07 (P = 0.002), −2.32 (P = 0.029), and 1.23 (P = 0.040) for CDR-SB, ADAS-Cog13, and MMSE, respectively, in contrast to −0.395 (P = 0.020), −1.126 (P = 0.044), and 0.665 (P = 0.034) in the overall study population (Figure 5). A larger separation between the high-dose aducanumab and placebo groups was observed in the ITR score–predicted highest 25% responders compared to the standard responders at each study visit, suggesting a stronger treatment benefit across both cognitive and functional performance over time. Similar patterns were observed for ADCS-ADL-MCI and NPI-10, shown in Figure S4 in supporting information.

FIGURE 5. Adjusted mean change from baseline using MMRM stratified by ITR score–predicted highest 25% responder and standard responder subgroups as well as for the overall analysis study population, for (A) CDR-SB, (B) ADAS-Cog13, and (C) MMSE endpoints. *P < 0.05; **P < 0.01; ***P < 0.001. ADAS-Cog, Alzheimer's Disease Assessment Scale-Cognitive Subscale 13-item scale; CDR-SB, Clinical Dementia Rating Sum of Boxes; ITR, individual-level treatment response; MMRM, mixed model for repeated measures; MMSE, Mini-Mental State Examination.

4 DISCUSSION

Using an ITR score model approach,9 we demonstrated evidence of heterogeneity in treatment effect of high-dose aducanumab compared to placebo on reducing CDR-SB decline in EMERGE. Exploratory analyses showed that the highest benefit from aducanumab was associated with the following baseline characteristics: lower hippocampal and medial temporal cortical volumes, older age, higher plasma p-tau181, a shorter time since AD symptom onset and diagnosis, and higher prevalence of microhemorrhages. This group also consistently demonstrated a greater treatment effect from aducanumab for other clinical outcome measures that were not used in model training, including global cognitive domain (MMSE), cognitive status (ADAS-Cog13), functional domain (ADCS-ADL-MCI), and neuropsychiatric symptoms (NPI-10). Given the lack of independent validation, this work should be viewed as hypothesis generating in nature but constitutes the first demonstration of the existence of a heterogeneous response to anti-amyloid therapy, which lays the foundation for further investigation of the potential of precision medicine models to inform personalized decision making when initiating treatment in patients with early AD.

The analyses of baseline characteristics of the ITR score–predicted highest 25% responders revealed a clear pattern of a more severe neurodegenerative process, as measured by more severe hippocampal atrophy, associated with higher plasma p-tau181 and a shorter duration of AD since symptom onset and diagnosis. Interestingly, the most highly ranked baseline characteristics as per their importance in the conditional random forest model were regional brain atrophy features, indicating a potential influence of topographical markers in the response to Aβ removal therapy. This is consistent with machine learning models that predict disease progression in AD with high accuracy and that also rely on brain MRI atrophy variables as input features.20 It should be noted that the two strongest characteristics associated with highest treatment benefit in the multivariate analysis, a lower bilateral hippocampal volume and higher frontal cortex volume, are also observed in the limbic-predominant MRI atrophy subtype of AD.21, 22 Importantly, APOE ε4 allele carrier status did not emerge as a variable of significant importance despite its association with an increased risk of dementia,23-25 and a sensitivity analysis further refining our ITR score model to evaluate the effect of homo- versus heterozygous APOE ε4 genotypes was consistent with the primary results in this regard.

The statistical approach used in our analysis has several advantages. First, the ITR methodology offers a flexible multivariate framework and the opportunity to apply diverse analytic methodologies, including machine learning. Second, the application of an ITR score random forest model focused on HTE estimates, obtained from cross-validation hold-out sets and tested for statistical significance using permutation statistics, guarded against overfitting and false positive findings. Third, the ITR methodology naturally generates summary statistics including the ATD curve that depicts a continuous spectrum of HTE estimates, clearly demonstrating how the treatment effect increases for different subsets of patients and allowing for in-depth study and selection of relevant thresholds. Fourth, our analysis enabled the identification of several prognostic factors that drive HTE from a large list of variables and shed light on potential underlying physiological processes that lead to HTE. Although the prognostic property of these factors has been well established, the demonstration of their involvement in HTE is novel, with implications for future research.

There are limitations to this study. First, the analysis was based on a single trial (EMERGE)3 and therefore lacked independent validation of the ITR score. As the parallel ENGAGE trial (NCT02477800) did not meet its primary or secondary endpoints,3 we did not derive the ITR score using the data from this trial. Second, there were 653 opportunity-to-complete (OTC) subjects in the placebo and high-dose arms who had the potential to contribute 78-week outcomes. Our analysis focused on the 587 OTC subjects whose 78-week CDR-SB data were actually collected. The remaining 66 subjects did not have CDR-SB measured at week 78 for various reasons; among them, 19 discontinued the study due to adverse events and 6 due to death. The non-OTC subjects, on the other hand, were enrolled into EMERGE relatively late, and their 78-week data were not collected because of early trial termination. In general, there is no systematic difference between the OTC and non-OTC subjects for essentially all the relevant baseline variables, as they are separated simply by the time of enrollment (summary statistics for patients in the OTC and non-OTC populations are presented in Table S3 in supporting information). Thus, the OTC subjects should be representative of the population targeted by EMERGE. Although the missingness of CDR-SB data at week 78 in the 66 subjects could be informative, they account for only about 10% of the OTC subjects (66 of 653), and their impact is likely to be minimal. Third, our exploratory comparison of patient characteristics and longitudinal clinical outcome measures between the highest and standard responders is based on data-driven subgroups rather than subgroups defined a priori. As such, numerical values associated with statistical inference (e.g., P value) should be interpreted with caution. Finally, these analyses were conducted post hoc. Therefore, the results are hypothesis generating, and their generalizability will have to be explored on independent external validation data sets.
Future work may also warrant the refinement of this ITR score random forest model potentially with additional clinical variables, spatial imaging covariates (e.g., regional Aβ positron emission tomography), and genetic information, as well as further validation over extended treatment duration with longer term follow-up and in real-world populations.

With the accelerated approval of aducanumab and conventional approval of lecanemab granted by the US Food and Drug Administration, and the recent positive results of TRAILBLAZER-ALZ 2 for donanemab,26 patient stratification and personalized medicine will become increasingly important for better understanding how disease trajectory and response to treatment may vary from one individual to another. In this study, our ITR approach demonstrated that HTE exists for aducanumab and that a substantial proportion of patients could derive more considerable and clinically meaningful benefits from treatment. In this context, our precision medicine analysis provides a proof of concept that personalized approaches to treatment in AD may be feasible and may one day allow clinical decisions to be made with evidence-based assessment of individual benefit–risk.

ACKNOWLEDGMENTS

The authors would like to thank Ying Tian, Shuang Wu, John O'Gorman, Yuval Zabar, and Holly Brothers from Biogen for helpful discussions and comments. This research was supported by Biogen.

    CONFLICT OF INTEREST STATEMENT

    M.P., A.G., P.S.C., W.H., A.G., I.R., R.H., S.B., and C.S. are Biogen employees. P.M.M. received consultancy fees from Novartis, Biogen (unconnected with the present research and report), Nodthera, and Rejuveron Therapeutics; honoraria or speakers’ fees from Novartis, Biogen, and Redburn Investing; research or educational funds from Biogen (unconnected with the present research and report), Novartis, Merck, and Bristol Myers Squibb; and institutional grants from UK Dementia Research Institute, NERC, and NIHR Biomedical Research Centre at Imperial College London. L.T. has a consulting agreement with Biogen. C.d.M. holds Biogen stocks. All other authors declare no competing interests. Author disclosures are available in the supporting information.

    CONSENT STATEMENT

    Consent was not necessary for this research.